[ https://issues.apache.org/jira/browse/SYSTEMML-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prithviraj Sen updated SYSTEMML-537:
------------------------------------
    Description:
Previous instantiations of text analytics usecases with SystemML used to communicate feature vectors to SystemML using a 3-column table/relation/matrix (see SYSTEMML-452 for a detailed description). The first column was defined to be an integer representing a global feature vector ID. Creating global integer IDs is hard and time consuming. Can this be relaxed to a globally unique string ID instead?

SystemML's transform operation will be able to handle the above usecase, but for reasons related to uniformly handling both features and labels (uniformly transforming both X and y matrices) we will need to update transform (or introduce some equivalent functionality) that allows one to apply an already created recode map and recode from scratch on the same matrix.

  was:
Previous instantiations of text analytics usecases with SystemML used to communicate feature vectors to SystemML using a 3-column table/relation/matrix (see SystemML-452 for a detailed description). The first column was defined to be an integer representing a global feature vector ID. Creating global integer IDs is hard and time consuming. Can this be relaxed to a globally unique string ID instead?

SystemML's transform operation will be able to handle the above usecase, but for reasons related to uniformly handling both features and labels (uniformly transforming both X and y matrices) we will need to update transform (or introduce some equivalent functionality) that allows one to apply an already created recode map and recode from scratch on the same matrix.
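
One way to read the request above is sketched below in plain Python. This is not SystemML's transform API: the function names, the column layout (string vector ID, feature name, value), and the sample rows are all hypothetical and only illustrate the idea. The ID column of y reuses the recode map already created for X, while the label column of the same y table is recoded from scratch.

```python
def build_recode_map(values):
    """Assign consecutive integer codes (starting at 1) in first-seen order."""
    recode_map = {}
    for value in values:
        if value not in recode_map:
            recode_map[value] = len(recode_map) + 1
    return recode_map


def recode_table(rows, recode_cols, existing_maps=None):
    """Recode the columns listed in recode_cols; pass other columns through.

    A column present in existing_maps reuses that map (a value missing from
    it raises KeyError); every other listed column gets a freshly built map.
    """
    existing_maps = existing_maps or {}
    maps = {}
    for col in recode_cols:
        if col in existing_maps:
            maps[col] = existing_maps[col]
        else:
            maps[col] = build_recode_map(row[col] for row in rows)
    recoded = [
        tuple(maps[c][v] if c in maps else v for c, v in enumerate(row))
        for row in rows
    ]
    return recoded, maps


# Hypothetical X table: (string vector ID, feature name, value).
X_rows = [
    ("doc-a", "word:cat", 1.0),
    ("doc-a", "word:dog", 2.0),
    ("doc-b", "word:cat", 1.0),
]
X_recoded, X_maps = recode_table(X_rows, recode_cols=[0, 1])

# Hypothetical y table: (string vector ID, label). Column 0 applies the map
# created for X; column 1 is recoded from scratch.
y_rows = [("doc-a", "sports"), ("doc-b", "politics")]
y_recoded, y_maps = recode_table(
    y_rows, recode_cols=[0, 1], existing_maps={0: X_maps[0]}
)
```

Sharing the ID map keeps row identifiers consistent between X and y, which is the uniform handling the description asks for.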
> Support for text analytics usecases
> -----------------------------------
>
> Key: SYSTEMML-537
> URL: https://issues.apache.org/jira/browse/SYSTEMML-537
> Project: SystemML
> Issue Type: Improvement
> Components: APIs
> Reporter: Prithviraj Sen
>
> Previous instantiations of text analytics usecases with SystemML used to
> communicate feature vectors to SystemML using a 3-column
> table/relation/matrix (see SYSTEMML-452 for a detailed description). The
> first column was defined to be an integer representing a global feature
> vector ID. Creating global integer IDs is hard and time consuming. Can this
> be relaxed to a globally unique string ID instead?
> SystemML's transform operation will be able to handle the above usecase, but
> for reasons related to uniformly handling both features and labels (uniformly
> transforming both X and y matrices) we will need to update transform (or
> introduce some equivalent functionality) that allows one to apply an already
> created recode map and recode from scratch on the same matrix.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)