[ https://issues.apache.org/jira/browse/SYSTEMML-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prithviraj Sen updated SYSTEMML-537:
------------------------------------
    Description: 
Previous instantiations of text analytics use cases with SystemML 
communicated feature vectors to SystemML as a 3-column table/relation/matrix 
(see SYSTEMML-452 for a detailed description). The first column was defined to 
be an integer representing a global feature vector ID. Creating global integer 
IDs is hard and time-consuming. Can this be relaxed to a globally unique string 
ID instead?

SystemML's transform operation will be able to handle the above use case, but 
to handle features and labels uniformly (that is, to transform both the X and 
y matrices the same way) we will need to update transform (or introduce some 
equivalent functionality) so that one can both apply an already-created recode 
map and recode from scratch on the same matrix.

  was:
Previous instantiations of text analytics usecases with SystemML used to 
communicate feature vectors to SystemML using a 3-column table/relation/matrix 
(see SystemML-452 for a detailed description). The first column was defined to 
be an integer representing a global feature vector ID. Creating global integer 
IDs is hard and time consuming. Can this be relaxed to a globally unique string 
ID instead?

SystemML's transform operation will be able to handle the above usecase, but 
for reasons related to uniformly handling both features and labels (uniformly 
transforming both X and y matrices) we will need to update transform (or 
introduce some equivalent functionality) that allows one to apply an already 
created recode map and recode from scratch on the same matrix.


> Support for text analytics usecases
> -----------------------------------
>
>                 Key: SYSTEMML-537
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-537
>             Project: SystemML
>          Issue Type: Improvement
>          Components: APIs
>            Reporter: Prithviraj Sen
>
> Previous instantiations of text analytics use cases with SystemML 
> communicated feature vectors to SystemML as a 3-column 
> table/relation/matrix (see SYSTEMML-452 for a detailed description). The 
> first column was defined to be an integer representing a global feature 
> vector ID. Creating global integer IDs is hard and time-consuming. Can this 
> be relaxed to a globally unique string ID instead?
> SystemML's transform operation will be able to handle the above use case, but 
> to handle features and labels uniformly (that is, to transform both the X and 
> y matrices the same way) we will need to update transform (or introduce some 
> equivalent functionality) so that one can both apply an already-created 
> recode map and recode from scratch on the same matrix.
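
To make the request concrete, here is a minimal Python sketch of the behaviour 
asked for: one column of a table is recoded with an already-created recode map 
while another column of the same table is recoded from scratch. This is not 
SystemML's transform API. The function names (build_recode_map, recode_table), 
the 1-based codes, and the meaning assumed for the second and third columns 
(feature name, value) are hypothetical and chosen only for illustration.

```python
def build_recode_map(values):
    """Recode from scratch: assign 1-based integer codes in first-seen order."""
    recode_map = {}
    for v in values:
        if v not in recode_map:
            recode_map[v] = len(recode_map) + 1
    return recode_map


def recode_table(rows, column_maps):
    """Recode selected columns of a list of row tuples.

    column_maps maps a column index to an existing recode map, or to None
    when that column should be recoded from scratch. (Hypothetical helper,
    not part of SystemML.)
    """
    maps = {}
    for col, existing in column_maps.items():
        if existing is None:
            maps[col] = build_recode_map(row[col] for row in rows)
        else:
            maps[col] = existing
    recoded = []
    for row in rows:
        new_row = list(row)
        for col, recode_map in maps.items():
            # Raises KeyError if an existing map lacks this value.
            new_row[col] = recode_map[row[col]]
        recoded.append(tuple(new_row))
    return recoded, maps


# Column 0: string feature vector ID, recoded from scratch.
# Column 1: assumed feature name, recoded with an existing map.
rows = [("doc-a", "cat", 1.0), ("doc-a", "dog", 2.0), ("doc-b", "cat", 3.0)]
feature_map = {"cat": 1, "dog": 2}
recoded, maps = recode_table(rows, {0: None, 1: feature_map})
```

In this sketch a value missing from an existing map raises a KeyError rather 
than silently receiving a new code, so that codes stay consistent with 
whatever produced the map; a real implementation might choose differently.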



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
