[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-19 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829863#comment-15829863 ] zhengruifeng commented on SPARK-19208: -- Ok, I will try to make a design doc for this. I think it

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-18 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829014#comment-15829014 ] Joseph K. Bradley commented on SPARK-19208: --- +1 for [~mlnick]'s suggestion. If we're

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-17 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826354#comment-15826354 ] Ilya Matiach commented on SPARK-19208: -- [~srowen] Good point, with something like hashing TF you

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-17 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826096#comment-15826096 ] Nick Pentreath commented on SPARK-19208: If we're going to look at performance optimization here,

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-15 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823517#comment-15823517 ] zhengruifeng commented on SPARK-19208: -- cc [~josephkb] [~yanboliang]

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-15 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823508#comment-15823508 ] zhengruifeng commented on SPARK-19208: -- I do tests on a dataset with 6,000,000 instances and 780

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-15 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823422#comment-15823422 ] zhengruifeng commented on SPARK-19208: -- The code in {{MinMaxScaler}} is copied from

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-14 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822762#comment-15822762 ] Sean Owen commented on SPARK-19208: --- The super sparse high-dimensional cases don't seem like cases

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-13 Thread Ilya Matiach (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822549#comment-15822549 ] Ilya Matiach commented on SPARK-19208: -- [~srowen] isn't feature hashing (eg HashingTF) to large bit
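Feature hashing to a large number of bits yields very high-dimensional vectors in which each row touches only a few features. As a hedged illustration of why that matters for min/max-style scalers, the Python sketch below computes per-feature min, max and max-abs by visiting only the non-zero entries, then corrects for implicit zeros using a per-feature non-zero count. The intermediate buffers grow with the number of distinct non-zero features rather than with the vector size; only the final result is dense. The (size, indices, values) row layout and the name `sparse_min_max` are hypothetical; this is not Spark's implementation or API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sparse-row layout: (size, indices, values). It mirrors the
# idea of a sparse vector but is NOT Spark's SparseVector API.
SparseRow = Tuple[int, Sequence[int], Sequence[float]]


def sparse_min_max(
    rows: Sequence[SparseRow],
) -> Tuple[List[float], List[float], List[float]]:
    """Per-feature min, max and max-abs from one pass over non-zero entries."""
    size = None
    n_rows = 0
    mins: Dict[int, float] = {}  # only features that have a non-zero value
    maxs: Dict[int, float] = {}
    nnz: Dict[int, int] = {}     # non-zero count per feature
    for sz, indices, values in rows:
        if size is None:
            size = sz
        elif sz != size:
            raise ValueError("all rows must have the same size")
        n_rows += 1
        for j, v in zip(indices, values):
            if v == 0.0:
                continue  # explicit zeros are handled like implicit ones below
            mins[j] = min(mins.get(j, v), v)
            maxs[j] = max(maxs.get(j, v), v)
            nnz[j] = nnz.get(j, 0) + 1
    if size is None:
        raise ValueError("need at least one row")

    col_min: List[float] = []
    col_max: List[float] = []
    for j in range(size):
        lo, hi = mins.get(j, 0.0), maxs.get(j, 0.0)
        if nnz.get(j, 0) < n_rows:
            # At least one row has an (implicit) zero in column j.
            lo, hi = min(lo, 0.0), max(hi, 0.0)
        col_min.append(lo)
        col_max.append(hi)
    # Max-abs follows directly from the per-feature min and max.
    col_max_abs = [max(abs(lo), abs(hi)) for lo, hi in zip(col_min, col_max)]
    return col_min, col_max, col_max_abs


rows = [(4, [0, 2], [3.0, -1.0]), (4, [0], [5.0]), (4, [0, 3], [4.0, 2.0])]
print(sparse_min_max(rows))
# ([3.0, 0.0, -1.0, 0.0], [5.0, 0.0, 0.0, 2.0], [5.0, 0.0, 1.0, 2.0])
```

In a distributed job, per-partition buffers of this kind could be merged by taking element-wise min/max and summing the non-zero and row counts; that merge step is omitted here.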

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-13 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821728#comment-15821728 ] Sean Owen commented on SPARK-19208: --- You have 29,890,095 features. At extremes of scale this might make
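At 29,890,095 features, any per-feature aggregation buffer kept as dense double arrays is large. The hedged back-of-the-envelope sketch below assumes 8-byte doubles and ignores JVM object and serialization overhead. The number of arrays a given implementation keeps is an assumption; a min/max scaler needs at least one per-feature min array and one per-feature max array. The helper `dense_array_bytes` is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: float64 storage, no object overhead


def dense_array_bytes(num_features: int, num_arrays: int = 1) -> int:
    """Bytes for `num_arrays` dense float64 arrays of length `num_features`."""
    return num_features * BYTES_PER_DOUBLE * num_arrays


one = dense_array_bytes(29_890_095)
print(one)                  # 239120760
print(round(one / 1e6, 1))  # 239.1 (MB for a single dense array)
# e.g. one min array plus one max array
two = dense_array_bytes(29_890_095, num_arrays=2)
print(two)                  # 478241520
```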

[jira] [Commented] (SPARK-19208) MaxAbsScaler and MinMaxScaler are very inefficient

2017-01-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821549#comment-15821549 ] Apache Spark commented on SPARK-19208: -- User 'zhengruifeng' has created a pull request for this