[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829863#comment-15829863
]
zhengruifeng commented on SPARK-19208:
--
Ok, I will try to make a design doc for this.
I think it
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15829014#comment-15829014
]
Joseph K. Bradley commented on SPARK-19208:
---
+1 for [~mlnick]'s suggestion. If we're
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826354#comment-15826354
]
Ilya Matiach commented on SPARK-19208:
--
[~srowen] Good point, with something like hashing TF you
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826096#comment-15826096
]
Nick Pentreath commented on SPARK-19208:
If we're going to look at performance optimization here,
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823517#comment-15823517
]
zhengruifeng commented on SPARK-19208:
--
cc [~josephkb] [~yanboliang]
> MaxAbsScaler and
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823508#comment-15823508
]
zhengruifeng commented on SPARK-19208:
--
I ran tests on a dataset with 6,000,000 instances and 780
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823422#comment-15823422
]
zhengruifeng commented on SPARK-19208:
--
The code in {{MinMaxScaler}} is copied from
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822762#comment-15822762
]
Sean Owen commented on SPARK-19208:
---
The super sparse high-dimensional cases don't seem like cases
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822549#comment-15822549
]
Ilya Matiach commented on SPARK-19208:
--
[~srowen] Isn't feature hashing (e.g. HashingTF) to large bit
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821728#comment-15821728
]
Sean Owen commented on SPARK-19208:
---
You have 29,890,095 features. At extremes of scale this might make
[
https://issues.apache.org/jira/browse/SPARK-19208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821549#comment-15821549
]
Apache Spark commented on SPARK-19208:
--
User 'zhengruifeng' has created a pull request for this
11 matches