[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-03-29 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419745#comment-16419745 ] V Luong commented on SPARK-2: [~bago.amirbekian] thank you, that is indeed a good solution available

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379296#comment-16379296 ] Bago Amirbekian commented on SPARK-2: [~MBALearnsToCode] you can use a `VectorSizeHint`

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358634#comment-16358634 ] V Luong commented on SPARK-2: [~cloud_fan] alternatively, is there any way that

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358623#comment-16358623 ] Wenchen Fan commented on SPARK-2: This is not a trivial change, we need to introduce an `AnyRow`

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358599#comment-16358599 ] V Luong commented on SPARK-2: [~cloud_fan] there are many scenarios in which oldDF involves sorting

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358067#comment-16358067 ] Wenchen Fan commented on SPARK-2: I'm a little confused. If we wanna get a random row, why we

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358048#comment-16358048 ] Liang-Chi Hsieh commented on SPARK-2: Currently I think we don't have an API in Dataset to just