[jira] [Created] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-05 Thread V Luong (JIRA)
V Luong created SPARK-23333: --- Summary: SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame Key: SPARK-23333 URL: https://issues.apache.org/jira/browse/SPARK-23333

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358599#comment-16358599 ] V Luong commented on SPARK-23333: - [~cloud_fan] there are many scenarios in which oldDF involves sorting

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-09 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358634#comment-16358634 ] V Luong commented on SPARK-23333: - [~cloud_fan] alternatively, is there any way that

[jira] [Updated] (SPARK-23467) Enable way to create DataFrame from pre-partitioned files (Parquet/ORC/etc.) with each in-memory partition mapped to 1 physical file partition

2018-02-19 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated SPARK-23467: Description: I would like to echo the need described here:

[jira] [Updated] (SPARK-23467) Enable way to create DataFrame from pre-partitioned files (Parquet/ORC/etc.) with each in-memory partition mapped to 1 physical file partition

2018-02-19 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated SPARK-23467: Description: I would like to echo the need described here:

[jira] [Updated] (SPARK-23467) Enable way to create DataFrame from pre-partitioned files (Parquet/ORC/etc.) with each in-memory partition mapped to 1 physical file partition

2018-02-19 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong updated SPARK-23467: Description: I would like to echo the need described here:

[jira] [Created] (SPARK-23467) Enable way to create DataFrame from pre-partitioned files (Parquet/ORC/etc.) with each in-memory partition mapped to 1 physical file partition

2018-02-19 Thread V Luong (JIRA)
V Luong created SPARK-23467: --- Summary: Enable way to create DataFrame from pre-partitioned files (Parquet/ORC/etc.) with each in-memory partition mapped to 1 physical file partition Key: SPARK-23467 URL:

[jira] [Commented] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2018-08-18 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584726#comment-16584726 ] V Luong commented on SPARK-8582: hi [~zsxwing], may I check if we're working on resolving this issue? I'm

[jira] [Resolved] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-03-29 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] V Luong resolved SPARK-23333. - Resolution: Won't Fix > SparkML VectorAssembler.transform slow when needing to invoke .first() on >

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-03-29 Thread V Luong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419745#comment-16419745 ] V Luong commented on SPARK-23333: - [~bago.amirbekian] thank you, that is indeed a good solution available