[jira] [Commented] (SPARK-29768) nondeterministic expression fails column pruning

2019-11-05 Thread yucai (Jira)
[ https://issues.apache.org/jira/browse/SPARK-29768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968028#comment-16968028 ] yucai commented on SPARK-29768: --- [~smilegator] [~wenchen], is this an issue, or is it working as designed? >

[jira] [Created] (SPARK-29768) nondeterministic expression fails column pruning

2019-11-05 Thread yucai (Jira)
yucai created SPARK-29768: - Summary: nondeterministic expression fails column pruning Key: SPARK-29768 URL: https://issues.apache.org/jira/browse/SPARK-29768 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-26909) use unsafeRow.hashCode() as hash value in HashAggregate

2019-02-18 Thread yucai (JIRA)
yucai created SPARK-26909: - Summary: use unsafeRow.hashCode() as hash value in HashAggregate Key: SPARK-26909 URL: https://issues.apache.org/jira/browse/SPARK-26909 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-26909) use unsafeRow.hashCode() as hash value in HashAggregate

2019-02-18 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-26909: -- Description: This is a followup PR for #21149. New way uses unsafeRow.hashCode() as hash value in

[jira] [Updated] (SPARK-25864) Make main args set correctly in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Description: Set main args correctly in BenchmarkBase, to make it accessible for its subclass. It will

[jira] [Updated] (SPARK-25864) Make main args set correctly in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Summary: Make main args set correctly in BenchmarkBase (was: Make mainArgs correctly set in BenchmarkBase)

[jira] [Updated] (SPARK-25864) Make mainArgs correctly set in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Description: Save main args correctly in BenchmarkBase, to make it accessible for its subclass. It will

[jira] [Updated] (SPARK-25864) Make mainArgs correctly set in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Description: Make mainArgs correctly set in BenchmarkBase, it will benefit: *

[jira] [Updated] (SPARK-25864) Make mainArgs correctly set in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Description: Make mainArgs correctly set in BenchmarkBase, it will benefit: - 

[jira] [Updated] (SPARK-25864) Make mainArgs correctly set in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25864: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-25475 > Make mainArgs correctly set in BenchmarkBase >

[jira] [Created] (SPARK-25864) Make mainArgs correctly set in BenchmarkBase

2018-10-28 Thread yucai (JIRA)
yucai created SPARK-25864: - Summary: Make mainArgs correctly set in BenchmarkBase Key: SPARK-25864 URL: https://issues.apache.org/jira/browse/SPARK-25864 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25663) Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use main method

2018-10-27 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1148#comment-1148 ] yucai commented on SPARK-25663: --- [~Gengliang.Wang] I made an improvement on this; could you help review it?

[jira] [Created] (SPARK-25850) Make the split threshold for the code generated method configurable

2018-10-26 Thread yucai (JIRA)
yucai created SPARK-25850: - Summary: Make the split threshold for the code generated method configurable Key: SPARK-25850 URL: https://issues.apache.org/jira/browse/SPARK-25850 Project: Spark Issue

[jira] [Commented] (SPARK-25676) Refactor BenchmarkWideTable to use main method

2018-10-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663480#comment-16663480 ] yucai commented on SPARK-25676: --- I am working on this. > Refactor BenchmarkWideTable to use main method >

[jira] [Created] (SPARK-25508) Refactor OrcReadBenchmark to use main method

2018-09-21 Thread yucai (JIRA)
yucai created SPARK-25508: - Summary: Refactor OrcReadBenchmark to use main method Key: SPARK-25508 URL: https://issues.apache.org/jira/browse/SPARK-25508 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-25486) Refactor SortBenchmark to use main method

2018-09-20 Thread yucai (JIRA)
yucai created SPARK-25486: - Summary: Refactor SortBenchmark to use main method Key: SPARK-25486 URL: https://issues.apache.org/jira/browse/SPARK-25486 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-25485) Refactor UnsafeProjectionBenchmark to use main method

2018-09-20 Thread yucai (JIRA)
yucai created SPARK-25485: - Summary: Refactor UnsafeProjectionBenchmark to use main method Key: SPARK-25485 URL: https://issues.apache.org/jira/browse/SPARK-25485 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-25481) Refactor ColumnarBatchBenchmark to use main method

2018-09-20 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25481: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-25475 > Refactor ColumnarBatchBenchmark to use main

[jira] [Created] (SPARK-25481) Refactor ColumnarBatchBenchmark to use main method

2018-09-20 Thread yucai (JIRA)
yucai created SPARK-25481: - Summary: Refactor ColumnarBatchBenchmark to use main method Key: SPARK-25481 URL: https://issues.apache.org/jira/browse/SPARK-25481 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-09-11 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-23207: -- Description: Currently shuffle repartition uses RoundRobinPartitioning, the generated result is
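The description notes that shuffle repartition uses RoundRobinPartitioning. A minimal pure-Python model (not Spark code) of why that can go wrong: round-robin assignment depends on the order in which rows arrive, so if an upstream task is retried and emits the same rows in a different order, rows land in different output partitions. A reducer that read the first attempt and a reducer recomputed from the retry then disagree, producing duplicated and missing rows. Sorting locally before the round-robin step makes the assignment depend only on the set of rows; as I recall, the fix for this issue added such a local sort behind `spark.sql.execution.sortBeforeRepartition`. The helper names and the fixed `start` offset are simplifications for illustration.

```python
from typing import Dict, List, Sequence


def round_robin(rows: Sequence[int], num_partitions: int, start: int = 0) -> Dict[int, List[int]]:
    """Simplified model: the i-th row seen goes to partition (start + i) % num_partitions."""
    parts: Dict[int, List[int]] = {p: [] for p in range(num_partitions)}
    for i, row in enumerate(rows):
        parts[(start + i) % num_partitions].append(row)
    return parts


def round_robin_sorted(rows: Sequence[int], num_partitions: int, start: int = 0) -> Dict[int, List[int]]:
    """Sort locally first, so the result depends only on which rows arrived."""
    return round_robin(sorted(rows), num_partitions, start)


rows_first = [5, 1, 4, 2, 3]  # upstream shuffle output, first attempt
rows_retry = [1, 5, 2, 4, 3]  # same rows, different arrival order on retry

first = round_robin(rows_first, 2)
retry = round_robin(rows_retry, 2)

# Reducer 0 consumed the first attempt; reducer 1 was recomputed from the retry.
mixed = first[0] + retry[1]
print(sorted(mixed))  # [3, 4, 4, 5, 5]: rows 4 and 5 duplicated, rows 1 and 2 lost

print(round_robin_sorted(rows_first, 2) == round_robin_sorted(rows_retry, 2))  # True
```

The sort costs extra work per task, which is the trade-off for making retries reproduce the same partition contents.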

[jira] [Resolved] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-30 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai resolved SPARK-25206. --- Resolution: Won't Fix Not backporting to 2.3, as per [~cloud_fan]'s summary; closed. > wrong records are

[jira] [Commented] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-30 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598113#comment-16598113 ] yucai commented on SPARK-25206: --- Based on our discussion in

[jira] [Commented] (SPARK-25281) Add tests to check the behavior when the physical schema and logical schema use difference cases

2018-08-30 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597236#comment-16597236 ] yucai commented on SPARK-25281: --- cc [~smilegator], [~cloud_fan]. > Add tests to check the behavior when

[jira] [Commented] (SPARK-25281) Add tests to check the behavior when the physical schema and logical schema use difference cases

2018-08-30 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597217#comment-16597217 ] yucai commented on SPARK-25281: --- [~seancxmao] , since you have done many tests in PR22184 and SPARK-25175,

[jira] [Created] (SPARK-25281) Add tests to check the behavior when the physical schema and logical schema use difference cases

2018-08-30 Thread yucai (JIRA)
yucai created SPARK-25281: - Summary: Add tests to check the behavior when the physical schema and logical schema use difference cases Key: SPARK-25281 URL: https://issues.apache.org/jira/browse/SPARK-25281

[jira] [Comment Edited] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298 ] yucai edited comment on SPARK-25206 at 8/28/18 5:06 PM: Do you want to simulate

[jira] [Comment Edited] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298 ] yucai edited comment on SPARK-25206 at 8/28/18 5:05 PM: Do you want to simulate

[jira] [Commented] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298 ] yucai commented on SPARK-25206: --- Do you want to simulate an Exception in Spark? Backporting

[jira] [Commented] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-28 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594958#comment-16594958 ] yucai commented on SPARK-25206: --- [~smilegator] , 2.1's exception is from parquet. {code:java}

[jira] [Updated] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}
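The issue concerns a Hive metastore column and a Parquet field that differ only in letter case. A minimal sketch of case-insensitive field resolution, assuming a hypothetical helper `resolve_field` (not a Spark API): in case-sensitive mode an `id` column finds no `ID` field and would surface as NULL; in case-insensitive mode it resolves to `ID`; and when a file contains both `id` and `ID`, it refuses to guess. As I understand the Spark 2.4 migration notes for the related SPARK-25132 fix, Parquet reads with `spark.sql.caseSensitive=false` behave in this spirit, throwing an exception on ambiguous matches.

```python
from typing import List, Optional


class AmbiguousFieldError(ValueError):
    """Raised when a requested column matches several physical fields ignoring case."""


def resolve_field(requested: str, physical_fields: List[str], case_sensitive: bool) -> Optional[str]:
    """Map a metastore (logical) column name to a Parquet (physical) field name.

    Returns None when nothing matches; a reader would surface that column as NULL.
    """
    if case_sensitive:
        return requested if requested in physical_fields else None
    matches = [f for f in physical_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        raise AmbiguousFieldError(f"{requested!r} matches {matches} ignoring case")
    return matches[0] if matches else None


# Metastore says `id`; the Parquet files were written with `ID`.
print(resolve_field("id", ["ID", "name"], case_sensitive=True))   # None, i.e. a NULL column
print(resolve_field("id", ["ID", "name"], case_sensitive=False))  # ID
try:
    resolve_field("id", ["id", "ID"], case_sensitive=False)
except AmbiguousFieldError as e:
    print("ambiguous:", e)
```

Failing loudly on ambiguity is the design choice that matters here: silently picking one of `id` and `ID` would reintroduce the "wrong data silently" behavior the issue reports.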

[jira] [Commented] (SPARK-25175) Case-insensitive field resolution when reading from ORC

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593152#comment-16593152 ] yucai commented on SPARK-25175: --- I pinged [~seancxmao] offline, he will give more details. >

[jira] [Updated] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Summary: wrong records are returned when Hive metastore schema and parquet schema are in different letter

[jira] [Updated] (SPARK-25206) data issue when Hive metastore schema and parquet schema are in different letter cases

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}

[jira] [Updated] (SPARK-25206) data issue when Hive metastore schema and parquet schema are in different letter cases

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Summary: data issue when Hive metastore schema and parquet schema are in different letter cases (was: data

[jira] [Updated] (SPARK-25206) data issue when Hive metastore schema and parquet schema have different letter case

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Summary: data issue when Hive metastore schema and parquet schema have different letter case (was: data

[jira] [Updated] (SPARK-25206) data issue when Hive metastore schema and parquet schema have different letter case

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}

[jira] [Updated] (SPARK-25206) data issue when

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Summary: data issue when (was: data issue because wrong column is pushdown for parquet) > data issue when

[jira] [Updated] (SPARK-25206) data issue because wrong column is pushdown for parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}

[jira] [Comment Edited] (SPARK-25206) data issue because wrong column is pushdown for parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593126#comment-16593126 ] yucai edited comment on SPARK-25206 at 8/27/18 2:27 AM: [~dongjoon], because of

[jira] [Commented] (SPARK-25206) data issue because wrong column is pushdown for parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593126#comment-16593126 ] yucai commented on SPARK-25206: --- [~dongjoon], because of the below root cause {quote}Spark pushdowns

[jira] [Updated] (SPARK-25206) data issue because wrong column is pushdown for parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}

[jira] [Updated] (SPARK-25206) data issue because wrong column is pushdown for parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Summary: data issue because wrong column is pushdown for parquet (was: Wrong data may be returned for

[jira] [Commented] (SPARK-25207) Case-insensitve field resolution for filter pushdown when reading Parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593110#comment-16593110 ] yucai commented on SPARK-25207: --- [~dongjoon] , sorry if I am confusing you.   This bug is created for

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593102#comment-16593102 ] yucai commented on SPARK-25206: --- I am OK with "known correctness bug in 2.3" way, just raise some concern

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-26 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593100#comment-16593100 ] yucai commented on SPARK-25206: --- [~smilegator] , sure, I will add tests.   If we don't backport

[jira] [Comment Edited] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592453#comment-16592453 ] yucai edited comment on SPARK-25206 at 8/25/18 5:01 AM: {quote} # Vanilla Spark

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592460#comment-16592460 ] yucai commented on SPARK-25206: --- [~dongjoon] , thanks a lot for so many explanations, if we both agree to

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592453#comment-16592453 ] yucai commented on SPARK-25206: --- {quote} # Vanilla Spark 2.2.0 ~ 2.3.1 always returns NULL for Parquet

[jira] [Comment Edited] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592425#comment-16592425 ] yucai edited comment on SPARK-25206 at 8/25/18 3:33 AM: [~dongjoon] , correct me

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592425#comment-16592425 ] yucai commented on SPARK-25206: --- [~dongjoon] , correct me if I am wrong. {code:java}

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592406#comment-16592406 ] yucai commented on SPARK-25206: --- Not a simple duplication. Backport -SPARK-25132-, but without 

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592392#comment-16592392 ] yucai commented on SPARK-25206: --- [~dongjoon] , the reason you see `null` without predicate pushdown, it is

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592390#comment-16592390 ] yucai commented on SPARK-25206: --- Linked to SPARK-25132; fixing this bug requires backporting two PRs. > Wrong data may

[jira] [Commented] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592384#comment-16592384 ] yucai commented on SPARK-25206: --- [~dongjoon], I still think this bug is related to pushdown, but

[jira] [Updated] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-25-10-04-21-901.png > Wrong data may be returned for Parquet >

[jira] [Updated] (SPARK-25206) Wrong data may be returned for Parquet

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-25-09-54-53-219.png > Wrong data may be returned for Parquet >

[jira] [Commented] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591756#comment-16591756 ] yucai commented on SPARK-25206: --- [~cloud_fan] , we need both [https://github.com/apache/spark/pull/21696] 

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-24-22-46-05-346.png > Wrong data may be returned when enable pushdown >

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-24-22-34-11-539.png > Wrong data may be returned when enable pushdown >

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-24-22-33-03-231.png > Wrong data may be returned when enable pushdown >

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: pr22183.png > Wrong data may be returned when enable pushdown >

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Attachment: image-2018-08-24-18-05-23-485.png > Wrong data may be returned when enable pushdown >

[jira] [Created] (SPARK-25207) Case-insensitve field resolution for filter pushdown when reading Parquet

2018-08-23 Thread yucai (JIRA)
yucai created SPARK-25207: - Summary: Case-insensitve field resolution for filter pushdown when reading Parquet Key: SPARK-25207 URL: https://issues.apache.org/jira/browse/SPARK-25207 Project: Spark
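SPARK-25207 covers the filter-pushdown side of the same case mismatch. A simplified model, not Spark's or Parquet's actual code: if the pushed-down predicate for `id > 0` is built with the logical name `id` while the file only has `ID`, the predicate refers to a field the file lacks; treating the missing value as NULL makes the comparison false, so every row is dropped. The hypothetical `resolved_gt` builds the predicate against the file's own field name and declines to push down (returns `None`) unless exactly one field matches. Declining is safe in this model because, for file sources, Spark still evaluates the filter after the scan.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]  # one record of a Parquet file, keyed by physical field name
Predicate = Callable[[Row], bool]


def naive_gt(column: str, value: int) -> Predicate:
    # Uses the name as given; a missing field is treated as NULL, and NULL > value is false.
    return lambda row: row.get(column) is not None and row[column] > value


def resolved_gt(column: str, value: int, file_fields: List[str], case_sensitive: bool) -> Optional[Predicate]:
    """Build `column > value` against the file's own field name, or return None
    (skip pushdown) unless exactly one field matches."""
    if case_sensitive:
        matches = [f for f in file_fields if f == column]
    else:
        matches = [f for f in file_fields if f.lower() == column.lower()]
    if len(matches) != 1:
        return None
    return naive_gt(matches[0], value)


file_rows: List[Row] = [{"ID": 1}, {"ID": 2}, {"ID": -3}]
print([r for r in file_rows if naive_gt("id", 0)(r)])  # []: every row silently dropped

pred = resolved_gt("id", 0, ["ID"], case_sensitive=False)
print([r for r in file_rows if pred(r)])  # [{'ID': 1}, {'ID': 2}]

print(resolved_gt("id", 0, ["id", "ID"], case_sensitive=False))  # None: ambiguous, skip pushdown
```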

[jira] [Updated] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25206: -- Description: In current Spark 2.3.1, below query returns wrong data silently. {code:java}

[jira] [Created] (SPARK-25206) Wrong data may be returned when enable pushdown

2018-08-23 Thread yucai (JIRA)
yucai created SPARK-25206: - Summary: Wrong data may be returned when enable pushdown Key: SPARK-25206 URL: https://issues.apache.org/jira/browse/SPARK-25206 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-25132) Spark returns NULL for a column whose Hive metastore schema and Parquet schema are in different letter cases

2018-08-16 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583145#comment-16583145 ] yucai commented on SPARK-25132: --- [~cloud_fan] [~smilegator] [~budde] [~ekhliang], do you have any insight?

[jira] [Commented] (SPARK-25132) Spark returns NULL for a column whose Hive metastore schema and Parquet schema are in different letter cases

2018-08-16 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582667#comment-16582667 ] yucai commented on SPARK-25132: --- If Spark allows case-insensitive resolution for data sources, query t2 should return

[jira] [Comment Edited] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-10 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576417#comment-16576417 ] yucai edited comment on SPARK-25084 at 8/10/18 3:17 PM: [~smilegator],

[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-10 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576417#comment-16576417 ] yucai commented on SPARK-25084: --- [~smilegator][~jerryshao] Thanks a lot for marking it blocker. A lot of

[jira] [Updated] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-10 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-25084: -- Description: Test Query: {code:java} select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk,

[jira] [Commented] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-10 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575808#comment-16575808 ] yucai commented on SPARK-25084: --- It is a regression: when the generated code size is more than 1024,

[jira] [Created] (SPARK-25084) "distribute by" on multiple columns may lead to codegen issue

2018-08-09 Thread yucai (JIRA)
yucai created SPARK-25084: - Summary: "distribute by" on multiple columns may lead to codegen issue Key: SPARK-25084 URL: https://issues.apache.org/jira/browse/SPARK-25084 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556820#comment-16556820 ] yucai commented on SPARK-24925: --- [~cloud_fan], [~xiaoli] , [~kiszk] , any comments? > input bytesRead

[jira] [Commented] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556818#comment-16556818 ] yucai commented on SPARK-24925: --- I think there could be two issues. In FileScanRDD 1. ColumnarBatch's

[jira] [Updated] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24925: -- Attachment: bytesRead.gif > input bytesRead metrics fluctuate from time to time >

[jira] [Updated] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24925: -- Description: input bytesRead metrics fluctuate from time to time, and it is worse when pushdown is enabled. Query

[jira] [Created] (SPARK-24925) input bytesRead metrics fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
yucai created SPARK-24925: - Summary: input bytesRead metrics fluctuate from time to time Key: SPARK-24925 URL: https://issues.apache.org/jira/browse/SPARK-24925 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24832: -- Summary: Improve inputMetrics's bytesRead update for ColumnarBatch (was: When pushdown enabled, input

[jira] [Updated] (SPARK-24832) When pushdown enabled, input bytesRead metrics is easy to fluctuate from time to time

2018-07-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24832: -- Summary: When pushdown enabled, input bytesRead metrics is easy to fluctuate from time to time (was: Improve

[jira] [Commented] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

2018-07-17 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546371#comment-16546371 ] yucai commented on SPARK-24832: --- Currently, ColumnarBatch's bytesRead needs to be updated for every 4096 *
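The comment concerns how often bytesRead is refreshed when rows arrive as 4096-row ColumnarBatches. A sketch of the underlying arithmetic, assuming (this is an assumption, modeled on how I understand FileScanRDD at the time) that the record count grows by a whole batch at once and a refresh fires only when `recordsRead % 1000 == 0`. Row-at-a-time reading lands on every multiple of 1000, but 4096-row steps land on one only at multiples of lcm(4096, 1000) = 512,000, i.e. once every 125 batches, so the metric lags and then jumps. The function and constant names are hypothetical.

```python
from typing import List

# Assumed constants: 1000 is the record interval for refreshing bytesRead,
# 4096 is the columnar batch size mentioned in the comment.
UPDATE_INTERVAL = 1000
BATCH_SIZE = 4096


def update_points(total_rows: int, batch_size: int, interval: int = UPDATE_INTERVAL) -> List[int]:
    """Row counts at which a `records_read % interval == 0` check triggers a refresh,
    when records_read grows by a whole batch at a time."""
    points: List[int] = []
    records_read = 0
    while records_read < total_rows:
        records_read += min(batch_size, total_rows - records_read)
        if records_read % interval == 0:
            points.append(records_read)
    return points


total = 256 * BATCH_SIZE  # 1,048,576 rows
print(len(update_points(total, 1)))       # row-at-a-time: 1048 refreshes
print(update_points(total, BATCH_SIZE))   # batched: [512000, 1024000]
```

Checking whether the count crossed a multiple of the interval, rather than landed exactly on one, would restore roughly one refresh per 1000 rows; that is one possible remedy, not necessarily the one adopted for this issue.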

[jira] [Created] (SPARK-24832) Improve inputMetrics's bytesRead update for ColumnarBatch

2018-07-17 Thread yucai (JIRA)
yucai created SPARK-24832: - Summary: Improve inputMetrics's bytesRead update for ColumnarBatch Key: SPARK-24832 URL: https://issues.apache.org/jira/browse/SPARK-24832 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-24556) ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning

2018-06-14 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24556: -- Description: Currently, ReusedExchange would rewrite output partitioning if child's partitioning is

[jira] [Created] (SPARK-24556) ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning

2018-06-14 Thread yucai (JIRA)
yucai created SPARK-24556: - Summary: ReusedExchange should rewrite output partitioning also when child's partitioning is RangePartitioning Key: SPARK-24556 URL: https://issues.apache.org/jira/browse/SPARK-24556

[jira] [Updated] (SPARK-24343) Avoid shuffle for the bucketed table when shuffle.partition > bucket number

2018-05-22 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24343: -- Description: When shuffle.partition > bucket number, Spark needs to shuffle the bucketed table as per the

[jira] [Created] (SPARK-24343) Avoid shuffle for the bucketed table when shuffle.partition > bucket number

2018-05-22 Thread yucai (JIRA)
yucai created SPARK-24343: - Summary: Avoid shuffle for the bucketed table when shuffle.partition > bucket number Key: SPARK-24343 URL: https://issues.apache.org/jira/browse/SPARK-24343 Project: Spark

[jira] [Created] (SPARK-24087) Avoid shuffle when join keys are a super-set of bucket keys

2018-04-25 Thread yucai (JIRA)
yucai created SPARK-24087: - Summary: Avoid shuffle when join keys are a super-set of bucket keys Key: SPARK-24087 URL: https://issues.apache.org/jira/browse/SPARK-24087 Project: Spark Issue Type:
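The idea in the SPARK-24087 summary can be illustrated with a simplified model (not Spark's planner): if both tables are bucketed on column `a` with the same bucket count and the same hash, any two rows that match on the join keys `(a, b)` necessarily share `a`, and therefore sit in the same bucket index. The join can then run bucket-by-bucket with no shuffle. The SHA-256 stand-in hash, the bucket count, and the helper names below are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical bucket count shared by both tables


def bucket_of(a) -> int:
    # Stand-in hash on the bucket column `a` only (NOT Spark's hash function).
    digest = hashlib.sha256(repr(a).encode()).digest()
    return int.from_bytes(digest[:4], "little") % NUM_BUCKETS


def bucketize(rows):
    # Group rows by bucket id, computed from column `a` (row[0]) alone.
    out = defaultdict(list)
    for row in rows:
        out[bucket_of(row[0])].append(row)
    return out


def join(left, right):
    # Simple hash join on the superset key (a, b).
    index = defaultdict(list)
    for a, b, lv in left:
        index[(a, b)].append(lv)
    return sorted((a, b, lv, rv) for a, b, rv in right for lv in index[(a, b)])


left = [(a, b, f"l{a}_{b}") for a in range(50) for b in range(3)]
right = [(a, b, f"r{a}_{b}") for a in range(0, 50, 2) for b in range(4)]

# Reference result: join everything at once (what a full shuffle would give).
global_result = join(left, right)

# Bucket-local result: join bucket i of `left` only with bucket i of `right`.
lb, rb = bucketize(left), bucketize(right)
per_bucket = sorted(
    row for i in range(NUM_BUCKETS) for row in join(lb.get(i, []), rb.get(i, []))
)

print(len(global_result), per_bucket == global_result)
```

Because the bucket id depends only on `a`, matching `(a, b)` pairs can never be split across buckets, so the bucket-local join returns exactly the global result.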

[jira] [Commented] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451743#comment-16451743 ] yucai commented on SPARK-24076: --- 1. When shuffle.partition = 8192, tuples in the same partition follow the

[jira] [Commented] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451728#comment-16451728 ] yucai commented on SPARK-24076: --- Root cause: severe hash conflicts in HashAggregate.
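The comments above attribute the slowdown to hash conflicts in HashAggregate when shuffle.partition = 8192. A simplified model of how that can happen, assuming (not taken from Spark's code) that the same hash value picks both the shuffle partition (`hash % 8192`) and the slot in a power-of-two hash map (`hash & (capacity - 1)`). SHA-256 stands in for Spark's Murmur3, and the map capacity, seeds, and helper names are hypothetical.

```python
import hashlib

NUM_PARTITIONS = 8192   # shuffle.partition value from the issue
MAP_CAPACITY = 1 << 16  # hypothetical power-of-two hash-map capacity


def key_hash(key: str, seed: int) -> int:
    # Stand-in 32-bit hash built from SHA-256 (NOT Spark's Murmur3).
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "little")


def partition_of(key: str) -> int:
    # Shuffle side: partition id = hash mod number of partitions.
    return key_hash(key, seed=42) % NUM_PARTITIONS


def slots_used(keys, seed: int) -> int:
    # Aggregation side: slot = hash & (capacity - 1), i.e. the low 16 bits.
    return len({key_hash(k, seed) & (MAP_CAPACITY - 1) for k in keys})


keys = [f"key{i}" for i in range(500_000)]
# Every key routed to reducer 0 has the same low 13 bits (8192 == 2**13).
partition0 = [k for k in keys if partition_of(k) == 0]

same_seed = slots_used(partition0, seed=42)  # reuse the shuffle hash
other_seed = slots_used(partition0, seed=7)  # independent hash

print(len(partition0), same_seed, other_seed)
```

Because 8192 = 2^13, every key that lands in one reducer shares the low 13 bits of its hash, so a map indexed by the low 16 bits of that same hash can use at most 2^3 = 8 distinct slots and degenerates into long probe chains. Hashing with a different seed on the aggregation side spreads the same keys across the table again.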

[jira] [Updated] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24076: -- Attachment: image-2018-04-25-14-29-39-958.png

[jira] [Commented] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-25 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451727#comment-16451727 ] yucai commented on SPARK-24076: --- The query example: {code:sql} insert overwrite table target_xxx SELECT

[jira] [Commented] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451540#comment-16451540 ] yucai commented on SPARK-24076: --- shuffle.partition = 8192: !p1.png! shuffle.partition = 8000: !p2.png!

[jira] [Updated] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-24 Thread yucai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yucai updated SPARK-24076: -- Attachment: p2.png p1.png

[jira] [Created] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-04-24 Thread yucai (JIRA)
yucai created SPARK-24076: - Summary: very bad performance when shuffle.partition = 8192 Key: SPARK-24076 URL: https://issues.apache.org/jira/browse/SPARK-24076 Project: Spark Issue Type: Bug
