[
https://issues.apache.org/jira/browse/SPARK-29768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968028#comment-16968028
]
yucai commented on SPARK-29768:
---
[~smilegator] [~wenchen], is this an issue, or does it work as designed?
>
yucai created SPARK-29768:
-
Summary: nondeterministic expression fails column pruning
Key: SPARK-29768
URL: https://issues.apache.org/jira/browse/SPARK-29768
Project: Spark
Issue Type: Bug
yucai created SPARK-26909:
-
Summary: use unsafeRow.hashCode() as hash value in HashAggregate
Key: SPARK-26909
URL: https://issues.apache.org/jira/browse/SPARK-26909
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-26909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-26909:
--
Description:
This is a follow-up PR for #21149.
The new way uses unsafeRow.hashCode() as the hash value in
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Description:
Set main args correctly in BenchmarkBase, to make them accessible to its
subclasses.
It will
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Description:
Set main args correctly in BenchmarkBase, to make them accessible to its
subclasses.
It will
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Summary: Make main args set correctly in BenchmarkBase (was: Make mainArgs
correctly set in BenchmarkBase)
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Description:
Save main args correctly in BenchmarkBase, to make them accessible to its
subclasses.
It will
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Description:
Make mainArgs correctly set in BenchmarkBase, it will benefit:
*
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Description:
Make mainArgs correctly set in BenchmarkBase, it will benefit:
-
[
https://issues.apache.org/jira/browse/SPARK-25864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25864:
--
Issue Type: Sub-task (was: Bug)
Parent: SPARK-25475
> Make mainArgs correctly set in BenchmarkBase
>
yucai created SPARK-25864:
-
Summary: Make mainArgs correctly set in BenchmarkBase
Key: SPARK-25864
URL: https://issues.apache.org/jira/browse/SPARK-25864
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1148#comment-1148
]
yucai commented on SPARK-25663:
---
[~Gengliang.Wang] I made an improvement on this; could you help review it?
yucai created SPARK-25850:
-
Summary: Make the split threshold for the code generated method
configurable
Key: SPARK-25850
URL: https://issues.apache.org/jira/browse/SPARK-25850
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-25676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663480#comment-16663480
]
yucai commented on SPARK-25676:
---
I am working on this.
> Refactor BenchmarkWideTable to use main method
>
yucai created SPARK-25508:
-
Summary: Refactor OrcReadBenchmark to use main method
Key: SPARK-25508
URL: https://issues.apache.org/jira/browse/SPARK-25508
Project: Spark
Issue Type: Sub-task
yucai created SPARK-25486:
-
Summary: Refactor SortBenchmark to use main method
Key: SPARK-25486
URL: https://issues.apache.org/jira/browse/SPARK-25486
Project: Spark
Issue Type: Sub-task
yucai created SPARK-25485:
-
Summary: Refactor UnsafeProjectionBenchmark to use main method
Key: SPARK-25485
URL: https://issues.apache.org/jira/browse/SPARK-25485
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-25481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25481:
--
Issue Type: Sub-task (was: Bug)
Parent: SPARK-25475
> Refactor ColumnarBatchBenchmark to use main
yucai created SPARK-25481:
-
Summary: Refactor ColumnarBatchBenchmark to use main method
Key: SPARK-25481
URL: https://issues.apache.org/jira/browse/SPARK-25481
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-23207:
--
Description:
Currently, shuffle repartition uses RoundRobinPartitioning; the generated result
is
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai resolved SPARK-25206.
---
Resolution: Won't Fix
Not backporting to 2.3, as per [~cloud_fan]'s summary; closed.
> wrong records are
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598113#comment-16598113
]
yucai commented on SPARK-25206:
---
Based on our discussion in
[
https://issues.apache.org/jira/browse/SPARK-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597236#comment-16597236
]
yucai commented on SPARK-25281:
---
cc [~smilegator], [~cloud_fan].
> Add tests to check the behavior when
[
https://issues.apache.org/jira/browse/SPARK-25281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597217#comment-16597217
]
yucai commented on SPARK-25281:
---
[~seancxmao] , since you have done many tests in PR22184 and SPARK-25175,
yucai created SPARK-25281:
-
Summary: Add tests to check the behavior when the physical schema
and logical schema use different cases
Key: SPARK-25281
URL: https://issues.apache.org/jira/browse/SPARK-25281
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298
]
yucai edited comment on SPARK-25206 at 8/28/18 5:06 PM:
Do you want to simulate
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298
]
yucai edited comment on SPARK-25206 at 8/28/18 5:05 PM:
Do you want to simulate
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595298#comment-16595298
]
yucai commented on SPARK-25206:
---
Do you want to simulate an Exception in Spark?
Backporting
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594958#comment-16594958
]
yucai commented on SPARK-25206:
---
[~smilegator] , 2.1's exception is from parquet.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593152#comment-16593152
]
yucai commented on SPARK-25175:
---
I pinged [~seancxmao] offline, he will give more details.
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Summary: wrong records are returned when Hive metastore schema and parquet
schema are in different letter
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Summary: data issue when Hive metastore schema and parquet schema are in
different letter cases (was: data
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Summary: data issue when Hive metastore schema and parquet schema have
different letter case (was: data
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Summary: data issue when (was: data issue because wrong column is
pushdown for parquet)
> data issue when
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593126#comment-16593126
]
yucai edited comment on SPARK-25206 at 8/27/18 2:27 AM:
[~dongjoon], because of
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593126#comment-16593126
]
yucai commented on SPARK-25206:
---
[~dongjoon], because of the below root cause
{quote}Spark pushdowns
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Summary: data issue because wrong column is pushdown for parquet (was:
Wrong data may be returned for
[
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593110#comment-16593110
]
yucai commented on SPARK-25207:
---
[~dongjoon] , sorry if I am confusing you.
This bug is created for
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593102#comment-16593102
]
yucai commented on SPARK-25206:
---
I am OK with "known correctness bug in 2.3" way, just raise some concern
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593100#comment-16593100
]
yucai commented on SPARK-25206:
---
[~smilegator] , sure, I will add tests.
If we don't backport
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592453#comment-16592453
]
yucai edited comment on SPARK-25206 at 8/25/18 5:01 AM:
{quote} # Vanilla Spark
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592460#comment-16592460
]
yucai commented on SPARK-25206:
---
[~dongjoon] , thanks a lot for so many explanations, if we both agree to
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592453#comment-16592453
]
yucai commented on SPARK-25206:
---
{quote} # Vanilla Spark 2.2.0 ~ 2.3.1 always returns NULL for Parquet
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592425#comment-16592425
]
yucai edited comment on SPARK-25206 at 8/25/18 3:33 AM:
[~dongjoon] , correct me
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592425#comment-16592425
]
yucai commented on SPARK-25206:
---
[~dongjoon] , correct me if I am wrong.
{code:java}
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592406#comment-16592406
]
yucai commented on SPARK-25206:
---
Not a simple duplication.
Backport -SPARK-25132-, but without
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592392#comment-16592392
]
yucai commented on SPARK-25206:
---
[~dongjoon] , the reason you see `null` without predicate pushdown, it is
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592390#comment-16592390
]
yucai commented on SPARK-25206:
---
Linked to SPARK-25132; this bug needs two PRs backported.
> Wrong data may
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592384#comment-16592384
]
yucai commented on SPARK-25206:
---
[~dongjoon], I still think this bug is related to pushdown, but
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-25-10-04-21-901.png
> Wrong data may be returned for Parquet
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-25-09-54-53-219.png
> Wrong data may be returned for Parquet
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591756#comment-16591756
]
yucai commented on SPARK-25206:
---
[~cloud_fan] , we need both [https://github.com/apache/spark/pull/21696]
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-24-22-46-05-346.png
> Wrong data may be returned when enable pushdown
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-24-22-34-11-539.png
> Wrong data may be returned when enable pushdown
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-24-22-33-03-231.png
> Wrong data may be returned when enable pushdown
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: pr22183.png
> Wrong data may be returned when enable pushdown
>
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Attachment: image-2018-08-24-18-05-23-485.png
> Wrong data may be returned when enable pushdown
>
yucai created SPARK-25207:
-
Summary: Case-insensitive field resolution for filter pushdown when
reading Parquet
Key: SPARK-25207
URL: https://issues.apache.org/jira/browse/SPARK-25207
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25206:
--
Description:
In current Spark 2.3.1, the query below silently returns wrong data.
{code:java}
yucai created SPARK-25206:
-
Summary: Wrong data may be returned when enable pushdown
Key: SPARK-25206
URL: https://issues.apache.org/jira/browse/SPARK-25206
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583145#comment-16583145
]
yucai commented on SPARK-25132:
---
[~cloud_fan] [~smilegator] [~budde] [~ekhliang], do you have any insight?
[
https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582667#comment-16582667
]
yucai commented on SPARK-25132:
---
If Spark allows data source case insensitive, query t2 should return
[
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576417#comment-16576417
]
yucai edited comment on SPARK-25084 at 8/10/18 3:17 PM:
[~smilegator],
[
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576417#comment-16576417
]
yucai commented on SPARK-25084:
---
[~smilegator] [~jerryshao]
Thanks a lot for marking it as a blocker.
A lot of
[
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-25084:
--
Description:
Test Query:
{code:java}
select * from store_sales distribute by (ss_sold_time_sk, ss_item_sk,
[
https://issues.apache.org/jira/browse/SPARK-25084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575808#comment-16575808
]
yucai commented on SPARK-25084:
---
It is a regression: when the generated code size is more than 1024,
yucai created SPARK-25084:
-
Summary: "distribute by" on multiple columns may lead to codegen
issue
Key: SPARK-25084
URL: https://issues.apache.org/jira/browse/SPARK-25084
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556820#comment-16556820
]
yucai commented on SPARK-24925:
---
[~cloud_fan], [~xiaoli] , [~kiszk] , any comments?
> input bytesRead
[
https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556818#comment-16556818
]
yucai commented on SPARK-24925:
---
I think there could be two issues.
In FileScanRDD
1. ColumnarBatch's
[
https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24925:
--
Attachment: bytesRead.gif
> input bytesRead metrics fluctuate from time to time
>
[
https://issues.apache.org/jira/browse/SPARK-24925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24925:
--
Description:
Input bytesRead metrics fluctuate from time to time; it is worse when pushdown
is enabled.
Query
yucai created SPARK-24925:
-
Summary: input bytesRead metrics fluctuate from time to time
Key: SPARK-24925
URL: https://issues.apache.org/jira/browse/SPARK-24925
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24832:
--
Summary: Improve inputMetrics's bytesRead update for ColumnarBatch (was:
When pushdown enabled, input
[
https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24832:
--
Summary: When pushdown enabled, input bytesRead metrics is easy to
fluctuate from time to time (was: Improve
[
https://issues.apache.org/jira/browse/SPARK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546371#comment-16546371
]
yucai commented on SPARK-24832:
---
Currently, ColumnarBatch's bytesRead needs to be updated every 4096 *
yucai created SPARK-24832:
-
Summary: Improve inputMetrics's bytesRead update for ColumnarBatch
Key: SPARK-24832
URL: https://issues.apache.org/jira/browse/SPARK-24832
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-24556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24556:
--
Description:
Currently, ReusedExchange would rewrite output partitioning if child's
partitioning is
[
https://issues.apache.org/jira/browse/SPARK-24556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24556:
--
Description:
Currently, ReusedExchange would rewrite output partitioning if child's
partitioning is
yucai created SPARK-24556:
-
Summary: ReusedExchange should rewrite output partitioning also
when child's partitioning is RangePartitioning
Key: SPARK-24556
URL: https://issues.apache.org/jira/browse/SPARK-24556
[
https://issues.apache.org/jira/browse/SPARK-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24343:
--
Description:
When shuffle.partition > bucket number, Spark needs to shuffle the bucket table
as per the
[
https://issues.apache.org/jira/browse/SPARK-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24343:
--
Description:
When shuffle.partition > bucket number, Spark needs to shuffle the bucket table
as per the
[
https://issues.apache.org/jira/browse/SPARK-24343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24343:
--
Description:
When shuffle.partition > bucket number, Spark needs to shuffle the bucket table
as per the
yucai created SPARK-24343:
-
Summary: Avoid shuffle for the bucketed table when
shuffle.partition > bucket number
Key: SPARK-24343
URL: https://issues.apache.org/jira/browse/SPARK-24343
Project: Spark
yucai created SPARK-24087:
-
Summary: Avoid shuffle when join keys are a super-set of bucket
keys
Key: SPARK-24087
URL: https://issues.apache.org/jira/browse/SPARK-24087
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451743#comment-16451743
]
yucai commented on SPARK-24076:
---
1. When shuffle.partition = 8192, tuples in the same partition follow the
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451728#comment-16451728
]
yucai commented on SPARK-24076:
---
Root cause: a very bad hash conflict in HashAggregate.
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24076:
--
Attachment: image-2018-04-25-14-29-39-958.png
> very bad performance when shuffle.partition = 8192
>
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451727#comment-16451727
]
yucai commented on SPARK-24076:
---
The query example:
{code:sql}
insert overwrite table target_xxx
SELECT
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451540#comment-16451540
]
yucai commented on SPARK-24076:
---
shuffle.partition = 8192
!p1.png!
shuffle.partition = 8000
!p2.png!
>
[
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yucai updated SPARK-24076:
--
Attachment: p2.png
p1.png
> very bad performance when shuffle.partition = 8192
>
yucai created SPARK-24076:
-
Summary: very bad performance when shuffle.partition = 8192
Key: SPARK-24076
URL: https://issues.apache.org/jira/browse/SPARK-24076
Project: Spark
Issue Type: Bug
1 - 100 of 171 matches