subject:"\[jira\] \[Updated\] \(SPARK\-11580\) Just do final aggregation when there is no Exchange"

[jira] [Updated] (SPARK-11580) Just do final aggregation when there is no Exchange

2015-11-09 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-11580:
-
Target Version/s:   (was: 1.6.0)

> Just do final aggregation when there is no Exchange
> ---
>
> Key: SPARK-11580
> URL: https://issues.apache.org/jira/browse/SPARK-11580
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Yadong Qi
>
> I do the SQL as below:
> {code}
> cache table src as select * from src distribute by key;
> select key, count(value) from src group by key;
> {code}
> and the Physical Plan is 
> {code}
> TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Final,isDistinct=false)], 
> output=[key#0,_c1#28L])
>  TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Partial,isDistinct=false)], 
> output=[key#0,currentCount#41L])
>   InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation 
> [key#0,value#1], true, 1, StorageLevel(true, true, false, true, 1), 
> (TungstenExchange hashpartitioning(key#0)), Some(src))
> {code}
> I think if there is no *Exchange*, just do final aggregation is better, like:
> {code}
> TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Final,isDistinct=false)], 
> output=[key#0,_c1#28L])
>   InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation 
> [key#0,value#1], true, 1, StorageLevel(true, true, false, true, 1), 
> (TungstenExchange hashpartitioning(key#0)), Some(src))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-11580) Just do final aggregation when there is no Exchange

2015-11-08 Thread Yadong Qi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yadong Qi updated SPARK-11580:
--
Description: 
I do the SQL as below:
{code}
cache table src as select * from src distribute by key;
select key, count(value) from src group by key;
{code}
and the Physical Plan is 
{code}
TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Final,isDistinct=false)], 
output=[key#0,_c1#28L])
 TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Partial,isDistinct=false)], 
output=[key#0,currentCount#41L])
  InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation [key#0,value#1], 
true, 1, StorageLevel(true, true, false, true, 1), (TungstenExchange 
hashpartitioning(key#0)), Some(src))
{code}
I think if there is no *Exchange*, just do final aggregation is better, like:
{code}
TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Final,isDistinct=false)], 
output=[key#0,_c1#28L])
  InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation [key#0,value#1], 
true, 1, StorageLevel(true, true, false, true, 1), (TungstenExchange 
hashpartitioning(key#0)), Some(src))
{code}

  was:
I do the SQL as below:
{code}
cache table src as select * from src distribute by key;
select key, count(value) from src group by key;
{code}
and the Physical Plan is 
{code}
TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Final,isDistinct=false)], 
output=[key#0,_c1#28L])
 TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Partial,isDistinct=false)], 
output=[key#0,currentCount#41L])
  InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation [key#0,value#1], 
true, 1, StorageLevel(true, true, false, true, 1), (TungstenExchange 
hashpartitioning(key#0)), Some(src))
{code}
I think if there is no Exchange, just do final aggregation is better, like:
{code}
TungstenAggregate(key=[key#0], 
functions=[(count(value#1),mode=Final,isDistinct=false)], 
output=[key#0,_c1#28L])
  InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation [key#0,value#1], 
true, 1, StorageLevel(true, true, false, true, 1), (TungstenExchange 
hashpartitioning(key#0)), Some(src))
{code}


> Just do final aggregation when there is no Exchange
> ---
>
> Key: SPARK-11580
> URL: https://issues.apache.org/jira/browse/SPARK-11580
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Yadong Qi
>
> I do the SQL as below:
> {code}
> cache table src as select * from src distribute by key;
> select key, count(value) from src group by key;
> {code}
> and the Physical Plan is 
> {code}
> TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Final,isDistinct=false)], 
> output=[key#0,_c1#28L])
>  TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Partial,isDistinct=false)], 
> output=[key#0,currentCount#41L])
>   InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation 
> [key#0,value#1], true, 1, StorageLevel(true, true, false, true, 1), 
> (TungstenExchange hashpartitioning(key#0)), Some(src))
> {code}
> I think if there is no *Exchange*, just do final aggregation is better, like:
> {code}
> TungstenAggregate(key=[key#0], 
> functions=[(count(value#1),mode=Final,isDistinct=false)], 
> output=[key#0,_c1#28L])
>   InMemoryColumnarTableScan [key#0,value#1], (InMemoryRelation 
> [key#0,value#1], true, 1, StorageLevel(true, true, false, true, 1), 
> (TungstenExchange hashpartitioning(key#0)), Some(src))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-11580) Just do final aggregation when there is no Exchange

[jira] [Updated] (SPARK-11580) Just do final aggregation when there is no Exchange

2 matches

Site Navigation

Mail list logo

Footer information