RE: [SparkSQL] Incorrect ROLLUP results

Cheng, Hao Thu, 09 Jul 2015 18:16:08 -0700

Yes, this is a bug. Would you mind creating a JIRA issue for this? I will fix 
it ASAP.

BTW, what’s your Spark version?

From: Yana Kadiyska [mailto:yana.kadiy...@gmail.com]
Sent: Friday, July 10, 2015 12:16 AM
To: ayan guha
Cc: user
Subject: Re: [SparkSQL] Incorrect ROLLUP results

+---+---+---+
|cnt|_c1|grp|
+---+---+---+
|  1| 31|  0|
|  1| 31|  1|
|  1|  4|  0|
|  1|  4|  1|
|  1| 42|  0|
|  1| 42|  1|
|  1| 15|  0|
|  1| 15|  1|
|  1| 26|  0|
|  1| 26|  1|
|  1| 37|  0|
|  1| 10|  0|
|  1| 37|  1|
|  1| 10|  1|
|  1| 48|  0|
|  1| 21|  0|
|  1| 48|  1|
|  1| 21|  1|
|  1| 32|  0|
|  1| 32|  1|
+---+---+---+

On Thu, Jul 9, 2015 at 11:54 AM, ayan guha <guha.a...@gmail.com> wrote:

Can you please post the result of show()?
On 10 Jul 2015 01:00, "Yana Kadiyska" <yana.kadiy...@gmail.com> wrote:
Hi folks, I just re-wrote a query from using UNION ALL to use "with rollup" and 
I'm seeing some unexpected behavior. I'll open a JIRA if needed but wanted to 
check if this is user error. Here is my code:

case class KeyValue(key: Int, value: String)

val df = sc.parallelize(1 to 50).map(i=>KeyValue(i, i.toString)).toDF

df.registerTempTable("foo")

sqlContext.sql("select count(*) as cnt, value as key,GROUPING__ID from foo " +
  "group by value with rollup").show(100)

sqlContext.sql("select count(*) as cnt, key % 100 as key,GROUPING__ID from foo " +
  "group by key%100 with rollup").show(100)

Grouping by value does the right thing: I get a single group 0 row with the 
overall count. But grouping by an expression (key%100) produces unexpected 
results: it appears that the group 1 rows are replicated as group 0. Am I 
doing something wrong, or is this a bug?
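
For reference, the result expected from the second query can be modeled 
without Spark. The following is a minimal sketch using plain Scala 
collections, not Spark's implementation. RollupSketch, RollupRow, and 
expectedRollup are hypothetical names introduced only for illustration, and 
the grouping ids follow the convention seen in the output above (1 for 
per-key rows, 0 for the overall total).

```scala
// Plain-Scala model of the expected result of:
//   select count(*) as cnt, key % 100 as key, GROUPING__ID
//   from foo group by key % 100 with rollup
// No Spark is involved; all names here are hypothetical.
object RollupSketch {
  // key is None on the rollup (grand total) row, mirroring the SQL NULL.
  case class RollupRow(cnt: Long, key: Option[Int], groupingId: Int)

  def expectedRollup(keys: Seq[Int]): Seq[RollupRow] = {
    // Detail level: one row per distinct value of key % 100 (grouping id 1).
    val perKey = keys.groupBy(_ % 100).toSeq.sortBy(_._1).map {
      case (k, members) => RollupRow(members.size.toLong, Some(k), 1)
    }
    // Rollup level: exactly one grand-total row (grouping id 0).
    val total = RollupRow(keys.size.toLong, None, 0)
    perKey :+ total
  }

  def main(args: Array[String]): Unit = {
    // Same input as the temp table "foo": keys 1 to 50.
    expectedRollup(1 to 50).foreach(println)
  }
}
```

For keys 1 to 50 this gives 50 per-key rows with cnt 1 and a single group 0 
row with cnt 50, which is the shape the "group by value" query returns and 
the shape the key%100 query should return as well.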
