Thanks for verifying!
On Thu, Jun 12, 2014 at 12:28 AM, Pei-Lun Lee <pl...@appier.com> wrote:

> I reran with master and it looks like it is fixed.
>
> 2014-06-12 1:26 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>
>> I'd try rerunning with master. It is likely you are running into
>> SPARK-1994 <https://issues.apache.org/jira/browse/SPARK-1994>.
>>
>> Michael
>>
>> On Wed, Jun 11, 2014 at 3:01 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>>
>>> Hi,
>>>
>>> I am using Spark 1.0.0 and found that in Spark SQL some queries using
>>> GROUP BY give weird results.
>>> To reproduce, type the following commands in spark-shell connected to a
>>> standalone server:
>>>
>>> case class Foo(k: String, v: Int)
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> import sqlContext._
>>> val rows = List.fill(100)(Foo("a", 1)) ++ List.fill(200)(Foo("b", 2)) ++
>>>   List.fill(300)(Foo("c", 3))
>>> sc.makeRDD(rows).registerAsTable("foo")
>>> sql("select k,count(*) from foo group by k").collect
>>>
>>> The result will be something random like:
>>> res1: Array[org.apache.spark.sql.Row] = Array([b,180], [3,18], [a,75],
>>> [c,270], [4,56], [1,1])
>>>
>>> If I run the same query again, the new result will be correct:
>>> sql("select k,count(*) from foo group by k").collect
>>> res2: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300])
>>>
>>> Should I file a bug?
>>>
>>> --
>>> Pei-Lun Lee
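[Editor's note: for readers without a Spark cluster handy, the counts the GROUP BY query should return can be sanity-checked with plain Python; this is just a stand-in for the Spark SQL query in the repro above, not Spark code.]

```python
# Plain-Python equivalent of: select k, count(*) from foo group by k
# over 100 rows of "a", 200 of "b", 300 of "c" (same data as the repro).
from collections import Counter

rows = [("a", 1)] * 100 + [("b", 2)] * 200 + [("c", 3)] * 300
counts = Counter(k for k, _ in rows)

print(sorted(counts.items()))  # [('a', 100), ('b', 200), ('c', 300)]
```

Any other result from the Spark query, such as the `res1` array above, indicates the partial-aggregation bug tracked as SPARK-1994.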