Re: Spark SQL incorrect result on GROUP BY query

Michael Armbrust Wed, 11 Jun 2014 10:27:35 -0700

I'd try rerunning against the current master branch. You are likely running
into SPARK-1994 <https://issues.apache.org/jira/browse/SPARK-1994>.


Michael


On Wed, Jun 11, 2014 at 3:01 AM, Pei-Lun Lee <pl...@appier.com> wrote:

> Hi,
>
> I am using Spark 1.0.0 and found that some Spark SQL queries that use
> GROUP BY give incorrect results.
> To reproduce, run the following commands in a spark-shell connected to a
> standalone server:
>
> case class Foo(k: String, v: Int)
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext._
> val rows = List.fill(100)(Foo("a", 1)) ++ List.fill(200)(Foo("b", 2)) ++
> List.fill(300)(Foo("c", 3))
> sc.makeRDD(rows).registerAsTable("foo")
> sql("select k,count(*) from foo group by k").collect
>
> The result is seemingly random, for example:
> res1: Array[org.apache.spark.sql.Row] = Array([b,180], [3,18], [a,75],
> [c,270], [4,56], [1,1])
>
> If I run the same query again, the new result is correct:
> sql("select k,count(*) from foo group by k").collect
> res2: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300])
>
> Should I file a bug?
>
> --
> Pei-Lun Lee
>
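
When rerunning the reproduction on another build, it may help to have the
expected counts computed independently of Spark SQL. The following is only a
minimal sketch using plain Scala collections on the same data as the quoted
example; the name `expected` is introduced here for illustration and is not
part of the original report.

```scala
// Same data as in the quoted reproduction, built as a local List (no Spark needed).
case class Foo(k: String, v: Int)
val rows = List.fill(100)(Foo("a", 1)) ++ List.fill(200)(Foo("b", 2)) ++
  List.fill(300)(Foo("c", 3))

// Reference result: group locally by key and count the rows in each group.
// `expected` is a hypothetical helper name used only for this check.
val expected: Map[String, Int] =
  rows.groupBy(_.k).map { case (k, vs) => k -> vs.size }

// The three keys and their counts match the second (correct) result in the report.
assert(expected == Map("a" -> 100, "b" -> 200, "c" -> 300))
```

Any collected result that contains keys other than a, b, and c, or counts that
differ from this map, reproduces the problem described above.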
