Thanks for verifying!
On Thu, Jun 12, 2014 at 12:28 AM, Pei-Lun Lee <pl...@appier.com> wrote:

> I reran with master and it looks like it is fixed.
>
> 2014-06-12 1:26 GMT+08:00 Michael Armbrust <mich...@databricks.com>:
>
>> I'd try rerunning with master. It is likely you are running into
>> SPARK-1994 <https://issues.apache.org/jira/browse/SPARK-1994>.
>>
>> Michael
>>
>> On Wed, Jun 11, 2014 at 3:01 AM, Pei-Lun Lee <pl...@appier.com> wrote:
>>
>>> Hi,
>>>
>>> I am using Spark 1.0.0 and found that in Spark SQL some queries using
>>> GROUP BY give weird results.
>>> To reproduce, type the following commands in spark-shell connected to a
>>> standalone server:
>>>
>>> case class Foo(k: String, v: Int)
>>> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>> import sqlContext._
>>> val rows = List.fill(100)(Foo("a", 1)) ++ List.fill(200)(Foo("b", 2)) ++
>>>   List.fill(300)(Foo("c", 3))
>>> sc.makeRDD(rows).registerAsTable("foo")
>>> sql("select k,count(*) from foo group by k").collect
>>>
>>> The result will be something random like:
>>> res1: Array[org.apache.spark.sql.Row] = Array([b,180], [3,18], [a,75],
>>> [c,270], [4,56], [1,1])
>>>
>>> If I run the same query again, the new result will be correct:
>>> sql("select k,count(*) from foo group by k").collect
>>> res2: Array[org.apache.spark.sql.Row] = Array([b,200], [a,100], [c,300])
>>>
>>> Should I file a bug?
>>>
>>> --
>>> Pei-Lun Lee
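[Editor's note: for readers without a Spark cluster handy, the counts the GROUP BY query should return can be sanity-checked with plain Python; this is just a stand-in for the Spark SQL query in the repro above, not Spark code.]

```python
# Plain-Python equivalent of: select k, count(*) from foo group by k
# over 100 rows of "a", 200 of "b", 300 of "c" (same data as the repro).
from collections import Counter

rows = [("a", 1)] * 100 + [("b", 2)] * 200 + [("c", 3)] * 300
counts = Counter(k for k, _ in rows)

print(sorted(counts.items()))  # [('a', 100), ('b', 200), ('c', 300)]
```

Any other result from the Spark query, such as the `res1` array above, indicates the partial-aggregation bug tracked as SPARK-1994.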