That explains it. Thanks!

Mohammed
From: Yin Huai [mailto:huaiyin....@gmail.com]
Sent: Monday, October 13, 2014 8:47 AM
To: Mohammed Guller
Cc: Cheng, Hao; Cheng Lian; user@spark.apache.org
Subject: Re: Spark SQL parser bug?

Yeah, it is not related to timezone. I think you hit this issue <https://issues.apache.org/jira/browse/SPARK-3173> and it was fixed after the 1.1 release.

On Mon, Oct 13, 2014 at 11:24 AM, Mohammed Guller <moham...@glassbeam.com> wrote:

Good guess, but that is not the reason. Look at this code:

scala> val data = sc.parallelize(1325548800000L::1335548800000L::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
data: org.apache.spark.rdd.RDD[T] = MappedRDD[17] at map at <console>:17

scala> data.collect
res3: Array[T] = Array(T(1325548800000,2012-01-02 16:00:00.0), T(1335548800000,2012-04-27 10:46:40.0))

scala> data.registerTempTable("x")

scala> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
s: org.apache.spark.sql.SchemaRDD = SchemaRDD[20] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#2]
 ExistingRdd [a#2,ts#3], MapPartitionsRDD[22] at mapPartitions at basicOperators.scala:208

scala> s.collect
res5: Array[org.apache.spark.sql.Row] = Array()

Mohammed

From: Yin Huai [mailto:huaiyin....@gmail.com]
Sent: Monday, October 13, 2014 7:19 AM
To: Mohammed Guller
Cc: Cheng, Hao; Cheng Lian; user@spark.apache.org
Subject: Re: Spark SQL parser bug?

Seems the reason that you got "wrong" results was caused by timezone. The time in java.sql.Timestamp(long time) means "milliseconds since January 1, 1970, 00:00:00 GMT. A negative number is the number of milliseconds before January 1, 1970, 00:00:00 GMT." However, in ts>='1970-01-01 00:00:00', '1970-01-01 00:00:00' is interpreted in your local timezone.

Thanks,
Yin

On Mon, Oct 13, 2014 at 9:58 AM, Mohammed Guller <moham...@glassbeam.com> wrote:

Hi Cheng,
I am using version 1.1.0. Looks like that bug was fixed sometime after 1.1.0 was released.

Interestingly, I tried your code on 1.1.0 and it gives me a different (incorrect) result:

case class T(a:String, ts:java.sql.Timestamp)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
data.registerTempTable("x")
val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")

scala> s.collect
res1: Array[org.apache.spark.sql.Row] = Array()

Mohammed

From: Cheng, Hao [mailto:hao.ch...@intel.com]
Sent: Sunday, October 12, 2014 1:35 AM
To: Mohammed Guller; Cheng Lian; user@spark.apache.org
Subject: RE: Spark SQL parser bug?

Hi,
I couldn’t reproduce the bug with the latest master branch. Which version are you using? Can you also list the data in the table “x”?

case class T(a:String, ts:java.sql.Timestamp)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.createSchemaRDD
val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
data.registerTempTable("x")
val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
s.collect

output:
res1: Array[org.apache.spark.sql.Row] = Array([10000], [20000])

Cheng Hao
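For reference, a minimal sketch (plain Scala, no Spark required) of the timezone point Yin Huai raises above: a java.sql.Timestamp built from epoch milliseconds is anchored to 1970-01-01 00:00:00 GMT, while Timestamp.valueOf parses the string in the JVM's default timezone, so the same wall-clock string can map to a different instant. The printed offset below assumes a non-GMT default timezone such as PST; the actual value depends on your machine.

import java.sql.Timestamp
import java.util.TimeZone

// Epoch milliseconds are always relative to 1970-01-01 00:00:00 GMT.
val fromMillis = new Timestamp(0L)

// valueOf parses the string using the JVM's default timezone.
val fromString = Timestamp.valueOf("1970-01-01 00:00:00")

println(TimeZone.getDefault.getID)  // e.g. America/Los_Angeles
println(fromMillis.getTime)         // 0
println(fromString.getTime)         // 0 only if the default timezone is GMT;
                                    // e.g. 28800000 (8 hours) under PST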
From: Mohammed Guller [mailto:moham...@glassbeam.com]
Sent: Sunday, October 12, 2014 12:06 AM
To: Cheng Lian; user@spark.apache.org
Subject: RE: Spark SQL parser bug?

I tried even without the “T” and it still returns an empty result:

scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01 00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD = SchemaRDD[35] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
 ExistingRdd [a#0,ts#1], MapPartitionsRDD[37] at mapPartitions at basicOperators.scala:208

scala> sRdd.collect
res10: Array[org.apache.spark.sql.Row] = Array()

Mohammed

From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: Friday, October 10, 2014 10:14 PM
To: Mohammed Guller; user@spark.apache.org
Subject: Re: Spark SQL parser bug?

Hmm, there is a “T” in the timestamp string, which makes the string not a valid timestamp string representation. Internally Spark SQL uses java.sql.Timestamp.valueOf to cast a string to a timestamp.

On 10/11/14 2:08 AM, Mohammed Guller wrote:

scala> rdd.registerTempTable("x")

scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01T00:00:00';")
sRdd: org.apache.spark.sql.SchemaRDD = SchemaRDD[4] at RDD at SchemaRDD.scala:103
== Query Plan ==
== Physical Plan ==
Project [a#0]
 ExistingRdd [a#0,ts#1], MapPartitionsRDD[6] at mapPartitions at basicOperators.scala:208

scala> sRdd.collect
res2: Array[org.apache.spark.sql.Row] = Array()
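A quick sketch of the java.sql.Timestamp.valueOf behavior Cheng Lian describes above: valueOf only accepts the JDBC escape format yyyy-[m]m-[d]d hh:mm:ss[.f...], so a string using the ISO-8601 "T" separator is rejected (the exact exception message may vary by JDK):

import java.sql.Timestamp

// Accepted: the JDBC escape format "yyyy-mm-dd hh:mm:ss[.fffffffff]".
Timestamp.valueOf("2012-01-01 00:00:00")   // 2012-01-01 00:00:00.0

// Rejected: the ISO-8601 'T' separator is not part of that format,
// so this throws java.lang.IllegalArgumentException.
try {
  Timestamp.valueOf("2012-01-01T00:00:00")
} catch {
  case e: IllegalArgumentException => println(e.getMessage)
}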