It seems the “wrong” results you got were caused by the timezone.

The time argument of java.sql.Timestamp(long time) is "milliseconds since
January 1, 1970, 00:00:00 *GMT*. A negative number is the number of
milliseconds before January 1, 1970, 00:00:00 *GMT*."

However, in ts >= '1970-01-01 00:00:00', the string '1970-01-01 00:00:00' is
interpreted in your local timezone.
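
For example, here is a minimal sketch of the mismatch (it assumes the JVM
default timezone is America/Los_Angeles, i.e. UTC-8 in January; any zone
west of GMT shows the same effect):

import java.sql.Timestamp

val fromEpoch = new Timestamp(10000L)                      // 10 s after 1970-01-01 00:00:00 GMT
val fromString = Timestamp.valueOf("1970-01-01 00:00:00")  // parsed in the JVM's local timezone

// In UTC-8 the local-time epoch string corresponds to getTime == 28800000,
// a later instant than fromEpoch (getTime == 10000), so a filter like
// ts >= '1970-01-01 00:00:00' drops both rows in your example.
println(fromEpoch.before(fromString))  // true in such a timezone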

Thanks,

Yin


On Mon, Oct 13, 2014 at 9:58 AM, Mohammed Guller <moham...@glassbeam.com>
wrote:

>  Hi Cheng,
>
> I am using version 1.1.0.
>
>
>
> Looks like that bug was fixed sometime after 1.1.0 was released.
> Interestingly, I tried your code on 1.1.0 and it gives me a different
> (incorrect) result:
>
>
>
> case class T(a:String, ts:java.sql.Timestamp)
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> import sqlContext.createSchemaRDD
>
> val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
>
> data.registerTempTable("x")
>
> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
>
>
>
> scala> s.collect
>
> res1: Array[org.apache.spark.sql.Row] = Array()
>
>
>
> Mohammed
>
>
>
> *From:* Cheng, Hao [mailto:hao.ch...@intel.com]
> *Sent:* Sunday, October 12, 2014 1:35 AM
> *To:* Mohammed Guller; Cheng Lian; user@spark.apache.org
>
> *Subject:* RE: Spark SQL parser bug?
>
>
>
> Hi, I couldn’t reproduce the bug with the latest master branch. Which
> version are you using? Can you also list the data in table “x”?
>
>
>
> case class T(a:String, ts:java.sql.Timestamp)
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> import sqlContext.createSchemaRDD
>
> val data = sc.parallelize(10000::20000::Nil).map(i=> T(i.toString, new java.sql.Timestamp(i)))
>
> data.registerTempTable("x")
>
> val s = sqlContext.sql("select a from x where ts>='1970-01-01 00:00:00';")
>
> s.collect
>
>
>
> output:
>
> res1: Array[org.apache.spark.sql.Row] = Array([10000], [20000])
>
>
>
> Cheng Hao
>
>
>
> *From:* Mohammed Guller [mailto:moham...@glassbeam.com]
> *Sent:* Sunday, October 12, 2014 12:06 AM
> *To:* Cheng Lian; user@spark.apache.org
> *Subject:* RE: Spark SQL parser bug?
>
>
>
> I tried it even without the “T”, and it still returns an empty result:
>
>
>
> scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01 00:00:00';")
>
> sRdd: org.apache.spark.sql.SchemaRDD =
>
> SchemaRDD[35] at RDD at SchemaRDD.scala:103
>
> == Query Plan ==
>
> == Physical Plan ==
>
> Project [a#0]
>
> ExistingRdd [a#0,ts#1], MapPartitionsRDD[37] at mapPartitions at basicOperators.scala:208
>
>
>
> scala> sRdd.collect
>
> res10: Array[org.apache.spark.sql.Row] = Array()
>
>
>
>
>
> Mohammed
>
>
>
> *From:* Cheng Lian [mailto:lian.cs....@gmail.com]
> *Sent:* Friday, October 10, 2014 10:14 PM
> *To:* Mohammed Guller; user@spark.apache.org
> *Subject:* Re: Spark SQL parser bug?
>
>
>
> Hmm, there is a “T” in the timestamp string, so it is not a valid
> timestamp string representation. Internally Spark SQL uses
> java.sql.Timestamp.valueOf to cast a string to a timestamp.
>
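> As a rough sketch of that cast behavior (this just calls
> java.sql.Timestamp.valueOf directly, outside Spark), which would explain
> the empty result above:
>
> import java.sql.Timestamp
> import scala.util.Try
>
> // The JDBC escape format yyyy-mm-dd hh:mm:ss[.fffffffff] parses fine:
> println(Try(Timestamp.valueOf("2012-01-01 00:00:00")).isSuccess)  // true
>
> // An ISO-8601-style string with a "T" separator is rejected with an
> // IllegalArgumentException:
> println(Try(Timestamp.valueOf("2012-01-01T00:00:00")).isSuccess)  // false
>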
> On 10/11/14 2:08 AM, Mohammed Guller wrote:
>
> scala> rdd.registerTempTable("x")
>
>
>
> scala> val sRdd = sqlContext.sql("select a from x where ts >= '2012-01-01*T*00:00:00';")
>
> sRdd: org.apache.spark.sql.SchemaRDD =
>
> SchemaRDD[4] at RDD at SchemaRDD.scala:103
>
> == Query Plan ==
>
> == Physical Plan ==
>
> Project [a#0]
>
> ExistingRdd [a#0,ts#1], MapPartitionsRDD[6] at mapPartitions at basicOperators.scala:208
>
>
>
> scala> sRdd.collect
>
> res2: Array[org.apache.spark.sql.Row] = Array()
>
>
