Re: SparkSQL Timestamp query failure

2014-11-23 Thread whitebread
Cheng thanks,

thanks to you I found out that the problem, as you guessed, was a precision one:

2012-10-08 16:10:36 instead of 2012-10-08 16:10:36.0
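For the archive: the trailing ".0" comes from java.sql.Timestamp, whose toString() always prints at least one fractional digit. A minimal sketch in plain Java (no Spark needed; the class name is just for illustration):

```java
import java.sql.Timestamp;

public class TimestampPrecision {
    public static void main(String[] args) {
        // java.sql.Timestamp.toString() always prints at least one
        // fractional digit, so a whole-second value ends in ".0".
        Timestamp ts = Timestamp.valueOf("2012-10-08 16:10:36");
        System.out.println(ts); // 2012-10-08 16:10:36.0

        // Same instant, different string form:
        System.out.println(ts.equals(Timestamp.valueOf("2012-10-08 16:10:36.0"))); // true
        System.out.println(ts.toString().equals("2012-10-08 16:10:36")); // false
    }
}
```

So a string comparison against the literal without the ".0" silently matches nothing, which is exactly the empty result seen in the thread.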


Thanks again.

Alessandro


 On Nov 23, 2014, at 11:10 PM, Cheng, Hao [via Apache Spark User List] wrote:
 
 Can you try a query like “SELECT timestamp, CAST(timestamp AS string) FROM logs 
 LIMIT 5”? I guess you probably ran into the timestamp precision or the 
 timezone shifting problem.
 
  
 
 (And it’s not mandatory, but you’d better change the field name from 
 “timestamp” to something else, as “timestamp” is a data-type keyword in 
 Hive/Spark SQL.)
 
   
 From: Alessandro Panebianco
 Sent: Monday, November 24, 2014 11:12 AM
 To: Wang, Daoyuan
 Subject: Re: SparkSQL Timestamp query failure
 
  
 
 Hey Daoyuan, 
 
  
 
 following your suggestion I obtain the same result as when I do:
 
  
 
 where l.timestamp = '2012-10-08 16:10:36.0'
 
  
 
 What happens, using either your suggestion or simply single quotes as in the 
 example I just typed, is that the query does not fail, but it doesn’t return 
 anything either, even though it should.
 
  
 
 If I do a simple:
 
 sqlContext.sql("SELECT timestamp FROM Logs limit 5").collect.foreach(println)
 
  
 
 I get: 
 
  
 
 [2012-10-08 16:10:36.0]
 
 [2012-10-08 16:10:36.0]
 
 [2012-10-08 16:10:36.0]
 
 [2012-10-08 16:10:41.0]
 
 [2012-10-08 16:10:41.0]
 
  
 
 that is why I am sure that querying for one of those timestamps should not 
 return an empty array.
 
  
 
 I’d really love to find a solution to this problem. Since Spark supports 
 Timestamp, it should support simple comparisons with them, in my opinion.
 
  
 
 Any other help would be greatly appreciated.
 
  
 
 Alessandro
 
  
 
  
 
  
 
  
 
 On Nov 23, 2014, at 8:10 PM, Wang, Daoyuan wrote:
 
  
 
 Hi,
 
 I think you can try
 cast(l.timestamp as string)='2012-10-08 16:10:36.0'
 
 Thanks,
 Daoyuan
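Daoyuan's cast-to-string comparison can be checked in miniature without a cluster. A sketch in plain Java, with a List of Timestamps standing in for the rows of the Logs table (class and variable names are hypothetical):

```java
import java.sql.Timestamp;
import java.util.List;

public class CastComparison {
    public static void main(String[] args) {
        // Miniature stand-in for CAST(l.timestamp AS STRING) = '...':
        // a list of Timestamps plays the rows of the Logs table.
        List<Timestamp> rows = List.of(
            Timestamp.valueOf("2012-10-08 16:10:36"),
            Timestamp.valueOf("2012-10-08 16:10:41"));

        // Matches only when the literal carries the fractional ".0"
        // that Timestamp.toString() produces:
        long withFraction = rows.stream()
            .filter(t -> t.toString().equals("2012-10-08 16:10:36.0")).count();
        long withoutFraction = rows.stream()
            .filter(t -> t.toString().equals("2012-10-08 16:10:36")).count();
        System.out.println(withFraction);    // 1
        System.out.println(withoutFraction); // 0 -- the empty-result effect
    }
}
```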
 
 -----Original Message-----
 From: whitebread
 Sent: Sunday, November 23, 2014 12:11 AM
 Subject: Re: SparkSQL Timestamp query failure
 
 Thanks for your answer Akhil, 
 
 I have already tried that; the query doesn't fail, but it doesn't return 
 anything either, even though it should.
 Using single quotes, I think it reads the value as a string and not as a timestamp. 
 
 I don't know how to solve this. Any other hint by any chance?
 
 Thanks,
 
 Alessandro
 
 
 
 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-Timestamp-query-failure-tp19502p19554.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.
 





Re: SparkSQL Timestamp query failure

2014-11-22 Thread whitebread
Thanks for your answer Akhil, 

I have already tried that; the query doesn't fail, but it doesn't return
anything either, even though it should.
Using single quotes, I think it reads the value as a string and not as a timestamp. 

I don't know how to solve this. Any other hint by any chance?

Thanks,

Alessandro







SparkSQL Timestamp query failure

2014-11-21 Thread whitebread
Hi all,

I put some log files into sql tables through Spark and my schema looks like
this:

 |-- timestamp: timestamp (nullable = true)
 |-- c_ip: string (nullable = true)
 |-- cs_username: string (nullable = true)
 |-- s_ip: string (nullable = true)
 |-- s_port: string (nullable = true)
 |-- cs_method: string (nullable = true)
 |-- cs_uri_stem: string (nullable = true)
 |-- cs_query: string (nullable = true)
 |-- sc_status: integer (nullable = false)
 |-- sc_bytes: integer (nullable = false)
 |-- cs_bytes: integer (nullable = false)
 |-- time_taken: integer (nullable = false)
 |-- User_Agent: string (nullable = true)
 |-- Referrer: string (nullable = true)

As you can see, I created a timestamp field, which I read is supported by
Spark (Date wouldn't work, as far as I understood). I would love to use it in
queries like where timestamp=(2012-10-08 16:10:36.0), but when I run them I
keep getting errors.
I tried the 2 following syntax forms.
For the second one I parse a string, so I'm sure I'm actually passing it in
timestamp format.
I use 2 functions: parse and date2timestamp.

*Any hint on how I should handle timestamp values?* 

Thanks,

Alessandro
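[A sketch of a query string that at least parses, assuming the Logs table above: the trick is quoting the literal and casting the column, as suggested elsewhere in the thread. Plain Java, never run against a cluster; the class name is illustrative only:]

```java
public class QuerySketch {
    public static void main(String[] args) {
        // Sketch only: the parse errors below come from the unquoted
        // timestamp literal. Quoting it inside the SQL text gives the
        // parser a string literal to work with. `Logs` and `timestamp`
        // are the table/column from the schema above.
        String literal = "2012-10-08 16:10:36.0";
        String query = "SELECT * FROM Logs AS l "
            + "WHERE CAST(l.timestamp AS STRING) = '" + literal + "'";
        System.out.println(query);
        // In the spark-shell session this would then be passed to
        // sqlContext.sql(query).collect
    }
}
```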

1)
scala> sqlContext.sql("SELECT * FROM Logs as l where l.timestamp=(2012-10-08
16:10:36.0)").collect
java.lang.RuntimeException: [1.55] failure: ``)'' expected but 16 found

SELECT * FROM Logs as l where l.timestamp=(2012-10-08 16:10:36.0)
  ^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:73)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:260)
at $iwC$$iwC$$iwC$$iwC$$iwC.init(console:21)
at $iwC$$iwC$$iwC$$iwC.init(console:26)
at $iwC$$iwC$$iwC.init(console:28)
at $iwC$$iwC.init(console:30)
at $iwC.init(console:32)
at init(console:34)
at .init(console:38)
at .clinit(console)
at .init(console:7)
at .clinit(console)
at $print(console)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:789)
at
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1062)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:615)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:646)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:610)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:814)
at
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:859)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:771)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:616)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:624)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:629)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:954)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at
org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:902)
at
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:997)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2)
sqlContext.sql("SELECT * FROM Logs as l where
l.timestamp="+date2timestamp(formatTime3.parse("2012-10-08
16:10:36.0"))).collect
java.lang.RuntimeException: [1.54] failure: ``UNION'' expected but 16 found

SELECT * FROM Logs as l where l.timestamp=2012-10-08 16:10:36.0
 ^
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
at