Actually, I did a little investigation on Joda Time when I was working on 
SPARK-4987 for Timestamp ser-de in the Parquet format. I think Joda natively 
offers an interface for converting between its types and the corresponding Java objects.

For example, to convert a java.util.Date (the parent class of java.sql.Date and 
java.sql.Timestamp) object named jd, in Java code you can use
DateTime dt = new DateTime(jd);
or in Scala code
val dt: DateTime = new DateTime(jd)

In the other direction, given a DateTime object named dt, you can use code like
val jd: java.sql.Timestamp = new Timestamp(dt.getMillis)
to get the Java object back.
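
Putting the two directions together, a minimal self-contained Scala sketch 
(the object and method names here are only illustrative, not an existing API):

    import java.sql.Timestamp
    import org.joda.time.DateTime

    object JodaInterop {
      // java.sql.Timestamp (or any java.util.Date) -> Joda DateTime
      def toJoda(ts: Timestamp): DateTime = new DateTime(ts)

      // Joda DateTime -> java.sql.Timestamp via epoch millis
      def toTimestamp(dt: DateTime): Timestamp = new Timestamp(dt.getMillis)
    }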

Thanks,
Daoyuan.

From: Cheng Lian [mailto:lian.cs....@gmail.com]
Sent: Sunday, April 12, 2015 11:51 PM
To: Justin Yip
Cc: adamgerst; user@spark.apache.org
Subject: Re: How to use Joda Time with Spark SQL?

These common UDTs can always be wrapped in libraries and published to 
spark-packages (http://spark-packages.org/) :-)

Cheng
On 4/12/15 3:00 PM, Justin Yip wrote:
Cheng, this is great info. I have a follow-up question. There are a few very 
common data types (e.g. Joda DateTime) that are not directly supported by 
Spark SQL. Do you know if there are any plans to accommodate some common data 
types in Spark SQL? They don't need to be first-class data types, but if they 
were available as UDTs provided by the Spark SQL library, that would make 
DataFrame users' lives easier.

Justin

On Sat, Apr 11, 2015 at 5:41 AM, Cheng Lian 
<lian.cs....@gmail.com> wrote:
One possible approach is to define a UDT (user-defined type) for Joda time. 
A UDT maps an arbitrary type to and from Spark SQL data types. You may check 
ExamplePointUDT [1] for more details.

[1]: 
https://github.com/apache/spark/blob/694aef0d71d2683eaf63cbd1d8e95c2da423b72e/sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
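
For illustration, a minimal sketch of such a UDT, assuming the Spark 1.3-era 
UserDefinedType API used by ExamplePointUDT (the JodaDateTimeUDT class name is 
made up here; also note that registration normally goes through the 
@SQLUserDefinedType annotation on the user class, which you cannot add to a 
third-party class like DateTime, so in practice a small wrapper class may be needed):

    import org.apache.spark.sql.types._
    import org.joda.time.DateTime

    // Sketch: store a Joda DateTime as epoch millis in a LongType column.
    class JodaDateTimeUDT extends UserDefinedType[DateTime] {
      override def sqlType: DataType = LongType

      override def serialize(obj: Any): Any = obj match {
        case dt: DateTime => dt.getMillis
      }

      override def deserialize(datum: Any): DateTime = datum match {
        case millis: Long => new DateTime(millis)
      }

      override def userClass: Class[DateTime] = classOf[DateTime]
    }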


On 4/8/15 6:09 AM, adamgerst wrote:
I've been using Joda Time in all my Spark jobs (via the nscala-time
package) and had not run into any issues until I started trying to use
Spark SQL.  When I try to convert a case class that has a
com.github.nscala_time.time.Imports.DateTime field in it, a MatchError is
thrown.

My assumption is that this is because the basic date/time types of Spark SQL are
java.sql.Timestamp and java.sql.Date, and therefore Spark doesn't know what to
do with the DateTime value.

How can I get around this? I would prefer not to have to change my code to
make the values Timestamps, but I'm concerned that might be the only way.
Would something like implicit conversions work here?

It seems that even if I specify the schema manually I would still have
the issue, since you have to specify the column type, which has to be of type
org.apache.spark.sql.types.DataType.
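
For reference, what I mean by changing the values to Timestamps is roughly the
following (Event and EventRow are just illustrative names, not my actual classes):

    import java.sql.Timestamp
    import com.github.nscala_time.time.Imports.DateTime

    // Original case class using Joda time
    case class Event(name: String, when: DateTime)

    // Parallel case class using the type Spark SQL understands
    case class EventRow(name: String, when: Timestamp)

    // Convert before creating the DataFrame
    def toRow(e: Event): EventRow =
      EventRow(e.name, new Timestamp(e.when.getMillis))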



