These common UDTs can always be wrapped in libraries and published to spark-packages (http://spark-packages.org/) :-)

Cheng

On 4/12/15 3:00 PM, Justin Yip wrote:
Cheng, this is great info. I have a follow-up question. There are a few very common data types (e.g. Joda DateTime) that are not directly supported by Spark SQL. Do you know if there are any plans for accommodating some common data types in Spark SQL? They don't need to be first-class data types, but if they were available as UDTs provided by the Spark SQL library, that would make DataFrame users' lives easier.

Justin

On Sat, Apr 11, 2015 at 5:41 AM, Cheng Lian <lian.cs....@gmail.com> wrote:

    One possible approach is to define a UDT (user-defined type)
    for Joda time. A UDT maps an arbitrary type to and from Spark SQL
    data types. You may check ExamplePointUDT [1] for more details.

    [1]: https://github.com/apache/spark/blob/694aef0d71d2683eaf63cbd1d8e95c2da423b72e/sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
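
    For illustration, a rough sketch of such a UDT, modeled on
    ExamplePointUDT against the Spark 1.3-era API. All class names here
    are made up, and since Joda's DateTime can't carry the
    @SQLUserDefinedType annotation itself, this sketch wraps it:

        import org.apache.spark.sql.types._
        import org.joda.time.DateTime

        // Hypothetical wrapper: the annotation must live on the user
        // class, and we can't add it to Joda's DateTime directly.
        @SQLUserDefinedType(udt = classOf[JodaDateTimeUDT])
        class JodaDateTimeWrapper(val dt: DateTime) extends Serializable

        class JodaDateTimeUDT extends UserDefinedType[JodaDateTimeWrapper] {
          // Store the instant as epoch milliseconds in a LongType column.
          override def sqlType: DataType = LongType

          override def serialize(obj: Any): Any = obj match {
            case w: JodaDateTimeWrapper => w.dt.getMillis
          }

          override def deserialize(datum: Any): JodaDateTimeWrapper = datum match {
            case millis: Long => new JodaDateTimeWrapper(new DateTime(millis))
          }

          override def userClass: Class[JodaDateTimeWrapper] =
            classOf[JodaDateTimeWrapper]
        }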



    On 4/8/15 6:09 AM, adamgerst wrote:

        I've been using Joda Time in all my Spark jobs (via the
        nscala-time package) and had not run into any issues until I
        started trying to use Spark SQL. When I try to convert a case
        class that has a com.github.nscala_time.time.Imports.DateTime
        field in it, an exception is thrown with a MatchError.

        My assumption is that this is because the only date/time types
        Spark SQL supports are java.sql.Timestamp and java.sql.Date, and
        therefore Spark doesn't know what to do with the DateTime value.

        How can I get around this? I would prefer not to have to change
        my code to make the values be Timestamps, but I'm concerned that
        might be the only way. Would something like implicit conversions
        work here?
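
        For concreteness, the conversion I'm trying to avoid would look
        roughly like this (the Event class, events RDD, and sqlContext
        names are made up):

            import java.sql.Timestamp
            import org.joda.time.DateTime

            // Original case class with a Joda DateTime field.
            case class Event(id: Long, when: DateTime)
            // Mirror case class using a type Spark SQL handles natively.
            case class EventRow(id: Long, when: Timestamp)

            // Convert at the boundary before creating the DataFrame;
            // DateTime.getMillis returns epoch milliseconds.
            val rows = events.map(e =>
              EventRow(e.id, new Timestamp(e.when.getMillis)))
            val df = sqlContext.createDataFrame(rows)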

        It seems that even if I specify the schema manually I would
        still have the issue, since you have to specify the column type,
        which has to be an org.apache.spark.sql.types.DataType.
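
        For example (again with made-up names), a manual schema still
        forces the same conversion, because the Row values have to be
        Spark SQL-compatible:

            import java.sql.Timestamp
            import org.apache.spark.sql.Row
            import org.apache.spark.sql.types._

            val schema = StructType(Seq(
              StructField("id", LongType, nullable = false),
              StructField("when", TimestampType, nullable = false)))

            // Rows must hold Spark SQL-compatible values, so the Joda
            // DateTime still has to become a java.sql.Timestamp here.
            val rowRDD = events.map(e =>
              Row(e.id, new Timestamp(e.when.getMillis)))
            val df = sqlContext.createDataFrame(rowRDD, schema)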



