[GitHub] spark pull request #19567: [SPARK-22291] Postgresql UUID[] to Cassandra: Con...

jmchung Tue, 24 Oct 2017 11:58:14 -0700

GitHub user jmchung opened a pull request:

    https://github.com/apache/spark/pull/19567


    [SPARK-22291] Postgresql UUID[] to Cassandra: Conversion Error

    ## What changes were proposed in this pull request?
    
    This PR fixes the conversion error that occurs when reading data from a 
PostgreSQL table that contains columns of the `UUID[]` data type. 
    
    For example, create a table with a `UUID[]` column and insert a row of 
test data.
    ```SQL
    CREATE TABLE users
    (
        id smallint NOT NULL,
        name character varying(50),
        user_ids uuid[],
        PRIMARY KEY (id)
    );
    
    INSERT INTO users ("id", "name", "user_ids")
    VALUES (1, 'foo', ARRAY
        ['7be8aaf8-650e-4dbb-8186-0a749840ecf2'
        ,'205f9bfc-018c-4452-a605-609c0cfad228']::UUID[]
    );
    ```
    Loading the data then throws the following exception.
    ```
    java.lang.ClassCastException: [Ljava.util.UUID; cannot be cast to [Ljava.lang.String;
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$14.apply(JdbcUtils.scala:459)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$14.apply(JdbcUtils.scala:458)
    ...
    ```
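    
    The exception message says that an array of `java.util.UUID` was cast 
directly to an array of `String`. The sketch below reproduces that failure in 
plain Scala and shows one plausible way around it, converting each element with 
`toString`. It is only an illustration and not necessarily the code in this 
patch: the object `UuidArrayConversion` and the helper `toStringArray` are 
hypothetical names, and the `raw` value is a stand-in for whatever the JDBC 
driver returns for a `uuid[]` column.
    
    ```scala
    import java.util.UUID
    import scala.util.Try
    
    object UuidArrayConversion {
      // Hypothetical helper (not a Spark API): converts any object array to
      // strings element by element, keeping nulls as nulls.
      def toStringArray(raw: AnyRef): Array[String] =
        raw.asInstanceOf[Array[AnyRef]].map(e => if (e == null) null else e.toString)
    
      def main(args: Array[String]): Unit = {
        // Assumption: stand-in for the value a JDBC driver hands back for uuid[].
        val raw: AnyRef = Array[UUID](
          UUID.fromString("7be8aaf8-650e-4dbb-8186-0a749840ecf2"),
          UUID.fromString("205f9bfc-018c-4452-a605-609c0cfad228"))
    
        // A direct cast fails at runtime because UUID[] is not a String[].
        val direct = Try(raw.asInstanceOf[Array[String]].length)
        println(direct.isFailure) // prints "true"
    
        // Casting to Object[] is always legal for reference arrays, so the
        // element-wise conversion succeeds.
        println(toStringArray(raw).mkString("[", ", ", "]"))
      }
    }
    ```
    
    Casting to `Array[AnyRef]` first works for any array of reference types, 
so the same conversion would also cover other element types that the driver 
does not return as strings.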
    
    
    ## How was this patch tested?
    
    Existing tests.
    
    I tried to reproduce the case above as a test in `JDBCSuite`, but `ARRAY` 
is currently an unsupported type. I therefore ran the example above against my 
Postgres instance and verified the fix with the following code.
    
    ```scala
    val opts = Map(
      "url" -> "jdbc:postgresql://localhost:5432/postgres?user=postgres&password=postgres",
      "dbtable" -> "users")
    val df = spark.read.format("jdbc").options(opts).load()
    df.show(truncate = false)
    
    // Output:
    // +---+----+----------------------------------------------------------------------------+
    // |id |name|user_ids                                                                    |
    // +---+----+----------------------------------------------------------------------------+
    // |1  |foo |[7be8aaf8-650e-4dbb-8186-0a749840ecf2, 205f9bfc-018c-4452-a605-609c0cfad228]|
    // +---+----+----------------------------------------------------------------------------+
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmchung/spark SPARK-22291

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19567
    
----
commit d84b1bb89e3be33931531345fb23cadd8fe6868f
Author: Jen-Ming Chung <[email protected]>
Date:   2017-10-24T18:24:43Z

    [SPARK-22291] Postgresql UUID[] to Cassandra: Conversion Error

----

