GitHub user felixcheung opened a pull request:
https://github.com/apache/spark/pull/10480
[SPARK-12224][SPARKR] R support for JDBC source
Add R API for `read.jdbc`, `write.jdbc`.
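For reference, a sketch of the intended signatures as exercised by the manual tests below (argument names and defaults here are inferred from those calls, not authoritative):
```
# Inferred usage, not the authoritative API:
# read.jdbc(sqlContext, url, tableName,
#           partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
#           numPartitions = 0L, predicates = list(), ...)
# write.jdbc(x, url, tableName, mode = "error", ...)
```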
Tested this quite a bit manually with different combinations of parameters.
It's not clear whether we could have automated tests for this in R, since the
Scala `JDBCSuite` depends on the H2 in-memory Java database; one possible
approach is sketched below.
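A hedged sketch of what such a test might look like if the H2 jar were put on the driver classpath the way `JDBCSuite` does it; the URL options, credentials, and table name here are assumptions, and it presumes SparkR's testthat harness:
```
# Hypothetical automated test, assuming the H2 jar is on the classpath.
# DB_CLOSE_DELAY=-1 keeps the in-memory database alive between connections.
test_that("read.jdbc/write.jdbc round-trip", {
  url <- "jdbc:h2:mem:testdb0;DB_CLOSE_DELAY=-1"
  df <- as.DataFrame(sqlContext, iris)
  write.jdbc(df, url, "iris_tbl", user = "sa", password = "")
  df2 <- read.jdbc(sqlContext, url, "iris_tbl", user = "sa", password = "")
  expect_equal(count(df2), count(df))
})
```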
Refactored some of this code into util functions so they can be unit-tested.
Core's R SerDe code needs to be updated to allow passing
`java.util.Properties` as a `jobj` handle, which DataFrameReader/Writer's
`jdbc` method requires; adding a `sql/r/SQLUtils` helper function instead
would take more code.
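The util function in question is roughly the following shape; this is a sketch of the approach using SparkR's internal `newJObject`/`callJMethod`, and the details may differ from the actual patch:
```
# Fold named varargs (e.g. user, password) into a java.util.Properties
# jobj that can be handed to DataFrameReader/Writer's jdbc method.
varargsToJProperties <- function(...) {
  pairs <- list(...)
  props <- newJObject("java.util.Properties")
  if (length(pairs) > 0) {
    lapply(names(pairs), function(name) {
      callJMethod(props, "setProperty", name, as.character(pairs[[name]]))
    })
  }
  props
}
```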
Tested:
```
# with postgresql
../bin/sparkR --driver-class-path /usr/share/java/postgresql-9.4.1207.jre7.jar

# read.jdbc
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                user = "user", password = "12345")
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                user = "user", password = 12345)

# partitionColumn and numPartitions test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                partitionColumn = "did", lowerBound = 0, upperBound = 200,
                numPartitions = 4, user = "user", password = 12345)
a <- SparkR:::toRDD(df)
SparkR:::getNumPartitions(a)
# [1] 4
SparkR:::collectPartition(a, 2L)

# defaultParallelism test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                partitionColumn = "did", lowerBound = 0, upperBound = 200,
                user = "user", password = 12345)
a <- SparkR:::toRDD(df)
SparkR:::getNumPartitions(a)
# [1] 2

# predicates test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                predicates = list("did<=105"), user = "user", password = 12345)
count(df) == 1

# write.jdbc, default save mode "error"
irisDf <- as.DataFrame(sqlContext, iris)
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2",
           user = "user", password = "12345")
# error: table "films2" already exists (default mode is "error")
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "iris",
           user = "user", password = "12345")
```
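Not shown above: passing an explicit save mode. Assuming `write.jdbc` accepts the usual DataFrameWriter save modes through a `mode` argument, the failing write could instead be:
```
# Hypothetical: overwrite the existing table instead of erroring out
# (assumes mode follows DataFrameWriter semantics: "error", "append",
# "overwrite", "ignore").
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2",
           mode = "overwrite", user = "user", password = "12345")
```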
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/felixcheung/spark rreadjdbc
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10480.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10480
----
commit b0f28523f17d7733d4e1fe2b7e040db127b7188b
Author: felixcheung <[email protected]>
Date: 2015-12-24T03:51:10Z
read.jdbc support
commit c3e7bec0de7658c954539b421168e966bea17843
Author: felixcheung <[email protected]>
Date: 2015-12-24T03:52:56Z
update comment
commit 5a0f6d2a3be14931126ac420b31f94a276eb5c02
Author: felixcheung <[email protected]>
Date: 2015-12-24T04:38:14Z
write.jdbc, doc update
commit 5b15f38d151b5c90bc9e41ef02ad25712eec31f4
Author: felixcheung <[email protected]>
Date: 2015-12-24T04:48:45Z
more doc update
commit 98ec05e8b9ec79ea8ea424dfbfc22f15ab0f7429
Author: felixcheung <[email protected]>
Date: 2015-12-24T16:02:42Z
update doc
commit 12b36130fe1bd6c526c6e7ba89deb77b0cfb0ee3
Author: felixcheung <[email protected]>
Date: 2015-12-24T16:12:44Z
code fix
commit 3f90db6686daea182161de3b4c668b6901be89c9
Author: felixcheung <[email protected]>
Date: 2015-12-26T03:50:51Z
fix serialization of java.util.Properties, add tests for util functions,
add generic, fix bugs
commit de635b18147e42bb931e8d81e51522330873498e
Author: felixcheung <[email protected]>
Date: 2015-12-26T03:53:22Z
update doc
----