GitHub user felixcheung opened a pull request:
https://github.com/apache/spark/pull/10480
[SPARK-12224][SPARKR] R support for JDBC source
Add R API for `read.jdbc`, `write.jdbc`.
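For reference, a sketch of the intended signatures as exercised by the manual tests below (argument names and defaults here are inferred from those calls, not authoritative):
```
# Inferred usage, not the authoritative API:
# read.jdbc(sqlContext, url, tableName,
#           partitionColumn = NULL, lowerBound = NULL, upperBound = NULL,
#           numPartitions = 0L, predicates = list(), ...)
# write.jdbc(x, url, tableName, mode = "error", ...)
```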
Tested this quite a bit manually with different combinations of parameters.
It's not clear whether we could have automated tests for this in R, since the
Scala `JDBCSuite` depends on the H2 in-memory Java database; one possible
approach is sketched below.
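A hedged sketch of what such a test might look like if the H2 jar were put on the driver classpath the way `JDBCSuite` does it; the URL options, credentials, and table name here are assumptions, and it presumes SparkR's testthat harness:
```
# Hypothetical automated test, assuming the H2 jar is on the classpath.
# DB_CLOSE_DELAY=-1 keeps the in-memory database alive between connections.
test_that("read.jdbc/write.jdbc round-trip", {
  url <- "jdbc:h2:mem:testdb0;DB_CLOSE_DELAY=-1"
  df <- as.DataFrame(sqlContext, iris)
  write.jdbc(df, url, "iris_tbl", user = "sa", password = "")
  df2 <- read.jdbc(sqlContext, url, "iris_tbl", user = "sa", password = "")
  expect_equal(count(df2), count(df))
})
```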
Refactored some of this code into util functions so they can be unit-tested.
Core's R SerDe code needs to be updated to allow passing
`java.util.Properties` as a `jobj` handle, which DataFrameReader/Writer's
`jdbc` method requires; adding a `sql/r/SQLUtils` helper function instead
would take more code.
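The util function in question is roughly the following shape; this is a sketch of the approach using SparkR's internal `newJObject`/`callJMethod`, and the details may differ from the actual patch:
```
# Fold named varargs (e.g. user, password) into a java.util.Properties
# jobj that can be handed to DataFrameReader/Writer's jdbc method.
varargsToJProperties <- function(...) {
  pairs <- list(...)
  props <- newJObject("java.util.Properties")
  if (length(pairs) > 0) {
    lapply(names(pairs), function(name) {
      callJMethod(props, "setProperty", name, as.character(pairs[[name]]))
    })
  }
  props
}
```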
Tested:
```
# with postgresql
../bin/sparkR --driver-class-path /usr/share/java/postgresql-9.4.1207.jre7.jar

# read.jdbc
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                user = "user", password = "12345")
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                user = "user", password = 12345)

# partitionColumn and numPartitions test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                partitionColumn = "did", lowerBound = 0, upperBound = 200,
                numPartitions = 4, user = "user", password = 12345)
a <- SparkR:::toRDD(df)
SparkR:::getNumPartitions(a)
# [1] 4
SparkR:::collectPartition(a, 2L)

# defaultParallelism test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                partitionColumn = "did", lowerBound = 0, upperBound = 200,
                user = "user", password = 12345)
a <- SparkR:::toRDD(df)
SparkR:::getNumPartitions(a)
# [1] 2

# predicates test
df <- read.jdbc(sqlContext, "jdbc:postgresql://localhost/db", "films2",
                predicates = list("did<=105"), user = "user", password = 12345)
count(df) == 1

# write.jdbc, default save mode "error"
irisDf <- as.DataFrame(sqlContext, iris)
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2",
           user = "user", password = "12345")
# error: table "films2" already exists (default mode is "error")
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "iris",
           user = "user", password = "12345")
```
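Not shown above: passing an explicit save mode. Assuming `write.jdbc` accepts the usual DataFrameWriter save modes through a `mode` argument, the failing write could instead be:
```
# Hypothetical: overwrite the existing table instead of erroring out
# (assumes mode follows DataFrameWriter semantics: "error", "append",
# "overwrite", "ignore").
write.jdbc(irisDf, "jdbc:postgresql://localhost/db", "films2",
           mode = "overwrite", user = "user", password = "12345")
```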
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/felixcheung/spark rreadjdbc
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10480.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10480
----
commit b0f28523f17d7733d4e1fe2b7e040db127b7188b
Author: felixcheung <[email protected]>
Date: 2015-12-24T03:51:10Z
read.jdbc support
commit c3e7bec0de7658c954539b421168e966bea17843
Author: felixcheung <[email protected]>
Date: 2015-12-24T03:52:56Z
update comment
commit 5a0f6d2a3be14931126ac420b31f94a276eb5c02
Author: felixcheung <[email protected]>
Date: 2015-12-24T04:38:14Z
write.jdbc, doc update
commit 5b15f38d151b5c90bc9e41ef02ad25712eec31f4
Author: felixcheung <[email protected]>
Date: 2015-12-24T04:48:45Z
more doc update
commit 98ec05e8b9ec79ea8ea424dfbfc22f15ab0f7429
Author: felixcheung <[email protected]>
Date: 2015-12-24T16:02:42Z
update doc
commit 12b36130fe1bd6c526c6e7ba89deb77b0cfb0ee3
Author: felixcheung <[email protected]>
Date: 2015-12-24T16:12:44Z
code fix
commit 3f90db6686daea182161de3b4c668b6901be89c9
Author: felixcheung <[email protected]>
Date: 2015-12-26T03:50:51Z
fix serialization of java.util.Properties, add tests for util functions,
add generic, fix bugs
commit de635b18147e42bb931e8d81e51522330873498e
Author: felixcheung <[email protected]>
Date: 2015-12-26T03:53:22Z
update doc
----