Very helpful. Thanks,
Hussam
From: Josh Rosen [mailto:[email protected]]
Sent: Wednesday, November 06, 2013 11:44 AM
To: [email protected]
Subject: Re: JdbcRDD usage

JavaSparkContext is just a thin wrapper over SparkContext that exposes Java-friendly methods. You can access the underlying SparkContext instance by calling .sc() on your JavaSparkContext.

You may have to do a bit of extra work to instantiate JdbcRDD from Java, such as explicitly passing a ClassManifest for your mapRow function. The Java API Internals guide describes some of the steps involved in this: https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals

After constructing the JdbcRDD[T], you can wrap it into a JavaRDD[T] by calling new JavaRDD(myJdbcRDD, itsClassManifest). Ideally, we'd have a Java-friendly API for this, but in the meantime it's still possible to use it from Java with a few of these extra steps (a sketch putting the steps together follows at the end of this thread).

On Wed, Nov 6, 2013 at 11:29 AM, <[email protected]> wrote:

Cool. Since I am working on a Java code base, to use JdbcRDD I need to first create a SparkContext sc and then initialize a JavaSparkContext(sc). Is there any code that would let me get a SparkContext from a JavaSparkContext? And is there any sample Java code for creating a Scala Seq<String> from a String[], since I need to create the SparkContext passing my app jars as a Seq<String>?

Thanks,
Hussam

From: Reynold Xin [mailto:[email protected]]
Sent: Wednesday, November 06, 2013 12:13 AM
To: [email protected]
Subject: Re: JdbcRDD usage

The RDD actually takes care of closing the JDBC connection at the end of the iterator. See the code here: https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala#L107

The explicit close you saw in JdbcRDDSuite is there to close the test program's own connection for the insert statements (not the JdbcRDD's connection).

On Tue, Nov 5, 2013 at 3:13 PM, <[email protected]> wrote:

Hi,

I need to access JDBC from my Java Spark code, and I am thinking of using JdbcRDD as described in http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/JdbcRDD.html

I have these questions:

- When does the RDD decide to close the connection? The docs say: "... getConnection: a function that returns an open Connection. The RDD takes care of closing the connection."
- Is there any setting that tells Spark to keep JdbcRDD connections open for the next query, instead of opening a new one for the same JDBC source?

Also, looking at https://github.com/apache/incubator-spark/blob/branch-0.8/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala I see it invokes an explicit close for the connection in the after { } block. If the RDD takes care of closing the connection, why do we have to explicitly invoke DriverManager.getConnection("jdbc:derby:;shutdown=true")?

Thanks,
Hussam
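Putting Josh's steps together, here is a rough, untested Java sketch against Spark 0.8 / Scala 2.9. The Derby URL, table, query bounds, jar path, and the helper class names (ConnectionFactory, RowMapper, JdbcRddFromJava) are illustrative placeholders, not from the thread; the pieces taken directly from the discussion are JavaSparkContext.sc(), the explicit ClassManifest for mapRow, and new JavaRDD(myJdbcRDD, itsClassManifest).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.rdd.JdbcRDD;

    import scala.reflect.ClassManifest;
    import scala.reflect.ClassManifest$;
    import scala.runtime.AbstractFunction0;
    import scala.runtime.AbstractFunction1;

    public class JdbcRddFromJava {

      // JdbcRDD ships getConnection to the workers, so the function
      // object must be serializable. Each task opens its own connection,
      // which the RDD closes when its iterator is exhausted.
      static class ConnectionFactory extends AbstractFunction0<Connection>
          implements java.io.Serializable {
        @Override public Connection apply() {
          try {
            return DriverManager.getConnection("jdbc:derby:target/testdb");
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      }

      // mapRow: ResultSet => T, also shipped to the workers.
      static class RowMapper extends AbstractFunction1<ResultSet, String>
          implements java.io.Serializable {
        @Override public String apply(ResultSet rs) {
          try {
            return rs.getString(1);
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      }

      public static void main(String[] args) {
        // JavaSparkContext has a constructor taking the app jars as a
        // plain String[], so no Scala Seq is needed on this path.
        JavaSparkContext jsc = new JavaSparkContext(
            "local[2]", "jdbc-rdd-demo", "/path/to/spark",
            new String[] { "target/my-app.jar" });

        // The underlying SparkContext that JdbcRDD's constructor expects.
        SparkContext sc = jsc.sc();

        // The ClassManifest that Scala would normally supply implicitly.
        ClassManifest<String> cm = ClassManifest$.MODULE$.fromClass(String.class);

        // The SQL must contain two '?' placeholders, which JdbcRDD binds
        // to each partition's lower and upper bound.
        JdbcRDD<String> rdd = new JdbcRDD<String>(
            sc,
            new ConnectionFactory(),
            "SELECT DATA FROM FOO WHERE ID >= ? AND ID <= ?",
            1L, 100L,  // lowerBound, upperBound
            3,         // numPartitions
            new RowMapper(),
            cm);

        // Wrap the Scala RDD to get the Java-friendly API back.
        JavaRDD<String> javaRdd = new JavaRDD<String>(rdd, cm);
        System.out.println(javaRdd.count());
      }
    }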

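As for the Seq<String> question: if you do hit a constructor that insists on a Scala Seq[String] rather than the String[] overload used above, the standard-library bridge in Scala 2.9 is scala.collection.JavaConversions. A minimal sketch; the class and method names of the helper itself are illustrative:

    import java.util.Arrays;

    import scala.collection.JavaConversions;
    import scala.collection.Seq;

    public class SeqFromArray {
      // Wraps the array as a java.util.List, views it as a Scala Buffer,
      // and exposes that Buffer as a Seq.
      public static Seq<String> toSeq(String[] jars) {
        return JavaConversions.asScalaBuffer(Arrays.asList(jars)).toSeq();
      }
    }

In practice, though, the JavaSparkContext constructors are the simpler path from Java, as Josh's reply suggests.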