JavaSparkContext is just a thin wrapper over SparkContext that exposes
Java-friendly methods.  You can access the underlying SparkContext instance
by calling .sc() on your JavaSparkContext.
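For example, a minimal sketch (the master URL and app name are placeholders; this assumes a Spark 0.8-era setup on the classpath):

```java
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;

public class UnderlyingContext {
    public static void main(String[] args) {
        // Build the Java-friendly wrapper as usual.
        JavaSparkContext jsc = new JavaSparkContext("local", "JdbcRddExample");

        // .sc() exposes the wrapped Scala SparkContext, which Scala-only
        // constructors such as JdbcRDD's require.
        SparkContext sc = jsc.sc();

        jsc.stop();
    }
}
```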

You may have to do a bit of extra work to instantiate JdbcRDD from Java,
such as explicitly passing a ClassManifest for your mapRow function.  The
Java API internals guide describes some of the steps involved in this:
https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals

After constructing the JdbcRDD[T], you can wrap it into a JavaRDD[T] by
calling new JavaRDD(myJdbcRDD, itsClassManifest).
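Putting those pieces together, a rough sketch from Java might look like this (written against the Spark 0.8 / Scala 2.9 APIs; the Derby URL, table, and query are hypothetical, and the exact ClassManifest incantation may vary with your Scala version):

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.JdbcRDD;
import scala.reflect.ClassManifest;
import scala.reflect.ClassManifest$;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;

public class JavaJdbcRddSketch {

    // Function objects handed to a Scala RDD must be Serializable,
    // since they are shipped to the workers.
    static class ConnFactory extends AbstractFunction0<Connection>
            implements Serializable {
        @Override public Connection apply() {
            try {
                return DriverManager.getConnection("jdbc:derby:memory:testdb"); // hypothetical URL
            } catch (Exception e) { throw new RuntimeException(e); }
        }
    }

    static class MapRow extends AbstractFunction1<ResultSet, String>
            implements Serializable {
        @Override public String apply(ResultSet rs) {
            try { return rs.getString(1); }
            catch (Exception e) { throw new RuntimeException(e); }
        }
    }

    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext("local", "JdbcRddExample");

        // The implicit ClassManifest[T] parameter surfaces as an explicit
        // trailing argument when calling the constructor from Java.
        ClassManifest<String> manifest = ClassManifest$.MODULE$.fromClass(String.class);

        JdbcRDD<String> rdd = new JdbcRDD<String>(
            jsc.sc(),
            new ConnFactory(),
            "SELECT DATA FROM FOO WHERE ? <= ID AND ID <= ?", // hypothetical table
            1L, 100L,   // lower/upper bounds bound into the two ?s
            3,          // number of partitions
            new MapRow(),
            manifest);

        // Wrap the Scala RDD in a JavaRDD, passing the same ClassManifest.
        JavaRDD<String> javaRdd = new JavaRDD<String>(rdd, manifest);

        System.out.println(javaRdd.count());
        jsc.stop();
    }
}
```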

Ideally, we'd have a Java-friendly API for this, but in the meantime it's
still possible to use JdbcRDD from Java with a few extra steps.
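On the Seq&lt;String&gt; question from the original mail: one way to build a Scala Seq from a Java array is via scala.collection.JavaConversions, part of the Scala 2.9 standard library that Spark 0.8 ships with (the jar path below is hypothetical):

```java
import java.util.Arrays;
import scala.collection.JavaConversions;
import scala.collection.Seq;

public class SeqFromArray {
    public static void main(String[] args) {
        String[] jars = { "/path/to/myapp.jar" }; // hypothetical jar path

        // Wrap the Java List in a Scala Buffer, then view it as a Seq.
        Seq<String> jarSeq =
            JavaConversions.asScalaBuffer(Arrays.asList(jars)).toSeq();

        System.out.println(jarSeq.length());
    }
}
```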


On Wed, Nov 6, 2013 at 11:29 AM, <[email protected]> wrote:

> Dell - Internal Use - Confidential
>
> Cool.
>
>
>
> Since I am working on Java-based code, to use JdbcRDD I need to first
> create a SparkContext sc and then initialize JavaSparkContext(sc).
>
>
>
> Any code that would allow me to create a SparkContext from a JavaSparkContext?
>
>
>
> Any sample Java code that I can use to create a Scala Seq<String> from a
> String[]? I need to create the SparkContext passing my app jars as
> Seq<String>.
>
>
>
> Thanks,
>
> Hussam
>
>
>
> From: Reynold Xin [mailto:[email protected]]
> Sent: Wednesday, November 06, 2013 12:13 AM
> To: [email protected]
> Subject: Re: JdbcRDD usage
>
>
>
> The RDD actually takes care of closing the JDBC connection at the end of
> the iterator. See the code here:
> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala#L107
>
>
>
> The explicit close you saw in the JdbcRDDSuite is to close the test program's
> own connection for the insert statement (not for the JdbcRDD).
>
>
>
> On Tue, Nov 5, 2013 at 3:13 PM, <[email protected]> wrote:
>
> Hi,
>
>
>
> I need to access JDBC from my Java Spark code, and am thinking of using
> JdbcRDD as noted in
> http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/JdbcRDD.html
>
>
>
> I have these questions:
>
> When does the RDD decide to close the connection?
>
>
>
> … getConnection
>
> a function that returns an open Connection. The RDD takes care of closing
> the connection.
>
>
>
> Is there any setting I can use to tell Spark to keep JdbcRDD connections
> open for the next query, instead of opening a new one for the same JDBC
> source?
>
>
>
> Also, on checking
>
>
> https://github.com/apache/incubator-spark/blob/branch-0.8/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala
>
>
>
> I am seeing it invokes an explicit close for the connection in the after { }
> block. If the RDD takes care of closing the connection, then why do we have
> to explicitly invoke DriverManager.getConnection("jdbc:derby:;shutdown=true")?
>
>
>
> Thanks,
>
> Hussam
