Very helpful. Thanks,
Hussam
From: Josh Rosen [mailto:[email protected]]
Sent: Wednesday, November 06, 2013 11:44 AM
To: [email protected]
Subject: Re: JdbcRDD usage

JavaSparkContext is just a thin wrapper over SparkContext that exposes Java-friendly methods. You can access the underlying SparkContext instance by calling .sc() on your JavaSparkContext.

You may have to do a bit of extra work to instantiate JdbcRDD from Java, such as explicitly passing a ClassManifest for your mapRow function. The Java API Internals guide describes some of the steps involved in this: https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals

After constructing the JdbcRDD[T], you can wrap it into a JavaRDD[T] by calling new JavaRDD(myJdbcRDD, itsClassManifest). Ideally, we'd have a Java-friendly API for this, but in the meantime it's still possible to use it from Java with a few of these extra steps (a sketch putting the steps together follows at the end of this thread).

On Wed, Nov 6, 2013 at 11:29 AM, <[email protected]> wrote:

Cool. Since I am working on a Java code base, to use JdbcRDD I need to first create a SparkContext sc and then initialize a JavaSparkContext(sc). Is there any code that would let me get a SparkContext from a JavaSparkContext? And is there any sample Java code for creating a Scala Seq<String> from a String[], since I need to create the SparkContext passing my app jars as a Seq<String>?

Thanks,
Hussam

From: Reynold Xin [mailto:[email protected]]
Sent: Wednesday, November 06, 2013 12:13 AM
To: [email protected]
Subject: Re: JdbcRDD usage

The RDD actually takes care of closing the JDBC connection at the end of the iterator. See the code here: https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala#L107

The explicit close you saw in JdbcRDDSuite is there to close the test program's own connection for the insert statements (not the JdbcRDD's connection).

On Tue, Nov 5, 2013 at 3:13 PM, <[email protected]> wrote:

Hi,

I need to access JDBC from my Java Spark code, and I am thinking of using JdbcRDD as described in http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/JdbcRDD.html

I have these questions:

- When does the RDD decide to close the connection? The docs say: "... getConnection: a function that returns an open Connection. The RDD takes care of closing the connection."
- Is there any setting that tells Spark to keep JdbcRDD connections open for the next query, instead of opening a new one for the same JDBC source?

Also, looking at https://github.com/apache/incubator-spark/blob/branch-0.8/core/src/test/scala/org/apache/spark/rdd/JdbcRDDSuite.scala I see it invokes an explicit close for the connection in the after { } block. If the RDD takes care of closing the connection, why do we have to explicitly invoke DriverManager.getConnection("jdbc:derby:;shutdown=true")?

Thanks,
Hussam
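Putting Josh's steps together, here is a rough, untested Java sketch against Spark 0.8 / Scala 2.9. The Derby URL, table, query bounds, jar path, and the helper class names (ConnectionFactory, RowMapper, JdbcRddFromJava) are illustrative placeholders, not from the thread; the pieces taken directly from the discussion are JavaSparkContext.sc(), the explicit ClassManifest for mapRow, and new JavaRDD(myJdbcRDD, itsClassManifest).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.rdd.JdbcRDD;

    import scala.reflect.ClassManifest;
    import scala.reflect.ClassManifest$;
    import scala.runtime.AbstractFunction0;
    import scala.runtime.AbstractFunction1;

    public class JdbcRddFromJava {

      // JdbcRDD ships getConnection to the workers, so the function
      // object must be serializable. Each task opens its own connection,
      // which the RDD closes when its iterator is exhausted.
      static class ConnectionFactory extends AbstractFunction0<Connection>
          implements java.io.Serializable {
        @Override public Connection apply() {
          try {
            return DriverManager.getConnection("jdbc:derby:target/testdb");
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      }

      // mapRow: ResultSet => T, also shipped to the workers.
      static class RowMapper extends AbstractFunction1<ResultSet, String>
          implements java.io.Serializable {
        @Override public String apply(ResultSet rs) {
          try {
            return rs.getString(1);
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      }

      public static void main(String[] args) {
        // JavaSparkContext has a constructor taking the app jars as a
        // plain String[], so no Scala Seq is needed on this path.
        JavaSparkContext jsc = new JavaSparkContext(
            "local[2]", "jdbc-rdd-demo", "/path/to/spark",
            new String[] { "target/my-app.jar" });

        // The underlying SparkContext that JdbcRDD's constructor expects.
        SparkContext sc = jsc.sc();

        // The ClassManifest that Scala would normally supply implicitly.
        ClassManifest<String> cm = ClassManifest$.MODULE$.fromClass(String.class);

        // The SQL must contain two '?' placeholders, which JdbcRDD binds
        // to each partition's lower and upper bound.
        JdbcRDD<String> rdd = new JdbcRDD<String>(
            sc,
            new ConnectionFactory(),
            "SELECT DATA FROM FOO WHERE ID >= ? AND ID <= ?",
            1L, 100L,  // lowerBound, upperBound
            3,         // numPartitions
            new RowMapper(),
            cm);

        // Wrap the Scala RDD to get the Java-friendly API back.
        JavaRDD<String> javaRdd = new JavaRDD<String>(rdd, cm);
        System.out.println(javaRdd.count());
      }
    }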

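As for the Seq<String> question: if you do hit a constructor that insists on a Scala Seq[String] rather than the String[] overload used above, the standard-library bridge in Scala 2.9 is scala.collection.JavaConversions. A minimal sketch; the class and method names of the helper itself are illustrative:

    import java.util.Arrays;

    import scala.collection.JavaConversions;
    import scala.collection.Seq;

    public class SeqFromArray {
      // Wraps the array as a java.util.List, views it as a Scala Buffer,
      // and exposes that Buffer as a Seq.
      public static Seq<String> toSeq(String[] jars) {
        return JavaConversions.asScalaBuffer(Arrays.asList(jars)).toSeq();
      }
    }

In practice, though, the JavaSparkContext constructors are the simpler path from Java, as Josh's reply suggests.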