The overridable methods of RDD are marked as @DeveloperApi, which means they are internal APIs intended for people who might want to extend Spark, but they are not guaranteed to remain stable across Spark versions (unlike Spark's public APIs).
BTW, if you want a way to do this that does not involve JdbcRDD or internal APIs, you can use SparkContext.parallelize followed by mapPartitions to read a subset of the data in each of your tasks. That can be done purely in Java. You'd probably parallelize a collection that contains ranges of the table you want to scan, then open a connection to the DB in each task (in mapPartitions) and read the records from that range. (A sketch of this approach follows the quoted thread below.)

Matei

> On Oct 28, 2014, at 12:15 PM, Ron Ayoub <ronalday...@live.com> wrote:
>
> I interpret this to mean you have to learn Scala in order to work with Spark in Scala (goes without saying) and also to work with Spark in Java (since you have to jump through some hoops for basic functionality).
>
> The best path here is to take this as a learning opportunity and sit down and learn Scala.
>
> Regarding RDD being an internal API, it has two methods that clearly allow you to override them, which JdbcRDD does, and it looks close to trivial -- if only I knew Scala. Once I learn Scala, the first thing I plan on doing is writing my own OracleRDD with my own flavor of JDBC code. Why would this not be advisable?
>
> > Subject: Re: Is Spark in Java a bad idea?
> > From: matei.zaha...@gmail.com
> > Date: Tue, 28 Oct 2014 11:56:39 -0700
> > CC: u...@spark.incubator.apache.org
> > To: isasmani....@gmail.com
> >
> > A pretty large fraction of users use Java, but a few features are still not available in it. JdbcRDD is one of them -- this functionality will likely be superseded by Spark SQL when we add JDBC as a data source. In the meantime, to use it, I'd recommend writing a class in Scala that has Java-friendly methods and getting an RDD through that. Basically the two parameters that weren't friendly there were the ClassTag and the getConnection and mapRow functions.
> >
> > Subclassing RDD in Java is also not really supported, because that's an internal API. We don't expect users to be defining their own RDDs.
> >
> > Matei
> >
> > > On Oct 28, 2014, at 11:47 AM, critikaled <isasmani....@gmail.com> wrote:
> > >
> > > Hi Ron,
> > > Whatever API you have in Scala, you can use it from Java. Scala is interoperable with Java and vice versa. Scala, being both object-oriented and functional, will make your job easier on the JVM, and it is more concise than Java. Take it as an opportunity and start learning Scala ;).
> > >
> > > --
> > > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Is-Spark-in-Java-a-bad-idea-tp17534p17538.html
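
[Editor's note] To make the parallelize + mapPartitions suggestion above concrete, here is a minimal Java sketch. It is an illustration, not code from the thread: the connection URL, table, and column names (dbhost, my_table, id, name) are placeholders, and it assumes a Spark version (2.x or later) whose Java mapPartitions function returns an Iterator (on 1.x it returned an Iterable). It also assumes the JDBC driver jar is on the executor classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class JdbcRangeScan {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("jdbc-range-scan");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // One element per partition: the [lo, hi) id range that task should scan.
            List<long[]> ranges = Arrays.asList(
                    new long[]{0, 100_000},
                    new long[]{100_000, 200_000},
                    new long[]{200_000, 300_000});

            JavaRDD<String> rows = sc.parallelize(ranges, ranges.size())
                .mapPartitions(it -> {
                    List<String> out = new ArrayList<>();
                    // Open one connection per task, not per record.
                    // URL, credentials, table and columns below are placeholders.
                    try (Connection conn = DriverManager.getConnection(
                            "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password")) {
                        while (it.hasNext()) {
                            long[] range = it.next();
                            try (PreparedStatement ps = conn.prepareStatement(
                                    "SELECT id, name FROM my_table WHERE id >= ? AND id < ?")) {
                                ps.setLong(1, range[0]);
                                ps.setLong(2, range[1]);
                                try (ResultSet rs = ps.executeQuery()) {
                                    while (rs.next()) {
                                        out.add(rs.getLong("id") + "," + rs.getString("name"));
                                    }
                                }
                            }
                        }
                    }
                    return out.iterator();
                });

            System.out.println(rows.count());
            sc.stop();
        }
    }

This buffers each range in memory before returning it; for very large ranges you would instead stream the ResultSet through a custom Iterator, which is essentially what JdbcRDD's compute method does in Scala.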