Don't be too concerned about the Scala hoop.  Before making the commitment to Scala, I had coded up a modest analytic prototype in Hadoop MapReduce.  Once I made the commitment, it took 10 days to (1) learn enough Scala and (2) rewrite the prototype in Spark in Scala.  In doing so, the execution time for this prototype was cut to roughly 1/8 of what it had been, and the code for identical functionality shrank to about 1/10 the number of lines.
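For a sense of the size difference (this is the canonical word-count example, not my actual prototype): a job that needs a Mapper, a Reducer, and a driver class in Hadoop MapReduce fits in a handful of lines of Spark Scala.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on older Spark)

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    sc.textFile(args(0))            // one line of text per record
      .flatMap(_.split("\\s+"))     // split lines into words
      .map(word => (word, 1))       // pair each word with a count of 1
      .reduceByKey(_ + _)           // sum the counts per word
      .saveAsTextFile(args(1))      // write out the (word, count) pairs
    sc.stop()
  }
}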

A few things helped me...

- Martin Odersky's "Programming in Scala".  No need to read the whole thing; use it as a reference alongside the course below.
- His "Functional Programming Principles in Scala" course on Coursera.  You don't need to enroll in a currently running session: "enroll" in a past one, watch the videos, and do a few of the exercises.  https://class.coursera.org/progfun-003
- The cheat sheets on the Scala website.  http://docs.scala-lang.org/cheatsheets/
- Example code in Spark.  There's plenty of it to go around.

Once you have experienced the glories of Scala, there's no turning back.  It is a computer science cornucopia!

Kevin


On 10/28/2014 01:15 PM, Ron Ayoub wrote:
I interpret this to mean that you have to learn Scala in order to work with Spark in Scala (which goes without saying), and also to work with Spark in Java (since you have to jump through some hoops even for basic functionality).

The best path here is to take this as a learning opportunity and sit down and learn Scala. 

Regarding RDD being an internal API: it has two methods that are clearly meant to be overridden, which JdbcRDD does, and doing so looks close to trivial - if only I knew Scala. Once I learn Scala, the first thing I plan on doing is writing my own OracleRDD with my own flavor of JDBC code. Why would this not be advisable?
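(For what it's worth, here is roughly the shape such a subclass takes. The two real RDD hooks are getPartitions and compute, the same ones JdbcRDD overrides; the OracleRDD and RangePartition names, the key-range splitting scheme, and the eager row materialization are illustrative assumptions, not a tested implementation.)

import java.sql.{DriverManager, ResultSet}
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Illustrative partition: one closed key range per JDBC query.
class RangePartition(val index: Int, val lower: Long, val upper: Long)
  extends Partition

class OracleRDD[T: ClassTag](
    sc: SparkContext,
    url: String,
    sql: String,            // expected to contain two '?' placeholders for the range
    lowerBound: Long,
    upperBound: Long,
    numPartitions: Int,
    mapRow: ResultSet => T)
  extends RDD[T](sc, Nil) {

  // Hook 1: describe the partitions (here, numPartitions key ranges).
  override def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numPartitions) { i =>
      val span = (upperBound - lowerBound + 1) / numPartitions
      val lo = lowerBound + i * span
      val hi = if (i == numPartitions - 1) upperBound else lo + span - 1
      new RangePartition(i, lo, hi)
    }

  // Hook 2: produce the rows for one partition.
  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val p = split.asInstanceOf[RangePartition]
    val conn = DriverManager.getConnection(url)
    val stmt = conn.prepareStatement(sql)
    stmt.setLong(1, p.lower)
    stmt.setLong(2, p.upper)
    val rs = stmt.executeQuery()
    // Simplification: materialize the partition eagerly; the real JdbcRDD
    // streams rows lazily and closes resources via a completion callback.
    val rows = ArrayBuffer.empty[T]
    while (rs.next()) rows += mapRow(rs)
    rs.close(); stmt.close(); conn.close()
    rows.iterator
  }
}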
 

> Subject: Re: Is Spark in Java a bad idea?
> From: matei.zaha...@gmail.com
> Date: Tue, 28 Oct 2014 11:56:39 -0700
> CC: u...@spark.incubator.apache.org
> To: isasmani....@gmail.com
>
> A pretty large fraction of users use Java, but a few features are still not available in it. JdbcRDD is one of them -- this functionality will likely be superseded by Spark SQL when we add JDBC as a data source. In the meantime, to use it, I'd recommend writing a class in Scala that has Java-friendly methods and getting an RDD from it. Basically, the parameters that weren't Java-friendly there were the ClassTag and the getConnection and mapRow functions.
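
(A minimal sketch of such a wrapper, for concreteness. The object name, the create signature, and the choice of Array[Object] rows are my assumptions, not Spark API; JdbcRDD.resultSetToObjectArray is the real helper on the JdbcRDD companion object.)

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

// Hypothetical helper: hides the ClassTag and the two Scala function
// parameters behind a single Java-callable factory method.
object JavaFriendlyJdbcRDD {
  def create(
      sc: SparkContext,
      url: String,
      sql: String,
      lowerBound: Long,
      upperBound: Long,
      numPartitions: Int): JdbcRDD[Array[Object]] =
    new JdbcRDD[Array[Object]](
      sc,
      () => DriverManager.getConnection(url),                // getConnection
      sql, lowerBound, upperBound, numPartitions,
      (rs: ResultSet) => JdbcRDD.resultSetToObjectArray(rs)) // mapRow
}

// From Java:
//   JdbcRDD<Object[]> rdd =
//       JavaFriendlyJdbcRDD.create(sc, url, sql, 1L, 1000L, 10);
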
>
> Subclassing RDD in Java is also not really supported, because that's an internal API. We don't expect users to be defining their own RDDs.
>
> Matei
>
> > On Oct 28, 2014, at 11:47 AM, critikaled <isasmani....@gmail.com> wrote:
> >
> > Hi Ron,
> > Whatever API you have in Scala, you can use it from Java. Scala is
> > interoperable with Java and vice versa. Scala, being both object-oriented
> > and functional, will make your job easier on the JVM, and it is more
> > concise than Java. Take it as an opportunity and start learning Scala ;).
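
(The interop claim in practice: the Scala compiler emits static forwarders for an object's methods, so Java code can call them directly. The names below are purely illustrative.)

object Greeter {
  def greet(name: String): String = "Hello, " + name
}

// From Java:  String s = Greeter.greet("Spark");
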
> >
