Older versions of Spark indeed had lower performance in Python and R because data
had to be converted between JVM datatypes and Python/R datatypes. This changed in
Spark 2.3, I think, with the integration of Apache Arrow. However, what you do
after the conversion in those languages can still be slower.
How about Python?
Java vs. Scala vs. Python vs. R: which is better?
On Sat, Oct 27, 2018 at 3:34 AM karan alang wrote:
> Hello
> - is there a "performance" difference when using Java or Scala for Apache
> Spark ?
>
> I understand, there are other obvious differences (less code with Scala,
> easier t
I genuinely do not think that using Scala for Spark requires us to be experts in
Scala. There is in fact a tutorial called "Just enough Scala for Spark"
which, even with my IQ, does not take more than 40 minutes to go through. Also,
the syntax of Scala is almost always similar to that of Python.
Data processing i
When most people compare two programming languages, 99% of
the time it all seems to boil down to syntactic sugar.
As for performance, I doubt Scala is ever faster than Java, given that Scala tends
to use the heap more heavily than Java. I had also written some pointless micro-benchmarking
code like (Random String
did not see anything, but I am curious whether you find something.
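Since the benchmark code mentioned above is cut off, here is a hypothetical sketch (not the poster's actual code) of the kind of naive random-string micro-benchmark being described, written in Java. The class name RandomStringBench, the alphabet, and the 100_000-iteration, 32-character workload are all assumptions for illustration. A loop timed with System.nanoTime like this is easily distorted by JIT warm-up and dead-code elimination, which is one reason such benchmarks tend to show little; JMH (the OpenJDK Java Microbenchmark Harness) is the usual tool for trustworthy JVM numbers.

```java
import java.util.Random;

/**
 * Hypothetical naive micro-benchmark: times building random strings.
 * A sketch only, not the original poster's code. For reliable results,
 * use a proper harness such as JMH.
 */
public class RandomStringBench {

    // Assumed alphabet for the random strings (ASCII letters and digits).
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Accumulates results so the JIT cannot treat the work as unused.
    public static long checksum = 0;

    /** Builds a random string of the given length from ALPHABET. */
    public static String randomString(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Runs the workload `iterations` times and returns elapsed nanoseconds. */
    public static long timeRun(Random rnd, int iterations, int length) {
        long start = System.nanoTime();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += randomString(rnd, length).length();
        }
        checksum += total;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Warm-up rounds so the JIT compiles the hot path before measuring.
        for (int i = 0; i < 5; i++) {
            timeRun(rnd, 100_000, 32);
        }
        // Measured rounds; expect noticeable run-to-run variance.
        for (int round = 1; round <= 3; round++) {
            long nanos = timeRun(rnd, 100_000, 32);
            System.out.printf("round %d: %.2f ms%n", round, nanos / 1e6);
        }
        System.out.println("checksum: " + checksum);
    }
}
```

To make this an actual Java vs. Scala comparison, one would write the equivalent loop in Scala and run both under the same harness and JVM settings.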
I think one of the big benefits of using Java for data engineering in the
context of Spark is that you do not have to train much of your team in
Scala. Now, if you want to do data science, Java is probably not the best tool
yet...
On Oct 27, 2018 3:34 AM, "karan alang" wrote:
Hello
- is there a "performance" difference when using Java or Scala for Apache
Spark?
I understand there are other obvious differences (less code with Scala,
easier to focus on logic, etc.),
but with respect to performance, I think there would not be much of a difference since
both of them are JVM based,
pl