Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Sofia’s World
Hey, my 2 cents on CI/CD for PySpark. You can leverage pytest + Holden Karau's Spark testing libs for CI, thus giving you `almost` the same functionality as Scala - I say almost as in Scala you have nice and descriptive funcspecs - For me the choice is based on expertise. Having worked with teams which ar
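One way to bring pytest into a PySpark CI flow is to keep row-level logic in plain Python functions that can be tested without a Spark session. The sketch below is only an illustration under that assumption: `clean_record` and its field names are hypothetical and are not taken from the thread or from any Spark testing library.

```python
from typing import Optional


def clean_record(record: dict) -> Optional[dict]:
    """Normalise one raw row; return None for rows that should be dropped.

    Hypothetical example logic: the field names are assumptions.
    """
    name = (record.get("name") or "").strip()
    if not name:
        return None  # drop rows without a usable name
    amount = float(record.get("amount", 0))
    return {"name": name.lower(), "amount": amount}


# pytest collects functions whose names start with "test_".
def test_clean_record_normalises_fields():
    assert clean_record({"name": "  Alice ", "amount": "10.5"}) == {
        "name": "alice",
        "amount": 10.5,
    }


def test_clean_record_drops_blank_names():
    assert clean_record({"name": "   ", "amount": "1"}) is None
```

In a real job the same function could be applied through the DataFrame or RDD API; that wiring is left out here because it depends on the project.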

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Mich Talebzadeh
Some functionalities are not available in Python. I have seen this a few times in the Spark docs. There is an interesting write-up on this, although it does not touch on CI/CD aspects. Developing Apache Spark Applications: Scala vs. Python <https://www.pluralsight.com/blog/software-developm

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread William R
It's really a very big discussion around PySpark vs Scala. I have a little experience of how we can automate CI/CD when it's a JVM-based language. I would like to take this as an opportunity to understand the end-to-end CI/CD flow for PySpark-based ETL pipelines. Could someone please list

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Wim Van Leuven
I think Sean is right, but in your argument you mention that 'functionality is sacrificed in favour of the availability of resources'. That's where I disagree with you but agree with Sean. That is mostly not true. In your previous posts you also mentioned this. The only reason we sometimes h

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Thanks for the feedback Sean. Kind regards, Mich LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * *Disclaimer:* Use it at your own risk. Any and all responsibili

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Sean Owen
I don't find this trolling; I agree with the observation that 'the skills you have' are a valid and important determiner of what tools you pick. I disagree that you just have to pick the optimal tool for everything. Sounds good until that comes in contact with the real world. For Spark, Python vs S

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Gourav Sengupta
Hi Mich, this is turning into a troll now, can you please stop this? No one uses Scala where Python should be used, and no one uses Python where Scala should be used - it all depends on requirements. Everyone understands polyglot programming and how to use relevant technologies best to their adva

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Today I had a discussion with a lead developer on a client site regarding Scala or PySpark with Spark. They were not doing data science and reluctantly agreed that PySpark was used for ETL. In mitigation he mentioned that in his team he is the only one that is an expert on Scala (his words) and

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
Holy war is a bit dramatic don't you think? 🙂 The difference between Scala and Python will always be very relevant when choosing between Spark and Pyspark. I wouldn't call it irrelevant to the original question. br, molotch On Sat, 17 Oct 2020 at 16:57, "Yuri Oleynikov (‫יורי אולייניקוב‬‎)" < yu

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
I'm sorry you were offended. I'm not an expert in Python and I wasn't trying to attack you personally. It's just an opinion about what makes a language better or worse, it's not the single source of truth. You don't have to take offense. In the end it's about context and what you're trying to achiev

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Holden Karau
Scala and Python have their advantages and disadvantages with Spark. In my experience, when performance is super important you’ll end up needing to do some of your work in the JVM, but in many situations what matters most is what your team and company are familiar with and the ecosystem of tooling

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
It seems that this thread has turned into a holy war that has nothing to do with the original question. If so, it’s super disappointing. Sent from iPhone > On 17 Oct 2020, at 15:53, Molotch wrote: > > I would say the pros and cons of Python vs Scala is both down to Spark, the > languages in thems

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Sasha Kacanski
And you are an expert on Python! Idiomatic... Please do everyone a favor and stop commenting on things you have no idea about... I built ETL systems in Python that wiped Java commercial stacks left and right. PySpark was, is, and will be a second-class citizen in the Spark world. That has nothing to do with

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Molotch
I would say the pros and cons of Python vs Scala is both down to Spark, the languages in themselves and what kind of data engineer you will get when you try to hire for the different solutions. With Pyspark you get less functionality and increased complexity with the py4j java interop compared to

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
Hi, I spent a few days converting one of my Spark/Scala scripts to Python. It was interesting but at times felt like trench warfare. There is a lot of handy stuff in Scala, like case classes for defining column headers etc., that doesn't seem to be available in Python (possibly my lack of in-depth Python

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Hi, With regard to your statement below, "technology choices are agnostic to use cases according to you": if I may say, I do not think that was the message implied. What was said was that in addition to "best technology fit" there are other factors "equally important" that need to be consider

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Gourav Sengupta
So Mich and rest, technology choices are agnostic to use cases according to you? This is interesting, really interesting. Perhaps I stand corrected. Regards, Gourav On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh wrote: > if we take Spark and its massive parallel processing and in-memory > cac

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
If we take Spark and its massive parallel processing and in-memory cache away, then one can argue anything can do the "ETL" job. Just write some Java/Scala/SQL/Perl/Python to read data from one DB and write it to another, often using JDBC connections. However, we all concur that may not be good enou

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread ayan guha
But when you have a fairly large volume of data, that is where Spark comes to the party. And I assume the requirement of using Spark is already established in the original question and the discussion is about using Python vs Scala/Java. On Sun, 11 Oct 2020 at 10:51 pm, Sasha Kacanski wrote: > If org has folks

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Thanks Ayan. I am not qualified to answer your first point. However, my experience with Spark with Scala or Spark with Python agrees with your assertion that use cases do not come into it. Most DEV/OPS work dealing with ETL is provided by service companies that have a workforce very familiar with J

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread ayan guha
I have one observation: is "Python UDF is slow due to deserialization penalty" still relevant, even after Arrow is used for in-memory data management and the Spark dev community has invested so heavily in making pandas a first-class citizen, including UDFs? As I work with multiple clients, my experience is org cultur

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
Not quite sure how meaningful this discussion is, but in case someone is really faced with this query the question still is 'what is the use case'? I am just a bit confused by the one-size-fits-all deterministic approach here; I thought those days were over almost 10 years ago. Regards Gourav

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Stephen Boesch
I agree with Wim's assessment of data engineering / ETL vs Data Science. I wrote pipelines/frameworks for large companies and scala was a much better choice. But for ad-hoc work interfacing directly with data science experiments pyspark presents less friction. On Sat, 10 Oct 2020 at 13:03, Mich Ta

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
Many thanks everyone for their valuable contribution. We all started with Spark a few years ago when Scala was the talk of the town. I agree with the note that as long as Spark stayed niche and elite, someone with Scala knowledge was attracting premiums. In fairness in 2014-2015, there was no

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jacek Pliszka
I would not leave it to data scientists unless they will maintain it. The key decision in cases I've seen was usually people cost/availability with ETL operations cost taken into account. Often the situation is that ETL cloud cost is small and you will not save much. Then it is just skills cost/a

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jörn Franke
It really depends on what language your data scientists speak. I don’t think it makes sense for ad hoc data science things to impose a language on them, but let them choose. For more complex AI engineering things you can, though, apply different standards and criteria. And then it really depends on architec

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Wim Van Leuven
where people mostly do Python. So, if you need those two worlds to collaborate and even hand over code, you don't want the ideological battle of Scala vs Python. We chose Python for the sake of everybody speaking the same language. But it is true, if you do Spark DataFrames, because then PySpark is a

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
What is the use case? Unless you have unlimited funding and time to waste you would usually start with that. Regards, Gourav On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer wrote: > Spark in Scala (or java) Is much more performant if you are using RDD's, > those operations basically force you t

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
Spark in Scala (or Java) is much more performant if you are using RDDs; those operations basically force you to pass lambdas, hit serialization between Java and Python types and, yes, hit the Global Interpreter Lock. But none of those things apply to DataFrames, which will generate Java code regard

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
Thanks. So, ignoring Python lambdas, is individuals' familiarity with the language the most important factor? Also, I have noticed that the Spark documentation has switched its preference from Scala to Python as the first example. However, some code, for example JDBC calls, is the same

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
As long as you don't use python lambdas in your Spark job there should be almost no difference between the Scala and Python dataframe code. Once you introduce python lambdas you will hit some significant serialization penalties as well as have to run actual work code in python. As long as no lambda
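The serialization penalty mentioned for Python lambdas can be pictured with a toy model. The sketch below is not Spark code and not how Spark is implemented internally: it uses only the standard `pickle` module as a stand-in for moving rows between the engine and a Python worker, and the helper names (`apply_in_process`, `apply_across_boundary`) are hypothetical.

```python
import pickle


def apply_in_process(rows, fn):
    """Apply fn directly, as if the work never left the engine."""
    return [fn(row) for row in rows]


def apply_across_boundary(rows, fn):
    """Toy stand-in for shipping each row to a Python worker and back.

    Every row is pickled, unpickled, transformed, then pickled and
    unpickled again. Real PySpark batches this work; the point is only
    that the extra serialization steps exist.
    """
    results = []
    for row in rows:
        shipped = pickle.loads(pickle.dumps(row))  # engine -> worker
        transformed = fn(shipped)                  # user lambda runs in Python
        results.append(pickle.loads(pickle.dumps(transformed)))  # worker -> engine
    return results


rows = [(i, f"user_{i}") for i in range(5)]
double_id = lambda row: (row[0] * 2, row[1])

# Same answer either way; the second path simply does more work per row.
assert apply_in_process(rows, double_id) == apply_across_boundary(rows, double_id)
```

Both paths return identical results, which is why the cost is easy to overlook: it shows up only as extra time per row, not as a difference in output.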

Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
I have come across occasions when the teams use Python with Spark for ETL, for example processing data from S3 buckets into Snowflake with Spark. The only reason I think they are choosing Python as opposed to Scala is because they are more familiar with Python. Since Spark is written in Scala, its

Re: Scala Vs Python

2016-09-06 Thread 刘虓

Re: Scala Vs Python

2016-09-06 Thread Leonard Cohen
regards, Leonard

Re: Scala Vs Python

2016-09-05 Thread Luciano Resende
On Thu, Sep 1, 2016 at 3:15 PM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > > This is why we have d

Re: Scala Vs Python

2016-09-05 Thread Gourav Sengupta
different cluster configurations and ran it several times to get some idea on the noise. > > Of course, the more complicated the UDF, the less the overhead affects you. > > Hope this helps. > > Assaf

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
Assaf From: ayan guha [mailto:guha.a...@gmail.com] Sent: Sunday, September 04, 2016 11:00 AM To: Mendelson, Assaf Cc: user Subject: Re: Scala Vs Python Hi This one is quite interesting. Is it possible to share few toy examples? On Sun, Sep 4, 2016 at 5:23 PM, AssafMendelson mailt

Re: Scala Vs Python

2016-09-04 Thread Simon Edelhaus

Re: Scala Vs Python

2016-09-04 Thread ayan guha
scala one and then wrap it to be accessible from python.

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
ask my team (which does the engineering) and we write them a scala one and then wrap it to be accessible from python. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Friday, September 02, 2016 12:21 AM To: kant kodali Cc: Mendelson, Assaf; user Subject: Re: Scala Vs Python Thanks All for

Re: Scala Vs Python

2016-09-02 Thread darren
te: 9/2/16 4:03 AM (GMT-05:00) To: Mich Talebzadeh Cc: Jakob Odersky , ayan guha , Tal Grynbaum , darren , kant kodali , AssafMendelson , user Subject: Re: Scala Vs Python Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the tim

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
No offence taken. Glad that it was rectified. Cheers Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
I apologize for my harsh tone. You are right, it was unnecessary and discourteous. On Fri, Sep 2, 2016 at 11:01 AM Mich Talebzadeh wrote: > Hi, > > You made such statement: > > "That's complete nonsense." > > That is a strong language and void of any courtesy. Only dogmatic > individuals make su

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Hi, You made this statement: "That's complete nonsense." That is strong language and devoid of any courtesy. Only dogmatic individuals make such statements, engaging the keyboard before thinking about it. You are perfectly within your rights to agree to differ. However, that does not give you the ri

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
You made a specific claim -- that Spark will move away from Python -- which I responded to with clear references and data. How on earth is that a "religious argument"? I'm not saying that Python is better than Scala or anything like that. I'm just addressing your specific claim about its future in

Re: Scala Vs Python

2016-09-02 Thread andy petrella
looking at the examples, indeed they make nonsense :D On Fri, 2 Sep 2016 16:48 Mich Talebzadeh, wrote: > Right so. We are back into religious arguments. Best of luck > > > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Right so. We are back into religious arguments. Best of luck Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If > you look at 2014 Databricks code examples, they were mostly in Python. Now > they are mostly in Scala for a reason. > That's complete nonsense. First off, you c

Re: Scala Vs Python

2016-09-02 Thread Sivakumaran S
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This of course depends on the complexity of the DAG. I guess it is a matter of language preference. Regards, Sivakumaran S > On 02-Se

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
From an outsider's point of view nobody likes change :) However, it appears to me that Scala is a rising star and if one learns it, it is another iron in the fire so to speak. I believe as we progress in time Spark is going to move away from Python. If you look at 2014 Databricks code examples, the

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
Forgot to answer your question about feature parity of Python w.r.t. Spark's different components. I mostly work with Scala so I can't say for sure, but I think that all pre-2.0 features (that's basically everything except Structured Streaming) are on par. Structured Streaming is a pretty new feature

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
As you point out, often the reason that Python support lags behind is that functionality is implemented in Scala, so the API in that language is "free" whereas Python support needs to be added explicitly. Nevertheless, Python bindings are an important part of Spark and are used by many people (this

RE: Scala Vs Python

2016-09-02 Thread Santoshakhilesh
[mailto:guha.a...@gmail.com] Sent: 02 September 2016 15:25 To: Tal Grynbaum Cc: darren; Mich Talebzadeh; Jakob Odersky; kant kodali; AssafMendelson; user Subject: Re: Scala Vs Python Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair

Re: Scala Vs Python

2016-09-01 Thread ayan guha
Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair trade off between speed of getting stuff to market. And the more this discussion progresses, the less of an issue I see in terms of feature parity. Coming back to performance, Darren r

Re: Scala Vs Python

2016-09-01 Thread Tal Grynbaum
On Fri, Sep 2, 2016 at 1:15 AM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > > This is why we have d

Re: Scala Vs Python

2016-09-01 Thread ayan guha
wrote: > c'mon man this is no Brainer..Dynamic Typed Languages for Large Code Bases or Large Scale Distributed Systems makes absolutely no sense. I can

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
buted Systems makes absolutely no sense. I can write a 10 page essay on why that wouldn't work so great. you might be wondering why would spark have it then? well probably because its ease of use for ML (that would be my best guess).

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
> familiarity with the languages. > > In general, scala would have a much better performance than python and not all interfaces are available in pytho

Re: Scala Vs Python

2016-09-01 Thread darren
--From: Mich Talebzadeh Date: 9/1/16 6:01 PM (GMT-05:00) To: Jakob Odersky Cc: ayan guha , kant kodali , AssafMendelson , user Subject: Re: Scala Vs Python Hi Jacob. My understanding of Dataset is that it is basically an RDD with some optimization gone into it. RDD is meant to

Re: Scala Vs Python

2016-09-01 Thread Peyman Mohajerian
> That said, if you are planning to use dataframes without any UDF then the performance hit is practically nonexistent. > > Even if you need UDF, it is possible to write those in scala and wrap them for p

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
them for python and still get away without the performance hit. > > Python does not have interfaces for UDAFs. > > I believe that if you have large structured data and do not generally

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
hon does not have interfaces for UDAFs. > > I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much.

Re: Scala Vs Python

2016-09-01 Thread ayan guha
> I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much.

Re: Scala Vs Python

2016-09-01 Thread kant kodali
n does not have interfaces for UDAFs. I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much. From: ayan guha [mailto:[hidden email]] Sent: Thursday, September 01, 2016 5:03 AM To: user Subject: Scala

RE: Scala Vs Python

2016-08-31 Thread AssafMendelson
can certainly work in python without losing too much. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Thursday, September 01, 2016 5:03 AM To: user Subject: Scala Vs Python Hi Users Thought to ask (again and again) the question: While I am building any production application, should I use

RE: Scala Vs Python

2016-08-31 Thread Santoshakhilesh
ould prefer to use Scala any day for very simple reason that I would get all the future features and optimizations out of box and I need to type less ☺. Regards, Santosh Akhilesh From: ayan guha [mailto:guha.a...@gmail.com] Sent: 01 September 2016 11:03 To: user Subject: Scala Vs Python Hi U

Scala Vs Python

2016-08-31 Thread ayan guha
Hi Users, Thought to ask (again and again) the question: While I am building any production application, should I use Scala or Python? I have read many if not most articles but all seem pre-Spark 2. Has anything changed with Spark 2? Either pro-scala way or pro-python way? I am thinking performance,

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Jörn Franke
View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-for-Spark-ecosystem-tp26805p26806.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - &

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread kramer2...@126.com

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Zhang, Jingyu
guages for spark ecosystem? Will python cover everything scala can in short time periods? what do you advice?

Re: Scala vs Python for Spark ecosystem

2016-04-19 Thread sujeet jog

Scala vs Python for Spark ecosystem

2016-04-19 Thread berkerkozan

Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
o there's one data point, if only for the obvious data point comparing computations in Scala to computations in pure Python.

Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
ons in pure Python.

Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
orrect, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy

Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
t roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy

Re: Scala vs Python performance differences

2014-04-15 Thread Nicholas Chammas
but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira

Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
em, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy Freeman, PhD, Neuroscientist, @thefreemanlab

Re: Scala vs Python performance differences

2014-04-14 Thread Bin Wang
At least, Spark Streaming doesn't support Python at this moment, right? On Mon, Apr 14, 2014 at 6:48 PM, Andrew Ash wrote: > Hi Spark users, > > I've always done all my Spark work in Scala, but occasionally people ask > about Python and its performance impact vs the same algorithm > implementat

Scala vs Python performance differences

2014-04-14 Thread Andrew Ash
Hi Spark users, I've always done all my Spark work in Scala, but occasionally people ask about Python and its performance impact vs the same algorithm implementation in Scala. Has anyone done tests to measure the difference? Anecdotally I've heard Python is a 40% slowdown but that's entirely hea
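For anyone who does run such a comparison, a figure like "40% slowdown" is usually a relative difference in wall-clock time. The helper below is a small sketch of that arithmetic; the function name and the timings are made up for illustration and are not measurements from this thread.

```python
def relative_slowdown(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how much slower the candidate is, as a fraction of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (candidate_seconds - baseline_seconds) / baseline_seconds


# Hypothetical timings: 10 s for one implementation, 14 s for the other.
print(f"{relative_slowdown(10.0, 14.0):.0%}")  # prints 40%
```

A negative result would mean the candidate was faster than the baseline.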