Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Sofia’s World
Hey, my 2 cents on CI/CD for PySpark. You can leverage pytest + Holden Karau's Spark testing libs for CI, thus giving you `almost` the same functionality as Scala. I say almost because in Scala you have nice and descriptive FunSpecs. For me the choice is based on expertise, having worked with teams which ar
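One way to get the pytest half of that setup, sketched under assumptions: keep each transformation as a plain Python function so it can be unit-tested without starting a SparkSession. `normalise_trade` and its fields are hypothetical, not from the thread; DataFrame-level tests would additionally need a local SparkSession, which is the part libraries such as Holden Karau's spark-testing-base help with.

```python
# test_transforms.py: run with `pytest test_transforms.py`
from decimal import Decimal, InvalidOperation


def normalise_trade(record: dict) -> dict:
    """Hypothetical ETL step: tidy the ticker symbol and parse the price exactly."""
    return {
        "ticker": record["ticker"].strip().upper(),
        "price": Decimal(record["price"].strip()),
    }


def test_normalise_trade_trims_and_parses():
    out = normalise_trade({"ticker": " ibm ", "price": "123.45"})
    assert out == {"ticker": "IBM", "price": Decimal("123.45")}


def test_normalise_trade_rejects_bad_price():
    try:
        normalise_trade({"ticker": "IBM", "price": "n/a"})
    except InvalidOperation:
        return  # expected: an unparseable price must not slip through
    raise AssertionError("expected InvalidOperation for a non-numeric price")
```

Running `pytest test_transforms.py` collects and runs both `test_*` functions; plain `assert` statements are all pytest needs.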

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Mich Talebzadeh
Hi Wim, I think we are splitting the atom here, but my reference to functionality was based on: 1. Spark is written in Scala, so knowing the Scala programming language helps coders navigate the source code if something does not function as expected. 2. Given the framework using P

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread William R
It's really a very big discussion around PySpark vs Scala. I have a little experience with automating CI/CD for JVM-based languages. I would like to take this as an opportunity to understand the end-to-end CI/CD flow for PySpark-based ETL pipelines. Could someone please list

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Wim Van Leuven
I think Sean is right, but in your argument you mention that 'functionality is sacrificed in favour of the availability of resources'. That's where I disagree with you but agree with Sean. That is mostly not true. You also mentioned this in your previous posts. The only reason we sometimes h

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Thanks for the feedback Sean. Kind regards, Mich

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Sean Owen
I don't find this trolling; I agree with the observation that 'the skills you have' are a valid and important determiner of what tools you pick. I disagree that you just have to pick the optimal tool for everything. Sounds good until that comes in contact with the real world. For Spark, Python vs S

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Gourav Sengupta
Hi Mich, this is turning into a troll now, can you please stop this? No one uses Scala where Python should be used, and no one uses Python where Scala should be used - it all depends on requirements. Everyone understands polyglot programming and how to use relevant technologies best to their adva

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Today I had a discussion with a lead developer on a client site regarding Scala or PySpark with Spark. They were not doing data science and reluctantly agreed that PySpark was used for ETL. In mitigation he mentioned that in his team he is the only one that is an expert on Scala (his words) and

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
Holy war is a bit dramatic, don't you think? 🙂 The difference between Scala and Python will always be very relevant when choosing between Spark and PySpark. I wouldn't call it irrelevant to the original question. br, molotch On Sat, 17 Oct 2020 at 16:57, "Yuri Oleynikov (יורי אולייניקוב)" < yu

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
I'm sorry you were offended. I'm not an expert in Python and I wasn't trying to attack you personally. It's just an opinion about what makes a language better or worse; it's not the single source of truth. You don't have to take offense. In the end it's about context and what you're trying to achiev

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Holden Karau
Scala and Python have their advantages and disadvantages with Spark. In my experience, when performance is super important you’ll end up needing to do some of your work in the JVM, but in many situations what matters is what your team and company are familiar with and the ecosystem of tooling

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
It seems that this thread has turned into a holy war that has nothing to do with the original question. If so, it’s super disappointing. On 17 Oct 2020, at 15:53, Molotch wrote: > I would say the pros and cons of Python vs Scala is both down to Spark, the languages in thems

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Sasha Kacanski
And you are an expert on Python! Idiomatic... Please do everyone a favor and stop commenting on things you have no idea about... I built ETL systems in Python that wiped out commercial Java stacks left and right. PySpark was, is and will be a second-class citizen in the Spark world. That has nothing to do with

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Molotch
I would say the pros and cons of Python vs Scala are down both to Spark, the languages in themselves and what kind of data engineer you will get when you try to hire for the different solutions. With PySpark you get less functionality and increased complexity from the Py4J Java interop compared to

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
Hi, I spent a few days converting one of my Spark/Scala scripts to Python. It was interesting but at times felt like trench warfare. There is a lot of handy stuff in Scala, like case classes for defining column headers, that doesn't seem to be available in Python (possibly my lack of in-depth Python
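On the case-class point, a rough Python counterpart is a frozen `dataclass`: it declares named, typed fields from which column headers can be derived. A minimal sketch; the `Trade` record, its fields and `parse_line` are hypothetical, chosen only for illustration.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class Trade:
    """Hypothetical record type playing the role of a Scala case class."""
    ticker: str
    price: float
    volume: int


# Column headers come from the field declarations, in declaration order.
COLUMNS = [f.name for f in fields(Trade)]


def parse_line(line: str) -> Trade:
    """Turn one CSV line into a typed record (no quoting/escaping handled)."""
    ticker, price, volume = line.split(",")
    return Trade(ticker=ticker.strip(), price=float(price), volume=int(volume))


rows = [parse_line(s) for s in ["IBM,123.5,100", "MSFT,210.0,250"]]
data = [astuple(t) for t in rows]  # plain tuples, in COLUMNS order
```

In PySpark, `spark.createDataFrame(data, COLUMNS)` accepts a list of tuples with a list of column names as the schema; `pyspark.sql.Row` or an explicit `StructType` can play a similar role to the dataclass.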

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Hi, With regard to your statement below, "technology choices are agnostic to use cases according to you": if I may say, I do not think that was the message implied. What was said was that in addition to "best technology fit" there are other factors, "equally important", that need to be consider

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Gourav Sengupta
So Mich and the rest, technology choices are agnostic to use cases according to you? This is interesting, really interesting. Perhaps I stand corrected. Regards, Gourav On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh wrote: > if we take Spark and its massive parallel processing and in-memory cac

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
If we take Spark and its massive parallel processing and in-memory cache away, then one can argue anything can do the "ETL" job: just write some Java/Scala/SQL/Perl/Python to read data from one DB and write it to another, often using JDBC connections. However, we all concur that may not be good enou

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread ayan guha
But when you have a fairly large volume of data, that is where Spark comes to the party. And I assume the requirement of using Spark is already established in the original question, and the discussion is whether to use Python or Scala/Java. On Sun, 11 Oct 2020 at 10:51 pm, Sasha Kacanski wrote: > If org has folks

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Thanks Ayan. I am not qualified to answer your first point. However, my experience with Spark with Scala or Spark with Python agrees with your assertion that use cases do not come into it. Most DEV/OPS work dealing with ETL is provided by service companies whose workforce is very familiar with J

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread ayan guha
I have one observation: is "Python UDF is slow due to deserialization penalty" still relevant, even after Arrow is used for in-memory data management and the heavy investment from the Spark dev community in making pandas a first-class citizen, including UDFs? As I work with multiple clients, my experience is org cultur
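A toy model of one reason Arrow-backed pandas UDFs reduce that overhead: a classic Python UDF invokes the Python function once per value, while a vectorised UDF is invoked once per batch and can process the whole batch in one call. This sketch only counts Python-level calls; the function names and the batch size are assumptions, and it does not model Arrow, Py4J or the JVM (in real Spark the Arrow batch size is governed by `spark.sql.execution.arrow.maxRecordsPerBatch`).

```python
from typing import Callable, List, Sequence, Tuple


def row_at_a_time(values: Sequence[float],
                  fn: Callable[[float], float]) -> Tuple[List[float], int]:
    """Call a scalar function once per row, like a classic Python UDF."""
    calls = 0
    out: List[float] = []
    for v in values:
        out.append(fn(v))
        calls += 1
    return out, calls


def vectorised(values: Sequence[float],
               fn: Callable[[List[float]], List[float]],
               batch_size: int) -> Tuple[List[float], int]:
    """Call a batch function once per chunk, like a pandas (Arrow) UDF."""
    calls = 0
    out: List[float] = []
    for start in range(0, len(values), batch_size):
        out.extend(fn(list(values[start:start + batch_size])))
        calls += 1
    return out, calls


data = [float(i) for i in range(10_000)]
scalar_out, scalar_calls = row_at_a_time(data, lambda x: x * 1.1)
batch_out, batch_calls = vectorised(data, lambda xs: [x * 1.1 for x in xs],
                                    batch_size=1_000)
```

Both paths produce the same values; only the number of Python invocations differs (10,000 versus 10 here), which is the per-row cost that batching amortises.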

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
Not quite sure how meaningful this discussion is, but in case someone is really faced with this query, the question still is 'what is the use case'? I am just a bit confused by the one-size-fits-all deterministic approach here; I thought those days were over almost 10 years ago. Regards Gourav

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Stephen Boesch
I agree with Wim's assessment of data engineering / ETL vs Data Science. I wrote pipelines/frameworks for large companies and scala was a much better choice. But for ad-hoc work interfacing directly with data science experiments pyspark presents less friction. On Sat, 10 Oct 2020 at 13:03, Mich Ta

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
Many thanks to everyone for their valuable contributions. We all started with Spark a few years ago when Scala was the talk of the town. I agree with the note that as long as Spark stayed niche and elite, someone with Scala knowledge attracted premiums. In fairness in 2014-2015, there was no

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jacek Pliszka
I would not leave it to data scientists unless they will maintain it. The key decision in cases I've seen was usually people cost/availability with ETL operations cost taken into account. Often the situation is that ETL cloud cost is small and you will not save much. Then it is just skills cost/a

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jörn Franke
It really depends on what language your data scientists speak. I don’t think it makes sense to impose a language on them for ad hoc data science work; let them choose. For more complex AI engineering work you can, though, apply different standards and criteria. And then it really depends on architec

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Wim Van Leuven
Hey Mich, This is a very fair question .. I've seen many data engineering teams start out with Scala because technically it is the best choice for many given reasons and basically it is what Spark is. On the other hand, almost all use cases we see these days are data science use cases where peopl

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
What is the use case? Unless you have unlimited funding and time to waste, you would usually start with that. Regards, Gourav On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer wrote: > Spark in Scala (or Java) is much more performant if you are using RDDs; those operations basically force you t

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
Spark in Scala (or Java) is much more performant if you are using RDDs; those operations basically force you to pass lambdas, hit serialization between Java and Python types and, yes, hit the Global Interpreter Lock. But none of those things apply to DataFrames, which will generate Java code regard
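A simplified model of why DataFrame code largely escapes the Python penalty: a column expression is built up as data (an expression tree) that the engine can inspect and compile, whereas a lambda is an opaque Python callable that has to be run by a Python worker. The classes below are hypothetical and far simpler than PySpark's real `Column`, which delegates to JVM objects; they only show that a tree can be analysed (here, finding the columns it reads) while a lambda cannot.

```python
from dataclasses import dataclass


class Expr:
    """Base class: operators build a tree instead of computing a value."""

    def __add__(self, other):
        return BinOp("+", self, lit(other))

    def __mul__(self, other):
        return BinOp("*", self, lit(other))

    def render(self) -> str:
        raise NotImplementedError


@dataclass(frozen=True)
class Col(Expr):
    name: str

    def render(self) -> str:
        return self.name


@dataclass(frozen=True)
class Lit(Expr):
    value: object

    def render(self) -> str:
        return repr(self.value)


@dataclass(frozen=True)
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def render(self) -> str:
        return f"({self.left.render()} {self.op} {self.right.render()})"


def lit(x) -> Expr:
    """Wrap plain Python values as literal nodes; pass expressions through."""
    return x if isinstance(x, Expr) else Lit(x)


def columns_used(expr: Expr) -> set:
    """Walk the tree: the kind of analysis (e.g. column pruning) a lambda hides."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, BinOp):
        return columns_used(expr.left) | columns_used(expr.right)
    return set()


expr = Col("price") * 2 + 1
```

By contrast, `lambda price: price * 2 + 1` exposes no structure to inspect. In PySpark terms, `F.col("price") * 2 + 1` is planned and executed in the JVM, while wrapping the lambda in `F.udf(...)` sends rows out to Python.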

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
Thanks. So, ignoring Python lambdas, is individuals' familiarity with the language the most important factor? Also, I have noticed that the Spark documentation has switched from Scala to Python for the first example. However, some code samples, for example JDBC calls, are the same

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
As long as you don't use Python lambdas in your Spark job there should be almost no difference between the Scala and Python DataFrame code. Once you introduce Python lambdas you will hit some significant serialization penalties, as well as having to run the actual work code in Python. As long as no lambda

Re: Scala Vs Python

2016-09-06 Thread 刘虓

Re: Scala Vs Python

2016-09-06 Thread Leonard Cohen
regards, Leonard

Re: Scala Vs Python

2016-09-05 Thread Luciano Resende
On Thu, Sep 1, 2016 at 3:15 PM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > > This is why we have d

Re: Scala Vs Python

2016-09-05 Thread Gourav Sengupta
different cluster configurations and ran it several times to get some idea on the noise. Of course, the more complicated the UDF, the less the overhead affects you. Hope this helps. Assaf

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
Assaf From: ayan guha, Sent: Sunday, September 04, 2016 11:00 AM, To: Mendelson, Assaf, Cc: user, Subject: Re: Scala Vs Python. Hi, this one is quite interesting. Is it possible to share a few toy examples? On Sun, Sep 4, 2016 at 5:23 PM, AssafMendelson

Re: Scala Vs Python

2016-09-04 Thread Simon Edelhaus
From: ayan guha, Sent: Friday, September 02, 2016 12:21 AM, To: kant kodali, Cc: Mendelson, Assaf; user, Subject

Re: Scala Vs Python

2016-09-04 Thread ayan guha
scala one and then wrap it to be accessible from python. From: ayan guha, Sent: Friday, September 02, 2016 12:21 AM, To: kant kodali, Cc: Mendelson, Assaf; user

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
ask my team (which does the engineering), and we write them a Scala one and then wrap it to be accessible from Python. From: ayan guha, Sent: Friday, September 02, 2016 12:21 AM, To: kant kodali, Cc: Mendelson, Assaf; user, Subject: Re: Scala Vs Python. Thanks All for

Re: Scala Vs Python

2016-09-02 Thread darren
Date: 9/2/16 4:03 AM (GMT-05:00), To: Mich Talebzadeh, Cc: Jakob Odersky, ayan guha, Tal Grynbaum, darren, kant kodali, AssafMendelson, user, Subject: Re: Scala Vs Python. Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the tim

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
No offence taken. Glad that it was rectified. Cheers, Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
I apologize for my harsh tone. You are right, it was unnecessary and discourteous. On Fri, Sep 2, 2016 at 11:01 AM Mich Talebzadeh wrote: > Hi, You made such statement: "That's complete nonsense." That is a strong language and void of any courtesy. Only dogmatic individuals make su

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Hi, You made this statement: "That's complete nonsense." That is strong language, devoid of any courtesy. Only dogmatic individuals make such statements, engaging the keyboard before thinking about it. You are perfectly within your rights to agree to differ. However, that does not give you the ri

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
You made a specific claim -- that Spark will move away from Python -- which I responded to with clear references and data. How on earth is that a "religious argument"? I'm not saying that Python is better than Scala or anything like that. I'm just addressing your specific claim about its future in

Re: Scala Vs Python

2016-09-02 Thread andy petrella
looking at the examples, indeed they make nonsense :D On Fri, 2 Sep 2016 16:48 Mich Talebzadeh wrote: > Right so. We are back into religious arguments. Best of luck, Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Right so. We are back into religious arguments. Best of luck, Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If you look at 2014 Databricks code examples, they were mostly in Python. Now they are mostly in Scala for a reason. That's complete nonsense. First off, you c

Re: Scala Vs Python

2016-09-02 Thread Sivakumaran S
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This of course depends on the complexity of the DAG. I guess it is a matter of language preference. Regards, Sivakumaran S > On 02-Se

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
From an outsider's point of view nobody likes change :) However, it appears to me that Scala is a rising star and if one learns it, it is another iron in the fire, so to speak. I believe as we progress in time Spark is going to move away from Python. If you look at 2014 Databricks code examples, the

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
Forgot to answer your question about feature parity of Python w.r.t. Spark's different components. I mostly work with Scala so I can't say for sure, but I think that all pre-2.0 features (that's basically everything except Structured Streaming) are on par. Structured Streaming is a pretty new feature

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
As you point out, often the reason that Python support lags behind is that functionality is implemented in Scala, so the API in that language is "free" whereas Python support needs to be added explicitly. Nevertheless, Python bindings are an important part of Spark and are used by many people (this

RE: Scala Vs Python

2016-09-02 Thread Santoshakhilesh
Sent: 02 September 2016 15:25, To: Tal Grynbaum, Cc: darren; Mich Talebzadeh; Jakob Odersky; kant kodali; AssafMendelson; user, Subject: Re: Scala Vs Python. Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair

Re: Scala Vs Python

2016-09-01 Thread ayan guha
Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair trade-off for the speed of getting stuff to market. The more this discussion progresses, the less of an issue I see in terms of feature parity. Coming back to performance, Darren r

Re: Scala Vs Python

2016-09-01 Thread Tal Grynbaum
On Fri, Sep 2, 2016 at 1:15 AM, darren wrote: > This topic is a concern for us as well. In the data science world no one uses native scala or java by choice. It's R and Python. And python is growing. Yet in spark, python is 3rd in line for feature support, if at all. This is why we have d

Re: Scala Vs Python

2016-09-01 Thread ayan guha
believe this would greatly depend on your use case and your familiarity with the languages. In general, scala would have a much better performance than p

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
in python. That said, if you are planning to use dataframes without any UDF then the performance hit is practically nonexistent. Even if you need UDF, it is po

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much.

Re: Scala Vs Python

2016-09-01 Thread darren
From: Mich Talebzadeh, Date: 9/1/16 6:01 PM (GMT-05:00), To: Jakob Odersky, Cc: ayan guha, kant kodali, AssafMendelson, user, Subject: Re: Scala Vs Python. Hi Jakob. My understanding of Dataset is that it is basically an RDD with some optimization gone into it. RDD is meant to

Re: Scala Vs Python

2016-09-01 Thread Peyman Mohajerian
need UDF/UDAF you can certainly work in python without losing too much. From: ayan guha

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
Sent: Thursday, September 01, 2016 5:03 AM, To: user, Subject: Scala Vs Python. Hi Users, Thought to ask (agai

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
articles but all seem pre-Spark 2. Has anything changed with Spark 2? Either pro-Scala way or pro-Python way? I am thinking performance, feature parity and future direction, not so much in terms of skillset or ease of use. Or, if you think it is a moot point, please say so as well. Any real life example, production experience, anecdotes, personal taste, profanity all are welcome :) -- Best Regards, Ayan Guha

Re: Scala Vs Python

2016-09-01 Thread ayan guha
inking performance, feature parity and future direction, not so much in terms of skillset or ease of use. Or, if you think it is a moot point, please say so as well. Any real life example, production experience, anecdotes, personal taste,

Re: Scala Vs Python

2016-09-01 Thread kant kodali
Ayan Guha

RE: Scala Vs Python

2016-08-31 Thread AssafMendelson
point, please say so as well. Any real life example, production experience, anecdotes, personal taste, profanity all are welcome :) -- Best Regards, Ayan Guha

RE: Scala Vs Python

2016-08-31 Thread Santoshakhilesh
Hi, I would prefer Scala if you are starting afresh, considering ease of use, features, performance and support. You will find numerous examples and support for Scala, which might not be true for any other language. I had personally developed the first version of my App using Jav

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Jörn Franke
Python can access the JVM; this is how it interfaces with Spark. Some of the components do not have a wrapper for the corresponding Java API yet and thus are not accessible in Python. The same goes for Elasticsearch. You need to write a more or less simple wrapper. > On 20 Apr 2016, at 09:53, "kramer2...

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread kramer2...@126.com
I am using Python and Spark. I think one problem might be connecting Spark with third-party products. For example, to combine Spark with Elasticsearch you have to use Java or Scala. Python is not supported.

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Zhang, Jingyu
GraphX does not support Python yet. http://spark.apache.org/docs/latest/graphx-programming-guide.html The workaround is to use GraphFrames (a third-party API), https://issues.apache.org/jira/browse/SPARK-3789 but some features in Python are not the same as in Scala, https://github.com/graphframes/gr

Re: Scala vs Python for Spark ecosystem

2016-04-19 Thread sujeet jog
It depends on the trade-offs you wish to make. Python being an interpreted language, speed of execution will be lower, but since it is a very commonly used language, people can get hands-on quickly. Scala programs run in the Java environment, so it's obvious you will get good execution speed,

Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
Hey Phil, Thank you for sharing this. The result didn't surprise me much. It's normal to do the prototype in Python; once it gets stable and you really need the performance, rewriting part of it in C, or the whole of it in another language, makes sense and will not cost you much time. Davies On Fr

Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
I was interested in this as I had some Spark code in Python that was too slow and wanted to know whether Scala would fix it for me. So I re-wrote my code in Scala. In my particular case the Scala version was 10 times faster. But I think that is because I did an awful lot of computation in my own

Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
I was about to ask this question. On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash wrote: > Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas wrote:

Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas wrote: > I'd also be interested in seeing such a benchmark. On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira

Re: Scala vs Python performance differences

2014-04-15 Thread Nicholas Chammas
I'd also be interested in seeing such a benchmark. On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira wrote: > This would be super useful. Thanks. On 4/15/14, 1:30 AM, "Jeremy Freeman" wrote: > Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks. On 4/15/14, 1:30 AM, "Jeremy Freeman" wrote: > Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pur

Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly, i

Re: Scala vs Python performance differences

2014-04-14 Thread Bin Wang
At least, Spark Streaming doesn't support Python at this moment, right? On Mon, Apr 14, 2014 at 6:48 PM, Andrew Ash wrote: > Hi Spark users, I've always done all my Spark work in Scala, but occasionally people ask about Python and its performance impact vs the same algorithm implementat