Re: Python vs. Scala

2017-09-06 Thread Conconscious
> > 3. Code complexity: Python is much faster to code, but this is more of > choice > > 4. data science - here python is first class citizen, almost no > feature gap between scala and python api > > IMHO, both have sweet spots...and i would highly recommend to learn

Re: Python vs. Scala

2017-09-05 Thread ayan guha
between scala and python api IMHO, both have sweet spots... and I would highly recommend learning Python just for the sheer fun of coding with it :) best Ayan On Wed, Sep 6, 2017 at 1:46 PM, Adaryl Wakefield < adaryl.wakefi...@hotmail.com> wrote: > Is there any performance difference i

Python vs. Scala

2017-09-05 Thread Adaryl Wakefield
Is there any performance difference in writing your application in python vs. scala? I’ve resisted learning Python because it’s an interpreted scripting language, but the market seems to be demanding Python skills. Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 91

Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Michael Mansour (CS)
.org" <user@spark.apache.org> Subject: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code This video https://www.youtube.com/watch?v=LQHMMCf2ZWY I think. On Wed, May 10, 2017 at 8:04 PM, lucas.g...@gmail.com<mailto:lucas.g...@gmail.com> <luca

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
>>> generated something strange which is hard to follow: >>>>> >>>>> (2) PythonRDD[13] at RDD at PythonRDD.scala:48 [] >>>>> | MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 [] >>>>> | ShuffledRDD[11] at part

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread lucas.g...@gmail.com
) PythonRDD[13] at RDD at PythonRDD.scala:48 [] >>>> | MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 [] >>>> | ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 >>>> [] >>>> +-(2) PairwiseRDD[10] at reduceByKey at :1 >>

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
| ../log.txt MapPartitionsRDD[8] at textFile at >>> NativeMethodAccessorImpl.java:0 [] >>> | ../log.txt HadoopRDD[7] at textFile at >>> NativeMethodAccessorImpl.java:0 [] >>> >>> Why is that? Does pyspark do some optimizations under th

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
tFile at >> NativeMethodAccessorImpl.java:0 [] >> | ../log.txt HadoopRDD[7] at textFile at >> NativeMethodAccessorImpl.java:0 [] >> >> Why is that? Does pyspark do some optimizations under the hood? This debug

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
doopRDD[7] at textFile at > NativeMethodAccessorImpl.java:0 [] > > Why is that? Does pyspark do some optimizations under the hood? This debug > string is really useless for debugging. > > > > -- > View this message in context: > http://apache-spark

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread pklemenkov
DD[8] at textFile at NativeMethodAccessorImpl.java:0 [] | ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 [] Why is that? Does pyspark do some optimizations under the hood? This debug string is really useless for debugging. -- View this message in context: http://apache-spa

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
This Scala code:
scala> val logs = sc.textFile("big_data_specialization/log.txt").
     | filter(x => !x.contains("INFO")).
     | map(x => (x.split("\t")(1), 1)).
     | reduceByKey((x, y) => x + y)
generated obvious lineage:
(2) ShuffledRDD[4] at reduceByKey at :27 []
+-(2)
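For readers who want to see what the Scala pipeline above computes without a cluster, here is a minimal plain-Python sketch of the same filter / map / reduceByKey logic. It is not Spark code; the function name and the sample lines are hypothetical and only assume a tab-separated log whose second field is the key being counted.

```python
from collections import Counter


def count_by_second_field(lines):
    """Count lines by their second tab-separated field, skipping INFO lines."""
    counts = Counter()
    for line in lines:
        # Mirrors filter(x => !x.contains("INFO"))
        if "INFO" in line:
            continue
        # Mirrors map(x => (x.split("\t")(1), 1)) followed by reduceByKey(_ + _)
        key = line.split("\t")[1]
        counts[key] += 1
    return dict(counts)


# Hypothetical sample data, not taken from the original log.txt
sample = [
    "2017-05-10\tERROR\tdisk full",
    "2017-05-10\tINFO\tstarted",
    "2017-05-10\tWARN\tslow response",
    "2017-05-10\tERROR\ttimeout",
]

print(count_by_second_field(sample))
```

Like the Scala version, this raises an error on a line with fewer than two fields, so real input may need a guard.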

Re: Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-14 Thread Gourav Sengupta
morons who use Scala for OOPs and claim its nothing new please avoid them. Regards, Gourav On Mon, Feb 13, 2017 at 5:57 PM, Spark User <sparkuser2...@gmail.com> wrote: > Spark has more support for scala, by that I mean more APIs are available > for scala compared to python or Java

Re: Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-13 Thread Spark User
Spark has more support for scala, by that I mean more APIs are available for scala compared to python or Java. Also scala code will be more concise and easy to read. Java is very verbose. On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran <irving.du...@gmail.com> wrote: > I would say Ja

Re: Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-09 Thread Irving Duran
ry6...@gmail.com> wrote: Hi All, Is it better to use Java or Python on Scala for Spark coding? Mainly my work is getting file data in CSV format; I have to do some rule checking and rule aggregation and put the final filtered data back into Oracle so that real-time apps can use it.

Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-09 Thread nancy henry
Hi All, Is it better to use Java or Python on Scala for Spark coding? Mainly my work is getting file data in CSV format; I have to do some rule checking and rule aggregation and put the final filtered data back into Oracle so that real-time apps can use it.

Re: Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Anthony May
n convert a Spark DataFrame to a Pandas DataFrame. > > On Mon, Nov 21, 2016 at 1:51 PM, Brandon White <bwwintheho...@gmail.com> > wrote: > > Hello all, > > I will be starting a new Spark codebase and I would like to get opinions > on using Python over Scala. Historically

Re: Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Jon Gregg
. There are similarities between the DataFrame and Pandas APIs, and you can convert a Spark DataFrame to a Pandas DataFrame. On Mon, Nov 21, 2016 at 1:51 PM, Brandon White <bwwintheho...@gmail.com> wrote: > Hello all, > > I will be starting a new Spark codebase and I would like to get opinions > on

Starting a new Spark codebase, Python or Scala / Java?

2016-11-21 Thread Brandon White
Hello all, I will be starting a new Spark codebase and I would like to get opinions on using Python over Scala. Historically, the Scala API has always been the strongest interface to Spark. Is this still true? Are there still many benefits and additional features in the Scala API

Re: Apache Spark toDebugString producing different output for python and scala repl

2016-08-15 Thread Saisai Shao
The implementation inside the Python API and Scala API for RDD is slightly different, so the difference of RDD lineage you printed is expected. On Tue, Aug 16, 2016 at 10:58 AM, DEEPAK SHARMA <deepak_dehra...@outlook.com > wrote: > Hi All, > > > Below is the small piec
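As a rough illustration of why the Python lineage can show more steps: to my understanding, PySpark builds reduceByKey out of a local combine inside each partition, a partitionBy shuffle, and a final merge, which would match the extra mapPartitions and partitionBy entries in the Python debug string. The sketch below models those three stages in plain Python. It is not Spark's implementation, and all function names here are hypothetical.

```python
def combine_locally(partition, func):
    """Stage 1 and 3: merge values that share a key within one partition."""
    combined = {}
    for key, value in partition:
        combined[key] = func(combined[key], value) if key in combined else value
    return list(combined.items())


def partition_by(partitions, num_partitions):
    """Stage 2: route every (key, value) pair to a bucket chosen by key hash."""
    buckets = [[] for _ in range(num_partitions)]
    for partition in partitions:
        for key, value in partition:
            buckets[hash(key) % num_partitions].append((key, value))
    return buckets


def reduce_by_key(partitions, func, num_partitions=2):
    """Local combine, then shuffle by key, then merge within each bucket."""
    locally_combined = [combine_locally(p, func) for p in partitions]
    shuffled = partition_by(locally_combined, num_partitions)
    return [combine_locally(p, func) for p in shuffled]


# Hypothetical input: two partitions of (key, 1) pairs
parts = [[("a", 1), ("b", 1), ("a", 1)], [("b", 1), ("c", 1)]]
result = reduce_by_key(parts, lambda x, y: x + y)
merged = dict(pair for partition in result for pair in partition)
print(merged)
```

Which bucket a key lands in depends on the hash, so only the merged totals are stable across runs, not the layout of the output partitions.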

Re: Apache Spark toDebugString producing different output for python and scala repl

2016-08-15 Thread DEEPAK SHARMA
Hi All, Below is a small piece of code in the Scala and Python REPLs in Apache Spark. However, I am getting different output in the two languages when I execute toDebugString. I am using the Cloudera QuickStart VM. PYTHON rdd2 = sc.textFile('file:/home/training/training_materials/data/frostroad.txt

Re: Python to Scala

2016-06-18 Thread Sivakumaran S
fort of learning scala. > > https://spark.apache.org/docs/0.9.0/python-programming-guide.html > <https://spark.apache.org/docs/0.9.0/python-programming-guide.html> > - Thanks, via mobile, excuse brevity. > > On Jun 18, 2016 2:34 PM, "Aakash Basu" <raj2coo...

Re: Python to Scala

2016-06-18 Thread Marco Mistroni
Hi, post the code. I code in Python and Scala on Spark. I can give you help, though the APIs for Scala and Python are practically the same; the only difference is Python lambdas vs. Scala inline functions. HTH. On 18 Jun 2016 6:27 am, "Aakash Basu" <raj2coo...@gmail.com> wrote: > I

Re: Python to Scala

2016-06-18 Thread ayan guha
>>>> will cut the effort of learning scala. >>>>> >>>>> https://spark.apache.org/docs/0.9.0/python-programming-guide.html >>>>> >>>>> - Thanks, via mobile, excuse brevity. >>>>> On Jun 18, 2016 2:34 PM, "Aakash Basu" <raj2coo...@gmail.com> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I've a python code, which I want to convert to Scala for using it in >>>>>> a Spark program. I'm not so well acquainted with python and learning >>>>>> scala >>>>>> now. Any Python+Scala expert here? Can someone help me out in this >>>>>> please? >>>>>> >>>>>> Thanks & Regards, >>>>>> Aakash. >>>>>> >>>>> >>> -- Best Regards, Ayan Guha

Re: Python to Scala

2016-06-18 Thread Yash Sharma
cala. >>>> >>>> https://spark.apache.org/docs/0.9.0/python-programming-guide.html >>>> >>>> - Thanks, via mobile, excuse brevity. >>>> On Jun 18, 2016 2:34 PM, "Aakash Basu" <raj2coo...@gmail.com> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I've a python code, which I want to convert to Scala for using it in a >>>>> Spark program. I'm not so well acquainted with python and learning scala >>>>> now. Any Python+Scala expert here? Can someone help me out in this please? >>>>> >>>>> Thanks & Regards, >>>>> Aakash. >>>>> >>>> >>

Re: Python to Scala

2016-06-17 Thread Aakash Basu
/python-programming-guide.html >>> >>> - Thanks, via mobile, excuse brevity. >>> On Jun 18, 2016 2:34 PM, "Aakash Basu" <raj2coo...@gmail.com> wrote: >>> >>>> Hi all, >>>> >>>> I've a python code, which I want to convert to Scala for using it in a >>>> Spark program. I'm not so well acquainted with python and learning scala >>>> now. Any Python+Scala expert here? Can someone help me out in this please? >>>> >>>> Thanks & Regards, >>>> Aakash. >>>> >>> >

Re: Python to Scala

2016-06-17 Thread Stephen Boesch
t;>> >>> I've a python code, which I want to convert to Scala for using it in a >>> Spark program. I'm not so well acquainted with python and learning scala >>> now. Any Python+Scala expert here? Can someone help me out in this please? >>> >>> Thanks & Regards, >>> Aakash. >>> >>

Re: Python to Scala

2016-06-17 Thread Aakash Basu
xcuse brevity. > On Jun 18, 2016 2:34 PM, "Aakash Basu" <raj2coo...@gmail.com> wrote: > >> Hi all, >> >> I've a python code, which I want to convert to Scala for using it in a >> Spark program. I'm not so well acquainted with python and learning sc

Re: Python to Scala

2016-06-17 Thread Yash Sharma
wrote: > Hi all, > > I've a python code, which I want to convert to Scala for using it in a > Spark program. I'm not so well acquainted with python and learning scala > now. Any Python+Scala expert here? Can someone help me out in this please? > > Thanks & Regards, > Aakash. >

Python to Scala

2016-06-17 Thread Aakash Basu
Hi all, I've a python code, which I want to convert to Scala for using it in a Spark program. I'm not so well acquainted with python and learning scala now. Any Python+Scala expert here? Can someone help me out in this please? Thanks & Regards, Aakash.

Re: Single context Spark from Python and Scala

2016-02-15 Thread Chandeep Singh
ext Zeppelin automatically injects ZeppelinContext as variable 'z' in your scala/python environment. ZeppelinContext provides some additional functions and utility. Object exchange ZeppelinContext extends map and it's shared between scala, python environment. So you can put some object from sc

Single context Spark from Python and Scala

2016-02-15 Thread Leonid Blokhin
Hello, I want to work with a single Spark context from both Python and Scala. Is it possible? For example, is it possible to share one context between a running ./bin/pyspark and ./bin/spark-shell? Cheers, Leonid

Python vs Scala performance

2014-10-22 Thread Marius Soutier
to complete. Now one of the data scientists on the team wants to write some jobs using Python. To learn Spark, he rewrote one of my Scala jobs in Python. From the API side, everything looks more or less identical. However, his jobs take between 5 and 8 hours to complete! We can also see that the execution

RE: Python vs Scala performance

2014-10-22 Thread Ashic Mahtab
on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi there, we have a small Spark cluster running and are processing around 40 GB of Gzip-compressed JSON data per day. I have written a couple

Re: Python vs Scala performance

2014-10-22 Thread Nicholas Chammas
on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi there, we have a small Spark cluster running and are processing around 40 GB of Gzip-compressed JSON data per day. I have

Re: Python vs Scala performance

2014-10-22 Thread Marius Soutier
fewer interactions with the adapter, that may improve things. Take this with a pinch of salt...I might be way off on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi

Re: Python vs Scala performance

2014-10-22 Thread Nicholas Chammas
, this might be the reason why. If the code can be organised to require fewer interactions with the adapter, that may improve things. Take this with a pinch of salt...I might be way off on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22

Re: Python vs Scala performance

2014-10-22 Thread Arian Pasquali
with the adapter, that may improve things. Take this with a pinch of salt...I might be way off on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi there, we have a small Spark

Re: Python vs Scala performance

2014-10-22 Thread Marius Soutier
Didn’t seem to help:
conf = SparkConf().set("spark.shuffle.spill", "false").set("spark.default.parallelism", "12")
sc = SparkContext(appName='app_name', conf=conf)
but it is still taking as much time. On 22.10.2014, at 14:17, Nicholas Chammas nicholas.cham...@gmail.com wrote: Total guess without knowing

Re: Python vs Scala performance

2014-10-22 Thread Eustache DIEMERT
on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi there, we have a small Spark cluster running and are processing around 40 GB of Gzip-compressed JSON data per day. I have written a couple

Re: Python vs Scala performance

2014-10-22 Thread Davies Liu
things. Take this with a pinch of salt...I might be way off on this :) Cheers, Ashic. From: mps@gmail.com Subject: Python vs Scala performance Date: Wed, 22 Oct 2014 12:00:41 +0200 To: user@spark.apache.org Hi there, we have a small Spark cluster running and are processing

Re: Python vs Scala performance

2014-10-22 Thread Nicholas Chammas
On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT eusta...@diemert.fr wrote: Wild guess maybe, but do you decode the json records in Python ? it could be much slower as the default lib is quite slow. Oh yeah, this is a good place to look. Also, just upgrading to Python 2.7 may be enough
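One way to check whether JSON decoding is the bottleneck is to time it on a sample of records outside Spark. The sketch below is a hypothetical harness using only the standard library; the helper name and the generated sample records are made up for illustration, and it is written for Python 3 rather than the 2.7 discussed in the thread.

```python
import json
import time


def decode_records(lines):
    """Decode one JSON document per line and report the elapsed wall time."""
    start = time.perf_counter()
    records = [json.loads(line) for line in lines]
    elapsed = time.perf_counter() - start
    return records, elapsed


# Hypothetical sample input: 10,000 small JSON records, one per line
sample_lines = [
    json.dumps({"id": i, "tags": ["a", "b"], "value": i * 0.5})
    for i in range(10000)
]

records, seconds = decode_records(sample_lines)
print(f"decoded {len(records)} records in {seconds:.3f}s")
```

Running the same harness on real records, and comparing against the total job time, gives a rough idea of how much decoding contributes.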

Re: Python vs Scala performance

2014-10-22 Thread Marius Soutier
Yeah we’re using Python 2.7.3. On 22.10.2014, at 20:06, Nicholas Chammas nicholas.cham...@gmail.com wrote: On Wed, Oct 22, 2014 at 11:34 AM, Eustache DIEMERT eusta...@diemert.fr wrote: Wild guess maybe, but do you decode the json records in Python ? it could be much slower as the

Re: Python vs Scala performance

2014-10-22 Thread Marius Soutier
Can’t install that on our cluster, but I can try locally. Is there a pre-built binary available? On 22.10.2014, at 19:01, Davies Liu dav...@databricks.com wrote: In master, you can easily profile your job and find the bottlenecks; see https://github.com/apache/spark/pull/2556 Could you try

Re: Python vs Scala performance

2014-10-22 Thread Davies Liu
Sorry, there is not. You can try cloning from GitHub and building it from scratch; see [1]. [1] https://github.com/apache/spark Davies On Wed, Oct 22, 2014 at 2:31 PM, Marius Soutier mps@gmail.com wrote: Can’t install that on our cluster, but I can try locally. Is there a pre-built binary

Time difference between Python and Scala

2014-09-19 Thread Luis Guerra
Hello everyone, What should be the normal time difference between Scala and Python using Spark? I mean running the same program in the same cluster environment. In my case I am using numpy array structures for the Python code and vectors for the Scala code, both for handling my data. The time

Re: Time difference between Python and Scala

2014-09-19 Thread Davies Liu
I think it's normal. On Fri, Sep 19, 2014 at 12:07 AM, Luis Guerra luispelay...@gmail.com wrote: Hello everyone, What should be the normal time difference between Scala and Python using Spark? I mean running the same program in the same cluster environment. In my case I am using numpy array

re: advice on spark input development - python or scala?

2014-09-04 Thread Johnny Kelsey
Hi guys, We're testing out a spark/cassandra cluster, we're very impressed with what we've seen so far. However, I'd very much like some advice from the shiny brains on the mailing list. We have a large collection of python code that we're in the process of adapting to move into

Re: advice on spark input development - python or scala?

2014-09-04 Thread Tobias Pfeiffer
a row in a table on the cluster. My problem is this: should we even be doing this? I think the problem you describe is not related to any programming language. This is a design decision and/or good/bad programming, but it has nothing to do with Python or Scala, if I am not mistaken. Personally, I