Re: PySpark: preference for Python 2.7 or Python 3.5?

2016-09-02 Thread Ian Stokes Rees

On 9/2/16 3:47 AM, Felix Cheung wrote:

There is an Anaconda parcel one could readily install on CDH:

https://docs.continuum.io/anaconda/cloudera

As Sean says, it is Python 2.7.x.

Spark should work for both 2.7 and 3.5.


Yes, I'm actually an engineer at Continuum, so I know the Anaconda 
parcel pretty well.  It is more a question of whether CDH and Spark 
"work" better with PySpark on Python 2.7 or Python 3.5.  My sense was 
"you choose: both are fine", but I wanted to ask here before committing 
to going down one path or another.
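
For anyone following along, here is a minimal sketch of how either interpreter 
can be selected, assuming the standard PYSPARK_PYTHON and 
PYSPARK_DRIVER_PYTHON environment variables are honored by the Spark 
launcher.  The helper names (pyspark_env, describe_interpreter) and the 
parcel path are illustrative assumptions, not something confirmed in this 
thread; adjust the path to wherever the chosen interpreter actually lives.

```python
import os
import sys


def pyspark_env(python_path, base_env=None):
    """Return a copy of an environment with PySpark pointed at python_path.

    Hypothetical helper: it only sets the two environment variables that
    PySpark reads to pick the worker and driver interpreters.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_path         # interpreter for executors
    env["PYSPARK_DRIVER_PYTHON"] = python_path  # interpreter for the driver
    return env


def describe_interpreter(version_info=sys.version_info):
    """Format a major.minor version string, valid on both 2.7 and 3.5."""
    return "Python %d.%d" % (version_info[0], version_info[1])


if __name__ == "__main__":
    # Assumed location of the Anaconda parcel's interpreter on a CDH node.
    env = pyspark_env("/opt/cloudera/parcels/Anaconda/bin/python")
    print("Launching with", env["PYSPARK_PYTHON"])
    print("This script itself is running on", describe_interpreter())
```

The resulting dict could be passed as env= to subprocess when launching 
spark-submit.  Keeping the driver and executors on the same interpreter 
avoids version mismatches between the two sides.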


Thanks,

Ian


PySpark: preference for Python 2.7 or Python 3.5?

2016-09-01 Thread Ian Stokes Rees
I have the option of running PySpark with Python 2.7 or Python 3.5. I am 
fairly expert with Python and know the Python-side history of the 
differences.  All else being the same, I have a preference for Python 
3.5.  I'm using CDH 5.8 and I'm wondering if that biases whether I 
should proceed with PySpark on top of Python 2.7 or 3.5. Opinions?  Does 
Cloudera have an official (or unofficial) position on this?


Thanks,

Ian
___
Ian Stokes-Rees
Computational Scientist

Continuum Analytics <http://continuum.io>
Twitter: @ijstokes <http://twitter.com/ijstokes>
LinkedIn: <http://linkedin.com/in/ijstokes>
GitHub: <http://github.com/ijstokes>
617.942.0218