There is an Anaconda parcel that one could readily install on CDH.

As Sean says, it is Python 2.7.x.

Spark should work for both 2.7 and 3.5.
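
In case it helps, here is a minimal sketch of how one might pin PySpark to a single interpreter. PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the environment variables Spark reads for this; the helper names and the interpreter path in the usage note are hypothetical, not anything CDH or the Anaconda parcel provides. The driver and the workers need the same Python minor version, so the sketch points both at the same binary.

```python
import os
import sys


def pyspark_env(python_path, base_env=None):
    """Return a copy of the environment with PySpark pinned to one interpreter.

    `python_path` is an assumption: substitute the interpreter that is
    actually installed on every node of the cluster.
    """
    env = dict(os.environ if base_env is None else base_env)
    # PYSPARK_PYTHON: interpreter used by PySpark (driver and workers).
    env["PYSPARK_PYTHON"] = python_path
    # PYSPARK_DRIVER_PYTHON: driver-only override; set to the same binary
    # here so the driver and workers cannot drift apart.
    env["PYSPARK_DRIVER_PYTHON"] = python_path
    return env


def major_minor(version_info=sys.version_info):
    """Format a version tuple as 'major.minor', e.g. (3, 5, 2) -> '3.5'."""
    return "%d.%d" % (version_info[0], version_info[1])
```

One would pass the resulting mapping as the environment of a spark-submit subprocess, for example `pyspark_env("/path/to/python3.5")` with a placeholder path, or export the same two variables in the shell before launching pyspark.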

From: Sean Owen
Sent: Friday, September 2, 2016 12:41 AM
Subject: Re: PySpark: preference for Python 2.7 or Python 3.5?
To: Ian Stokes Rees
Cc: user @spark

Spark should work fine with Python 3. I'm not a Python person, but all else 
equal I'd use 3.5 too. I assume the issue could be libraries you want that 
don't support Python 3. I don't think that changes with CDH. It includes a 
version of Anaconda from Continuum, but that lays down Python 2.7.11. I don't 
believe there's any particular position on 2 vs 3.

On Fri, Sep 2, 2016 at 3:56 AM, Ian Stokes Rees wrote:
I have the option of running PySpark with Python 2.7 or Python 3.5.  I am 
fairly expert with Python and know the Python-side history of the differences.  
All else being the same, I have a preference for Python 3.5.  I'm using CDH 5.8 
and I'm wondering if that biases whether I should proceed with PySpark on top 
of Python 2.7 or 3.5.  Opinions?  Does Cloudera have an official (or 
unofficial) position on this?


Ian Stokes-Rees
Computational Scientist

Continuum Analytics
@ijstokes
