There is an Anaconda parcel one could readily install on CDH https://docs.continuum.io/anaconda/cloudera
As Sean says, it is Python 2.7.x. Spark should work with both 2.7 and 3.5.

_____________________________
From: Sean Owen <so...@cloudera.com>
Sent: Friday, September 2, 2016 12:41 AM
Subject: Re: PySpark: preference for Python 2.7 or Python 3.5?
To: Ian Stokes Rees <ijsto...@continuum.io>
Cc: user @spark <user@spark.apache.org>

Spark should work fine with Python 3. I'm not a Python person, but all else equal I'd use 3.5 too. I assume the issue could be libraries you want that don't support Python 3.

I don't think that changes with CDH. It includes a version of Anaconda from Continuum, but that lays down Python 2.7.11. I don't believe there's any particular position on 2 vs 3.

On Fri, Sep 2, 2016 at 3:56 AM, Ian Stokes Rees <ijsto...@continuum.io> wrote:

I have the option of running PySpark with Python 2.7 or Python 3.5. I am fairly expert with Python and know the Python-side history of the differences. All else being the same, I have a preference for Python 3.5. I'm using CDH 5.8 and I'm wondering if that biases whether I should proceed with PySpark on top of Python 2.7 or 3.5. Opinions? Does Cloudera have an official (or unofficial) position on this?

Thanks,
Ian

_______________________________
Ian Stokes-Rees
Computational Scientist
Continuum Analytics <http://continuum.io>
@ijstokes
Twitter <http://twitter.com/ijstokes>
LinkedIn <http://linkedin.com/in/ijstokes>
Github <http://github.com/ijstokes>
617.942.0218
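For anyone weighing the two options, here is a minimal sketch of pointing PySpark at a particular interpreter. It assumes the standard PYSPARK_PYTHON environment variable, which SparkContext reads when it starts, so it has to be set before the context is created. The helper names and the interpreter path are hypothetical; substitute whatever Python (for example a 3.5 build from an Anaconda install) actually exists on every node. PySpark also refuses to run when the driver and worker interpreters differ in major.minor version, so the sketch reports the driver's version for comparison.

```python
import os
import sys


def choose_pyspark_python(interpreter):
    """Hypothetical helper: ask PySpark workers to use `interpreter`.

    Must run before SparkContext is created, because SparkContext reads
    PYSPARK_PYTHON from the environment when it starts.
    """
    os.environ["PYSPARK_PYTHON"] = interpreter
    return os.environ["PYSPARK_PYTHON"]


def driver_python_version():
    """Hypothetical helper: return the driver's major.minor version, e.g. '3.5'."""
    return "%d.%d" % sys.version_info[:2]


if __name__ == "__main__":
    # Hypothetical path; it must exist on every worker node.
    choose_pyspark_python("/opt/anaconda3/bin/python3.5")
    # The driver and workers should report the same major.minor version.
    print("driver:", driver_python_version())
    print("workers:", os.environ["PYSPARK_PYTHON"])
```

After this runs, a SparkContext created in the same process should launch its workers with the chosen interpreter.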