+1

On Wed, Jan 6, 2016 at 9:18 AM, Juliet Hougland <juliet.hougl...@gmail.com>
wrote:

> Most admins I talk to about python and spark are already actively (or on
> their way to) managing their cluster python installations. Even if people
> begin using the system python with pyspark, there is eventually a user who
> needs a complex dependency (like pandas or sklearn) on the cluster. No
> admin would muck around installing libs into system python, so you end up
> with other python installations.
>
> Installing a non-system python is something users intending to use pyspark
> on a real cluster should be thinking about, eventually, anyway. It would
> work in situations where people are running pyspark locally or actively
> managing python installations on a cluster. There is an awkward middle
> point where someone has installed spark but not configured their cluster
> (by installing a non-default python) in any other way. Most clusters I see
> are RHEL/CentOS and use something other than the system python for spark.
>
> What libraries stopped supporting python 2.6, and where does spark use
> them? The "ease of transitioning pyspark onto a cluster" problem may be
> an easier pill to swallow if it only affected something like mllib or spark
> sql and not parts of the core api. You end up hoping numpy or pandas are
> installed in the runtime components of spark anyway. At that point people
> really should just go install a non-system python. There are tradeoffs to
> using pyspark, and I feel pretty fine explaining to people that managing
> their cluster's python installations is something that comes with using
> pyspark.
>
> RHEL/CentOS is so common that this would probably be a little work for a
> lot of people.
>
> --Juliet
>
> On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> hey evil admin:)
>> i think the bit about java was from me?
>> if so, i meant to indicate that the reality for us is java 1.7 on most
>> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
>> even though java 1.7 is getting old as well, it would be a major issue for
>> me if spark dropped java 1.7 support.
>>
>> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <carli...@janelia.hhmi.org>
>> wrote:
>>
>>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>>> provide quite a few different versions of python on our cluster pretty darn
>>> easily. All you need is a separate install directory and to set the
>>> PYTHON_HOME environment variable to point to the correct python, then have
>>> the users make sure the correct python is in their PATH. I understand that
>>> other administrators may not be so compliant.
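>>>
>>> For example, the setup amounts to something like this in the users'
>>> environment (the install path below is just illustrative):
>>>
>>>     export PYTHON_HOME=/usr/local/python-2.7.11
>>>     export PATH=$PYTHON_HOME/bin:$PATH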
>>>
>>> Saw a small bit about the java version in there; does Spark currently
>>> prefer Java 1.8.x?
>>>
>>> —Ken
>>>
>>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <joshro...@databricks.com> wrote:
>>>
>>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>>> while continuing to use a vanilla `python` executable on the executors
>>>
>>>
>>> Whoops, just to be clear, this should actually read "while continuing to
>>> use a vanilla `python` 2.7 executable".
>>>
>>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <joshro...@databricks.com>
>>> wrote:
>>>
>>>> Yep, the driver and executors need to have compatible Python versions.
>>>> I think that there are some bytecode-level incompatibilities between 2.6
>>>> and 2.7 which would impact the deserialization of Python closures, so I
>>>> think you need to be running the same 2.x version for all communicating
>>>> Spark processes. Note that you _can_ use a Python 2.7 `ipython` executable
>>>> on the driver while continuing to use a vanilla `python` executable on the
>>>> executors (we have environment variables which allow you to control these
>>>> separately).
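>>>>
>>>> Concretely, a sketch of that setup (PYSPARK_DRIVER_PYTHON and
>>>> PYSPARK_PYTHON are the variables Spark reads; the executable names here
>>>> are just examples):
>>>>
>>>>     export PYSPARK_DRIVER_PYTHON=ipython    # driver runs under IPython (2.7)
>>>>     export PYSPARK_PYTHON=python2.7         # executors use a plain python 2.7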
>>>>
>>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> I think all the slaves need the same (or a compatible) version of
>>>>> Python installed since they run Python code in PySpark jobs natively.
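>>>>>
>>>>> A quick way to check what each node would use (assuming, as I believe is
>>>>> the default, that pyspark falls back to `python` when PYSPARK_PYTHON is
>>>>> unset):
>>>>>
>>>>>     ${PYSPARK_PYTHON:-python} --version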
>>>>>
>>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> interesting, i didn't know that!
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app we cannot ship it with our software because it's gpl-licensed
>>>>>>>
>>>>>>> Not to nitpick, but maybe this is important. The Python license is
>>>>>>> GPL-compatible but not GPL <https://docs.python.org/3/license.html>:
>>>>>>>
>>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python
>>>>>>> under the GPL. All Python licenses, unlike the GPL, let you distribute a
>>>>>>> modified version without making your changes open source. The
>>>>>>> GPL-compatible licenses make it possible to combine Python with other
>>>>>>> software that is released under the GPL; the others don’t.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i do not think so.
>>>>>>>>
>>>>>>>> does python 2.7 need to be installed on all slaves? if so, we
>>>>>>>> do not have direct access to those.
>>>>>>>>
>>>>>>>> also, spark is easy for us to ship with our software since it's
>>>>>>>> apache 2-licensed, and it only needs to be present on the machine that
>>>>>>>> launches the app (thanks to yarn).
>>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>>> the app we cannot ship it with our software because it's gpl-licensed,
>>>>>>>> so the client would have to download it and install it themselves, and
>>>>>>>> this would mean it's an independent install which has to be audited and
>>>>>>>> approved, and now you are in for a lot of fun. basically it will never
>>>>>>>> happen.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <
>>>>>>>> joshro...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters,
>>>>>>>>> then I imagine that they're also capable of installing a standalone
>>>>>>>>> Python alongside that Spark version (without changing Python systemwide).
>>>>>>>>> For instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>>> 2.7.x/3.x without impacting or changing the system Python, and they don't
>>>>>>>>> require any special permissions to install (you don't need root/sudo
>>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
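>>>>>>>>>
>>>>>>>>> For instance, a user-local Miniconda setup is roughly the following
>>>>>>>>> (installer filename and paths may vary by platform and version):
>>>>>>>>>
>>>>>>>>>     bash Miniconda2-latest-Linux-x86_64.sh -b -p $HOME/miniconda
>>>>>>>>>     $HOME/miniconda/bin/conda create -y -n py27 python=2.7
>>>>>>>>>     export PYSPARK_PYTHON=$HOME/miniconda/envs/py27/bin/python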
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> yeah, the practical concern is that we have no control over the java
>>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated
>>>>>>>>>> that is.
>>>>>>>>>>
>>>>>>>>>> i don't like it either, but i cannot change it.
>>>>>>>>>>
>>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6
>>>>>>>>>> was dropped. no point in developing something that doesn't run for the
>>>>>>>>>> majority of customers.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the
>>>>>>>>>>> option of riding out Python 2.6 until then.
>>>>>>>>>>>
>>>>>>>>>>> Are we seriously saying that Spark should likewise support
>>>>>>>>>>> Python 2.6 for the next several years? Even though the core Python
>>>>>>>>>>> devs stopped supporting it in 2013?
>>>>>>>>>>>
>>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>>
>>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But
>>>>>>>>>>> balancing that concern against the maintenance burden on this project,
>>>>>>>>>>> I would say that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a
>>>>>>>>>>> reasonable position to take. There are many tiny annoyances one has to
>>>>>>>>>>> put up with to support 2.6.
>>>>>>>>>>>
>>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just
>>>>>>>>>>> yet...
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>>> ju...@esbet.es> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>>
>>>>>>>>>>>> I've been in a couple of projects using Spark (banking
>>>>>>>>>>>> industry) where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>>> Python 2.6 is old and busted, which is the total opposite of the Spark
>>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>>> escribió:
>>>>>>>>>>>>
>>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesn't it?
>>>>>>>>>>>>
>>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work.
>>>>>>>>>>>>
>>>>>>>>>>>> so i think it's a bad idea
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging behind
>>>>>>>>>>>>> the version they should theoretically use. Dropping python 2.6 support
>>>>>>>>>>>>> sounds very reasonable to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>>> allenzhang...@126.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we are currently using python 2.7.2 in our production environment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>>> meethu.mat...@flytxt.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>>> r...@databricks.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>>> Spark depends on stopped supporting 2.6. We can still convince the
>>>>>>>>>>>>>>>> library maintainers to support 2.6, but it will be extra work. I'm
>>>>>>>>>>>>>>>> curious if anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>


-- 
Best Regards

Jeff Zhang
