Re: Python friendly API for Spark 3.0

2018-09-29 Thread Stavros Kontopoulos
Regarding the Python 3.x upgrade mentioned earlier, some people have already gone down that path: https://blogs.dropbox.com/tech/2018/09/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever They describe some good reasons. Stavros On Tue, Sep 18, 2018 at 6:35 PM, Erik Erlandson

Re: Python friendly API for Spark 3.0

2018-09-18 Thread Erik Erlandson
I like the notion of empowering cross-platform bindings. The trend in computing frameworks seems to be that all APIs gradually converge on a stable attractor, which could be described as "data frames and SQL". Spark's early API design was RDD focused, but these days the center of gravity is all

Re: Python friendly API for Spark 3.0

2018-09-17 Thread Leif Walsh
I agree with Reynold: at some point you’re going to run into the parts of the pandas API that aren’t distributable. More feature parity will be good, but users are still eventually going to hit a feature cliff. Moreover, it’s not just the pandas API that people want to use, but also the set of

Re: Python friendly API for Spark 3.0

2018-09-16 Thread Mark Hamstra
> > difficult to reconcile > That's a big chunk of what I'm getting at: How much is it even possible to do this kind of reconciliation from the underlying implementation to a more normal/expected/friendly API for a given programming environment? How much more work is it for us to maintain

Re: Python friendly API for Spark 3.0

2018-09-16 Thread Reynold Xin
Most of those are pretty difficult to add though, because they are fundamentally difficult to do in a distributed setting and with lazy execution. We should add some but at some point there are fundamental differences between the underlying execution engine that are pretty difficult to reconcile.
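As a rough illustration of Reynold's point, consider pandas-style positional indexing such as `df.iloc[i]`. This example is our own and was not named in the thread. The sketch below makes a simplifying assumption: a dataset is modeled as a plain list of ordered partitions, and `row_at` is a hypothetical helper. It shows that finding row `i` requires the size of every earlier partition. That means a pass over the data rather than a local lookup, which does not fit naturally with lazy, partitioned execution.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def row_at(partitions: Sequence[Sequence[T]], i: int) -> T:
    """Return the i-th row of a dataset split into ordered partitions.

    Hypothetical helper: every partition before the target must be sized
    first, so a positional lookup depends on the whole dataset, not on a
    single partition.
    """
    offset = 0
    for part in partitions:
        n = len(part)  # on a cluster, sizing each partition is itself a job
        if i < offset + n:
            return part[i - offset]
        offset += n
    raise IndexError(i)
```

With `[[1, 2], [3], [4, 5, 6]]`, `row_at(..., 3)` has to account for the first two partitions before it can return `4`.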

Re: Python friendly API for Spark 3.0

2018-09-16 Thread Matei Zaharia
My 2 cents on this is that the biggest room for improvement in Python is similarity to Pandas. We already made the Python DataFrame API different from Scala/Java in some respects, but if there’s anything we can do to make it more obvious to Pandas users, that will help the most. The other issue

Re: Python friendly API for Spark 3.0

2018-09-16 Thread Mark Hamstra
It's not splitting hairs, Erik. It's actually very close to something that I think deserves some discussion (perhaps on a separate thread.) What I've been thinking about also concerns API "friendliness" or style. The original RDD API was very intentionally modeled on the Scala parallel collections

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Jules Damji
+1. I think phasing out an end-of-life feature or supported language, where possible, is a better strategy than a quick drop. With enough advance warning, it can be dropped gradually in 3.x; of course, there are exceptions. Cheers Jules > On Sep 15,

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Reynold Xin
we can also declare python 2 as deprecated and drop it in 3.x, not necessarily 3.0. -- excuse the brevity and lower case due to wrist injury On Sat, Sep 15, 2018 at 10:33 AM Erik Erlandson wrote: > I am probably splitting hairs too finely, but I was considering the > difference between
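Reynold suggests deprecating Python 2 first and dropping it later in 3.x. A minimal sketch of the deprecation step follows, assuming the package calls a hypothetical helper `warn_if_python2` at import time. The helper uses `FutureWarning` because that category is intended for end users and is shown by default, whereas `DeprecationWarning` is hidden by default outside `__main__`. The code avoids annotations so that it still parses on Python 2.

```python
import sys
import warnings


def warn_if_python2(version_info=None):
    """Warn that Python 2 support is deprecated (hypothetical helper).

    ``version_info`` defaults to the running interpreter's version; it is
    a parameter only so the check can be exercised from tests.
    """
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        # FutureWarning is shown to end users by default on both 2.7 and 3.x.
        warnings.warn(
            "Python 2 support is deprecated and will be removed in a "
            "future 3.x release; please migrate to Python 3.",
            FutureWarning,
            stacklevel=2,
        )
```

Calling it once from the package's `__init__.py` would give Python 2 users a full release cycle of notice before support is actually removed.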

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Erik Erlandson
I am probably splitting hairs too finely, but I was considering the difference between improvements to the jvm-side (py4j and the scala/java code) that would make it easier to write the python layer ("python-friendly api"), and actual improvements to the python layers ("friendly python api").
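A minimal sketch of Erik's distinction follows; all names in it are hypothetical. In PySpark the JVM-side object would be reached through a py4j proxy. Here a plain Python class, `FakeJavaRange`, stands in for that proxy, so no JVM is involved. A "python-friendly API" would improve what the JVM side exposes, so that wrappers are easy to write. A "friendly python API" is the wrapper layer itself, `Range`, which turns Java-style getters into Python properties and `len()`.

```python
class FakeJavaRange(object):
    """Stand-in for a py4j proxy of a hypothetical Java-style object.

    Method names follow Java conventions, as they would when called
    through py4j; nothing here talks to a real JVM.
    """

    def __init__(self, start, end):
        self._start = start
        self._end = end

    def getStart(self):
        return self._start

    def getEnd(self):
        return self._end

    def size(self):
        return self._end - self._start


class Range(object):
    """The 'friendly python API' layer: Pythonic access over the proxy."""

    def __init__(self, jobj):
        self._jobj = jobj

    @property
    def start(self):
        return self._jobj.getStart()

    @property
    def end(self):
        return self._jobj.getEnd()

    def __len__(self):
        return self._jobj.size()

    def __repr__(self):
        return "Range(%d, %d)" % (self.start, self.end)
```

Users then write `len(r)` and `r.start` instead of `r.size()` and `r.getStart()`.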

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Leif Walsh
Hey there, Here’s something I proposed recently that’s in this space. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-24258 It’s motivated by working with a user who wanted to do some custom statistics for which they could write the numpy code, and knew in what dimensions they

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Maciej Szymkiewicz
For reference, I raised the question of Python 2 support before - http://apache-spark-developers-list.1001551.n3.nabble.com/Future-of-the-Python-2-support-td20094.html On Sat, 15 Sep 2018 at 15:14, Alexander Shorin wrote: > What's the release due for Apache Spark 3.0? Will it be tomorrow or >

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Alexander Shorin
When is Apache Spark 3.0 due for release? Will it be tomorrow or sometime in mid-2019? I don't think we should care much about Python 2.x today, since its support will soon turn into a pumpkin. For today's projects, I hope nobody takes 2.7 support into account unless there is

Re: Python friendly API for Spark 3.0

2018-09-15 Thread Maciej Szymkiewicz
There is no need to ditch Python 2. There are basically two options: - Use stub files and limit type-hint support to Python 3 only. Python 3 users benefit from type hints; Python 2 users don't, but no core functionality is affected. This is the approach I've used with
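A minimal sketch of the stub-file option, using a hypothetical module `mylib` with a function `scale`. The implementation stays in syntax that both Python 2.7 and Python 3 accept. The types live in a separate `mylib.pyi` stub, shown here as a comment. Type checkers such as mypy and IDEs read the stub, but the interpreter never imports it, so Python 2 users are unaffected.

```python
# mylib.py: hypothetical implementation module, written without inline
# annotations so that Python 2.7 and Python 3 can both import it.
def scale(values, factor):
    """Multiply every element of ``values`` by ``factor``."""
    return [v * factor for v in values]

# mylib.pyi: the matching stub file, shipped next to mylib.py.
# Only type checkers and IDEs read it; it is never executed:
#
#     from typing import List
#     def scale(values: List[float], factor: float) -> List[float]: ...
```

The cost of this design is that the signatures are maintained in two places, so the stub can drift out of sync with the implementation.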

Re: Python friendly API for Spark 3.0

2018-09-14 Thread Nicholas Chammas
Do we need to ditch Python 2 support to provide type hints? I don’t think so. Python lets you specify typing stubs that provide the same benefit without forcing Python 3. On Fri, Sep 14, 2018 at 8:01 PM, Holden Karau wrote: > > > On Fri, Sep 14, 2018, 3:26 PM Erik Erlandson wrote: > >> To be clear,

Re: Python friendly API for Spark 3.0

2018-09-14 Thread Holden Karau
On Fri, Sep 14, 2018, 3:26 PM Erik Erlandson wrote: > To be clear, is this about "python-friendly API" or "friendly python API" ? > Well what would you consider to be different between those two statements? I think it would be good to be a bit more explicit, but I don't think we should

Re: Python friendly API for Spark 3.0

2018-09-14 Thread Erik Erlandson
To be clear, is this about "python-friendly API" or "friendly python API"? On the python side, it might be nice to take advantage of static typing. That requires python 3.6, but with python 2 going EOL, spark-3.0 might be a good opportunity to jump on the python-3-only train. On Fri, Sep 14, 2018 at
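For comparison with the stub approach, here is what inline static typing looks like once Python 2 is no longer supported. Function annotations have been valid syntax since Python 3.0, and the `typing` module arrived in Python 3.5. The variable annotation used below (PEP 526) needs Python 3.6. None of this parses on Python 2. `word_counts` is a hypothetical example function.

```python
from typing import Dict, List


def word_counts(lines: List[str]) -> Dict[str, int]:
    """Count whitespace-separated words across all lines."""
    counts: Dict[str, int] = {}  # PEP 526 variable annotation (Python 3.6+)
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

Unlike stub files, the annotations sit next to the code they describe, so they cannot drift out of sync. They are also available at runtime through `word_counts.__annotations__`.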