Re: [SURVEY] Usage of flink-python and flink-streaming-python

2018-12-19 Thread Till Rohrmann
Thanks a lot for the feedback for this survey. I will close it now since 6
days have passed without new activity.

To me it seems that we currently don't have many users who use flink-python
or flink-streaming-python because of their limitations (mentioned in the
survey by Xianda). This information might be useful when discussing Flink's
future Python strategy and whether to continue supporting flink-python and
flink-streaming-python in the future.

Cheers,
Till

On Thu, Dec 13, 2018 at 10:50 AM Stephan Ewen  wrote:

> You are right. Let's refocus this on the python user survey and spin out
> another thread.
>
> On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke  wrote:
>
> > Hi Folks,
> > To avoid polluting the survey thread with discussions, we started a
> > separate thread; maybe we can continue the discussion over there.
> >
> > Regards,
> > Xianda
> >
> > On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen  wrote:
> >
> > > I like that we are having a general discussion about how to use Python
> > and
> > > Flink together in the future.
> > > The current python support has some shortcomings that were mentioned
> > > before, so we clearly need something better.
> > >
> > > Parts of the community have worked together with the Apache Beam
> project,
> > > which is pretty far in adding a portability layer to support Python.
> > > Before we dive deep into a design proposal for a new Python API in
> > Flink, I
> > > think we should figure out in which general direction Python support
> > should
> > > go.
> > >
> > > *Option (1): Language portability via Apache Beam*
> > >
> > > Pro:
> > >   - already exists to a large extent and already has users
> > >   - portability layer offers other languages in addition to Python. Go
> > > is in the making, NodeJS has been speculated about, etc.
> > >   - collaboration with another project / community which means more
> > > manpower and exposure. Beam currently has a strong focus on Flink as a
> > > runner for Python.
> > >   - Python API is used for existing ML libraries from the TensorFlow
> > > ecosystem
> > >
> > > Con:
> > >   - Not Flink's API. Python users need to learn the syntax of another
> API
> > > (Python API is inherently different, but even more different here).
> > >
> > > *Option (2): Implement own Python API*
> > >
> > > Pro:
> > >   - Python API will be closer to Flink Java / Scala APIs
> > >
> > > Con:
> > >   - We will only have Python.
> > >   - Need to rebuild the Python language bridge (significant work to
> > > get stable)
> > >   - might lose tight collaboration with Beam and the other parties in
> > Beam
> > >   - not benefiting from Beam's ecosystem
> > >
> > > *Option (3): Implement own portability layer*
> > >
> > > Pro:
> > >   - Flexibility to align APIs across languages within Flink ecosystem
> > >
> > > Con:
> > >   - A lot of work (for context, to get this feature complete, Beam has
> > > worked on that for a year now)
> > >   - Replicating work that already exists
> > >   - good chance to lose tight collaboration with Beam and parties in
> that
> > > project
> > >   - not benefiting from Beam's ecosystem
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise  wrote:
> > >
> > > > Did you take a look at Apache Beam? It already provides a
> comprehensive
> > > > Python SDK and can be used with Flink:
> > > > https://beam.apache.org/roadmap/portability/#python-on-flink
> > > >
> > > > We are using it at Lyft for Python streaming pipelines.
> > > >
> > > > Thomas
> > > >
> > > > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke 
> wrote:
> > > >
> > > > > Hi Till,
> > > > >
> > > > > 1. So far as I know, most of the users at Alibaba are using SQL.
> > > > > Some users at Alibaba want to integrate Python libraries with Flink
> > > > > for stream processing, and Jython is unusable.
> > > > >
> > > > > 2. Python UDFs for SQL:
> > > > > * declaring python UDF based on Alibaba's internal DDL syntax.
> > > > > * start a Python process in open()
> > > > > * communicate with JVM process via Socket.
> > > > > * Yes, it supports Python libraries; users can upload a
> > > > > virtualenv/conda Python runtime
> > > > >
> > > > > 3. We've drafted a design doc for the Python API:
> > > > >  [DISCUSS] Flink Python API
> > > > > <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> > > > >
> > > > > Python UDFs for SQL are not discussed in this document; we'll create
> > > > > a new proposal when the SQL DDL is ready.
> > > > >
> > > > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <
> trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Xianda,
> > > > > >
> > > > > > thanks for sharing this detailed feedback. Do I understand you
> > > > correctly
> > > > > > that flink-python and flink-streaming-python are not usable for
> the
> > > use
> > > > > > cases at Alibaba atm?
> > > > > >
> > > > > > Could you sha

[SURVEY] Usage of flink-python and flink-streaming-python

2018-12-07 Thread Till Rohrmann
Dear Flink community,

in order to better understand the needs of our users and to plan for the
future, I wanted to reach out to you and ask how much you use Flink's
Python API, namely flink-python and flink-streaming-python.

In order to gather feedback, I would like to ask all Python users to
respond to this thread and quickly outline how you use Python in
combination with Flink. Thanks a lot for your help!

Cheers,
Till