I'm delighted to see energy going into improving the documentation.

With the current documentation, I get a lot of questions that I believe
reflect two fundamental problems with what we currently provide:

(1) We have a lot of contextual information in our heads about how Flink
works, and we are able to use that knowledge to make reasonable inferences
about how things (probably) work in cases we aren't so familiar with. For
example, I get a lot of questions of the form "If I use <this feature> will
I still have exactly once guarantees?" The answer is always yes, but they
continue to have doubts because we have failed to clearly communicate this
fundamental, underlying principle.

This specific example about fault tolerance applies across all of the Flink
docs, but the general idea can also be applied to the Table/SQL and PyFlink
docs. The guiding principles underlying these APIs should be written down
in one easy-to-find place.

(2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
very difficult to answer because it is frequently the case that one has to
reason about why a given feature doesn't seem to appear in the
documentation. It could be that I'm looking in the wrong place, or it could
be that someone forgot to document something, or it could be that it can in
fact be done by applying a general mechanism in a specific way that I
haven't thought of -- as in this case, where one can use a JDBC sink from
Python if one thinks to use DDL.
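
To make the DDL route concrete, here is a minimal sketch, not an official recipe. The table is declared with a CREATE TABLE statement that selects the JDBC connector, and that statement is handed to the table environment. The helper names `jdbc_sink_ddl` and `register_jdbc_sink`, the two-column schema, and the connection URL are all made up for illustration. `execute_sql` and the `'connector' = 'jdbc'` options are the ones from Flink 1.11, and the JDBC connector and driver jars are assumed to be on the classpath.

```python
def jdbc_sink_ddl(name, url, table_name):
    """Build a CREATE TABLE statement for a JDBC-backed sink table.

    The (id, total) schema is purely illustrative; a real sink would
    declare the columns of the target database table.
    """
    return f"""
        CREATE TABLE {name} (
            id BIGINT,
            total DOUBLE
        ) WITH (
            'connector' = 'jdbc',
            'url' = '{url}',
            'table-name' = '{table_name}'
        )
    """


def register_jdbc_sink(t_env, ddl):
    """Register the sink by running the DDL on a PyFlink table environment.

    How t_env is created depends on the Flink version, so it is passed in
    here rather than constructed (an assumption made to keep this short).
    """
    return t_env.execute_sql(ddl)
```

Once the sink is registered this way, it can be written to from Python like any other table, for example with an `INSERT INTO` statement passed to `execute_sql`.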

So I think it would be helpful to be explicit about both what is, and what
is not, supported in PyFlink. And to have some very clear organizing
principles in the documentation so that users can quickly learn where to
look for specific facts.

Regards,
David


On Tue, Aug 4, 2020 at 1:01 PM jincheng sun <sunjincheng...@gmail.com>
wrote:

> Hi Seth and David,
>
> I'm very happy to have your reply and suggestions. I would like to share
> my thoughts here:
>
> The main motivation for refactoring the PyFlink docs is to make sure that
> Python users can find everything they need starting from the PyFlink
> documentation main page. That is, the PyFlink documentation should have a
> catalogue that covers all the functionality available in PyFlink. However,
> this doesn't mean that we will copy content from other parts of the
> documentation. Where appropriate, an entry may be just a reference/link to
> the other documentation. For the documentation added under the PyFlink main
> page, the principle is that it should include only Python-specific content,
> rather than a copy of the Java content.
>
> >>  I'm concerned that this proposal duplicates a lot of content that will
> quickly get out of sync. It feels like it is documenting PyFlink separately
> from the rest of the project.
>
> Regarding the concerns about maintainability: as mentioned above, the goal
> of this FLIP is to provide a clear entry point to the Python API, and it
> should contain only information that is useful for Python users. The FLIP
> does list many items that overlap with the Java documentation, but that
> doesn't mean the content will be copied from the Java documentation:
>
>   - If the content is the same as the corresponding Java document, we
> will add a link to the Java document, e.g. "Built-in functions" and "SQL".
>
>   - If only part of the content is shared with Java, we will create a page
> for the Python-only content and redirect to the Java document for the
> shared part, e.g. "Connectors" and "Catalogs".
>
>   - If the document is Python-only and already exists, we will move it
> from the old Python documentation to the new one, e.g. "Configurations".
>
>   - If the document is Python-only and does not exist yet, we will create
> a new page for it, e.g. "DataTypes".
>
> The main reason we create a new page for Python Data Types is that it
> corresponds one-to-one with the Java Data Types page only conceptually; the
> actual content would be very different. Some of the differences are as
> follows:
>
>
>
>   - The text in the Java Data Types document is written for users of
> JVM-based languages, and is incomprehensible to users who only know
> Python.
>
>   - Currently the Python Data Types do not support the "bridgedTo"
> method, DataTypes.RAW, DataTypes.NULL, or User Defined Types.
>
>   - The sections "Planner Compatibility" and "Data Type Extraction" are
> only useful for Java/Scala users.
>
>   - We want to add sections that apply only to Python, such as which
> Data Types are currently supported in Python, the mapping between DataType
> and Python object type, etc.
>
> I think the root cause of this difference from the existing documents is
> that Python is the first non-JVM language we support in Flink. This means
> our previous method of sharing documents between Java and Scala may not be
> suitable for Python, so we will adopt some very different methods to
> provide documentation for Python users. Of course, we should reduce
> maintenance costs as much as possible while ensuring a good user experience.
> Furthermore, Python is the first step in Flink's multi-language support, and
> there may be R, Go, etc. in the future. It is important for us to create a
> main page for each language, so that users of each language can focus on
> the content they care about.
>
> >> Things like the cookbook and tutorial should be under the Try Flink
> section of the documentation.
>
> Regarding the position of the "Cookbook" section: as I see it, "Try
> Flink" is for new users and the "Cookbook" is for more advanced users.
> That is, "Try Flink" can hold the simplest end-to-end examples, such as
> "Hello World", while the "Cookbook" can hold use cases closer to production
> workloads, such as CDN log analysis or PV/UV for e-commerce. So I prefer to
> keep the current structure.
>
> >>  it's relatively straightforward to compare the Python API with the
> Java and Scala versions.
>
> Regarding the comparison between the Python API and the Java/Scala API, I
> think the majority of users, especially beginners, would not have this
> need. From my side, improving the experience for beginners seems a higher
> priority than supporting that comparison. Could you share more about why
> users want to compare? And how much harder would the comparison be if the
> content were spread over multiple pages? :)
>
> Thanks for all of your feedback and suggestions. Any follow-up feedback is
> welcome.
>
> Best,
>
> Jincheng
>
>
> David Anderson <da...@alpinegizmo.com> wrote on Mon, Aug 3, 2020 at 10:49 PM:
>
>> Jincheng,
>>
>> One thing that I like about the way that the documentation is currently
>> organized is that it's relatively straightforward to compare the Python API
>> with the Java and Scala versions. I'm concerned that if the PyFlink docs
>> are more independent, it will be challenging to respond to questions about
>> which features from the other APIs are available from Python.
>>
>> David
>>
>> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun <sunjincheng...@gmail.com>
>> wrote:
>>
>>> It would be great if you could join in contributing to the PyFlink
>>> documentation, @Marta!
>>> Thanks for all of the positive feedback. I will start a formal vote
>>> later...
>>>
>>> Best,
>>> Jincheng
>>>
>>>
>>> Shuiqiang Chen <acqua....@gmail.com> wrote on Mon, Aug 3, 2020 at 9:56 AM:
>>>
>>> > Hi jincheng,
>>> >
>>> > Thanks for the discussion. +1 for the FLIP.
>>> >
>>> > Well-organized documentation will greatly improve the efficiency and
>>> > experience of developers.
>>> >
>>> > Best,
>>> > Shuiqiang
>>> >
>>> > Hequn Cheng <he...@apache.org> wrote on Sat, Aug 1, 2020 at 8:42 AM:
>>> >
>>> >> Hi Jincheng,
>>> >>
>>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>>> >>
>>> >> I think this will bring big benefits for PyFlink users. Currently,
>>> >> the Python Table API documentation is hidden deep under the Table API
>>> >> & SQL tab, which makes it quite hard to read. Also, the PyFlink
>>> >> documentation is mixed with the Java/Scala documentation, so it is hard
>>> >> for users to get an overview of all the PyFlink documents. As more and
>>> >> more functionality is added to PyFlink, I think it's time for us to
>>> >> refactor the documentation.
>>> >>
>>> >> Best,
>>> >> Hequn
>>> >>
>>> >>
>>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <ma...@ververica.com>
>>> >> wrote:
>>> >>
>>> >>> Hi, Jincheng!
>>> >>>
>>> >>> Thanks for creating this detailed FLIP, it will make a big
>>> >>> difference in the experience of Python developers using Flink. I'm
>>> >>> interested in contributing to this work, so I'll reach out to you
>>> >>> offline!
>>> >>>
>>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>>> >>> it's great to see that there are already production users.
>>> >>>
>>> >>> Marta
>>> >>>
>>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang <hxbks...@gmail.com> wrote:
>>> >>>
>>> >>> > Hi Jincheng,
>>> >>> >
>>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>>> >>> >
>>> >>> > Big +1 for improving the structure of PyFlink doc.
>>> >>> >
>>> >>> > It will be much friendlier to give PyFlink users a unified entry
>>> >>> > point for learning from the PyFlink documents.
>>> >>> >
>>> >>> > Best,
>>> >>> > Xingbo
>>> >>> >
>>> >>> > Dian Fu <dian0511...@gmail.com> wrote on Fri, Jul 31, 2020 at 11:00 AM:
>>> >>> >
>>> >>> >> Hi Jincheng,
>>> >>> >>
>>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1
>>> >>> >> to improve the Python API doc.
>>> >>> >>
>>> >>> >> I have received a lot of feedback from PyFlink beginners about
>>> >>> >> the PyFlink doc, e.g. there is too little material, the Python doc
>>> >>> >> is mixed with the Java doc, and it's not easy to find the docs they
>>> >>> >> are looking for.
>>> >>> >>
>>> >>> >> I think it would greatly improve the user experience if we could
>>> >>> >> have one place that includes most of what PyFlink users should know.
>>> >>> >>
>>> >>> >> Regards,
>>> >>> >> Dian
>>> >>> >>
>>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
>>> >>> >>
>>> >>> >> Hi folks,
>>> >>> >>
>>> >>> >> Since the release of Flink 1.11, the number of PyFlink users has
>>> >>> >> continued to grow. As far as I know, many companies have used
>>> >>> >> PyFlink for data analysis and for operations and maintenance
>>> >>> >> monitoring, and have put it into production (such as 聚美优品 [1]
>>> >>> >> (Jumei), 浙江墨芷 [2] (Mozhi), etc.). According to the feedback we
>>> >>> >> have received, the current documentation is not very friendly to
>>> >>> >> PyFlink users. There are two shortcomings:
>>> >>> >>
>>> >>> >> - Python-related content is mixed into the Java/Scala
>>> >>> >> documentation, which makes it difficult to read for users who only
>>> >>> >> focus on PyFlink.
>>> >>> >> - There is already a "Python Table API" section in the Table API
>>> >>> >> documentation to hold PyFlink documents, but the number of articles
>>> >>> >> is small and the content is fragmented. It is difficult for
>>> >>> >> beginners to learn from it.
>>> >>> >>
>>> >>> >> In addition, FLIP-130 introduced the Python DataStream API. Many
>>> >>> >> documents will be added for those new APIs. In order to increase
>>> >>> >> the readability and maintainability of the PyFlink documentation,
>>> >>> >> Wei Zhong and I have discussed offline and would like to rework it
>>> >>> >> via this FLIP.
>>> >>> >>
>>> >>> >> We will rework the document around the following three objectives:
>>> >>> >>
>>> >>> >> - Add a separate section for Python API under the "Application
>>> >>> >> Development" section.
>>> >>> >> - Restructure the current Python documentation into a brand new
>>> >>> >> structure, to ensure the content is complete and friendly to
>>> >>> >> beginners.
>>> >>> >> - Improve the documents shared by Python/Java/Scala to make them
>>> >>> >> more friendly to Python users without affecting Java/Scala users.
>>> >>> >>
>>> >>> >> More detail can be found in FLIP-133:
>>> >>> >>
>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>> >>> >>
>>> >>> >> Best,
>>> >>> >> Jincheng
>>> >>> >>
>>> >>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>> >>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>>
>>> >>
>>>
>>
