[jira] [Created] (FLINK-11994) Introduce TableImpl and remove Table in flink-table-planner-blink

2019-03-21 Thread Jark Wu (JIRA)
Jark Wu created FLINK-11994:
---

 Summary: Introduce TableImpl and remove Table in 
flink-table-planner-blink
 Key: FLINK-11994
 URL: https://issues.apache.org/jira/browse/FLINK-11994
 Project: Flink
  Issue Type: New Feature
Reporter: Jark Wu
Assignee: Jark Wu


After FLINK-11068 was merged, the {{Table}} interface was added to 
{{flink-table-api-java}}. Its classpath conflicts with {{Table}} in 
{{flink-table-planner-blink}}, which results in IDE errors and some test 
failures (only locally; mvn verify looks fine). 

This issue makes {{Table}} in {{flink-table-planner-blink}} extend {{Table}} 
in {{flink-table-api-java}} and renames it to {{TableImpl}}. The method 
implementations will stay empty until {{LogicalNode}} is refactored.
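
A minimal sketch of the intended shape, with a trimmed-down stand-in for the 
real {{Table}} interface (which has many more methods); everything except the 
class names is illustrative:

{noformat}
// Stand-in for the Table interface from flink-table-api-java, trimmed to
// one method purely for illustration.
interface Table {
    Table select(String fields);
}

// TableImpl in flink-table-planner-blink implements the api-java interface;
// bodies stay empty (throwing) until the LogicalNode refactoring lands.
class TableImpl implements Table {
    @Override
    public Table select(String fields) {
        throw new UnsupportedOperationException(
            "Implemented after the LogicalNode refactoring");
    }
}
{noformat}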



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11993) Introduce partitionable filesystem sink

2019-03-21 Thread Jing Zhang (JIRA)
Jing Zhang created FLINK-11993:
--

 Summary: Introduce partitionable filesystem sink
 Key: FLINK-11993
 URL: https://issues.apache.org/jira/browse/FLINK-11993
 Project: Flink
  Issue Type: Task
  Components: API / Table SQL
Reporter: Jing Zhang


Introduce a partitionable filesystem sink:
1. Add a partition trait to the filesystem connector.
2. Allow all filesystem formats to be declared as partitioned through new DDL 
grammar.
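
A hedged sketch of what such a declaration could look like; the PARTITIONED BY 
grammar, table name, columns, and connector properties below are hypothetical, 
since the actual DDL is yet to be designed:

{noformat}
// Hypothetical DDL only; nothing in this string is implemented yet.
String ddl =
    "CREATE TABLE fs_sink (" +
    "  user_id BIGINT," +
    "  event_time TIMESTAMP," +
    "  dt STRING" +
    ") PARTITIONED BY (dt) WITH (" +
    "  'connector.type' = 'filesystem'," +
    "  'connector.path' = '/tmp/fs_sink'," +
    "  'format.type' = 'csv'" +
    ")";
// tableEnv.sqlUpdate(ddl); // once the grammar exists
{noformat}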



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Introduction of a Table API Java Expression DSL

2019-03-21 Thread Jark Wu
Hi Timo,

Sounds good to me.

Do you want to deprecate the string-based API in 1.9 or make the decision
in 1.10 after some feedback?


On Thu, 21 Mar 2019 at 21:32, Timo Walther  wrote:

> Thanks for your feedback Rong and Jark.
>
> @Jark: Yes, you are right that the string-based API is used quite a lot.
> On the other side, the potential user base in the future is still bigger
> than our current user base. Because the Table API will become equally
> important as the DataStream API, we really need to fix some crucial design
> decisions before it is too late. I would suggest introducing the new DSL
> in 1.9 and removing the Expression parser either in 1.10 or 1.11. From a
> development point of view, I think we can handle the overhead of maintaining
> 3 APIs until then because 2 APIs will share the same code base + expression
> parser.
>
> Regards,
> Timo
>
> On 21.03.19 at 05:21, Jark Wu wrote:
>
> Hi Timo,
>
> I'm +1 on the proposal. I like the idea to provide a Java DSL which is
> more friendly than the string-based approach in programming.
>
> My concern is if/when we can drop the string-based expression parser. If
> it takes a very long time, we have to pay more development
> cost on the three Table APIs. As far as I know, the string-based API is
> used in many companies.
> We should also get some feedback from users. So I'm CCing this email to
> user mailing list.
>
> Best,
> Jark
>
>
>
> On Wed, 20 Mar 2019 at 08:51, Rong Rong  wrote:
>
>> Thanks for sharing the initiative of improving Java side Table expression
>> DSL.
>>
>> I agree, as the doc states, that the Java DSL was always a "3rd class
>> citizen"
>> and we've run into many hand-holding scenarios with our Flink developers
>> trying to get the stringified syntax working.
>> Overall I am a +1 on this; it also helps reduce the development cost of the
>> Table API so that we no longer need to maintain different DSLs and
>> documentation.
>>
>> I left a few comments in the doc. and also some features that I think will
>> be beneficial to the final outcome. Please kindly take a look @Timo.
>>
>> Many thanks,
>> Rong
>>
>> On Mon, Mar 18, 2019 at 7:15 AM Timo Walther  wrote:
>>
>> > Hi everyone,
>> >
>> > some of you might have already noticed the JIRA issue that I opened
>> > recently [1] about introducing a proper Java expression DSL for the
>> > Table API. Instead of using string-based expressions, we should aim for
>> > a unified, maintainable, programmatic Java DSL.
>> >
>> > Some background: The Blink merging efforts and the big refactorings as
>> > part of FLIP-32 have revealed many shortcomings in the current Table &
>> > SQL API design. Most of these legacy issues cause problems nowadays in
>> > making the Table API a first-class API next to the DataStream API. An
>> > example is the ExpressionParser class[2]. It was implemented in the
>> > early days of the Table API using Scala parser combinators. During the
>> > last years, this parser caused many JIRA issues and user confusion on
>> > the mailing list, because the exceptions and syntax might not be
>> > straightforward.
>> >
>> > For FLINK-11908, we added a temporary bridge instead of reimplementing
>> > the parser in Java for FLIP-32. However, this is only an intermediate
>> > solution until we make a final decision.
>> >
>> > I would like to propose a new, parser-free version of the Java Table
>> API:
>> >
>> >
>> >
>> https://docs.google.com/document/d/1r3bfR9R6q5Km0wXKcnhfig2XQ4aMiLG5h2MTx960Fg8/edit?usp=sharing
>> >
>> > I already implemented an early prototype that shows that such a DSL is
>> > not much implementation effort and integrates nicely with all existing
>> > API methods.
>> >
>> > What do you think?
>> >
>> > Thanks for your feedback,
>> >
>> > Timo
>> >
>> > [1] https://issues.apache.org/jira/browse/FLINK-11890
>> >
>> > [2]
>> >
>> >
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/expressions/PlannerExpressionParserImpl.scala
>> >
>> >
>>
>
>
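
To make the contrast concrete, a self-contained toy sketch of the two styles;
Expr, $, plus, and as below are made-up stand-ins, not names from the actual
proposal document:

// Illustrative only: a tiny fluent builder showing the shape of a
// parser-free DSL, next to today's string-based style.
final class Expr {
    private final String repr;
    private Expr(String repr) { this.repr = repr; }
    static Expr $(String field) { return new Expr(field); }
    Expr plus(int v) { return new Expr("(" + repr + " + " + v + ")"); }
    Expr as(String alias) { return new Expr(repr + " AS " + alias); }
    @Override public String toString() { return repr; }
}

class DslSketch {
    public static void main(String[] args) {
        // Today: one opaque string, validated only at runtime by the parser.
        String stringBased = "a + 1 as b";
        // Sketched DSL: built and checked by the Java compiler instead.
        Expr dslBased = Expr.$("a").plus(1).as("b");
        System.out.println(stringBased + "  ~  " + dslBased);
    }
}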


Re: [DISCUSS] Reorganizing Table-related Jira components some more

2019-03-21 Thread Jark Wu
+1 to Timo's proposal.

Best,
Jark

On Fri, 22 Mar 2019 at 07:42, Kurt Young  wrote:

> +1 to Timo's proposal.
>
> Timo Walther wrote on Thursday, 21 March 2019 at 21:40:
>
> > Hi everyone,
> >
> > I also tried to summarize the previous discussion and would add an
> > additional `Ecosystem` component. I would suggest:
> >
> > Table SQL / API
> > Table SQL / Client
> > Table SQL / Legacy Planner
> > Table SQL / Planner
> > Table SQL / Runtime
> > Table SQL / Ecosystem (such as table connectors, formats, Hive catalog
> > etc.)
> >
> > This should make everyone happy, no?
> >
> > Thanks for proposing this Aljoscha. Big +1.
> >
> > Regards,
> > Timo
> >
> > On 21.03.19 at 14:31, Aljoscha Krettek wrote:
> > > Cool, I like this. I have one last suggestion. How about this:
> > >
> > > Table SQL / API
> > > Table SQL / Client
> > > Table SQL / Classic Planner (or Legacy Planner): Flink Table SQL
> runtime
> > and plan translation.
> > > Table SQL / Planner: plan-related for new Blink-based Table SQL runner.
> > > Table SQL / Runtime: runtime-related for new Blink-based Table SQL
> > runner.
> > >
> > > It’s Jark’s version but I renamed "Table SQL / Operators" to “Table SQL
> > / Runtime", because it is not only operators but all the supporting code
> > around that which is needed at, well, runtime. ;-)
> > >
> > > What do you think?
> > >
> > > Best,
> > > Aljoscha
> > >
> > >
> > >> On 21. Mar 2019, at 03:52, Jark Wu  wrote:
> > >>
> > >> +1 to Kurt's proposal which removes the "API" prefix and adds a table
> > operator component.
> > >>
> > >> On the other hand, I think it's worth distinguishing Blink SQL issues
> > and  Flink SQL issues via the component name. Currently, it's hard to
> > distinguish.
> > >>
> > >> How about:
> > >>
> > >> Table SQL / API
> > >> Table SQL / Client
> > >> Table SQL / Legacy Planner: Flink Table SQL runtime and plan
> > translation.
> > >> Table SQL / New Planner: plan-related for new Blink-based Table SQL
> > runner.
> > >> Table SQL / Operators: runtime-related for new Blink-based Table SQL
> > runner.
> > >>
> > >> Once blink merge is done, we can combine "Table SQL / Legacy Planner"
> > and "Table SQL / New Planner" into Table SQL / Planner".
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >>
> > >> On Thu, 21 Mar 2019 at 10:21, Kurt Young <ykt...@gmail.com> wrote:
> > >> Hi Aljoscha,
> > >>
> > >> +1 to further separating table-related Jira components, but I would
> prefer
> > to
> > >> move "Runtime / Operators" to a dedicated "Table SQL / Operators".
> > >> There is one concern about the "classic planner" and "new planner",
> the
> > >> naming will be inaccurate after the blink merge is done and we deprecate
> > the classic
> > >> planner later (if it happens).
> > >> If only one planner left, then what component should we use when
> > creating
> > >> jira?
> > >>
> > >> How about this:
> > >> Table SQL / API
> > >> Table SQL / Client
> > >> Table SQL / Planner
> > >> Table SQL / Operators
> > >>
> > >> Best,
> > >> Kurt
> > >>
> > >>
> > >> On Thu, Mar 21, 2019 at 12:39 AM Aljoscha Krettek <aljos...@apache.org>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> First of all, I hope I cc’ed all the relevant people. Sorry if I
> forgot
> > >>> anyone.
> > >>>
> > >>> I would like to restructure the Table/SQL-related Jira components a
> bit
> > >>> more to better reflect the current state of components. Right now we
> > have:
> > >>>
> > >>> * API / Table SQL: this is just a wild collection of table-related
> > things
> > >>> * Runtime / Operators: this has general operators stuff, but also new
> > >>> Blink-based Table operator stuff and maybe classic Table runner stuff
> > >>> * SQL / Client: as it says
> > >>> * SQL / Planner: this has issues for the existing classic Flink Table
> > >>> runner and new things related to merging of the new Blink-based Table
> > Runner
> > >>>
> > >>> I would suggest to reorganise it like this:
> > >>>
> > >>> * API / Table SQL: API-related things
> > >>> * API / Table SQL / Client: the SQL client
> > >>> * API / Table SQL / Classic Planner: things related to classic Flink
> > Table
> > >>> API runtime and plan translation, everything to do with execution
> > >>> * API / Table SQL / New Planner: runtime operators, translation,
> > >>> everything really, for the new Blink-based Table API/SQL runner
> > >>>
> > >>> Runtime / Operators would be used purely for non-table-related
> > >>> operator/runtime stuff.
> > >>>
> > >>> What do you think? “Classic Planner” and “New Planner” are up for
> > >>> discussion.  We could even get rid of the API prefix, it doesn’t
> > really
> > >>> do much, I think.
> > >>>
> > >>> Best,
> > >>> Aljoscha
> > >
> >
> > --
> Best,
> Kurt
>


Re: [DISCUSS] Improvement to Flink Window Operator with Slicing

2019-03-21 Thread Rong Rong
Hi Devs,

Thank you all for the valuable feedback and comments on the previous
design doc.
We have currently created an initial design/work plan based on @Kurt's
suggestions as "improvements with the sliding window scenario without
changing/adding new public APIs".

Please kindly take a look at the initial design document here [1]. Any
comments or suggestions are highly appreciated!

Thanks,
Rong

--

[1]
https://docs.google.com/document/d/1CvjPJl1Fm1PCpsuuZ4Qc-p_iUzUosBePX_rWNUt8lRw/edit#

On Thu, Feb 28, 2019 at 2:24 PM Rong Rong  wrote:

> Hi Kurt,
>
> Thanks for the valuable feedback. I think the suggestions you and Jincheng
> provided are definitely the best execution plan:
>
> - Starting with sliding window optimization by exhausting the current
> public API, there are some components we can leverage or directly reuse
> from Blink's window operator [1] implementation.
> - Backward compatibility is definitely an issue as any state changes will
> probably result in a non-trivial state upgrade. I can definitely follow up
> with you on this.
>
> At the same time I think it is also a good idea to summarize all the use
> cases that have been discussed so far. This can be very valuable as a
> reference: to answer the questions "Do the improvements cover
> most use cases?" and, when not covered, "Can introducing some new API
> meet the requirements?".
>
> I will try to convert the current parent JIRA [2] into one that does not
> include public API alteration. As you mentioned, this will be a much more
> practical way in terms of execution.
>
> Many thanks for the suggestions and guidance!
>
> --
> Rong
>
> [1]
> https://github.com/apache/flink/blob/blink/flink-libraries/flink-table/src/main/java/org/apache/flink/table/runtime/window/WindowOperator.java
> [2] https://issues.apache.org/jira/browse/FLINK-11454
>
> On Mon, Feb 25, 2019 at 3:44 AM Kurt Young  wrote:
>
>> Hi Rong,
>>
>> Thanks for the detailed summarization! It indeed involves lots of problems
>> and unanswered questions which are, I think, not practical to
>> solve in one shot. From my point of view, the performance issue with the
>> sliding window is the root one and probably the one most
>> users will run into. Thus I think the first question to be
>> answered is:
>> "What kind of improvements we can do with the sliding window scenario?"
>>
>> First, we should try to figure out how many improvements can be done
>> without changing or adding new API. This can reuse the POC you did
>> and the work Blink has done; we can share some ideas and maybe some
>> implementation details. And it already involves lots of effort
>> even if we only do this one thing. It may introduce some refactoring of
>> the current window operator, and we should keep it compatible with old
>> versions.
>>
>> After this, we can release it in the next version and gather some user
>> feedback. We can further answer the question: "Do the improvements
>> cover
>> most use cases? Are there any critical ones which are impossible to do with
>> the current window operator?". At that time, we can open the discussion on
>> introducing some new API to meet the requirements.
>>
>> It will introduce more work than improving the window operator internally
>> when we
>> decide to add new APIs, which you have covered a lot in your proposal.
>> Actually, the approaches you proposed look good to me; taking it step by
>> step is a more practical way.
>>
>> Best,
>> Kurt
>>
>>
>> On Fri, Feb 22, 2019 at 2:58 AM Rong Rong  wrote:
>>
>> > Hi All,
>> >
>> > Thanks for sharing feedback on the window optimization design doc and on
>> > the discussion JIRAs @Jincheng, @Kurt, @Jark and @Fabian. These are very
>> > valuable and we will try to incorporate them in the next step.
>> >
>> > There were several revisions done to the current design doc, and several
>> > POCs developed since we initially shared the document. Thus, some of
>> > the following might've already been addressed in other places. However, I
>> > would still like to summarize them since these points were raised in
>> > various scenarios.
>> >
>> > 1) Scope of the window optimization using slicing.
>> >
>> > The original scope of the doc was to address the problem of
>> > element-to-window duplication when a sliding window spans a wide range
>> > with narrow slides (with or without efficient partial aggregations).
>> However,
>> > as we developed the 2 POCs [1,2] we found out there’s really no
>> > one-solution-fits-all approach to how this can be optimized (same
>> > observation as Kurt and Jark mentioned). Thus some further expansion of
>> the
>> > scope was done:
>> >
>> > 1a). Directly improving WindowOperator([1, 3])
>> >
>> > In the design doc, the PartialWindowFunction was designed as a cue for
>> > WindowOperator to choose how to optimize the sliding window. This was
>> not
>> > working well because how efficiently the window operator processes
>> > messages depends on: 1. the pattern of the window; 2. the pattern 
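
To make the element-to-window duplication concrete: with window size W and
slide S, plain sliding assignment writes each element into W/S overlapping
windows, while slicing writes it into exactly one slide-aligned slice. A
minimal sketch of the arithmetic, assuming standard sliding-window semantics
rather than either POC's actual code:

public class SlicingSketch {
    public static void main(String[] args) {
        long sizeMs = 60 * 60 * 1000L; // 1 hour window
        long slideMs = 60 * 1000L;     // 1 minute slide
        long ts = 1_553_212_800_123L;  // some event timestamp

        // Plain sliding windows: one state update per overlapping window.
        long windowsPerElement = sizeMs / slideMs; // 60 writes per element

        // Slicing: a single slide-aligned slice; full windows are composed
        // later from size/slide pre-aggregated slices.
        long sliceStart = ts - (ts % slideMs);     // 1 write per element

        System.out.println(windowsPerElement
            + " window writes vs. 1 slice write at " + sliceStart);
    }
}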

Re: [DISCUSS] Flink Kerberos Improvement

2019-03-21 Thread Rong Rong
Hi Tao,

Thanks for the comments and suggestions. Yes, I agree that the security
improvements, if designed properly, should apply to other cluster management
systems as well.

I am not very familiar with the K8s security setup, but most of the changes
we propose should be generic enough to apply to all resource management
systems.
Please kindly take a look at one implementation [1] of another
design initiative [2] we had. It would be great if you can provide any
additional comments or suggestions on that design doc as well.

Many Thanks,
Rong

--

[1] https://issues.apache.org/jira/browse/FLINK-11589
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html

On Thu, Mar 21, 2019 at 7:00 AM Tao Yang (杨弢) wrote:

>
> Hi, all!
> We have met some similar security requirements and did some investigation
> into security strategies. The third strategy (AM keytab distributed via YARN;
> AM regenerates delegation tokens for containers) mentioned in the YARN
> security doc is already used by Spark 1.5+, and we quite agree that it's
> necessary to support it in Flink. Moreover, we would like to see that the
> security improvements in Flink can be properly applied to other resource
> management systems like k8s etc. (BTW, we have done some work to let Flink
> applications natively run on a k8s cluster). We are going to do some work on
> this and hope it can help in finding a more generic solution. Thanks!
> Tao Yang
>
>
> --
> From: Rong Rong
> Sent: December 19, 2018 (Wednesday) 03:06
> To: dev
> Subject: Re: [DISCUSS] Flink Kerberos Improvement
>
> Hi Shuyi,
>
> Yes. I think the impersonation is a very valid question! This can
> actually be considered as 2 questions as I stated in the doc.
> 1. In the doc I stated that impersonation should be implemented on the
> user-side code and should only invoke the cluster client as the actual user
> 'joe'.
> 2. However, since currently the cluster client assumes no impersonation at
> all, many of the code assumes that a fully authorized client can be
> instantiated with the same authority that the actual Flink cluster has.
> When impersonation is enabled, this might not be the case. For example, if
> impersonation is in place, most likely the cluster client running on joe's
> behalf will not, and should not have access to keytab file of 'joe'.
> Instead, a delegation token is used. Thus the second part of the doc is
> trying to address this issue.
>
> --
> Rong
>
> On Mon, Dec 17, 2018 at 11:41 PM Shuyi Chen  wrote:
>
> > Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the
> keytab
> > is located in a remote DFS. Pre-installing Keytabs statically in YARN
> node
> > local filesystem is a common approach, so I think we should support this
> > mode in Flink natively. As an optimization to reduce the KDC access
> > frequency, we should also support method 3 (the DT approach) as discussed
> > in [1]. A question is: why do we need to implement impersonation in
> > Flink? I assume the superuser can do the impersonation for 'joe' and
> 'joe'
> > can then invoke Flink client to deploy the job. Thanks a lot.
> >
> > Shuyi
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
> >
> > On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:
> >
> > > Hi All,
> > >
> > > We have been experimenting integration of Kerberos with Flink in our
> Corp
> > > environment and found out some limitations on the current
> Flink-Kerberos
> > > security mechanism running with Apache YARN.
> > >
> > > Based on the Hadoop Kerberos security guide [1], apparently only a
> > > subset of the suggested long-running service security mechanisms is
> > > supported in Flink. Furthermore, the current model does not work well
> > with
> > > superuser impersonating actual users [2] for deployment purposes, which
> > is
> > > a widely adopted way to launch applications in corp environments.
> > >
> > > We would like to propose an improvement [3] to introduce the other
> > common
> > > methods [1] for securing long-running applications on YARN and enable
> > > impersonation mode. Any comments and suggestions are highly
> appreciated.
> > >
> > > Many thanks,
> > > Rong
> > >
> > > [1]
> > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> > > [2]
> > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> > > [3]
> > >
> > >
> >
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
> > >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>
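
For context on the impersonation pattern discussed above, a minimal sketch
using Hadoop's standard proxy-user API; the principal, keytab path, and the
doAs body are illustrative assumptions, and the Flink-side integration is
exactly what the proposal [3] addresses:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        // The superuser authenticates with its own keytab; 'joe' never
        // hands over credentials. Principal and path are made up.
        UserGroupInformation.loginUserFromKeytab(
            "superuser@EXAMPLE.COM", "/etc/security/keytabs/superuser.keytab");

        // Proxy user: authorized through Hadoop's hadoop.proxyuser.*
        // configuration rather than through joe's own keytab.
        UserGroupInformation joe = UserGroupInformation.createProxyUser(
            "joe", UserGroupInformation.getLoginUser());

        joe.doAs((PrivilegedExceptionAction<Void>) () -> {
            // A cluster client would be invoked here on joe's behalf;
            // joe's downstream access relies on delegation tokens.
            return null;
        });
    }
}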


Re: [DISCUSS] Reorganizing Table-related Jira components some more

2019-03-21 Thread Kurt Young
+1 to Timo's proposal.

Timo Walther wrote on Thursday, 21 March 2019 at 21:40:

> Hi everyone,
>
> I also tried to summarize the previous discussion and would add an
> additional `Ecosystem` component. I would suggest:
>
> Table SQL / API
> Table SQL / Client
> Table SQL / Legacy Planner
> Table SQL / Planner
> Table SQL / Runtime
> Table SQL / Ecosystem (such as table connectors, formats, Hive catalog
> etc.)
>
> This should make everyone happy, no?
>
> Thanks for proposing this Aljoscha. Big +1.
>
> Regards,
> Timo
>
> On 21.03.19 at 14:31, Aljoscha Krettek wrote:
> > Cool, I like this. I have one last suggestion. How about this:
> >
> > Table SQL / API
> > Table SQL / Client
> > Table SQL / Classic Planner (or Legacy Planner): Flink Table SQL runtime
> and plan translation.
> > Table SQL / Planner: plan-related for new Blink-based Table SQL runner.
> > Table SQL / Runtime: runtime-related for new Blink-based Table SQL
> runner.
> >
> > It’s Jark’s version but I renamed "Table SQL / Operators" to “Table SQL
> / Runtime", because it is not only operators but all the supporting code
> around that which is needed at, well, runtime. ;-)
> >
> > What do you think?
> >
> > Best,
> > Aljoscha
> >
> >
> >> On 21. Mar 2019, at 03:52, Jark Wu  wrote:
> >>
> >> +1 to Kurt's proposal which removes the "API" prefix and adds a table
> operator component.
> >>
> >> On the other hand, I think it's worth distinguishing Blink SQL issues
> and  Flink SQL issues via the component name. Currently, it's hard to
> distinguish.
> >>
> >> How about:
> >>
> >> Table SQL / API
> >> Table SQL / Client
> >> Table SQL / Legacy Planner: Flink Table SQL runtime and plan
> translation.
> >> Table SQL / New Planner: plan-related for new Blink-based Table SQL
> runner.
> >> Table SQL / Operators: runtime-related for new Blink-based Table SQL
> runner.
> >>
> >> Once blink merge is done, we can combine "Table SQL / Legacy Planner"
> and "Table SQL / New Planner" into Table SQL / Planner".
> >>
> >> Best,
> >> Jark
> >>
> >>
> >> On Thu, 21 Mar 2019 at 10:21, Kurt Young <ykt...@gmail.com> wrote:
> >> Hi Aljoscha,
> >>
> >> +1 to further separating table-related Jira components, but I would prefer
> to
> >> move "Runtime / Operators" to a dedicated "Table SQL / Operators".
> >> There is one concern about the "classic planner" and "new planner", the
> >> naming will be inaccurate after the blink merge is done and we deprecate
> the classic
> >> planner later (if it happens).
> >> If only one planner left, then what component should we use when
> creating
> >> jira?
> >>
> >> How about this:
> >> Table SQL / API
> >> Table SQL / Client
> >> Table SQL / Planner
> >> Table SQL / Operators
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Thu, Mar 21, 2019 at 12:39 AM Aljoscha Krettek <aljos...@apache.org>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> First of all, I hope I cc’ed all the relevant people. Sorry if I forgot
> >>> anyone.
> >>>
> >>> I would like to restructure the Table/SQL-related Jira components a bit
> >>> more to better reflect the current state of components. Right now we
> have:
> >>>
> >>> * API / Table SQL: this is just a wild collection of table-related
> things
> >>> * Runtime / Operators: this has general operators stuff, but also new
> >>> Blink-based Table operator stuff and maybe classic Table runner stuff
> >>> * SQL / Client: as it says
> >>> * SQL / Planner: this has issues for the existing classic Flink Table
> >>> runner and new things related to merging of the new Blink-based Table
> Runner
> >>>
> >>> I would suggest to reorganise it like this:
> >>>
> >>> * API / Table SQL: API-related things
> >>> * API / Table SQL / Client: the SQL client
> >>> * API / Table SQL / Classic Planner: things related to classic Flink
> Table
> >>> API runtime and plan translation, everything to do with execution
> >>> * API / Table SQL / New Planner: runtime operators, translation,
> >>> everything really, for the new Blink-based Table API/SQL runner
> >>>
> >>> Runtime / Operators would be used purely for non-table-related
> >>> operator/runtime stuff.
> >>>
> >>> What do you think? “Classic Planner” and “New Planner” are up for
> >>> discussion.  We could even get rid of the API prefix, it doesn’t
> really
> >>> do much, I think.
> >>>
> >>> Best,
> >>> Aljoscha
> >
>
> --
Best,
Kurt


Re: [DISCUSS] Create a Flink ecosystem website

2019-03-21 Thread Becket Qin
Thanks for the update Robert! Looking forward to the prototype!

On Thu, Mar 21, 2019 at 10:07 PM Robert Metzger  wrote:

> Quick summary of our call:
> Daryl will soon start with a front end, build against a very simple
> mock-backend.
> Congxian will start implementing the Spring-based backend early April.
>
> As soon as the first prototype of the UI is ready, we'll share it here for
> feedback.
>
> On Thu, Mar 21, 2019 at 10:08 AM Robert Metzger 
> wrote:
>
> > Okay, great.
> >
> > Congxian Qiu, Daryl and I have a kick-off call later today at 2pm CET,
> 9pm
> > China time about the design of the ecosystem page (see:
> > https://github.com/rmetzger/flink-community-tools/issues/4)
> > Please let me know if others want to join as well, I can add them to the
> > invite.
> >
> > On Wed, Mar 20, 2019 at 4:10 AM Becket Qin  wrote:
> >
> >> I agree. We can start with English-only and see how it goes. The
> comments
> >> and descriptions can always be multi-lingual but that is up to the
> package
> >> owners.
> >>
> >> On Tue, Mar 19, 2019 at 6:07 PM Robert Metzger 
> >> wrote:
> >>
> >>> Thanks.
> >>>
> >>> Do we actually want this page to be multi-language?
> >>>
> >>> I propose to make the website English-only, but maybe consider allowing
> >>> comments in different languages.
> >>> If we made it multi-language, then we might have problems with
> >>> people submitting packages in non-English languages.
> >>>
> >>>
> >>>
> >>> On Tue, Mar 19, 2019 at 2:42 AM Becket Qin 
> wrote:
> >>>
>  Done. The writeup looks great!
> 
>  On Mon, Mar 18, 2019 at 9:09 PM Robert Metzger 
>  wrote:
> 
> > Nice, really good news on the INFRA front!
> > I think the hardware specs sound reasonable. And a periodic backup of
> > the website's database to Infra's backup solution sounds reasonable
> too.
> >
> > Can you accept and review my proposal for the website?
> >
> >
> > On Sat, Mar 16, 2019 at 3:47 PM Becket Qin 
> > wrote:
> >
> >> >
> >> > I have a very capable and motivated frontend developer who would
> be
> >> > willing to implement what I've mocked in my proposal.
> >>
> >>
> >> That is awesome!
> >>
> >> I created a Jira ticket [1] to Apache Infra and got a reply. It
> >> looks like the
> >> Apache infra team could provide a decent VM. The last piece is how
> to
> >> ensure the data is persisted so we won't lose the project info /
> user
> >> feedback when the VM is down. If Apache infra does not provide a
> >> persistent storage for DB backup, we can always ask for multiple VMs
> >> and do
> >> the fault tolerance by ourselves. It seems we can almost say the
> >> hardware
> >> side is also ready.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> [1] https://issues.apache.org/jira/browse/INFRA-18010
> >>
> >> On Fri, Mar 15, 2019 at 5:39 PM Robert Metzger  >
> >> wrote:
> >>
> >> > Thank you for reaching out to Infra and the ember client.
> >> > When I first saw the Ember repository, I thought it was the whole
> >> thing
> >> > (frontend and backend), but while testing it, I realized it is
> >> "only" the
> >> > frontend. I'm not sure if it makes sense to adjust the Ember
> >> observer
> >> > client, or just write a simple UI from scratch.
> >> > I have a very capable and motivated frontend developer who would
> be
> >> > willing to implement what I've mocked in my proposal.
> >> > In addition, I found somebody (Congxian Qiu) who seems to be eager
> >> to help
> >> > with this project for the backend:
> >> > https://github.com/rmetzger/flink-community-tools/issues/4
> >> >
> >> > For Infra: I made the same experience when asking for more GitHub
> >> > permissions for "flinkbot": They didn't respond on their mailing
> >> list, only
> >> > on Jira.
> >> >
> >> >
> >> >
> >> > On Thu, Mar 14, 2019 at 2:45 PM Becket Qin 
> >> wrote:
> >> >
> >> >> Thanks for writing up the specifications.
> >> >>
> >> >> Regarding the website source code, Austin found a website[1]
> whose
> >> >> frontend code[2] is available publicly. It lacks some support
> (e.g
> >> login),
> >> >> but it is still a good starting point. One thing is that I did
> not
> >> find a
> >> >> License statement for that source code. I'll reach out to the
> >> author to see
> >> >> if they have any concern over our usage.
> >> >>
> >> >> Apache Infra has not replied to my email regarding some details
> >> about the
> >> >> VM. I'll open an infra Jira ticket tomorrow if there is still no
> >> response.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Jiangjie (Becket) Qin
> >> >>
> >> >> [1] https://emberobserver.com/
> >> >> [2] https://github.com/emberobserver/client
> >> >>
> >> >>
> >> >>
> 

[VOTE] Release 1.8.0, release candidate #4

2019-03-21 Thread Aljoscha Krettek
Hi everyone,
Please review and vote on the release candidate 4 for Flink 1.8.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be 
deployed to dist.apache.org [2], which are signed with the key with fingerprint 
F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.8.0-rc4" [5],
* website pull request listing the new release [6]
* website pull request adding announcement blog post [7].

The vote will be open for at least 72 hours. It is adopted by majority 
approval, with at least 3 PMC affirmative votes.

Thanks,
Aljoscha

[1] 
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc4/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1215
[5] 
https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=c650befc10c8bb6cc4b007ae250b7b2173046145
[6] https://github.com/apache/flink-web/pull/180 

[7] https://github.com/apache/flink-web/pull/179 


P.S. The difference to the previous RCs is small, you can fetch the tags and do 
a "git log release-1.8.0-rc1..release-1.8.0-rc4” to see the difference in 
commits. Its fixes for the issues that led to the cancellation of the previous 
RCs plus smaller fixes. Most verification/testing that was carried out should 
apply as is to this RC. Any functional verification that you did on previous 
RCs should therefore easily carry over to this one.

Re: [REMINDER] Flink Forward San Francisco in a few days

2019-03-21 Thread Robert Metzger
I would like to add that the organizers of the conference have agreed to
offer all Apache committers (of any Apache project) a free ticket.
*To get your free ticket, use the "ASFCommitters19" promo code AND use your
@apache.org email when registering.*

Feel free to reach out to me directly if you have any questions!

I'm looking forward to seeing as many Apache committers as possible at the
conference, to discuss potential inter-project collaboration, learn from
each other, ...




On Wed, Mar 20, 2019 at 11:03 AM Fabian Hueske  wrote:

> Hi everyone,
>
> *Flink Forward San Francisco 2019 will take place in a few days on April
> 1st and 2nd.*
> If you haven't done so already and are planning to attend, you should
> register soon at:
>
> -> https://sf-2019.flink-forward.org/register
>
> Don't forget to use the 25% discount code *MailingList* for mailing list
> subscribers.
>
> If you are still undecided, check out the conference program [1] of
> exciting talks by speakers from Airbnb, Google, Lyft, Netflix, Splunk,
> Streamlio, Uber, Yelp, and Alibaba.
>
> Hope to see you there,
> Fabian
>
> [1] https://sf-2019.flink-forward.org/conference-program
>


Re: [DISCUSS] Create a Flink ecosystem website

2019-03-21 Thread Robert Metzger
Quick summary of our call:
Daryl will soon start with a front end, build against a very simple
mock-backend.
Congxian will start implementing the Spring-based backend early April.

As soon as the first prototype of the UI is ready, we'll share it here for
feedback.

On Thu, Mar 21, 2019 at 10:08 AM Robert Metzger  wrote:

> Okay, great.
>
> Congxian Qiu, Daryl and I have a kick-off call later today at 2pm CET, 9pm
> China time about the design of the ecosystem page (see:
> https://github.com/rmetzger/flink-community-tools/issues/4)
> Please let me know if others want to join as well, I can add them to the
> invite.
>
> On Wed, Mar 20, 2019 at 4:10 AM Becket Qin  wrote:
>
>> I agree. We can start with English-only and see how it goes. The comments
>> and descriptions can always be multi-lingual but that is up to the package
>> owners.
>>
>> On Tue, Mar 19, 2019 at 6:07 PM Robert Metzger 
>> wrote:
>>
>>> Thanks.
>>>
>>> Do we actually want this page to be multi-language?
>>>
>>> I propose to make the website English-only, but maybe consider allowing
>>> comments in different languages.
>>> If we made it multi-language, then we might have problems with
>>> people submitting packages in non-English languages.
>>>
>>>
>>>
>>> On Tue, Mar 19, 2019 at 2:42 AM Becket Qin  wrote:
>>>
 Done. The writeup looks great!

 On Mon, Mar 18, 2019 at 9:09 PM Robert Metzger 
 wrote:

> Nice, really good news on the INFRA front!
> I think the hardware specs sound reasonable. And a periodic backup of
> the website's database to Infra's backup solution sounds reasonable too.
>
> Can you accept and review my proposal for the website?
>
>
> On Sat, Mar 16, 2019 at 3:47 PM Becket Qin 
> wrote:
>
>> >
>> > I have a very capable and motivated frontend developer who would be
>> > willing to implement what I've mocked in my proposal.
>>
>>
>> That is awesome!
>>
>> I created a Jira ticket [1] to Apache Infra and got a reply. It
>> looks like the
>> Apache infra team could provide a decent VM. The last piece is how to
>> ensure the data is persisted so we won't lose the project info / user
>> feedback when the VM is down. If Apache infra does not provide a
>> persistent storage for DB backup, we can always ask for multiple VMs
>> and do
>> the fault tolerance by ourselves. It seems we can almost say the
>> hardware
>> side is also ready.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> [1] https://issues.apache.org/jira/browse/INFRA-18010
>>
>> On Fri, Mar 15, 2019 at 5:39 PM Robert Metzger 
>> wrote:
>>
>> > Thank you for reaching out to Infra and the ember client.
>> > When I first saw the Ember repository, I thought it was the whole
>> thing
>> > (frontend and backend), but while testing it, I realized it is
>> "only" the
>> > frontend. I'm not sure if it makes sense to adjust the Ember
>> observer
>> > client, or just write a simple UI from scratch.
>> > I have a very capable and motivated frontend developer who would be
>> > willing to implement what I've mocked in my proposal.
>> > In addition, I found somebody (Congxian Qiu) who seems to be eager
>> to help
>> > with this project for the backend:
>> > https://github.com/rmetzger/flink-community-tools/issues/4
>> >
>> > For Infra: I made the same experience when asking for more GitHub
>> > permissions for "flinkbot": They didn't respond on their mailing
>> list, only
>> > on Jira.
>> >
>> >
>> >
>> > On Thu, Mar 14, 2019 at 2:45 PM Becket Qin 
>> wrote:
>> >
>> >> Thanks for writing up the specifications.
>> >>
>> >> Regarding the website source code, Austin found a website[1] whose
>> >> frontend code[2] is available publicly. It lacks some support (e.g
>> login),
>> >> but it is still a good starting point. One thing is that I did not
>> find a
>> >> License statement for that source code. I'll reach out to the
>> author to see
>> >> if they have any concern over our usage.
>> >>
>> >> Apache Infra has not replied to my email regarding some details
>> about the
>> >> VM. I'll open an infra Jira ticket tomorrow if there is still no
>> response.
>> >>
>> >> Thanks,
>> >>
>> >> Jiangjie (Becket) Qin
>> >>
>> >> [1] https://emberobserver.com/
>> >> [2] https://github.com/emberobserver/client
>> >>
>> >>
>> >>
>> >> On Thu, Mar 14, 2019 at 1:35 AM Robert Metzger <
>> rmetz...@apache.org>
>> >> wrote:
>> >>
>> >>> @Bowen: I agree. Confluent Hub looks nicer, but it is on their
>> company
>> >>> website. I guess the likelihood that they give out code from
>> their company
>> >>> website is fairly low.
>> >>> @Nils: Beam's page is similar to our Ecosystem page, which we'll
>> >>> 

Re: [DISCUSS] Flink Kerberos Improvement

2019-03-21 Thread Tao Yang (杨弢)

Hi, all!
We have met some similar security requirements and did some investigation into 
security strategies. The third strategy (AM keytab distributed via YARN; AM 
regenerates delegation tokens for containers) mentioned in the YARN security doc 
is already used by Spark 1.5+, and we quite agree that it's necessary to 
support it in Flink. Moreover, we would like to see that the security improvements in 
Flink can be properly applied to other resource management systems like k8s 
etc. (BTW, we have done some work to let Flink applications natively run on a k8s 
cluster). We are going to do some work on this and hope it can help in finding 
a more generic solution. Thanks!
Tao Yang


--
From: Rong Rong
Sent: December 19, 2018 (Wednesday) 03:06
To: dev
Subject: Re: [DISCUSS] Flink Kerberos Improvement

Hi Shuyi,

Yes. I think the impersonation is a very valid question! This can
actually be considered as 2 questions as I stated in the doc.
1. In the doc I stated that impersonation should be implemented on the
user-side code and should only invoke the cluster client as the actual user
'joe'.
2. However, since currently the cluster client assumes no impersonation at
all, many of the code assumes that a fully authorized client can be
instantiated with the same authority that the actual Flink cluster has.
When impersonation is enabled, this might not be the case. For example, if
impersonation is in place, most likely the cluster client running on joe's
behalf will not, and should not have access to keytab file of 'joe'.
Instead, a delegation token is used. Thus the second part of the doc is
trying to address this issue.

--
Rong

On Mon, Dec 17, 2018 at 11:41 PM Shuyi Chen  wrote:

> Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the keytab
> is located in a remote DFS. Pre-installing Keytabs statically in YARN node
> local filesystem is a common approach, so I think we should support this
> mode in Flink natively. As an optimization to reduce the KDC access
> frequency, we should also support method 3 (the DT approach) as discussed
> in [1]. A question is: why do we need to implement impersonation in
> Flink? I assume the superuser can do the impersonation for 'joe' and 'joe'
> can then invoke Flink client to deploy the job. Thanks a lot.
>
> Shuyi
>
> [1]
>
> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
>
> On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:
>
> > Hi All,
> >
> > We have been experimenting integration of Kerberos with Flink in our Corp
> > environment and found out some limitations on the current Flink-Kerberos
> > security mechanism running with Apache YARN.
> >
> > Based on the Hadoop Kerberos security guide [1], apparently only a
> > subset of the suggested long-running service security mechanisms is
> > supported in Flink. Furthermore, the current model does not work well
> with
> > superuser impersonating actual users [2] for deployment purposes, which
> is
> > a widely adopted way to launch applications in corp environments.
> >
> > We would like to propose an improvement [3] to introduce the other
> common
> > methods [1] for securing long-running applications on YARN and enable
> > impersonation mode. Any comments and suggestions are highly appreciated.
> >
> > Many thanks,
> > Rong
> >
> > [1]
> >
> >
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> > [2]
> >
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> > [3]
> >
> >
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
> >
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>


Re: [DISCUSS] Reorganizing Table-related Jira components some more

2019-03-21 Thread Timo Walther

Hi everyone,

I also tried to summarize the previous discussion and would add an 
additional `Ecosystem` component. I would suggest:


Table SQL / API
Table SQL / Client
Table SQL / Legacy Planner
Table SQL / Planner
Table SQL / Runtime
Table SQL / Ecosystem (such as table connectors, formats, Hive catalog etc.)

This should make everyone happy, no?

Thanks for proposing this Aljoscha. Big +1.

Regards,
Timo

On 21.03.19 at 14:31, Aljoscha Krettek wrote:

Cool, I like this. I have one last suggestion. How about this:

Table SQL / API
Table SQL / Client
Table SQL / Classic Planner (or Legacy Planner): Flink Table SQL runtime and 
plan translation.
Table SQL / Planner: plan-related for new Blink-based Table SQL runner.
Table SQL / Runtime: runtime-related for new Blink-based Table SQL runner.

It’s Jark’s version but I renamed "Table SQL / Operators" to “Table SQL / 
Runtime", because it is not only operators but all the supporting code around that which 
is needed at, well, runtime. ;-)

What do you think?

Best,
Aljoscha



On 21. Mar 2019, at 03:52, Jark Wu  wrote:

+1 to Kurt's proposal which removes the "API" prefix and adds a table operator 
component.

On the other hand, I think it's worth distinguishing Blink SQL issues and  
Flink SQL issues via the component name. Currently, it's hard to distinguish.

How about:

Table SQL / API
Table SQL / Client
Table SQL / Legacy Planner: Flink Table SQL runtime and plan translation.
Table SQL / New Planner: plan-related for new Blink-based Table SQL runner.
Table SQL / Operators: runtime-related for new Blink-based Table SQL runner.

Once blink merge is done, we can combine "Table SQL / Legacy Planner" and "Table SQL / 
New Planner" into Table SQL / Planner".

Best,
Jark


On Thu, 21 Mar 2019 at 10:21, Kurt Young <ykt...@gmail.com> wrote:
Hi Aljoscha,

+1 to further separating table-related Jira components, but I would prefer to
move "Runtime / Operators" to a dedicated "Table SQL / Operators".
There is one concern about the "classic planner" and "new planner": the
naming will be inaccurate after the blink merge is done and we deprecate the
classic planner later (if it happens).
If only one planner left, then what component should we use when creating
jira?

How about this:
Table SQL / API
Table SQL / Client
Table SQL / Planner
Table SQL / Operators

Best,
Kurt


On Thu, Mar 21, 2019 at 12:39 AM Aljoscha Krettek <aljos...@apache.org>
wrote:


Hi,

First of all, I hope I cc’ed all the relevant people. Sorry if I forgot
anyone.

I would like to restructure the Table/SQL-related Jira components a bit
more to better reflect the current state of components. Right now we have:

* API / Table SQL: this is just a wild collection of table-related things
* Runtime / Operators: this has general operators stuff, but also new
Blink-based Table operator stuff and maybe classic Table runner stuff
* SQL / Client: as it says
* SQL / Planner: this has issues for the existing classic Flink Table
runner and new things related to merging of the new Blink-based Table Runner

I would suggest to reorganise it like this:

* API / Table SQL: API-related things
* API / Table SQL / Client: the SQL client
* API / Table SQL / Classic Planner: things related to classic Flink Table
API runtime and plan translation, everything to do with execution
* API / Table SQL / New Planner: runtime operators, translation,
everything really, for the new Blink-based Table API/SQL runner

Runtime / Operators would be used purely for non-table-related
operator/runtime stuff.

What do you think? “Classic Planner” and “New Planner” are up for
discussion.  We could even get rid of the API prefix, it doesn’t really
do much, I think.

Best,
Aljoscha






Re: [DISCUSS] Introduction of a Table API Java Expression DSL

2019-03-21 Thread Timo Walther

Thanks for your feedback Rong and Jark.

@Jark: Yes, you are right that the string-based API is used quite a lot. 
On the other side, the potential user base in the future is still bigger 
than our current user base. Because the Table API will become equally 
important as the DataStream API, we really need to fix some crucial 
design decisions before it is too late. I would suggest introducing the 
new DSL in 1.9 and removing the Expression parser either in 1.10 or 1.11. 
From a development point of view, I think we can handle the overhead 
of maintaining 3 APIs until then because 2 APIs will share the same code 
base + expression parser.


Regards,
Timo

On 21.03.19 at 05:21, Jark Wu wrote:

Hi Timo,

I'm +1 on the proposal. I like the idea to provide a Java DSL which is 
more friendly than the string-based approach in programming.


My concern is if/when we can drop the string-based expression parser. 
If it takes a very long time, we have to pay more development
cost on the three Table APIs. As far as I know, the string-based API 
is used in many companies.
We should also get some feedback from users. So I'm CCing this email 
to the user mailing list.


Best,
Jark



On Wed, 20 Mar 2019 at 08:51, Rong Rong wrote:


Thanks for sharing the initiative of improving Java side Table
expression
DSL.

I agree, as the doc states, that the Java DSL was always a "3rd class
citizen"
and we've run into many hand-holding scenarios with our Flink
developers
trying to get the stringified syntax working.
Overall I am a +1 on this; it also helps reduce the development
cost of the
Table API so that we no longer need to maintain different DSLs and
documentation.

I left a few comments in the doc. and also some features that I
think will
be beneficial to the final outcome. Please kindly take a look @Timo.

Many thanks,
Rong

On Mon, Mar 18, 2019 at 7:15 AM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> some of you might have already noticed the JIRA issue that I opened
> recently [1] about introducing a proper Java expression DSL for the
> Table API. Instead of using string-based expressions, we should
aim for
> a unified, maintainable, programmatic Java DSL.
>
> Some background: The Blink merging efforts and the big
refactorings as
> part of FLIP-32 have revealed many shortcomings in the current
Table &
> SQL API design. Most of these legacy issues cause problems
nowadays in
> making the Table API a first-class API next to the DataStream
API. An
> example is the ExpressionParser class[2]. It was implemented in the
> early days of the Table API using Scala parser combinators.
During the
> last years, this parser caused many JIRA issues and user
confusion on
> the mailing list, because the exceptions and syntax might not be
> straightforward.
>
> For FLINK-11908, we added a temporary bridge instead of
reimplementing
> the parser in Java for FLIP-32. However, this is only an intermediate
> solution until we make a final decision.
>
> I would like to propose a new, parser-free version of the Java
Table API:
>
>
>

https://docs.google.com/document/d/1r3bfR9R6q5Km0wXKcnhfig2XQ4aMiLG5h2MTx960Fg8/edit?usp=sharing
>
> I already implemented an early prototype that shows that such a
DSL is
> not much implementation effort and integrates nicely with all
existing
> API methods.
>
> What do you think?
>
> Thanks for your feedback,
>
> Timo
>
> [1] https://issues.apache.org/jira/browse/FLINK-11890
>
> [2]
>
>

https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/expressions/PlannerExpressionParserImpl.scala
>
>





Re: [DISCUSS] Reorganizing Table-related Jira components some more

2019-03-21 Thread Aljoscha Krettek
Cool, I like this. I have one last suggestion. How about this:

Table SQL / API
Table SQL / Client
Table SQL / Classic Planner (or Legacy Planner): Flink Table SQL runtime and 
plan translation. 
Table SQL / Planner: plan-related for new Blink-based Table SQL runner.
Table SQL / Runtime: runtime-related for new Blink-based Table SQL runner.

It’s Jark’s version but I renamed "Table SQL / Operators" to “Table SQL / 
Runtime", because it is not only operators but all the supporting code around 
that which is needed at, well, runtime. ;-)

What do you think?

Best,
Aljoscha


> On 21. Mar 2019, at 03:52, Jark Wu  wrote:
> 
> +1 to Kurt's proposal which removes the "API" prefix and adds a table 
> operator component. 
> 
> On the other hand, I think it's worth distinguishing Blink SQL issues and  
> Flink SQL issues via the component name. Currently, it's hard to distinguish.
> 
> How about:
> 
> Table SQL / API
> Table SQL / Client
> Table SQL / Legacy Planner: Flink Table SQL runtime and plan translation. 
> Table SQL / New Planner: plan-related for new Blink-based Table SQL runner.
> Table SQL / Operators: runtime-related for new Blink-based Table SQL runner.
> 
> Once blink merge is done, we can combine "Table SQL / Legacy Planner" and 
> "Table SQL / New Planner" into Table SQL / Planner". 
> 
> Best,
> Jark
> 
> 
> On Thu, 21 Mar 2019 at 10:21, Kurt Young wrote:
> Hi Aljoscha,
> 
> +1 to further separating table-related Jira components, but I would prefer to
> move "Runtime / Operators" to a dedicated "Table SQL / Operators".
> There is one concern about the "classic planner" and "new planner": the
> naming will be inaccurate after the blink merge is done and we deprecate the
> classic planner later (if it happens).
> If only one planner left, then what component should we use when creating
> jira?
> 
> How about this:
> Table SQL / API
> Table SQL / Client
> Table SQL / Planner
> Table SQL / Operators
> 
> Best,
> Kurt
> 
> 
> On Thu, Mar 21, 2019 at 12:39 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
> 
> > Hi,
> >
> > First of all, I hope I cc’ed all the relevant people. Sorry if I forgot
> > anyone.
> >
> > I would like to restructure the Table/SQL-related Jira components a bit
> > more to better reflect the current state of components. Right now we have:
> >
> > * API / Table SQL: this is just a wild collection of table-related things
> > * Runtime / Operators: this has general operators stuff, but also new
> > Blink-based Table operator stuff and maybe classic Table runner stuff
> > * SQL / Client: as it says
> > * SQL / Planner: this has issues for the existing classic Flink Table
> > runner and new things related to merging of the new Blink-based Table Runner
> >
> > I would suggest to reorganise it like this:
> >
> > * API / Table SQL: API-related things
> > * API / Table SQL / Client: the SQL client
> > * API / Table SQL / Classic Planner: things related to classic Flink Table
> > API runtime and plan translation, everything to do with execution
> > * API / Table SQL / New Planner: runtime operators, translation,
> > everything really, for the new Blink-based Table API/SQL runner
> >
> > Runtime / Operators would be used purely for non-table-related
> > operator/runtime stuff.
> >
> > What do you think? “Classic Planner” and “New Planner” are up for
> > discussion.  We could even get rid of the API prefix, it doesn’t really
> > do much, I think.
> >
> > Best,
> > Aljoscha



[CANCEL][VOTE] Release 1.8.0, release candidate #3

2019-03-21 Thread Aljoscha Krettek
Hi,

I’m hereby canceling the vote for RC3 of Flink 1.8.0 because of various issues 
mentioned in the vote thread.

Best,
Aljoscha

Also, I noticed that cross-posting this to u...@flink.apache.org 
is a bit tricky. It seems that people only 
responded to the user thread and not the dev thread anymore. Let’s see what we 
do next time.

> On 21. Mar 2019, at 14:15, Aljoscha Krettek  wrote:
> 
> Hi Yu,
> 
> I commented on the issue. For me both Hadoop 2.8.3 and Hadoop 2.4.1 seem to 
> work. Could you have a look at my comment?
> 
> I will also cancel this RC because of various issues.
> 
> Best,
> Aljoscha
> 
>> On 21. Mar 2019, at 12:23, Yu Li wrote:
>> 
>> Thanks @jincheng
>> 
>> @Aljoscha I've just opened FLINK-11990 for the HDFS 
>> BucketingSink issue with hadoop 2.8. IMHO it might be a blocker for 1.8.0 
>> and needs your confirmation. Thanks.
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Thu, 21 Mar 2019 at 15:57, jincheng sun wrote:
>> Thanks for the quick fix, Yu. The PR for FLINK-11972 
>> has been merged.
>> 
>> Cheers,
>> Jincheng
>> 
>> Yu Li <car...@gmail.com> wrote on Thursday, 21 March 2019 at 7:23 AM:
>> -1, observed stable failures of the streaming bucketing end-to-end test case 
>> in two different environments (Linux/MacOS) when running with both the shaded 
>> hadoop-2.8.3 jar file and the hadoop-2.8.5 dist, while both envs 
>> could pass with hadoop 2.6.5. For more details please refer to this comment 
>> in FLINK-11972.
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Thu, 21 Mar 2019 at 04:25, jincheng sun wrote:
>> Thanks for the quick fix Aljoscha! FLINK-11971 
>> has been merged.
>> 
>> Cheers,
>> Jincheng
>> 
>> Piotr Nowojski <pi...@ververica.com> 
>> wrote on Thursday, 21 March 2019 at 12:29 AM:
>> -1 from my side due to a performance regression found in the master branch 
>> since Jan 29th. 
>> 
>> In 10% of JVM forks it was causing a huge performance drop in some of the 
>> benchmarks (up to 30-50% reduced throughput), which could mean that one out 
>> of 10 task managers could be affected by it. Today we have merged a fix for 
>> it [1]. First benchmark run was promising [2], but we have to wait until 
>> tomorrow to make sure that the problem was definitely resolved. If that’s 
>> the case, I would recommend including it in 1.8.0, because we really do not 
>> know how big of performance regression this issue can be in the real world 
>> scenarios.
>> 
>> Regarding the second regression from mid February: we have found the 
>> responsible commit and this one is probably just a false positive. Because 
>> of the nature of some of the benchmarks, they run with a low number of 
>> records (300k). The apparent performance regression was caused by higher 
>> initialisation time. When I temporarily increased the number of records to 
>> 2M, the regression was gone. Together with Till and Stefan Richter we 
>> discussed the potential impact of this longer initialisation time (in the 
>> case of said benchmarks initialisation time increased from 70ms to 120ms) 
>> and we think that it's not a critical issue that has to block the 
>> release. Nevertheless there might be some follow-up work for this.
>> 
>> [1] https://github.com/apache/flink/pull/8020 
>> 
>> [2] http://codespeed.dak8s.net:8000/timeline/?ben=tumblingWindow=2 
>> 
>> 
>> Piotr Nowojski
>> 
>>> On 20 Mar 2019, at 10:09, Aljoscha Krettek wrote:
>>> 
>>> Thanks Jincheng! It would be very good to fix those but as you said, I 
>>> would say they are not blockers.
>>> 
 On 20. Mar 2019, at 09:47, Kurt Young wrote:
 
 +1 (non-binding)
 
 Checked items:
 - checked checksums and GPG files
 - verified that the source archives do not contains any binaries
 - checked that all POM files point to the same version
 - build from source successfully 
 
 Best,
 Kurt
 
 
 On Wed, Mar 20, 2019 at 2:12 PM jincheng sun wrote:
 Hi Aljoscha,
 
 When I did the `end-to-end` test for RC3 under Mac OS, I found the 
 following two problems:
 
 1. The verification returned for different `minikube status` is not 
 enough 

[jira] [Created] (FLINK-11992) Update Apache Parquet 1.10.1

2019-03-21 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created FLINK-11992:


 Summary: Update Apache Parquet 1.10.1
 Key: FLINK-11992
 URL: https://issues.apache.org/jira/browse/FLINK-11992
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Fokko Driesprong
Assignee: Fokko Driesprong






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Contributor permission application

2019-03-21 Thread Chesnay Schepler

Done.

On 20.03.2019 23:49, Artsem Semianenka wrote:

Hi guys,

Could you please give me contributor permissions for Flink project?
Here my Jira id: artsem.semianenka


Best regards,
Artsem





Re: contributor permission apply

2019-03-21 Thread Chesnay Schepler

Done.

On 20.03.2019 15:46, jianl miao wrote:

Hi,


I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is jianlong miao.





Re: Fw: Contributor permission application

2019-03-21 Thread Chesnay Schepler

Done.

On 20.03.2019 16:15, hdxg1101300123 wrote:

Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is yutaochina.

2019-03-20


   


Beyondsoft Corporation
--Shanghai Hongzhi Information Technology Co., Ltd.
Address: Room 402, No. 8 Tianshan Road, Changning District, Shanghai
Postal code: 200336
Mobile: 15501079221
Email: hdxg1101300...@163.com, yuta...@beyondsoft.com



From: "hdxg1101300123"
Sent: 2019-02-28 15:27
Subject: Contributor permission application
To: "dev"
Cc:

Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is yutaochina.

2019-02-28



   


Beyondsoft Corporation
--Shanghai Hongzhi Information Technology Co., Ltd.
Address: Room 402, No. 8 Tianshan Road, Changning District, Shanghai
Postal code: 200336
Mobile: 15501079221
Email: hdxg1101300...@163.com, yuta...@beyondsoft.com





[jira] [Created] (FLINK-11991) Set headers to use for CSV output

2019-03-21 Thread Julien Nioche (JIRA)
Julien Nioche created FLINK-11991:
-

 Summary: Set headers to use for CSV output
 Key: FLINK-11991
 URL: https://issues.apache.org/jira/browse/FLINK-11991
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Julien Nioche
 Fix For: 1.9.0


As discussed in 
https://stackoverflow.com/questions/54530755/flink-write-tuples-with-csv-header-into-file/54536586,
 it would be nice to be able to specify headers to print out at the beginning 
of a CSV output.

I've written a patch for this and will submit it as a PR.
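
For illustration, a minimal sketch of one possible approach (a hypothetical 
subclass, not necessarily what the patch does; it assumes the protected 
{{stream}} field of {{FileOutputFormat}} is accessible to subclasses, and each 
parallel task would write the header to its own output file):

{noformat}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.java.io.CsvOutputFormat;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.core.fs.Path;

public class HeaderCsvOutputFormat<T extends Tuple> extends CsvOutputFormat<T> {

    private final String header;

    public HeaderCsvOutputFormat(Path outputPath, String header) {
        super(outputPath);
        this.header = header;
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        super.open(taskNumber, numTasks);
        // Write the header directly to the underlying stream; record writes go
        // through a buffered writer and are flushed later, so the header line
        // comes out first.
        stream.write((header + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
{noformat}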



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11990) Streaming bucketing end-to-end test fail with hadoop 2.8

2019-03-21 Thread Yu Li (JIRA)
Yu Li created FLINK-11990:
-

 Summary: Streaming bucketing end-to-end test fail with hadoop 2.8
 Key: FLINK-11990
 URL: https://issues.apache.org/jira/browse/FLINK-11990
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hadoop Compatibility
Reporter: Yu Li


As titled, running the {{test_streaming_bucketing.sh}} case with hadoop 2.8 
bundles always fails, while running with 2.6 bundles passes.

Command to run the case:
{{FLINK_DIR= flink-end-to-end-tests/run-single-test.sh 
test-scripts/test_streaming_bucketing.sh skip_check_exceptions}}

The output with hadoop 2.8 
[bundle|https://repository.apache.org/content/repositories/orgapacheflink-1213/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.0/flink-shaded-hadoop2-uber-2.8.3-1.8.0.jar]
 or [dist|http://archive.apache.org/dist/hadoop/core/hadoop-2.8.5]:
{noformat}
Starting taskexecutor daemon on host z05f06378.sqa.zth.
Waiting for job (905ae10bae4b99031e724b9c29f0ca7b) to reach terminal state 
FINISHED ...
Truncating buckets
Truncating  to
{noformat}

The output of the success run with hadoop 2.6 
[bundle|https://repository.apache.org/content/repositories/orgapacheflink-1213/org/apache/flink/flink-shaded-hadoop2-uber/2.6.5-1.8.0/flink-shaded-hadoop2-uber-2.6.5-1.8.0.jar]
 or [dist|http://archive.apache.org/dist/hadoop/core/hadoop-2.6.5]:
{noformat}
Truncating 
/home/jueding.ly/flink_rc_check/flink-1.8.0-src/flink-end-to-end-tests/test-scripts/temp-test-directory-06210353709/out/result3/part-3-0
 to 51250
1+0 records in
1+0 records out
51250 bytes (51 kB) copied, 0.000377998 s, 136 MB/s
Truncating 
/home/jueding.ly/flink_rc_check/flink-1.8.0-src/flink-end-to-end-tests/test-scripts/temp-test-directory-06210353709/out/result7/part-3-0
 to 51250
1+0 records in
1+0 records out
51250 bytes (51 kB) copied, 0.00033118 s, 155 MB/s
pass Bucketing Sink
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11989) Enable metric reporter modules in jdk9 runs

2019-03-21 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-11989:


 Summary: Enable metric reporter modules in jdk9 runs
 Key: FLINK-11989
 URL: https://issues.apache.org/jira/browse/FLINK-11989
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics, Travis
Affects Versions: 1.9.0
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.9.0


The Reporter modules are currently disabled on the travis jdk9 jobs as we ran 
into some issues in the MetricRegistry that prevented them from succeeding.

It appears that this issue no longer exists, so let's enable them again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] FLIP-33: Standardize connector metrics

2019-03-21 Thread Chesnay Schepler
As I said before, I believe this to be over-engineered and have no 
interest in this implementation.


There are conceptual issues like defining a duplicate numBytesIn(PerSec) 
metric that already exists for each operator.


On 21.03.2019 06:13, Becket Qin wrote:
A few updates to the thread. I uploaded a patch[1] as a complete 
example of how users can use the metrics in the future.


Some thoughts below after taking a look at the AbstractMetricGroup and 
its subclasses.


This patch intends to make it convenient for Flink connector 
implementations to follow the metrics standards proposed in FLIP-33. It 
also tries to enhance the metric management in a general way to help users 
with:


 1. metric definition
 2. metric dependencies check
 3. metric validation
 4. metric control (turn on / off particular metrics)

This patch wraps |MetricGroup| to extend the functionality of 
|AbstractMetricGroup| and its subclasses. The 
|AbstractMetricGroup| mainly focuses on the metric group hierarchy, but 
does not really manage the metrics other than keeping them in a Map.
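
As a rough sketch (all names below are hypothetical and not taken from 
the patch), such a wrapper could combine metric definition, validation 
and control roughly like this:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.metrics.SimpleCounter;

public class ConnectorMetrics {

    private final MetricGroup group;
    private final Set<String> definedMetrics;   // metric definition
    private final Set<String> disabledMetrics;  // metric control (on / off)
    private final Map<String, Counter> counters = new HashMap<>();

    public ConnectorMetrics(
            MetricGroup group,
            Set<String> definedMetrics,
            Set<String> disabledMetrics) {
        this.group = group;
        this.definedMetrics = definedMetrics;
        this.disabledMetrics = disabledMetrics;
    }

    public Counter counter(String name) {
        // metric validation: only metrics declared up front may be registered
        if (!definedMetrics.contains(name)) {
            throw new IllegalArgumentException("Undefined metric: " + name);
        }
        // turned-off metrics get an unregistered no-op counter
        if (disabledMetrics.contains(name)) {
            return new SimpleCounter();
        }
        return counters.computeIfAbsent(name, group::counter);
    }
}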


Ideally we should only have one entry point for the metrics.

Right now the entry point is |AbstractMetricGroup|. However, besides 
the missing functionality mentioned above, |AbstractMetricGroup| seems 
deeply rooted in the Flink runtime. We could extract it out to 
flink-metrics in order to use it for generic purposes. There will be 
some work involved, though.


Another approach is to make |AbstractMetrics| in this patch the 
metric entry point. It wraps the metric group and provides the missing 
functionalities. Then we can roll out this pattern to runtime 
components gradually as well.


My first thought is that the latter approach gives a smoother 
migration. But I am also OK with doing a refactoring of the 
|AbstractMetricGroup| family.



Thanks,

Jiangjie (Becket) Qin

[1] https://github.com/becketqin/flink/pull/1

On Mon, Feb 25, 2019 at 2:32 PM Becket Qin wrote:


Hi Chesnay,

It might be easier to discuss some implementation details in the
PR review instead of in the FLIP discussion thread. I have a patch
for Kafka connectors ready but haven't submitted the PR yet.
Hopefully that will help explain a bit more.

** Re: metric type binding
This is a valid point that is worth discussing. If I understand
correctly, there are two points:

1. Metric type / interface does not matter as long as the metric
semantic is clearly defined.
Conceptually speaking, I agree that as long as the metric semantic
is defined, the metric type does not matter. To some extent, Gauge /
Counter / Meter / Histogram themselves can be thought of as some
well-recognized semantics, if you wish. In Flink, these metric
semantics have their associated interface classes. In practice,
such semantic-to-interface binding seems necessary for different
components to communicate. Simply standardizing the semantics of the
connector metrics seems insufficient for people to build an
ecosystem on top of. At the end of the day, we still need to have
some embodiment of the metric semantics that people can program
against.

2. Sometimes the same metric semantic can be exposed using
different metric types / interfaces.
This is a good point. Counter and Gauge-as-a-Counter are pretty
much interchangeable. This is more of a trade-off between the user
experience of metric producers and consumers. The metric producers
want to use Counter or Gauge depending on whether the counter is
already tracked in code, while ideally the metric consumers only
want to see a single metric type for each metric. I am leaning
towards making the metric producers happy, i.e. allowing either the
Gauge or Counter metric type and letting the metric consumers handle
the type variation. The reason is that in practice, there might be more
connector implementations than metric reporter implementations. We
could also provide some helper methods to facilitate reading from
such variable metric types.
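
For example, a hypothetical helper along these lines (illustrative
only, not part of the FLIP) would let a consumer read the count
regardless of the concrete type the producer chose:

import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.metrics.Metric;

public final class MetricReaders {

    private MetricReaders() {}

    // Reads a count from either a Counter or a numeric Gauge.
    public static long readCount(Metric metric) {
        if (metric instanceof Counter) {
            return ((Counter) metric).getCount();
        }
        if (metric instanceof Gauge) {
            Object value = ((Gauge<?>) metric).getValue();
            if (value instanceof Number) {
                return ((Number) value).longValue();
            }
        }
        throw new IllegalArgumentException(
            "Metric is neither a Counter nor a numeric Gauge: " + metric);
    }
}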


Just some quick replies to the comments around implementation details.

4) single place where metrics are registered except
connector-specific
ones (which we can't really avoid).

Registering connector-specific ones in a single place is actually
something that I want to achieve.

2) I'm talking about time-series databases like Prometheus. We
would
only have a gauge metric exposing the last fetchTime/emitTime
that is
regularly reported to the backend (Prometheus), where a user
could build
a histogram of his choosing when/if he wants it.

Not sure if such downsampling works. As an example, if a user
complains that there are some intermittent latency spikes (maybe a
few records in 10 seconds) in their processing system, having a
Gauge sampling instantaneous latency seems unlikely to be useful.
However by 

Re: [DISCUSS] Change underlying Frontend Architecture for Flink Web Dashboard

2019-03-21 Thread Robert Metzger
Hey all,

Yadong has now opened a pull request with the Angular 7-based web frontend:
https://github.com/apache/flink/pull/8016
The PR contains the complete dashboard, and is ready to check out, build
and run locally.

I believe it would be good to get some more feedback on the PR, from people
with different systems, browsers and experiences.
The PR removes the old UI.

I propose to merge the PR soon to get more testing exposure in the
1.9-SNAPSHOT version.

Regards,
Robert


On Tue, Nov 6, 2018 at 12:45 PM Shaoxuan Wang  wrote:

> Fabian,
> Thanks for pointing out the Jira. Sure, we will reuse it to start the
> contribution.
>
> Regards,
> Shaoxuan
>
> On Tue, Nov 6, 2018 at 7:28 PM Fabian Wollert  wrote:
>
> > i updated this JIRA already, feel free to reuse this:
> > https://issues.apache.org/jira/browse/FLINK-10706
> >
> > --
> >
> >
> > *Fabian WollertZalando SE*
> >
> > E-Mail: fab...@zalando.de
> >
> >
> > On Tue, Nov 6, 2018 at 12:10 PM Shaoxuan Wang <wshaox...@gmail.com> wrote:
> >
> > > Till,
> > > Yes, it is a good idea to have a feature flag to switch the web UI
> > > before we completely deprecate the old one.
> > >
> > > Yadong,
> > > It seems that everyone likes the new web UI. Can you please open a
> > > master Jira and start to merge the code to Flink master. What do you think?
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Mon, Nov 5, 2018 at 7:04 PM Till Rohrmann wrote:
> > >
> > > > Thanks a lot for sharing the code with the community Yadong!
> > > >
> > > > It looks really cool and I also want to give it a try to see how easy
> > > > it is to start Flink with it.
> > > >
> > > > If it is already implemented and working, we could also think about
> > > > adding it to Flink and add a feature flag to switch between the old and
> > > > new web UI. We could think about enabling it by default to give it more
> > > > user exposure. After being confident and users having no complaints, we
> > > > could think about deprecating the old web UI and eventually drop it. Of
> > > > course, initially we should give it a thorough review.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Nov 5, 2018 at 8:40 AM Fabian Wollert wrote:
> > > >
> > > > > Hi Yadong, this is awesome, thx for the code! I will try it out on
> > > > > our infrastructure and will post my feedback here, latest next week.
> > > > >
> > > > > I will also check if my ideas for FLINK-10707
> > > > > <https://issues.apache.org/jira/browse/FLINK-10707> are doable with
> > > > > your code since this was what pushed this discussion initially.
> > > > >
> > > > > Cheers
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > > *Fabian WollertZalando SE*
> > > > >
> > > > > E-Mail: fab...@zalando.de
> > > > >
> > > > >
> > > > > On Mon, Nov 5, 2018 at 4:42 AM Yadong Xie <vthink...@gmail.com> wrote:
> > > > >
> > > > > > Hi Fabian, Till, and Robert
> > > > > >
> > > > > > Thank you for your attention to this matter, I just pushed our
> > > > > > code to github: https://github.com/vthinkxie/flink-runtime-web.
> > > > > >
> > > > > > You can start the project by following the guidelines
> > > > > > <https://github.com/vthinkxie/flink-runtime-web#development--debugging>
> > > > > > (just run `npm install && npm run proxy`), just feel free to give
> > > > > > any comments :)
> > > > > >
> > > > > > If I missed anything please let me know. Look forward to your
> > > > > > feedback and suggestions soon.
> > > > > > suggestions soon.
> > > > > >
> > > > > > Best regards
> > > > > > Yadong
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 2, 2018 at 5:28 PM Fabian Wollert
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Yadong, this looks awesome. Is there any chance you can
> > > > > > > already share the code of the new web UI, so we can take a look
> > > > > > > at what you guys build there? I think that would speed up the
> > > > > > > discussion. If there is already a fully fledged new version with
> > > > > > > everything updated out there, and it's even battle tested in
> > > > > > > production already, that sounds like the way to go for me.
> > > > > > >
> > > > > > > i started aside from this discussion here (to strengthen and
> > > > > > > learn some new React stuff) my attempt on the React version
> > > > > > > already, for whoever is curious, you can check it out here:
> > > > > > > https://github.com/drummerwolli/flink-web-ui-tmp (adjust the base
> > > > > > > url in actions.js
> > > > > > > <https://github.com/drummerwolli/flink-web-ui-tmp/blob/master/src/actions.js#L8>,
> > > > > > > npm install and then npm start) ... i just started to convert the
> > > > > > > first simple pages, so don't expect the whole UI yet, it's just a 

Re: [DISCUSS] Create a Flink ecosystem website

2019-03-21 Thread Robert Metzger
Okay, great.

Congxian Qiu, Daryl and I have a kick-off call later today at 2pm CET, 9pm
China time about the design of the ecosystem page (see:
https://github.com/rmetzger/flink-community-tools/issues/4)
Please let me know if others want to join as well, I can add them to the
invite.

On Wed, Mar 20, 2019 at 4:10 AM Becket Qin  wrote:

> I agree. We can start with english-only and see how it goes. The comments
> and descriptions can always be multi-lingual but that is up to the package
> owners.
>
> On Tue, Mar 19, 2019 at 6:07 PM Robert Metzger 
> wrote:
>
>> Thanks.
>>
>> Do we actually want this page to be multi-language?
>>
>> I propose to make the website english-only, but maybe consider allowing
>> comments in different languages.
>> If we would make it multi-language, then we might have problems with
>> people submitting packages in non-english languages.
>>
>>
>>
>> On Tue, Mar 19, 2019 at 2:42 AM Becket Qin  wrote:
>>
>>> Done. The writeup looks great!
>>>
>>> On Mon, Mar 18, 2019 at 9:09 PM Robert Metzger 
>>> wrote:
>>>
 Nice, really good news on the INFRA front!
 I think the hardware specs sound reasonable. And a periodic backup of
 the website's database to Infra's backup solution sounds reasonable too.

 Can you accept and review my proposal for the website?


 On Sat, Mar 16, 2019 at 3:47 PM Becket Qin 
 wrote:

> >
> > I have a very capable and motivated frontend developer who would be
> > willing to implement what I've mocked in my proposal.
>
>
> That is awesome!
>
> I created a Jira ticket[1] with Apache Infra and got a reply. It looks
> like the Apache infra team could provide a decent VM. The last piece is how to
> ensure the data is persisted so we won't lose the project info / user
> feedbacks when the VM is down. If Apache infra does not provide a
> persistent storage for DB backup, we can always ask for multiple VMs
> and do
> the fault tolerance by ourselves. It seems we can almost say the
> hardware
> side is also ready.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> [1] https://issues.apache.org/jira/browse/INFRA-18010
>
> On Fri, Mar 15, 2019 at 5:39 PM Robert Metzger 
> wrote:
>
> > Thank you for reaching out to Infra, and for finding the Ember client.
> > When I first saw the Ember repository, I thought it was the whole thing
> > (frontend and backend), but while testing it, I realized it is "only" the
> > frontend. I'm not sure if it makes sense to adjust the Ember Observer
> > client, or just write a simple UI from scratch.
> > I have a very capable and motivated frontend developer who would be
> > willing to implement what I've mocked in my proposal.
> > In addition, I found somebody (Congxian Qiu) who seems to be eager to
> > help with this project for the backend:
> > https://github.com/rmetzger/flink-community-tools/issues/4
> >
> > For Infra: I had the same experience when asking for more GitHub
> > permissions for "flinkbot": they didn't respond on their mailing list,
> > only on Jira.
> >
> >
> >
> > On Thu, Mar 14, 2019 at 2:45 PM Becket Qin 
> wrote:
> >
> >> Thanks for writing up the specifications.
> >>
> >> Regarding the website source code, Austin found a website[1] whose
> >> frontend code[2] is available publicly. It lacks some support (e.g.
> >> login), but it is still a good starting point. One thing is that I did
> >> not find a license statement for that source code. I'll reach out to the
> >> author to see if they have any concerns over our usage.
> >>
> >> Apache Infra has not replied to my email regarding some details about
> >> the VM. I'll open an infra Jira ticket tomorrow if there is still no
> >> response.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> [1] https://emberobserver.com/
> >> [2] https://github.com/emberobserver/client
> >>
> >>
> >>
> >>> On Thu, Mar 14, 2019 at 1:35 AM Robert Metzger wrote:
> >>
> >>> @Bowen: I agree. Confluent Hub looks nicer, but it is on their company
> >>> website. I guess the likelihood that they give out code from their
> >>> company website is fairly low.
> >>> @Nils: Beam's page is similar to our Ecosystem page, which we'll
> >>> reactivate as part of this PR:
> >>> https://github.com/apache/flink-web/pull/187
> >>>
> >>> Spark-packages.org did not respond to my request.
> >>> I will propose a short specification in Becket's initial document.
> >>>
> >>>
> >>> On Mon, Mar 11, 2019 at 11:38 AM Niels Basjes wrote:
> >>>
>  Hi,
> 
>  The Beam project has something in this area that is simply a page
>  within their documentation website:
> 

[jira] [Created] (FLINK-11988) Remove legacy MockNetworkEnvironment

2019-03-21 Thread zhijiang (JIRA)
zhijiang created FLINK-11988:


 Summary: Remove legacy MockNetworkEnvironment
 Key: FLINK-11988
 URL: https://issues.apache.org/jira/browse/FLINK-11988
 Project: Flink
  Issue Type: Task
  Components: Runtime / Network, Tests
Reporter: zhijiang
Assignee: zhijiang


Remove legacy {{MockNetworkEnvironment}} class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Facebook: Save Wilpattu One Srilanka's Most Loved Place

2019-03-21 Thread felipe . o . gutierrez
Hello,

I just signed the petition "Facebook: Save Wilpattu One
Srilanka's Most Loved Place" and wanted to know if you can help by signing
it too.

Our goal is to reach 15,000 signatures and we need more support.
You can read more about this issue and sign the petition here:

http://chng.it/ZsgJMgc6rb

Thanks!
Felipe


[jira] [Created] (FLINK-11987) Kafka producer occasionally throws NullpointerException

2019-03-21 Thread LIU Xiao (JIRA)
LIU Xiao created FLINK-11987:


 Summary: Kafka producer occasionally throws NullpointerException
 Key: FLINK-11987
 URL: https://issues.apache.org/jira/browse/FLINK-11987
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.7.2, 1.6.4, 1.6.3
 Environment: Flink 1.6.2 (Standalone Cluster)

Oracle JDK 1.8u151

Centos 7.4
Reporter: LIU Xiao


We are using Flink 1.6.2 in our production environment, and the Kafka producer 
occasionally throws a NullPointerException.

We found that in line 175 of 
flink/flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer011.java,
 NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR is created as a static variable.

Then in line 837, 
"context.getOperatorStateStore().getUnionListState(NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR);"
 is called, and that leads to line 734 of 
flink/flink-runtime/src/main/java/org/apache/flink/runtime/state/DefaultOperatorStateBackend.java:
 "stateDescriptor.initializeSerializerUnlessSet(getExecutionConfig());"

In function initializeSerializerUnlessSet(line 283 of 
flink/flink-core/src/main/java/org/apache/flink/api/common/state/StateDescriptor.java):

if (serializer == null) {
    checkState(typeInfo != null, "no serializer and no type info");
    // instantiate the serializer
    serializer = typeInfo.createSerializer(executionConfig);
    // we can drop the type info now, no longer needed
    typeInfo = null;
}

"serializer = typeInfo.createSerializer(executionConfig);" is the line which 
throws the exception.

We think that's because multiple subtasks of the same producer in a same 
TaskManager share a same NEXT_TRANSACTIONAL_ID_HINT_DESCRIPTOR.
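
For illustration, here is a minimal, self-contained model of the suspected race 
(a simplified stand-in, not Flink code): with enough iterations, one thread 
passes the serializer == null check, the other thread finishes first and sets 
typeInfo to null, and the first thread then dies with a NullPointerException.

{noformat}
public class SharedDescriptorRace {

    // Simplified stand-in for the shared static StateDescriptor: 'serializer'
    // starts out null and 'typeInfo' is dropped once the serializer is created.
    static class LazyDescriptor {
        volatile Object serializer;
        volatile Object typeInfo = new Object();

        void initializeSerializerUnlessSet() {
            if (serializer == null) {
                if (typeInfo == null) {
                    throw new IllegalStateException("no serializer and no type info");
                }
                // Another thread may set typeInfo to null between the check
                // above and the dereference below -> NullPointerException.
                serializer = typeInfo.toString();
                typeInfo = null;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int run = 0; run < 100_000; run++) {
            // One shared descriptor per run, like the static field shared by
            // all producer subtasks in the same JVM.
            LazyDescriptor shared = new LazyDescriptor();
            Thread a = new Thread(shared::initializeSerializerUnlessSet);
            Thread b = new Thread(shared::initializeSerializerUnlessSet);
            a.start();
            b.start();
            a.join();
            b.join();
        }
    }
}
{noformat}

Making the descriptor an instance field so that subtasks do not share it, or 
synchronizing the lazy initialization, are possible mitigations.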



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Facebook: Save Wilpattu One Srilanka's Most Loved Place

2019-03-21 Thread dhanuka . priyanath
Hello there,

I just signed the petition "Facebook: Save Wilpattu One Srilanka's Most
Loved Place" and wanted to see if you could help by adding your name.

Our goal is to reach 7,500 signatures and we need more support. You can
read more and sign the petition here:

http://chng.it/ZxfxKQcqqk

Thanks!
dhanuka