Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-24 Thread Lavkesh Lahngir
Hello,
I created a Jira. Can somebody review it, please? Or suggest if this is
useful?
https://issues.apache.org/jira/browse/FLINK-11423

Thank you.
~Lavkesh

On Thu, Jan 24, 2019 at 11:40 AM Lavkesh Lahngir  wrote:

> Hi,
> It's not fixed in the master. I compiled and ran it yesterday.
> I am not sure if that is an issue or a design choice.
>
> On Thu, Jan 24, 2019 at 11:38 AM Lavkesh Lahngir 
> wrote:
>
>> Hello,
>> I mentioned in the first email.
>>
>> Version: 1.6.2, Commit ID: 3456ad0
>>
>> On Thu, Jan 24, 2019 at 12:33 AM Chesnay Schepler 
>> wrote:
>>
>>> I suggest that you first tell me which version you are using so that I
>>> can a) reproduce the issue and b) check that this issue wasn't fixed in
>>> master or a recent bugfix release.
>>>
>>> On 23.01.2019 17:16, Lavkesh Lahngir wrote:
>>> > Actually, I realized my mistake: JarRunHandler is being used in the
>>> > jar/run API call,
>>> > and the changes were done in RestClusterClient.
>>> > The problem I was facing is that it always gives me "The main method
>>> > caused an error"
>>> > without any further details.
>>> > I am thinking that when we throw ProgramInvocationException in
>>> > PackagedProgram.callMainMethod(),
>>> > we should include exceptionInMethod.getMessage() too.
>>> >
>>> > --- a/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java
>>> > +++ b/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java
>>> > @@ -543,7 +543,7 @@ public class PackagedProgram {
>>> >                } else if (exceptionInMethod instanceof ProgramInvocationException) {
>>> >                        throw (ProgramInvocationException) exceptionInMethod;
>>> >                } else {
>>> > -                      throw new ProgramInvocationException("The main method caused an error.", exceptionInMethod);
>>> > +                      throw new ProgramInvocationException("The main method caused an error: " + exceptionInMethod.getMessage(), exceptionInMethod);
>>> >                }
>>> >        }
>>> >        catch (Throwable t) {
>>> > What would you suggest?
>>> >
>>> > On Wed, Jan 23, 2019 at 7:01 PM Chesnay Schepler 
>>> wrote:
>>> >
>>> >> Which version are you using?
>>> >>
>>> >> On 23.01.2019 08:00, Lavkesh Lahngir wrote:
>>> >>> Or maybe I am missing something? It looks like the JIRA is trying to
>>> >> solve
>>> >>> the same issues I stated.
>>> >>> In the main method, I just threw a simple new Exception("Some
>>> message")
>>> >> and
>>> >>> I got the response I mentioned from the rest API.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir 
>>> >> wrote:
>>>  Hello,
>>>  The change in FLINK-10312
>>>   makes REST
>>> response
>>>  of the API
>>>  <
>>> >>
>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run
>>> >
>>> >> not
>>>  very informative. It strips the stack trace and returns a generic
>>> >> message.
>>>  People using flink-cluster deployment who do not have access to job
>>> >> manager
>>>  logs, will not be able to figure out the root cause.
>>>  In the case when the job submission fails,
>>>  in 1.6.2, I get
>>>  {
>>>    "errors": [
>>> 
>>>  "org.apache.flink.client.program.ProgramInvocationException:
>>> >> The
>>>  main method caused an error."
>>>    ]
>>>  }
>>> 
>>>  Is there a plan to improve error messages sent to the client?
>>>  Is somebody working on this already?
>>> 
>>>  Thanks in advance.
>>>  ~Lavkesh
>>> 
>>> >>
>>>
>>>
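The effect of the proposed change can be illustrated in isolation. The following is a minimal, self-contained sketch (hypothetical classes, not Flink's actual PackagedProgram) of how appending the cause's message changes what a REST client that only sees getMessage() can learn:

```java
// Minimal sketch of the proposal: when wrapping the user's exception,
// append its message so clients that only see getMessage() learn the cause.
public class ExceptionMessageDemo {

    static class ProgramInvocationException extends Exception {
        ProgramInvocationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Current behavior: the cause is attached, but its message is not surfaced.
    static ProgramInvocationException wrapGeneric(Throwable cause) {
        return new ProgramInvocationException("The main method caused an error.", cause);
    }

    // Proposed behavior: include the cause's message in the wrapper's message.
    static ProgramInvocationException wrapDetailed(Throwable cause) {
        return new ProgramInvocationException(
                "The main method caused an error: " + cause.getMessage(), cause);
    }

    public static void main(String[] args) {
        Throwable userError = new Exception("Some message");
        System.out.println(wrapGeneric(userError).getMessage());
        System.out.println(wrapDetailed(userError).getMessage());
    }
}
```

With the generic wrapper, the REST response only contains "The main method caused an error."; with the detailed wrapper, the root cause's message travels along even when the stack trace is stripped.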


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-24 Thread jincheng sun
Thanks Stephan,

The entire plan makes sense to me.

Regarding the branch name, how about using "blink-flink-1.5", which means
that the branch is based on flink-1.5? If the name is "blink-1.5", some
users will think that this is the version number of Alibaba's internal Blink,
and will not associate it with the flink-1.5 branch. This is just one of my
concerns.

Wrt offering jars for Blink: from my point of view, if we do not
release the Blink code, we can write blog documentation detailing how
to build and deploy Blink from source code. I think after we push the blink
branch, there will also be some bug fixes and small feature development
(urgently needed by users). So telling users how to build a release package
from source is very important, something like the current "Building Flink
from Source" section of the Flink docs. In this way we are both
user-friendly and avoid any liability issues.

Wrt the "Docs for Flink": if we expect users to take advantage of the
functionality of Blink, and the blink branch will also receive bugfix changes,
I suggest adding an address similar to
https://ci.apache.org/projects/flink/flink-docs-master, e.g.
"https://ci.apache.org/projects/flink/flink-docs-blink", so users can have a
complete user experience, just like every version published by Flink at
"https://ci.apache.org/projects/flink/flink-docs-release-XX"; the
difference is that we do not declare a release, and do not assume the quality
and responsibility of a release. So, I agree with
@Shaoxuan Wang's suggestion. If I misunderstood what
you mean, please correct me. @Shaoxuan Wang

Regards,
Jincheng

Stephan Ewen wrote on Tue, Jan 22, 2019 at 3:46 AM:

> Dear Flink Community!
>
> Some of you may have heard it already from announcements or from a Flink
> Forward talk:
> Alibaba has decided to open source its in-house improvements to Flink,
> called Blink!
> First of all, big thanks to the team that developed these improvements
> and made this contribution possible!
>
> Blink has some very exciting enhancements, most prominently on the Table
> API/SQL side
> and the unified execution of these programs. For batch (bounded) data, the
> SQL execution
> has full TPC-DS coverage (which is a big deal), and the execution is more
> than 10x faster
> than the current SQL runtime in Flink. Blink has also added support for
> catalogs,
> improved the failover speed of batch queries and the resource management.
> It also
> makes some good steps in the direction of more deeply unifying the batch
> and streaming
> execution.
>
> The proposal is to merge Blink's enhancements into Flink, to give Flink's
> SQL/Table API and
> execution a big boost in usability and performance.
>
> Just to avoid any confusion: This is not a suggested change of focus to
> batch processing,
> nor would this break with any of the streaming architecture and vision of
> Flink.
> This contribution follows very much the principle of "batch is a special
> case of streaming".
> As a special case, batch makes special optimizations possible. In its
> current state,
> Flink does not exploit many of these optimizations. This contribution adds
> exactly these
> optimizations and makes the streaming model of Flink applicable to harder
> batch use cases.
>
> Assuming that the community is excited about this as well, and in favor of
> these enhancements
> to Flink's capabilities, below are some thoughts on how this contribution
> and integration
> could work.
>
> --- Making the code available ---
>
> At the moment, the Blink code is in the form of a big Flink fork (rather
> than isolated
> patches on top of Flink), so the integration is unfortunately not as easy
> as merging a
> few patches or pull requests.
>
> To support a non-disruptive merge of such a big contribution, I believe it
> makes sense to make
> the code of the fork available in the Flink project first.
> From there on, we can start to work on the details for merging the
> enhancements, including
> the refactoring of the necessary parts in the Flink master and the Blink
> code to make a
> merge possible without repeatedly breaking compatibility.
>
> The first question is where do we put the code of the Blink fork during the
> merging procedure?
> My first thought was to temporarily add a repository (like
> "flink-blink-staging"), but we could
> also put it into a special branch in the main Flink repository.
>
>
> I will start a separate thread about discussing a possible strategy to
> handle and merge
> such a big contribution.
>
> Best,
> Stephan
>


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread Jark Wu
+1 for the leaner distribution and improve the "Download" page.

On Fri, 25 Jan 2019 at 01:54, Bowen Li  wrote:

> +1 for leaner distribution and a better 'download' webpage.
>
> +1 for a full distribution if we can automate it besides supporting the
> leaner one. If we support both, I'd imagine release managers should be able
> to package two distributions with a single change of parameter instead of
> manually packaging the full distribution. How to achieve that needs to be
> evaluated and discussed; it could probably be something like 'mvn clean
> install -Dfull/-Dlean', I'm not sure yet.
>
>
> On Wed, Jan 23, 2019 at 10:11 AM Thomas Weise  wrote:
>
>> +1 for trimming the size by default and offering the fat distribution as
>> alternative download
>>
>>
>> On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann 
>> wrote:
>>
>>> Ufuk's proposal (having a lean default release and a user convenience
>>> tarball) sounds good to me. That way advanced users won't be bothered by
>>> an
>>> unnecessarily large release and new users can benefit from having many
>>> useful extensions bundled in one tarball.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi  wrote:
>>>
>>> > On Wed, Jan 23, 2019 at 11:01 AM Timo Walther 
>>> wrote:
>>> > > I think what is more important than a big dist bundle is a helpful
>>> > > "Downloads" page where users can easily find available filesystems,
>>> > > connectors, and metric reporters. Not everyone checks Maven Central for
>>> > > available JAR files. I just saw that we added an "Optional components"
>>> > > section recently [1]; we just need to make it more prominent. This is
>>> > > also done for the SQL connectors and formats [2].
>>> >
>>> > +1 I fully agree with the importance of the Downloads page. We
>>> > definitely need to make any optional dependencies that users need to
>>> > download easy to find.
>>> >
>>>
>>
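Bowen Li's "-Dfull/-Dlean" idea above could be modeled as a Maven profile. The following is purely a sketch under stated assumptions: the profile id, property name, and example dependency are hypothetical, not Flink's actual build setup.

```xml
<!-- Hypothetical sketch only: a "-Dfull" switch as a Maven profile in
     flink-dist/pom.xml. Profile id, property name, and the example
     dependency are assumptions, not the real Flink build. -->
<profiles>
  <profile>
    <id>full-dist</id>
    <activation>
      <!-- activated via: mvn clean install -Dfull -->
      <property>
        <name>full</name>
      </property>
    </activation>
    <dependencies>
      <!-- optional components bundled only into the full tarball -->
      <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-metrics-prometheus</artifactId>
        <version>${project.version}</version>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```

With such a profile, `mvn clean install` would build the lean distribution by default, while `mvn clean install -Dfull` would pull the optional components into the tarball.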


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread jincheng sun
Hi Chesnay,

Thank you for the proposal, and I like it very much.

+1 for the leaner distribution.

About improving the "Download" page: I think we can add the connector
download links in the "Optional components" section which @Timo Walther
mentioned above.


Regards,
Jincheng

Chesnay Schepler wrote on Fri, Jan 18, 2019 at 5:59 PM:

> Hello,
>
> the binary distribution that we release currently contains quite a lot of
> optional components, including various filesystems, metric reporters and
> libraries. Most users will only use a fraction of these, so the rest
> pretty much only increases the size of flink-dist.
>
> With Flink growing more and more in scope I don't believe it to be
> feasible to ship everything we have with every distribution, and instead
> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> lean and additional components are downloaded separately and added by
> the user.
>
> This would primarily affect the /opt directory, but could also be
> extended to cover flink-dist. For example, the yarn and mesos code could
> be spliced out into separate jars that could be added to lib manually.
>
> Let me know what you think.
>
> Regards,
>
> Chesnay
>
>


[jira] [Created] (FLINK-11432) YarnFlinkResourceManagerTest test case timeout

2019-01-24 Thread vinoyang (JIRA)
vinoyang created FLINK-11432:


 Summary: YarnFlinkResourceManagerTest test case timeout 
 Key: FLINK-11432
 URL: https://issues.apache.org/jira/browse/FLINK-11432
 Project: Flink
  Issue Type: Test
Reporter: vinoyang


{code:java}
07:24:07.034 [ERROR] 
testYarnFlinkResourceManagerJobManagerLostLeadership(org.apache.flink.yarn.YarnFlinkResourceManagerTest)
  Time elapsed: 180.044 s  <<< FAILURE!
java.lang.AssertionError: assertion failed: timeout (179485118035 nanoseconds) 
during expectMsgClass waiting for class 
org.apache.flink.runtime.messages.Acknowledge
at 
org.apache.flink.yarn.YarnFlinkResourceManagerTest.testYarnFlinkResourceManagerJobManagerLostLeadership(YarnFlinkResourceManagerTest.java:99)
{code}
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Guowei Ma
+1

This not only helps Chinese users but also helps the community to collect more 
feedback and scenarios.


> On Jan 25, 2019, at 2:29 AM, Zhang, Xuefu wrote:
> 
> +1 on the idea. This will certainly help promote Flink in China's industries. 
> On a side note, it would be great if anyone in the list can help source 
> ideas, bug reports, and feature requests to dev@ list and/or JIRAs so as to 
> gain broader attention.
> 
> Thanks,
> Xuefu
> 
> 
> --
> From:Fabian Hueske 
> Sent At:2019 Jan. 24 (Thu.) 05:32
> To:dev 
> Subject:Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the 
> Chinese-speaking community?
> 
> Thanks Robert!
> I think this is a very good idea.
> +1
> 
> Fabian
> 
>> On Thu, Jan 24, 2019 at 2:09 PM, Jeff Zhang wrote:
>> 
>> +1
>> 
>> Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:
>> 
>>> +1, good idea, especially with that many Chinese speaking contributors,
>>> committers & users :)
>>> 
>>> Piotrek
>>> 
 On 24 Jan 2019, at 13:20, Kurt Young  wrote:
 
 Big +1 on this, it will indeed help Chinese speaking users a lot.
 
 fudian.fd wrote on Thu, Jan 24, 2019 at 8:18 PM:
 
> +1. I noticed that many folks from China are requesting the JIRA
> permission in the past year. It reflects that more and more developers
>>> from
> China are using Flink. A Chinese-oriented mailing list will definitely
>>> be
> helpful for the growth of Flink in China.
> 
> 
>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
>> 
>> +1, a very nice idea
>> 
>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger >> 
> wrote:
>> 
>>> Thanks for your response.
>>> 
>>> You are right, I'm proposing "user...@flink.apache.org" as the
>>> mailing
>>> list's name!
>>> 
>>> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
>>> wrote:
>>> 
 Hi Robert,
 
 Thanks a lot for starting this discussion!
 
 +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> title,
 but
 -cn in the opening email content.
 I think -zh would be better as we are establishing the tool for
>>> general
 Chinese-speaking users).
 All dev@ discussions / JIRAs should still be in a single English
> mailing
 list.
 
 From what I've seen in the DingTalk Flink user group, there's
>> quite a
> bit
 of activity in forms of user questions and replies.
 It would really be great if the Chinese-speaking user community can
 actually have these discussions happen in the Apache mailing lists,
 so that questions / discussions / replies from developers can be
> indexed
 and searchable.
 Moreover, it'll give the community more insight in how active a
 Chinese-speaking contributor is helping with user requests,
 which in general is a form of contribution that the community
>> always
>>> merits
 a lot.
 
 Cheers,
 Gordon
 
 On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
>> rmetz...@apache.org
 
 wrote:
 
> Hey all,
> 
> I would like to create a new user support mailing list called "
> user...@flink.apache.org" to cater the Chinese-speaking Flink
>>> community.
> 
> Why?
> In the last year 24% of the traffic on flink.apache.org came from
>>> the
 US,
> 22% from China. In the last three months, China is at 30%, the US
>> at
>>> 20%.
> An additional data point is that there's a Flink DingTalk group
>> with
>>> more
> than 5000 members, asking Flink questions.
> I believe that knowledge about Flink should be available in public
>>> forums
> (our mailing list), indexable by search engines. If there's a huge
>>> demand
> for Chinese-language support, we as a community should provide
>>> these
 users
> the tools they need, to grow our community and to allow them to
>>> follow
 the
> Apache way.
> 
> Is it possible?
> I believe it is, because a number of other Apache projects are
>>> running
> non-English user@ mailing lists.
> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
>>> non-English
> lists: http://mail-archives.apache.org/mod_mbox/
> One thing I want to make very clear in this discussion is that all
 project
> decisions, developer discussions, JIRA tickets etc. need to happen
>>> in
> English, as this is the primary language of the Apache Foundation
>>> and
>>> our
> community.
> We should also clarify this on the page listing the mailing lists.
> 
> How?
> If there is consensus in this discussion thread, I would request
>> the
>>> 

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Jark Wu
a big +1 to this!

A user-zh mailing list will help more Chinese-speaking users, and more
people who want to contribute to Flink will be willing to help answer
questions.
It's exciting to see the community embracing Chinese-speaking users.

Best, Jark

On Fri, 25 Jan 2019 at 11:59, shengjk1  wrote:

> +1, a good idea
>
>
> thanks
>
>
> On 01/25/2019 09:26,Hequn Cheng wrote:
> +1  This would be very helpful!
>
>
> On Fri, Jan 25, 2019 at 8:15 AM Guowei Ma  wrote:
>
> +1
>
> This not only helps Chinese users but also helps the community to collect
> more feedback and scenarios.
>
>
> On Jan 25, 2019, at 2:29 AM, Zhang, Xuefu wrote:
>
> +1 on the idea. This will certainly help promote Flink in China
> industries. On a side note, it would be great if anyone in the list can
> help source ideas, bug reports, and feature requests to dev@ list and/or
> JIRAs so as to gain broader attention.
>
> Thanks,
> Xuefu
>
>
> --
> From:Fabian Hueske 
> Sent At:2019 Jan. 24 (Thu.) 05:32
> To:dev 
> Subject:Re: [DISCUSS] Start a user...@flink.apache.org mailing list for
> the Chinese-speaking community?
>
> Thanks Robert!
> I think this is a very good idea.
> +1
>
> Fabian
>
> On Thu, Jan 24, 2019 at 2:09 PM, Jeff Zhang wrote:
>
> +1
>
> Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:
>
> +1, good idea, especially with that many Chinese speaking contributors,
> committers & users :)
>
> Piotrek
>
> On 24 Jan 2019, at 13:20, Kurt Young  wrote:
>
> Big +1 on this, it will indeed help Chinese speaking users a lot.
>
> fudian.fd wrote on Thu, Jan 24, 2019 at 8:18 PM:
>
> +1. I noticed that many folks from China are requesting the JIRA
> permission in the past year. It reflects that more and more
> developers
> from
> China are using Flink. A Chinese oriented mailing list will
> definitely
> be
> helpful for the growth of Flink in China.
>
>
> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
>
> +1, a very nice idea
>
> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger <
> rmetz...@apache.org
>
> wrote:
>
> Thanks for your response.
>
> You are right, I'm proposing "user...@flink.apache.org" as the
> mailing
> list's name!
>
> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> wrote:
>
> Hi Robert,
>
> Thanks a lot for starting this discussion!
>
> +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> title,
> but
> -cn in the opening email content.
> I think -zh would be better as we are establishing the tool for
> general
> Chinese-speaking users).
> All dev@ discussions / JIRAs should still be in a single English
> mailing
> list.
>
> From what I've seen in the DingTalk Flink user group, there's
> quite a
> bit
> of activity in forms of user questions and replies.
> It would really be great if the Chinese-speaking user community
> can
> actually have these discussions happen in the Apache mailing
> lists,
> so that questions / discussions / replies from developers can be
> indexed
> and searchable.
> Moreover, it'll give the community more insight in how active a
> Chinese-speaking contributor is helping with user requests,
> which in general is a form of contribution that the community
> always
> merits
> a lot.
>
> Cheers,
> Gordon
>
> On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
> rmetz...@apache.org
>
> wrote:
>
> Hey all,
>
> I would like to create a new user support mailing list called "
> user...@flink.apache.org" to cater the Chinese-speaking Flink
> community.
>
> Why?
> In the last year 24% of the traffic on flink.apache.org came
> from
> the
> US,
> 22% from China. In the last three months, China is at 30%, the US
> at
> 20%.
> An additional data point is that there's a Flink DingTalk group
> with
> more
> than 5000 members, asking Flink questions.
> I believe that knowledge about Flink should be available in
> public
> forums
> (our mailing list), indexable by search engines. If there's a
> huge
> demand
> in a Chinese language support, we as a community should provide
> these
> users
> the tools they need, to grow our community and to allow them to
> follow
> the
> Apache way.
>
> Is it possible?
> I believe it is, because a number of other Apache projects are
> running
> non-English user@ mailing lists.
> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> non-English
> lists: http://mail-archives.apache.org/mod_mbox/
> One thing I want to make very clear in this discussion is that
> all
> project
> decisions, developer discussions, JIRA tickets etc. need to
> happen
> in
> English, as this is the primary language of the Apache Foundation
> and
> our
> community.
> We should also clarify this on the page listing the mailing
> lists.
>
> How?
> If there is consensus in this discussion thread, I would request
> the
> new
> mailing list next Monday.
> In case of discussions, I will start a vote on Monday or when the
> discussions have stopped.
> Then, we should put the new list on our website and start
> promoting
> it
> (in
> 

Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

2019-01-24 Thread zhijiang
Hi Bo,

The problems you mentioned can be summarized into two issues:

1. The failover strategy should consider whether the upstream produced partition is
still available when the downstream fails. If the produced partition is
available, then only the downstream region needs to be restarted; otherwise the
upstream region should also be restarted to re-produce the partition data.
2. The lifecycle of the partition: currently, once the partition data has been
transferred completely via the network, the partition and view are released on the
producer side, no matter whether the data has actually been processed by the
consumer or not. The TaskManager may even be released earlier, while the partition
data has not been transferred yet.

Both issues are already considered in my proposed pluggable shuffle manager
architecture, which would introduce a ShuffleMaster component to manage
partitions globally on the JobManager side; it is then natural to solve the above
problems based on this architecture. You can refer to the FLIP [1] if
interested.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager

Best,
Zhijiang
--
From:Stephan Ewen 
Send Time: Thu, Jan 24, 2019, 22:17
To:dev ; Kurt Young 
Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable 
to support fine grained recovery

The SpillableSubpartition can also be used during the execution of bounded
DataStreams programs. I think this is largely independent from deprecating
the DataSet API.

I am wondering if this particular issue is one that has been addressed in
the Blink code already (we are looking to merge much of that functionality)
- because the proposed extension is actually necessary for proper batch
fault tolerance (independent of the DataSet or Query Processor stack).

I am adding Kurt to this thread - maybe he can help us find that out.

On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski 
wrote:

> Hi,
>
> I’m not sure how much effort we will be willing to invest in the existing
> batch stack. We are currently focusing on the support of bounded
> DataStreams (already done in Blink and will be merged to Flink soon) and
> unifying batch & stream under the DataStream API.
>
> Piotrek
>
> > On 23 Jan 2019, at 04:45, Bo WANG  wrote:
> >
> > Hi all,
> >
> > When running the batch WordCount example, I configured the job execution
> > mode
> > as BATCH_FORCED and the failover-strategy as region, and I manually injected
> some
> > errors to let the execution fail in different phases. In some cases, the
> > job could
> > recover from the failure and succeed, but in other cases, the job
> > retried
> > several times and failed.
> >
> > Example:
> > - If the failure occurred before task read data, e.g., failed before
> > invokable.invoke() in Task.java, failover could succeed.
> > - If the failure occurred after task having read data, failover did not
> > work.
> >
> > Problem diagnosis:
> > Running the example described before, each ExecutionVertex is defined as
> > a restart region, and the ResultPartitionType between executions is
> > BLOCKING.
> > Thus, SpillableSubpartition and SpillableSubpartitionView are used to
> > write/read
> > shuffle data, and data blocks are described as BufferConsumers stored in
> a
> > list
> > called buffers. When a task requires input data from the
> > SpillableSubpartitionView,
> > BufferConsumers are REMOVED from buffers. Thus, when failures occurred
> > after data had been read, some BufferConsumers had already been released.
> > Although the tasks retried, the input data was incomplete.
> >
> > Fix Proposal:
> > - BufferConsumer should not be removed from buffers until the consuming
> > ExecutionVertex is terminal.
> > - SpillableSubpartition should not be released until the consuming
> > ExecutionVertex is terminal.
> > - SpillableSubpartition could create multiple SpillableSubpartitionViews,
> > each of which corresponds to an ExecutionAttempt.
> >
> > Best,
> > Bo
>
>
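Bo's fix proposal above can be sketched in isolation. The following is a simplified, hypothetical model (not Flink's actual SpillableSubpartition classes) showing how keeping buffers until the consumer is terminal lets a restarted execution attempt re-read the complete input:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the proposal: the subpartition keeps its buffers so
// that each new execution attempt of the consumer gets a fresh view that
// re-reads from the beginning; nothing is released until the consumer is
// terminal. Strings stand in for BufferConsumers.
class RepeatablyReadableSubpartition {
    private final List<String> buffers = new ArrayList<>();
    private boolean released = false;

    void add(String buffer) { buffers.add(buffer); }

    // One view per ExecutionAttempt; buffers are NOT removed on read.
    View createViewForAttempt() {
        if (released) throw new IllegalStateException("partition already released");
        return new View();
    }

    // Release only once the consuming ExecutionVertex reached a terminal state.
    void onConsumerTerminal() { released = true; buffers.clear(); }

    class View {
        private int next = 0;
        String pollNext() { return next < buffers.size() ? buffers.get(next++) : null; }
    }
}

public class RecoveryDemo {
    public static void main(String[] args) {
        RepeatablyReadableSubpartition p = new RepeatablyReadableSubpartition();
        p.add("b0");
        p.add("b1");

        // First attempt reads some data, then "fails".
        RepeatablyReadableSubpartition.View attempt1 = p.createViewForAttempt();
        attempt1.pollNext();

        // The restarted attempt still sees the complete input.
        RepeatablyReadableSubpartition.View attempt2 = p.createViewForAttempt();
        System.out.println(attempt2.pollNext()); // b0
        System.out.println(attempt2.pollNext()); // b1
    }
}
```

In the current behavior described above, the first pollNext() would have removed "b0" permanently, so the second attempt would see incomplete input; here both views read the same retained list.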



Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Hequn Cheng
+1  This would be very helpful!



Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread shengjk1
+1, a good idea


thanks


On 01/25/2019 09:26,Hequn Cheng wrote:
+1  This would be very helpful!


On Fri, Jan 25, 2019 at 8:15 AM Guowei Ma  wrote:

+1

This not only helps Chinese users but also helps the community to collect
more feedback and scenarios.


在 2019年1月25日,上午2:29,Zhang, Xuefu  写道:

+1 on the idea. This will certainly help promote Flink in China
industries. On a side note, it would be great if anyone in the list can
help source ideas, bug reports, and feature requests to dev@ list and/or
JIRAs so as to gain broader attention.

Thanks,
Xuefu


--
From:Fabian Hueske 
Sent At:2019 Jan. 24 (Thu.) 05:32
To:dev 
Subject:Re: [DISCUSS] Start a user...@flink.apache.org mailing list for
the Chinese-speaking community?

Thanks Robert!
I think this is a very good idea.
+1

Fabian

Am Do., 24. Jan. 2019 um 14:09 Uhr schrieb Jeff Zhang  于2019年1月24日周四 下午8:38写道:

+1, good idea, especially with that many Chinese speaking contributors,
committers & users :)

Piotrek

On 24 Jan 2019, at 13:20, Kurt Young  wrote:

Big +1 on this, it will indeed help Chinese speaking users a lot.

fudian.fd 于2019年1月24日 周四20:18写道:

+1. I noticed that many folks from China are requesting the JIRA
permission in the past year. It reflects that more and more
developers
from
China are using Flink. A Chinese oriented mailing list will
definitely
be
helpful for the growth of Flink in China.


在 2019年1月24日,下午7:42,Stephan Ewen  写道:

+1, a very nice idea

On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger <
rmetz...@apache.org

wrote:

Thanks for your response.

You are right, I'm proposing "user...@flink.apache.org" as the mailing
list's name!

On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

Hi Robert,

Thanks a lot for starting this discussion!

+1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the title,
but -cn in the opening email content.
I think -zh would be better as we are establishing the tool for general
Chinese-speaking users).
All dev@ discussions / JIRAs should still be in a single English mailing
list.

From what I've seen in the DingTalk Flink user group, there's quite a bit
of activity in forms of user questions and replies.
It would really be great if the Chinese-speaking user community can
actually have these discussions happen in the Apache mailing lists,
so that questions / discussions / replies from developers can be indexed
and searchable.
Moreover, it'll give the community more insight into how active a
Chinese-speaking contributor is in helping with user requests,
which in general is a form of contribution that the community always
merits a lot.

Cheers,
Gordon

On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <rmetz...@apache.org>
wrote:

Hey all,

I would like to create a new user support mailing list called "
user...@flink.apache.org" to cater to the Chinese-speaking Flink
community.

Why?
In the last year, 24% of the traffic on flink.apache.org came from the US,
22% from China. In the last three months, China is at 30%, the US at 20%.
An additional data point is that there's a Flink DingTalk group with more
than 5000 members, asking Flink questions.
I believe that knowledge about Flink should be available in public forums
(our mailing list), indexable by search engines. If there's a huge demand
for Chinese-language support, we as a community should provide these users
the tools they need, to grow our community and to allow them to follow the
Apache way.

Is it possible?
I believe it is, because a number of other Apache projects are running
non-English user@ mailing lists.
Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have non-English
lists: http://mail-archives.apache.org/mod_mbox/
One thing I want to make very clear in this discussion is that all project
decisions, developer discussions, JIRA tickets etc. need to happen in
English, as this is the primary language of the Apache Foundation and our
community.
We should also clarify this on the page listing the mailing lists.

How?
If there is consensus in this discussion thread, I would request the new
mailing list next Monday.
In case of discussions, I will start a vote on Monday or when the
discussions have stopped.
Then, we should put the new list on our website and start promoting it
(in said DingTalk group and on social media).

Let me know what you think about this idea :)

Best,
Robert


PS: In case you are wondering what ZH stands for:
https://en.wiktionary.org/wiki/ZH




--
Best,
Kurt



--
Best Regards

Jeff Zhang




Re: [DISCUSS] FLIP-32: Restructure flink-table for future contributions

2019-01-24 Thread jincheng sun
Hi Timo,

Thanks a lot for bringing up the FLIP-32 discussion and the very detailed
implementation plan document!

Restructuring `flink-table` is an important part of merging Blink into Flink;
looking forward to the JIRAs that will be opened!

Cheers,
Jincheng


On Thu, Jan 24, 2019 at 9:06 PM, Timo Walther wrote:

> Hi everyone,
>
> as Stephan already announced on the mailing list [1], the Flink
> community will receive a big code contribution from Alibaba. The
> flink-table module is one of the biggest parts that will receive many
> new features and major architectural improvements. Instead of waiting
> until the next major version of Flink or introducing big API-breaking
> changes, we would like to gradually build up the Blink-based planner and
> runtime while keeping the Table & SQL API mostly stable. Users will be
> able to play around with the current merge status of the new planner or
> fall back to the old planner until the new one is stable.
>
> We have prepared a design document that discusses a restructuring of the
> flink-table module and suggests a rough implementation plan:
>
>
> https://docs.google.com/document/d/1Tfl2dBqBV3qSBy7oV3qLYvRRDbUOasvA1lhvYWWljQw/edit?usp=sharing
>
> I will briefly summarize the steps we would like to do:
>
> - Split the flink-table module similarly to the proposal of FLIP-28 [3],
> which is now outdated. This is a preparation to separate the API from the
> core (targeted for Flink 1.8).
> - Perform minor API changes to separate API from actual implementation
> (targeted for Flink 1.8).
> - Merge an MVP Blink SQL planner given that necessary Flink core/runtime
> changes have been completed.
>The merging will happen in stages (e.g. basic planner framework, then
> operator by operator). The exact merging plan still needs to be determined.
> - Rework the type system in order to unblock work on unified table
> environments, UDFs, sources/sinks, and catalog.
> - Enable full end-to-end batch and stream execution features.
>
> Our mid-term goal:
>
> Run full TPC-DS on a unified batch/streaming runtime. Initially, we will
> only support ingesting data coming from the DataStream API. Once we
> reworked the sources/sink interfaces, we will target full end-to-end
> TPC-DS query execution with table connectors.
>
> A rough task dependency graph is illustrated in the design document. A
> more detailed task dependency structure will be added to JIRA after we
> agreed on this FLIP.
>
> Looking forward to any feedback.
>
> Thanks,
> Timo
>
> [1]
>
> https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
>
>


Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

2019-01-24 Thread Guowei Ma
Thanks to Zhijiang for the detailed explanation. I would like to add some
supplements: Blink has indeed solved this particular problem. The problem
can be identified in Blink, and the upstream will be restarted by Blink.
thanks

On Fri, Jan 25, 2019 at 12:04 PM, zhijiang wrote:

> Hi Bo,
>
> The problems you mentioned can be summarized into two issues:
>
> 1. The failover strategy should consider whether the upstream produced
> partition is still available when the downstream fails. If the produced
> partition is available, then only the downstream region needs to be
> restarted, otherwise the upstream region should also be restarted to
> re-produce the partition data.
> 2. The lifecycle of the partition: Currently, once the partition data is
> transferred completely via the network, the partition and view are released
> on the producer side, no matter whether the data is actually processed by
> the consumer or not. The TaskManager may even be released earlier, when the
> partition data is not transferred yet.
>
> Both issues are already considered in my proposed pluggable shuffle
> manager architecture, which would introduce the ShuffleMaster component to
> manage partitions globally on the JobManager side; it is then natural to
> solve the above problems based on this architecture. You can refer to the
> FLIP [1] if interested.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
>
> Best,
> Zhijiang
> --
> From:Stephan Ewen 
> Send Time: Thu, Jan 24, 2019, 22:17
> To:dev ; Kurt Young 
> Subject:Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly
> readable to support fine grained recovery
>
> The SpillableSubpartition can also be used during the execution of bounded
> DataStreams programs. I think this is largely independent from deprecating
> the DataSet API.
>
> I am wondering if this particular issue is one that has been addressed in
> the Blink code already (we are looking to merge much of that functionality)
> - because the proposed extension is actually necessary for proper batch
> fault tolerance (independent of the DataSet or Query Processor stack).
>
> I am adding Kurt to this thread - maybe he can help us find that out.
>
> On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > I’m not sure how much effort we will be willing to invest in the existing
> > batch stack. We are currently focusing on the support of bounded
> > DataStreams (already done in Blink and will be merged to Flink soon) and
> > unifying batch & stream under the DataStream API.
> >
> > Piotrek
> >
> > > On 23 Jan 2019, at 04:45, Bo WANG  wrote:
> > >
> > > Hi all,
> > >
> > > When running the batch WordCount example, I configured the job
> > > execution mode as BATCH_FORCED and the failover-strategy as region,
> > > and I manually injected some errors to let the execution fail in
> > > different phases. In some cases, the job could recover from the
> > > failover and succeed, but in other cases the job retried several
> > > times and failed.
> > >
> > > Example:
> > > - If the failure occurred before the task read data, e.g., failed
> > > before invokable.invoke() in Task.java, failover could succeed.
> > > - If the failure occurred after the task had read data, failover did
> > > not work.
> > >
> > > Problem diagnosis:
> > > Running the example described above, each ExecutionVertex is defined
> > > as a restart region, and the ResultPartitionType between executions is
> > > BLOCKING. Thus, SpillableSubpartition and SpillableSubpartitionView
> > > are used to write/read shuffle data, and data blocks are described as
> > > BufferConsumers stored in a list called buffers. When a task requires
> > > input data from the SpillableSubpartitionView, BufferConsumers are
> > > REMOVED from buffers. Thus, when failures occur after data has been
> > > read, some BufferConsumers have already been released. Although the
> > > tasks retried, the input data is incomplete.
> > >
> > > Fix proposal:
> > > - A BufferConsumer should not be removed from buffers until the
> > > consuming ExecutionVertex is terminal.
> > > - A SpillableSubpartition should not be released until the consuming
> > > ExecutionVertex is terminal.
> > > - A SpillableSubpartition could create multiple
> > > SpillableSubpartitionViews, each of which corresponds to an
> > > ExecutionAttempt.
> > >
> > > Best,
> > > Bo
> >
> >
>
>


[jira] [Created] (FLINK-11424) org.apache.flink.metrics.datadog.DatadogHttpReporter#report remove will throw exception

2019-01-24 Thread lining (JIRA)
lining created FLINK-11424:
--

 Summary: 
org.apache.flink.metrics.datadog.DatadogHttpReporter#report remove will throw 
exception
 Key: FLINK-11424
 URL: https://issues.apache.org/jira/browse/FLINK-11424
 Project: Flink
  Issue Type: Bug
  Components: Metrics
Reporter: lining
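The ticket body is empty, but the summary suggests entries being removed while report() iterates over them. As a generic illustration only (this is not DatadogHttpReporter's actual code), removing from a list during an enhanced-for loop throws ConcurrentModificationException, while Iterator.remove() is the safe pattern:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Generic illustration: structural removal during for-each iteration fails
// fast, while Iterator.remove() is the safe way to drop entries mid-loop.
public class RemoveDuringIteration {
    static boolean failsWithForEach(List<String> metrics) {
        try {
            for (String m : metrics) {
                if (m.startsWith("closed.")) {
                    metrics.remove(m); // structural modification mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    static void removeSafely(List<String> metrics) {
        for (Iterator<String> it = metrics.iterator(); it.hasNext(); ) {
            if (it.next().startsWith("closed.")) {
                it.remove(); // safe structural removal
            }
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("closed.a", "open.b", "open.c"));
        System.out.println("for-each threw: " + failsWithForEach(a));

        List<String> b = new ArrayList<>(List.of("closed.a", "open.b", "open.c"));
        removeSafely(b);
        System.out.println("after safe removal: " + b);
    }
}
```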






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Flink CEP : Doesn't generate output

2019-01-24 Thread Chesnay Schepler
Can you provide us a self-contained reproducing example? (preferably as 
elementary as possible)


On 22.01.2019 18:58, dhanuka ranasinghe wrote:

Hi All,

I have used Flink CEP to filter some events and generate alerts 
based on certain conditions, but unfortunately it doesn't print any 
result. I have attached the source code herewith; could you please help 
me with this?





package org.monitoring.stream.analytics;

import java.io.IOException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.table.shaded.org.apache.commons.lang3.StringUtils;
import org.monitoring.stream.analytics.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MultiMap;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FlinkCEP {

    private final static Logger LOGGER = LoggerFactory.getLogger(FlinkCEP.class);

    public static void main(String[] args) throws Exception {

        String query = FileHandler.readInputStream(FileHandler.getResourceAsStream("query.sql"));
        if (query == null) {
            LOGGER.error("*  Can't read resources ");
        } else {
            LOGGER.info(" " + query + " =");
        }
        Properties props = FileHandler.loadResourceProperties("application.properties");
        Properties kConsumer = FileHandler.loadResourceProperties("consumer.properties");
        Properties kProducer = FileHandler.loadResourceProperties("producer.properties");
        String hzConfig = FileHandler.readInputStream(FileHandler.getResourceAsStream("hazelcast-client.xml"));
        String schemaContent = FileHandler.readInputStream(FileHandler.getResourceAsStream("IRIC-schema.json"));

        props.setProperty("auto.offset.reset", "latest");
        props.setProperty("flink.starting-position", "latest");
        // NOTE: generic type parameters below were stripped by the mailing-list
        // archive; they are restored here as plausible reconstructions.
        Map<String, String> tempMap = new HashMap<>();
        for (final String name : props.stringPropertyNames())
            tempMap.put(name, props.getProperty(name));
        final ParameterTool params = ParameterTool.fromMap(tempMap);
        String jobName = props.getProperty(ApplicationConfig.JOB_NAME);

        LOGGER.info("%%% Desktop Responsibility Start %%");

        LOGGER.info("$$$ Hz instance name " + props.toString());

        HazelcastInstance hzInst = HazelcastUtils.getClient(hzConfig, "");

        LOGGER.info("== schema " + schemaContent);

        MultiMap<String, String> distributedMap = hzInst.getMultiMap("masterDataSynch");

        distributedMap.put(jobName, query);

        LOGGER.info("%% Desktop Responsibility End %");

        Collection<String> queries = distributedMap.get(jobName);
        Set<String> rules = new HashSet<>(queries);
        LOGGER.info("== query" + query);
        rules.add(query);
        hzInst.getLifecycleService().shutdown();
        final String sourceTable = "dataTable";

        String paral = props.getProperty(ApplicationConfig.FLINK_PARALLEL_TASK);
        String noOfOROperatorsValue = props.getProperty(ApplicationConfig.FLINK_NUMBER_OF_OR_OPERATORS);

        int noOfOROperators = 50;
        if (StringUtils.isNoneBlank(noOfOROperatorsValue)) {
            noOfOROperators = Integer.parseInt(noOfOROperatorsValue);
        }
        List<List<String>> subQueries = chunk(new ArrayList<>(rules), noOfOROperators);

        // define a schema

        // setup streaming environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 1));

        env.enableCheckpointing(30); // 300 seconds
        env.getConfig().setGlobalJobParameters(params);
        // env.getConfig().enableObjectReuse();
        env.getConfig().setUseSnapshotCompression(true);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
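As far as the (truncated) program above shows, it sets TimeCharacteristic.EventTime but no timestamp/watermark assigner is visible. In event time, operators such as CEP buffer elements and only emit matches once a watermark passes them, so a missing watermark generator is one common cause of "no output". A minimal non-Flink model of that behavior (all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of why an event-time pipeline can stay silent: elements
// are buffered and only become emittable once a watermark passes their
// timestamp. Without a watermark generator the watermark never advances,
// so nothing is ever emitted downstream (e.g. to a CEP pattern).
public class WatermarkBufferDemo {
    private final List<long[]> buffered = new ArrayList<>(); // {timestamp, value}
    private long watermark = Long.MIN_VALUE;

    public void element(long timestamp, long value) {
        buffered.add(new long[] {timestamp, value});
    }

    public void advanceWatermark(long wm) { watermark = wm; }

    // Returns the values whose timestamps the watermark has passed.
    public List<Long> emittable() {
        List<Long> out = new ArrayList<>();
        for (long[] e : buffered) {
            if (e[0] <= watermark) out.add(e[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        WatermarkBufferDemo op = new WatermarkBufferDemo();
        op.element(1000L, 1L);
        op.element(2000L, 2L);
        System.out.println("no watermark:   " + op.emittable()); // stays empty
        op.advanceWatermark(1500L);
        System.out.println("watermark 1500: " + op.emittable()); // first element only
    }
}
```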

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2019-01-24 Thread Piotr Nowojski
Hi Biao!

This discussion was stalled because of preparations for the open sourcing & 
merging Blink. I think before creating the tickets we should split this 
discussion into topics/areas outlined by Stephan and create Flips for that.

I think there is no chance for this to be completed in the couple of remaining 
weeks/1 month before the 1.8 feature freeze; however, it would be good to aim to 
land those changes in 1.9.

Piotrek 

> On 20 Jan 2019, at 16:08, Biao Liu  wrote:
> 
> Hi community,
> The summary of Stephan makes a lot of sense to me. It is indeed much
> clearer after splitting the complex topic into small ones.
> I was wondering, is there any detailed plan for the next step? If not, I
> would like to push this forward by creating some JIRA issues.
> Another question is: should version 1.8 include these features?
> 
> On Sat, Dec 1, 2018 at 4:20 AM, Stephan Ewen wrote:
> 
>> Thanks everyone for the lively discussion. Let me try to summarize where I
>> see convergence in the discussion and open issues.
>> I'll try to group this by design aspect of the source. Please let me know
>> if I got things wrong or missed something crucial here.
>> 
>> For issues 1-3, if the below reflects the state of the discussion, I would
>> try and update the FLIP in the next days.
>> For the remaining ones we need more discussion.
>> 
>> I would suggest forking each of these aspects into a separate mail thread,
>> or we will lose sight of the individual aspects.
>> 
>> *(1) Separation of Split Enumerator and Split Reader*
>> 
>>  - All seem to agree this is a good thing
>>  - Split Enumerator could in the end live on JobManager (and assign splits
>> via RPC) or in a task (and assign splits via data streams)
>>  - this discussion is orthogonal and should come later, when the interface
>> is agreed upon.
>> 
>> *(2) Split Readers for one or more splits*
>> 
>>  - Discussion seems to agree that we need to support one reader that
>> possibly handles multiple splits concurrently.
>>  - The requirement comes from sources where one poll()-style call fetches
>> data from different splits / partitions
>>--> example sources that require that would be for example Kafka,
>> Pravega, Pulsar
>> 
>>  - Could have one split reader per source, or multiple split readers that
>> share the "poll()" function
>>  - To not make it too complicated, we can start with thinking about one
>> split reader for all splits initially and see if that covers all
>> requirements
>> 
>> *(3) Threading model of the Split Reader*
>> 
>>  - Most active part of the discussion ;-)
>> 
>>  - A non-blocking way for Flink's task code to interact with the source is
>> needed in order to base the task runtime code on a
>> single-threaded/actor-style task design
>>--> I personally am a big proponent of that, it will help with
>> well-behaved checkpoints, efficiency, and simpler yet more robust runtime
>> code
>> 
>>  - Users care about simple abstraction, so as a subclass of SplitReader
>> (non-blocking / async) we need to have a BlockingSplitReader which will
>> form the basis of most source implementations. BlockingSplitReader lets
>> users do blocking simple poll() calls.
>>  - The BlockingSplitReader would spawn a thread (or more) and the
>> thread(s) can make blocking calls and hand over data buffers via a blocking
>> queue
>>  - This should allow us to cover both, a fully async runtime, and a simple
>> blocking interface for users.
>>  - This is actually very similar to how the Kafka connectors work. Kafka
>> 9+ with one thread, Kafka 8 with multiple threads
>> 
>>  - On the base SplitReader (the async one), the non-blocking method that
>> gets the next chunk of data would signal data availability via a
>> CompletableFuture, because that gives the best flexibility (can await
>> completion or register notification handlers).
>>  - The source task would register a "thenHandle()" (or similar) on the
>> future to put a "take next data" task into the actor-style mailbox
>> 
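The hand-over described in (3) — a thread that makes blocking calls and feeds a queue, with a CompletableFuture signalling availability to a non-blocking caller — can be sketched with plain Java concurrency utilities. All class and method names below are assumptions for illustration, not the FLIP's final API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

// Sketch of a "BlockingSplitReader"-style hand-over: a fetcher thread puts
// records into a bounded queue; the task side polls non-blockingly and gets
// a future that completes when data is available.
public class AsyncReaderSketch {
    private final BlockingQueue<String> handOver = new ArrayBlockingQueue<>(16);
    private volatile CompletableFuture<Void> available = new CompletableFuture<>();

    // Called by the fetcher thread (which may block on an upstream poll()).
    public void emit(String record) throws InterruptedException {
        handOver.put(record);
        available.complete(null); // signal availability to the task side
    }

    // Non-blocking: returns a record, or null if none is buffered right now.
    public String pollNext() {
        String r = handOver.poll();
        if (r == null) {
            available = new CompletableFuture<>(); // re-arm the signal
        }
        return r;
    }

    // The task registers a continuation here instead of blocking.
    public CompletableFuture<Void> isAvailable() {
        return available;
    }

    public static void main(String[] args) throws Exception {
        AsyncReaderSketch reader = new AsyncReaderSketch();
        Thread fetcher = new Thread(() -> {
            try {
                reader.emit("split-0:record-0");
                reader.emit("split-1:record-0");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();
        fetcher.join();

        String r;
        while ((r = reader.pollNext()) != null) {
            System.out.println("got " + r);
        }
        // Once drained, the task would register a continuation instead of
        // blocking, e.g.:
        // reader.isAvailable().thenRun(() -> /* enqueue "take next data" */);
    }
}
```

This is only a sketch of the pattern; a production version would need to close the race between draining the queue and re-arming the future.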
>> *(4) Split Enumeration and Assignment*
>> 
>>  - Splits may be generated lazily, both in cases where there is a limited
>> number of splits (but very many), or splits are discovered over time
>>  - Assignment should also be lazy, to get better load balancing
>>  - Assignment needs support locality preferences
>> 
>>  - Possible design based on discussion so far:
>> 
>>--> SplitReader has a method "addSplits(SplitT...)" to add one or more
>> splits. Some split readers might assume they have only one split ever,
>> concurrently, others assume multiple splits. (Note: idea behind being able
>> to add multiple splits at the same time is to ease startup where multiple
>> splits may be assigned instantly.)
>>--> SplitReader has a context object on which it can indicate when
>> splits are completed. The enumerator gets that notification and can use it to
>> decide when to assign new splits. This should help both in cases of sources
>> that take splits lazily (file readers) and in case the source needs to

[jira] [Created] (FLINK-11427) Protobuf parquet writer implementation

2019-01-24 Thread Guang Hu (JIRA)
Guang Hu created FLINK-11427:


 Summary: Protobuf parquet writer implementation
 Key: FLINK-11427
 URL: https://issues.apache.org/jira/browse/FLINK-11427
 Project: Flink
  Issue Type: Improvement
  Components: Formats
Reporter: Guang Hu


Right now only ParquetAvroWriters exists to create a ParquetWriterFactory. We want 
to implement a protobuf-based ParquetProtoWriters to create a ParquetWriterFactory. 
I am happy to submit a PR if this approach sounds good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-24 Thread Timo Walther
Regarding the content of a `blink-1.5` branch, is it possible to rebase 
the big Blink commit on top of the current master or the last Flink release?


I don't mean a full rebase here, but just forking the branch from 
current Flink, putting the Blink content into the repository, and 
committing it. This would make it possible to see a diff of which classes 
and lines have changed and which are still the same. I guess this would be 
much more helpful than a branch with a big commit that has no common origin.


Thanks,
Timo
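The approach described above — fork from current Flink, drop in the Blink tree, commit once — can be sketched with plain git. Repository names, file names, and contents below are made up for the demo; in reality the stand-ins would be the apache/flink clone and the Blink source tree:

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the Flink repository.
git init -q flink && cd flink
git config user.email demo@example.com && git config user.name demo
echo "flink 1.5 implementation" > Table.java
git add . && git commit -qm "Flink release-1.5 state"
git branch -m release-1.5

# Fork blink-1.5 from current Flink, overwrite the tree with the Blink
# content, and commit it once: history now has a common origin, so a
# plain diff shows exactly which classes/lines changed.
git checkout -qb blink-1.5
echo "blink implementation" > Table.java
echo "blink-only runtime" > BlinkRuntime.java
git add . && git commit -qm "Import Blink code on top of release-1.5"

git diff --stat release-1.5 blink-1.5
```

Because both branches share the single fork point, `git diff release-1.5 blink-1.5` stays meaningful, which is exactly the property a big orphan commit would lack.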

On 24.01.19 at 02:54, Becket Qin wrote:

Thanks Stephan,

The plan makes sense to me.

Regarding the docs, it seems better to have a separate versioned website
because there are a lot of changes spread over the places. We can add the
banner to remind users that they are looking at the blink docs, which is
temporary and will eventually be merged into Flink master. (The banner is
pretty similar to what users will see when they visit the docs of old Flink
versions

[1]).

Thanks,

Jiangjie (Becket) Qin

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html

On Thu, Jan 24, 2019 at 6:21 AM Shaoxuan Wang  wrote:


Thanks Stephan,
The entire plan looks good to me. WRT the "Docs for Blink", a subsection
should be good enough if we just introduce the outline of what Blink has
changed. However, we have made detailed introductions to Blink based on the
framework of the current release documentation of Flink (those introductions
are distributed across the subsections). Does it make sense to create a Blink
document as a separate one, under the documentation section, say blink-1.5
(temporary, not a release)?

Regards,
Shaoxuan


On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen  wrote:


Nice to see this lively discussion.

*--- Branch Versus Repository ---*

Looks like this is converging towards pushing a branch.
How about naming the branch simply "blink-1.5"? That would be in line
with the 1.5 version branch of Flink, which is simply called "release-1.5".

*--- SGA --- *

The SGA (Software Grant Agreement) should be either filed already or in
the process of filing.

*--- Offering Jars for Blink ---*

As Chesnay and Timo mentioned, we cannot easily offer a "release" of
Blink (source or binary), because that would require a thorough
checking of licenses and creating/bundling license files. That is a lot
of work, as we recently experienced again in the Flink master.

What we can do is upload compiled jar files and link to them somewhere in
the blink docs. We need to add a disclaimer that these are
convenience jars, and not an official Apache release. I hope that would
work for the users that are curious to try things out.

*--- Docs for Blink --- *

Do we need a versioned website here? If not, can we simply make this a
subsection of the current Flink snapshot docs?
Next to "Flink Development" and "Internals", we could have a section on
"Blink branch".
I think it is crucial, though, to make it clear that this is temporary
and will eventually be subsumed by the main release, just
so that users do not get confused.

Best,
Stephan


On Wed, Jan 23, 2019 at 12:23 PM Becket Qin 

wrote:

Really excited to see Blink joining the Flink community!

My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
Among many things, what's most important at this point is probably to

make

Blink code available to the developers so people can discuss the merge
strategy. Creating a branch is probably one of the fastest ways to do
that. We can always create a separate repo later if necessary.

WRT the doc and jar distribution, it is true that we are going to have
some major refactoring of the code. But I can imagine some curious users
may still want to try out something in Blink, and it would be good if we
can do them a favor. Legal-wise, my hunch is that it is probably OK for
someone to just build the jars and docs and host them somewhere for
convenience. But it should be clear that this is just for convenience
purposes instead of an official release from Apache (unless we would like
to make it official).

Thanks,

Jiangjie (Becket) Qin

On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler 
wrote:


From the ASF side, jar files do not require a vote/release process;
this is at the discretion of the PMC.

However, I have my doubts whether at this time we could even create a
source release of Blink given that we'd have to vet the code-base
first.

Even without source release we could still distribute jars, but would
not be allowed to advertise them to users as they do not constitute an
official release.

On 23.01.2019 11:41, Timo Walther wrote:

As far as I know it, we will not provide any binaries but only the
source code. JAR files on Apache servers would need an official
voting/release process. Interested users can build Blink themselves
using `mvn clean package`.

@Stephan: Please correct me 

[jira] [Created] (FLINK-11426) FlinkKafkaProducerBase test case failed

2019-01-24 Thread vinoyang (JIRA)
vinoyang created FLINK-11426:


 Summary: FlinkKafkaProducerBase test case failed
 Key: FLINK-11426
 URL: https://issues.apache.org/jira/browse/FLINK-11426
 Project: Flink
  Issue Type: Test
  Components: Tests
Reporter: vinoyang


error info:
{code:java}
10:15:02,459 ERROR 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase  - Error 
while sending record to Kafka: Test error
java.lang.Exception: Test error
at 
org.apache.flink.streaming.connectors.kafka.KafkaProducerTest$1.answer(KafkaProducerTest.java:78)
at 
org.apache.flink.streaming.connectors.kafka.KafkaProducerTest$1.answer(KafkaProducerTest.java:74)
at 
org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:39)
at 
org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:96)
at 
org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
at 
org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:35)
at 
org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:63)
at 
org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:49)
at 
org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptSuperCallable(MockMethodInterceptor.java:110)
at 
org.apache.kafka.clients.producer.KafkaProducer$MockitoMock$1191687640.send(Unknown
 Source)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.invoke(FlinkKafkaProducerBase.java:313)
at 
org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at 
org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness.processElement(OneInputStreamOperatorTestHarness.java:111)
at 
org.apache.flink.streaming.connectors.kafka.KafkaProducerTest.testPropagateExceptions(KafkaProducerTest.java:116)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:326)
at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:89)
at 
org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:97)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.executeTest(PowerMockJUnit44RunnerDelegateImpl.java:310)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTestInSuper(PowerMockJUnit47RunnerDelegateImpl.java:131)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.access$100(PowerMockJUnit47RunnerDelegateImpl.java:59)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner$TestExecutorStatement.evaluate(PowerMockJUnit47RunnerDelegateImpl.java:147)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.evaluateStatement(PowerMockJUnit47RunnerDelegateImpl.java:107)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit47RunnerDelegateImpl$PowerMockJUnit47MethodRunner.executeTest(PowerMockJUnit47RunnerDelegateImpl.java:82)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runBeforesThenTestThenAfters(PowerMockJUnit44RunnerDelegateImpl.java:298)
at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:87)
at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:50)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.invokeTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:218)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.runMethods(PowerMockJUnit44RunnerDelegateImpl.java:160)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$1.run(PowerMockJUnit44RunnerDelegateImpl.java:134)
at 
org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:34)
at 

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Tzu-Li (Gordon) Tai
Hi Robert,

Thanks a lot for starting this discussion!

+1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the title, but
-cn in the opening email content.
I think -zh would be better as we are establishing the tool for general
Chinese-speaking users).
All dev@ discussions / JIRAs should still be in a single English mailing
list.

From what I've seen in the DingTalk Flink user group, there's quite a bit
of activity in forms of user questions and replies.
It would really be great if the Chinese-speaking user community can
actually have these discussions happen in the Apache mailing lists,
so that questions / discussions / replies from developers can be indexed
and searchable.
Moreover, it'll give the community more insight into how active a
Chinese-speaking contributor is in helping with user requests,
which in general is a form of contribution that the community always merits
a lot.

Cheers,
Gordon

On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger  wrote:

> Hey all,
>
> I would like to create a new user support mailing list called "
> user...@flink.apache.org" to cater to the Chinese-speaking Flink community.
>
> Why?
> In the last year 24% of the traffic on flink.apache.org came from the US,
> 22% from China. In the last three months, China is at 30%, the US at 20%.
> An additional data point is that there's a Flink DingTalk group with more
> than 5000 members, asking Flink questions.
> I believe that knowledge about Flink should be available in public forums
> (our mailing list), indexable by search engines. If there's a huge demand
> for Chinese-language support, we as a community should provide these users
> the tools they need, to grow our community and to allow them to follow the
> Apache way.
>
> Is it possible?
> I believe it is, because a number of other Apache projects are running
> non-English user@ mailing lists.
> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have non-English
> lists: http://mail-archives.apache.org/mod_mbox/
> One thing I want to make very clear in this discussion is that all project
> decisions, developer discussions, JIRA tickets etc. need to happen in
> English, as this is the primary language of the Apache Foundation and our
> community.
> We should also clarify this on the page listing the mailing lists.
>
> How?
> If there is consensus in this discussion thread, I would request the new
> mailing list next Monday.
> In case of discussions, I will start a vote on Monday or when the
> discussions have stopped.
> Then, we should put the new list on our website and start promoting it (in
> said DingTalk group and on social media).
>
> Let me know what you think about this idea :)
>
> Best,
> Robert
>
>
> PS: In case you are wondering what ZH stands for:
> https://en.wiktionary.org/wiki/ZH
>


[jira] [Created] (FLINK-11425) Support of “Hash Teams” in Hybrid Hash Join

2019-01-24 Thread LiuJi (JIRA)
LiuJi created FLINK-11425:
-

 Summary: Support of “Hash Teams” in Hybrid Hash Join
 Key: FLINK-11425
 URL: https://issues.apache.org/jira/browse/FLINK-11425
 Project: Flink
  Issue Type: New Feature
  Components: Core, Optimizer
Reporter: LiuJi


Hybrid hash join is already supported in the current version. The join starts
operating in memory and gradually spills its contents to disk when memory is
insufficient.

 

The current hash join only supports two inputs, so when a job contains multiple
hash joins that have the same join keys, it consumes unnecessary resources
(I/O, memory, etc.) because some upstream output data may be useless for the
downstream hash joins.

 

Based on the above observations, we want to provide a HashTeamManager that
implements a multiway-input hash join by combining several two-way hash joins
that share the same join keys. The HashTeamManager manages the relations of
multiple hash tables and, by joining multiple relations at one time, improves
memory efficiency and lowers the number of I/O operations.
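The core idea can be illustrated with a toy sketch (the class and method names below are illustrative assumptions, not actual Flink APIs): two downstream joins that share the same join key probe a single shared build-side hash table instead of each building its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the "hash team" idea: build the hash table over the common
// join key once, then let several joins with the same key probe it.
public class HashTeamSketch {

    // Build one shared hash table keyed by the (toy) join key.
    static Map<Integer, List<String>> buildSide(List<String> rows) {
        Map<Integer, List<String>> table = new HashMap<>();
        for (String row : rows) {
            int key = row.length(); // toy join key: string length
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe the shared table; with a hash team, every join that has the
    // same key reuses this one table instead of rebuilding it.
    static List<String> probe(Map<Integer, List<String>> table, List<String> probeRows) {
        List<String> joined = new ArrayList<>();
        for (String p : probeRows) {
            for (String b : table.getOrDefault(p.length(), List.of())) {
                joined.add(b + "|" + p);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> shared = buildSide(List.of("aa", "bbb"));
        // Two downstream joins on the same key probe the single shared table:
        System.out.println(probe(shared, List.of("xx")));  // prints [aa|xx]
        System.out.println(probe(shared, List.of("yyy"))); // prints [bbb|yyy]
    }
}
```

The saving is that the build side is hashed (and, under memory pressure, spilled) once rather than once per join.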



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-24 Thread Kurt Young
Sure, i will do the rebase before pushing the branch.

Timo Walther wrote on Thu, Jan 24, 2019 at 18:20:

> Regarding the content of a `blink-1.5` branch, is it possible to rebase
> the big Blink commit on top of the current master or the last Flink
> release?
>
> I don't mean a full rebase here, but just forking the branch from
> current Flink, and putting the Blink content into the repository, and
> commit it. This would enable to see a diff which classes and lines have
> changed and which are still the same. I guess this would be very helpful
> instead of a branch with a big commit that has no common origin.
>
> Thanks,
> Timo
>
> Am 24.01.19 um 02:54 schrieb Becket Qin:
> > Thanks Stephan,
> >
> > The plan makes sense to me.
> >
> > Regarding the docs, it seems better to have a separate versioned website
> > because there are a lot of changes spread over the places. We can add the
> > banner to remind users that they are looking at the blink docs, which is
> > temporary and will eventually be merged into Flink master. (The banner is
> > pretty similar to what user will see when they visit docs of old flink
> > versions
> > <
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html
> >
> > [1]).
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html
> >
> > On Thu, Jan 24, 2019 at 6:21 AM Shaoxuan Wang 
> wrote:
> >
> >> Thanks Stephan,
> >> The entire plan looks good to me. WRT the "Docs for Blink", a subsection
> >> should be good enough if we just introduce the outlines of what blink
> has
> >> changed. However, we have made detailed introductions to blink based on
> the
> >> framework of current release document of Flink (those introductions are
> >> distributed in each subsection). Does it make sense to create a blink
> >> document as a separate one, under the documentation section, say
> blink-1.5
> >> (temporary, not a release)?
> >>
> >> Regards,
> >> Shaoxuan
> >>
> >>
> >> On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen  wrote:
> >>
> >>> Nice to see this lively discussion.
> >>>
> >>> *--- Branch Versus Repository ---*
> >>>
> >>> Looks like this is converging towards pushing a branch.
> >>> How about naming the branch simply "blink-1.5" ? That would be in line
> >> with
> >>> the 1.5 version branch of Flink, which is simply called "release-1.5" ?
> >>>
> >>> *--- SGA --- *
> >>>
> >>> The SGA (Software Grant Agreement) should be either filed already or in
> >> the
> >>> process of filing.
> >>>
> >>> *--- Offering Jars for Blink ---*
> >>>
> >>> As Chesnay and Timo mentioned, we cannot easily offer a "Release" of
> >> Blink
> >>> (source or binary), because that would require a thorough
> >>> checking of licenses and creating/ bundling license files. That is a
> lot
> >> of
> >>> work, as we recently experienced again in the Flink master.
> >>>
> >>> What we can do is upload compiled jar files and link to them somewhere
> in
> >>> the blink docs. We need to add a disclaimer that these are
> >>> convenience jars, and not an official Apache release. I hope that would
> >>> work for the users that are curious to try things out.
> >>>
> >>> *--- Docs for Blink --- *
> >>>
> >>> Do we need a versioned website here? If not, can we simply make this a
> >>> subsection of the current Flink snapshot docs?
> >>> Next to "Flink Development" and "Internals", we could have a section on
> >>> "Blink branch".
> >>> I think it is crucial, though, to make it clear that this is temporary
> >> and
> >>> will eventually be subsumed by the main release, just
> >>> so that users do not get confused.
> >>>
> >>> Best,
> >>> Stephan
> >>>
> >>>
> >>> On Wed, Jan 23, 2019 at 12:23 PM Becket Qin 
> >> wrote:
>  Really excited to see Blink joining the Flink community!
> 
>  My two cents regarding repo v.s. branch, I am +1 for a branch in
> Flink.
>  Among many things, what's most important at this point is probably to
> >>> make
>  Blink code available to the developers so people can discuss the merge
>  strategy. Creating a branch is probably the one of the fastest way to
> >> do
>  that. We can always create separate repo later if necessary.
> 
>  WRT the doc and jar distribution, it is true that we are going to have
>  some major refactoring to the code. But I can imagine some curious
> >> users
>  may still want to try out something in Blink and it would be good if
> we
> >>> can
>  do them a favor. Legal wise, my hunch is that it is probably OK for
> >>> someone
>  to just build the jars and docs, host it somewhere for convenience.
> But
> >>> it
>  should be clear that this is just for convenience purpose instead of
> an
>  official release from Apache (unless we would like to make it
> >> official).
>  Thanks,
> 
>  Jiangjie (Becket) Qin
> 
>  On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler 
>  wrote:
> 
> 

Re: [DISCUSS] FLIP-27: Refactor Source Interface

2019-01-24 Thread Stephan Ewen
Before creating any JIRA issues, we need to converge a bit further on the
design.
There are too many unsolved questions in the above summary.

I would try and come up with a next version of the interface proposal in
the coming days and
use that as the base to continue the discussion.

Whether this can be part of 1.8 or not depends on how fast we converge. If
the release interval is similar
to the past releases, we would see the feature freeze in the middle of next month.

Best,
Stephan


On Sun, Jan 20, 2019 at 4:09 PM Biao Liu  wrote:

> Hi community,
> The summary of Stephan makes a lot of sense to me. It is much clearer indeed
> after splitting the complex topic into small ones.
> I was wondering whether there is a detailed plan for the next step? If not, I would
> like to push this thing forward by creating some JIRA issues.
> Another question: should version 1.8 include these features?
>
> Stephan Ewen wrote on Sat, Dec 1, 2018 at 4:20 AM:
>
> > Thanks everyone for the lively discussion. Let me try to summarize where
> I
> > see convergence in the discussion and open issues.
> > I'll try to group this by design aspect of the source. Please let me know
> > if I got things wrong or missed something crucial here.
> >
> > For issues 1-3, if the below reflects the state of the discussion, I
> would
> > try and update the FLIP in the next days.
> > For the remaining ones we need more discussion.
> >
> > I would suggest to fork each of these aspects into a separate mail
> thread,
> > or we will lose sight of the individual aspects.
> >
> > *(1) Separation of Split Enumerator and Split Reader*
> >
> >   - All seem to agree this is a good thing
> >   - Split Enumerator could in the end live on JobManager (and assign
> splits
> > via RPC) or in a task (and assign splits via data streams)
> >   - this discussion is orthogonal and should come later, when the
> interface
> > is agreed upon.
> >
> > *(2) Split Readers for one or more splits*
> >
> >   - Discussion seems to agree that we need to support one reader that
> > possibly handles multiple splits concurrently.
> >   - The requirement comes from sources where one poll()-style call
> fetches
> > data from different splits / partitions
> > --> example sources that require that would be for example Kafka,
> > Pravega, Pulsar
> >
> >   - Could have one split reader per source, or multiple split readers
> that
> > share the "poll()" function
> >   - To not make it too complicated, we can start with thinking about one
> > split reader for all splits initially and see if that covers all
> > requirements
> >
> > *(3) Threading model of the Split Reader*
> >
> >   - Most active part of the discussion ;-)
> >
> >   - A non-blocking way for Flink's task code to interact with the source
> is
> > needed in order to support task runtime code based on a
> > single-threaded/actor-style task design
> > --> I personally am a big proponent of that, it will help with
> > well-behaved checkpoints, efficiency, and simpler yet more robust runtime
> > code
> >
> >   - Users care about simple abstraction, so as a subclass of SplitReader
> > (non-blocking / async) we need to have a BlockingSplitReader which will
> > form the basis of most source implementations. BlockingSplitReader lets
> > users do blocking simple poll() calls.
> >   - The BlockingSplitReader would spawn a thread (or more) and the
> > thread(s) can make blocking calls and hand over data buffers via a
> blocking
> > queue
> >   - This should allow us to cover both, a fully async runtime, and a
> simple
> > blocking interface for users.
> >   - This is actually very similar to how the Kafka connectors work. Kafka
> > 9+ with one thread, Kafka 8 with multiple threads
> >
> >   - On the base SplitReader (the async one), the non-blocking method that
> > gets the next chunk of data would signal data availability via a
> > CompletableFuture, because that gives the best flexibility (can await
> > completion or register notification handlers).
> >   - The source task would register a "thenHandle()" (or similar) on the
> > future to put a "take next data" task into the actor-style mailbox
> >
> > *(4) Split Enumeration and Assignment*
> >
> >   - Splits may be generated lazily, both in cases where there is a
> limited
> > number of splits (but very many), or splits are discovered over time
> >   - Assignment should also be lazy, to get better load balancing
> >   - Assignment needs to support locality preferences
> >
> >   - Possible design based on discussion so far:
> >
> > --> SplitReader has a method "addSplits(SplitT...)" to add one or
> more
> > splits. Some split readers might assume they have only one split ever,
> > concurrently, others assume multiple splits. (Note: idea behind being
> able
> > to add multiple splits at the same time is to ease startup where multiple
> > splits may be assigned instantly.)
> > --> SplitReader has a context object on which it can call indicate
> when
> > splits are completed. The enumerator gets that 
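A minimal sketch of points (3) and (4) above — splits assigned via an addSplits(...) call and data availability signaled through a CompletableFuture that the task registers a handler on — could look roughly like the following. All names here are assumptions for illustration, not the eventual FLIP-27 interfaces.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only: a reader that accepts splits and signals data
// availability with a CompletableFuture instead of blocking the task thread.
public class SplitReaderSketch {
    private final ConcurrentLinkedQueue<String> records = new ConcurrentLinkedQueue<>();
    private CompletableFuture<Void> available = new CompletableFuture<>();

    // (4) One or more splits can be assigned at once, e.g. at startup.
    public void addSplits(List<String> splits) {
        // A real reader would start fetching; here each "split" yields one record.
        splits.forEach(this::handOver);
    }

    // A fetcher thread hands over a record and signals availability.
    public void handOver(String record) {
        records.add(record);
        available.complete(null);
    }

    // (3) Non-blocking: the returned future completes once a record can be polled.
    public CompletableFuture<Void> isAvailable() {
        if (!records.isEmpty()) {
            return CompletableFuture.completedFuture(null);
        }
        if (available.isDone()) {
            available = new CompletableFuture<>(); // re-arm after the queue drained
        }
        return available;
    }

    public Optional<String> poll() {
        return Optional.ofNullable(records.poll());
    }

    public static void main(String[] args) {
        SplitReaderSketch reader = new SplitReaderSketch();
        // The task registers a handler instead of blocking on the source:
        reader.isAvailable().thenRun(() -> System.out.println(reader.poll().orElse("none")));
        reader.addSplits(List.of("split-0")); // prints "split-0"
    }
}
```

This is the shape that lets the task put a "take next data" action into an actor-style mailbox via thenRun/thenHandle, as described above.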

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Robert Metzger
Thanks for your response.

You are right, I'm proposing "user...@flink.apache.org" as the mailing
list's name!

On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi Robert,
>
> Thanks a lot for starting this discussion!
>
> +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the title,
> but
> -cn in the opening email content.
> I think -zh would be better as we are establishing the tool for general
> Chinese-speaking users).
> All dev@ discussions / JIRAs should still be in a single English mailing
> list.
>
> From what I've seen in the DingTalk Flink user group, there's quite a bit
> of activity in forms of user questions and replies.
> It would really be great if the Chinese-speaking user community can
> actually have these discussions happen in the Apache mailing lists,
> so that questions / discussions / replies from developers can be indexed
> and searchable.
> Moreover, it'll give the community more insight in how active a
> Chinese-speaking contributor is helping with user requests,
> which in general is a form of contribution that the community always merits
> a lot.
>
> Cheers,
> Gordon
>
> On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger 
> wrote:
>
> > Hey all,
> >
> > I would like to create a new user support mailing list called "
> > user...@flink.apache.org" to cater the Chinese-speaking Flink community.
> >
> > Why?
> > In the last year 24% of the traffic on flink.apache.org came from the
> US,
> > 22% from China. In the last three months, China is at 30%, the US at 20%.
> > An additional data point is that there's a Flink DingTalk group with more
> > than 5000 members, asking Flink questions.
> > I believe that knowledge about Flink should be available in public forums
> > (our mailing list), indexable by search engines. If there's a huge demand
> > in a Chinese language support, we as a community should provide these
> users
> > the tools they need, to grow our community and to allow them to follow
> the
> > Apache way.
> >
> > Is it possible?
> > I believe it is, because a number of other Apache projects are running
> > non-English user@ mailing lists.
> > Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have non-English
> > lists: http://mail-archives.apache.org/mod_mbox/
> > One thing I want to make very clear in this discussion is that all
> project
> > decisions, developer discussions, JIRA tickets etc. need to happen in
> > English, as this is the primary language of the Apache Foundation and our
> > community.
> > We should also clarify this on the page listing the mailing lists.
> >
> > How?
> > If there is consensus in this discussion thread, I would request the new
> > mailing list next Monday.
> > In case of discussions, I will start a vote on Monday or when the
> > discussions have stopped.
> > Then, we should put the new list on our website and start promoting it
> (in
> > said DingTalk group and on social media).
> >
> > Let me know what you think about this idea :)
> >
> > Best,
> > Robert
> >
> >
> > PS: In case you are wondering what ZH stands for:
> > https://en.wiktionary.org/wiki/ZH
> >
>


Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Kurt Young
Big +1 on this, it will indeed help Chinese speaking users a lot.

fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:

> +1. I noticed that many folks from China are requesting the JIRA
> permission in the past year. It reflects that more and more developers from
> China are using Flink. A Chinese oriented mailing list will definitely be
> helpful for the growth of Flink in China.
>
>
> > On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
> >
> > +1, a very nice idea
> >
> > On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger 
> wrote:
> >
> >> Thanks for your response.
> >>
> >> You are right, I'm proposing "user...@flink.apache.org" as the mailing
> >> list's name!
> >>
> >> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> >> wrote:
> >>
> >>> Hi Robert,
> >>>
> >>> Thanks a lot for starting this discussion!
> >>>
> >>> +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> title,
> >>> but
> >>> -cn in the opening email content.
> >>> I think -zh would be better as we are establishing the tool for general
> >>> Chinese-speaking users).
> >>> All dev@ discussions / JIRAs should still be in a single English
> mailing
> >>> list.
> >>>
> >>> From what I've seen in the DingTalk Flink user group, there's quite a
> bit
> >>> of activity in forms of user questions and replies.
> >>> It would really be great if the Chinese-speaking user community can
> >>> actually have these discussions happen in the Apache mailing lists,
> >>> so that questions / discussions / replies from developers can be
> indexed
> >>> and searchable.
> >>> Moreover, it'll give the community more insight in how active a
> >>> Chinese-speaking contributor is helping with user requests,
> >>> which in general is a form of contribution that the community always
> >> merits
> >>> a lot.
> >>>
> >>> Cheers,
> >>> Gordon
> >>>
> >>> On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger 
> >>> wrote:
> >>>
>  Hey all,
> 
>  I would like to create a new user support mailing list called "
>  user...@flink.apache.org" to cater the Chinese-speaking Flink
> >> community.
> 
>  Why?
>  In the last year 24% of the traffic on flink.apache.org came from the
> >>> US,
>  22% from China. In the last three months, China is at 30%, the US at
> >> 20%.
>  An additional data point is that there's a Flink DingTalk group with
> >> more
>  than 5000 members, asking Flink questions.
>  I believe that knowledge about Flink should be available in public
> >> forums
>  (our mailing list), indexable by search engines. If there's a huge
> >> demand
>  in a Chinese language support, we as a community should provide these
> >>> users
>  the tools they need, to grow our community and to allow them to follow
> >>> the
>  Apache way.
> 
>  Is it possible?
>  I believe it is, because a number of other Apache projects are running
>  non-English user@ mailing lists.
>  Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> >> non-English
>  lists: http://mail-archives.apache.org/mod_mbox/
>  One thing I want to make very clear in this discussion is that all
> >>> project
>  decisions, developer discussions, JIRA tickets etc. need to happen in
>  English, as this is the primary language of the Apache Foundation and
> >> our
>  community.
>  We should also clarify this on the page listing the mailing lists.
> 
>  How?
>  If there is consensus in this discussion thread, I would request the
> >> new
>  mailing list next Monday.
>  In case of discussions, I will start a vote on Monday or when the
>  discussions have stopped.
>  Then, we should put the new list on our website and start promoting it
> >>> (in
>  said DingTalk group and on social media).
> 
>  Let me know what you think about this idea :)
> 
>  Best,
>  Robert
> 
> 
>  PS: In case you are wondering what ZH stands for:
>  https://en.wiktionary.org/wiki/ZH
> 
> >>>
> >>
>
> --
Best,
Kurt


Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread fudian.fd
+1. I noticed that many folks from China are requesting the JIRA permission in 
the past year. It reflects that more and more developers from China are using 
Flink. A Chinese-oriented mailing list will definitely be helpful for the 
growth of Flink in China.


> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
> 
> +1, a very nice idea
> 
> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger  wrote:
> 
>> Thanks for your response.
>> 
>> You are right, I'm proposing "user...@flink.apache.org" as the mailing
>> list's name!
>> 
>> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai 
>> wrote:
>> 
>>> Hi Robert,
>>> 
>>> Thanks a lot for starting this discussion!
>>> 
>>> +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the title,
>>> but
>>> -cn in the opening email content.
>>> I think -zh would be better as we are establishing the tool for general
>>> Chinese-speaking users).
>>> All dev@ discussions / JIRAs should still be in a single English mailing
>>> list.
>>> 
>>> From what I've seen in the DingTalk Flink user group, there's quite a bit
>>> of activity in forms of user questions and replies.
>>> It would really be great if the Chinese-speaking user community can
>>> actually have these discussions happen in the Apache mailing lists,
>>> so that questions / discussions / replies from developers can be indexed
>>> and searchable.
>>> Moreover, it'll give the community more insight in how active a
>>> Chinese-speaking contributor is helping with user requests,
>>> which in general is a form of contribution that the community always
>> merits
>>> a lot.
>>> 
>>> Cheers,
>>> Gordon
>>> 
>>> On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger 
>>> wrote:
>>> 
 Hey all,
 
 I would like to create a new user support mailing list called "
 user...@flink.apache.org" to cater the Chinese-speaking Flink
>> community.
 
 Why?
 In the last year 24% of the traffic on flink.apache.org came from the
>>> US,
 22% from China. In the last three months, China is at 30%, the US at
>> 20%.
 An additional data point is that there's a Flink DingTalk group with
>> more
 than 5000 members, asking Flink questions.
 I believe that knowledge about Flink should be available in public
>> forums
 (our mailing list), indexable by search engines. If there's a huge
>> demand
 in a Chinese language support, we as a community should provide these
>>> users
 the tools they need, to grow our community and to allow them to follow
>>> the
 Apache way.
 
 Is it possible?
 I believe it is, because a number of other Apache projects are running
 non-English user@ mailing lists.
 Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
>> non-English
 lists: http://mail-archives.apache.org/mod_mbox/
 One thing I want to make very clear in this discussion is that all
>>> project
 decisions, developer discussions, JIRA tickets etc. need to happen in
 English, as this is the primary language of the Apache Foundation and
>> our
 community.
 We should also clarify this on the page listing the mailing lists.
 
 How?
 If there is consensus in this discussion thread, I would request the
>> new
 mailing list next Monday.
 In case of discussions, I will start a vote on Monday or when the
 discussions have stopped.
 Then, we should put the new list on our website and start promoting it
>>> (in
 said DingTalk group and on social media).
 
 Let me know what you think about this idea :)
 
 Best,
 Robert
 
 
 PS: In case you are wondering what ZH stands for:
 https://en.wiktionary.org/wiki/ZH
 
>>> 
>> 





Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Piotr Nowojski
+1, good idea, especially with that many Chinese speaking contributors, 
committers & users :)

Piotrek

> On 24 Jan 2019, at 13:20, Kurt Young  wrote:
> 
> Big +1 on this, it will indeed help Chinese speaking users a lot.
> 
> fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:
> 
>> +1. I noticed that many folks from China are requesting the JIRA
>> permission in the past year. It reflects that more and more developers from
>> China are using Flink. A Chinese oriented mailing list will definitely be
>> helpful for the growth of Flink in China.
>> 
>> 
>>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
>>> 
>>> +1, a very nice idea
>>> 
>>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger 
>> wrote:
>>> 
 Thanks for your response.
 
 You are right, I'm proposing "user...@flink.apache.org" as the mailing
 list's name!
 
 On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
 wrote:
 
> Hi Robert,
> 
> Thanks a lot for starting this discussion!
> 
> +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
>> title,
> but
> -cn in the opening email content.
> I think -zh would be better as we are establishing the tool for general
> Chinese-speaking users).
> All dev@ discussions / JIRAs should still be in a single English
>> mailing
> list.
> 
> From what I've seen in the DingTalk Flink user group, there's quite a
>> bit
> of activity in forms of user questions and replies.
> It would really be great if the Chinese-speaking user community can
> actually have these discussions happen in the Apache mailing lists,
> so that questions / discussions / replies from developers can be
>> indexed
> and searchable.
> Moreover, it'll give the community more insight in how active a
> Chinese-speaking contributor is helping with user requests,
> which in general is a form of contribution that the community always
 merits
> a lot.
> 
> Cheers,
> Gordon
> 
> On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger 
> wrote:
> 
>> Hey all,
>> 
>> I would like to create a new user support mailing list called "
>> user...@flink.apache.org" to cater the Chinese-speaking Flink
 community.
>> 
>> Why?
>> In the last year 24% of the traffic on flink.apache.org came from the
> US,
>> 22% from China. In the last three months, China is at 30%, the US at
 20%.
>> An additional data point is that there's a Flink DingTalk group with
 more
>> than 5000 members, asking Flink questions.
>> I believe that knowledge about Flink should be available in public
 forums
>> (our mailing list), indexable by search engines. If there's a huge
 demand
>> in a Chinese language support, we as a community should provide these
> users
>> the tools they need, to grow our community and to allow them to follow
> the
>> Apache way.
>> 
>> Is it possible?
>> I believe it is, because a number of other Apache projects are running
>> non-English user@ mailing lists.
>> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
 non-English
>> lists: http://mail-archives.apache.org/mod_mbox/
>> One thing I want to make very clear in this discussion is that all
> project
>> decisions, developer discussions, JIRA tickets etc. need to happen in
>> English, as this is the primary language of the Apache Foundation and
 our
>> community.
>> We should also clarify this on the page listing the mailing lists.
>> 
>> How?
>> If there is consensus in this discussion thread, I would request the
 new
>> mailing list next Monday.
>> In case of discussions, I will start a vote on Monday or when the
>> discussions have stopped.
>> Then, we should put the new list on our website and start promoting it
> (in
>> said DingTalk group and on social media).
>> 
>> Let me know what you think about this idea :)
>> 
>> Best,
>> Robert
>> 
>> 
>> PS: In case you are wondering what ZH stands for:
>> https://en.wiktionary.org/wiki/ZH
>> 
> 
 
>> 
>> --
> Best,
> Kurt



Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Jeff Zhang
+1

Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:

> +1, good idea, especially with that many Chinese speaking contributors,
> committers & users :)
>
> Piotrek
>
> > On 24 Jan 2019, at 13:20, Kurt Young  wrote:
> >
> > Big +1 on this, it will indeed help Chinese speaking users a lot.
> >
> > fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:
> >
> >> +1. I noticed that many folks from China are requesting the JIRA
> >> permission in the past year. It reflects that more and more developers
> from
> >> China are using Flink. A Chinese oriented mailing list will definitely
> be
> >> helpful for the growth of Flink in China.
> >>
> >>
> >>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
> >>>
> >>> +1, a very nice idea
> >>>
> >>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger 
> >> wrote:
> >>>
>  Thanks for your response.
> 
>  You are right, I'm proposing "user...@flink.apache.org" as the
> mailing
>  list's name!
> 
>  On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> >> tzuli...@apache.org>
>  wrote:
> 
> > Hi Robert,
> >
> > Thanks a lot for starting this discussion!
> >
> > +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> >> title,
> > but
> > -cn in the opening email content.
> > I think -zh would be better as we are establishing the tool for
> general
> > Chinese-speaking users).
> > All dev@ discussions / JIRAs should still be in a single English
> >> mailing
> > list.
> >
> > From what I've seen in the DingTalk Flink user group, there's quite a
> >> bit
> > of activity in forms of user questions and replies.
> > It would really be great if the Chinese-speaking user community can
> > actually have these discussions happen in the Apache mailing lists,
> > so that questions / discussions / replies from developers can be
> >> indexed
> > and searchable.
> > Moreover, it'll give the community more insight in how active a
> > Chinese-speaking contributor is helping with user requests,
> > which in general is a form of contribution that the community always
>  merits
> > a lot.
> >
> > Cheers,
> > Gordon
> >
> > On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger  >
> > wrote:
> >
> >> Hey all,
> >>
> >> I would like to create a new user support mailing list called "
> >> user...@flink.apache.org" to cater the Chinese-speaking Flink
>  community.
> >>
> >> Why?
> >> In the last year 24% of the traffic on flink.apache.org came from
> the
> > US,
> >> 22% from China. In the last three months, China is at 30%, the US at
>  20%.
> >> An additional data point is that there's a Flink DingTalk group with
>  more
> >> than 5000 members, asking Flink questions.
> >> I believe that knowledge about Flink should be available in public
>  forums
> >> (our mailing list), indexable by search engines. If there's a huge
>  demand
> >> in a Chinese language support, we as a community should provide
> these
> > users
> >> the tools they need, to grow our community and to allow them to
> follow
> > the
> >> Apache way.
> >>
> >> Is it possible?
> >> I believe it is, because a number of other Apache projects are
> running
> >> non-English user@ mailing lists.
> >> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
>  non-English
> >> lists: http://mail-archives.apache.org/mod_mbox/
> >> One thing I want to make very clear in this discussion is that all
> > project
> >> decisions, developer discussions, JIRA tickets etc. need to happen
> in
> >> English, as this is the primary language of the Apache Foundation
> and
>  our
> >> community.
> >> We should also clarify this on the page listing the mailing lists.
> >>
> >> How?
> >> If there is consensus in this discussion thread, I would request the
>  new
> >> mailing list next Monday.
> >> In case of discussions, I will start a vote on Monday or when the
> >> discussions have stopped.
> >> Then, we should put the new list on our website and start promoting
> it
> > (in
> >> said DingTalk group and on social media).
> >>
> >> Let me know what you think about this idea :)
> >>
> >> Best,
> >> Robert
> >>
> >>
> >> PS: In case you are wondering what ZH stands for:
> >> https://en.wiktionary.org/wiki/ZH
> >>
> >
> 
> >>
> >> --
> > Best,
> > Kurt
>
>

-- 
Best Regards

Jeff Zhang


[DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Robert Metzger
Hey all,

I would like to create a new user support mailing list called "
user...@flink.apache.org" to cater to the Chinese-speaking Flink community.

Why?
In the last year 24% of the traffic on flink.apache.org came from the US,
22% from China. In the last three months, China is at 30%, the US at 20%.
An additional data point is that there's a Flink DingTalk group with more
than 5000 members, asking Flink questions.
I believe that knowledge about Flink should be available in public forums
(our mailing list), indexable by search engines. If there's a huge demand
for Chinese-language support, we as a community should provide these users
the tools they need, to grow our community and to allow them to follow the
Apache way.

Is it possible?
I believe it is, because a number of other Apache projects are running
non-English user@ mailing lists.
Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have non-English
lists: http://mail-archives.apache.org/mod_mbox/
One thing I want to make very clear in this discussion is that all project
decisions, developer discussions, JIRA tickets etc. need to happen in
English, as this is the primary language of the Apache Foundation and our
community.
We should also clarify this on the page listing the mailing lists.

How?
If there is consensus in this discussion thread, I would request the new
mailing list next Monday.
In case of discussions, I will start a vote on Monday or when the
discussions have stopped.
Then, we should put the new list on our website and start promoting it (in
said DingTalk group and on social media).

Let me know what you think about this idea :)

Best,
Robert


PS: In case you are wondering what ZH stands for:
https://en.wiktionary.org/wiki/ZH


Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Stephan Ewen
+1, a very nice idea

On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger  wrote:

> Thanks for your response.
>
> You are right, I'm proposing "user...@flink.apache.org" as the mailing
> list's name!
>
> On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Hi Robert,
> >
> > Thanks a lot for starting this discussion!
> >
> > +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the title,
> > but
> > -cn in the opening email content.
> > I think -zh would be better as we are establishing the tool for general
> > Chinese-speaking users).
> > All dev@ discussions / JIRAs should still be in a single English mailing
> > list.
> >
> > From what I've seen in the DingTalk Flink user group, there's quite a bit
> > of activity in forms of user questions and replies.
> > It would really be great if the Chinese-speaking user community can
> > actually have these discussions happen in the Apache mailing lists,
> > so that questions / discussions / replies from developers can be indexed
> > and searchable.
> > Moreover, it'll give the community more insight in how active a
> > Chinese-speaking contributor is helping with user requests,
> > which in general is a form of contribution that the community always
> merits
> > a lot.
> >
> > Cheers,
> > Gordon
> >
> > On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger 
> > wrote:
> >
> > > Hey all,
> > >
> > > I would like to create a new user support mailing list called "
> > > user...@flink.apache.org" to cater the Chinese-speaking Flink
> community.
> > >
> > > Why?
> > > In the last year 24% of the traffic on flink.apache.org came from the
> > US,
> > > 22% from China. In the last three months, China is at 30%, the US at
> 20%.
> > > An additional data point is that there's a Flink DingTalk group with
> more
> > > than 5000 members, asking Flink questions.
> > > I believe that knowledge about Flink should be available in public
> forums
> > > (our mailing list), indexable by search engines. If there's a huge
> demand
> > > in a Chinese language support, we as a community should provide these
> > users
> > > the tools they need, to grow our community and to allow them to follow
> > the
> > > Apache way.
> > >
> > > Is it possible?
> > > I believe it is, because a number of other Apache projects are running
> > > non-English user@ mailing lists.
> > > Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> non-English
> > > lists: http://mail-archives.apache.org/mod_mbox/
> > > One thing I want to make very clear in this discussion is that all
> > project
> > > decisions, developer discussions, JIRA tickets etc. need to happen in
> > > English, as this is the primary language of the Apache Foundation and
> our
> > > community.
> > > We should also clarify this on the page listing the mailing lists.
> > >
> > > How?
> > > If there is consensus in this discussion thread, I would request the
> new
> > > mailing list next Monday.
> > > In case of discussions, I will start a vote on Monday or when the
> > > discussions have stopped.
> > > Then, we should put the new list on our website and start promoting it
> > (in
> > > said DingTalk group and on social media).
> > >
> > > Let me know what you think about this idea :)
> > >
> > > Best,
> > > Robert
> > >
> > >
> > > PS: In case you are wondering what ZH stands for:
> > > https://en.wiktionary.org/wiki/ZH
> > >
> >
>


[DISCUSS] FLIP-32: Restructure flink-table for future contributions

2019-01-24 Thread Timo Walther

Hi everyone,

as Stephan already announced on the mailing list [1], the Flink 
community will receive a big code contribution from Alibaba. The 
flink-table module is one of the biggest parts that will receive many 
new features and major architectural improvements. Instead of waiting 
until the next major version of Flink or introducing big API-breaking 
changes, we would like to gradually build up the Blink-based planner and 
runtime while keeping the Table & SQL API mostly stable. Users will be 
able to play around with the current merge status of the new planner or 
fall back to the old planner until the new one is stable.


We have prepared a design document that discusses a restructuring of the 
flink-table module and suggests a rough implementation plan:


https://docs.google.com/document/d/1Tfl2dBqBV3qSBy7oV3qLYvRRDbUOasvA1lhvYWWljQw/edit?usp=sharing

I will briefly summarize the steps we would like to do:

- Split the flink-table module, similar to the (now outdated) proposal of 
FLIP-28 [3]. This is a preparation to separate the API from the core 
(targeted for Flink 1.8).
- Perform minor API changes to separate API from actual implementation 
(targeted for Flink 1.8).
- Merge an MVP Blink SQL planner once the necessary Flink core/runtime 
changes have been completed.
  The merging will happen in stages (e.g. basic planner framework, then 
operator by operator). The exact merging plan still needs to be determined.
- Rework the type system in order to unblock work on unified table 
environments, UDFs, sources/sinks, and catalog.

- Enable full end-to-end batch and stream execution features.

Our mid-term goal:

Run full TPC-DS on a unified batch/streaming runtime. Initially, we will 
only support ingesting data coming from the DataStream API. Once we 
reworked the sources/sink interfaces, we will target full end-to-end 
TPC-DS query execution with table connectors.


A rough task dependency graph is illustrated in the design document. A 
more detailed task dependency structure will be added to JIRA after we 
agreed on this FLIP.


Looking forward to any feedback.

Thanks,
Timo

[1] 
https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
[2] 
https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E
[3] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free




[jira] [Created] (FLINK-11428) BufferFileWriterFileSegmentReaderTest#testWriteRead failed on Travis

2019-01-24 Thread Congxian Qiu (JIRA)
Congxian Qiu created FLINK-11428:


 Summary: BufferFileWriterFileSegmentReaderTest#testWriteRead 
failed on Travis
 Key: FLINK-11428
 URL: https://issues.apache.org/jira/browse/FLINK-11428
 Project: Flink
  Issue Type: Test
Reporter: Congxian Qiu


10:31:58.273 [ERROR] Errors: 
10:31:58.273 [ERROR] 
org.apache.flink.runtime.io.disk.iomanager.BufferFileWriterFileSegmentReaderTest.testWriteRead(org.apache.flink.runtime.io.disk.iomanager.BufferFileWriterFileSegmentReaderTest)
10:31:58.273 [ERROR] Run 1: 
BufferFileWriterFileSegmentReaderTest.testWriteRead:141
10:31:58.273 [ERROR] Run 2: 
BufferFileWriterFileSegmentReaderTest.tearDownWriterAndReader:95 » IllegalState
 

Travis link: https://travis-ci.org/apache/flink/jobs/483788040



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11430) Incorrect Akka timeout syntax

2019-01-24 Thread Sina Madani (JIRA)
Sina Madani created FLINK-11430:
---

 Summary: Incorrect Akka timeout syntax
 Key: FLINK-11430
 URL: https://issues.apache.org/jira/browse/FLINK-11430
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.7.1, 1.7.0
Reporter: Sina Madani


The current 
[documentation|https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html]
 specifies the syntax for Akka timeouts to be in the form "[0-9]+ 
ms|s|min|h|d". However, this doesn't work, leading to a NumberFormatException or 
similar. Reading through the [Akka 
documentation|https://doc.akka.io/docs/akka/2.5/general/configuration.html] 
however, it seems the correct format is [0-9]+ms|s|min|h|d (note the lack of 
spaces and quotes).
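For illustration, a timeout entry in flink-conf.yaml would then look like the sketch below (based on the report above; `akka.ask.timeout` is used as an example key, the concrete value is illustrative):

```yaml
# flink-conf.yaml sketch: per the report, no space between number and unit
akka.ask.timeout: 100s

# Reported to fail with NumberFormatException or similar:
# akka.ask.timeout: "100 s"
```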





Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Fabian Hueske
Thanks Robert!
I think this is a very good idea.
+1

Fabian

On Thu., Jan. 24, 2019 at 2:09 PM, Jeff Zhang  wrote:

> +1
>
> Piotr Nowojski  wrote on Thu, Jan 24, 2019 at 8:38 PM:
>
> > +1, good idea, especially with that many Chinese speaking contributors,
> > committers & users :)
> >
> > Piotrek
> >
> > > On 24 Jan 2019, at 13:20, Kurt Young  wrote:
> > >
> > > Big +1 on this, it will indeed help Chinese speaking users a lot.
> > >
> > > fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:
> > >
> > >> +1. I noticed that many folks from China are requesting the JIRA
> > >> permission in the past year. It reflects that more and more developers
> > from
> > >> China are using Flink. A Chinese oriented mailing list will definitely
> > be
> > >> helpful for the growth of Flink in China.
> > >>
> > >>
> > >>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen  wrote:
> > >>>
> > >>> +1, a very nice idea
> > >>>
> > >>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger  >
> > >> wrote:
> > >>>
> >  Thanks for your response.
> > 
> >  You are right, I'm proposing "user...@flink.apache.org" as the
> > mailing
> >  list's name!
> > 
> >  On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> >  wrote:
> > 
> > > Hi Robert,
> > >
> > > Thanks a lot for starting this discussion!
> > >
> > > +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> > >> title,
> > > but
> > > -cn in the opening email content.
> > > I think -zh would be better as we are establishing the tool for
> > general
> > > Chinese-speaking users).
> > > All dev@ discussions / JIRAs should still be in a single English
> > >> mailing
> > > list.
> > >
> > > From what I've seen in the DingTalk Flink user group, there's
> quite a
> > >> bit
> > > of activity in forms of user questions and replies.
> > > It would really be great if the Chinese-speaking user community can
> > > actually have these discussions happen in the Apache mailing lists,
> > > so that questions / discussions / replies from developers can be
> > >> indexed
> > > and searchable.
> > > Moreover, it'll give the community more insight in how active a
> > > Chinese-speaking contributor is helping with user requests,
> > > which in general is a form of contribution that the community
> always
> >  merits
> > > a lot.
> > >
> > > Cheers,
> > > Gordon
> > >
> > > On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
> rmetz...@apache.org
> > >
> > > wrote:
> > >
> > >> Hey all,
> > >>
> > >> I would like to create a new user support mailing list called "
> > >> user...@flink.apache.org" to cater the Chinese-speaking Flink
> >  community.
> > >>
> > >> Why?
> > >> In the last year 24% of the traffic on flink.apache.org came from
> > the
> > > US,
> > >> 22% from China. In the last three months, China is at 30%, the US
> at
> >  20%.
> > >> An additional data point is that there's a Flink DingTalk group
> with
> >  more
> > >> than 5000 members, asking Flink questions.
> > >> I believe that knowledge about Flink should be available in public
> >  forums
> > >> (our mailing list), indexable by search engines. If there's a huge
> >  demand
> > >> in a Chinese language support, we as a community should provide
> > these
> > > users
> > >> the tools they need, to grow our community and to allow them to
> > follow
> > > the
> > >> Apache way.
> > >>
> > >> Is it possible?
> > >> I believe it is, because a number of other Apache projects are
> > running
> > >> non-English user@ mailing lists.
> > >> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> >  non-English
> > >> lists: http://mail-archives.apache.org/mod_mbox/
> > >> One thing I want to make very clear in this discussion is that all
> > > project
> > >> decisions, developer discussions, JIRA tickets etc. need to happen
> > in
> > >> English, as this is the primary language of the Apache Foundation
> > and
> >  our
> > >> community.
> > >> We should also clarify this on the page listing the mailing lists.
> > >>
> > >> How?
> > >> If there is consensus in this discussion thread, I would request
> the
> >  new
> > >> mailing list next Monday.
> > >> In case of discussions, I will start a vote on Monday or when the
> > >> discussions have stopped.
> > >> Then, we should put the new list on our website and start
> promoting
> > it
> > > (in
> > >> said DingTalk group and on social media).
> > >>
> > >> Let me know what you think about this idea :)
> > >>
> > >> Best,
> > >> Robert
> > >>
> > >>
> > >> PS: In case you are wondering what ZH stands for:
> > >> https://en.wiktionary.org/wiki/ZH
> > >>
> > >
> > 
> > >>
> > >> --
> > > Best,
> > > Kurt
> >
> >
>
> --
> 

Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

2019-01-24 Thread Piotr Nowojski
Hi,

I’m not sure how much effort we will be willing to invest in the existing batch 
stack. We are currently focusing on the support of bounded DataStreams (already 
done in Blink and will be merged to Flink soon) and unifying batch & stream 
under the DataStream API.

Piotrek

> On 23 Jan 2019, at 04:45, Bo WANG  wrote:
> 
> Hi all,
> 
> When running the batch WordCount example,  I configured the job execution
> mode
> as BATCH_FORCED, and failover-strategy as region, I manually injected some
> errors to let the execution fail in different phases. In some cases, the
> job could
> recover from the failover and succeed, but in some cases, the job
> retried
> several times and failed.
> 
> Example:
> - If the failure occurred before task read data, e.g., failed before
> invokable.invoke() in Task.java, failover could succeed.
> - If the failure occurred after task having read data, failover did not
> work.
> 
> Problem diagnose:
> Running the example described before, each ExecutionVertex is defined as
> a restart region, and the ResultPartitionType between executions is
> BLOCKING.
> Thus, SpillableSubpartition and SpillableSubpartitionView are used to
> write/read
> shuffle data, and data blocks are described as BufferConsumers stored in a
> list
> called buffers, when task requires input data from
> SpillableSubpartitionView,
> BufferConsumers are REMOVED from buffers. Thus, when failures occurred
> after having read data, some BufferConsumers have already been released.
> Although tasks retried, the input data is incomplete.
> 
> Fix Proposal:
> - BufferConsumer should not be removed from buffers until the consumed
> ExecutionVertex is terminal.
> - SpillableSubpartition should not be released until the consumed
> ExecutionVertex is terminal.
> - SpillableSubpartition could create multiple SpillableSubpartitionViews,
> each of which corresponds to an ExecutionAttempt.
> 
> Best,
> Bo
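The fix proposal quoted above could be sketched roughly as follows (a hypothetical illustration with made-up names; this is not Flink's actual SpillableSubpartition code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal: buffers are retained rather than
// removed on read, every execution attempt gets its own view with an
// independent read position, and buffers are only released once the
// consuming ExecutionVertex reached a terminal state.
public class RetainedSubpartition {

    private final List<String> buffers = new ArrayList<>();

    void add(String buffer) {
        buffers.add(buffer);
    }

    // One view per ExecutionAttempt; reading through a view does not
    // remove buffers, so a restarted attempt can re-read everything.
    View createView() {
        return new View();
    }

    // Deferred release: only invoked once the consumer vertex is terminal.
    void onConsumerTerminal() {
        buffers.clear();
    }

    class View {
        private int position = 0;

        String poll() {
            return position < buffers.size() ? buffers.get(position++) : null;
        }
    }
}
```

With this shape, a second attempt's view re-reads the same data instead of finding an already-drained buffer list.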



[jira] [Created] (FLINK-11429) Flink fails to authenticate s3a with core-site.xml

2019-01-24 Thread Mario Georgiev (JIRA)
 Mario Georgiev created FLINK-11429:
---

 Summary: Flink fails to authenticate s3a with core-site.xml
 Key: FLINK-11429
 URL: https://issues.apache.org/jira/browse/FLINK-11429
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.7.1
Reporter:  Mario Georgiev


Hello,

The problem is: if I put the core-site.xml somewhere, add it to the Flink image, 
and put the path to it in flink-conf.yaml, it does not get picked up and I get an 
exception:


{code:java}
Caused by: 
org.apache.flink.fs.s3base.shaded.com.amazonaws.AmazonClientException: No AWS 
Credentials provided by BasicAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : 
org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Unable to 
load credentials from service endpoint

at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:139)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1164)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.runBeforeRequestHandlers(AmazonHttpClient.java:762)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:724)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4325)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:5086)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:5060)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4309)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4272)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1337)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1277)

at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$1(S3AFileSystem.java:373)

at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:109)

... 31 more

Caused by: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: 
Unable to load credentials from service endpoint

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.auth.EC2CredentialsFetcher.handleError(EC2CredentialsFetcher.java:183)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:162)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)

at 
org.apache.flink.fs.s3base.shaded.com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:151)

at 
org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:117)

... 48 more

Caused by: java.net.SocketException: Network unreachable (connect failed)

at java.net.PlainSocketImpl.socketConnect(Native Method)

at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)

at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:589)

at sun.net.NetworkClient.doConnect(NetworkClient.java:175)

at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)

at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)

at sun.net.www.http.HttpClient.(HttpClient.java:242)

at sun.net.www.http.HttpClient.New(HttpClient.java:339)

at sun.net.www.http.HttpClient.New(HttpClient.java:357)

at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)

at 

Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery

2019-01-24 Thread Stephan Ewen
The SpillableSubpartition can also be used during the execution of bounded
DataStreams programs. I think this is largely independent from deprecating
the DataSet API.

I am wondering if this particular issue is one that has been addressed in
the Blink code already (we are looking to merge much of that functionality)
- because the proposed extension is actually necessary for proper batch
fault tolerance (independent of the DataSet or Query Processor stack).

I am adding Kurt to this thread; maybe he can help us find that out.

On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski 
wrote:

> Hi,
>
> I’m not sure how much effort we will be willing to invest in the existing
> batch stack. We are currently focusing on the support of bounded
> DataStreams (already done in Blink and will be merged to Flink soon) and
> unifing batch & stream under DataStream API.
>
> Piotrek
>
> > On 23 Jan 2019, at 04:45, Bo WANG  wrote:
> >
> > Hi all,
> >
> > When running the batch WordCount example,  I configured the job execution
> > mode
> > as BATCH_FORCED, and failover-strategy as region, I manually injected
> some
> > errors to let the execution fail in different phases. In some cases, the
> > job could
> > recovery from failover and became succeed, but in some cases, the job
> > retried
> > several times and failed.
> >
> > Example:
> > - If the failure occurred before task read data, e.g., failed before
> > invokable.invoke() in Task.java, failover could succeed.
> > - If the failure occurred after task having read data, failover did not
> > work.
> >
> > Problem diagnose:
> > Running the example described before, each ExecutionVertex is defined as
> > a restart region, and the ResultPartitionType between executions is
> > BLOCKING.
> > Thus, SpillableSubpartition and SpillableSubpartitionView are used to
> > write/read
> > shuffle data, and data blocks are described as BufferConsumers stored in
> a
> > list
> > called buffers, when task requires input data from
> > SpillableSubpartitionView,
> > BufferConsumers are REMOVED from buffers. Thus, when failures occurred
> > after having read data, some BufferConsumers have already released.
> > Although tasks retried, the input data is incomplete.
> >
> > Fix Proposal:
> > - BufferConsumer should not be removed from buffers until the consumed
> > ExecutionVertex is terminal.
> > - SpillableSubpartition should not be released until the consumed
> > ExecutionVertex is terminal.
> > - SpillableSubpartition could creates multi SpillableSubpartitionViews,
> > each of which is corresponding to a ExecutionAttempt.
> >
> > Best,
> > Bo
>
>


flink runtime incompatibility with java 9 or above due to akka version

2019-01-24 Thread Matthieu Bonneviot
Hi
I am working on migrating my Flink cluster to 1.7.1 with Java 11.
I am facing a runtime error in the taskmanager:
2019-01-24 14:43:37,014 ERROR
akka.remote.Remoting  - class [B
cannot be cast to class [C ([B and [C are in module java.base of loader
'bootstrap')
java.lang.ClassCastException: class [B cannot be cast to class [C ([B and
[C are in module java.base of loader 'bootstrap')
at akka.remote.artery.FastHash$.ofString(LruBoundedCache.scala:18)
at
akka.remote.serialization.ActorRefResolveCache.hash(ActorRefResolveCache.scala:61)

This error is due to a code optimization done in akka which is not working
anymore from java 9.

I have opened an issue with more details:
https://issues.apache.org/jira/browse/FLINK-11431

Are you facing the same issue?

Regards
Matthieu Bonneviot

-- 
Matthieu Bonneviot
Senior Engineer, DataDome
M +33 7 68 29 79 34
E matthieu.bonnev...@datadome.co  
W www.datadome.co






DataDome
ranked 'Strong Performer' in latest Forrester Bot management report



[jira] [Created] (FLINK-11431) Akka dependency not compatible with java 9 or above

2019-01-24 Thread Matthieu Bonneviot (JIRA)
Matthieu Bonneviot created FLINK-11431:
--

 Summary: Akka dependency not compatible with java 9 or above
 Key: FLINK-11431
 URL: https://issues.apache.org/jira/browse/FLINK-11431
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.7.1
Reporter: Matthieu Bonneviot


2019-01-24 14:43:52,059 ERROR akka.remote.Remoting  
    - class [B cannot be cast to class [C ([B and [C are in module 
java.base of loader 'bootstrap')
java.lang.ClassCastException: class [B cannot be cast to class [C ([B and [C 
are in module java.base of loader 'bootstrap')
    at akka.remote.artery.FastHash$.ofString(LruBoundedCache.scala:18)
    at 
akka.remote.serialization.ActorRefResolveCache.hash(ActorRefResolveCache.scala:61)
    at 
akka.remote.serialization.ActorRefResolveCache.hash(ActorRefResolveCache.scala:55)
    at 
akka.remote.artery.LruBoundedCache.getOrCompute(LruBoundedCache.scala:110)
    at 
akka.remote.RemoteActorRefProvider.resolveActorRef(RemoteActorRefProvider.scala:403)
    at akka.actor.SerializedActorRef.readResolve(ActorRef.scala:433)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
java.base/java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1250)
    at 
java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2096)
    at 
java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1594)

Running a jobmanager with Java 11 fails with the call stack above.


Flink master is using akka 2.4.20.

After some investigation, the error in akka comes from the following line:
    def ofString(s: String): Int = {
      val chars = Unsafe.instance.getObject(s, EnvelopeBuffer.StringValueFieldOffset).asInstanceOf[Array[Char]]

From Java 9 on, String's internal value field is an array of bytes instead of an array of chars. The akka code in the newer version is:

    public static int fastHash(String str) {
        ...
        if (isJavaVersion9Plus) {
            final byte[] chars = (byte[]) instance.getObject(str, stringValueFieldOffset);
            ...
        } else {
            final char[] chars = (char[]) instance.getObject(str, stringValueFieldOffset);
        ...
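The version guard itself can be sketched like this (a hypothetical illustration, not akka's actual code; the helper name and the reliance on `java.specification.version` are assumptions):

```java
// Sketch of a Java-9+ check that could guard the byte[] vs. char[] cast.
// Before Java 9 the specification version looks like "1.8"; from Java 9 on
// it is "9", "10", "11", ...
public class JavaVersionCheck {

    static boolean isJava9Plus(String specVersion) {
        return !specVersion.startsWith("1.");
    }

    public static void main(String[] args) {
        System.out.println("Running on Java 9+: "
                + isJava9Plus(System.getProperty("java.specification.version")));
    }
}
```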
 





Re: flink runtime incompatibility with java 9 or above due to akka version

2019-01-24 Thread Gary Yao
Hi Matthieu,

Flink is not Java 9 compatible at the moment [1]. Although Flink can be
compiled with Java 9, not all tests are currently passing. For a list of
modules that have failing tests, see [2].

The problem that you found persists in Akka versions prior to 2.5.6 [3][4].

Best,
Gary

[1] https://issues.apache.org/jira/browse/FLINK-8033
[2]
https://github.com/apache/flink/blob/700e5a6c2563253791408e8008e421d77b42f3e3/tools/travis/stage.sh#L73
[3] https://akka.io/blog/news/2017/09/28/akka-2.5.6-released
[4] https://github.com/akka/akka/issues/23702

On Thu, Jan 24, 2019 at 5:04 PM Matthieu Bonneviot <
matthieu.bonnev...@datadome.co> wrote:

> Hi
> i am working on migrating my flink cluster to 1.7.1 with java 11.
> I am facing a runtime error in the taskmanager:
> 2019-01-24 14:43:37,014 ERROR
> akka.remote.Remoting  - class [B
> cannot be cast to class [C ([B and [C are in module java.base of loader
> 'bootstrap')
> java.lang.ClassCastException: class [B cannot be cast to class [C ([B and
> [C are in module java.base of loader 'bootstrap')
> at akka.remote.artery.FastHash$.ofString(LruBoundedCache.scala:18)
> at
>
> akka.remote.serialization.ActorRefResolveCache.hash(ActorRefResolveCache.scala:61)
>
> This error is due to a code optimization done in akka which is not working
> anymore from java 9.
>
> I have open an issue for more details:
> https://issues.apache.org/jira/browse/FLINK-11431
>
> Are you facing the same issue?
>
> Regards
> Matthieu Bonneviot
>
> --
> Matthieu Bonneviot
> Senior Engineer, DataDome
> M +33 7 68 29 79 34  <+33+7+68+29+79+34>
> E matthieu.bonnev...@datadome.co  
> W www.datadome.co
>


Re: flink runtime incompatibility with java 9 or above due to akka version

2019-01-24 Thread Chesnay Schepler
Flink only runs on Java 8. There were efforts to add support for Java 9 
in 1.8, but it turned into a massive rabbit hole with rather wide 
implications, like the akka version upgrade that you just found.


On 24.01.2019 17:04, Matthieu Bonneviot wrote:

Hi
i am working on migrating my flink cluster to 1.7.1 with java 11.
I am facing a runtime error in the taskmanager:
2019-01-24 14:43:37,014 ERROR
akka.remote.Remoting  - class [B
cannot be cast to class [C ([B and [C are in module java.base of loader
'bootstrap')
java.lang.ClassCastException: class [B cannot be cast to class [C ([B and
[C are in module java.base of loader 'bootstrap')
 at akka.remote.artery.FastHash$.ofString(LruBoundedCache.scala:18)
 at
akka.remote.serialization.ActorRefResolveCache.hash(ActorRefResolveCache.scala:61)

This error is due to a code optimization done in akka which is not working
anymore from java 9.

I have open an issue for more details:
https://issues.apache.org/jira/browse/FLINK-11431

Are you facing the same issue?

Regards
Matthieu Bonneviot





Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread Bowen Li
+1 for leaner distribution and a better 'download' webpage.

+1 for a full distribution if we can automate it besides supporting the
leaner one. If we support both, I'd imagine release managers should be able
to package the two distributions with a single change of parameter instead of
manually packaging the full distribution. How to achieve that needs to be
evaluated and discussed; it could probably be something like 'mvn clean install
-Dfull/-Dlean', I'm not sure yet.
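One way such a single-parameter switch could look is a Maven profile. This is a hypothetical sketch only — the profile id, its placement in flink-dist/pom.xml, and the artifact listed are illustrative, not the actual build setup:

```xml
<!-- Hypothetical sketch: a profile in flink-dist/pom.xml, activated
     by -Dfull, that pulls optional connectors into the fat
     distribution. Profile id and artifact are illustrative. -->
<profile>
  <id>full-dist</id>
  <activation>
    <property>
      <name>full</name>
    </property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</profile>
```

With such a profile, `mvn clean install` would produce the lean distribution by default and `mvn clean install -Dfull` the fat one, without any manual packaging step.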


On Wed, Jan 23, 2019 at 10:11 AM Thomas Weise  wrote:

> +1 for trimming the size by default and offering the fat distribution as
> alternative download
>
>
> On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann 
> wrote:
>
>> Ufuk's proposal (having a lean default release and a user convenience
>> tarball) sounds good to me. That way advanced users won't be bothered by
>> an
>> unnecessarily large release and new users can benefit from having many
>> useful extensions bundled in one tarball.
>>
>> Cheers,
>> Till
>>
>> On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi  wrote:
>>
>> > On Wed, Jan 23, 2019 at 11:01 AM Timo Walther 
>> wrote:
>> > > I think what is more important than a big dist bundle is a helpful
>> > > "Downloads" page where users can easily find available filesystems,
>> > > connectors, metric reporters. Not everyone checks Maven central for
>> > > available JAR files. I just saw that we added a "Optional components"
>> > > section recently [1], we just need to make it more prominent. This is
>> > > also done for the SQL connectors and formats [2].
>> >
>> > +1 I fully agree with the importance of the Downloads page. We
>> > definitely need to make any optional dependencies that users need to
>> > download easy to find.
>> >
>>
>


Re: Side Outputs for late arriving records

2019-01-24 Thread Fabian Hueske
Hi Ramya,

This would be a great feature, but unfortunately it is not supported (yet) by
Flink SQL.
Currently, all late records are dropped.

A workaround is to ingest the stream as a DataStream, use a custom
operator that routes all late records to a side output, and register the
DataStream without late records as a table on which the SQL query is
evaluated.
This requires quite a bit of boilerplate code but could be hidden in a util
class.
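The routing decision at the heart of that workaround is a watermark comparison. The sketch below is self-contained plain Java (no Flink dependency) so it can be run as-is; in a real job the same logic would live in a `ProcessFunction`, with `ctx.output(lateTag, record)` feeding the side output and `out.collect(record)` the main stream that gets registered as the table:

```java
import java.util.ArrayList;
import java.util.List;

// Self-contained sketch of the late-record routing decision. A record
// whose event timestamp is behind the current watermark is "late" and
// goes to the side output; everything else stays on the main stream,
// which would then be registered as the table the SQL query runs on.
public class LateRouterSketch {

    final List<Long> mainOutput = new ArrayList<>(); // on-time records
    final List<Long> lateOutput = new ArrayList<>(); // side output

    void processElement(long eventTime, long currentWatermark) {
        if (eventTime < currentWatermark) {
            lateOutput.add(eventTime);   // would be ctx.output(lateTag, record)
        } else {
            mainOutput.add(eventTime);   // would be out.collect(record)
        }
    }

    public static void main(String[] args) {
        LateRouterSketch router = new LateRouterSketch();
        long watermark = 50L;
        for (long ts : new long[] {100L, 40L, 120L}) {
            router.processElement(ts, watermark);
        }
        System.out.println("main: " + router.mainOutput); // [100, 120]
        System.out.println("late: " + router.lateOutput); // [40]
    }
}
```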

Best, Fabian

On Thu, Jan 24, 2019 at 06:42, Ramya Ramamurthy <hair...@gmail.com> wrote:

> Hi,
>
> I have a query with regard to Late arriving records.
> We are using Flink 1.7 with Kafka Consumers of version 2.11.0.11.
> In my sink operators, which converts this table to a stream which is being
> pushed to Elastic Search, I am able to see this metric "
> *numLateRecordsDropped*".
>
> My Kafka consumers don't seem to have any lag and the events are
> processed properly. Taking these events to a side output
> doesn't seem to be possible with tables. Below is the snippet:
>
> tableEnv.connect(new Kafka()
>   /* setting of all kafka properties */
>.startFromLatest())
>.withSchema(new Schema()
>.field("sid", Types.STRING())
>.field("_zpsbd6", Types.STRING())
>.field("r1", Types.STRING())
>.field("r2", Types.STRING())
>.field("r5", Types.STRING())
>.field("r10", Types.STRING())
>.field("isBot", Types.BOOLEAN())
>.field("botcode", Types.STRING())
>.field("ts", Types.SQL_TIMESTAMP())
>.rowtime(new Rowtime()
>.timestampsFromField("recvdTime")
>.watermarksPeriodicBounded(1)
>)
>)
>.withFormat(new Json().deriveSchema())
>.inAppendMode()
>.registerTableSource("sourceTopic");
>
>String sql = "SELECT sid, _zpsbd6 as ip, COUNT(*) as total_hits, "
>+ "TUMBLE_START(ts, INTERVAL '5' MINUTE) as tumbleStart, "
>+ "TUMBLE_END(ts, INTERVAL '5' MINUTE) as tumbleEnd FROM
> sourceTopic "
>+ "WHERE r1='true' or r2='true' or r5='true' or r10='true'
> and isBot='true' "
>+ "GROUP BY TUMBLE(ts, INTERVAL '5' MINUTE), sid,  _zpsbd6";
>
> Table source = tableEnv.sqlQuery(sql) ---> This is where the metric is
> showing the lateRecordsDropped, while executing the group by operation.
>
> Is there a way to get the side output of this to be able to debug better?
>
> Thanks,
> ~Ramya.
>


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-24 Thread Bowen Li
Exciting to see this happening!

Regarding the docs, have we done a diff that shows how much difference there is
between Flink's and Blink's documentation (flink/docs)? For example, how
many pages and what percentage of each page is different? How many new
pages (for new features) does Blink have? If we have such a summary or
visualization, it may give us a better idea of which approach we should go
with.

Another perspective is that, though the main feature differences between
Flink and Blink that the community is interested in are SQL/Table API and
Batch, Blink's code changes seem to be much more extensive and touch more
modules and behaviors. As a user, I'd love to have a more consistent
experience of understanding and trying Blink, and a separate versioned
website works best in such a case.

Thanks,
Bowen


On Thu, Jan 24, 2019 at 4:22 AM Kurt Young  wrote:

> Sure, i will do the rebase before pushing the branch.
>
> Timo Walther wrote on Thu, Jan 24, 2019 at 18:20:
>
> > Regarding the content of a `blink-1.5` branch, is it possible to rebase
> > the big Blink commit on top of the current master or the last Flink
> > release?
> >
> > I don't mean a full rebase here, but just forking the branch from
> > current Flink, and putting the Blink content into the repository, and
> > commit it. This would enable to see a diff which classes and lines have
> > changed and which are still the same. I guess this would be very helpful
> > instead of a branch with a big commit that has no common origin.
> >
> > Thanks,
> > Timo
> >
> > On 24.01.19 at 02:54, Becket Qin wrote:
> > > Thanks Stephan,
> > >
> > > The plan makes sense to me.
> > >
> > > Regarding the docs, it seems better to have a separate versioned
> website
> > > because there are a lot of changes spread over the places. We can add
> the
> > > banner to remind users that they are looking at the blink docs, which
> is
> > > temporary and will eventually be merged into Flink master. (The banner
> is
> > > pretty similar to what user will see when they visit docs of old flink
> > > versions
> > > <
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html
> > >
> > > [1]).
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > [1]
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html
> > >
> > > On Thu, Jan 24, 2019 at 6:21 AM Shaoxuan Wang 
> > wrote:
> > >
> > >> Thanks Stephan,
> > >> The entire plan looks good to me. WRT the "Docs for Flink", a
> subsection
> > >> should be good enough if we just introduce the outlines of what blink
> > has
> > >> changed. However, we have made detailed introductions to blink based
> on
> > the
> > >> framework of current release document of Flink (those introductions
> are
> > >> distributed in each subsections). Does it make sense to create a blink
> > >> document as a separate one, under the documentation section, say
> > blink-1.5
> > >> (temporary, not a release).
> > >>
> > >> Regards,
> > >> Shaoxuan
> > >>
> > >>
> > >> On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen 
> wrote:
> > >>
> > >>> Nice to see this lively discussion.
> > >>>
> > >>> *--- Branch Versus Repository ---*
> > >>>
> > >>> Looks like this is converging towards pushing a branch.
> > >>> How about naming the branch simply "blink-1.5" ? That would be in
> line
> > >> with
> > >>> the 1.5 version branch of Flink, which is simply called
> "release-1.5" ?
> > >>>
> > >>> *--- SGA --- *
> > >>>
> > >>> The SGA (Software Grant Agreement) should be either filed already or
> in
> > >> the
> > >>> process of filing.
> > >>>
> > >>> *--- Offering Jars for Blink ---*
> > >>>
> > >>> As Chesnay and Timo mentioned, we cannot easily offer a "Release" of
> > >> Blink
> > >>> (source or binary), because that would require a thorough
> > >>> checking of licenses and creating/ bundling license files. That is a
> > lot
> > >> of
> > >>> work, as we recently experienced again in the Flink master.
> > >>>
> > >>> What we can do is upload compiled jar files and link to them
> somewhere
> > in
> > >>> the blink docs. We need to add a disclaimer that these are
> > >>> convenience jars, and not an official Apache release. I hope that
> would
> > >>> work for the users that are curious to try things out.
> > >>>
> > >>> *--- Docs for Blink --- *
> > >>>
> > >>> Do we need a versioned website here? If not, can we simply make this
> a
> > >>> subsection of the current Flink snapshot docs?
> > >>> Next to "Flink Development" and "Internals", we could have a section
> on
> > >>> "Blink branch".
> > >>> I think it is crucial, though, to make it clear that this is
> temporary
> > >> and
> > >>> will eventually be subsumed by the main release, just
> > >>> so that users do not get confused.
> > >>>
> > >>> Best,
> > >>> Stephan
> > >>>
> > >>>
> > >>> On Wed, Jan 23, 2019 at 12:23 PM Becket Qin 
> > >> wrote:
> >  Really excited to see Blink joining the Flink community!
> > 
> >  My two cents 

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Bowen Li
+1, a great starting point!

It'll be nice if we can later come up with some way to bridge the knowledge
between two email channels.

On Thu, Jan 24, 2019 at 5:32 AM Fabian Hueske  wrote:

> Thanks Robert!
> I think this is a very good idea.
> +1
>
> Fabian
>
> On Thu, Jan 24, 2019 at 14:09, Jeff Zhang wrote:
>
> > +1
> >
> > Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:
> >
> > > +1, good idea, especially with that many Chinese speaking contributors,
> > > committers & users :)
> > >
> > > Piotrek
> > >
> > > > On 24 Jan 2019, at 13:20, Kurt Young  wrote:
> > > >
> > > > Big +1 on this, it will indeed help Chinese speaking users a lot.
> > > >
> > > > fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:
> > > >
> > > >> +1. I noticed that many folks from China are requesting the JIRA
> > > >> permission in the past year. It reflects that more and more
> developers
> > > from
> > > >> China are using Flink. A Chinese oriented mailing list will
> definitely
> > > be
> > > >> helpful for the growth of Flink in China.
> > > >>
> > > >>
> > > >>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
> > > >>>
> > > >>> +1, a very nice idea
> > > >>>
> > > >>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger <
> rmetz...@apache.org
> > >
> > > >> wrote:
> > > >>>
> > >  Thanks for your response.
> > > 
> > >  You are right, I'm proposing "user...@flink.apache.org" as the
> > > mailing
> > >  list's name!
> > > 
> > >  On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> > > >> tzuli...@apache.org>
> > >  wrote:
> > > 
> > > > Hi Robert,
> > > >
> > > > Thanks a lot for starting this discussion!
> > > >
> > > > +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> > > >> title,
> > > > but
> > > > -cn in the opening email content).
> > > > I think -zh would be better as we are establishing the tool for
> > > general
> > > > Chinese-speaking users).
> > > > All dev@ discussions / JIRAs should still be in a single English
> > > >> mailing
> > > > list.
> > > >
> > > > From what I've seen in the DingTalk Flink user group, there's
> > quite a
> > > >> bit
> > > > of activity in forms of user questions and replies.
> > > > It would really be great if the Chinese-speaking user community
> can
> > > > actually have these discussions happen in the Apache mailing
> lists,
> > > > so that questions / discussions / replies from developers can be
> > > >> indexed
> > > > and searchable.
> > > > Moreover, it'll give the community more insight in how active a
> > > > Chinese-speaking contributor is helping with user requests,
> > > > which in general is a form of contribution that the community
> > always
> > >  merits
> > > > a lot.
> > > >
> > > > Cheers,
> > > > Gordon
> > > >
> > > > On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
> > rmetz...@apache.org
> > > >
> > > > wrote:
> > > >
> > > >> Hey all,
> > > >>
> > > >> I would like to create a new user support mailing list called "
> > > >> user...@flink.apache.org" to cater the Chinese-speaking Flink
> > >  community.
> > > >>
> > > >> Why?
> > > >> In the last year 24% of the traffic on flink.apache.org came
> from
> > > the
> > > > US,
> > > >> 22% from China. In the last three months, China is at 30%, the
> US
> > at
> > >  20%.
> > > >> An additional data point is that there's a Flink DingTalk group
> > with
> > >  more
> > > >> than 5000 members, asking Flink questions.
> > > >> I believe that knowledge about Flink should be available in
> public
> > >  forums
> > > >> (our mailing list), indexable by search engines. If there's a
> huge
> > >  demand
> > > >> in a Chinese language support, we as a community should provide
> > > these
> > > > users
> > > >> the tools they need, to grow our community and to allow them to
> > > follow
> > > > the
> > > >> Apache way.
> > > >>
> > > >> Is it possible?
> > > >> I believe it is, because a number of other Apache projects are
> > > running
> > > >> non-English user@ mailing lists.
> > > >> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> > >  non-English
> > > >> lists: http://mail-archives.apache.org/mod_mbox/
> > > >> One thing I want to make very clear in this discussion is that
> all
> > > > project
> > > >> decisions, developer discussions, JIRA tickets etc. need to
> happen
> > > in
> > > >> English, as this is the primary language of the Apache
> Foundation
> > > and
> > >  our
> > > >> community.
> > > >> We should also clarify this on the page listing the mailing
> lists.
> > > >>
> > > >> How?
> > > >> If there is consensus in this discussion thread, I would request
> > the
> > >  new
> > > >> mailing list next Monday.
> > > >> In case of discussions, I will start a vote on Monday or when
> the
> > 

Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-24 Thread Zhang, Xuefu
+1 on the idea. This will certainly help promote Flink in Chinese industry. On 
a side note, it would be great if anyone in the list can help source ideas, bug 
reports, and feature requests to dev@ list and/or JIRAs so as to gain broader 
attention.

Thanks,
Xuefu


--
From:Fabian Hueske 
Sent At:2019 Jan. 24 (Thu.) 05:32
To:dev 
Subject:Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the 
Chinese-speaking community?

Thanks Robert!
I think this is a very good idea.
+1

Fabian

On Thu, Jan 24, 2019 at 14:09, Jeff Zhang wrote:

> +1
>
> Piotr Nowojski wrote on Thu, Jan 24, 2019 at 8:38 PM:
>
> > +1, good idea, especially with that many Chinese speaking contributors,
> > committers & users :)
> >
> > Piotrek
> >
> > > On 24 Jan 2019, at 13:20, Kurt Young  wrote:
> > >
> > > Big +1 on this, it will indeed help Chinese speaking users a lot.
> > >
> > > fudian.fd wrote on Thu, Jan 24, 2019 at 20:18:
> > >
> > >> +1. I noticed that many folks from China are requesting the JIRA
> > >> permission in the past year. It reflects that more and more developers
> > from
> > >> China are using Flink. A Chinese oriented mailing list will definitely
> > be
> > >> helpful for the growth of Flink in China.
> > >>
> > >>
> > >>> On Jan 24, 2019, at 7:42 PM, Stephan Ewen wrote:
> > >>>
> > >>> +1, a very nice idea
> > >>>
> > >>> On Thu, Jan 24, 2019 at 12:41 PM Robert Metzger  >
> > >> wrote:
> > >>>
> >  Thanks for your response.
> > 
> >  You are right, I'm proposing "user...@flink.apache.org" as the
> > mailing
> >  list's name!
> > 
> >  On Thu, Jan 24, 2019 at 12:37 PM Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> >  wrote:
> > 
> > > Hi Robert,
> > >
> > > Thanks a lot for starting this discussion!
> > >
> > > +1 to a user-zh@flink.a.o mailing list (you mentioned -zh in the
> > >> title,
> > > but
> > > -cn in the opening email content).
> > > I think -zh would be better as we are establishing the tool for
> > general
> > > Chinese-speaking users).
> > > All dev@ discussions / JIRAs should still be in a single English
> > >> mailing
> > > list.
> > >
> > > From what I've seen in the DingTalk Flink user group, there's
> quite a
> > >> bit
> > > of activity in forms of user questions and replies.
> > > It would really be great if the Chinese-speaking user community can
> > > actually have these discussions happen in the Apache mailing lists,
> > > so that questions / discussions / replies from developers can be
> > >> indexed
> > > and searchable.
> > > Moreover, it'll give the community more insight in how active a
> > > Chinese-speaking contributor is helping with user requests,
> > > which in general is a form of contribution that the community
> always
> >  merits
> > > a lot.
> > >
> > > Cheers,
> > > Gordon
> > >
> > > On Thu, Jan 24, 2019 at 12:15 PM Robert Metzger <
> rmetz...@apache.org
> > >
> > > wrote:
> > >
> > >> Hey all,
> > >>
> > >> I would like to create a new user support mailing list called "
> > >> user...@flink.apache.org" to cater the Chinese-speaking Flink
> >  community.
> > >>
> > >> Why?
> > >> In the last year 24% of the traffic on flink.apache.org came from
> > the
> > > US,
> > >> 22% from China. In the last three months, China is at 30%, the US
> at
> >  20%.
> > >> An additional data point is that there's a Flink DingTalk group
> with
> >  more
> > >> than 5000 members, asking Flink questions.
> > >> I believe that knowledge about Flink should be available in public
> >  forums
> > >> (our mailing list), indexable by search engines. If there's a huge
> >  demand
> > >> in a Chinese language support, we as a community should provide
> > these
> > > users
> > >> the tools they need, to grow our community and to allow them to
> > follow
> > > the
> > >> Apache way.
> > >>
> > >> Is it possible?
> > >> I believe it is, because a number of other Apache projects are
> > running
> > >> non-English user@ mailing lists.
> > >> Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> >  non-English
> > >> lists: http://mail-archives.apache.org/mod_mbox/
> > >> One thing I want to make very clear in this discussion is that all
> > > project
> > >> decisions, developer discussions, JIRA tickets etc. need to happen
> > in
> > >> English, as this is the primary language of the Apache Foundation
> > and
> >  our
> > >> community.
> > >> We should also clarify this on the page listing the mailing lists.
> > >>
> > >> How?
> > >> If there is consensus in this discussion thread, I would request
> the
> >  new
> > >> mailing list next Monday.
> > >> In case of discussions, I will start a vote on Monday or when the
> > >>