[jira] [Created] (FLINK-11423) Propagate the error message from Main method to JarRunHandler

2019-01-23 Thread Lavkesh Lahngir (JIRA)
Lavkesh Lahngir created FLINK-11423:
---

 Summary: Propagate the error message from Main method to 
JarRunHandler
 Key: FLINK-11423
 URL: https://issues.apache.org/jira/browse/FLINK-11423
 Project: Flink
  Issue Type: Improvement
  Components: Job-Submission
Reporter: Lavkesh Lahngir
Assignee: Lavkesh Lahngir


The jar/run REST API call is handled by JarRunHandler.
When the submitted program fails, the client only receives a generic message like "The main method caused an error"
without any more detail.
When we throw ProgramInvocationException in PackagedProgram.callMainMethod(),
we should append exceptionInMethod.getMessage() as well.
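A minimal sketch of the intended effect (the ProgramInvocationException below is a local stand-in for the real Flink class, and the failure message is made up; only the message strings mirror the proposed change):

{code:java}
// Toy illustration: compare the message the REST client would see today
// with the message after appending the underlying exception's text.
public class ErrorMessageDemo {

    // Local stand-in for org.apache.flink.client.program.ProgramInvocationException.
    static class ProgramInvocationException extends RuntimeException {
        ProgramInvocationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    public static void main(String[] args) {
        Throwable exceptionInMethod = new IllegalStateException("example failure from the user's main()");

        // Current behaviour: only this generic text reaches the client.
        RuntimeException current =
            new ProgramInvocationException("The main method caused an error.", exceptionInMethod);

        // Proposed behaviour: append the underlying exception's message as well.
        RuntimeException proposed =
            new ProgramInvocationException(
                "The main method caused an error: " + exceptionInMethod.getMessage(),
                exceptionInMethod);

        System.out.println(current.getMessage());
        System.out.println(proposed.getMessage());
    }
}
{code}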



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Parallel CEP

2019-01-23 Thread Dian Fu
I'm afraid you cannot do that. Inputs with the same key have to be
processed by the same CEP operator; otherwise the results will be
nondeterministic and therefore wrong.

Regards,
Dian

> On 24 Jan 2019, at 14:56, dhanuka ranasinghe wrote:
> 
> In this example key will be same. I am using 1 million messages with same key 
> for performance testing. But still I want to process them parallel. Can't I 
> use Split function and get a SplitStream for that purpose?
> 
> On Thu, Jan 24, 2019 at 2:49 PM Dian Fu  > wrote:
> Hi Dhanuka,
> 
> Does the KeySelector of Event::getTriggerID generate the same key for all the 
> inputs or only generate very few key values and these key values happen to be 
> hashed to the same downstream operator? You can print the results of 
> Event::getTriggerID to check if it's that case.
> 
> Regards,
> Dian
> 
>> On 24 Jan 2019, at 14:08, dhanuka ranasinghe wrote:
>> 
>> Hi Dian,
>> 
>> Thanks for the explanation. Please find the screen shot and source code for 
>> above mention use case. And in main issue is though I use KeyedStream , 
>> parallelism not apply properly.
>> Only one host is processing messages.
>> 
>> 
>> 
>> Cheers,
>> Dhanuka
>> 
>> On Thu, Jan 24, 2019 at 1:40 PM Dian Fu > > wrote:
>> Whether using KeyedStream depends on the logic of your job, i.e, whether you 
>> are looking for patterns for some partitions, i.e, patterns for a particular 
>> user. If so, you should partition the input data before the CEP operator. 
>> Otherwise, the input data should not be partitioned.
>> 
>> Regards,
>> Dian 
>> 
>>> On 24 Jan 2019, at 12:37, dhanuka ranasinghe wrote:
>>> 
>>> Hi Dian,
>>> 
>>> I tried that but then kafkaproducer only produce to single partition and 
>>> only single flink host working while rest not contribute for processing . I 
>>> will share the code and screenshot
>>> 
>>> Cheers 
>>> Dhanuka
>>> 
>>> On Thu, 24 Jan 2019, 12:31 Dian Fu >>  wrote:
>>> Hi Dhanuka,
>>> 
>>> In order to make the CEP operator to run parallel, the input stream should 
>>> be KeyedStream. You can refer [1] for detailed information.
>>> 
>>> Regards,
>>> Dian
>>> 
>>> [1]: 
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
>>>  
>>> 
>>> 
>>> > On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
>>> > 
>>> > Hi All,
>>> > 
>>> > Is there way to run CEP function parallel. Currently CEP run only 
>>> > sequentially
>>> > 
>>> > 
>>> > .
>>> > 
>>> > Cheers,
>>> > Dhanuka
>>> > 
>>> > -- 
>>> > Nothing Impossible,Creativity is more important than knowledge.
>>> 
>> 
>> 
>> 
>> -- 
>> Nothing Impossible,Creativity is more important than knowledge.
>> 
> 
> 
> 
> -- 
> Nothing Impossible,Creativity is more important than knowledge.



Re: Parallel CEP

2019-01-23 Thread dhanuka ranasinghe
In this example the key will be the same. I am using 1 million messages with the same
key for performance testing, but I still want to process them in parallel.
Can't I use the split function and get a SplitStream for that purpose?

On Thu, Jan 24, 2019 at 2:49 PM Dian Fu  wrote:

> Hi Dhanuka,
>
> Does the KeySelector of Event::getTriggerID generate the same key for all
> the inputs or only generate very few key values and these key values happen
> to be hashed to the same downstream operator? You can print the results of
> Event::getTriggerID to check if it's that case.
>
> Regards,
> Dian
>
> On 24 Jan 2019, at 14:08, dhanuka ranasinghe wrote:
>
> Hi Dian,
>
> Thanks for the explanation. Please find the screen shot and source code
> for above mention use case. And in main issue is though I use KeyedStream ,
> parallelism not apply properly.
> Only one host is processing messages.
>
> 
>
> Cheers,
> Dhanuka
>
> On Thu, Jan 24, 2019 at 1:40 PM Dian Fu  wrote:
>
>> Whether using KeyedStream depends on the logic of your job, i.e, whether
>> you are looking for patterns for some partitions, i.e, patterns for a
>> particular user. If so, you should partition the input data before the CEP
>> operator. Otherwise, the input data should not be partitioned.
>>
>> Regards,
>> Dian
>>
>> On 24 Jan 2019, at 12:37, dhanuka ranasinghe wrote:
>>
>> Hi Dian,
>>
>> I tried that but then kafkaproducer only produce to single partition and
>> only single flink host working while rest not contribute for processing . I
>> will share the code and screenshot
>>
>> Cheers
>> Dhanuka
>>
>> On Thu, 24 Jan 2019, 12:31 Dian Fu >
>>> Hi Dhanuka,
>>>
>>> In order to make the CEP operator to run parallel, the input stream
>>> should be KeyedStream. You can refer [1] for detailed information.
>>>
>>> Regards,
>>> Dian
>>>
>>> [1]:
>>> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
>>>
>>> > On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
>>> >
>>> > Hi All,
>>> >
>>> > Is there way to run CEP function parallel. Currently CEP run only
>>> sequentially
>>> >
>>> > 
>>> > .
>>> >
>>> > Cheers,
>>> > Dhanuka
>>> >
>>> > --
>>> > Nothing Impossible,Creativity is more important than knowledge.
>>>
>>>
>>
>
> --
> Nothing Impossible,Creativity is more important than knowledge.
> 
>
>
>

-- 
Nothing Impossible,Creativity is more important than knowledge.


Re: Parallel CEP

2019-01-23 Thread Dian Fu
Hi Dhanuka,

Does the KeySelector of Event::getTriggerID generate the same key for all the
inputs, or only generate very few key values which happen to be
hashed to the same downstream operator? You can print the results of
Event::getTriggerID to check whether that's the case.

Regards,
Dian

> On 24 Jan 2019, at 14:08, dhanuka ranasinghe wrote:
> 
> Hi Dian,
> 
> Thanks for the explanation. Please find the screen shot and source code for 
> above mention use case. And in main issue is though I use KeyedStream , 
> parallelism not apply properly.
> Only one host is processing messages.
> 
> 
> 
> Cheers,
> Dhanuka
> 
> On Thu, Jan 24, 2019 at 1:40 PM Dian Fu  > wrote:
> Whether using KeyedStream depends on the logic of your job, i.e, whether you 
> are looking for patterns for some partitions, i.e, patterns for a particular 
> user. If so, you should partition the input data before the CEP operator. 
> Otherwise, the input data should not be partitioned.
> 
> Regards,
> Dian 
> 
>> On 24 Jan 2019, at 12:37, dhanuka ranasinghe wrote:
>> 
>> Hi Dian,
>> 
>> I tried that but then kafkaproducer only produce to single partition and 
>> only single flink host working while rest not contribute for processing . I 
>> will share the code and screenshot
>> 
>> Cheers 
>> Dhanuka
>> 
>> On Thu, 24 Jan 2019, 12:31 Dian Fu >  wrote:
>> Hi Dhanuka,
>> 
>> In order to make the CEP operator to run parallel, the input stream should 
>> be KeyedStream. You can refer [1] for detailed information.
>> 
>> Regards,
>> Dian
>> 
>> [1]: 
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
>>  
>> 
>> 
>> > On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
>> > 
>> > Hi All,
>> > 
>> > Is there way to run CEP function parallel. Currently CEP run only 
>> > sequentially
>> > 
>> > 
>> > .
>> > 
>> > Cheers,
>> > Dhanuka
>> > 
>> > -- 
>> > Nothing Impossible,Creativity is more important than knowledge.
>> 
> 
> 
> 
> -- 
> Nothing Impossible,Creativity is more important than knowledge.
> 



Side Outputs for late arriving records

2019-01-23 Thread Ramya Ramamurthy
Hi,

I have a query with regard to late-arriving records.
We are using Flink 1.7 with Kafka Consumers of version 2.11.0.11.
In my sink operator, which converts this table to a stream that is
pushed to Elasticsearch, I am able to see the metric
*numLateRecordsDropped*.

My Kafka consumers don't seem to have any lag and the events are
processed properly. Taking these events to a side output
doesn't seem to be possible with tables. Below is the snippet:

tableEnv.connect(new Kafka()
  /* setting of all kafka properties */
   .startFromLatest())
   .withSchema(new Schema()
   .field("sid", Types.STRING())
   .field("_zpsbd6", Types.STRING())
   .field("r1", Types.STRING())
   .field("r2", Types.STRING())
   .field("r5", Types.STRING())
   .field("r10", Types.STRING())
   .field("isBot", Types.BOOLEAN())
   .field("botcode", Types.STRING())
   .field("ts", Types.SQL_TIMESTAMP())
   .rowtime(new Rowtime()
   .timestampsFromField("recvdTime")
   .watermarksPeriodicBounded(1)
   )
   )
   .withFormat(new Json().deriveSchema())
   .inAppendMode()
   .registerTableSource("sourceTopic");

   String sql = "SELECT sid, _zpsbd6 as ip, COUNT(*) as total_hits, "
   + "TUMBLE_START(ts, INTERVAL '5' MINUTE) as tumbleStart, "
   + "TUMBLE_END(ts, INTERVAL '5' MINUTE) as tumbleEnd FROM
sourceTopic "
   + "WHERE r1='true' or r2='true' or r5='true' or r10='true'
and isBot='true' "
   + "GROUP BY TUMBLE(ts, INTERVAL '5' MINUTE), sid,  _zpsbd6";

Table source = tableEnv.sqlQuery(sql) ---> This is where the metric is
showing the lateRecordsDropped, while executing the group-by operation.

Is there a way to get the side output of this to be able to debug better?

Thanks,
~Ramya.
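
For reference, a minimal sketch of capturing late records via a side output in
the DataStream API (this is not exposed through the Table/SQL GROUP BY used
above; the tuple type, field names, and sample timestamps are placeholders):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

public class LateDataSideOutputSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Placeholder (sid, eventTimeMillis) records standing in for the Kafka source.
        DataStream<Tuple2<String, Long>> hits = env
            .fromElements(Tuple2.of("sid-1", 1_000L), Tuple2.of("sid-1", 2_000L), Tuple2.of("sid-2", 3_000L))
            .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                @Override
                public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                    return element.f1;
                }
            });

        // Records arriving after the watermark has passed their window are routed here.
        final OutputTag<Tuple2<String, Long>> lateTag =
            new OutputTag<Tuple2<String, Long>>("late-hits") {};

        SingleOutputStreamOperator<Tuple2<String, Long>> counts = hits
            .keyBy(0)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .sideOutputLateData(lateTag)
            .sum(1);

        counts.print();
        counts.getSideOutput(lateTag).print(); // inspect late records for debugging

        env.execute("late-data side output sketch");
    }
}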


Re: Parallel CEP

2019-01-23 Thread Dian Fu
Whether to use a KeyedStream depends on the logic of your job, i.e., whether you
are looking for patterns within some partitions, e.g. patterns for a particular
user. If so, you should partition the input data before the CEP operator.
Otherwise, the input data should not be partitioned.

Regards,
Dian 

> On 24 Jan 2019, at 12:37, dhanuka ranasinghe wrote:
> 
> Hi Dian,
> 
> I tried that but then kafkaproducer only produce to single partition and only 
> single flink host working while rest not contribute for processing . I will 
> share the code and screenshot
> 
> Cheers 
> Dhanuka
> 
> On Thu, 24 Jan 2019, 12:31 Dian Fu   wrote:
> Hi Dhanuka,
> 
> In order to make the CEP operator to run parallel, the input stream should be 
> KeyedStream. You can refer [1] for detailed information.
> 
> Regards,
> Dian
> 
> [1]: 
> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
>  
> 
> 
> > On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
> > 
> > Hi All,
> > 
> > Is there way to run CEP function parallel. Currently CEP run only 
> > sequentially
> > 
> > 
> > .
> > 
> > Cheers,
> > Dhanuka
> > 
> > -- 
> > Nothing Impossible,Creativity is more important than knowledge.
> 



Re: Flink CEP : Doesn't generate output

2019-01-23 Thread Dian Fu
Hi Dhanuka,

From the code you shared, it seems that you're using event time. In event time,
the processing of elements is triggered by watermarks, so you should define
how to generate them, e.g. with DataStream.assignTimestampsAndWatermarks.
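
For illustration, a minimal sketch of assigning timestamps and watermarks with a
bounded-out-of-orderness extractor (the Event type and its getTimestamp() accessor
are placeholders, not taken from the shared code):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TimestampAssignmentSketch {

    // Placeholder event type; only the timestamp accessor matters here.
    public static class Event {
        public long timestamp;
        public long getTimestamp() { return timestamp; }
    }

    public static DataStream<Event> withWatermarks(DataStream<Event> input) {
        // In event time, CEP (like windows) only makes progress once watermarks
        // advance; this assigns them, allowing up to 5 seconds of out-of-orderness.
        return input.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
                @Override
                public long extractTimestamp(Event element) {
                    return element.getTimestamp();
                }
            });
    }
}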

Regards,
Dian


> On 23 Jan 2019, at 01:58, dhanuka ranasinghe wrote:
> 
> patternStream



Re: Parallel CEP

2019-01-23 Thread Dian Fu
Hi Dhanuka,

In order to make the CEP operator run in parallel, the input stream should be a
KeyedStream. You can refer to [1] for detailed information.

Regards,
Dian

[1]: 
https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
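
For illustration, a minimal sketch of running CEP on a keyed stream (the Event
type, key field, and pattern are placeholders, not taken from the job discussed
in this thread):

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

import java.util.List;
import java.util.Map;

public class KeyedCepSketch {

    // Placeholder event type.
    public static class Event {
        public String triggerId;
        public String type;
    }

    public static DataStream<String> detect(DataStream<Event> events) {
        // Key the stream first: CEP then keeps one NFA per key, events with the
        // same key stay together, and the operator can run with parallelism > 1.
        KeyedStream<Event, String> keyed = events.keyBy(new KeySelector<Event, String>() {
            @Override
            public String getKey(Event e) {
                return e.triggerId;
            }
        });

        Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
            .where(new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event e) {
                    return "ALERT".equals(e.type);
                }
            });

        PatternStream<Event> patternStream = CEP.pattern(keyed, pattern);
        return patternStream.select(new PatternSelectFunction<Event, String>() {
            @Override
            public String select(Map<String, List<Event>> match) {
                return "matched for key " + match.get("start").get(0).triggerId;
            }
        });
    }
}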

> On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
> 
> Hi All,
> 
> Is there way to run CEP function parallel. Currently CEP run only sequentially
> 
> 
> .
> 
> Cheers,
> Dhanuka
> 
> -- 
> Nothing Impossible,Creativity is more important than knowledge.



Re: Flink CEP : Doesn't generate output

2019-01-23 Thread dhanuka ranasinghe
Thank you for the clarification.

On Thu, 24 Jan 2019, 12:44 Dian Fu wrote:

> Hi Dhanuka,
>
> From the code you shared, it seems that you're using event time. The
> processing of elements is triggered by watermark in event time and so you
> should define how to generate the watermark, i.e with
> DataStream.assignTimestampsAndWatermarks
>
> Regards,
> Dian
>
>
> On 23 Jan 2019, at 01:58, dhanuka ranasinghe wrote:
>
> patternStream
>
>
>


Re: Parallel CEP

2019-01-23 Thread dhanuka ranasinghe
Hi Dian,

I tried that, but then the Kafka producer only produces to a single partition and
only a single Flink host is working while the rest don't contribute to processing. I
will share the code and a screenshot.

Cheers
Dhanuka

On Thu, 24 Jan 2019, 12:31 Dian Fu wrote:

> Hi Dhanuka,
>
> In order to make the CEP operator to run parallel, the input stream should
> be KeyedStream. You can refer [1] for detailed information.
>
> Regards,
> Dian
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/cep.html#detecting-patterns
>
> > On 24 Jan 2019, at 10:18, dhanuka ranasinghe wrote:
> >
> > Hi All,
> >
> > Is there way to run CEP function parallel. Currently CEP run only
> sequentially
> >
> > 
> > .
> >
> > Cheers,
> > Dhanuka
> >
> > --
> > Nothing Impossible,Creativity is more important than knowledge.
>
>


[jira] [Created] (FLINK-11422) Prefer testing class to mock StreamTask in AbstractStreamOperatorTestHarness

2019-01-23 Thread TisonKun (JIRA)
TisonKun created FLINK-11422:


 Summary: Prefer testing class to mock StreamTask in 
AbstractStreamOperatorTestHarness
 Key: FLINK-11422
 URL: https://issues.apache.org/jira/browse/FLINK-11422
 Project: Flink
  Issue Type: Improvement
  Components: Tests
Affects Versions: 1.8.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.8.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-23 Thread Lavkesh Lahngir
Hi,
It's not fixed in the master. I compiled and ran it yesterday.
I am not sure if that is an issue or a design choice.

On Thu, Jan 24, 2019 at 11:38 AM Lavkesh Lahngir  wrote:

> Hello,
> I mentioned in the first email.
>
> Version: 1.6.2, Commit ID: 3456ad0
>
> On Thu, Jan 24, 2019 at 12:33 AM Chesnay Schepler 
> wrote:
>
>> I suggest that you first tell me which version you are using so that I
>> can a) reproduce the issue and b) check that this issue wasn't fixed in
>> master or a recent bugfix release.
>>
>> On 23.01.2019 17:16, Lavkesh Lahngir wrote:
>> > Actually, I realized my mistake that JarRunHandler is being used in the
>> > jar/run API call.
>> > And the changes are done in RestClusterClient.
>> > The problem I was facing was that It always gives me "The main method
>> > caused an error"
>> > without any more details.
>> > I am thinking when we throw ProgramInvocationException in
>> PackagedProgram.
>> > callMainMethod()
>> > we should add exceptionInMethod.getMessage() too.
>> >
>> > *---
>> >
>> a/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*
>> >
>> > *+++
>> >
>> b/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*
>> >
>> > @@ -543,7 +543,7 @@ public class PackagedProgram {
>> >
>> >  } else if (exceptionInMethod instanceof
>> > ProgramInvocationException) {
>> >
>> >  throw (ProgramInvocationException)
>> > exceptionInMethod;
>> >
>> >  } else {
>> >
>> > -   throw new
>> ProgramInvocationException("The
>> > main method caused an error.", exceptionInMethod);
>> >
>> > +   throw new
>> ProgramInvocationException("The
>> > main method caused an error.: " + exceptionInMethod.getMessage(),
>> > exceptionInMethod);
>> >
>> >  }
>> >
>> >  }
>> >
>> >  catch (Throwable t) {
>> >
>> > What will you suggest?
>> >
>> > On Wed, Jan 23, 2019 at 7:01 PM Chesnay Schepler 
>> wrote:
>> >
>> >> Which version are you using?
>> >>
>> >> On 23.01.2019 08:00, Lavkesh Lahngir wrote:
>> >>> Or maybe I am missing something? It looks like the JIRA is trying to
>> >> solve
>> >>> the same issues I stated.
>> >>> In the main method, I just threw a simple new Exception("Some
>> message")
>> >> and
>> >>> I got the response I mentioned from the rest API.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir 
>> >> wrote:
>>  Hello,
>>  The change in FLINK-10312
>>   makes REST
>> response
>>  of the API
>>  <
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run
>> >
>> >> not
>>  very informative. It strips the stack trace and returns a generic
>> >> message.
>>  People using flink-cluster deployment who do not have access to job
>> >> manager
>>  logs, will not be able to figure out the root cause.
>>  In the case of when the job submission fails,
>>  In 1.6.2, I get
>>  {
>>    "errors": [
>> 
>>  "org.apache.flink.client.program.ProgramInvocationException:
>> >> The
>>  main method caused an error."
>>    ]
>>  }
>> 
>>  Is there a plan to improve error messages sent to the client?
>>  Is somebody working on this already?
>> 
>>  Thanks in advance.
>>  ~Lavkesh
>> 
>> >>
>>
>>


Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-23 Thread Lavkesh Lahngir
Hello,
I mentioned in the first email.

Version: 1.6.2, Commit ID: 3456ad0

On Thu, Jan 24, 2019 at 12:33 AM Chesnay Schepler 
wrote:

> I suggest that you first tell me which version you are using so that I
> can a) reproduce the issue and b) check that this issue wasn't fixed in
> master or a recent bugfix release.
>
> On 23.01.2019 17:16, Lavkesh Lahngir wrote:
> > Actually, I realized my mistake that JarRunHandler is being used in the
> > jar/run API call.
> > And the changes are done in RestClusterClient.
> > The problem I was facing was that It always gives me "The main method
> > caused an error"
> > without any more details.
> > I am thinking when we throw ProgramInvocationException in
> PackagedProgram.
> > callMainMethod()
> > we should add exceptionInMethod.getMessage() too.
> >
> > *---
> >
> a/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*
> >
> > *+++
> >
> b/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*
> >
> > @@ -543,7 +543,7 @@ public class PackagedProgram {
> >
> >  } else if (exceptionInMethod instanceof
> > ProgramInvocationException) {
> >
> >  throw (ProgramInvocationException)
> > exceptionInMethod;
> >
> >  } else {
> >
> > -   throw new ProgramInvocationException("The
> > main method caused an error.", exceptionInMethod);
> >
> > +   throw new ProgramInvocationException("The
> > main method caused an error.: " + exceptionInMethod.getMessage(),
> > exceptionInMethod);
> >
> >  }
> >
> >  }
> >
> >  catch (Throwable t) {
> >
> > What will you suggest?
> >
> > On Wed, Jan 23, 2019 at 7:01 PM Chesnay Schepler 
> wrote:
> >
> >> Which version are you using?
> >>
> >> On 23.01.2019 08:00, Lavkesh Lahngir wrote:
> >>> Or maybe I am missing something? It looks like the JIRA is trying to
> >> solve
> >>> the same issues I stated.
> >>> In the main method, I just threw a simple new Exception("Some message")
> >> and
> >>> I got the response I mentioned from the rest API.
> >>>
> >>> Thanks.
> >>>
> >>> On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir 
> >> wrote:
>  Hello,
>  The change in FLINK-10312
>   makes REST
> response
>  of the API
>  <
> >>
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run
> >
> >> not
>  very informative. It strips the stack trace and returns a generic
> >> message.
>  People using flink-cluster deployment who do not have access to job
> >> manager
>  logs, will not be able to figure out the root cause.
>  In the case of when the job submission fails,
>  In 1.6.2, I get
>  {
>    "errors": [
>    "org.apache.flink.client.program.ProgramInvocationException:
> >> The
>  main method caused an error."
>    ]
>  }
> 
>  Is there a plan to improve error messages sent to the client?
>  Is somebody working on this already?
> 
>  Thanks in advance.
>  ~Lavkesh
> 
> >>
>
>


[jira] [Created] (FLINK-11421) Providing more compilation options for code-generated operators

2019-01-23 Thread Liya Fan (JIRA)
Liya Fan created FLINK-11421:


 Summary: Providing more compilation options for code-generated 
operators
 Key: FLINK-11421
 URL: https://issues.apache.org/jira/browse/FLINK-11421
 Project: Flink
  Issue Type: New Feature
  Components: Core
Reporter: Liya Fan
Assignee: Liya Fan


Flink supports some operators (like Calc, Hash Agg, Hash Join, etc.) by code
generation. That is, Flink generates their source code dynamically and then
compiles it into Java byte code, which is loaded and executed at runtime.

 

By default, Flink compiles the generated source code with Janino. This is fast,
as the compilation often finishes in hundreds of milliseconds. The generated
Java byte code, however, is of poor quality. To illustrate, we used the Java
Compiler API (JCA) to compile the generated code. Experiments on TPC-H (1 TB)
queries show that the E2E time can be more than 10% shorter when operators are
compiled by JCA, even though it takes more time (a few seconds) to compile
with JCA.
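
For reference, a minimal sketch of compiling and loading a (normally code-generated) class with the Java Compiler API from javax.tools; this is plain JDK usage, not the actual Flink/Blink integration proposed here:

{code:java}
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class JcaCompileSketch {
    public static void main(String[] args) throws Exception {
        // Write a (normally code-generated) operator source to a temp directory.
        Path dir = Files.createTempDirectory("codegen");
        Path src = dir.resolve("GeneratedCalc.java");
        Files.write(src, (
            "public class GeneratedCalc {\n" +
            "  public static long apply(long x) { return x * 2 + 1; }\n" +
            "}\n").getBytes("UTF-8"));

        // Compile with the full JDK compiler (JCA) instead of Janino:
        // compilation is slower, but the resulting byte code is typically better.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        int result = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        if (result != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load and invoke the freshly compiled class.
        try (URLClassLoader cl = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            Class<?> clazz = Class.forName("GeneratedCalc", true, cl);
            Object out = clazz.getMethod("apply", long.class).invoke(null, 20L);
            System.out.println(out); // 41
        }
    }
}
{code}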

 

Therefore, we believe it is beneficial to compile generated code with JCA in the
following scenarios: 1) For batch jobs, the E2E time is relatively long, so it
is worth spending more time compiling and generating high-quality Java byte
code. 2) For repeated stream jobs, the generated code will be compiled once and
run many times. Therefore, it pays to spend more time compiling the first
time and benefit from the higher byte code quality in later runs.

 

According to the above observations, we want to provide a compilation option 
(Janino, JCA, or dynamic) for Flink, so that the user can choose the one 
suitable for their specific scenario and obtain better performance whenever 
possible.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Parallel CEP

2019-01-23 Thread dhanuka ranasinghe
Hi All,

Is there a way to run the CEP function in parallel? Currently CEP runs only
sequentially.

[image: flink-CEP.png]
.

Cheers,
Dhanuka

-- 
Nothing Impossible,Creativity is more important than knowledge.


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Becket Qin
Thanks Stephan,

The plan makes sense to me.

Regarding the docs, it seems better to have a separate versioned website
because there are a lot of changes spread all over the place. We can add the
banner to remind users that they are looking at the Blink docs, which are
temporary and will eventually be merged into Flink master. (The banner is
pretty similar to what users will see when they visit the docs of old Flink
versions [1]).

Thanks,

Jiangjie (Becket) Qin

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/libs/ml/quickstart.html

On Thu, Jan 24, 2019 at 6:21 AM Shaoxuan Wang  wrote:

> Thanks Stephan,
> The entire plan looks good to me. WRT the "Docs for Flink", a subsection
> should be good enough if we just introduce the outlines of what blink has
> changed. However, we have made detailed introductions to blink based on the
> framework of current release document of Flink (those introductions are
> distributed in each subsections). Does it make sense to create a blink
> document as a separate one, under the documentation section, say blink-1.5
> (temporary, not a release).
>
> Regards,
> Shaoxuan
>
>
> On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen  wrote:
>
> > Nice to see this lively discussion.
> >
> > *--- Branch Versus Repository ---*
> >
> > Looks like this is converging towards pushing a branch.
> > How about naming the branch simply "blink-1.5" ? That would be in line
> with
> > the 1.5 version branch of Flink, which is simply called "release-1.5" ?
> >
> > *--- SGA --- *
> >
> > The SGA (Software Grant Agreement) should be either filed already or in
> the
> > process of filing.
> >
> > *--- Offering Jars for Blink ---*
> >
> > As Chesnay and Timo mentioned, we cannot easily offer a "Release" of
> Blink
> > (source or binary), because that would require a thorough
> > checking of licenses and creating/ bundling license files. That is a lot
> of
> > work, as we recently experienced again in the Flink master.
> >
> > What we can do is upload compiled jar files and link to them somewhere in
> > the blink docs. We need to add a disclaimer that these are
> > convenience jars, and not an official Apache release. I hope that would
> > work for the users that are curious to try things out.
> >
> > *--- Docs for Blink --- *
> >
> > Do we need a versioned website here? If not, can we simply make this a
> > subsection of the current Flink snapshot docs?
> > Next to "Flink Development" and "Internals", we could have a section on
> > "Blink branch".
> > I think it is crucial, thought, to make it clear that this is temporary
> and
> > will eventually be subsumed by the main release, just
> > so that users do not get confused.
> >
> > Best,
> > Stephan
> >
> >
> > On Wed, Jan 23, 2019 at 12:23 PM Becket Qin 
> wrote:
> >
> > > Really excited to see Blink joining the Flink community!
> > >
> > > My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
> > > Among many things, what's most important at this point is probably to
> > make
> > > Blink code available to the developers so people can discuss the merge
> > > strategy. Creating a branch is probably the one of the fastest way to
> do
> > > that. We can always create separate repo later if necessary.
> > >
> > > WRT the doc and jar distribution, It is true that we are going to have
> > > some major refactoring to the code. But I can imagine some curious
> users
> > > may still want to try out something in Blink and it would be good if we
> > can
> > > do them a favor. Legal wise, my hunch is that it is probably OK for
> > someone
> > > to just build the jars and docs, host it somewhere for convenience. But
> > it
> > > should be clear that this is just for convenience purpose instead of an
> > > official release form Apache (unless we would like to make it
> official).
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler 
> > > wrote:
> > >
> > >>  From the ASF side Jar files do notrequire a vote/release process,
> this
> > >> is at the discretion of the PMC.
> > >>
> > >> However, I have my doubts whether at this time we could even create a
> > >> source release of Blink given that we'd have to vet the code-base
> first.
> > >>
> > >> Even without source release we could still distribute jars, but would
> > >> not be allowed to advertise them to users as they do not constitute an
> > >> official release.
> > >>
> > >> On 23.01.2019 11:41, Timo Walther wrote:
> > >> > As far as I know it, we will not provide any binaries but only the
> > >> > source code. JAR files on Apache servers would need an official
> > >> > voting/release process. Interested users can build Blink themselves
> > >> > using `mvn clean package`.
> > >> >
> > >> > @Stephan: Please correct me if I'm wrong.
> > >> >
> > >> > Regards,
> > >> > Timo
> > >> >
> > >> >> On 23.01.19 at 11:16, Kurt Young wrote:
> > 

[jira] [Created] (FLINK-11420) Serialization of case classes containing a Map[String, Any] sometimes throws ArrayIndexOutOfBounds

2019-01-23 Thread JIRA
Jürgen Kreileder created FLINK-11420:


 Summary: Serialization of case classes containing a Map[String, 
Any] sometimes throws ArrayIndexOutOfBounds
 Key: FLINK-11420
 URL: https://issues.apache.org/jira/browse/FLINK-11420
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Affects Versions: 1.7.1
Reporter: Jürgen Kreileder


We frequently run into random ArrayIndexOutOfBounds exceptions when flink tries 
to serialize Scala case classes containing a Map[String, Any] (Any being 
String, Long, Int, or Boolean) with the FsStateBackend. (This probably happens 
with any case class containing a type requiring Kryo, see this thread for 
instance: 
[http://mail-archives.apache.org/mod_mbox/flink-user/201710.mbox/%3cCANNGFpjX4gjV=df6tlfeojsb_rhwxs_ruoylqcqv2gvwqtt...@mail.gmail.com%3e])

Disabling asynchronous snapshots seems to work around the problem, so maybe 
something is not thread-safe in CaseClassSerializer.
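
For reference, a minimal sketch of the workaround mentioned above, i.e. switching the FsStateBackend to synchronous snapshots (the checkpoint URI is a placeholder; Java API shown):

{code:java}
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SyncSnapshotWorkaround {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Second constructor argument disables asynchronous snapshots, which avoids
        // the concurrent copy during checkpoints described in this ticket.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints", false));
        env.enableCheckpointing(60_000);
        // ... define and execute the job as usual
    }
}
{code}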

Our objects look like this:
{code}
case class Event(timestamp: Long, [...], content: Map[String, Any])
case class EnrichedEvent(event: Event, additionalInfo: Map[String, Any])
{code}
I've looked at a few of the exceptions in a debugger. It always happens when 
serializing the right-hand side of a tuple from EnrichedEvent -> Event -> content,
e.g. 13 from ("foo", 13) or false from ("bar", false).

Stacktrace:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 0
 at com.esotericsoftware.kryo.util.IntArray.pop(IntArray.java:157)
 at com.esotericsoftware.kryo.Kryo.reference(Kryo.java:822)
 at com.esotericsoftware.kryo.Kryo.copy(Kryo.java:863)
 at 
org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.copy(KryoSerializer.java:217)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32)
 at 
org.apache.flink.api.scala.typeutils.TraversableSerializer.$anonfun$copy$1(TraversableSerializer.scala:69)
 at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:234)
 at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:465)
 at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:465)
 at 
org.apache.flink.api.scala.typeutils.TraversableSerializer.copy(TraversableSerializer.scala:69)
 at 
org.apache.flink.api.scala.typeutils.TraversableSerializer.copy(TraversableSerializer.scala:33)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:101)
 at 
org.apache.flink.api.scala.typeutils.CaseClassSerializer.copy(CaseClassSerializer.scala:32)
 at 
org.apache.flink.api.common.typeutils.base.ListSerializer.copy(ListSerializer.java:99)
 at 
org.apache.flink.api.common.typeutils.base.ListSerializer.copy(ListSerializer.java:42)
 at 
org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:287)
 at 
org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:311)
 at org.apache.flink.runtime.state.heap.HeapListState.add(HeapListState.java:95)
 at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:391)
 at 
org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
 at 
org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
 at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
 at java.base/java.lang.Thread.run(Thread.java:834){code}
 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Shaoxuan Wang
Thanks Stephan,
The entire plan looks good to me. WRT the "Docs for Blink", a subsection
should be good enough if we just introduce the outlines of what Blink has
changed. However, we have made detailed introductions to Blink based on the
framework of the current release documentation of Flink (those introductions are
distributed across the subsections). Does it make sense to create a Blink
document as a separate one, under the documentation section, say blink-1.5
(temporary, not a release)?

Regards,
Shaoxuan


On Wed, Jan 23, 2019 at 10:15 PM Stephan Ewen  wrote:

> Nice to see this lively discussion.
>
> *--- Branch Versus Repository ---*
>
> Looks like this is converging towards pushing a branch.
> How about naming the branch simply "blink-1.5" ? That would be in line with
> the 1.5 version branch of Flink, which is simply called "release-1.5" ?
>
> *--- SGA --- *
>
> The SGA (Software Grant Agreement) should be either filed already or in the
> process of filing.
>
> *--- Offering Jars for Blink ---*
>
> As Chesnay and Timo mentioned, we cannot easily offer a "Release" of Blink
> (source or binary), because that would require a thorough
> checking of licenses and creating/ bundling license files. That is a lot of
> work, as we recently experienced again in the Flink master.
>
> What we can do is upload compiled jar files and link to them somewhere in
> the blink docs. We need to add a disclaimer that these are
> convenience jars, and not an official Apache release. I hope that would
> work for the users that are curious to try things out.
>
> *--- Docs for Blink --- *
>
> Do we need a versioned website here? If not, can we simply make this a
> subsection of the current Flink snapshot docs?
> Next to "Flink Development" and "Internals", we could have a section on
> "Blink branch".
> I think it is crucial, thought, to make it clear that this is temporary and
> will eventually be subsumed by the main release, just
> so that users do not get confused.
>
> Best,
> Stephan
>
>
> On Wed, Jan 23, 2019 at 12:23 PM Becket Qin  wrote:
>
> > Really excited to see Blink joining the Flink community!
> >
> > My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
> > Among many things, what's most important at this point is probably to
> make
> > Blink code available to the developers so people can discuss the merge
> > strategy. Creating a branch is probably the one of the fastest way to do
> > that. We can always create separate repo later if necessary.
> >
> > WRT the doc and jar distribution, It is true that we are going to have
> > some major refactoring to the code. But I can imagine some curious users
> > may still want to try out something in Blink and it would be good if we
> can
> > do them a favor. Legal wise, my hunch is that it is probably OK for
> someone
> > to just build the jars and docs, host it somewhere for convenience. But
> it
> > should be clear that this is just for convenience purpose instead of an
> > official release form Apache (unless we would like to make it official).
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler 
> > wrote:
> >
> >>  From the ASF side Jar files do notrequire a vote/release process, this
> >> is at the discretion of the PMC.
> >>
> >> However, I have my doubts whether at this time we could even create a
> >> source release of Blink given that we'd have to vet the code-base first.
> >>
> >> Even without source release we could still distribute jars, but would
> >> not be allowed to advertise them to users as they do not constitute an
> >> official release.
> >>
> >> On 23.01.2019 11:41, Timo Walther wrote:
> >> > As far as I know it, we will not provide any binaries but only the
> >> > source code. JAR files on Apache servers would need an official
> >> > voting/release process. Interested users can build Blink themselves
> >> > using `mvn clean package`.
> >> >
> >> > @Stephan: Please correct me if I'm wrong.
> >> >
> >> > Regards,
> >> > Timo
> >> >
> >> >> On 23.01.19 at 11:16, Kurt Young wrote:
> >> >> Hi Timo,
> >> >>
> >> >> What about the jar files, will blink's jar be uploaded to apache
> >> >> repository? If not, i think it will be very inconvenient for users
> who
> >> >> wants to try blink and view the documents if they need some help from
> >> >> doc.
> >> >>
> >> >> Best,
> >> >> Kurt
> >> >>
> >> >>
> >> >> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther 
> >> wrote:
> >> >>
> >> >>> Hi Kurt,
> >> >>>
> >> >>> I would not make the Blink's documentation visible to users or
> search
> >> >>> engines via a website. Otherwise this would communicate that Blink
> >> >>> is an
> >> >>> official release. I would suggest to put the Blink docs into `/docs`
> >> >>> and
> >> >>> people can build it with `./docs/build.sh -pi` if there are
> >> interested.
> >> >>> I would not invest time into setting up a docs infrastructure.
> >> >>>
> >> >>> Regards,
> >> >>> Timo
> >> >>>
> >> >>> On 23.01.19 at 08:56 

Re: Request multiple subpartitions of one partition

2019-01-23 Thread Chris Miller
 

Hi Zhijiang,

thank you for your reply. I was playing around a little in the last
days and ended up with a solution where I change the ResultPartitionView's
subpartitionIndex as soon as it returns an EndOfPartition event. This
way I can, sequentially, receive multiple subpartitions at one single
InputChannel.

Now, receiving the subpartitions sequentially has very poor
performance. I would like to receive them concurrently.

As far as I found out, we have to distinguish between data sent over
LocalInputChannels and RemoteInputChannels (via Netty).

For remote transportation I figured out that the PartitionRequestQueue
on the sender side has to be the place to tweak. So first of all I
suppressed the processing of EndOfPartition events at the
SingleInputGate and triggered it all together in the end. Then I tried
to build a new Reader within PartitionRequestQueue and request a
subpartition by using an existing reader: 

public void addParallelReader(NetworkSequenceViewReader oldReader, int
newIndex){
 NetworkSequenceViewReader newReader = new
SequenceNumberingViewReader(oldReader.getReceiverId(), this);

 //send request
 newReader.requestSubpartitionView(
 oldReader.getPartitionProvider(),
 oldReader.getResultPartitionId(),
 newIndex);

 //notify and add
 notifyReaderCreated(newReader);

 registerAvailableReader(newReader);
 availableReaders.add(newReader);
}

But unfortunately, the data was not received... 

The second place where a new subpartitionIndex has to be requested is in
the LocalInputChannel. Since the ResultSubpartitionViews are not wrapped
in Readers here, I wanted to handle multiple ResultSubpartitionViews
here. But it seemed to me that the LocalInputChannel is too firmly
intertwined with the View.

Do you have an idea for another approach, or would you say I'm on the
right track? 

Thank you. 

Chris 
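
For reference, the custom-partitioner idea from the quoted reply below can be
expressed with the public DataSet API; a minimal sketch with placeholder types
and routing rule (it only controls which subpartition records are written to,
not which subpartitions a task reads):

import org.apache.flink.api.common.functions.Partitioner;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CustomPartitionerSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> input = env.fromElements(
            Tuple2.of(0, "a"), Tuple2.of(1, "b"), Tuple2.of(4, "c"));

        // Route what would normally go to subpartition 1 into subpartition 0,
        // so the worker consuming subpartition 0 effectively sees both key groups.
        DataSet<Tuple2<Integer, String>> repartitioned = input
            .partitionCustom(new Partitioner<Integer>() {
                @Override
                public int partition(Integer key, int numPartitions) {
                    int target = key % numPartitions;
                    return target == 1 ? 0 : target;
                }
            }, 0)  // partition by tuple field 0
            .setParallelism(5);

        repartitioned.print();
    }
}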

On 2019/01/10 04:11:56, "zhijiang"  wrote:

> Hi Chris,>
> 
> I think your requirement seems like this:>
> 1. Determine the number of logic output partitions on upstream side.>
> 2. Determine the number of logic input channels on downstream side.>
> 3. Determine which input channel consumes corresponding output partition.>
> 
> I remembered Tez has similar mechanism. In flink the number of 
> partitions/channels and mapping relationship are determined by parallelism 
> and ship/partitioner strategy during graph generation. So Currently I think 
> it has no way to change from this aspect. But it might realize your 
> requirement from another point in flink. You can change the data distribution 
> in subpartitions. That means the previous data in sp1 can always be emitted 
> into sp0, so the worker0 which consumes sp0 can get data from both. 
> Considering implementation, you might need implement a custom partitioner 
> which controls the data distribution in all subpartitions as you required.>
> 
> Best,>
> Zhijiang>
> -->
> From:Chris Miller >
> Send Time:2019年1月9日(星期三) 20:46>
> To:dev >
> Subject:Request multiple subpartitions of one partition>
> 
> Hello, >
> 
> let's image we do a hash join of two DataSources. For the join operation>
> we choose parallelism=5. >
> 
> This way Flink uses 5 TaskManagerRunners to do the join job. In>
> particular, the DataSource tasks, EACH ARE CREATING 5 SUBPARTITIONS.>
> Every worker, now requests ONE SUBPARTITION from both DataSources. (eg.>
> worker 0 requests subpartition 0, worker 1 requests subpartition 1, ...>
> worker 4 requests subpartition 4) >
> 
> _For the case that I'm wrong until here - please correct me._ >
> 
> Now - for my special Usecase - I would like worker 0 to not only request>
> subpartition 0 BUT ALSO REQUEST SUBPARTITION 1. (sure, I also have to>
> stop worker 1 requesting subpartition 1)>
> The problem is, that I cannot just trigger requestSubpartition() in the>
> InputChannel again with another index, because the channel itself has to>
> be created first. >
> 
> Can anyone help me finding the best position to do the changes? >
> 
> Thanks. >
> 
> Chris >


Regards,
Benjamin 

 


[jira] [Created] (FLINK-11419) StreamingFileSink fails to recover after taskmanager failure

2019-01-23 Thread Edward Rojas (JIRA)
Edward Rojas created FLINK-11419:


 Summary: StreamingFileSink fails to recover after taskmanager 
failure
 Key: FLINK-11419
 URL: https://issues.apache.org/jira/browse/FLINK-11419
 Project: Flink
  Issue Type: Bug
  Components: filesystem-connector
Affects Versions: 1.7.1
Reporter: Edward Rojas


If a job with a StreamingFileSink sending data to HDFS is running in a cluster
with multiple taskmanagers and the taskmanager executing the job goes down
(for some reason), recovery fails with "missing data in tmp file" because it's not
able to perform a truncate on the file.

 Here the full stack trace:
{code:java}
java.io.IOException: Missing data in tmp file: 
hdfs://path/to/hdfs/2019-01-20/.part-0-0.inprogress.823f9c20-3594-4fe3-ae8c-f57b6c35e191
at 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.(HadoopRecoverableFsDataOutputStream.java:93)
at 
org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.recover(HadoopRecoverableWriter.java:72)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restoreInProgressFile(Bucket.java:140)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.(Bucket.java:127)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:396)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149)
at 
org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:334)
at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at 
org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at 
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:278)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
at java.lang.Thread.run(Thread.java:745)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
 Failed to TRUNCATE_FILE 
/path/to/hdfs/2019-01-20/.part-0-0.inprogress.823f9c20-3594-4fe3-ae8c-f57b6c35e191
 for DFSClient_NONMAPREDUCE_-2103482360_62 on x.xxx.xx.xx because this file 
lease is currently owned by DFSClient_NONMAPREDUCE_1834204750_59 on x.xx.xx.xx
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:3190)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInternal(FSNamesystem.java:2282)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncateInt(FSNamesystem.java:2228)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(FSNamesystem.java:2198)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.truncate(NameNodeRpcServer.java:1056)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.truncate(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at 

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Thomas Weise
+1 for trimming the size by default and offering the fat distribution as an
alternative download.


On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann  wrote:

> Ufuk's proposal (having a lean default release and a user convenience
> tarball) sounds good to me. That way advanced users won't be bothered by an
> unnecessarily large release and new users can benefit from having many
> useful extensions bundled in one tarball.
>
> Cheers,
> Till
>
> On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi  wrote:
>
> > On Wed, Jan 23, 2019 at 11:01 AM Timo Walther 
> wrote:
> > > I think what is more important than a big dist bundle is a helpful
> > > "Downloads" page where users can easily find available filesystems,
> > > connectors, metric repoters. Not everyone checks Maven central for
> > > available JAR files. I just saw that we added a "Optional components"
> > > section recently [1], we just need to make it more prominent. This is
> > > also done for the SQL connectors and formats [2].
> >
> > +1 I fully agree with the importance of the Downloads page. We
> > definitely need to make any optional dependencies that users need to
> > download easy to find.
> >
>


Re: [DISCUSS] A strategy for merging the Blink enhancements

2019-01-23 Thread Till Rohrmann
+1 for Stephan's merge proposal. I think it makes sense to pause the
development of the Table API for a short time in order to be able to
quickly converge on a common API.

From my experience with the FLIP-6 refactoring it can be challenging to
catch up with a branch which is actively developed. The biggest danger is
to miss changes which are only ported to a single branch and to develop
features which are not compatible with the other branch. Limiting the
changes to critical fixes and paying attention to applying them also to the
other branch should help with this problem.

Cheers,
Till

On Wed, Jan 23, 2019 at 3:28 PM Stephan Ewen  wrote:

> I think that is a reasonable proposal. Bugs that are identified could be
> fixed in the blink branch, so that we merge the working code.
>
> New feature contributions to that branch would complicate the merge. I
> would try and rather focus on merging and let new contributions go to the
> master branch.
>
> On Tue, Jan 22, 2019 at 11:12 PM Zhang, Xuefu 
> wrote:
>
> > Hi Stephan,
> >
> > Thanks for bringing up the discussions. I'm +1 on the merging plan. One
> > question though: since the merge will not be completed for some time and
> > there are might be uses trying blink branch, what's the plan for the
> > development in the branch? Personally I think we may discourage big
> > contributions to the branch, which would further complicate the merge,
> > while we shouldn't stop critical fixes as well.
> >
> > What's your take on this?
> >
> > Thanks,
> > Xuefu
> >
> >
> > --
> > From:Stephan Ewen 
> > Sent At:2019 Jan. 22 (Tue.) 06:16
> > To:dev 
> > Subject:[DISCUSS] A strategy for merging the Blink enhancements
> >
> > Dear Flink community!
> >
> > As a follow-up to the thread announcing Alibaba's offer to contribute the
> > Blink code [1]
> > <
> >
> https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
> > >
> > ,
> > here are some thoughts on how this contribution could be merged.
> >
> > As described in the announcement thread, it is a big contribution, and we
> > need to
> > carefully plan how to handle the contribution. We would like to get the
> > improvements to Flink,
> > while making it as non-disruptive as possible for the community.
> > I hope that this plan gives the community get a better understanding of
> > what the
> > proposed contribution would mean.
> >
> > Here is an initial rough proposal, with thoughts from
> > Timo, Piotr, Dawid, Kurt, Shaoxuan, Jincheng, Jark, Aljoscha, Fabian,
> > Xiaowei:
> >
> >   - It is obviously very hard to merge all changes in a quick move,
> because
> > we
> > are talking about multiple 100k lines of code.
> >
> >   - As much as possible, we want to maintain compatibility with the
> current
> > Table API,
> > so that this becomes a transparent change for most users.
> >
> >   - The two areas with the most changes we identified were
> >  (1) The SQL/Table query processor
> >  (2) The batch scheduling/failover/shuffle
> >
> >   - For the query processor part, this is what we found and propose:
> >
> > -> The Blink and Flink code have the same semantics (ANSI SQL) except
> > for minor
> >aspects (under discussion). Blink also covers more SQL operations.
> >
> > -> The Blink code is quite different from the current Flink SQL
> > runtime.
> >Merging as changes seems hardly feasible. From the current
> > evaluation, the
> >Blink query processor uses the more advanced architecture, so it
> > would make
> >sense to converge to that design.
> >
> > -> We propose to gradually build up the Blink-based query processor
> as
> > a second
> >query processor under the SQL/Table API. Think of it as two
> > different runners
> >for the Table API.
> >As the new query processor becomes fully merged and stable, we can
> > deprecate and
> >eventually remove the existing query processor. That should give
> the
> > least
> >disruption to Flink users and allow for gradual merge/development.
> >
> > -> Some refactoring of the Table API is necessary to support the
> above
> > strategy.
> >Most of the prerequisite refactoring is around splitting the
> project
> > into
> >different modules, following a similar idea as FLIP-28 [2]
> > <
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
> > >
> > .
> >
> > -> A more detailed proposal is being worked on.
> >
> > -> Same as FLIP-28, this approach would probably need to suspend
> Table
> > API
> >contributions for a short while. We hope that this can be a very
> > short period,
> >to not impact the very active development in Flink on Table
> API/SQL
> > too much.
> >
> >   - For the batch scheduling and failover enhancements, we should be able
> > to build
> > 

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Till Rohrmann
Ufuk's proposal (having a lean default release and a user convenience
tarball) sounds good to me. That way advanced users won't be bothered by an
unnecessarily large release and new users can benefit from having many
useful extensions bundled in one tarball.

Cheers,
Till

On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi  wrote:

> On Wed, Jan 23, 2019 at 11:01 AM Timo Walther  wrote:
> > I think what is more important than a big dist bundle is a helpful
> > "Downloads" page where users can easily find available filesystems,
> > connectors, metric repoters. Not everyone checks Maven central for
> > available JAR files. I just saw that we added a "Optional components"
> > section recently [1], we just need to make it more prominent. This is
> > also done for the SQL connectors and formats [2].
>
> +1 I fully agree with the importance of the Downloads page. We
> definitely need to make any optional dependencies that users need to
> download easy to find.
>


Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-23 Thread Chesnay Schepler
I suggest that you first tell me which version you are using so that I 
can a) reproduce the issue and b) check that this issue wasn't fixed in 
master or a recent bugfix release.


On 23.01.2019 17:16, Lavkesh Lahngir wrote:

Actually, I realized my mistake that JarRunHandler is being used in the
jar/run API call.
And the changes are done in RestClusterClient.
The problem I was facing was that It always gives me "The main method
caused an error"
without any more details.
I am thinking when we throw ProgramInvocationException in PackagedProgram.
callMainMethod()
we should add exceptionInMethod.getMessage() too.

*---
a/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*

*+++
b/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java*

@@ -543,7 +543,7 @@ public class PackagedProgram {

 } else if (exceptionInMethod instanceof
ProgramInvocationException) {

 throw (ProgramInvocationException)
exceptionInMethod;

 } else {

-   throw new ProgramInvocationException("The
main method caused an error.", exceptionInMethod);

+   throw new ProgramInvocationException("The
main method caused an error.: " + exceptionInMethod.getMessage(),
exceptionInMethod);

 }

 }

 catch (Throwable t) {

What will you suggest?

On Wed, Jan 23, 2019 at 7:01 PM Chesnay Schepler  wrote:


Which version are you using?

On 23.01.2019 08:00, Lavkesh Lahngir wrote:

Or maybe I am missing something? It looks like the JIRA is trying to

solve

the same issues I stated.
In the main method, I just threw a simple new Exception("Some message")

and

I got the response I mentioned from the rest API.

Thanks.

On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir 

wrote:

Hello,
The change in FLINK-10312
 makes REST response
of the API
<

https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run>
not

very informative. It strips the stack trace and returns a generic

message.

People using flink-cluster deployment who do not have access to job

manager

logs, will not be able to figure out the root cause.
In the case of when the job submission fails,
In 1.6.2, I get
{
  "errors": [
  "org.apache.flink.client.program.ProgramInvocationException:

The

main method caused an error."
  ]
}

Is there a plan to improve error messages sent to the client?
Is somebody working on this already?

Thanks in advance.
~Lavkesh







Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-23 Thread Lavkesh Lahngir
Actually, I realized my mistake that JarRunHandler is being used in the
jar/run API call.
And the changes are done in RestClusterClient.
The problem I was facing was that it always gives me "The main method
caused an error"
without any more details.
I am thinking that when we throw ProgramInvocationException in
PackagedProgram.callMainMethod()
we should add exceptionInMethod.getMessage() too.

--- a/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java
+++ b/flink-clients/src/main/java/org/apache/flink/client/program/PackagedProgram.java

@@ -543,7 +543,7 @@ public class PackagedProgram {

} else if (exceptionInMethod instanceof
ProgramInvocationException) {

throw (ProgramInvocationException)
exceptionInMethod;

} else {

-   throw new ProgramInvocationException("The
main method caused an error.", exceptionInMethod);

+   throw new ProgramInvocationException("The
main method caused an error.: " + exceptionInMethod.getMessage(),
exceptionInMethod);

}

}

catch (Throwable t) {

What will you suggest?

On Wed, Jan 23, 2019 at 7:01 PM Chesnay Schepler  wrote:

> Which version are you using?
>
> On 23.01.2019 08:00, Lavkesh Lahngir wrote:
> > Or maybe I am missing something? It looks like the JIRA is trying to
> solve
> > the same issues I stated.
> > In the main method, I just threw a simple new Exception("Some message")
> and
> > I got the response I mentioned from the rest API.
> >
> > Thanks.
> >
> > On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir 
> wrote:
> >
> >> Hello,
> >> The change in FLINK-10312 makes REST response
> >> of the API
> >> <https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run>
> >> not
> >> very informative. It strips the stack trace and returns a generic
> message.
> >> People using flink-cluster deployment who do not have access to job
> manager
> >> logs, will not be able to figure out the root cause.
> >> In the case of when the job submission fails,
> >> In 1.6.2, I get
> >> {
> >>  "errors": [
> >>  "org.apache.flink.client.program.ProgramInvocationException:
> The
> >> main method caused an error."
> >>  ]
> >> }
> >>
> >> Is there a plan to improve error messages sent to the client?
> >> Is somebody working on this already?
> >>
> >> Thanks in advance.
> >> ~Lavkesh
> >>
>
>


[jira] [Created] (FLINK-11418) Unable to build docs in Docker image

2019-01-23 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-11418:
--

 Summary: Unable to build docs in Docker image
 Key: FLINK-11418
 URL: https://issues.apache.org/jira/browse/FLINK-11418
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.8.0
Reporter: Robert Metzger


Running 
{code:java}
cd flink/docs/docker
./run.sh{code}
 

And then in the container
{code:java}
Welcome to Apache Flink docs
To build, execute
./build_docs.sh
To watch and regenerate automatically
./build_docs.sh -p
and access http://localhost:4000

bash-4.4$ ./build_docs.sh -p
Traceback (most recent call last):
2: from /usr/local/bin/bundle:23:in `'
1: from /usr/share/rubygems/rubygems.rb:308:in `activate_bin_path'
/usr/share/rubygems/rubygems.rb:289:in `find_spec_for_exe': can't find gem 
bundler (>= 0.a) with executable bundle (Gem::GemNotFoundException){code}
I believe there's something wrong.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
On Wed, Jan 23, 2019 at 11:01 AM Timo Walther  wrote:
> I think what is more important than a big dist bundle is a helpful
> "Downloads" page where users can easily find available filesystems,
> connectors, metric reporters. Not everyone checks Maven central for
> available JAR files. I just saw that we added an "Optional components"
> section recently [1], we just need to make it more prominent. This is
> also done for the SQL connectors and formats [2].

+1 I fully agree with the importance of the Downloads page. We
definitely need to make any optional dependencies that users need to
download easy to find.


Re: [DISCUSS] A strategy for merging the Blink enhancements

2019-01-23 Thread Stephan Ewen
I think that is a reasonable proposal. Bugs that are identified could be
fixed in the blink branch, so that we merge the working code.

New feature contributions to that branch would complicate the merge. I
would try and rather focus on merging and let new contributions go to the
master branch.

On Tue, Jan 22, 2019 at 11:12 PM Zhang, Xuefu 
wrote:

> Hi Stephan,
>
> Thanks for bringing up the discussions. I'm +1 on the merging plan. One
> question though: since the merge will not be completed for some time and
> there might be users trying the blink branch, what's the plan for the
> development in the branch? Personally I think we may discourage big
> contributions to the branch, which would further complicate the merge,
> while we shouldn't stop critical fixes either.
>
> What's your take on this?
>
> Thanks,
> Xuefu
>
>
> --
> From:Stephan Ewen 
> Sent At:2019 Jan. 22 (Tue.) 06:16
> To:dev 
> Subject:[DISCUSS] A strategy for merging the Blink enhancements
>
> Dear Flink community!
>
> As a follow-up to the thread announcing Alibaba's offer to contribute the
> Blink code [1]
> <
> https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
> >
> ,
> here are some thoughts on how this contribution could be merged.
>
> As described in the announcement thread, it is a big contribution, and we
> need to
> carefully plan how to handle the contribution. We would like to get the
> improvements to Flink,
> while making it as non-disruptive as possible for the community.
> I hope that this plan gives the community a better understanding of
> what the
> proposed contribution would mean.
>
> Here is an initial rough proposal, with thoughts from
> Timo, Piotr, Dawid, Kurt, Shaoxuan, Jincheng, Jark, Aljoscha, Fabian,
> Xiaowei:
>
>   - It is obviously very hard to merge all changes in a quick move, because
> we
> are talking about multiple 100k lines of code.
>
>   - As much as possible, we want to maintain compatibility with the current
> Table API,
> so that this becomes a transparent change for most users.
>
>   - The two areas with the most changes we identified were
>  (1) The SQL/Table query processor
>  (2) The batch scheduling/failover/shuffle
>
>   - For the query processor part, this is what we found and propose:
>
> -> The Blink and Flink code have the same semantics (ANSI SQL) except
> for minor
>aspects (under discussion). Blink also covers more SQL operations.
>
> -> The Blink code is quite different from the current Flink SQL
> runtime.
>Merging as changes seems hardly feasible. From the current
> evaluation, the
>Blink query processor uses the more advanced architecture, so it
> would make
>sense to converge to that design.
>
> -> We propose to gradually build up the Blink-based query processor as
> a second
>query processor under the SQL/Table API. Think of it as two
> different runners
>for the Table API.
>As the new query processor becomes fully merged and stable, we can
> deprecate and
>eventually remove the existing query processor. That should give the
> least
>disruption to Flink users and allow for gradual merge/development.
>
> -> Some refactoring of the Table API is necessary to support the above
> strategy.
>Most of the prerequisite refactoring is around splitting the project
> into
>different modules, following a similar idea as FLIP-28 [2]
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
> >
> .
>
> -> A more detailed proposal is being worked on.
>
> -> Same as FLIP-28, this approach would probably need to suspend Table
> API
>contributions for a short while. We hope that this can be a very
> short period,
>to not impact the very active development in Flink on Table API/SQL
> too much.
>
>   - For the batch scheduling and failover enhancements, we should be able
> to build
> on the currently ongoing refactoring of the scheduling logic [3]. That should
> make it easy to plug in a new scheduler and failover logic. We can port
> the Blink
> enhancements as a new scheduler / failover handler. We can later make
> it the
> default for bounded stream programs once the merge is completed and it
> is tested.
>
>   - For the catalog and source/sink design and interfaces, we would like to
> continue with the already started design discussion threads. Once these
> are
> converged, we might use some of the Blink code for the implementation,
> if it
> is close to the outcome of the design discussions.
>
> Best,
> Stephan
>
> [1]
>
> https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E
>
> [2]
>
> 

Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Stephan Ewen
Nice to see this lively discussion.

*--- Branch Versus Repository ---*

Looks like this is converging towards pushing a branch.
How about naming the branch simply "blink-1.5"? That would be in line with
the 1.5 version branch of Flink, which is simply called "release-1.5"?

*--- SGA --- *

The SGA (Software Grant Agreement) should be either filed already or in the
process of filing.

*--- Offering Jars for Blink ---*

As Chesnay and Timo mentioned, we cannot easily offer a "Release" of Blink
(source or binary), because that would require a thorough
checking of licenses and creating/bundling license files. That is a lot of
work, as we recently experienced again in the Flink master.

What we can do is upload compiled jar files and link to them somewhere in
the blink docs. We need to add a disclaimer that these are
convenience jars, and not an official Apache release. I hope that would
work for the users that are curious to try things out.

*--- Docs for Blink --- *

Do we need a versioned website here? If not, can we simply make this a
subsection of the current Flink snapshot docs?
Next to "Flink Development" and "Internals", we could have a section on
"Blink branch".
I think it is crucial, though, to make it clear that this is temporary and
will eventually be subsumed by the main release, just
so that users do not get confused.

Best,
Stephan


On Wed, Jan 23, 2019 at 12:23 PM Becket Qin  wrote:

> Really excited to see Blink joining the Flink community!
>
> My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
> Among many things, what's most important at this point is probably to make
> Blink code available to the developers so people can discuss the merge
> strategy. Creating a branch is probably one of the fastest ways to do
> that. We can always create separate repo later if necessary.
>
> WRT the doc and jar distribution, it is true that we are going to have
> some major refactoring to the code. But I can imagine some curious users
> may still want to try out something in Blink and it would be good if we can
> do them a favor. Legal wise, my hunch is that it is probably OK for someone
> to just build the jars and docs, host it somewhere for convenience. But it
> should be clear that this is just for convenience purposes instead of an
> official release from Apache (unless we would like to make it official).
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler 
> wrote:
>
>>  From the ASF side Jar files do not require a vote/release process, this
>> is at the discretion of the PMC.
>>
>> However, I have my doubts whether at this time we could even create a
>> source release of Blink given that we'd have to vet the code-base first.
>>
>> Even without source release we could still distribute jars, but would
>> not be allowed to advertise them to users as they do not constitute an
>> official release.
>>
>> On 23.01.2019 11:41, Timo Walther wrote:
>> > As far as I know it, we will not provide any binaries but only the
>> > source code. JAR files on Apache servers would need an official
>> > voting/release process. Interested users can build Blink themselves
>> > using `mvn clean package`.
>> >
>> > @Stephan: Please correct me if I'm wrong.
>> >
>> > Regards,
>> > Timo
>> >
>> > Am 23.01.19 um 11:16 schrieb Kurt Young:
>> >> Hi Timo,
>> >>
>> >> What about the jar files, will blink's jar be uploaded to apache
>> >> repository? If not, i think it will be very inconvenient for users who
>> >> wants to try blink and view the documents if they need some help from
>> >> doc.
>> >>
>> >> Best,
>> >> Kurt
>> >>
>> >>
>> >> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther 
>> wrote:
>> >>
>> >>> Hi Kurt,
>> >>>
>> >>> I would not make the Blink's documentation visible to users or search
>> >>> engines via a website. Otherwise this would communicate that Blink
>> >>> is an
>> >>> official release. I would suggest to put the Blink docs into `/docs`
>> >>> and
>> >>> people can build it with `./docs/build.sh -pi` if there are
>> interested.
>> >>> I would not invest time into setting up a docs infrastructure.
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>> Am 23.01.19 um 08:56 schrieb Kurt Young:
>>  Thanks @Stephan for this exciting announcement!
>> 
>>  >From my point of view, i would prefer to use branch. It makes the
>> >>> message
>>  "Blink is pat of Flink" more straightforward and clear.
>> 
>>  Except for the location of blink codes, there are some other
>> questions
>> >>> like
>>  what version should should use, and where do we put blink's
>> documents.
>>  Currently, we choose to use "1.5.1-blink-r0" as blink's version since
>> >>> blink
>>  forked from Flink's 1.5.1. We also added some docs to blink just as
>>  Flink
>>  did. Can blink use a website like
>>  "https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to
>> put
>> >>> all
>>  blink's docs, change it to something like
>>  

Re: [DISCUSS] Start new Review Process

2019-01-23 Thread Robert Metzger
I have the bot now running in https://github.com/flinkbot/test-repo/pulls
Feel free to play with it.

On Wed, Jan 23, 2019 at 10:25 AM Robert Metzger  wrote:

> Okay, cool! I'll let you know when the bot is ready in a test repo.
> While you (and others) are testing it, I'll open a PR for the docs.
>
> On Wed, Jan 23, 2019 at 10:15 AM Fabian Hueske  wrote:
>
>> Oh, that's great news!
>> In that case we can just close the PR and start with the bot right away.
>> I think it would be good to extend the PR Review guide [1] with a section
>> about the bot and how to use it.
>>
>> Fabian
>>
>> [1] https://flink.apache.org/reviewing-prs.html
>>
>> Am Mi., 23. Jan. 2019 um 10:03 Uhr schrieb Robert Metzger <
>> rmetz...@apache.org>:
>>
>> > Hey,
>> >
>> > as I've mentioned already in the pull request, I have started
>> implementing
>> > a little bot for GitHub that tracks the checklist [1]
>> > The bot is monitoring incoming pull requests. It creates a comment with
>> the
>> > checklist.
>> > Reviewers can write a message to the bot (such as "@flinkbot approve
>> > contribution"), then the bot will update the checklist comment.
>> >
>> > As an upcoming feature, I also plan to add a label to the pull request
>> when
>> > all the checklist conditions have been met.
>> >
>> > I hope to finish the bot today. After some initial testing, we can
>> deploy
>> > it with Flink (if there are no objections by the community).
>> >
>> >
>> > [1] https://github.com/rmetzger/flink-community-tools
>> >
>> >
>> > On Tue, Jan 22, 2019 at 3:48 PM Fabian Hueske 
>> wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > A few months ago the community discussed and agreed to improve the PR
>> > > review process [1-4].
>> > > The idea is to follow a checklist to avoid in-depth reviews on
>> > > contributions that might not be accepted for other reasons. Thereby,
>> > > reviewers and contributors do not spend their time on PRs that will
>> not
>> > be
>> > > merged.
>> > > The checklist consists of five points:
>> > >
>> > > 1. The contribution is well-described.
>> > > 2. There is consensus that the contribution should go into Flink.
>> > > 3. [Does not need specific attention | Needs specific attention for X
>> |
>> > Has
>> > > attention for X by Y]
>> > > 4. The architectural approach is sound.
>> > > 5. Overall code quality is good.
>> > >
>> > > Back then we added a review guide to the website [5] but did not put
>> the
>> > > new process in place yet. I would like to start this now.
>> > > There is a PR [6] that adds the review checklist to the PR template.
>> > > Committers who review a PR should follow the checklist and tick and
>> > sign
>> > > off the boxes by updating the PR description. For that committers
>> need to
>> > > be members of the ASF Github organization.
>> > >
>> > > If nobody has concerns, I'll merge the PR in a few days.
>> > > Once the PR is merged, the reviews of all new PRs should follow the
>> > > checklist.
>> > >
>> > > Best,
>> > > Fabian
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://lists.apache.org/thread.html/dcbe377eb477b531f49c462e90d8b1e50e0ff33c6efd296081c6934d@%3Cdev.flink.apache.org%3E
>> > > [2]
>> > >
>> > >
>> >
>> https://lists.apache.org/thread.html/172aa6d12ed442ea4da9ed2a72fe0894c9be7408fb2e1b7b50dfcb8c@%3Cdev.flink.apache.org%3E
>> > > [3]
>> > >
>> > >
>> >
>> https://lists.apache.org/thread.html/5e07c1be8078dd7b89d93c67b71defacff137f3df56ccf4adb04b4d7@%3Cdev.flink.apache.org%3E
>> > > [4]
>> > >
>> > >
>> >
>> https://lists.apache.org/thread.html/d7fd1fe45949f7c706142c62de85d246c7f6a1485a186fd3e9dced01@%3Cdev.flink.apache.org%3E
>> > > [5] https://flink.apache.org/reviewing-prs.html
>> > > [6] https://github.com/apache/flink/pull/6873
>> > >
>> >
>>
>


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Becket Qin
Really excited to see Blink joining the Flink community!

My two cents regarding repo v.s. branch, I am +1 for a branch in Flink.
Among many things, what's most important at this point is probably to make
Blink code available to the developers so people can discuss the merge
strategy. Creating a branch is probably one of the fastest ways to do
that. We can always create separate repo later if necessary.

WRT the doc and jar distribution, it is true that we are going to have some
major refactoring to the code. But I can imagine some curious users may
still want to try out something in Blink and it would be good if we can do
them a favor. Legal wise, my hunch is that it is probably OK for someone to
just build the jars and docs, host it somewhere for convenience. But it
should be clear that this is just for convenience purposes instead of an
official release from Apache (unless we would like to make it official).

Thanks,

Jiangjie (Becket) Qin

On Wed, Jan 23, 2019 at 6:48 PM Chesnay Schepler  wrote:

>  From the ASF side Jar files do not require a vote/release process, this
> is at the discretion of the PMC.
>
> However, I have my doubts whether at this time we could even create a
> source release of Blink given that we'd have to vet the code-base first.
>
> Even without source release we could still distribute jars, but would
> not be allowed to advertise them to users as they do not constitute an
> official release.
>
> On 23.01.2019 11:41, Timo Walther wrote:
> > As far as I know it, we will not provide any binaries but only the
> > source code. JAR files on Apache servers would need an official
> > voting/release process. Interested users can build Blink themselves
> > using `mvn clean package`.
> >
> > @Stephan: Please correct me if I'm wrong.
> >
> > Regards,
> > Timo
> >
> > Am 23.01.19 um 11:16 schrieb Kurt Young:
> >> Hi Timo,
> >>
> >> What about the jar files, will blink's jar be uploaded to apache
> >> repository? If not, i think it will be very inconvenient for users who
> >> wants to try blink and view the documents if they need some help from
> >> doc.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Wed, Jan 23, 2019 at 6:09 PM Timo Walther 
> wrote:
> >>
> >>> Hi Kurt,
> >>>
> >>> I would not make the Blink's documentation visible to users or search
> >>> engines via a website. Otherwise this would communicate that Blink
> >>> is an
> >>> official release. I would suggest to put the Blink docs into `/docs`
> >>> and
> >>> people can build it with `./docs/build.sh -pi` if they are interested.
> >>> I would not invest time into setting up a docs infrastructure.
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>> Am 23.01.19 um 08:56 schrieb Kurt Young:
>  Thanks @Stephan for this exciting announcement!
> 
>  >From my point of view, i would prefer to use branch. It makes the
> >>> message
>  "Blink is pat of Flink" more straightforward and clear.
> 
>  Except for the location of blink codes, there are some other questions
> >>> like
>  what version should should use, and where do we put blink's documents.
>  Currently, we choose to use "1.5.1-blink-r0" as blink's version since
> >>> blink
>  forked from Flink's 1.5.1. We also added some docs to blink just as
>  Flink
>  did. Can blink use a website like
>  "https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to put
> >>> all
>  blink's docs, change it to something like
>  https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?
> 
>  Best,
>  Kurt
> 
> 
>  On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng 
> >>> wrote:
> > Hi all,
> >
> > @Stephan  Thanks a lot for driving these efforts. I think a lot of
> >>> people
> > is already waiting for this.
> > +1 for opening the blink source code.
> > Both a separate repository or a special branch is ok for me.
> > Hopefully,
> > this will not last too long.
> >
> > Best, Hequn
> >
> >
> > On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:
> >
> >> Great news! Looking forward to the new wave of developments.
> >>
> >> If Blink needs to be continuously updated, fix bugs, release
> >> versions,
> >> maybe a separate repository is a better idea.
> >>
> >> Best,
> >> Jark
> >>
> >> On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński 
> >>> wrote:
> >>> Hey!
> >>> I also think that creating the separate branch for Blink in
> >>> Flink repo
> >> is a
> >>> better idea than creating the fork as IMHO it will allow merging
> > changes
> >>> more easily.
> >>>
> >>> Best Regards,
> >>> Dom.
> >>>
> >>> wt., 22 sty 2019 o 10:09 Ufuk Celebi  napisał(a):
> >>>
>  Hey Stephan and others,
> 
>  thanks for the summary. I'm very excited about the outlined
> >> improvements.
>  :-)
> 
>  Separate branch vs. fork: I'm fine with either of the 

Re: [Feature]Returning RuntimeException to REST client while job submission

2019-01-23 Thread Chesnay Schepler

Which version are you using?

On 23.01.2019 08:00, Lavkesh Lahngir wrote:

Or maybe I am missing something? It looks like the JIRA is trying to solve
the same issues I stated.
In the main method, I just threw a simple new Exception("Some message") and
I got the response I mentioned from the rest API.

Thanks.

On Wed, Jan 23, 2019 at 2:50 PM Lavkesh Lahngir  wrote:


Hello,
The change in FLINK-10312 makes REST response
of the API
<https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#jars-jarid-run>
not very informative. It strips the stack trace and returns a generic message.
People using flink-cluster deployment who do not have access to job manager
logs, will not be able to figure out the root cause.
In the case when the job submission fails,
In 1.6.2, I get
{
 "errors": [
 "org.apache.flink.client.program.ProgramInvocationException: The
main method caused an error."
 ]
}

Is there a plan to improve error messages sent to the client?
Is somebody working on this already?

Thanks in advance.
~Lavkesh





Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Chesnay Schepler
From the ASF side Jar files do not require a vote/release process, this
is at the discretion of the PMC.


However, I have my doubts whether at this time we could even create a 
source release of Blink given that we'd have to vet the code-base first.


Even without source release we could still distribute jars, but would 
not be allowed to advertise them to users as they do not constitute an 
official release.


On 23.01.2019 11:41, Timo Walther wrote:
As far as I know it, we will not provide any binaries but only the 
source code. JAR files on Apache servers would need an official 
voting/release process. Interested users can build Blink themselves 
using `mvn clean package`.


@Stephan: Please correct me if I'm wrong.

Regards,
Timo

Am 23.01.19 um 11:16 schrieb Kurt Young:

Hi Timo,

What about the jar files, will blink's jar be uploaded to apache
repository? If not, i think it will be very inconvenient for users who
wants to try blink and view the documents if they need some help from 
doc.


Best,
Kurt


On Wed, Jan 23, 2019 at 6:09 PM Timo Walther  wrote:


Hi Kurt,

I would not make the Blink's documentation visible to users or search
engines via a website. Otherwise this would communicate that Blink 
is an
official release. I would suggest to put the Blink docs into `/docs` 
and

people can build it with `./docs/build.sh -pi` if they are interested.
I would not invest time into setting up a docs infrastructure.

Regards,
Timo

Am 23.01.19 um 08:56 schrieb Kurt Young:

Thanks @Stephan for this exciting announcement!

>From my point of view, i would prefer to use branch. It makes the

message

"Blink is pat of Flink" more straightforward and clear.

Except for the location of blink codes, there are some other questions

like

what version blink should use, and where do we put blink's documents.
Currently, we choose to use "1.5.1-blink-r0" as blink's version since

blink
forked from Flink's 1.5.1. We also added some docs to blink just as 
Flink

did. Can blink use a website like
"https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to put

all

blink's docs, change it to something like
https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?

Best,
Kurt


On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng 

wrote:

Hi all,

@Stephan  Thanks a lot for driving these efforts. I think a lot of

people

is already waiting for this.
+1 for opening the blink source code.
Both a separate repository or a special branch is ok for me. 
Hopefully,

this will not last too long.

Best, Hequn


On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:


Great news! Looking forward to the new wave of developments.

If Blink needs to be continuously updated, fix bugs, release 
versions,

maybe a separate repository is a better idea.

Best,
Jark

On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński 

wrote:

Hey!
I also think that creating the separate branch for Blink in 
Flink repo

is a

better idea than creating the fork as IMHO it will allow merging

changes

more easily.

Best Regards,
Dom.

wt., 22 sty 2019 o 10:09 Ufuk Celebi  napisał(a):


Hey Stephan and others,

thanks for the summary. I'm very excited about the outlined

improvements.

:-)

Separate branch vs. fork: I'm fine with either of the suggestions.
Depending on the expected strategy for merging the changes, 
expected
number of additional changes, etc., either one or the other 
approach

might be better suited.

– Ufuk

On Tue, Jan 22, 2019 at 9:20 AM Kurt Young  
wrote:

Hi Driesprong,

Glad to hear that you're interested with blink's codes. Actually,

blink

only has one branch by itself, so either a separated repo or a

flink's

branch works for blink's code share.

Best,
Kurt


On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko


wrote:


Great news Stephan!

Why not make the code available by having a fork of Flink on

Alibaba's

Github account. This will allow us to do easy diff's in the

Github

UI

and

create PR's of cherry-picked commits if needed. I can imagine

that

the
Blink codebase has a lot of branches by itself, so just 
pushing a

couple of

branches to the main Flink repo is not ideal. Looking forward to

it!

Cheers, Fokko





Op di 22 jan. 2019 om 03:48 schreef Shaoxuan Wang <

wshaox...@gmail.com

:

big +1 to contribute Blink codebase directly into the Apache

Flink

project.

Looking forward to the new journey.

Regards,
Shaoxuan

On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <

xiaow...@gmail.com>

wrote:

   Thanks Stephan! We are hoping to make the process as

non-disruptive as

possible to the Flink community. Making the Blink codebase

public

is

the

first step that hopefully facilitates further discussions.
Xiaowei

  On Monday, January 21, 2019, 11:46:28 AM PST, Stephan

Ewen

<

se...@apache.org> wrote:

   Dear Flink Community!

Some of you may have heard it already from announcements or

from

a

Flink

Forward talk:
Alibaba has decided to open source its in-house improvements

to

Flink,

called Blink!
First of 

[jira] [Created] (FLINK-11417) Make access to ExecutionGraph single threaded from JobMaster main thread

2019-01-23 Thread Stefan Richter (JIRA)
Stefan Richter created FLINK-11417:
--

 Summary: Make access to ExecutionGraph single threaded from 
JobMaster main thread
 Key: FLINK-11417
 URL: https://issues.apache.org/jira/browse/FLINK-11417
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Reporter: Stefan Richter
Assignee: Stefan Richter


To simplify future developments in scheduling, we should re-design the 
threading model for interaction with the {{ExecutionGraph}} and its 
sub-components to a single threaded approach. In this model, expensive tasks 
will still be performed by background threads so that the JM main thread will 
never block. However, all resulting modifications and interaction with the EG 
itself should only go through the JM main thread.
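
As a rough illustration of the pattern described above (all names below are placeholders, not actual Flink code): expensive work runs on a background pool, and the resulting modification is handed back to a single-threaded main-thread executor.

{code:java}
// Placeholder sketch only, not actual Flink code. Expensive work runs on a
// background pool; the state that stands in for the ExecutionGraph is only
// ever touched from the single-threaded "main thread" executor.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SingleThreadedAccessSketch {

    static final ExecutorService MAIN_THREAD = Executors.newSingleThreadExecutor();
    static final ExecutorService BACKGROUND = Executors.newFixedThreadPool(4);

    // Stand-in for the ExecutionGraph state; mutated only on MAIN_THREAD.
    static final StringBuilder graphState = new StringBuilder();

    public static void main(String[] args) {
        CompletableFuture
                .supplyAsync(() -> "expensive-result", BACKGROUND)   // heavy task off the main thread
                .thenAcceptAsync(graphState::append, MAIN_THREAD)    // mutation funneled to the main thread
                .join();
        System.out.println(graphState);
        MAIN_THREAD.shutdown();
        BACKGROUND.shutdown();
    }
}
{code}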



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Timo Walther
As far as I know it, we will not provide any binaries but only the 
source code. JAR files on Apache servers would need an official 
voting/release process. Interested users can build Blink themselves 
using `mvn clean package`.


@Stephan: Please correct me if I'm wrong.

Regards,
Timo

Am 23.01.19 um 11:16 schrieb Kurt Young:

Hi Timo,

What about the jar files, will blink's jar be uploaded to apache
repository? If not, i think it will be very inconvenient for users who
wants to try blink and view the documents if they need some help from doc.

Best,
Kurt


On Wed, Jan 23, 2019 at 6:09 PM Timo Walther  wrote:


Hi Kurt,

I would not make the Blink's documentation visible to users or search
engines via a website. Otherwise this would communicate that Blink is an
official release. I would suggest to put the Blink docs into `/docs` and
people can build it with `./docs/build.sh -pi` if they are interested.
I would not invest time into setting up a docs infrastructure.

Regards,
Timo

Am 23.01.19 um 08:56 schrieb Kurt Young:

Thanks @Stephan for this exciting announcement!

>From my point of view, i would prefer to use branch. It makes the

message

"Blink is pat of Flink" more straightforward and clear.

Except for the location of blink codes, there are some other questions

like

what version blink should use, and where do we put blink's documents.
Currently, we choose to use "1.5.1-blink-r0" as blink's version since

blink

forked from Flink's 1.5.1. We also added some docs to blink just as Flink
did. Can blink use a website like
"https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to put

all

blink's docs, change it to something like
https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?

Best,
Kurt


On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng 

wrote:

Hi all,

@Stephan  Thanks a lot for driving these efforts. I think a lot of

people

is already waiting for this.
+1 for opening the blink source code.
Both a separate repository or a special branch is ok for me. Hopefully,
this will not last too long.

Best, Hequn


On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:


Great news! Looking forward to the new wave of developments.

If Blink needs to be continuously updated, fix bugs, release versions,
maybe a separate repository is a better idea.

Best,
Jark

On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński 

wrote:

Hey!
I also think that creating the separate branch for Blink in Flink repo

is a

better idea than creating the fork as IMHO it will allow merging

changes

more easily.

Best Regards,
Dom.

wt., 22 sty 2019 o 10:09 Ufuk Celebi  napisał(a):


Hey Stephan and others,

thanks for the summary. I'm very excited about the outlined

improvements.

:-)

Separate branch vs. fork: I'm fine with either of the suggestions.
Depending on the expected strategy for merging the changes, expected
number of additional changes, etc., either one or the other approach
might be better suited.

– Ufuk

On Tue, Jan 22, 2019 at 9:20 AM Kurt Young  wrote:

Hi Driesprong,

Glad to hear that you're interested with blink's codes. Actually,

blink

only has one branch by itself, so either a separated repo or a

flink's

branch works for blink's code share.

Best,
Kurt


On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko


wrote:


Great news Stephan!

Why not make the code available by having a fork of Flink on

Alibaba's

Github account. This will allow us to do easy diff's in the

Github

UI

and

create PR's of cherry-picked commits if needed. I can imagine

that

the

Blink codebase has a lot of branches by itself, so just pushing a

couple of

branches to the main Flink repo is not ideal. Looking forward to

it!

Cheers, Fokko





Op di 22 jan. 2019 om 03:48 schreef Shaoxuan Wang <

wshaox...@gmail.com

:

big +1 to contribute Blink codebase directly into the Apache

Flink

project.

Looking forward to the new journey.

Regards,
Shaoxuan

On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <

xiaow...@gmail.com>

wrote:

   Thanks Stephan! We are hoping to make the process as

non-disruptive as

possible to the Flink community. Making the Blink codebase

public

is

the

first step that hopefully facilitates further discussions.
Xiaowei

  On Monday, January 21, 2019, 11:46:28 AM PST, Stephan

Ewen

<

se...@apache.org> wrote:

   Dear Flink Community!

Some of you may have heard it already from announcements or

from

a

Flink

Forward talk:
Alibaba has decided to open source its in-house improvements

to

Flink,

called Blink!
First of all, big thanks to team that developed these

improvements

and

made

this
contribution possible!

Blink has some very exciting enhancements, most prominently

on

the

Table

API/SQL side
and the unified execution of these programs. For batch

(bounded)

data,

the

SQL execution
has full TPC-DS coverage (which is a big deal), and the

execution

is

more

than 10x faster
than the current SQL runtime in Flink. Blink has also added

support for

catalogs,
improved the 

Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Kurt Young
Hi Timo,

What about the jar files, will blink's jar be uploaded to the apache
repository? If not, I think it will be very inconvenient for users who
want to try blink and view the documents if they need some help from the docs.

Best,
Kurt


On Wed, Jan 23, 2019 at 6:09 PM Timo Walther  wrote:

> Hi Kurt,
>
> I would not make the Blink's documentation visible to users or search
> engines via a website. Otherwise this would communicate that Blink is an
> official release. I would suggest to put the Blink docs into `/docs` and
> people can build it with `./docs/build.sh -pi` if they are interested.
> I would not invest time into setting up a docs infrastructure.
>
> Regards,
> Timo
>
> Am 23.01.19 um 08:56 schrieb Kurt Young:
> > Thanks @Stephan for this exciting announcement!
> >
> > >From my point of view, i would prefer to use branch. It makes the
> message
> > "Blink is pat of Flink" more straightforward and clear.
> >
> > Except for the location of blink codes, there are some other questions
> like
> > what version blink should use, and where do we put blink's documents.
> > Currently, we choose to use "1.5.1-blink-r0" as blink's version since
> blink
> > forked from Flink's 1.5.1. We also added some docs to blink just as Flink
> > did. Can blink use a website like
> > "https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to put
> all
> > blink's docs, change it to something like
> > https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng 
> wrote:
> >
> >> Hi all,
> >>
> >> @Stephan  Thanks a lot for driving these efforts. I think a lot of
> people
> >> is already waiting for this.
> >> +1 for opening the blink source code.
> >> Both a separate repository or a special branch is ok for me. Hopefully,
> >> this will not last too long.
> >>
> >> Best, Hequn
> >>
> >>
> >> On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:
> >>
> >>> Great news! Looking forward to the new wave of developments.
> >>>
> >>> If Blink needs to be continuously updated, fix bugs, release versions,
> >>> maybe a separate repository is a better idea.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński 
> wrote:
> >>>
>  Hey!
>  I also think that creating the separate branch for Blink in Flink repo
> >>> is a
>  better idea than creating the fork as IMHO it will allow merging
> >> changes
>  more easily.
> 
>  Best Regards,
>  Dom.
> 
>  wt., 22 sty 2019 o 10:09 Ufuk Celebi  napisał(a):
> 
> > Hey Stephan and others,
> >
> > thanks for the summary. I'm very excited about the outlined
> >>> improvements.
> > :-)
> >
> > Separate branch vs. fork: I'm fine with either of the suggestions.
> > Depending on the expected strategy for merging the changes, expected
> > number of additional changes, etc., either one or the other approach
> > might be better suited.
> >
> > – Ufuk
> >
> > On Tue, Jan 22, 2019 at 9:20 AM Kurt Young  wrote:
> >> Hi Driesprong,
> >>
> >> Glad to hear that you're interested with blink's codes. Actually,
> >>> blink
> >> only has one branch by itself, so either a separated repo or a
> >>> flink's
> >> branch works for blink's code share.
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko
> >>>  >> wrote:
> >>
> >>> Great news Stephan!
> >>>
> >>> Why not make the code available by having a fork of Flink on
>  Alibaba's
> >>> Github account. This will allow us to do easy diff's in the
> >> Github
> >>> UI
> > and
> >>> create PR's of cherry-picked commits if needed. I can imagine
> >> that
>  the
> >>> Blink codebase has a lot of branches by itself, so just pushing a
> > couple of
> >>> branches to the main Flink repo is not ideal. Looking forward to
> >>> it!
> >>> Cheers, Fokko
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> Op di 22 jan. 2019 om 03:48 schreef Shaoxuan Wang <
>  wshaox...@gmail.com
> >> :
>  big +1 to contribute Blink codebase directly into the Apache
> >>> Flink
> >>> project.
>  Looking forward to the new journey.
> 
>  Regards,
>  Shaoxuan
> 
>  On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <
> >>> xiaow...@gmail.com>
> >>> wrote:
> >   Thanks Stephan! We are hoping to make the process as
> > non-disruptive as
> > possible to the Flink community. Making the Blink codebase
> >>> public
> > is
> >>> the
> > first step that hopefully facilitates further discussions.
> > Xiaowei
> >
> >  On Monday, January 21, 2019, 11:46:28 AM PST, Stephan
> >> Ewen
> >>> <
> > se...@apache.org> wrote:
> >
> >   Dear Flink Community!
> >
> > Some of you may have heard it already 

Re: issue in the MetricReporterRegistry

2019-01-23 Thread Matthieu Bonneviot
Yes indeed, it affects the behavior on java 11.
I have created a bug in jira about it:
Summary: MetricReporter: "metrics.reporters" configuration has to be
provided for reporters to be taken into account
 Key: FLINK-11413
 URL: https://issues.apache.org/jira/browse/FLINK-11413
 Project: Flink
  Issue Type: Bug
  Components: Configuration
Affects Versions: 1.7.1

I will have time to fix it and submit a PR.
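
For reference, the difference can be reproduced outside of Flink with a few lines; the delimiter pattern below is only illustrative:

// Standalone reproduction, independent of Flink. Prints what splitAsStream("")
// yields on the running JDK, and shows the effect of the proposed filter.
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitAsStreamCheck {
    public static void main(String[] args) {
        Pattern reporterListPattern = Pattern.compile("\\s*,\\s*");
        String includedReportersString = ""; // "metrics.reporters" not configured

        Set<String> withoutFilter = reporterListPattern
                .splitAsStream(includedReportersString)
                .collect(Collectors.toSet());
        // Empty on the affected Java 8 versions, [""] on Java 11.
        System.out.println("without filter: " + withoutFilter);

        Set<String> withFilter = reporterListPattern
                .splitAsStream(includedReportersString)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        // Always empty, so the includedReporters.isEmpty() check keeps working.
        System.out.println("with filter:    " + withFilter);
    }
}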

Regards
Matthieu Bonneviot


Le mer. 23 janv. 2019 à 10:41, Chesnay Schepler  a
écrit :

> nvm, it does indeed affect behavior :/
>
> On 23.01.2019 10:08, Chesnay Schepler wrote:
> > Just to make sure, this issue does not actually affect the behavior,
> > does it? Since we only use these as a filter for reporters to activate.
> >
> > On 21.01.2019 18:22, Matthieu Bonneviot wrote:
> >> Hi
> >>
> >> I don't have the jira permission but If you grant me the permission I
> >> could
> >> contribute to fix the following issue:
> >> When using java 11, "metrics.reporters" configuration has to be provided
> >> for reporters to be taken into account.
> >>
> >> The desired behavior:
> >> The MetricRegistryConfiguration looks for a conf like
> >> "metrics.reporters =
> >> foo,bar", if not found: all reporters that could be found in the
> >> configuration will be started.
> >>
> >> In the code it is done by
> >> Set includedReporters =
> >>
> reporterListPattern.splitAsStream(includedReportersString).collect(Collectors.toSet());
>
> >>
> >>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L134
> >>
> >>
> >> Definition of splitAsStream: If this pattern does not match any
> >> subsequence
> >> of the input then the resulting stream has just one element, namely the
> >> input sequence in string form.
> >> It means  reporterListPattern.splitAsStream("") should return "" and so
> >> includedReporters should have size 1 with "" as unique element
> >>
> >> However there is a misbehavior in some version of java 8, it does return
> >> empty stream.
> >> But working with java 11, the further code does not work: if
> >> (includedReporters.isEmpty() ||
> >> includedReporters.contains(reporterName))
> >>
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L145
> >>
> >>
> >> I would suggest to filter empty string:
> >> Set includedReporters =
> >> reporterListPattern.splitAsStream(includedReportersString).*filter(s ->
> >> !s.isEmpty())*.collect(Collectors.toSet());
> >>
> >> Regards
> >> Matthieu Bonneviot
> >
> >
> >
>
>

-- 
Matthieu Bonneviot
Senior Engineer, DataDome
M +33 7 68 29 79 34
E matthieu.bonnev...@datadome.co  
W www.datadome.co





DataDome
ranked 'Strong Performer' in latest Forrester Bot management report



Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Timo Walther

Hi Kurt,

I would not make Blink's documentation visible to users or search
engines via a website. Otherwise this would communicate that Blink is an
official release. I would suggest putting the Blink docs into `/docs` so that
people can build them with `./docs/build.sh -pi` if they are interested.
I would not invest time into setting up a docs infrastructure.


Regards,
Timo

Am 23.01.19 um 08:56 schrieb Kurt Young:

Thanks @Stephan for this exciting announcement!

>From my point of view, i would prefer to use branch. It makes the message
"Blink is pat of Flink" more straightforward and clear.

Except for the location of blink codes, there are some other questions like
what version blink should use, and where do we put blink's documents.
Currently, we choose to use "1.5.1-blink-r0" as blink's version since blink
forked from Flink's 1.5.1. We also added some docs to blink just as Flink
did. Can blink use a website like
"https://ci.apache.org/projects/flink/flink-docs-release-1.7/; to put all
blink's docs, change it to something like
https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?

Best,
Kurt


On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng  wrote:


Hi all,

@Stephan  Thanks a lot for driving these efforts. I think a lot of people
are already waiting for this.
+1 for opening the blink source code.
Both a separate repository or a special branch is ok for me. Hopefully,
this will not last too long.

Best, Hequn


On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:


Great news! Looking forward to the new wave of developments.

If Blink needs to be continuously updated, fix bugs, release versions,
maybe a separate repository is a better idea.

Best,
Jark

On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński  wrote:


Hey!
I also think that creating the separate branch for Blink in Flink repo

is a

better idea than creating the fork as IMHO it will allow merging

changes

more easily.

Best Regards,
Dom.

wt., 22 sty 2019 o 10:09 Ufuk Celebi  napisał(a):


Hey Stephan and others,

thanks for the summary. I'm very excited about the outlined

improvements.

:-)

Separate branch vs. fork: I'm fine with either of the suggestions.
Depending on the expected strategy for merging the changes, expected
number of additional changes, etc., either one or the other approach
might be better suited.

– Ufuk

On Tue, Jan 22, 2019 at 9:20 AM Kurt Young  wrote:

Hi Driesprong,

Glad to hear that you're interested with blink's codes. Actually,

blink

only has one branch by itself, so either a separated repo or a

flink's

branch works for blink's code share.

Best,
Kurt


On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko


wrote:


Great news Stephan!

Why not make the code available by having a fork of Flink on

Alibaba's

Github account. This will allow us to do easy diff's in the

Github

UI

and

create PR's of cherry-picked commits if needed. I can imagine

that

the

Blink codebase has a lot of branches by itself, so just pushing a

couple of

branches to the main Flink repo is not ideal. Looking forward to

it!

Cheers, Fokko





Op di 22 jan. 2019 om 03:48 schreef Shaoxuan Wang <

wshaox...@gmail.com

:

big +1 to contribute Blink codebase directly into the Apache

Flink

project.

Looking forward to the new journey.

Regards,
Shaoxuan

On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <

xiaow...@gmail.com>

wrote:

  Thanks Stephan! We are hoping to make the process as

non-disruptive as

possible to the Flink community. Making the Blink codebase

public

is

the

first step that hopefully facilitates further discussions.
Xiaowei

 On Monday, January 21, 2019, 11:46:28 AM PST, Stephan

Ewen

<

se...@apache.org> wrote:

  Dear Flink Community!

Some of you may have heard it already from announcements or

from

a

Flink

Forward talk:
Alibaba has decided to open source its in-house improvements

to

Flink,

called Blink!
First of all, big thanks to team that developed these

improvements

and

made

this
contribution possible!

Blink has some very exciting enhancements, most prominently

on

the

Table

API/SQL side
and the unified execution of these programs. For batch

(bounded)

data,

the

SQL execution
has full TPC-DS coverage (which is a big deal), and the

execution

is

more

than 10x faster
than the current SQL runtime in Flink. Blink has also added

support for

catalogs,
improved the failover speed of batch queries and the resource

management.

It also
makes some good steps in the direction of more deeply

unifying

the

batch

and streaming
execution.

The proposal is to merge Blink's enhancements into Flink, to

give

Flink's

SQL/Table API and
execution a big boost in usability and performance.

Just to avoid any confusion: This is not a suggested change

of

focus to

batch processing,
nor would this break with any of the streaming architecture

and

vision

of

Flink.
This contribution follows very much the principle of "batch

is

a

special

case of streaming".
As a special case, batch makes 

[jira] [Created] (FLINK-11416) DISTINCT on a JOIN inside of an UNION is not working

2019-01-23 Thread Elias Saalmann (JIRA)
Elias Saalmann created FLINK-11416:
--

 Summary: DISTINCT on a JOIN inside of an UNION is not working
 Key: FLINK-11416
 URL: https://issues.apache.org/jira/browse/FLINK-11416
 Project: Flink
  Issue Type: Bug
  Components: Table API  SQL
Affects Versions: 1.7.1, 1.7.0
Reporter: Elias Saalmann


I get an error (Error while applying rule AggregateUnionAggregateRule) when
having a DISTINCT on the result of a JOIN within a UNION, e.g.

(
   SELECT DISTINCT c
   FROM a JOIN b ON a = b
 )
 UNION
 (
   SELECT c
   FROM c
 )

Full stacktrace:


Exception in thread "main" java.lang.RuntimeException: Error while applying 
rule AggregateUnionAggregateRule, args 
[rel#197:LogicalAggregate.NONE(input=rel#196:Subset#21.NONE,group=\{0}), 
rel#194:LogicalUnion.NONE(input#0=rel#188:Subset#18.NONE,input#1=rel#189:Subset#19.NONE,all=true),
 rel#221:LogicalAggregate.NONE(input=rel#184:Subset#16.NONE,group=\{2}), 
rel#164:LogicalTableScan.NONE(table=[_DataSetTable_2])] 
    at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
 
    at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:646)
 
    at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:339) 
    at 
org.apache.flink.table.api.TableEnvironment.runVolcanoPlanner(TableEnvironment.scala:373)
 
    at 
org.apache.flink.table.api.TableEnvironment.optimizeLogicalPlan(TableEnvironment.scala:292)
 
    at 
org.apache.flink.table.api.BatchTableEnvironment.optimize(BatchTableEnvironment.scala:455)
 
    at 
org.apache.flink.table.api.BatchTableEnvironment.translate(BatchTableEnvironment.scala:475)
 
    at 
org.apache.flink.table.api.java.BatchTableEnvironment.toDataSet(BatchTableEnvironment.scala:165)
 
    at org.myorg.quickstart.TableJob1.main(TableJob1.java:51) 
Caused by: java.lang.IllegalArgumentException: Cannot compute compatible row 
type for arguments to set op: RecordType(VARCHAR(65536) a, VARCHAR(65536) b, 
VARCHAR(65536) c), RecordType(VARCHAR(65536) d) 
    at org.apache.calcite.rel.core.SetOp.deriveRowType(SetOp.java:111) 
    at 
org.apache.calcite.rel.AbstractRelNode.getRowType(AbstractRelNode.java:222) 
    at org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:2065) 
    at org.apache.calcite.tools.RelBuilder$Frame.(RelBuilder.java:2050) 
    at org.apache.calcite.tools.RelBuilder.push(RelBuilder.java:243) 
    at org.apache.calcite.tools.RelBuilder.setOp(RelBuilder.java:1370) 
    at org.apache.calcite.tools.RelBuilder.union(RelBuilder.java:1390) 
    at org.apache.calcite.tools.RelBuilder.union(RelBuilder.java:1380) 
    at 
org.apache.calcite.rel.rules.AggregateUnionAggregateRule.onMatch(AggregateUnionAggregateRule.java:130)
 
    at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:212)
 
    ... 8 more

Full example reproducing the error: 
[GitHub|https://github.com/lordon/flink_quickstart/blob/master/src/main/java/org/myorg/quickstart/TableJob1.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Applying for Flink contributor permission

2019-01-23 Thread Matthieu Bonneviot
Thanks a lot

Le mer. 23 janv. 2019 à 10:06, Robert Metzger  a
écrit :

> Hey Matthieu,
>
> welcome to the Flink community!
> I've added you as a contributor to our JIRA! Happy coding :)
>
>
>
> On Wed, Jan 23, 2019 at 9:39 AM Matthieu Bonneviot <
> matthieu.bonnev...@datadome.co> wrote:
>
> > Hi
> >
> > Please provide me contribution permission.
> > email: matthieu.bonnev...@datadome.co 
> > apache-username: mbonneviot
> >
> > Thank you
> >
>


-- 
Matthieu Bonneviot
Senior Engineer, DataDome
M +33 7 68 29 79 34
E matthieu.bonnev...@datadome.co  
W www.datadome.co





DataDome
ranked 'Strong Performer' in latest Forrester Bot management report



Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Timo Walther
+1 for Stephan's suggestion. For example, SQL connectors have never been 
part of the main distribution and nobody complained about this so far. I 
think what is more important than a big dist bundle is a helpful 
"Downloads" page where users can easily find available filesystems, 
connectors, metric reporters. Not everyone checks Maven central for
available JAR files. I just saw that we added an "Optional components"
section recently [1], we just need to make it more prominent. This is 
also done for the SQL connectors and formats [2].


[1] https://flink.apache.org/downloads.html
[2] 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#dependencies 



Regards,
Timo

Am 23.01.19 um 10:07 schrieb Ufuk Celebi:

I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible I'm still +1 to only
go with the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen  wrote:

There are some points where a leaner approach could help.
There are many libraries and connectors that are currently being added to
Flink, which makes the "include all" approach not completely feasible in the
long run:

   - Connectors: For a proper experience with the Shell/CLI (for example for
SQL) we need a lot of fat connector jars.
 These come often for multiple versions, which alone accounts for 100s
of MBs of connector jars.
   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
themselves.
   - The metric reporters are bit by bit growing as well.

The following could be a compromise:

The flink-dist would include
   - the core flink libraries (core, apis, runtime, etc.)
   - yarn / mesos  etc. adapters
   - examples (the examples should be a small set of self-contained programs
without additional dependencies)
   - default logging
   - default metric reporter (jmx)
   - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be
offered for individual download)
   - Hadoop libs
   - the pre-shaded file systems
   - the pre-packaged SQL connectors
   - additional metric reporters


On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang  wrote:


Thanks Chesnay for raising this discussion thread.  I think there are 3
major use scenarios for flink binary distribution.

1. Use it to set up standalone cluster
2. Use it to experience features of flink, such as via scala-shell,
sql-client
3. Downstream project use it to integrate with their system

I did a size estimation of the flink dist folder: the lib folder takes around 100M
and the opt folder takes around 200M. Overall I agree to make a thin flink dist.
So the next problem is which components to drop. I checked the opt folder,
and I think the filesystem components and metrics components could be moved
out, because they are pluggable components and are only used in scenario 1 I
think (setting up a standalone cluster). Other components like flink-table,
flink-ml, flink-gelly, we should still keep them IMHO, because new users may
still use them to try the features of flink. For me, scala-shell is the first
option to try new features of flink.



Fabian Hueske  于2019年1月18日周五 下午7:34写道:


Hi Chesnay,

Thank you for the proposal.
I think this is a good idea.
We follow a similar approach already for Hadoop dependencies and
connectors (although in application space).

+1

Fabian

Am Fr., 18. Jan. 2019 um 10:59 Uhr schrieb Chesnay Schepler <
ches...@apache.org>:


Hello,

the binary distribution that we release by now contains quite a lot of
optional components, including various filesystems, metric reporters and
libraries. Most users will only use a fraction of these, so the rest
pretty much only increases the size of flink-dist.

With Flink growing more and more in scope I don't believe it to be
feasible to ship everything we have with every distribution, and instead
suggest more of a "pick-what-you-need" model, where flink-dist is rather
lean and additional components are downloaded separately and added by
the user.

This would primarily affect the /opt directory, but could also be
extended to cover flink-dist. For example, the yarn and mesos code could
be spliced out into separate jars that could be added to lib manually.

Let me know what you think.

Regards,

Chesnay



--
Best Regards

Jeff Zhang





[jira] [Created] (FLINK-11415) Introduce JobMasterServiceFactory for JobManagerRunner

2019-01-23 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-11415:
-

 Summary: Introduce JobMasterServiceFactory for JobManagerRunner
 Key: FLINK-11415
 URL: https://issues.apache.org/jira/browse/FLINK-11415
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.8.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.8.0


In order to better test the {{JobManagerRunner}}, I suggest introducing a 
{{JobMasterServiceFactory}} which allows controlling how the 
{{JobManagerRunner}} instantiates a {{JobMasterService}}. At the moment we 
create a {{JobMaster}} in the constructor of the {{JobManagerRunner}}. This 
unnecessarily couples these two components together and makes testing harder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11414) Introduce JobMasterService interface for the JobManagerRunner

2019-01-23 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-11414:
-

 Summary: Introduce JobMasterService interface for the 
JobManagerRunner
 Key: FLINK-11414
 URL: https://issues.apache.org/jira/browse/FLINK-11414
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.8.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.8.0


In order to better separate concerns, I suggest introducing a 
{{JobMasterService}} interface which only exposes the lifecycle methods the 
{{JobManagerRunner}} needs instead of directly using the {{JobMaster}}. That 
way, we can later instantiate the {{JobManagerRunner}} easier for testing 
purposes.
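
A rough sketch of the separation that this issue and FLINK-11415 aim for; all types below are placeholders, not the actual Flink classes:

{code:java}
// Placeholder sketch only. The point is the seam: the runner depends on a narrow
// lifecycle interface and obtains the instance from an injected factory, so a test
// can pass a stub instead of a fully wired JobMaster.
import java.util.concurrent.CompletableFuture;

interface JobMasterService {
    CompletableFuture<Void> start();
    CompletableFuture<Void> closeAsync();
}

interface JobMasterServiceFactory {
    JobMasterService createJobMasterService() throws Exception;
}

class JobManagerRunnerSketch {
    private final JobMasterService jobMasterService;

    JobManagerRunnerSketch(JobMasterServiceFactory factory) throws Exception {
        // No direct "new JobMaster(...)" in the constructor anymore.
        this.jobMasterService = factory.createJobMasterService();
    }

    CompletableFuture<Void> start() {
        return jobMasterService.start();
    }

    CompletableFuture<Void> closeAsync() {
        return jobMasterService.closeAsync();
    }

    public static void main(String[] args) throws Exception {
        // A test can inject a trivial stub through the factory.
        JobMasterServiceFactory stubFactory = () -> new JobMasterService() {
            public CompletableFuture<Void> start() {
                return CompletableFuture.completedFuture(null);
            }
            public CompletableFuture<Void> closeAsync() {
                return CompletableFuture.completedFuture(null);
            }
        };
        JobManagerRunnerSketch runner = new JobManagerRunnerSketch(stubFactory);
        runner.start().thenCompose(ignored -> runner.closeAsync()).join();
        System.out.println("stubbed runner started and closed");
    }
}
{code}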



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: issue in the MetricReporterRegistry

2019-01-23 Thread Chesnay Schepler

nvm, it does indeed affect behavior :/

On 23.01.2019 10:08, Chesnay Schepler wrote:
Just to make sure, this issue does not actually affect the behavior, 
does it? Since we only use these as a filter for reporters to activate.


On 21.01.2019 18:22, Matthieu Bonneviot wrote:

Hi

I don't have the jira permission but If you grant me the permission I 
could

contribute to fix the following issue:
When using java 11, "metrics.reporters" configuration has to be provided
for reporters to be taken into account.

The desired behavior:
The MetricRegistryConfiguration looks for a conf like 
"metrics.reporters =

foo,bar", if not found: all reporters that could be found in the
configuration will be started.

In the code it is done by
Set includedReporters =
reporterListPattern.splitAsStream(includedReportersString).collect(Collectors.toSet()); 

https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L134 



Definition of splitAsStream: If this pattern does not match any 
subsequence

of the input then the resulting stream has just one element, namely the
input sequence in string form.
It means  reporterListPattern.splitAsStream("") should return "" and so
includedReporters should have size 1 with "" as unique element

However there is a misbehavior in some version of java 8, it does return
empty stream.
But working with java 11, the further code does not work: if
(includedReporters.isEmpty() || 
includedReporters.contains(reporterName))
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L145 



I would suggest to filter empty string:
Set includedReporters =
reporterListPattern.splitAsStream(includedReportersString).*filter(s ->
!s.isEmpty())*.collect(Collectors.toSet());

Regards
Matthieu Bonneviot
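
To make the Java 8 / Java 11 difference concrete, a minimal, self-contained demo of the
behavior described above (the delimiter pattern is an assumption for the demo; the real
one lives in MetricRegistryConfiguration):

import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ReporterListSplitDemo {

    // Assumption: the reporter list is split on commas (with optional whitespace).
    private static final Pattern REPORTER_LIST_PATTERN = Pattern.compile("\\s*,\\s*");

    public static void main(String[] args) {
        // "metrics.reporters" was not configured, so the raw value is the empty string.
        String includedReportersString = "";

        Set<String> includedReporters = REPORTER_LIST_PATTERN
                .splitAsStream(includedReportersString)
                .collect(Collectors.toSet());

        // Java 11 (per the javadoc): prints [""], so isEmpty() is false and the
        // "no filter configured -> activate all reporters" fallback is never taken.
        // Some Java 8 versions: prints [], so the fallback happens to work.
        System.out.println(includedReporters);

        // Suggested fix: drop empty tokens, which restores the fallback on both JDKs.
        Set<String> fixed = REPORTER_LIST_PATTERN
                .splitAsStream(includedReportersString)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        System.out.println(fixed); // [] on Java 8 and Java 11
    }
}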








Re: [DISCUSS] Start new Review Process

2019-01-23 Thread Robert Metzger
Okay, cool! I'll let you know when the bot is ready in a test repo.
While you (and others) are testing it, I'll open a PR for the docs.

On Wed, Jan 23, 2019 at 10:15 AM Fabian Hueske  wrote:

> Oh, that's great news!
> In that case we can just close the PR and start with the bot right away.
> I think it would be good to extend the PR Review guide [1] with a section
> about the bot and how to use it.
>
> Fabian
>
> [1] https://flink.apache.org/reviewing-prs.html
>
> On Wed, Jan 23, 2019 at 10:03 AM, Robert Metzger <
> rmetz...@apache.org> wrote:
>
> > Hey,
> >
> > as I've mentioned already in the pull request, I have started
> implementing
> > a little bot for GitHub that tracks the checklist [1]
> > The bot is monitoring incoming pull requests. It creates a comment with
> the
> > checklist.
> > Reviewers can write a message to the bot (such as "@flinkbot approve
> > contribution"), then the bot will update the checklist comment.
> >
> > As an upcoming feature, I also plan to add a label to the pull request
> when
> > all the checklist conditions have been met.
> >
> > I hope to finish the bot today. After some initial testing, we can deploy
> > it with Flink (if there are no objections by the community).
> >
> >
> > [1] https://github.com/rmetzger/flink-community-tools
> >
> >
> > On Tue, Jan 22, 2019 at 3:48 PM Fabian Hueske  wrote:
> >
> > > Hi everyone,
> > >
> > > A few months ago the community discussed and agreed to improve the PR
> > > review process [1-4].
> > > The idea is to follow a checklist to avoid in-depth reviews on
> > > contributions that might not be accepted for other reasons. Thereby,
> > > reviewers and contributors do not spend their time on PRs that will not
> > be
> > > merged.
> > > The checklist consists of five points:
> > >
> > > 1. The contribution is well-described.
> > > 2. There is consensus that the contribution should go into Flink.
> > > 3. [Does not need specific attention | Needs specific attention for X |
> > Has
> > > attention for X by Y]
> > > 4. The architectural approach is sound.
> > > 5. Overall code quality is good.
> > >
> > > Back then we added a review guide to the website [5] but did not put
> the
> > > new process in place yet. I would like to start this now.
> > > There is a PR [6] that adds the review checklist to the PR template.
> > > Committers who review a PR should follow the checklist and tick and
> > sign
> > > off the boxes by updating the PR description. For that committers need
> to
> > > be members of the ASF Github organization.
> > >
> > > If nobody has concerns, I'll merge the PR in a few days.
> > > Once the PR is merged, the reviews of all new PRs should follow the
> > > checklist.
> > >
> > > Best,
> > > Fabian
> > >
> > > [1]
> > >
> > >
> >
> https://lists.apache.org/thread.html/dcbe377eb477b531f49c462e90d8b1e50e0ff33c6efd296081c6934d@%3Cdev.flink.apache.org%3E
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/172aa6d12ed442ea4da9ed2a72fe0894c9be7408fb2e1b7b50dfcb8c@%3Cdev.flink.apache.org%3E
> > > [3]
> > >
> > >
> >
> https://lists.apache.org/thread.html/5e07c1be8078dd7b89d93c67b71defacff137f3df56ccf4adb04b4d7@%3Cdev.flink.apache.org%3E
> > > [4]
> > >
> > >
> >
> https://lists.apache.org/thread.html/d7fd1fe45949f7c706142c62de85d246c7f6a1485a186fd3e9dced01@%3Cdev.flink.apache.org%3E
> > > [5] https://flink.apache.org/reviewing-prs.html
> > > [6] https://github.com/apache/flink/pull/6873
> > >
> >
>


Re: Request for permission

2019-01-23 Thread Run
Hi Fabian,


Thank you so much for the kind help!


Best,
Liya Fan




------ Original Message ------
From: "Fabian Hueske";
Date: Wed, Jan 23, 2019, 5:06 PM
To: "dev";
Subject: Re: Request for permission



Hi Liya Fan,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Wed, Jan 23, 2019 at 8:01 AM, Run wrote:

> Hi Guys,
>
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is fan_li_ya.
>
>
> Thanks a lot.
>
>
> Best,
> Liya Fan

Re: [DISCUSS] Start new Review Process

2019-01-23 Thread Fabian Hueske
Oh, that's great news!
In that case we can just close the PR and start with the bot right away.
I think it would be good to extend the PR Review guide [1] with a section
about the bot and how to use it.

Fabian

[1] https://flink.apache.org/reviewing-prs.html

On Wed, Jan 23, 2019 at 10:03 AM, Robert Metzger <
rmetz...@apache.org> wrote:

> Hey,
>
> as I've mentioned already in the pull request, I have started implementing
> a little bot for GitHub that tracks the checklist [1]
> The bot is monitoring incoming pull requests. It creates a comment with the
> checklist.
> Reviewers can write a message to the bot (such as "@flinkbot approve
> contribution"), then the bot will update the checklist comment.
>
> As an upcoming feature, I also plan to add a label to the pull request when
> all the checklist conditions have been met.
>
> I hope to finish the bot today. After some initial testing, we can deploy
> it with Flink (if there are no objections by the community).
>
>
> [1] https://github.com/rmetzger/flink-community-tools
>
>
> On Tue, Jan 22, 2019 at 3:48 PM Fabian Hueske  wrote:
>
> > Hi everyone,
> >
> > A few months ago the community discussed and agreed to improve the PR
> > review process [1-4].
> > The idea is to follow a checklist to avoid in-depth reviews on
> > contributions that might not be accepted for other reasons. Thereby,
> > reviewers and contributors do not spend their time on PRs that will not
> be
> > merged.
> > The checklist consists of five points:
> >
> > 1. The contribution is well-described.
> > 2. There is consensus that the contribution should go into Flink.
> > 3. [Does not need specific attention | Needs specific attention for X |
> Has
> > attention for X by Y]
> > 4. The architectural approach is sound.
> > 5. Overall code quality is good.
> >
> > Back then we added a review guide to the website [5] but did not put the
> > new process in place yet. I would like to start this now.
> > There is a PR [6] that adds the review checklist to the PR template.
> > Committers who review a PR should follow the checklist and tick and
> sign
> > off the boxes by updating the PR description. For that committers need to
> > be members of the ASF Github organization.
> >
> > If nobody has concerns, I'll merge the PR in a few days.
> > Once the PR is merged, the reviews of all new PRs should follow the
> > checklist.
> >
> > Best,
> > Fabian
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/dcbe377eb477b531f49c462e90d8b1e50e0ff33c6efd296081c6934d@%3Cdev.flink.apache.org%3E
> > [2]
> >
> >
> https://lists.apache.org/thread.html/172aa6d12ed442ea4da9ed2a72fe0894c9be7408fb2e1b7b50dfcb8c@%3Cdev.flink.apache.org%3E
> > [3]
> >
> >
> https://lists.apache.org/thread.html/5e07c1be8078dd7b89d93c67b71defacff137f3df56ccf4adb04b4d7@%3Cdev.flink.apache.org%3E
> > [4]
> >
> >
> https://lists.apache.org/thread.html/d7fd1fe45949f7c706142c62de85d246c7f6a1485a186fd3e9dced01@%3Cdev.flink.apache.org%3E
> > [5] https://flink.apache.org/reviewing-prs.html
> > [6] https://github.com/apache/flink/pull/6873
> >
>


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Ufuk Celebi
I like the idea of a leaner binary distribution. At the same time I
agree with Jamie that the current binary is quite convenient and
connection speeds should not be that big of a deal. Since the binary
distribution is one of the first entry points for users, I'd like to
keep it as user-friendly as possible.

What do you think about building a lean distribution by default and a
"full" distribution that still bundles all the optional dependencies
for releases? (If you don't think that's feasible I'm still +1 to only
go with the "lean dist" approach.)

– Ufuk

On Wed, Jan 23, 2019 at 9:36 AM Stephan Ewen  wrote:
>
> There are some points where a leaner approach could help.
> There are many libraries and connectors that are currently being added to
> Flink, which makes the "include all" approach not completely feasible in
> long run:
>
>   - Connectors: For a proper experience with the Shell/CLI (for example for
> SQL) we need a lot of fat connector jars.
> These come often for multiple versions, which alone accounts for 100s
> of MBs of connector jars.
>   - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
> themselves.
>   - The metric reporters are bit by bit growing as well.
>
> The following could be a compromise:
>
> The flink-dist would include
>   - the core flink libraries (core, apis, runtime, etc.)
>   - yarn / mesos  etc. adapters
>   - examples (the examples should be a small set of self-contained programs
> without additional dependencies)
>   - default logging
>   - default metric reporter (jmx)
>   - shells (scala, sql)
>
> The flink-dist would NOT include the following libs (and these would be
> offered for individual download)
>   - Hadoop libs
>   - the pre-shaded file systems
>   - the pre-packaged SQL connectors
>   - additional metric reporters
>
>
> On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang  wrote:
>
> > Thanks Chesnay for raising this discussion thread. I think there are 3
> > major use scenarios for the Flink binary distribution.
> >
> > 1. Use it to set up a standalone cluster
> > 2. Use it to experience features of Flink, such as via scala-shell,
> > sql-client
> > 3. Downstream projects use it to integrate with their systems
> >
> > I did a size estimation of the flink-dist folder: the lib folder takes around 100M
> > and the opt folder around 200M. Overall I agree to make a thin flink-dist.
> > So the next problem is which components to drop. I checked the opt folder,
> > and I think the filesystem components and metrics components could be moved
> > out, because they are pluggable components and are only used in scenario 1 I
> > think (setting up a standalone cluster). Other components like flink-table,
> > flink-ml, flink-gelly, we should still keep IMHO, because new users may
> > still use them to try the features of Flink. For me, scala-shell is the first
> > option to try new features of Flink.
> >
> >
> >
> > On Fri, Jan 18, 2019 at 7:34 PM, Fabian Hueske wrote:
> >
> >> Hi Chesnay,
> >>
> >> Thank you for the proposal.
> >> I think this is a good idea.
> >> We follow a similar approach already for Hadoop dependencies and
> >> connectors (although in application space).
> >>
> >> +1
> >>
> >> Fabian
> >>
> >> On Fri, Jan 18, 2019 at 10:59 AM, Chesnay Schepler <
> >> ches...@apache.org> wrote:
> >>
> >>> Hello,
> >>>
> >>> the binary distribution that we release by now contains quite a lot of
> >>> optional components, including various filesystems, metric reporters and
> >>> libraries. Most users will only use a fraction of these, and as such
> >>> pretty much only increase the size of flink-dist.
> >>>
> >>> With Flink growing more and more in scope I don't believe it to be
> >>> feasible to ship everything we have with every distribution, and instead
> >>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> >>> lean and additional components are downloaded separately and added by
> >>> the user.
> >>>
> >>> This would primarily affect the /opt directory, but could also be
> >>> extended to cover flink-dist. For example, the yarn and mesos code could
> >>> be spliced out into separate jars that could be added to lib manually.
> >>>
> >>> Let me know what you think.
> >>>
> >>> Regards,
> >>>
> >>> Chesnay
> >>>
> >>>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >


Re: issue in the MetricReporterRegistry

2019-01-23 Thread Chesnay Schepler
Just to make sure, this issue does not actually affect the behavior, 
does it? Since we only use these as a filter for reporters to activate.


On 21.01.2019 18:22, Matthieu Bonneviot wrote:

Hi

I don't have the Jira permission, but if you grant me the permission I could
contribute a fix for the following issue:
When using Java 11, the "metrics.reporters" configuration has to be provided
for reporters to be taken into account.

The desired behavior:
The MetricRegistryConfiguration looks for a configuration entry like "metrics.reporters =
foo,bar"; if none is found, all reporters that can be found in the
configuration will be started.

In the code this is done by
Set<String> includedReporters =
reporterListPattern.splitAsStream(includedReportersString).collect(Collectors.toSet());
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L134

Definition of splitAsStream: If this pattern does not match any subsequence
of the input then the resulting stream has just one element, namely the
input sequence in string form.
It means reporterListPattern.splitAsStream("") should return "" and so
includedReporters should have size 1, with "" as its unique element.

However, some versions of Java 8 misbehave here and return an empty stream.
But when running on Java 11, the subsequent check no longer works: if
(includedReporters.isEmpty() || includedReporters.contains(reporterName))
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L145

I would suggest filtering out the empty string:
Set<String> includedReporters =
reporterListPattern.splitAsStream(includedReportersString).filter(s ->
!s.isEmpty()).collect(Collectors.toSet());

Regards
Matthieu Bonneviot





Re: Applying for Flink contributor permission

2019-01-23 Thread Robert Metzger
Hey Lavkesh,

welcome to the Flink community!
I've added you as a contributor to our JIRA! Happy coding :)

On Wed, Jan 23, 2019 at 9:32 AM Lavkesh Lahngir  wrote:

> Please provide me contribution permission.
> email: lavkes...@gmail.com
> apache-username: lavkesh
>
> Thank you
>


Re: Applying for Flink contributor permission

2019-01-23 Thread Robert Metzger
Hey Matthieu,

welcome to the Flink community!
I've added you as a contributor to our JIRA! Happy coding :)



On Wed, Jan 23, 2019 at 9:39 AM Matthieu Bonneviot <
matthieu.bonnev...@datadome.co> wrote:

> Hi
>
> Please provide me contribution permission.
> email: matthieu.bonnev...@datadome.co 
> apache-username: mbonneviot
>
> Thank you
>


Re: Request for permission

2019-01-23 Thread Fabian Hueske
Hi Liya Fan,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Wed, Jan 23, 2019 at 8:01 AM, Run wrote:

> Hi Guys,
>
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is fan_li_ya.
>
>
> Thanks a lot.
>
>
> Best,
> Liya Fan


Re: Issues regarding Table-API

2019-01-23 Thread Fabian Hueske
Hi Elias,

Q1: Can you post the exception that you receive?

Q2: This is already possible today by converting a Table into a DataSet and
registering that DataSet again as a Table.
Under the hood, the following is happening: As you said, Tables are views
(or logical plans). Whenever a Table is converted into a DataSet (or when
it is emitted to a TableSink), the logical plan is optimized and translated
into DataSet operators.
By registering the DataSet again as a Table, the result of the DataSet can
be queried. Every query will be attached to the previously computed DataSet
operator, i.e., the operator is not replicated, but feeds its result into
multiple queries (or multiple times into the same query).
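
For illustration, a minimal sketch with the (pre-1.9) Java batch Table API; the table
names are made up and assumed to be registered, only the toDataSet/registerDataSet round
trip is the point:

// ExecutionEnvironment / DataSet: org.apache.flink.api.java.*
// Table / TableEnvironment: org.apache.flink.table.api.*
// BatchTableEnvironment: org.apache.flink.table.api.java.BatchTableEnvironment
// Row: org.apache.flink.types.Row
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

// expensive shared sub-plan, defined once as a (logical) Table
Table shared = tEnv.sqlQuery("SELECT a.id, a.payload FROM a JOIN b ON a.id = b.id");

// materialize the plan once by converting it into a DataSet ...
DataSet<Row> sharedSet = tEnv.toDataSet(shared, Row.class);
// ... and register that DataSet as a new table
tEnv.registerDataSet("sharedResult", sharedSet);

// both queries now read from the same DataSet operator instead of re-executing the join
Table q1 = tEnv.sqlQuery("SELECT * FROM c JOIN sharedResult ON c.id = sharedResult.id");
Table q2 = tEnv.sqlQuery("SELECT * FROM d JOIN sharedResult ON d.id = sharedResult.id");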

Best,
Fabian

On Wed, Jan 23, 2019 at 2:38 AM, Kurt Young wrote:

> Hi Elias,
>
> For your question 2: this is doable, i think it will be resolved in future
> version of Flink.
>
> Best,
> Kurt
>
>
> On Tue, Jan 15, 2019 at 10:35 PM Elias Saalmann <
> es45g...@studserv.uni-leipzig.de> wrote:
>
>> Hi there,
>>
>> I'm working on the Gradoop project at the University of Leipzig (
>> https://github.com/dbs-leipzig/gradoop). Currently we're using the
>> Batch-API - now we're investigating Table-API as an abstraction for
>> Batch-API. I found 2 issues I want to discuss:
>>
>> 1. I get an error (Error while applying rule AggregateUnionAggregateRule)
>> on compile time when having a DISTINCT on a result of a JOIN within an
>> UNION, e.g.
>>
>> (
>>   SELECT DISTINCT c
>>   FROM a JOIN b ON a = b
>> )
>> UNION
>> (
>>   SELECT c
>>   FROM c
>> )
>>
>> Java example:
>> https://gist.github.com/lordon/27fc5277b0d5abd58158f4ec40cda384
>>
>> 2. As we have large workflows, several parts of such a workflow are
>> reused at different points within the workflow. For example: Two datasets
>> get scanned, INTERSECTED and JOINED to another dataset. The resulting
>> dataset is used as JOIN partner for six other datasets. Using Table-API the
>> resulting operator tree looks like:
>> [image: Workflow]
>>
>> As you can see, the whole part of INTERSECTING and JOINING is executed
>> for each reference. I guess this is because you decided to treat Flink
>> Tables as VIEWs which get recalculated on each reference. In fact this
>> doesn't make sense for our large workflows (note we're using the
>> BatchEnvironment only). Is there any chance to avoid that behavior? Is
>> there a possibility to allow Calcite to optimize/combine such common sub
>> trees in the operator tree?
>>
>> Thanks in advance!
>>
>> Best,
>> Elias
>>
>


Re: [DISCUSS] Start new Review Process

2019-01-23 Thread Robert Metzger
Hey,

as I've mentioned already in the pull request, I have started implementing
a little bot for GitHub that tracks the checklist [1]
The bot is monitoring incoming pull requests. It creates a comment with the
checklist.
Reviewers can write a message to the bot (such as "@flinkbot approve
contribution"), then the bot will update the checklist comment.

As an upcoming feature, I also plan to add a label to the pull request when
all the checklist conditions have been met.

I hope to finish the bot today. After some initial testing, we can deploy
it with Flink (if there are no objections by the community).


[1] https://github.com/rmetzger/flink-community-tools


On Tue, Jan 22, 2019 at 3:48 PM Fabian Hueske  wrote:

> Hi everyone,
>
> A few months ago the community discussed and agreed to improve the PR
> review process [1-4].
> The idea is to follow a checklist to avoid in-depth reviews on
> contributions that might not be accepted for other reasons. Thereby,
> reviewers and contributors do not spend their time on PRs that will not be
> merged.
> The checklist consists of five points:
>
> 1. The contribution is well-described.
> 2. There is consensus that the contribution should go into Flink.
> 3. [Does not need specific attention | Needs specific attention for X | Has
> attention for X by Y]
> 4. The architectural approach is sound.
> 5. Overall code quality is good.
>
> Back then we added a review guide to the website [5] but did not put the
> new process in place yet. I would like to start this now.
> There is a PR [6] that adds the review checklist to the PR template.
> Committers who review a PR should follow the checklist and tick and sign
> off the boxes by updating the PR description. For that committers need to
> be members of the ASF Github organization.
>
> If nobody has concerns, I'll merge the PR in a few days.
> Once the PR is merged, the reviews of all new PRs should follow the
> checklist.
>
> Best,
> Fabian
>
> [1]
>
> https://lists.apache.org/thread.html/dcbe377eb477b531f49c462e90d8b1e50e0ff33c6efd296081c6934d@%3Cdev.flink.apache.org%3E
> [2]
>
> https://lists.apache.org/thread.html/172aa6d12ed442ea4da9ed2a72fe0894c9be7408fb2e1b7b50dfcb8c@%3Cdev.flink.apache.org%3E
> [3]
>
> https://lists.apache.org/thread.html/5e07c1be8078dd7b89d93c67b71defacff137f3df56ccf4adb04b4d7@%3Cdev.flink.apache.org%3E
> [4]
>
> https://lists.apache.org/thread.html/d7fd1fe45949f7c706142c62de85d246c7f6a1485a186fd3e9dced01@%3Cdev.flink.apache.org%3E
> [5] https://flink.apache.org/reviewing-prs.html
> [6] https://github.com/apache/flink/pull/6873
>


[jira] [Created] (FLINK-11413) MetricReporter: "metrics.reporters" configuration has to be provided for reporters to be taken into account

2019-01-23 Thread Matthieu Bonneviot (JIRA)
Matthieu Bonneviot created FLINK-11413:
--

 Summary: MetricReporter: "metrics.reporters" configuration has to 
be provided for reporters to be taken into account
 Key: FLINK-11413
 URL: https://issues.apache.org/jira/browse/FLINK-11413
 Project: Flink
  Issue Type: Bug
  Components: Configuration
Affects Versions: 1.7.1
Reporter: Matthieu Bonneviot


When using Java 11, the "metrics.reporters" configuration has to be provided for 
reporters to be taken into account.

The desired behavior:
The MetricRegistryConfiguration looks for a configuration entry like "metrics.reporters = 
foo,bar"; if none is found, all reporters that can be found in the configuration 
will be started.

In the code this is done by
Set<String> includedReporters = 
reporterListPattern.splitAsStream(includedReportersString).collect(Collectors.toSet());
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L134]

Definition of splitAsStream: If this pattern does not match any subsequence of 
the input then the resulting stream has just one element, namely the input 
sequence in string form.
It means reporterListPattern.splitAsStream("") should return "" and so 
includedReporters should have size 1, with "" as its unique element.

However, some versions of Java 8 misbehave here and return an empty stream.
But when running on Java 11, the subsequent check no longer works: if 
(includedReporters.isEmpty() || includedReporters.contains(reporterName))
[https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryConfiguration.java#L145]

I would suggest filtering out the empty string:
Set<String> includedReporters = 
reporterListPattern.splitAsStream(includedReportersString).filter(s -> 
!s.isEmpty()).collect(Collectors.toSet());



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Applying for Flink contributor permission

2019-01-23 Thread Matthieu Bonneviot
Hi

Please provide me contribution permission.
email: matthieu.bonnev...@datadome.co 
apache-username: mbonneviot

Thank you


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-23 Thread Stephan Ewen
There are some points where a leaner approach could help.
There are many libraries and connectors that are currently being added to
Flink, which makes the "include all" approach not completely feasible in
long run:

  - Connectors: For a proper experience with the Shell/CLI (for example for
SQL) we need a lot of fat connector jars.
These come often for multiple versions, which alone accounts for 100s
of MBs of connector jars.
  - The pre-bundled FileSystems are also on the verge of adding 100s of MBs
themselves.
  - The metric reporters are bit by bit growing as well.

The following could be a compromise:

The flink-dist would include
  - the core flink libraries (core, apis, runtime, etc.)
  - yarn / mesos  etc. adapters
  - examples (the examples should be a small set of self-contained programs
without additional dependencies)
  - default logging
  - default metric reporter (jmx)
  - shells (scala, sql)

The flink-dist would NOT include the following libs (and these would be
offered for individual download)
  - Hadoop libs
  - the pre-shaded file systems
  - the pre-packaged SQL connectors
  - additional metric reporters


On Tue, Jan 22, 2019 at 3:19 AM Jeff Zhang  wrote:

> Thanks Chesnay for raising this discussion thread. I think there are 3
> major use scenarios for the Flink binary distribution.
>
> 1. Use it to set up a standalone cluster
> 2. Use it to experience features of Flink, such as via scala-shell,
> sql-client
> 3. Downstream projects use it to integrate with their systems
>
> I did a size estimation of the flink-dist folder: the lib folder takes around 100M
> and the opt folder around 200M. Overall I agree to make a thin flink-dist.
> So the next problem is which components to drop. I checked the opt folder,
> and I think the filesystem components and metrics components could be moved
> out, because they are pluggable components and are only used in scenario 1 I
> think (setting up a standalone cluster). Other components like flink-table,
> flink-ml, flink-gelly, we should still keep IMHO, because new users may
> still use them to try the features of Flink. For me, scala-shell is the first
> option to try new features of Flink.
>
>
>
> On Fri, Jan 18, 2019 at 7:34 PM, Fabian Hueske wrote:
>
>> Hi Chesnay,
>>
>> Thank you for the proposal.
>> I think this is a good idea.
>> We follow a similar approach already for Hadoop dependencies and
>> connectors (although in application space).
>>
>> +1
>>
>> Fabian
>>
>> On Fri, Jan 18, 2019 at 10:59 AM, Chesnay Schepler <
>> ches...@apache.org> wrote:
>>
>>> Hello,
>>>
>>> the binary distribution that we release by now contains quite a lot of
>>> optional components, including various filesystems, metric reporters and
>>> libraries. Most users will only use a fraction of these, and as such
>>> pretty much only increase the size of flink-dist.
>>>
>>> With Flink growing more and more in scope I don't believe it to be
>>> feasible to ship everything we have with every distribution, and instead
>>> suggest more of a "pick-what-you-need" model, where flink-dist is rather
>>> lean and additional components are downloaded separately and added by
>>> the user.
>>>
>>> This would primarily affect the /opt directory, but could also be
>>> extended to cover flink-dist. For example, the yarn and mesos code could
>>> be spliced out into separate jars that could be added to lib manually.
>>>
>>> Let me know what you think.
>>>
>>> Regards,
>>>
>>> Chesnay
>>>
>>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Applying for Flink contributor permission

2019-01-23 Thread Lavkesh Lahngir
Please provide me contribution permission.
email: lavkes...@gmail.com
apache-username: lavkesh

Thank you


[jira] [Created] (FLINK-11412) Remove legacy MesosFlinkResourceManager

2019-01-23 Thread TisonKun (JIRA)
TisonKun created FLINK-11412:


 Summary: Remove legacy MesosFlinkResourceManager
 Key: FLINK-11412
 URL: https://issues.apache.org/jira/browse/FLINK-11412
 Project: Flink
  Issue Type: Sub-task
  Components: Mesos, ResourceManager
Affects Versions: 1.8.0
Reporter: TisonKun
Assignee: TisonKun
 Fix For: 1.8.0


Remove the legacy {{MesosFlinkResourceManager}}, which is not used in production 
code at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] Contributing Alibaba's Blink

2019-01-23 Thread Kurt Young
Thanks @Stephan for this exciting announcement!

From my point of view, I would prefer to use a branch. It makes the message
"Blink is part of Flink" more straightforward and clear.

Except for the location of Blink's code, there are some other questions, like
what version we should use and where we put Blink's documentation.
Currently, we chose "1.5.1-blink-r0" as Blink's version since Blink
forked from Flink 1.5.1. We also added some docs to Blink just as Flink
did. Can Blink use a website like
"https://ci.apache.org/projects/flink/flink-docs-release-1.7/" to put all
of Blink's docs, changed to something like
https://ci.apache.org/projects/flink/flink-docs-blink-r0/ ?

Best,
Kurt


On Wed, Jan 23, 2019 at 10:55 AM Hequn Cheng  wrote:

> Hi all,
>
> @Stephan  Thanks a lot for driving these efforts. I think a lot of people
> are already waiting for this.
> +1 for opening the blink source code.
> Both a separate repository or a special branch is ok for me. Hopefully,
> this will not last too long.
>
> Best, Hequn
>
>
> On Tue, Jan 22, 2019 at 11:35 PM Jark Wu  wrote:
>
> > Great news! Looking forward to the new wave of developments.
> >
> > If Blink needs to be continuously updated, fix bugs, release versions,
> > maybe a separate repository is a better idea.
> >
> > Best,
> > Jark
> >
> > On Tue, 22 Jan 2019 at 18:29, Dominik Wosiński  wrote:
> >
> > > Hey!
> > > I also think that creating the separate branch for Blink in Flink repo
> > is a
> > > better idea than creating the fork as IMHO it will allow merging
> changes
> > > more easily.
> > >
> > > Best Regards,
> > > Dom.
> > >
> > > On Tue, Jan 22, 2019 at 10:09, Ufuk Celebi wrote:
> > >
> > > > Hey Stephan and others,
> > > >
> > > > thanks for the summary. I'm very excited about the outlined
> > improvements.
> > > > :-)
> > > >
> > > > Separate branch vs. fork: I'm fine with either of the suggestions.
> > > > Depending on the expected strategy for merging the changes, expected
> > > > number of additional changes, etc., either one or the other approach
> > > > might be better suited.
> > > >
> > > > – Ufuk
> > > >
> > > > On Tue, Jan 22, 2019 at 9:20 AM Kurt Young  wrote:
> > > > >
> > > > > Hi Driesprong,
> > > > >
> > > > > Glad to hear that you're interested with blink's codes. Actually,
> > blink
> > > > > only has one branch by itself, so either a separated repo or a
> > flink's
> > > > > branch works for blink's code share.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Tue, Jan 22, 2019 at 2:30 PM Driesprong, Fokko
> >  > > >
> > > > > wrote:
> > > > >
> > > > > > Great news Stephan!
> > > > > >
> > > > > > Why not make the code available by having a fork of Flink on
> > > Alibaba's
> > > > > > Github account. This will allow us to do easy diff's in the
> Github
> > UI
> > > > and
> > > > > > create PR's of cherry-picked commits if needed. I can imagine
> that
> > > the
> > > > > > Blink codebase has a lot of branches by itself, so just pushing a
> > > > couple of
> > > > > > branches to the main Flink repo is not ideal. Looking forward to
> > it!
> > > > > >
> > > > > > Cheers, Fokko
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 22, 2019 at 03:48, Shaoxuan Wang <
> > > wshaox...@gmail.com
> > > > >:
> > > > > >
> > > > > > > big +1 to contribute Blink codebase directly into the Apache
> > Flink
> > > > > > project.
> > > > > > > Looking forward to the new journey.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shaoxuan
> > > > > > >
> > > > > > > On Tue, Jan 22, 2019 at 3:52 AM Xiaowei Jiang <
> > xiaow...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > >  Thanks Stephan! We are hoping to make the process as
> > > > non-disruptive as
> > > > > > > > possible to the Flink community. Making the Blink codebase
> > public
> > > > is
> > > > > > the
> > > > > > > > first step that hopefully facilitates further discussions.
> > > > > > > > Xiaowei
> > > > > > > >
> > > > > > > > On Monday, January 21, 2019, 11:46:28 AM PST, Stephan
> Ewen
> > <
> > > > > > > > se...@apache.org> wrote:
> > > > > > > >
> > > > > > > >  Dear Flink Community!
> > > > > > > >
> > > > > > > > Some of you may have heard it already from announcements or
> > from
> > > a
> > > > > > Flink
> > > > > > > > Forward talk:
> > > > > > > > Alibaba has decided to open source its in-house improvements
> to
> > > > Flink,
> > > > > > > > called Blink!
> > > > > > > > First of all, big thanks to team that developed these
> > > improvements
> > > > and
> > > > > > > made
> > > > > > > > this
> > > > > > > > contribution possible!
> > > > > > > >
> > > > > > > > Blink has some very exciting enhancements, most prominently
> on
> > > the
> > > > > > Table
> > > > > > > > API/SQL side
> > > > > > > > and the unified execution of these programs. For batch
> > (bounded)
> > > > data,
> > > > > > > the
> > > > > > > > SQL execution
> > > > > > > > has full TPC-DS coverage (which is a big deal),