[jira] [Created] (FLINK-10427) Port JobSubmitTest to new code base

2018-09-25 Thread tison (JIRA)
tison created FLINK-10427:
-

 Summary: Port JobSubmitTest to new code base
 Key: FLINK-10427
 URL: https://issues.apache.org/jira/browse/FLINK-10427
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: tison
Assignee: tison
 Fix For: 1.7.0


Port {{JobSubmitTest}} to new code base.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Codespeed deployment for Flink

2018-09-25 Thread Peter Huang
It is a great tool. Thanks for the contribution.



Re: Codespeed deployment for Flink

2018-09-25 Thread Jin Sun
Great tool!

> On Sep 24, 2018, at 10:59 PM, Zhijiang(wangzhijiang999) wrote:
> 
> Thanks @Piotr Nowojski and @Nico Kruber for the good job!
> 
> I have already benefited from this benchmark in previous PRs. I hope the
> visualization tool becomes even stronger and benefits the community more!
> 
> Best,
> Zhijiang
> --
> From: Piotr Nowojski
> Sent: Friday, September 21, 2018, 22:59
> To: dev
> Cc: Nico Kruber
> Subject: Codespeed deployment for Flink
> 
> Hello community,
> 
> For almost a year at data Artisans, Nico and I have been maintaining a setup
> that continuously evaluates Flink with benchmarks defined at
> https://github.com/dataArtisans/flink-benchmarks. With growing interest
> and after proving useful a couple of times, we have finally decided to
> publish the web UI layer of this setup. Currently it is accessible via
> the following (maybe not so?) temporary URL:
> 
> http://codespeed.dak8s.net:8000 
> 
> This is a simple web UI to present performance changes over past and
> present commits to Apache Flink. It only has a couple of views and the
> most useful ones are:
> 
> 1. Timeline
> 2. Comparison (I recommend using normalization)
> 
> Timeline is useful for spotting unintended regressions or unexpected
> improvements. It is updated every six hours.
> Comparison is useful for comparing a given branch (for example a pending
> PR) with the master branch. More about that later.
> 
> The codespeed project on its own is just a presentation layer. As
> mentioned before, the only currently available benchmarks are defined in
> the flink-benchmarks repository, and they are executed periodically or on
> demand by Jenkins on a single bare metal machine. The current setup
> limits us to micro benchmarks (they are easier to
> set up/develop/maintain and have a quicker feedback loop than
> cluster benchmarks), but nothing prevents us from setting up
> other kinds of benchmarks and uploading their results to our codespeed
> instance as well.
> 
> Regarding the comparison view: currently data Artisans’ Flink mirror
> repository at https://github.com/dataArtisans/flink is configured to
> trigger benchmark runs on every commit/change that happens on the
> benchmark-request branch (we chose to use dataArtisans' repository here
> because we needed a custom GitHub hook that we couldn’t add to the
> apache/flink repository). Benchmarking usually takes between one and two
> hours. One obvious limitation at the moment is that there is only one
> comparison view, with one comparison branch, so comparing two
> PRs at the same time is impossible. However, we can tackle
> this problem once it becomes a real issue, not only a theoretical one.
> 
> Piotrek & Nico
> 



Re: Enrich testing doc with more unit test examples using AbstractStreamOperator

2018-09-25 Thread Ken Krugler
Hi Tony,

I think this would be great - we’ve been building out tests using 
AbstractStreamOperator, and the lack of documentation has made it challenging.

For example, there was this exchange I had with Piotr about a month ago:

> You made a small mistake when restoring from state using the test harness, one
> that I myself have also made in the past. The problem is the ordering of these calls:
> 
> result.open();
> if (savedState != null) {
>     result.initializeState(savedState);
> }
> 
> open() is supposed to be called after initializeState(), and if you look into the 
> code of AbstractStreamOperatorTestHarness#open, if it is called before 
> initializeState(), it will initialize the harness without any state.
> 
> Unfortunately, this is implicit behaviour that doesn’t throw any error 
> (the test harness is not part of Flink’s public API). I will try to fix this: 
> https://issues.apache.org/jira/browse/FLINK-10159 
> 
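The ordering problem described in that exchange can be sketched with a toy model. Everything below is a hypothetical stand-in, not Flink code: `ToyHarness` only mimics the behaviour Piotr describes (calling open() first silently initializes empty state), and it assumes that a later initializeState() call is then ignored, which the real harness may or may not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for a test harness; NOT Flink's AbstractStreamOperatorTestHarness.
class ToyHarness {
    private List<String> state; // null until the harness has been initialized

    // Restores saved state, but only if nothing has initialized the harness yet.
    // Ignoring a late call is an assumption of this toy model.
    void initializeState(List<String> savedState) {
        if (state == null) {
            state = new ArrayList<>(savedState);
        }
    }

    // Mimics the implicit behaviour: opening an uninitialized harness
    // silently initializes it without any state and throws no error.
    void open() {
        if (state == null) {
            state = new ArrayList<>();
        }
    }

    List<String> currentState() {
        return state;
    }
}

class HarnessOrderingDemo {
    // The ordering from the quoted snippet: open() before initializeState().
    static List<String> restoreWrongOrder(List<String> savedState) {
        ToyHarness harness = new ToyHarness();
        harness.open();
        if (savedState != null) {
            harness.initializeState(savedState);
        }
        return harness.currentState();
    }

    // The intended ordering: initializeState() first, then open().
    static List<String> restoreRightOrder(List<String> savedState) {
        ToyHarness harness = new ToyHarness();
        if (savedState != null) {
            harness.initializeState(savedState);
        }
        harness.open();
        return harness.currentState();
    }

    public static void main(String[] args) {
        List<String> saved = Arrays.asList("a", "b");
        System.out.println(restoreWrongOrder(saved)); // prints []
        System.out.println(restoreRightOrder(saved)); // prints [a, b]
    }
}
```

In this model the wrong ordering loses the saved state without any error, which is why such a mistake is easy to miss in a test.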
— Ken

> On Sep 25, 2018, at 3:30 AM, Tony Wei wrote:
> 
> Hi all,
> 
> It seems that more and more users on the user mailing list are asking how
> to unit test Flink features such as state or timers. The community usually
> suggests using `AbstractStreamOperator` and points to an example from the
> Flink GitHub repo. I have collected some examples and written them down in
> the testing documentation [1], and I would like to contribute this back to
> Flink.
> 
> The reason I am asking on the dev mailing list first is that
> `AbstractStreamOperator` is an internal API and could change at any time.
> I'm not sure whether it is worth providing these examples in the testing
> documentation, so I want to collect some feedback before I open a JIRA
> ticket.
> 
> If this is feasible and valuable, I will open the corresponding JIRA
> ticket and we can discuss in more detail which examples are good to have
> in the document and how to structure the content.
> 
> I would really appreciate any feedback from you. Thanks in advance.
> 
> Best Regards,
> Tony Wei
> 
> [1]
> https://github.com/apache/flink/compare/master...tony810430:flink-testing-doc

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra



Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Tzu-Li Chen
I agree with Chesnay that we cannot guarantee a (quick) review of a PR at the
project level. As the ASF states [1]:

> Please show some patience with the developers if your patch is not
> applied as fast as you'd like or a developer asks you to make changes to
> the patch. If you do not receive any feedback in a reasonable amount of
> time (say a week or two), feel free to send a follow-up e-mail to the
> developer list. Open Source developers are all volunteers, often doing the
> development in their spare time.

However, an open source community should show friendliness to its
contributors, so that contributors can trust that their contributions will be
taken care of, even if that means being rejected with a reason, and project
members are seen as willing to help with the process.

As this thread shows, it is good to see the Flink community trying its best
to help its contributors and committers, and thereby taking advantage of
"open source".

Best,
tison.

[1] http://www.apache.org/dev/contributors#patches




Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Stephan Ewen
Still, even with a group of volunteers coordinating well, it is possible to
do better than we currently do, which is the goal.
No hard guarantees, agreed, but reasonable estimates and rules of thumb
can work well...



Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Chesnay Schepler
There is no guarantee that a PR will be looked at nor is it possible to 
provide this in any way on the project level.


As far as Apache is concerned all contributors/committers etc. work 
voluntarily, and
as such assigning work (which includes ownership if it implies such) or 
similar is simply not feasible.






[jira] [Created] (FLINK-10426) Port TaskTest to new code base

2018-09-25 Thread tison (JIRA)
tison created FLINK-10426:
-

 Summary: Port TaskTest to new code base
 Key: FLINK-10426
 URL: https://issues.apache.org/jira/browse/FLINK-10426
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.7.0
Reporter: tison
Assignee: tison
 Fix For: 1.7.0


Port {{TaskTest}} to new code base





[jira] [Created] (FLINK-10425) taskmanager.host is not respected

2018-09-25 Thread Andrew Kowpak (JIRA)
Andrew Kowpak created FLINK-10425:
-

 Summary: taskmanager.host is not respected
 Key: FLINK-10425
 URL: https://issues.apache.org/jira/browse/FLINK-10425
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 1.6.1
Reporter: Andrew Kowpak


The documentation states that taskmanager.host can be set to override the 
discovered hostname; however, setting this value has no effect.

Looking at the code, the value never seems to be used. Instead, the deprecated 
taskmanager.hostname is still used.
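
As an illustration of the behaviour the documentation implies, a lookup could prefer the new key and fall back to the deprecated one. This is only a minimal sketch over a plain map: the class and method names are hypothetical, and it is not how Flink's configuration classes are implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Flink: resolves the host override from a
// plain key/value configuration.
class HostOptionLookup {
    static final String KEY = "taskmanager.host";
    static final String DEPRECATED_KEY = "taskmanager.hostname";

    // Prefers the documented key and falls back to the deprecated key.
    static Optional<String> resolveHost(Map<String, String> conf) {
        String value = conf.get(KEY);
        if (value == null) {
            value = conf.get(DEPRECATED_KEY);
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("taskmanager.hostname", "old-name");
        conf.put("taskmanager.host", "new-name");
        System.out.println(resolveHost(conf).orElse("<discovered hostname>")); // prints new-name
    }
}
```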





Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Thomas Weise
I think that all discussion/coordination related to a contribution / PR
should be handled through the official project channel.

I would also prefer that there are no designated "owners" and "experts",
for the reasons Fabian mentioned.

Ideally there is no need to have "suggested reviewers" either, but then
what will be the process to ensure that PRs will be looked at?

Thanks,
Thomas





[jira] [Created] (FLINK-10424) Inconsistency between JsonSchemaConverter and FlinkTypeFactory

2018-09-25 Thread Dominik Wosiński (JIRA)
Dominik Wosiński created FLINK-10424:


 Summary: Inconsistency between JsonSchemaConverter and FlinkTypeFactory
 Key: FLINK-10424
 URL: https://issues.apache.org/jira/browse/FLINK-10424
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Dominik Wosiński
Assignee: Dominik Wosiński


There is still an inconsistency between _JsonSchemaConverter_ and 
_FlinkTypeFactory_ when a JSON Schema contains a field of type _integer_. 
_JsonSchemaConverter_ returns BigInteger type information for _integer_, 
but _FlinkTypeFactory_ currently does not support BigInteger type information, 
so an exception is thrown. 

There are two possible ways of solving this issue:
  - allow _BigInteger_ type information in _FlinkTypeFactory_
  - change _JsonSchemaConverter_ so that it returns Integer type information instead.

IMHO, the change should be made in _FlinkTypeFactory_.





[jira] [Created] (FLINK-10423) Forward RocksDB memory metrics to Flink metrics reporter

2018-09-25 Thread Seth Wiesman (JIRA)
Seth Wiesman created FLINK-10423:


 Summary: Forward RocksDB memory metrics to Flink metrics reporter 
 Key: FLINK-10423
 URL: https://issues.apache.org/jira/browse/FLINK-10423
 Project: Flink
  Issue Type: New Feature
  Components: Metrics, State Backends, Checkpointing
Reporter: Seth Wiesman
Assignee: Seth Wiesman


RocksDB exposes a number of metrics at the column family level about current 
memory usage, open memtables, etc., that would be useful to users who want 
greater insight into what RocksDB is doing. This work is inspired heavily by the 
comments on this RocksDB issue thread 
(https://github.com/facebook/rocksdb/issues/3216#issuecomment-348779233)





[jira] [Created] (FLINK-10422) Follow AWS specs in Kinesis Consumer

2018-09-25 Thread eugen yushin (JIRA)
eugen yushin created FLINK-10422:


 Summary: Follow AWS specs in Kinesis Consumer 
 Key: FLINK-10422
 URL: https://issues.apache.org/jira/browse/FLINK-10422
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector
Affects Versions: 1.6.1
Reporter: eugen yushin


*Related conversation in mailing list:*

[https://lists.apache.org/thread.html/96de3bac9761564767cf283b58d664f5ae1b076e0c4431620552af5b@%3Cdev.flink.apache.org%3E]

*Summary:*

The Flink Kinesis consumer checks shard IDs against a particular pattern:
{noformat}
"^shardId-\\d{12}"
{noformat}
[https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/model/StreamShardHandle.java#L132]

While this is in line with the current Kinesis Streams server implementation (all
 streams follow this pattern), it conflicts with the AWS docs:

 
{code:java}
ShardId
  The unique identifier of the shard within the stream.
  Type: String
  Length Constraints: Minimum length of 1. Maximum length of 128.
  Pattern: [a-zA-Z0-9_.-]+
  Required: Yes
{code}
 

[https://docs.aws.amazon.com/kinesis/latest/APIReference/API_Shard.html]

*Intention:*
 We have no guarantees and cannot rely on patterns other than the one given in the AWS
 specification.
 Any custom implementation of a Kinesis mock should rely on the AWS specification, which
 defines ShardId as alphanumeric characters plus "_", "." and "-". The stricter check
 prevents anyone from using Flink with such mocks.
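
The gap between the two patterns can be shown with a small, self-contained check. This is a sketch only: the class and method names are hypothetical, the two regular expressions and the length limits are the ones quoted above, and it assumes the consumer applies its pattern as a full match.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not code from the Flink connector.
class ShardIdPatterns {
    // Pattern quoted in this report as the one the consumer checks.
    static final Pattern FLINK_PATTERN = Pattern.compile("^shardId-\\d{12}");
    // Pattern from the AWS Shard API reference quoted above.
    static final Pattern AWS_PATTERN = Pattern.compile("[a-zA-Z0-9_.-]+");

    static boolean matchesFlinkPattern(String shardId) {
        return FLINK_PATTERN.matcher(shardId).matches();
    }

    // Applies the documented pattern plus the length constraints (1 to 128).
    static boolean matchesAwsSpec(String shardId) {
        return shardId.length() >= 1
                && shardId.length() <= 128
                && AWS_PATTERN.matcher(shardId).matches();
    }

    public static void main(String[] args) {
        String real = "shardId-000000000000";
        String mock = "mock-shard-1"; // made-up ID that a mock could return
        System.out.println(matchesAwsSpec(real) + " " + matchesFlinkPattern(real)); // prints true true
        System.out.println(matchesAwsSpec(mock) + " " + matchesFlinkPattern(mock)); // prints true false
    }
}
```

An ID such as the made-up `mock-shard-1` is valid under the documented pattern but is rejected by the stricter one.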

The reason for using the particular pattern "^shardId-\\d{12}" is to build Flink's
custom shard comparator, filter out already seen shards, and pass the latest shard
to client.listShards, which limits the scope of the RPC call to AWS.

In the meantime, I think we can get rid of this logic altogether. Its current
 usages in the project are:
 - a fix for a Kinesalite bug (I've already opened an issue to cover this:
 [https://github.com/mhart/kinesalite/issues/76] and a PR: 
[https://github.com/mhart/kinesalite/pull/77]). We can move this logic to
 the test code base to keep production code clean for now
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L464]

 - adjusting the last seen shard ID. We can simply omit this because the AWS client
 won't return already seen shards, so we will get only new IDs or nothing.
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/KinesisDataFetcher.java#L475]
 
[https://github.com/apache/flink/blob/50d076ab6ad325907690a2c115ee2cb1c45775c9/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/proxy/KinesisProxy.java#L406]





Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Tzu-Li Chen
Hi Fabian,

You convinced me. I had overlooked the advantages we gain from mailing lists.

Now I am of the same opinion.

Best,
tison.


On Tue, Sep 25, 2018 at 3:01 PM, Fabian Hueske wrote:

> Hi,
>
> I think questions about Flink should be posted on the public mailing lists
> instead of asking just a single expert.
>
> There are many reasons for that:
> * usually more than one person can answer the question (what if the expert
> is not available?)
> * non-committers can join the discussion and contribute to the community
> (how can they become experts otherwise?)
> * the knowledge is shared on the mailing list (helps in cases when only one
> person can answer the question)
>
> Last but not least, my concern is that committers for popular contribution
> areas would be flooded with requests.
> Even without being listed as a "component expert", I cannot handle all
> review requests directed at me.
> I work on issues (PR reviews, my contributions, discussions) that I deem
> important and being constantly pinged does not really help to speed things
> up.
> There are of course cases when it is important to be notified, but IMO
> chances that those get the right attention decrease with the number of
> requests.
>
> Best, Fabian
>
>
>
>
> On Tue, Sep 25, 2018 at 04:10, Tzu-Li Chen <wander4...@gmail.com> wrote:
>
> > Thanks for starting the discussion, Stephan!
> >
> > (1) Do we agree on the five basic steps below?
> > +1 to the five steps and making the third question in the proposal the
> > first.
> >
> > (2) How do we understand that consensus is reached about adding the
> > feature?
> > +1 to lazy consensus with one committer's +1
> >
> > (3) To answer the question whether a PR needs special attention
> >
> > A contributor can ask for special attention, which is treated as a
> > suggestion.
> > A committer can ask for another committer's attention, either for advice or
> > to transfer the right of decision.
> >
> > IMO it would be quite helpful to add a page about "component experts" and
> > attach or link it from the README. This would be really helpful information
> > for new contributors, so that they know whom they can cc or ask for advice.
> > It would also be helpful for those who want to know more about the
> > mechanisms underneath Flink, since they would know whom they can consult.
> >
> > Best,
> > tison.
> >
>


[jira] [Created] (FLINK-10421) Shaded Hadoop S3A end-to-end test failed on Travis

2018-09-25 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-10421:


 Summary: Shaded Hadoop S3A end-to-end test failed on Travis
 Key: FLINK-10421
 URL: https://issues.apache.org/jira/browse/FLINK-10421
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Dawid Wysakowicz


https://api.travis-ci.org/v3/job/432916761/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10420) Create and drop view in sql client should check the view created based on the configuration.

2018-09-25 Thread vinoyang (JIRA)
vinoyang created FLINK-10420:


 Summary: Create and drop view in sql client should check the view 
created based on the configuration.
 Key: FLINK-10420
 URL: https://issues.apache.org/jira/browse/FLINK-10420
 Project: Flink
  Issue Type: Bug
  Components: SQL Client
Reporter: vinoyang
Assignee: vinoyang


Currently, only the views of the current session are checked:
{code:java}
private void callCreateView(SqlCommandCall cmdCall) {
    final String name = cmdCall.operands[0];
    final String query = cmdCall.operands[1];

    // Only the views of the current session are consulted here.
    final String previousQuery = context.getViews().get(name);
    if (previousQuery != null) {
        printExecutionError(CliStrings.MESSAGE_VIEW_ALREADY_EXISTS);
        return;
    }
    // ... (the remainder of the method is not quoted in this ticket)
}
{code}
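
A minimal sketch of the kind of check this ticket asks for, assuming the views declared in the configuration and the views created during the session are available as two separate maps. `ViewCheckSketch`, `configuredViews`, and `sessionViews` are hypothetical names used only for illustration; they are not the SQL Client's actual API, and whether dropping should also remove a configured view is likewise an assumption:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not part of the Flink SQL Client. */
class ViewCheckSketch {
    // Views declared in the environment configuration (assumed representation).
    private final Map<String, String> configuredViews;
    // Views created interactively in the current session.
    private final Map<String, String> sessionViews = new HashMap<>();

    ViewCheckSketch(Map<String, String> configuredViews) {
        this.configuredViews = new HashMap<>(configuredViews);
    }

    /** Returns true if the view was created, false if the name is already taken. */
    boolean createView(String name, String query) {
        // Consult both sources, not only the session.
        if (sessionViews.containsKey(name) || configuredViews.containsKey(name)) {
            return false;
        }
        sessionViews.put(name, query);
        return true;
    }

    /** Returns true if a view with that name existed in either source and was removed. */
    boolean dropView(String name) {
        boolean inSession = sessionViews.remove(name) != null;
        boolean inConfig = configuredViews.remove(name) != null;
        return inSession || inConfig;
    }
}
```

With a configuration that already defines `v1`, `createView("v1", ...)` is rejected even though no view was created in the session.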
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10419) ClassNotFoundException while deserializing user exceptions from checkpointing

2018-09-25 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-10419:
---

 Summary: ClassNotFoundException while deserializing user 
exceptions from checkpointing
 Key: FLINK-10419
 URL: https://issues.apache.org/jira/browse/FLINK-10419
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination, State Backends, Checkpointing
Affects Versions: 1.5.4, 1.6.1, 1.6.0, 1.5.3, 1.5.2, 1.5.1, 1.5.0, 1.7.0
Reporter: Nico Kruber
Assignee: Nico Kruber


If, during asynchronous checkpointing, a user-code exception is thrown, for 
example like this: {{TwoPhaseCommitSinkFunction#snapshotState}} -> 
{{FlinkKafkaProducer011#preCommit}}
-> {{FlinkKafka011Exception}}, it will be sent back to the checkpoint 
coordinator via Java serialization but will then fail during the 
de-serialization because the class is not available. This will result in the 
following error shadowing the real one:
{code}
java.lang.ClassNotFoundException: org.apache.flink.streaming.connectors.kafka.FlinkKafka011Exception
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:76)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1859)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1745)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2033)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:557)
at java.lang.Throwable.readObject(Throwable.java:914)
at sun.reflect.GeneratedMethodAccessor158.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation$MethodInvocation.readObject(RemoteRpcInvocation.java:222)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:502)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:489)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:477)
at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
at org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation.deserializeMethodInvocation(RemoteRpcInvocation.java:118)
at org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation.getMethodName(RemoteRpcInvocation.java:59)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:214)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at 

[jira] [Created] (FLINK-10418) Add COSH math function supported in Table API and SQL

2018-09-25 Thread Aleksei Izmalkin (JIRA)
Aleksei Izmalkin created FLINK-10418:


 Summary: Add COSH math function supported in Table API and SQL
 Key: FLINK-10418
 URL: https://issues.apache.org/jira/browse/FLINK-10418
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Aleksei Izmalkin


Inspired by https://issues.apache.org/jira/browse/FLINK-10398

Refer to [https://www.techonthenet.com/oracle/functions/cosh.php]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Enrich testing doc with more unit test examples using AbstractStreamOperator

2018-09-25 Thread Tony Wei
Hi all,

It seems that more and more users on the user mailing list ask how to unit
test Flink features such as state or timers. The community usually suggests
using `AbstractStreamOperator` and points to an example in the Flink GitHub
repo. I have sorted out some examples and written them down in the testing
documentation [1], and I would like to contribute them back to Flink.

The reason I am asking on the dev mailing list first is that
`AbstractStreamOperator` is an internal API and could change at any time.
I'm not sure whether it is worth providing these examples in the testing
documentation, so I want to collect some feedback before I open a JIRA
ticket.

If this is feasible and valuable, then I will open the corresponding JIRA
ticket and we can discuss
more details of what examples are good to have in the document or how to
structure the content.

I would really appreciate any feedback from you. Thanks in advance.

Best Regards,
Tony Wei

[1]
https://github.com/apache/flink/compare/master...tony810430:flink-testing-doc


[jira] [Created] (FLINK-10417) Add option to throw exception on pattern variable miss with SKIP_TO_FIRST/LAST

2018-09-25 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-10417:


 Summary: Add option to throw exception on pattern variable miss 
with SKIP_TO_FIRST/LAST
 Key: FLINK-10417
 URL: https://issues.apache.org/jira/browse/FLINK-10417
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.7.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10416) Add to rat excludes files generated by jepsen tests

2018-09-25 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-10416:


 Summary: Add to rat excludes files generated by jepsen tests
 Key: FLINK-10416
 URL: https://issues.apache.org/jira/browse/FLINK-10416
 Project: Flink
  Issue Type: Bug
  Components: Build System, Tests
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.7.0


Currently jepsen generates some files that result in rat plugin failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10415) RestClient does not react to lost connection

2018-09-25 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-10415:
-

 Summary: RestClient does not react to lost connection
 Key: FLINK-10415
 URL: https://issues.apache.org/jira/browse/FLINK-10415
 Project: Flink
  Issue Type: Bug
  Components: REST
Affects Versions: 1.5.4, 1.6.1, 1.7.0
Reporter: Till Rohrmann
 Fix For: 1.7.0, 1.6.2, 1.5.5


While working on FLINK-10403, I noticed that Flink's {{RestClient}} does not 
seem to react to lost connections in time. When sending a request to the 
current leader it happened that the leader was killed just after establishing 
the connection. Then the {{RestClient}} did not fail the connection and was 
stuck in writing a request or retrieving a response from the lost leader. I'm 
wondering whether we should introduce a {{ReadTimeoutHandler}} and 
{{WriteTimeoutHandler}} to handle these problems.
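
The failure mode can be illustrated without Netty. The sketch below is only an analogy built on plain `java.net` sockets; it is neither Flink's {{RestClient}} nor the proposed handlers. A local peer accepts the connection and then never answers, and the client's read is aborted only because a read timeout is configured. The names `ReadTimeoutSketch` and `readTimesOut` are made up for this example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical illustration with plain sockets; not Flink's RestClient. */
class ReadTimeoutSketch {

    /**
     * Connects to a local peer that accepts the connection but never responds,
     * then attempts to read. Returns true if the read was aborted by the timeout.
     */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            // Without a read timeout, read() would block for as long as the
            // silent peer keeps the connection open.
            client.setSoTimeout(timeoutMillis);
            client.getInputStream().read();
            return false; // the peer answered or closed the connection
        } catch (SocketTimeoutException e) {
            return true; // no data arrived within the timeout
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A timeout handler in the client's pipeline would play the role that `setSoTimeout` plays here: it turns an indefinitely pending read into an explicit failure that the caller can react to.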



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-10414) Add skip to next strategy

2018-09-25 Thread Dawid Wysakowicz (JIRA)
Dawid Wysakowicz created FLINK-10414:


 Summary: Add skip to next strategy
 Key: FLINK-10414
 URL: https://issues.apache.org/jira/browse/FLINK-10414
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Reporter: Dawid Wysakowicz
Assignee: Dawid Wysakowicz
 Fix For: 1.7.0


Add a skip-to-next strategy that discards all partial matches that started 
with the same element as the found match.
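
To make the intended semantics concrete, here is a minimal sketch, assuming each partial match is represented simply as the ordered list of event ids it has consumed so far. `SkipToNextSketch` and `pruneAfterMatch` are hypothetical names; this is not the CEP library's implementation:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not part of the Flink CEP library. */
class SkipToNextSketch {

    /** Keeps only the partial matches that do not start with the first event of the found match. */
    static List<List<Integer>> pruneAfterMatch(
            List<Integer> foundMatch, List<List<Integer>> partialMatches) {
        List<List<Integer>> kept = new ArrayList<>();
        if (foundMatch.isEmpty()) {
            // Nothing to compare against, so nothing is discarded.
            kept.addAll(partialMatches);
            return kept;
        }
        Integer start = foundMatch.get(0);
        for (List<Integer> partial : partialMatches) {
            // Discard partial matches that began with the same element as the found match.
            if (partial.isEmpty() || !partial.get(0).equals(start)) {
                kept.add(partial);
            }
        }
        return kept;
    }
}
```

With a found match [1, 2, 3] and partial matches [1, 2], [1, 4], and [2, 3], only [2, 3] is kept.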



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] [Contributing] (2) - Review Steps

2018-09-25 Thread Fabian Hueske
Hi,

I think questions about Flink should be posted on the public mailing lists
instead of asking just a single expert.

There are several reasons for that:
* usually more than one person can answer the question (what if the expert
is not available?)
* non-committers can join the discussion and contribute to the community
(how can they become experts otherwise?)
* the knowledge is shared on the mailing list (helps in cases when only one
person can answer the question)

Last but not least, my concern is that committers for popular contribution
areas would be flooded with requests.
Even without being listed as a "component expert", I cannot handle all
review requests directed at me.
I work on issues (PR reviews, my contributions, discussions) that I deem
important and being constantly pinged does not really help to speed things
up.
There are of course cases when it is important to be notified, but IMO
chances that those get the right attention decrease with the number of
requests.

Best, Fabian




On Tue, Sep 25, 2018 at 04:10, Tzu-Li Chen wrote:

> Thanks for starting the discussion, Stephan!
>
> (1) Do we agree on the five basic steps below?
> +1 to the five steps and to making the third question in the proposal the
> first.
>
> (2) How do we understand that consensus is reached about adding the
> feature?
> +1 to lazy consensus with one committer's +1
>
> (3) To answer the question whether a PR needs special attention
>
> A contributor can ask for special attention, which is treated as a
> suggestion.
> A committer can ask for another committer's attention, either for advice
> or to transfer the right of decision.
>
> IMO it would be quite helpful to add a page about "component experts" and
> link to it from the README. This would be really helpful information for
> new contributors, so that they know whom they can cc or ask for advice.
> It would also help those who want to know more about the mechanisms
> underneath Flink, because they would know whom to consult.
>
> Best,
> tison.
>


Re: Codespeed deployment for Flink

2018-09-25 Thread Zhijiang(wangzhijiang999)
Thanks @Piotr Nowojski  and @Nico Kruber for the good job!

I have already benefited from this benchmark in previous PRs. I hope the 
visualization tool becomes even stronger and benefits the community more!

Best,
Zhijiang
--
From: Piotr Nowojski 
Sent: Friday, September 21, 2018 22:59
To: dev 
Cc: Nico Kruber 
Subject: Codespeed deployment for Flink

Hello community,

For almost a year at data Artisans, Nico and I have been maintaining a setup
that continuously evaluates Flink with benchmarks defined at
https://github.com/dataArtisans/flink-benchmarks. With growing interest
and after it proved useful a couple of times, we have finally decided to
publish the web UI layer of this setup. Currently it is accessible via
the following (maybe not so?) temporary URL:

http://codespeed.dak8s.net:8000 

This is a simple web UI to present performance changes over past and
present commits to Apache Flink. It only has a couple of views and the
most useful ones are:

1. Timeline
2. Comparison (I recommend using normalization)

Timeline is useful for spotting unintended regressions or unexpected
improvements. It is being updated every six hours.
Comparison is useful for comparing a given branch (for example a pending
PR) with the master branch. More about that later.

The codespeed project on its own is just a presentation layer. As
mentioned before, the only currently available benchmarks are defined in
the flink-benchmarks repository and they are executed periodically or on
demand by Jenkins on a single bare metal machine. The current setup
limits us to micro benchmarks (they are easier to set up, develop, and
maintain and have a quicker feedback loop compared to cluster benchmarks),
but nothing prevents us from setting up other kinds of benchmarks and
uploading their results to our codespeed instance as well.

Regarding the comparison view: currently data Artisans’ Flink mirror
repository at https://github.com/dataArtisans/flink is configured to
trigger benchmark runs on every commit/change that happens on the
benchmark-request branch (we chose to use dataArtisans' repository here
because we needed a custom GitHub hook that we couldn’t add to the
apache/flink repository). Benchmarking usually takes between one and two
hours. One obvious limitation at the moment is that there is only one
comparison view, with one comparison branch, so comparing two PRs at the
same time is impossible. However, we can tackle this problem once it
becomes a real issue and not only a theoretical one.

Piotrek & Nico