[jira] [Created] (FLINK-6185) Input readers and output writers/formats need to support gzip

2017-03-24 Thread Luke Hutchison (JIRA)
Luke Hutchison created FLINK-6185:
-

 Summary: Input readers and output writers/formats need to support 
gzip
 Key: FLINK-6185
 URL: https://issues.apache.org/jira/browse/FLINK-6185
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.2.0
Reporter: Luke Hutchison
Priority: Minor


File sources (such as {{env#readCsvFile()}}) and sinks (such as 
FileOutputFormat and its subclasses, and methods such as 
{{DataSet#writeAsText()}}) need the ability to transparently decompress and 
compress files. Primarily gzip would be useful, but it would be nice if this 
were pluggable to support bzip2, xz, etc.

There could be options for autodetect (based on file extension and/or file 
content), which could be the default, as well as no compression or a selected 
compression method.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[ANNOUNCE] Flink Forward San Francisco 10-11 Apr 2017 community discount codes

2017-03-24 Thread Robert Metzger
Dear Flink community,

I would like to bring Flink Forward San Francisco to your attention. After
hosting Flink Forward for two years in Berlin, Germany, we decided to bring
it to the US west coast as well.

Check out this very nice video summary from last year's conference in
Berlin: https://www.youtube.com/watch?v=RYoFwQFvC_g

Given the amazing program we have for San Francisco in April, I'm sure
it'll be a similar success: http://sf.flink-forward.org/kb_day/day2/ (with
keynotes from Netflix and Uber and talks from many others)


Since the Flink community is an important part of the success of Flink,
data Artisans decided to offer all community members a 25% discount code: "
*Mailing_FFSF17_**25*".
Feel free to pass this code to your colleagues if they are not subscribed
to the mailing list.


We are also looking for volunteers to help as stage managers, at the
registration and other logistics.
Volunteers will get a free ticket! Please get in touch via:
http://sf.flink-forward.org/be-part-of-flink-forward-san-francisco/

There will also be a colocated Flink ML Hackathon organized by Parallel
Machines: https://www.meetup.com/Parallel-Machines-Meetup/events/238390498/


I'm really looking forward to see you all in San Francisco in April!
Almost all committers employed at data Artisans will come to San Francisco
to hang out, have beers and chat about what's going on in the community
these days.



Regards,
Robert


Re: [VOTE] Release Apache Flink 1.2.1 (RC1)

2017-03-24 Thread Ufuk Celebi
RC1 doesn't contain Stefan's backport for the Asynchronous snapshots
for heap-based keyed state that has been merged. Should we create RC2
with that fix since the voting period only starts on Monday? I think
it would only mean rerunning the scripts on your side, right?

– Ufuk


On Fri, Mar 24, 2017 at 3:05 PM, Robert Metzger  wrote:
> Dear Flink community,
>
> Please vote on releasing the following candidate as Apache Flink version 1.2
> .1.
>
> The commit to be voted on:
> *732e55bd* (*http://git-wip-us.apache.org/repos/asf/flink/commit/732e55bd
> *)
>
> Branch:
> release-1.2.1-rc1
>
> The release artifacts to be voted on can be found at:
> *http://people.apache.org/~rmetzger/flink-1.2.1-rc1/
> *
>
> The release artifacts are signed with the key with fingerprint D9839159:
> http://www.apache.org/dist/flink/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapacheflink-1116
>
> -
>
>
> The vote ends on Wednesday, March 29, 2017, 3pm CET.
>
>
> [ ] +1 Release this package as Apache Flink 1.2.1
> [ ] -1 Do not release this package, because ...


[jira] [Created] (FLINK-6184) Buffer metrics can cause NPE

2017-03-24 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-6184:
---

 Summary: Buffer metrics can cause NPE
 Key: FLINK-6184
 URL: https://issues.apache.org/jira/browse/FLINK-6184
 Project: Flink
  Issue Type: Bug
  Components: Metrics
Affects Versions: 1.3.0
Reporter: Chesnay Schepler
Priority: Blocker
 Fix For: 1.3.0


The Buffer metrics defined in the TaskIOMetricGroup are created when a Task is 
created. At this time, the bufferPool in the Input gates that the metrics make 
use of is still null, leading to possible NPEs.

These metrics should either be created after the required objects are fully 
initialized or guard this case with null checks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6183) TaskMetricGroup may not be cleanup when Task.run() is never called or exits early

2017-03-24 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-6183:
---

 Summary: TaskMetricGroup may not be cleanup when Task.run() is 
never called or exits early
 Key: FLINK-6183
 URL: https://issues.apache.org/jira/browse/FLINK-6183
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.3.0
Reporter: Chesnay Schepler
Priority: Blocker


The TaskMetricGroup is created when a Task is created. It is cleaned up at the 
end of Task.run() in the finally block. If however run() is never called due 
some failure between the creation and the call to run the metric group is never 
closed. This also means that the JobMetricGroup is never closed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Question about flink client

2017-03-24 Thread Till Rohrmann
Hi Yelei,

thanks for investigating the problem and pointing out the problematic
parts. In fact, I recently stumbled across the very same problem in the
JobClientActor and wrote a fix for it. It is already merged into the
master. I hope that this fix solved the problem you've described.

Cheers,
Till

On Wed, Mar 22, 2017 at 3:42 PM, Yelei Feng  wrote:

> Hi,
>
>
> I have two questions about flink client in interactive mode.
>
>
> One is for yarn-session.sh,  once the session CLI can't get cluster stauts
> (jobmanager is down), it will try to shutdown the cluster and cleanup
> related files even if new jobmanager will be created soon. As result,  yarn
> will fail to start a new jobmanager due to missing files on HDFS. As a
> workround, I can config `akka.lookup.timeout` to wait a bit longer,  say 60
> seconds. But I'm wondering if it will affect other components.
>
>
> Second is about flink cli. If cluster is down after submiting job using
> 'flink run xx.jar',  cli hangs there only showing "New JobManager elected.
> Connecting to null " instead of cleanup and close itself.
>
>
> After some digging, I found the main logic is in JobClientActor. It
> receives jobmanager status changes from two sources: zookeeper and akka
> deathwatch. It would terminate itself once receiving message
> `ConnectionTimeout`.
> Client sets current leaderSessionId and unwatch previous jobmanager from
> ZK; it receives `Teminated` of previous jobmanager from akka deathwatch and
> send `ConnectionTimeout` to itself after 60s. In a great chance, they would
> interfere with each other.
>
> Situation1:
>
>   1.  client get notified from zk, set leaderSessionId to null
>   2.  client unwatch previous jobmanager
>   3.  msg `Teminated` of previous jobmanager never got received
>
> Situation 2:
>
>   1.  msg `Teminated` of current jobmanager is received
>   2.  schedule msg ConnectionTimeout after 60s
>   3.  client get notified from zk, set `leaderSessionId` to null in less
> than 60s
>   4.  `ConnectionTimeout` will be filtered out due to different
> `leaderSessionId`
>
>
> Both of the two problems only happen in interactive mode,  not in detached
> mode.  I wonder if it's issues for interactive mode, or we should only use
> detached mode in production environment?
>
>
> Any insight is appreciated.
>
>
> Thanks,
>
> Yelei
>
>


Re: [ANNOUNCE] Apache Flink 1.1.5 Released

2017-03-24 Thread Till Rohrmann
Thanks a lot for your work being the release manager Gordon. Also thanks a
lot to the community for all the bug fixes which went into 1.1.5.

On Fri, Mar 24, 2017 at 3:40 PM, Stephan Ewen  wrote:

> Thanks a lot, Gordon, for being the release manager for that release!
>
>
> On Thu, Mar 23, 2017 at 9:34 AM, Tzu-Li (Gordon) Tai 
> wrote:
>
> > The Apache Flink community is pleased to announce the availability of
> > Flink 1.1.5, which is the next bugfix release for the 1.1 series.
> > The official release announcement: https://flink.apache.org/news/
> > 2017/03/23/release-1.1.5.html
> > Release binaries: http://apache.lauf-forum.at/flink/flink-1.1.5
> > For users of the Flink 1.1 series, please update your Maven dependencies
> > to the new 1.1.5 version and update your binaries.
> > On behalf of the community, I would like to thank everybody who
> > contributed to the release.
> >
> >
>


Re: [ANNOUNCE] Apache Flink 1.1.5 Released

2017-03-24 Thread Stephan Ewen
Thanks a lot, Gordon, for being the release manager for that release!


On Thu, Mar 23, 2017 at 9:34 AM, Tzu-Li (Gordon) Tai 
wrote:

> The Apache Flink community is pleased to announce the availability of
> Flink 1.1.5, which is the next bugfix release for the 1.1 series.
> The official release announcement: https://flink.apache.org/news/
> 2017/03/23/release-1.1.5.html
> Release binaries: http://apache.lauf-forum.at/flink/flink-1.1.5
> For users of the Flink 1.1 series, please update your Maven dependencies
> to the new 1.1.5 version and update your binaries.
> On behalf of the community, I would like to thank everybody who
> contributed to the release.
>
>


[VOTE] Release Apache Flink 1.2.1 (RC1)

2017-03-24 Thread Robert Metzger
Dear Flink community,

Please vote on releasing the following candidate as Apache Flink version 1.2
.1.

The commit to be voted on:
*732e55bd* (*http://git-wip-us.apache.org/repos/asf/flink/commit/732e55bd
*)

Branch:
release-1.2.1-rc1

The release artifacts to be voted on can be found at:
*http://people.apache.org/~rmetzger/flink-1.2.1-rc1/
*

The release artifacts are signed with the key with fingerprint D9839159:
http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1116

-


The vote ends on Wednesday, March 29, 2017, 3pm CET.


[ ] +1 Release this package as Apache Flink 1.2.1
[ ] -1 Do not release this package, because ...


Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

2017-03-24 Thread Pattarawat Chormai
Hi,

Gábor and I plan to investigate the metamorphic call issue further. We will 
implement the idea of combining QuickSort and NormalizedKeySorter together in 
generated code and benchmark the improvement with a Flink’s job that uses 3 
different sorters.

Best,
Pat


> On Mar 23, 2017, at 6:31 PM, Gábor Gévay  wrote:
> 
> Hello,
> 
>> Second, additional classes will turn performance critical callsites 
>> megamorphic.
> 
> Yes, this is a completely valid point, thanks for raising this issue
> Greg. We were planning to have an offline discussion tomorrow with
> Pattarawat about this. We have a few options:
> 1. We could fuse the QuickSort and NormalizedKeySorter into a single
> class when generating the code.
> 2. As you said, duplicating the QuickSort class might also be a nice
> solution, if we can find a simple way to pull it off.
> 3. We might do some benchmarks and determine that the effect of this
> issue is negligible and therefore we can ignore it. But I'm worried
> that this is not the case, so we'll probably have to go with one of
> the above options.
> 
> Best,
> Gábor
> 
> 
> 
> On Thu, Mar 23, 2017 at 6:12 PM, Greg Hogan  wrote:
>> I would be more than happy to shepherd and review this PR.
>> 
>> I have two discussion points. First, a strategy for developing with 
>> templates. IntelliJ has a FreeMarker plugin but we lose formatting and code 
>> completion. To minimize this issue we can retain the untemplated code in an 
>> abstract class which is then concretely subclassed by the template.
>> 
>> Second, additional classes will turn performance critical callsites 
>> megamorphic. Stephan noted this issue in his work on MemorySegment.
>>  http://flink.apache.org/news/2015/09/16/off-heap-memory.html
>> 
>> For example, QuickSort calls IndexedSortable#compare and 
>> IndexedSortable#swap. With multiple compiled implementations of the sorter 
>> template these callsites can no longer be inlined (the same is true with 
>> NormalizedKeySorter and FixedLengthRecordSorter if the latter was 
>> instrumented).
>> 
>> I have not found a way to duplicate a Java class at runtime, but we may be 
>> able to use Janino to compile a class which is then uniquely renamed: each 
>> IndexSortable type would map to a different QuickSort type (same bytecode, 
>> but uniquely optimized). This should also boost performance of runtime 
>> operators calling user defined functions.
>> 
>> Given the code already written, I expect we can refactor, review, and 
>> benchmark for the 1.3 release.
>> 
>> Greg
>> 
>> 
>>> On Mar 21, 2017, at 3:46 PM, Fabian Hueske  wrote:
>>> 
>>> Hi Pat,
>>> 
>>> thanks a lot for this great proposal! I think it is very well structured 
>>> and has the right level of detail.
>>> The improvements of your performance benchmarks look very promising and I 
>>> think code-gen'd sorters would be a very nice improvement.
>>> I like that you plan to add a switch to activate this feature.
>>> 
>>> In order move on, we will need a committer who "champions" your FLIP, 
>>> reviews the pull request, and eventually merges it.
>>> 
>>> @Greg and @Stephan, what do you think about this proposal?
>>> 
>>> Best, Fabian
>>> 
>>> 
>>> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai >> >:
>>> Hi all,
>>> 
>>> I would like to initiate a discussion of applying code generation to 
>>> NormalizedKeySorter. The goal is to improve sorting performance by 
>>> generating suitable NormalizedKeySorter for underlying data. This generated 
>>> sorter will contains only necessary code in important methods, such as swap 
>>> and compare, hence improving sorting performance.
>>> 
>>> Details of the implementation is illustrated at FLIP-18 : Code Generation 
>>> for improving sorting performance. 
>>> 
>>> 
>>> 
>>> Also, because we’re doing it as a course project at TUB, we have completed 
>>> the implementation and made a pull-request 
>>>  to Flink repo already.
>>> 
>>> From our evaluation, we have found that the pull-request reduces sorting 
>>> time around 7-10% and together with FLINK-3722 
>>>  the sorting time is 
>>> decreased by 12-20%.
>>> 
>>> 
>>> 
>>> Please take a look at the document and the pull-request and let me know if 
>>> you have any suggestion.
>>> 
>>> Best,
>>> Pat
>>> 
>> 



Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

2017-03-24 Thread Tzu-Li (Gordon) Tai
Thanks Robert for taking care of this!

On March 24, 2017 at 7:24:59 PM, Robert Metzger (rmetz...@apache.org) wrote:

Hi Gordon,  
I didn't see your request for a release manager.  
I'm volunteering to take this one. Its a little bit easier for me as a PMC  
member to do the actual release in the end.  

On Fri, Mar 24, 2017 at 7:02 AM, Tzu-Li (Gordon) Tai   
wrote:  

> Update for 1.2.1:  
>  
> The last fix was just merged!  
>  
> Since nobody else seems interested in managing 1.2.1, I can also help with  
> this one :)  
> I’ll create the release candidate over the weekend so we can start the  
> testing / voting next Monday.  
>  
> - Gordon  
>  
> On March 22, 2017 at 12:35:25 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)  
> wrote:  
>  
> Sorry, I missed one other pending issue for Flink 1.2.1:  
>  
> - https://issues.apache.org/jira/browse/FLINK-5972  
> Disallow shrinking merging windows. This would replace  
> https://issues.apache.org/jira/browse/FLINK-5713, which was previously  
> listed as a blocker for 1.2.1.  
> Status: PR review pending - https://github.com/apache/flink/pull/3587  
>  
> On March 22, 2017 at 12:23:03 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)  
> wrote:  
>  
> Update for Flink 1.2.1:  
>  
> There’s only one PR pending that is LGTM -  
> https://issues.apache.org/jira/browse/FLINK-6084  
> Fix for Cassandra connector dropping metrics-core dependency.  
>  
> We can proceed to create the release candidate very soon :-)  
> Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can  
> start voting for 1.2.1 tomorrow.  
>  
> Also, we’re still lacking a release manager for 1.2.1. Is anyone  
> interested in volunteering for this release?  
> If nobody steps up for it before tomorrow, I can also do it.  
>  
> Cheers,  
> Gordon  
>  
> On March 18, 2017 at 12:52:48 AM, Robert Metzger (rmetz...@apache.org)  
> wrote:  
>  
> I don't think that his issue should be a reason to hold back a bugfix  
> release.  
> There are workarounds for the problem you are describing. Once we've fixed  
> it, we can include it into the next upcoming bugfix release.  
>  
> On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier   
> wrote:  
>  
> > I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before  
> > issue a release  
> >  
> > On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi  wrote:  
> >  
> > > Cool! Thanks for taking care of this Gordon :-)  
> > >  
> > > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai  
> > >  wrote:  
> > > > Update for 1.1.5:  
> > > > The last fixes for 1.1.5 are in! I will create the RC today and start  
> > > the vote.  
> > > >  
> > > > Cheers,  
> > > > Gordon  
> > > >  
> > > >  
> > > > On March 17, 2017 at 1:14:53 AM, Robert Metzger (rmetz...@apache.org  
> )  
> > > wrote:  
> > > >  
> > > > The cassandra connector is probably not usable in Flink 1.2.0. I  
> would  
> > > like  
> > > > to include a fix in 1.2.1:  
> > > > https://issues.apache.org/jira/browse/FLINK-6084  
> > > >  
> > > > Please let me know if this fix becomes a blocker for the 1.2.1  
> release.  
> > > If  
> > > > so, I can validate the fix myself to speed up things.  
> > > >  
> > > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi   
> > > wrote:  
> > > >  
> > > >> @Tzu-li(Fordon)Tai  
> > > >>  
> > > >> FLINK-5650 is fix by [1]. Chesnay Scheduler push a PR please.  
> > > >>  
> > > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <  
> > > >> https://github.com/zentol/flink/tree/5650_python_test_debug>  
> > > >>  
> > > >>  
> > > >> > 在 2017年3月16日,上午3:37,Stephan Ewen  写道:  
> > > >> >  
> > > >> > Thanks for the update!  
> > > >> >  
> > > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove  
> > scheduled  
> > > >> > cancel-task from timer queue to prevent memory leaks  
> > > >> >  
> > > >> > The remaining issue list looks good, but I would say that (5) is  
> > > >> optional.  
> > > >> > It is not a critical production bug.  
> > > >> >  
> > > >> >  
> > > >> >  
> > > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <  
> > > >> tzuli...@apache.org>  
> > > >> > wrote:  
> > > >> >  
> > > >> >> Thanks a lot for the updates so far everyone!  
> > > >> >>  
> > > >> >> From the discussion so far, the below is the still unfixed  
> pending  
> > > >> issues  
> > > >> >> for 1.1.5 / 1.2.1 release.  
> > > >> >>  
> > > >> >> Since there’s only one backport for 1.1.5 left, I think having an  
> > RC  
> > > for  
> > > >> >> 1.1.5 near the end of this week / early next week is very  
> > promising,  
> > > as  
> > > >> >> basically everything is already in.  
> > > >> >> I’d be happy to volunteer to help manage the release for 1.1.5,  
> and  
> > > >> >> prepare the RC when it’s ready :)  
> > > >> >>  
> > > >> >> For 1.2.1, we can leave the pending list here for 

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

2017-03-24 Thread Robert Metzger
Hi Gordon,
I didn't see your request for a release manager.
I'm volunteering to take this one. Its a little bit easier for me as a PMC
member to do the actual release in the end.

On Fri, Mar 24, 2017 at 7:02 AM, Tzu-Li (Gordon) Tai 
wrote:

> Update for 1.2.1:
>
> The last fix was just merged!
>
> Since nobody else seems interested in managing 1.2.1, I can also help with
> this one :)
> I’ll create the release candidate over the weekend so we can start the
> testing / voting next Monday.
>
> - Gordon
>
> On March 22, 2017 at 12:35:25 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)
> wrote:
>
> Sorry, I missed one other pending issue for Flink 1.2.1:
>
> - https://issues.apache.org/jira/browse/FLINK-5972
> Disallow shrinking merging windows. This would replace
> https://issues.apache.org/jira/browse/FLINK-5713, which was previously
> listed as a blocker for 1.2.1.
> Status: PR review pending - https://github.com/apache/flink/pull/3587
>
> On March 22, 2017 at 12:23:03 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org)
> wrote:
>
> Update for Flink 1.2.1:
>
> There’s only one PR pending that is LGTM -
> https://issues.apache.org/jira/browse/FLINK-6084
> Fix for Cassandra connector dropping metrics-core dependency.
>
> We can proceed to create the release candidate very soon :-)
> Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can
> start voting for 1.2.1 tomorrow.
>
> Also, we’re still lacking a release manager for 1.2.1. Is anyone
> interested in volunteering for this release?
> If nobody steps up for it before tomorrow, I can also do it.
>
> Cheers,
> Gordon
>
> On March 18, 2017 at 12:52:48 AM, Robert Metzger (rmetz...@apache.org)
> wrote:
>
> I don't think that his issue should be a reason to hold back a bugfix
> release.
> There are workarounds for the problem you are describing. Once we've fixed
> it, we can include it into the next upcoming bugfix release.
>
> On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier 
> wrote:
>
> > I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
> > issue a release
> >
> > On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi  wrote:
> >
> > > Cool! Thanks for taking care of this Gordon :-)
> > >
> > > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
> > >  wrote:
> > > > Update for 1.1.5:
> > > > The last fixes for 1.1.5 are in! I will create the RC today and start
> > > the vote.
> > > >
> > > > Cheers,
> > > > Gordon
> > > >
> > > >
> > > > On March 17, 2017 at 1:14:53 AM, Robert Metzger (rmetz...@apache.org
> )
> > > wrote:
> > > >
> > > > The cassandra connector is probably not usable in Flink 1.2.0. I
> would
> > > like
> > > > to include a fix in 1.2.1:
> > > > https://issues.apache.org/jira/browse/FLINK-6084
> > > >
> > > > Please let me know if this fix becomes a blocker for the 1.2.1
> release.
> > > If
> > > > so, I can validate the fix myself to speed up things.
> > > >
> > > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi 
> > > wrote:
> > > >
> > > >> @Tzu-li(Fordon)Tai
> > > >>
> > > >> FLINK-5650 is fix by [1]. Chesnay Scheduler push a PR please.
> > > >>
> > > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <
> > > >> https://github.com/zentol/flink/tree/5650_python_test_debug>
> > > >>
> > > >>
> > > >> > 在 2017年3月16日,上午3:37,Stephan Ewen  写道:
> > > >> >
> > > >> > Thanks for the update!
> > > >> >
> > > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove
> > scheduled
> > > >> > cancel-task from timer queue to prevent memory leaks
> > > >> >
> > > >> > The remaining issue list looks good, but I would say that (5) is
> > > >> optional.
> > > >> > It is not a critical production bug.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
> > > >> tzuli...@apache.org>
> > > >> > wrote:
> > > >> >
> > > >> >> Thanks a lot for the updates so far everyone!
> > > >> >>
> > > >> >> From the discussion so far, the below is the still unfixed
> pending
> > > >> issues
> > > >> >> for 1.1.5 / 1.2.1 release.
> > > >> >>
> > > >> >> Since there’s only one backport for 1.1.5 left, I think having an
> > RC
> > > for
> > > >> >> 1.1.5 near the end of this week / early next week is very
> > promising,
> > > as
> > > >> >> basically everything is already in.
> > > >> >> I’d be happy to volunteer to help manage the release for 1.1.5,
> and
> > > >> >> prepare the RC when it’s ready :)
> > > >> >>
> > > >> >> For 1.2.1, we can leave the pending list here for tracking, and
> > come
> > > >> back
> > > >> >> to update it in the near future.
> > > >> >>
> > > >> >> If there’s anything I missed, please let me know!
> > > >> >>
> > > >> >>
> > > >> >> === Still pending for Flink 1.1.5 ===
> > > >> >>
> > > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > > >> >> Broken at-least-once Kafka producer.
> > > >> >> Status: 

[jira] [Created] (FLINK-6182) Fix possible NPE in SourceStreamTask

2017-03-24 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-6182:
--

 Summary: Fix possible NPE in SourceStreamTask
 Key: FLINK-6182
 URL: https://issues.apache.org/jira/browse/FLINK-6182
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Reporter: Ufuk Celebi
Priority: Minor


If SourceStreamTask is cancelled before being invoked, `headOperator` is not 
set yet, which leads to an NPE. This is not critical but leads to noisy logs.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6181) Zookeeper scripts use invalid regex

2017-03-24 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-6181:
-

 Summary: Zookeeper scripts use invalid regex
 Key: FLINK-6181
 URL: https://issues.apache.org/jira/browse/FLINK-6181
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Robert Metzger


This issue has been reported by a user: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/unable-to-add-more-servers-in-zookeeper-quorum-peers-in-flink-1-2-td12321.html





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6180) Remove TestingSerialRpcService

2017-03-24 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6180:


 Summary: Remove TestingSerialRpcService
 Key: FLINK-6180
 URL: https://issues.apache.org/jira/browse/FLINK-6180
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Affects Versions: 1.3.0
Reporter: Till Rohrmann
 Fix For: 1.3.0


The {{TestingSerialRpcService}} is problematic because it allows execution 
interleavings which are not possible when using the {{AkkaRpcService}}, because 
main thread calls can be executed while another main thread call is still being 
executed. Therefore, we might test things which are not possible and might not 
test certain interleavings which occur when using the {{AkkaRpcService}}.

Therefore, I propose to remove the {{TestingSerialRpcService}} and to refactor 
the existing tests to use the {{AkkaRpcService}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6179) Create SerialMainThreadValidatorUtil to support TestingSerialRpcService

2017-03-24 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6179:


 Summary: Create SerialMainThreadValidatorUtil to support 
TestingSerialRpcService
 Key: FLINK-6179
 URL: https://issues.apache.org/jira/browse/FLINK-6179
 Project: Flink
  Issue Type: Sub-task
  Components: Distributed Coordination
Affects Versions: 1.3.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Priority: Minor


The {{MainThreadValidatorUtil}} does not play well together with the 
{{TestingSerialRpcService}} which executes all rpc call in the same thread. In 
order to still support main thread validation in the {{RpcEndpoint}} methods, I 
porpose to implement a {{SerialMainThreadValidatorUtil}} which not only sets 
the main thread if the current thread is {{null}}, but also if current thread 
is the entering thread. In order to make this properly work with releasing the 
main thread field, we have to count how often we have entered the main thread 
and only set the field to {{null}} iff the count equals 0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6178) Allow upgrades to state serializers

2017-03-24 Thread Tzu-Li (Gordon) Tai (JIRA)
Tzu-Li (Gordon) Tai created FLINK-6178:
--

 Summary: Allow upgrades to state serializers
 Key: FLINK-6178
 URL: https://issues.apache.org/jira/browse/FLINK-6178
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing, Type Serialization System
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


Currently, users are locked in with the serializer implementation used to write 
their state. This is suboptimal, as generally for users, it could easily be 
possible that they wish to change their serialization formats / state schemas 
and types in the future.

This is an umbrella JIRA for the required tasks to make this possible.

Here's an overview description of what to expect for the overall outcome of 
this JIRA (the specific details are outlined in their respective subtasks):

Ideally, the main user-facing change this would result in is that users 
implementing their custom {{TypeSerializer}} s will also need to implement hook 
methods that identify whether or not there is a change to the serialized format 
or even a change to the serialized data type. It would be the user's 
responsibility that the {{deserialize}} method can bridge the change between 
the old / new formats. We can also consider exposing this hook / identification 
only through {{StateDescriptor}} s that are configured with custom serializers.

For Flink's built-in serializers that are automatically built using the user's 
configuration (most notably the more complex {{KryoSerializer}} and 
{{GenericArraySerializer}}), Flink should be able to automatically 
"reconfigure" them using the new configuration, so that the reconfigured 
versions can be used to de- / serialize previous state. This would require 
knowledge of the previous configuration of the serializer, therefore 
"serializer configuration metadata" will be added to savepoints.

Note that for the first version of this, although additional infrastructure 
(e.g. serializer reconfigure hooks, serializer configuration metadata in 
savepoints) will be added to potentially allow Kryo version upgrade, this JIRA 
will not cover this. Kryo has breaking binary formats across major versions, 
and will most likely need some further changes. Therefore, for the 
{{KryoSerializer}}, "upgrading" it simply means changes in the registration of 
specific / default serializers, at least for now.

Finally, we would need to add a "convertState" phase to the task lifecycle, 
that takes place after the "open" phase and before checkpointing starts / the 
task starts running. It can only happen after "open", because only then can we 
be certain if any reconfiguration of state serialization has occurred, and 
state needs to be converted. Ideally, the code for the "convertState" is 
designed so that it can be easily exposed as an offline tool in the future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Flink as a Service (FaaS)

2017-03-24 Thread Chen Qin
Here is a working draft doc, feel free to comment out :)

https://docs.google.com/document/d/1MSsTOz7xUu50dAf_8v3gsQFfJFFy9LKnULdIl26yj0o/edit?usp=sharing

On Thu, Mar 23, 2017 at 5:00 PM, Chen Qin  wrote:

> Quick capture comments on FLINK-6085, we want to have rpc source that
> accept requests from clients and reroute response (callback to
> corresponding rpc source)
>
> ​
>
> On Tue, Mar 21, 2017 at 10:47 PM, Chen Qin  wrote:
>
>> Hi Radu/jinkui,
>>
>> Thanks for your input!
>>
>> I filed a master task to track discussion on this front
>> https://issues.apache.org/jira/browse/FLINK-6085
>>
>> Since this is a very broad topic, I would like to kick start with a tiny
>> deployment helper project.
>> What it try to address is to leverage various of service continuous
>> deployment pipelines in various of companies (amazon/facebook/uber) and
>> deploy/update jobmanager/taskmanagers as a high available micro service
>> (via zk and aws s3)
>>
>> I run this service in prod for a month (2 dc, 2 job managers per dc, 8-64
>> task managers per dc depending on workload) for testing usage.
>> Haven't seen problem so far.
>>
>> https://github.com/chenqin/flink-jar
>>
>> Thanks,
>> Chen
>>
>>
>>
>>
>> On Thu, Mar 16, 2017 at 2:05 AM, Radu Tudoran 
>> wrote:
>>
>>> Hi,
>>>
>>> I propose that we consider also the type of connectivity to be supported
>>> in the Flink API Gateway. I would propose to support a couple of calls
>>> option to ingest also events. I am thinking of:
>>> - callback mechanism
>>> - REST
>>> - RPC
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: Chen Qin [mailto:qinnc...@gmail.com]
>>> Sent: Wednesday, March 15, 2017 7:31 PM
>>> To: dev@flink.apache.org
>>> Subject: Re: Flink as a Service (FaaS)
>>>
>>> Hi jinkui,
>>>
>>> I haven't go down to that deep yet. Sounds like you have better idea
>>> what needs to be in place.
>>> Can you try to come up with a doc and may be draw some diagram so we can
>>> discuss from there?
>>>
>>> My original intention is to discuss general function gap of running lots
>>> of micro services(like thousands of services as I observed). I feel flink
>>> low level has potential to fit in to highly critical services space and do
>>> good job fill those gaps.
>>>
>>>
>>> mobile apps
>>> ---
>>> front end request router
>>> --
>>> service A| service B  | service C
>>> database A |database B| database C
>>> ---
>>>  Flink as a service
>>> 
>>> serviceD | serviceE |service F
>>> database D | database E |database F
>>>
>>> Thanks,
>>> Chen
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Mar 14, 2017 at 12:01 AM, shijinkui 
>>> wrote:
>>>
>>> > Hi, Chen Qin
>>> >
>>> > We also met your end-to-end use case. A RPC Source and Sink such as
>>> > netty source sink can fit such requirements. I’ve submit a natty
>>> > module in bahir-flink project which only a demo.
>>> > If use connector source instead of Kafka, how do we make the data
>>> > persistent? One choice is distributedlog project developed by twitter.
>>> >
>>> > The idea of micro service is very good. Playframework is better choice
>>> > to provide micro-service of Flink instead of Flink Monitor which
>>> > implemented by netty.
>>> > Submit Flink job in the Mesos cluster, at the same time deploy the
>>> > micro-service by marathon to the same Mesos cluster, and enable
>>> > mesos-dns for service discovery.
>>> >
>>> > The the micro-service can be a API Gateway for:
>>> > 1. receiving data from device
>>> > 2. Sending the data to the Flink Job Source(Netty Source with
>>> > distributedlog)
>>> > 3. At same time, the sink send the streaming result data to the API
>>> > Gateway 4. API Gateway support streaming invoke: send the sink result
>>> > data to the device channel
>>> >
>>> > So this plan can guarantee the end-user invoke the service
>>> > synchronized,
>>> > and don’t care about Flink Job’s data processing.
>>> >
>>> > By the way, X as a Service actually is called by SAAS/PAAS in the
>>> > cloud platform, such as AWS/Azure. We can call it Flink micro
>>> > service.:)
>>> >
>>> > Best Regards
>>> > Jinkui Shi
>>> >
>>> > 在 2017/3/14 下午2:13, "Chen Qin"  写入:
>>> >
>>> > >Hi there,
>>> > >
>>> > >I am very happy about Flink 1.2 release. It was much more robust and
>>> > >feature rich compare to previous versions. In the following section,
>>> > >I would like to discuss a non typical use case in flink community.
>>> > >
>>> > >With ever increasing popularity of micro services[1] to scale out
>>> > >popular online services. Various aspect of source of truth is stored
>>> > >(a.k.a
>>> > >partitioned) behind various of service rpc endpoints. There is a
>>> > >general need of managing events traversal and enrichment throughout
>>> > >org SOA systems. (SOA) It is no 

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

2017-03-24 Thread Tzu-Li (Gordon) Tai
Update for 1.2.1:

The last fix was just merged!

Since nobody else seems interested in managing 1.2.1, I can also help with this 
one :)
I’ll create the release candidate over the weekend so we can start the testing 
/ voting next Monday.

- Gordon

On March 22, 2017 at 12:35:25 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) 
wrote:

Sorry, I missed one other pending issue for Flink 1.2.1:

- https://issues.apache.org/jira/browse/FLINK-5972
Disallow shrinking merging windows. This would replace 
https://issues.apache.org/jira/browse/FLINK-5713, which was previously listed 
as a blocker for 1.2.1.
Status: PR review pending - https://github.com/apache/flink/pull/3587

On March 22, 2017 at 12:23:03 AM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) 
wrote:

Update for Flink 1.2.1:

There’s only one PR pending that is LGTM -
https://issues.apache.org/jira/browse/FLINK-6084
Fix for Cassandra connector dropping metrics-core dependency.

We can proceed to create the release candidate very soon :-)
Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can start 
voting for 1.2.1 tomorrow.

Also, we’re still lacking a release manager for 1.2.1. Is anyone interested in 
volunteering for this release?
If nobody steps up for it before tomorrow, I can also do it.

Cheers,
Gordon

On March 18, 2017 at 12:52:48 AM, Robert Metzger (rmetz...@apache.org) wrote:

I don't think that his issue should be a reason to hold back a bugfix
release.
There are workarounds for the problem you are describing. Once we've fixed
it, we can include it into the next upcoming bugfix release.

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier 
wrote:

> I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
> issue a release
>
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi  wrote:
>
> > Cool! Thanks for taking care of this Gordon :-)
> >
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
> >  wrote:
> > > Update for 1.1.5:
> > > The last fixes for 1.1.5 are in! I will create the RC today and start
> > the vote.
> > >
> > > Cheers,
> > > Gordon
> > >
> > >
> > > On March 17, 2017 at 1:14:53 AM, Robert Metzger (rmetz...@apache.org)
> > wrote:
> > >
> > > The cassandra connector is probably not usable in Flink 1.2.0. I would
> > like
> > > to include a fix in 1.2.1:
> > > https://issues.apache.org/jira/browse/FLINK-6084
> > >
> > > Please let me know if this fix becomes a blocker for the 1.2.1 release.
> > If
> > > so, I can validate the fix myself to speed up things.
> > >
> > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi 
> > wrote:
> > >
> > >> @Tzu-li(Fordon)Tai
> > >>
> > >> FLINK-5650 is fix by [1]. Chesnay Scheduler push a PR please.
> > >>
> > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <
> > >> https://github.com/zentol/flink/tree/5650_python_test_debug>
> > >>
> > >>
> > >> > 在 2017年3月16日,上午3:37,Stephan Ewen  写道:
> > >> >
> > >> > Thanks for the update!
> > >> >
> > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove
> scheduled
> > >> > cancel-task from timer queue to prevent memory leaks
> > >> >
> > >> > The remaining issue list looks good, but I would say that (5) is
> > >> optional.
> > >> > It is not a critical production bug.
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
> > >> tzuli...@apache.org>
> > >> > wrote:
> > >> >
> > >> >> Thanks a lot for the updates so far everyone!
> > >> >>
> > >> >> From the discussion so far, the below is the still unfixed pending
> > >> issues
> > >> >> for 1.1.5 / 1.2.1 release.
> > >> >>
> > >> >> Since there’s only one backport for 1.1.5 left, I think having an
> RC
> > for
> > >> >> 1.1.5 near the end of this week / early next week is very
> promising,
> > as
> > >> >> basically everything is already in.
> > >> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
> > >> >> prepare the RC when it’s ready :)
> > >> >>
> > >> >> For 1.2.1, we can leave the pending list here for tracking, and
> come
> > >> back
> > >> >> to update it in the near future.
> > >> >>
> > >> >> If there’s anything I missed, please let me know!
> > >> >>
> > >> >>
> > >> >> === Still pending for Flink 1.1.5 ===
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > >> >> Broken at-least-once Kafka producer.
> > >> >> Status: backport PR pending - https://github.com/apache/
> > flink/pull/3549
> > >> .
> > >> >> Since it is a relatively self-contained change, I expect this to
> be a
> > >> fast
> > >> >> fix.
> > >> >>
> > >> >>
> > >> >>
> > >> >> === Still pending for Flink 1.2.1 ===
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism
> > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in
> > >> progress
> > >> >>
> >