Re: Kafka source stuck while canceling

2015-10-21 Thread Stephan Ewen
The Kafka consumer API has issues with being interruptible; this will
hopefully get better in Kafka 0.8.3.

There must be an additional issue here, though. Does the stack trace
go any deeper than that?
I would assume that the main invokable thread is stuck in some blocking
method, or in a loop that does not terminate. It might also be stuck on a
lock, in which case it would be waiting for the lock holder to terminate.

Do you have the traces from the other threads as well, so we could look at
which one is actually stuck while holding the lock?
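
As a side note, sources can defend against non-interruptible client calls
by polling with a timeout and re-checking a cancellation flag. Below is a
minimal sketch of that pattern (not Flink's actual Kafka consumer code);
pollWithTimeout() is a hypothetical stand-in for a client call that takes
a timeout:

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class PollingSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // never block forever; use a timeout so the flag is re-checked
            String record = pollWithTimeout(100);
            if (record != null) {
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(record);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    // hypothetical stand-in for a client call with a timeout
    private String pollWithTimeout(long timeoutMillis) throws InterruptedException {
        Thread.sleep(timeoutMillis);
        return null;
    }
}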

Greetings,
Stephan


On Mon, Oct 19, 2015 at 12:06 PM, Gyula Fóra  wrote:

> Hey guys,
>
> Has anyone ever got something similar working with the kafka sources?
>
> 11:52:48,838 WARN  org.apache.flink.runtime.taskmanager.Task
>   - Task 'Source: Kafka[***] (3/4)' did not react to cancelling signal,
> but is stuck in method:
>
>  
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:208)
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
> java.lang.Thread.run(Thread.java:745)
>
> The failure was caused by a different operator in the pipeline, but the job
> could never be fully cancelled and restarted due to this error.
>
> Any idea is appreciated :)
>
> Cheers,
> Gyula
>


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Vasiliki Kalavri
Awesome! Thanks Max :))

I have a couple of questions:
- what about the blocker issue (according to the wiki) FLINK-2747?
- weren't we going to get rid of staging altogether?

Cheers,
-V.

On 21 October 2015 at 19:54, Stephan Ewen  wrote:

> Super, thanks Max!
>
> We should also bump the master to the next version then, to separate what
> goes into release fixes and what goes into the next version...
>
> Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I guess...
>
> On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels 
> wrote:
>
> > Release candidates have to be tested thoroughly. Therefore, I would like
> > everybody to take a look at the release page in the wiki:
> > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
> >
> > I've compiled the checks into a document. I would like everyone to assign
> > one of the checks in the documents to test the release candidate:
> >
> >
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
> >
> > On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels 
> > wrote:
> >
> > > Dear community,
> > >
> > > The past months we have been working very hard to push towards 0.10. I
> > > would like to propose the first release candidate.
> > >
> > > ===
> > > Please vote on releasing the following candidate as Apache Flink
> version
> > > 0.10.0:
> > >
> > > The commit to be voted on:
> > > b697064b71b97e51703caae13660038949d41631
> > >
> > > Branch:
> > > release-0.10.0-rc0 (see
> > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> > >
> > > The release artifacts to be voted on can be found at:
> > > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> > >
> > > Release artifacts are signed with the key with fingerprint C2909CBF:
> > > http://www.apache.org/dist/flink/KEYS
> > >
> > > The staging repository for this release can be found at:
> > > https://repository.apache.org/content/repositories/orgapacheflink-1047
> > > -
> > >
> > > Please vote on releasing this package as Apache Flink 0.10.0.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 PMC votes are cast.
> > >
> > > The vote ends on Monday October 26, 2015.
> > >
> > > [ ] +1 Release this package as Apache Flink 0.10.0
> > > [ ] -1 Do not release this package because ...
> > >
> > > ===
> > >
> >
>


Design document for FLINK-2254

2015-10-21 Thread Saumitra Shahapure
In FLINK-2254, we want to extend Gelly's Graph API with support for
bipartite graphs. In the long term, support for further graph types can be
added in the same way. Specialised algorithms for special types of graphs
should also be implemented.

For bipartite graphs, a new class BipartiteGraph can be implemented which
extends Graph. Graph algorithms written for Graph can then be used for
BipartiteGraph too, thanks to inheritance.
Initialisation functions like

  public static <K, VV, EV> Graph<K, VV, EV> fromCollection(
      Collection<Vertex<K, VV>> vertices,
      Collection<Edge<K, EV>> edges,
      ExecutionEnvironment context)

need to be duplicated as

  public static <K, VV, EV> BipartiteGraph<K, VV, EV> fromCollection(
      Collection<Vertex<K, VV>> topVertices,
      Collection<Vertex<K, VV>> bottomVertices,
      Collection<Edge<K, EV>> edges,
      ExecutionEnvironment context)

Graph validation does not need to happen implicitly; the user can call
BipartiteGraph.validate() to check the sanity of the data.
Vertex mode is the primary requirement: BipartiteGraph should have the
functions getTopVertices() and getBottomVertices(). There are three ways to
implement this:
1. The simplest and least efficient way is to maintain the vertices
variable in Graph in addition to two more DataSets, topVertices and
bottomVertices. The benefit of this approach is that the Graph class would
not need any modification at all; this.vertices accesses inside the Graph
class would keep working correctly. The disadvantage is that multiple
copies of the vertex dataset are created and maintained, so this is
inefficient in terms of memory.
2. Another way is to keep topVertices and bottomVertices variables in
BipartiteGraph; the vertices variable in Graph would stay empty. The
getVertices() function of Graph would be overridden by BipartiteGraph and
reimplemented as the union of topVertices and bottomVertices. Since
vertices is a private variable, the external view of the vertices stays
unaffected. All functions of the Graph class which use the vertices
variable (e.g. getVerticesAsTuple2()) would need to use getVertices()
instead of this.vertices. The disadvantage of this method is that the
Graph.vertices variable would hold an invalid value throughout for a
BipartiteGraph.
3. Another way is to 'label' vertices with an integer, giving the signature
public class BipartiteGraph<K, VV, EV> extends Graph<K, Tuple2<VV,
Integer>, EV>. Initialisers would tag the vertices according to their mode.
getVertices() should be overridden to strip off the tag values so that
users get a consistent view of the vertex dataset for Graph and
BipartiteGraph. getTopVertices()/getBottomVertices() would be filter
functions on the vertex tags.
I personally feel method 2 is the most elegant; a sketch follows below.
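
A minimal sketch of method 2, assuming Gelly's Graph<K, VV, EV> API and a
Graph constructor visible to subclasses (which may require relaxing its
visibility); the names are illustrative, not a final design:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.graph.Vertex;

public class BipartiteGraph<K, VV, EV> extends Graph<K, VV, EV> {

    private final DataSet<Vertex<K, VV>> topVertices;
    private final DataSet<Vertex<K, VV>> bottomVertices;

    protected BipartiteGraph(
            DataSet<Vertex<K, VV>> topVertices,
            DataSet<Vertex<K, VV>> bottomVertices,
            DataSet<Edge<K, EV>> edges,
            ExecutionEnvironment context) {
        // the proposal would leave Graph's own vertices unused; the union
        // is passed here only to keep the sketch compiling against Graph
        super(topVertices.union(bottomVertices), edges, context);
        this.topVertices = topVertices;
        this.bottomVertices = bottomVertices;
    }

    public DataSet<Vertex<K, VV>> getTopVertices() {
        return topVertices;
    }

    public DataSet<Vertex<K, VV>> getBottomVertices() {
        return bottomVertices;
    }

    @Override
    public DataSet<Vertex<K, VV>> getVertices() {
        // external view: the union of both partitions
        return topVertices.union(bottomVertices);
    }
}
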
Functions like addVertices(vertexList) are not meaningful without
specifying whether the vertices are top or bottom. A possibility could be
to add them to topVertices by default, and to have an overloaded function
addVertices(vertexList, mode) to add them to a specific partition.
Specialised algorithms for bipartite graphs can implement a new
BipartiteGraphAlgorithm interface, similar to GraphAlgorithm.
In the future, further graph types can be derived from Graph in the same
way, creating a type hierarchy. A class hierarchy would be better here than
an interface hierarchy, because functions which are applicable to all
derived classes can be implemented in the base class.
As part of a future refactoring, graph transformation functions should
become classes implementing GraphAlgorithm rather than methods of the Graph
class. This may make implementing complex graph types easier.
PS: Do I need to add this to the wiki too? I can copy the contents over if
you say so.
-- Saumitra Shahapure



[VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Maximilian Michels
Dear community,

The past months we have been working very hard to push towards 0.10. I
would like to propose the first release candidate.

===
Please vote on releasing the following candidate as Apache Flink version
0.10.0:

The commit to be voted on:
b697064b71b97e51703caae13660038949d41631

Branch:
release-0.10.0-rc0 (see
https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)

The release artifacts to be voted on can be found at:
http://people.apache.org/~mxm/flink-0.10.0-rc0/

Release artifacts are signed with the key with fingerprint C2909CBF:
http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1047
-

Please vote on releasing this package as Apache Flink 0.10.0.

The vote is open for the next 72 hours and passes if a majority of at least
three +1 PMC votes are cast.

The vote ends on Monday October 26, 2015.

[ ] +1 Release this package as Apache Flink 0.10.0
[ ] -1 Do not release this package because ...

===


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Maximilian Michels
Release candidates have to be tested thoroughly. Therefore, I would like
everybody to take a look at the release page in the wiki:
https://cwiki.apache.org/confluence/display/FLINK/0.10+Release

I've compiled the checks into a document. I would like everyone to assign
themselves one of the checks in the document to test the release candidate:
https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing

On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels  wrote:

> Dear community,
>
> The past months we have been working very hard to push towards 0.10. I
> would like to propose the first release candidate.
>
> ===
> Please vote on releasing the following candidate as Apache Flink version
> 0.10.0:
>
> The commit to be voted on:
> b697064b71b97e51703caae13660038949d41631
>
> Branch:
> release-0.10.0-rc0 (see
> https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
>
> The release artifacts to be voted on can be found at:
> http://people.apache.org/~mxm/flink-0.10.0-rc0/
>
> Release artifacts are signed with the key with fingerprint C2909CBF:
> http://www.apache.org/dist/flink/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapacheflink-1047
> -
>
> Please vote on releasing this package as Apache Flink 0.10.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> The vote ends on Monday October 26, 2015.
>
> [ ] +1 Release this package as Apache Flink 0.10.0
> [ ] -1 Do not release this package because ...
>
> ===
>


Re: Powered by Flink

2015-10-21 Thread Anwar Rizal
Nice indeed :-)


On Mon, Oct 19, 2015 at 3:08 PM, Suneel Marthi 
wrote:

> +1 to this.
>
> On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske  wrote:
>
>> Sounds good +1
>>
>> 2015-10-19 14:57 GMT+02:00 Márton Balassi :
>>
>> > Thanks for starting and big +1 for making it more prominent.
>> >
>> > On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske 
>> wrote:
>> >
>> >> Thanks for starting this Kostas.
>> >>
>> >> I think the list is quite hidden in the wiki. Should we link from
>> >> flink.apache.org to that page?
>> >>
>> >> Cheers, Fabian
>> >>
>> >> 2015-10-19 14:50 GMT+02:00 Kostas Tzoumas :
>> >>
>> >>> Hi everyone,
>> >>>
>> >>> I started a "Powered by Flink" wiki page, listing some of the
>> >>> organizations that are using Flink:
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
>> >>>
>> >>> If you would like to be added to the list, just send me a short email
>> >>> with your organization's name and a description and I will add you to
>> the
>> >>> wiki page.
>> >>>
>> >>> Best,
>> >>> Kostas
>> >>>
>> >>
>> >>
>> >
>>
>
>


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Gyula Fóra
Thanks Max for the effort, this is going to be huge :)

Unfortunately I have to say -1

FLINK-2888 and FLINK-2824 are blockers from my point of view.

Cheers,
Gyula

Vasiliki Kalavri  ezt írta (időpont: 2015. okt.
21., Sze, 20:07):

> Awesome! Thanks Max :))
>
> I have a couple of questions:
> - what about the blocker issue (according to the wiki) FLINK-2747?
> - weren't we going to get rid of staging altogether?
>
> Cheers,
> -V.
>
> On 21 October 2015 at 19:54, Stephan Ewen  wrote:
>
> > Super, thanks Max!
> >
> > We should also bump the master to the next version then, to separate what
> > goes into release fixes and what goes into the next version...
> >
> > Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I
> guess...
> >
> > On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels 
> > wrote:
> >
> > > Release candidates have to be tested thoroughly. Therefore, I would
> like
> > > everybody to take a look at the release page in the wiki:
> > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
> > >
> > > I've compiled the checks into a document. I would like everyone to
> assign
> > > one of the checks in the documents to test the release candidate:
> > >
> > >
> >
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
> > >
> > > On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels 
> > > wrote:
> > >
> > > > Dear community,
> > > >
> > > > The past months we have been working very hard to push towards 0.10.
> I
> > > > would like to propose the first release candidate.
> > > >
> > > > ===
> > > > Please vote on releasing the following candidate as Apache Flink
> > version
> > > > 0.10.0:
> > > >
> > > > The commit to be voted on:
> > > > b697064b71b97e51703caae13660038949d41631
> > > >
> > > > Branch:
> > > > release-0.10.0-rc0 (see
> > > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> > > >
> > > > The release artifacts to be voted on can be found at:
> > > > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> > > >
> > > > Release artifacts are signed with the key with fingerprint C2909CBF:
> > > > http://www.apache.org/dist/flink/KEYS
> > > >
> > > > The staging repository for this release can be found at:
> > > >
> https://repository.apache.org/content/repositories/orgapacheflink-1047
> > > > -
> > > >
> > > > Please vote on releasing this package as Apache Flink 0.10.0.
> > > >
> > > > The vote is open for the next 72 hours and passes if a majority of at
> > > > least three +1 PMC votes are cast.
> > > >
> > > > The vote ends on Monday October 26, 2015.
> > > >
> > > > [ ] +1 Release this package as Apache Flink 0.10.0
> > > > [ ] -1 Do not release this package because ...
> > > >
> > > > ===
> > > >
> > >
> >
>


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Flavio Pompermaier
I would also point out that FLINK-2763 and FLINK-2800 could be worthy of
further investigation before this release.

Best,
Flavio
On 21 Oct 2015 23:33, "Gyula Fóra"  wrote:

> Thanks Max for the effort, this is going to be huge :)
>
> Unfortunately I have to say -1
>
> FLINK-2888 and FLINK-2824 are blockers from my point of view.
>
> Cheers,
> Gyula
>
> Vasiliki Kalavri  ezt írta (időpont: 2015. okt.
> 21., Sze, 20:07):
>
> > Awesome! Thanks Max :))
> >
> > I have a couple of questions:
> > - what about the blocker issue (according to the wiki) FLINK-2747?
> > - weren't we going to get rid of staging altogether?
> >
> > Cheers,
> > -V.
> >
> > On 21 October 2015 at 19:54, Stephan Ewen  wrote:
> >
> > > Super, thanks Max!
> > >
> > > We should also bump the master to the next version then, to separate
> what
> > > goes into release fixes and what goes into the next version...
> > >
> > > Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I
> > guess...
> > >
> > > On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels 
> > > wrote:
> > >
> > > > Release candidates have to be tested thoroughly. Therefore, I would
> > like
> > > > everybody to take a look at the release page in the wiki:
> > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
> > > >
> > > > I've compiled the checks into a document. I would like everyone to
> > assign
> > > > one of the checks in the documents to test the release candidate:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
> > > >
> > > > On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels 
> > > > wrote:
> > > >
> > > > > Dear community,
> > > > >
> > > > > The past months we have been working very hard to push towards
> 0.10.
> > I
> > > > > would like to propose the first release candidate.
> > > > >
> > > > > ===
> > > > > Please vote on releasing the following candidate as Apache Flink
> > > version
> > > > > 0.10.0:
> > > > >
> > > > > The commit to be voted on:
> > > > > b697064b71b97e51703caae13660038949d41631
> > > > >
> > > > > Branch:
> > > > > release-0.10.0-rc0 (see
> > > > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> > > > >
> > > > > The release artifacts to be voted on can be found at:
> > > > > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> > > > >
> > > > > Release artifacts are signed with the key with fingerprint
> C2909CBF:
> > > > > http://www.apache.org/dist/flink/KEYS
> > > > >
> > > > > The staging repository for this release can be found at:
> > > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1047
> > > > > -
> > > > >
> > > > > Please vote on releasing this package as Apache Flink 0.10.0.
> > > > >
> > > > > The vote is open for the next 72 hours and passes if a majority of
> at
> > > > > least three +1 PMC votes are cast.
> > > > >
> > > > > The vote ends on Monday October 26, 2015.
> > > > >
> > > > > [ ] +1 Release this package as Apache Flink 0.10.0
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > > ===
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-2888) Default state not copied for AbstractHeapKvState

2015-10-21 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-2888:
-

 Summary: Default state not copied for AbstractHeapKvState
 Key: FLINK-2888
 URL: https://issues.apache.org/jira/browse/FLINK-2888
 Project: Flink
  Issue Type: Bug
Reporter: Gyula Fora
Priority: Blocker


The default state value needs to be copied before being returned for
mutable states; otherwise the map will be populated with the same object.
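
An illustrative sketch of the intended behavior (not AbstractHeapKvState's
actual code, and the class name is made up); it shows how handing out a
serializer copy of the default avoids sharing one mutable object across
keys:

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.api.common.typeutils.TypeSerializer;

public class HeapValueStateSketch<K, V> {

    private final Map<K, V> stateByKey = new HashMap<>();
    private final TypeSerializer<V> serializer;
    private final V defaultValue;
    private K currentKey;

    public HeapValueStateSketch(TypeSerializer<V> serializer, V defaultValue) {
        this.serializer = serializer;
        this.defaultValue = defaultValue;
    }

    public void setCurrentKey(K key) {
        this.currentKey = key;
    }

    public V value() {
        V v = stateByKey.get(currentKey);
        if (v == null) {
            // copy the default before handing it out; returning defaultValue
            // directly would let every key mutate the same object
            v = serializer.copy(defaultValue);
            stateByKey.put(currentKey, v);
        }
        return v;
    }
}
{code}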



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Ufuk Celebi
@testers: I think it's OK to carry over test results for parts not touched
by these fixes to the next RC.

On Wed, Oct 21, 2015 at 11:33 PM, Gyula Fóra  wrote:
>
> FLINK-2888 and FLINK-2824 are blockers from my point of view.
>

Regarding FLINK-2824: from the discussion on the ML [1] I understood that
the plan was to keep it as it is and to not block the release on it. Paris
volunteered to fix the SAMOA connector after the release. Or is this a
different issue?

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201510.mbox/%3c665d8287-8461-4b6a-b11f-4a6617c11...@kth.se%3e


Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Stephan Ewen
From my side, FLINK-2888 is a valid blocker. Aljoscha also found another
blocker bug, so this RC will need a few patches.

I think for FLINK-2824 there was no consensus as to what the desired
behavior actually is, which makes it a bad candidate for a release blocker.

I would try to fix FLINK-2763 and FLINK-2800 if possible, but not block the
release on them. They seem to be very corner-case issues. Good to fix them,
but not blockers. Too many people are on the 0.10 SNAPSHOT right now, and
too many urgent fixes are in that people are waiting to see in a release.

How about we start testing anyway? I would expect us to find more issues,
and we save time if we do not create a new release candidate for each patch.

On Wed, Oct 21, 2015 at 11:44 PM, Flavio Pompermaier 
wrote:

> I would also point out that FLINK-2763 and FLINK-2800 could be worthy of
> further investigation before this release
>
> Best,
> Flavio
> On 21 Oct 2015 23:33, "Gyula Fóra"  wrote:
>
> > Thanks Max for the effort, this is going to be huge :)
> >
> > Unfortunately I have to say -1
> >
> > FLINK-2888 and FLINK-2824 are blockers from my point of view.
> >
> > Cheers,
> > Gyula
> >
> > Vasiliki Kalavri  ezt írta (időpont: 2015.
> okt.
> > 21., Sze, 20:07):
> >
> > > Awesome! Thanks Max :))
> > >
> > > I have a couple of questions:
> > > - what about the blocker issue (according to the wiki) FLINK-2747?
> > > - weren't we going to get rid of staging altogether?
> > >
> > > Cheers,
> > > -V.
> > >
> > > On 21 October 2015 at 19:54, Stephan Ewen  wrote:
> > >
> > > > Super, thanks Max!
> > > >
> > > > We should also bump the master to the next version then, to separate
> > what
> > > > goes into release fixes and what goes into the next version...
> > > >
> > > > Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I
> > > guess...
> > > >
> > > > On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels 
> > > > wrote:
> > > >
> > > > > Release candidates have to be tested thoroughly. Therefore, I would
> > > like
> > > > > everybody to take a look at the release page in the wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
> > > > >
> > > > > I've compiled the checks into a document. I would like everyone to
> > > assign
> > > > > one of the checks in the documents to test the release candidate:
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
> > > > >
> > > > > On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels <
> m...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Dear community,
> > > > > >
> > > > > > The past months we have been working very hard to push towards
> > 0.10.
> > > I
> > > > > > would like to propose the first release candidate.
> > > > > >
> > > > > > ===
> > > > > > Please vote on releasing the following candidate as Apache Flink
> > > > version
> > > > > > 0.10.0:
> > > > > >
> > > > > > The commit to be voted on:
> > > > > > b697064b71b97e51703caae13660038949d41631
> > > > > >
> > > > > > Branch:
> > > > > > release-0.10.0-rc0 (see
> > > > > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> > > > > >
> > > > > > The release artifacts to be voted on can be found at:
> > > > > > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> > > > > >
> > > > > > Release artifacts are signed with the key with fingerprint
> > C2909CBF:
> > > > > > http://www.apache.org/dist/flink/KEYS
> > > > > >
> > > > > > The staging repository for this release can be found at:
> > > > > >
> > > https://repository.apache.org/content/repositories/orgapacheflink-1047
> > > > > > -
> > > > > >
> > > > > > Please vote on releasing this package as Apache Flink 0.10.0.
> > > > > >
> > > > > > The vote is open for the next 72 hours and passes if a majority
> of
> > at
> > > > > > least three +1 PMC votes are cast.
> > > > > >
> > > > > > The vote ends on Monday October 26, 2015.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Flink 0.10.0
> > > > > > [ ] -1 Do not release this package because ...
> > > > > >
> > > > > > ===
> > > > > >
> > > > >
> > > >
> > >
> >
>


RE: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread fhueske
+1 to that, Stephan.

I can help with FLINK-2763 or FLINK-2800. 


From: Stephan Ewen
Sent: Thursday, October 22, 2015 0:02
To: dev@flink.apache.org
Subject: Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)


From my side, FLINK-2888 is a valid blocker. Aljoscha also found another
blocker bug, so this RC will need a few patches.

I think for FLINK-2824 there was no consensus as to what the desired
behavior actually is, which makes it a bad candidate for a release blocker.

I would try to fix FLINK-2763 and FLINK-2800 if possible, but not block the
release on them. They seem to be very corner-case issues. Good to fix them,
but not blockers. Too many people are on the 0.10 SNAPSHOT right now, and
too many urgent fixes are in that people are waiting to see in a release.

How about we start testing anyway? I would expect us to find more issues,
and we save time if we do not create a new release candidate for each patch.

On Wed, Oct 21, 2015 at 11:44 PM, Flavio Pompermaier 
wrote:

> I would also point out that FLINK-2763 and FLINK-2800 could be worthy of
> further investigation before this release
>
> Best,
> Flavio
> On 21 Oct 2015 23:33, "Gyula Fóra"  wrote:
>
> > Thanks Max for the effort, this is going to be huge :)
> >
> > Unfortunately I have to say -1
> >
> > FLINK-2888 and FLINK-2824 are blockers from my point of view.
> >
> > Cheers,
> > Gyula
> >
> > Vasiliki Kalavri  ezt írta (időpont: 2015.
> okt.
> > 21., Sze, 20:07):
> >
> > > Awesome! Thanks Max :))
> > >
> > > I have a couple of questions:
> > > - what about the blocker issue (according to the wiki) FLINK-2747?
> > > - weren't we going to get rid of staging altogether?
> > >
> > > Cheers,
> > > -V.
> > >
> > > On 21 October 2015 at 19:54, Stephan Ewen  wrote:
> > >
> > > > Super, thanks Max!
> > > >
> > > > We should also bump the master to the next version then, to separate
> > what
> > > > goes into release fixes and what goes into the next version...
> > > >
> > > > Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I
> > > guess...
> > > >
> > > > On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels 
> > > > wrote:
> > > >
> > > > > Release candidates have to be tested thoroughly. Therefore, I would
> > > like
> > > > > everybody to take a look at the release page in the wiki:
> > > > > https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
> > > > >
> > > > > I've compiled the checks into a document. I would like everyone to
> > > assign
> > > > > one of the checks in the documents to test the release candidate:
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
> > > > >
> > > > > On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels <
> m...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Dear community,
> > > > > >
> > > > > > The past months we have been working very hard to push towards
> > 0.10.
> > > I
> > > > > > would like to propose the first release candidate.
> > > > > >
> > > > > > ===
> > > > > > Please vote on releasing the following candidate as Apache Flink
> > > > version
> > > > > > 0.10.0:
> > > > > >
> > > > > > The commit to be voted on:
> > > > > > b697064b71b97e51703caae13660038949d41631
> > > > > >
> > > > > > Branch:
> > > > > > release-0.10.0-rc0 (see
> > > > > > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> > > > > >
> > > > > > The release artifacts to be voted on can be found at:
> > > > > > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> > > > > >
> > > > > > Release artifacts are signed with the key with fingerprint
> > C2909CBF:
> > > > > > http://www.apache.org/dist/flink/KEYS
> > > > > >
> > > > > > The staging repository for this release can be found at:
> > > > > >
> > > https://repository.apache.org/content/repositories/orgapacheflink-1047
> > > > > > -
> > > > > >
> > > > > > Please vote on releasing this package as Apache Flink 0.10.0.
> > > > > >
> > > > > > The vote is open for the next 72 hours and passes if a majority
> of
> > at
> > > > > > least three +1 PMC votes are cast.
> > > > > >
> > > > > > The vote ends on Monday October 26, 2015.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache Flink 0.10.0
> > > > > > [ ] -1 Do not release this package because ...
> > > > > >
> > > > > > ===
> > > > > >
> > > > >
> > > >
> > >
> >
>




Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Stephan Ewen
Super, thanks Max!

We should also bump the master to the next version then, to separate what
goes into release fixes and what goes into the next version...

Is that going to be 1.0-SNAPSHOT? ;-) That is a separate thread, I guess...

On Wed, Oct 21, 2015 at 7:12 PM, Maximilian Michels  wrote:

> Release candidates have to be tested thoroughly. Therefore, I would like
> everybody to take a look at the release page in the wiki:
> https://cwiki.apache.org/confluence/display/FLINK/0.10+Release
>
> I've compiled the checks into a document. I would like everyone to assign
> one of the checks in the documents to test the release candidate:
>
> https://docs.google.com/document/d/1TWCFj55xTyJjGYe8x9YEqmICgSvcexDPlbgP4CnLpLY/edit?usp=sharing
>
> On Wed, Oct 21, 2015 at 7:10 PM, Maximilian Michels 
> wrote:
>
> > Dear community,
> >
> > The past months we have been working very hard to push towards 0.10. I
> > would like to propose the first release candidate.
> >
> > ===
> > Please vote on releasing the following candidate as Apache Flink version
> > 0.10.0:
> >
> > The commit to be voted on:
> > b697064b71b97e51703caae13660038949d41631
> >
> > Branch:
> > release-0.10.0-rc0 (see
> > https://git1-us-west.apache.org/repos/asf/flink/?p=flink.git)
> >
> > The release artifacts to be voted on can be found at:
> > http://people.apache.org/~mxm/flink-0.10.0-rc0/
> >
> > Release artifacts are signed with the key with fingerprint C2909CBF:
> > http://www.apache.org/dist/flink/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapacheflink-1047
> > -
> >
> > Please vote on releasing this package as Apache Flink 0.10.0.
> >
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 PMC votes are cast.
> >
> > The vote ends on Monday October 26, 2015.
> >
> > [ ] +1 Release this package as Apache Flink 0.10.0
> > [ ] -1 Do not release this package because ...
> >
> > ===
> >
>


[jira] [Created] (FLINK-2887) sendMessageToAllNeighbors ignores the EdgeDirection

2015-10-21 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2887:


 Summary: sendMessageToAllNeighbors ignores the EdgeDirection
 Key: FLINK-2887
 URL: https://issues.apache.org/jira/browse/FLINK-2887
 Project: Flink
  Issue Type: Bug
  Components: Gelly
Affects Versions: 0.9.0, 0.10
Reporter: Vasia Kalavri
Assignee: Vasia Kalavri


In vertex-centric iterations, while {{getEdges()}} correctly gathers all edges 
when {{EdgeDirection}} is set to ALL, {{sendMessageToAllNeighbors}} only sends 
messages to out-neighbors, no matter what edge direction has been configured.
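
As an illustration of the expected semantics, here is a hedged workaround
sketch, assuming the 0.9/0.10 Gelly vertex-centric API
({{MessagingFunction}}, {{getEdges()}}, {{sendMessageTo()}}): with
{{EdgeDirection.ALL}}, both endpoints can be addressed explicitly instead
of relying on {{sendMessageToAllNeighbors}}.

{code:java}
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Vertex;
import org.apache.flink.graph.spargel.MessagingFunction;

public final class SendToAllNeighbors
        extends MessagingFunction<Long, Double, Double, Double> {

    @Override
    public void sendMessages(Vertex<Long, Double> vertex) throws Exception {
        // with EdgeDirection.ALL, getEdges() returns in- and out-edges
        for (Edge<Long, Double> edge : getEdges()) {
            Long neighbor = edge.getSource().equals(vertex.getId())
                    ? edge.getTarget()
                    : edge.getSource();
            sendMessageTo(neighbor, vertex.getValue());
        }
    }
}
{code}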



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2889) Apply JMH on LongSerializationSpeedBenchmark class

2015-10-21 Thread GaoLun (JIRA)
GaoLun created FLINK-2889:
-

 Summary: Apply JMH on LongSerializationSpeedBenchmark class
 Key: FLINK-2889
 URL: https://issues.apache.org/jira/browse/FLINK-2889
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Reporter: GaoLun
Assignee: GaoLun
Priority: Minor


JMH is a Java harness for building, running, and analysing
nano/micro/milli/macro benchmarks. Use JMH to replace the old
micro-benchmark method in order to get much more accurate results. Modify
the LongSerializationSpeedBenchmark class and move it to the
flink-benchmark module.
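
For orientation, a minimal sketch of the JMH shape such a port could take;
the benchmark body is a stand-in, not Flink's actual LongSerializer code:

{code:java}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LongSerializationSpeedSketch {

    private long[] values;

    @Setup
    public void setup() {
        values = new long[1024];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 0x9E3779B97F4A7C15L;
        }
    }

    @Benchmark
    public long serializeAll() {
        long checksum = 0;
        for (long v : values) {
            // stand-in for serializer.serialize(v, outView)
            checksum += Long.reverseBytes(v);
        }
        return checksum;
    }
}
{code}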



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Using Flink Streaming to write to multiple output files in HDFS

2015-10-21 Thread Andra Lungu
Hey guys,

Long time, no see :). I recently started a new job and it involves
performing a set of real-time data analytics using Apache Kafka, Storm
and Flume.

What happens, on a very high level, is that a set of signals is
collected, stored into a Kafka topic, and then Storm is used to filter
certain fields out or to enrich the fields with other
meta-information. Finally, Flume writes the output into multiple HDFS
files depending on the date, hour, etc.

Now, I saw that Flink can play with a similar pipeline, but without
needing Flume for the writing to HDFS part (see
http://data-artisans.com/kafka-flink-a-practical-how-to/). Which
brings me to my question: how does Flink handle writing to multiple
files in a streaming fashion? (Until now, I was playing with batch, and
writeAsCsv just took one file as a parameter.)

Next question: What are the prerequisites to deploy a Flink Streaming
job on a cluster? Yarn, HDFS, anything else?

Final question, more of a request: I'd like to play around with Flink
Streaming to state whether it can substitute Storm in this use case
and whether it can outrun it :P. To this end, I'll need some starting
points: docs, blog posts, examples to read. Any input would be useful.

I wanted to dig for a newbie task in the streaming area, but I could
not find one... can we think of something easy to get me started?

Thanks! Hope you guys had fun at Flink Forward!
Andra


Re: [DISCUSS] Java code style

2015-10-21 Thread Ufuk Celebi
To summarize up to this point:

- All are in favour of the Google code style (with the following possible
exceptions)
- Proposed exceptions so far:
  * Specific line length: 100 vs. 120 characters
  * Keep tabs instead of converting to spaces (this would translate to
skipping/coming up with some indentation rules as well)

If we keep tabs, we will have to specify the line length relative to a tab
size (like 4).

Let’s keep the discussion going a little longer. I think it has proceeded
in a very reasonable manner so far. Thanks for this!

– Ufuk

On Wed, Oct 21, 2015 at 10:29 AM, Fabian Hueske  wrote:

> Thanks Max for checking the modifications by the Google code style.
> It is very good to know, that the impact on the code base would not be too
> massive. If the Google code style would have touched almost every line, I
> would have been in favor of converting to spaces. However, your assessment
> is a strong argument to continue with tabs, IMO.
>
> Regarding the line length limit, I personally find 100 chars too narrow but
> would be +1 for having a limit.
>
> +1 for discussing the Scala style in a separate thread.
>
> Fabian
>
> 2015-10-20 18:12 GMT+02:00 Maximilian Michels :
>
> > I'm a little less excited about this. You might not be aware but, for
> > a large portion of the source code, we already follow the Google style
> > guide. The main changes will be tabs->spaces and 80/100 characters
> > line limit.
> >
> > Out of curiosity, I ran the official Google Style Checkstyle
> > configuration to confirm my suspicion:
> >
> >
> https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/google_checks.xml
> > The changes are very little if we turn off line length limit and
> > tabs-to-spaces conversion.
> >
> > There are some things I really like about the Google style, e.g. every
> > class has to have a JavaDoc and spaces after keywords (can't stand if
> > there aren't any). I'm not sure if we should change tabs to spaces,
> > because it means touching almost every single line of code. However,
> > if we keep the tabs, we cannot make use of the different indention for
> > case statements or wrapped lines...maybe that's a compromise we can
> > live with.
> >
> > If we introduce the Google Style for Java, will we also impose a
> > stricter style check for Scala? IMHO the line length is the strictest
> > part of the Scala Checkstyle.
> >
> >
> > On Tue, Oct 20, 2015 at 4:14 PM, Henry Saputra 
> > wrote:
> > > 1) yes. Been dancing this issue for a while. Let's pull the trigger.
> Did
> > > the exercise with Tachyon while back and did help readability and
> > > homogeneity of code.
> > >
> > > 2) +1 for Google Java style with documented exceptions and explanation
> on
> > > why.
> > >
> > > On Tuesday, October 20, 2015, Ufuk Celebi  wrote:
> > >
> > >> DISCLAIMER: This is not my personal idea, but a community discussion
> > from
> > >> some time ago. Don't kill the messenger.
> > >>
> > >> In March we were discussing issues with heterogeneity of the code [1].
> > The
> > >> summary is that we had a consensus to enforce a stricter code style on
> > our
> > >> Java code base in order to make it easier to switch between projects
> > and to
> > >> have clear rules for new contributions. The main proposal in the last
> > >> discussion was to go with Google's Java code style. Not all were fully
> > >> satisfied with this, but still everyone agreed on some kind of style.
> > >>
> > >> I think the upcoming 0.10 release is a good point to finally go
> through
> > >> with these changes (right after the release/branch-off).
> > >>
> > >> I propose to go with Google's Java code style [2] as proposed earlier.
> > >>
> > >> PROs:
> > >> - Clear style guide available
> > >> - Tooling like checkstyle rules, IDE plugins already available
> > >>
> > >> CONs:
> > >> - Fully breaks our current style
> > >>
> > >> The main problem with this will be open pull requests, which will be
> > harder
> > >> to merge after all the changes. On the other hand, should pull
> requests
> > >> that have been open for a long time block this? Most of the important
> > >> changes will be merged for the release anyways. I think in the long
> run
> > we
> > >> will gain more than we loose by this (more homogenous code, clear
> > rules).
> > >> And it is questionable whether we will ever be able to do such a
> change
> > in
> > >> the future if we cannot do it now. The project will most likely grow
> and
> > >> attract more contributors, at which point it will be even harder to
> do.
> > >>
> > >> Please make sure to answer the following points in the discussion:
> > >>
> > >> 1) Are you (still) in favour of enforcing stricter rules on the Java
> > >> codebase?
> > >>
> > >> 2) If yes, would you be OK with the Google's Java code style?
> > >>
> > >> – Ufuk
> > >>
> > >> [1]
> > >>
> > >>
> >
> 

Re: [DISCUSS] Java code style

2015-10-21 Thread Gyula Fóra
I think the nice thing about a common code style is that everyone can set
the template in the IDE and use the formatting commands.

Matthias's suggestion makes this practically impossible, so -1 for mixed
tabs/spaces from my side.

Matthias J. Sax  ezt írta (időpont: 2015. okt. 21., Sze,
11:46):

> I actually like tabs a lot, however, in a "mixed" style together with
> spaces. Example:
>
> myVar.callMethod(param1, // many more
> .paramX); // the dots mark space indentation
>
> indenting "paramX" with tabs does not give nice alignment. Not sure if
> this would be a feasible compromise to keep tabs in general, but use
> spaces for cases as above.
>
> If this is no feasible compromise, I would prefer spaces to get the
> correct indentation in examples as above. Even if this results in a
> complete reformatting of the whole code.
>
>
> Why this? Everybody can set this in it's IDE/editor as he/she wishes...
>
> >> If we keep tabs, we will have to specify the line length relative to a
> tab
> >> size (like 4).
>
>
> -Matthias
>
>
>
>
>
> On 10/21/2015 11:06 AM, Ufuk Celebi wrote:
> > To summarize up to this point:
> >
> > - All are in favour of Google check style (with the following possible
> > exceptions)
> > - Proposed exceptions so far:
> >   * Specific line length 100 vs. 120 characters
> >   * Keep tabs instead converting to spaces (this would translate to
> > skipping/coming up with some indentation rules as well)
> >
> > If we keep tabs, we will have to specify the line length relative to a
> tab
> > size (like 4).
> >
> > Let’s keep the discussion going a little longer. I think it has proceeded
> > in a very reasonable manner so far. Thanks for this!
> >
> > – Ufuk
> >
> > On Wed, Oct 21, 2015 at 10:29 AM, Fabian Hueske 
> wrote:
> >
> >> Thanks Max for checking the modifications by the Google code style.
> >> It is very good to know, that the impact on the code base would not be
> too
> >> massive. If the Google code style would have touched almost every line,
> I
> >> would have been in favor of converting to spaces. However, your
> assessment
> >> is a strong argument to continue with tabs, IMO.
> >>
> >> Regarding the line length limit, I personally find 100 chars too narrow
> but
> >> would be +1 for having a limit.
> >>
> >> +1 for discussing the Scala style in a separate thread.
> >>
> >> Fabian
> >>
> >> 2015-10-20 18:12 GMT+02:00 Maximilian Michels :
> >>
> >>> I'm a little less excited about this. You might not be aware but, for
> >>> a large portion of the source code, we already follow the Google style
> >>> guide. The main changes will be tabs->spaces and 80/100 characters
> >>> line limit.
> >>>
> >>> Out of curiosity, I ran the official Google Style Checkstyle
> >>> configuration to confirm my suspicion:
> >>>
> >>>
> >>
> https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/google_checks.xml
> >>> The changes are very little if we turn off line length limit and
> >>> tabs-to-spaces conversion.
> >>>
> >>> There are some things I really like about the Google style, e.g. every
> >>> class has to have a JavaDoc and spaces after keywords (can't stand if
> >>> there aren't any). I'm not sure if we should change tabs to spaces,
> >>> because it means touching almost every single line of code. However,
> >>> if we keep the tabs, we cannot make use of the different indention for
> >>> case statements or wrapped lines...maybe that's a compromise we can
> >>> live with.
> >>>
> >>> If we introduce the Google Style for Java, will we also impose a
> >>> stricter style check for Scala? IMHO the line length is the strictest
> >>> part of the Scala Checkstyle.
> >>>
> >>>
> >>> On Tue, Oct 20, 2015 at 4:14 PM, Henry Saputra <
> henry.sapu...@gmail.com>
> >>> wrote:
>  1) yes. Been dancing this issue for a while. Let's pull the trigger.
> >> Did
>  the exercise with Tachyon while back and did help readability and
>  homogeneity of code.
> 
>  2) +1 for Google Java style with documented exceptions and explanation
> >> on
>  why.
> 
>  On Tuesday, October 20, 2015, Ufuk Celebi  wrote:
> 
> > DISCLAIMER: This is not my personal idea, but a community discussion
> >>> from
> > some time ago. Don't kill the messenger.
> >
> > In March we were discussing issues with heterogeneity of the code
> [1].
> >>> The
> > summary is that we had a consensus to enforce a stricter code style
> on
> >>> our
> > Java code base in order to make it easier to switch between projects
> >>> and to
> > have clear rules for new contributions. The main proposal in the last
> > discussion was to go with Google's Java code style. Not all were
> fully
> > satisfied with this, but still everyone agreed on some kind of style.
> >
> > I think the upcoming 0.10 release is a good point to finally go
> >> through
> > with these changes (right after 

Re: [DISCUSS] Java code style

2015-10-21 Thread Matthias J. Sax
Agreed. That's the reason why I am in favor of using vanilla Google code
style.

On 10/21/2015 12:31 PM, Stephan Ewen wrote:
> We started out originally with mixed tab/spaces, but it ended up with
> people mixing spaces and tabs arbitrarily, and there is little way to
> enforce Matthias' specific suggestion via checkstyle.
> That's why we dropped spaces altogether...
> 
> On Wed, Oct 21, 2015 at 12:03 PM, Gyula Fóra  wrote:
> 
>> I think the nice thing about a common codestyle is that everyone can set
>> the template in the IDE and use the formatting commands.
>>
>> Matthias's suggestion makes this practically impossible so -1 for mixed
>> tabs/spaces from my side.
>>
>> Matthias J. Sax  ezt írta (időpont: 2015. okt. 21., Sze,
>> 11:46):
>>
>>> I actually like tabs a lot, however, in a "mixed" style together with
>>> spaces. Example:
>>>
>>> myVar.callMethod(param1, // many more
>>> .paramX); // the dots mark space indentation
>>>
>>> indenting "paramX" with tabs does not give nice alignment. Not sure if
>>> this would be a feasible compromise to keep tabs in general, but use
>>> spaces for cases as above.
>>>
>>> If this is no feasible compromise, I would prefer spaces to get the
>>> correct indentation in examples as above. Even if this results in a
>>> complete reformatting of the whole code.
>>>
>>>
>>> Why this? Everybody can set this in it's IDE/editor as he/she wishes...
>>>
> If we keep tabs, we will have to specify the line length relative to a
>>> tab
> size (like 4).
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>>
>>>
>>> On 10/21/2015 11:06 AM, Ufuk Celebi wrote:
 To summarize up to this point:

 - All are in favour of Google check style (with the following possible
 exceptions)
 - Proposed exceptions so far:
   * Specific line length 100 vs. 120 characters
   * Keep tabs instead converting to spaces (this would translate to
 skipping/coming up with some indentation rules as well)

 If we keep tabs, we will have to specify the line length relative to a
>>> tab
 size (like 4).

 Let’s keep the discussion going a little longer. I think it has
>> proceeded
 in a very reasonable manner so far. Thanks for this!

 – Ufuk

 On Wed, Oct 21, 2015 at 10:29 AM, Fabian Hueske 
>>> wrote:

> Thanks Max for checking the modifications by the Google code style.
> It is very good to know, that the impact on the code base would not be
>>> too
> massive. If the Google code style would have touched almost every
>> line,
>>> I
> would have been in favor of converting to spaces. However, your
>>> assessment
> is a strong argument to continue with tabs, IMO.
>
> Regarding the line length limit, I personally find 100 chars too
>> narrow
>>> but
> would be +1 for having a limit.
>
> +1 for discussing the Scala style in a separate thread.
>
> Fabian
>
> 2015-10-20 18:12 GMT+02:00 Maximilian Michels :
>
>> I'm a little less excited about this. You might not be aware but, for
>> a large portion of the source code, we already follow the Google
>> style
>> guide. The main changes will be tabs->spaces and 80/100 characters
>> line limit.
>>
>> Out of curiosity, I ran the official Google Style Checkstyle
>> configuration to confirm my suspicion:
>>
>>
>
>>>
>> https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/google_checks.xml
>> The changes are very little if we turn off line length limit and
>> tabs-to-spaces conversion.
>>
>> There are some things I really like about the Google style, e.g.
>> every
>> class has to have a JavaDoc and spaces after keywords (can't stand if
>> there aren't any). I'm not sure if we should change tabs to spaces,
>> because it means touching almost every single line of code. However,
>> if we keep the tabs, we cannot make use of the different indention
>> for
>> case statements or wrapped lines...maybe that's a compromise we can
>> live with.
>>
>> If we introduce the Google Style for Java, will we also impose a
>> stricter style check for Scala? IMHO the line length is the strictest
>> part of the Scala Checkstyle.
>>
>>
>> On Tue, Oct 20, 2015 at 4:14 PM, Henry Saputra <
>>> henry.sapu...@gmail.com>
>> wrote:
>>> 1) yes. Been dancing this issue for a while. Let's pull the trigger.
> Did
>>> the exercise with Tachyon while back and did help readability and
>>> homogeneity of code.
>>>
>>> 2) +1 for Google Java style with documented exceptions and
>> explanation
> on
>>> why.
>>>
>>> On Tuesday, October 20, 2015, Ufuk Celebi  wrote:
>>>
 DISCLAIMER: This is not my personal idea, but a community
>> discussion
>> from
 some time ago. Don't kill the 

Re: Using Flink Streaming to write to multiple output files in HDFS

2015-10-21 Thread Aljoscha Krettek
Hi,
the documentation has a guide about the Streaming API:
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html

This also contains a section about the rolling (HDFS) FileSystem sink:
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#hadoop-filesystem
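
To give a concrete impression, here is a hedged sketch against the 0.10
rolling-sink API described on that page (RollingSink and DateTimeBucketer
from the filesystem connector; the HDFS path and bucket format are
placeholders):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.DateTimeBucketer;
import org.apache.flink.streaming.connectors.fs.RollingSink;

public class RollingSinkExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.socketTextStream("localhost", 9999);

        RollingSink<String> sink = new RollingSink<String>("hdfs:///flink/events");
        // one bucket directory per date and hour, similar to Flume's HDFS sink
        sink.setBucketer(new DateTimeBucketer("yyyy-MM-dd--HH"));

        stream.addSink(sink);
        env.execute("Rolling HDFS sink example");
    }
}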

For blog entries I would suggest these:
 - 
http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
 - 
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
 - http://data-artisans.com/kafka-flink-a-practical-how-to/

I don’t think we have any easy starter issues right now on the Streaming API.
But some might come up in the future. :D

Cheers,
Aljoscha
> On 21 Oct 2015, at 11:40, Andra Lungu  wrote:
> 
> Hey guys,
> 
> Long time, no see :). I recently started a new job and it involves
> performing a set of real-time data analytics using Apache Kafka, Storm
> and Flume.
> 
> What happens, on a very high level, is that a set of signals is
> collected, stored into a Kafka topic, and then Storm is used to filter
> certain fields out or to enrich the fields with other
> meta-information. Finally, Flume writes the output into multiple HDFS
> files depending on the date, hour, etc.
> 
> Now, I saw that Flink can play with a similar pipeline, but without
> needing Flume for the writing to HDFS part (see
> http://data-artisans.com/kafka-flink-a-practical-how-to/). Which
> brings me to my question: how does Flink handle writing to multiple
> files in a streaming fashion? (Until now, I was playing with batch, and
> writeAsCsv just took one file as a parameter.)
> 
> Next question: What are the prerequisites to deploy a Flink Streaming
> job on a cluster? Yarn, HDFS, anything else?
> 
> Final question, more of a request: I'd like to play around with Flink
> Streaming to state whether it can substitute Storm in this use case
> and whether it can outrun it :P. To this end, I'll need some starting
> points: docs, blog posts, examples to read. Any input would be useful.
> 
> I wanted to dig for a newbie task in the streaming area, but I could
> not find one... can we think of something easy to get me started?
> 
> Thanks! Hope you guys had fun at Flink Forward!
> Andra



Re: Using Flink Streaming to write to multiple output files in HDFS

2015-10-21 Thread Fabian Hueske
There are also training slides and programming exercises (incl. reference
solutions) for the DataStream API at

--> http://dataartisans.github.io/flink-training/

Cheers, Fabian

2015-10-21 14:03 GMT+02:00 Aljoscha Krettek :

> Hi,
> the documentation has a guide about the Streaming API:
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html
>
> This also contains a section about the rolling (HDFS) FileSystem sink:
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#hadoop-filesystem
>
> For blog entries I would suggest these:
>  -
> http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
>  -
> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
>  - http://data-artisans.com/kafka-flink-a-practical-how-to/
>
> I don’t think we have any easy starter issues right now on the Streaming
> API. But some might come up in the future. :D
>
> Cheers,
> Aljoscha
> > On 21 Oct 2015, at 11:40, Andra Lungu  wrote:
> >
> > Hey guys,
> >
> > Long time, no see :). I recently started a new job and it involves
> > performing a set of real-time data analytics using Apache Kafka, Storm
> > and Flume.
> >
> > What happens, on a very high level, is that a set of signals is
> > collected, stored into a Kafka topic, and then Storm is used to filter
> > certain fields out or to enrich the fields with other
> > meta-information. Finally, Flume writes the output into multiple HDFS
> > files depending on the date, hour, etc.
> >
> > Now, I saw that Flink can play with a similar pipeline, but without
> > needing Flume for the writing to HDFS part (see
> > http://data-artisans.com/kafka-flink-a-practical-how-to/). Which
> > brings me to my question: how does Flink handle writing to multiple
> > files in a streaming fashion? (Until now, I was playing with batch, and
> > writeAsCsv just took one file as a parameter.)
> >
> > Next question: What are the prerequisites to deploy a Flink Streaming
> > job on a cluster? Yarn, HDFS, anything else?
> >
> > Final question, more of a request: I'd like to play around with Flink
> > Streaming to state whether it can substitute Storm in this use case
> > and whether it can outrun it :P. To this end, I'll need some starting
> > points: docs, blog posts, examples to read. Any input would be useful.
> >
> > I wanted to dig for a newbie task in the streaming area, but I could
> > not find one... can we think of something easy to get me started?
> >
> > Thanks! Hope you guys had fun at Flink Forward!
> > Andra
>
>


Checkpoints keep waiting on source locks

2015-10-21 Thread Gyula Fóra
Hey All,

I think there is some serious issue with the checkpoints. Running a simple
program like this won't complete any checkpoints:

StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
env.enableCheckpointing(5000);

env.generateSequence(1, 100).map(t -> { Thread.sleep(1000); return t; })
.map(t -> t).print();
env.execute();

The job will start executing and triggering checkpoints, but the
triggerCheckpoint method of the StreamTask will be stuck waiting for the
checkpoint lock. It will never take a snapshot...

Any ideas?
This happens on any parallelism, and for other sources as well.

Cheers,
Gyula


Re: Checkpoints keep waiting on source locks

2015-10-21 Thread Stephan Ewen
Hey!

The issue is that checkpoints can only happen in between elements passing
through the pipeline, and you block the pipeline in the sleep() call.
Since the checkpoint lock is not fair, the few cycles during which the
source releases the lock are not enough for the checkpointer to acquire it.

I wonder if this is an artificial corner case or actually an issue. The
solution is theoretically simple: use a fair lock. But we would need to
break the data sources API and switch from "synchronized(Object)" to a fair
"java.util.concurrent.locks.ReentrantLock".

Greetings,
Stephan


On Wed, Oct 21, 2015 at 1:47 PM, Gyula Fóra  wrote:

> Hey All,
>
> I think there is some serious issue with the checkpoints. Running a simple
> program like this won't complete any checkpoints:
>
> StreamExecutionEnvironment env =
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(2);
> env.enableCheckpointing(5000);
>
> env.generateSequence(1, 100).map(t -> { Thread.sleep(1000); return t; })
> .map(t -> t).print();
> env.execute();
>
> The job will start executing and triggering checkpoints, but the
> triggerCheckpoint method of the StreamTask will be stuck waiting for the
> checkpoint lock. It will never take a snapshot...
>
> Any ideas?
> This happens on any parallelism, and for other sources as well.
>
> Cheers,
> Gyula
>


[jira] [Created] (FLINK-2886) NettyClient parses wrong configuration key for number of client threads

2015-10-21 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2886:
--

 Summary: NettyClient parses wrong configuration key for number of 
client threads
 Key: FLINK-2886
 URL: https://issues.apache.org/jira/browse/FLINK-2886
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.10
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Priority: Minor


NettyClient's {{initEpollBootstrap}} uses {{config.getServerNumThreads()}} for 
the number of threads instead of {{getClientNumThreads()}}.
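
A hedged sketch of the presumed one-line fix; the bootstrap code below is
illustrative and not NettyClient's actual source, and {{NettyConfigStub}}
stands in for Flink's NettyConfig accessors named above:

{code:java}
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.epoll.EpollEventLoopGroup;

public class NettyClientFixSketch {

    void initEpollBootstrap(Bootstrap bootstrap, NettyConfigStub config) {
        // before (bug): new EpollEventLoopGroup(config.getServerNumThreads())
        // after (fix): size the client event loop with the client setting
        bootstrap.group(new EpollEventLoopGroup(config.getClientNumThreads()));
    }

    // stand-in for the NettyConfig accessors named in this issue
    interface NettyConfigStub {
        int getServerNumThreads();
        int getClientNumThreads();
    }
}
{code}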



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Checkpoints keep waiting on source locks

2015-10-21 Thread Gyula Fóra
Thanks!
I am personally using Thread.sleep() a lot for flow control in my test
topologies. This might only be me, but it seems to be a pretty annoying
thing when you want to test your streaming jobs.

Stephan Ewen  ezt írta (időpont: 2015. okt. 21., Sze,
13:59):

> Hey!
>
> The issue is that checkpoints can only happen in between elements passing
> through the pipeline, and you block the pipeline in the sleep() call.
> Since the checkpoint lock is not fair, the few cycles during which the
> source releases the lock are not enough for the checkpointer to acquire it.
>
> I wonder if this is an artificial corner case or actually an issue. The
> solution is theoretically simple: use a fair lock. But we would need to
> break the data sources API and switch from "synchronized(Object)" to a fair
> "java.util.concurrent.locks.ReentrantLock".
>
> Greetings,
> Stephan
>
>
> On Wed, Oct 21, 2015 at 1:47 PM, Gyula Fóra  wrote:
>
> > Hey All,
> >
> > I think there is some serious issue with the checkpoints. Running a
> simple
> > program like this won't complete any checkpoints:
> >
> > StreamExecutionEnvironment env =
> > StreamExecutionEnvironment.getExecutionEnvironment();
> > env.setParallelism(2);
> > env.enableCheckpointing(5000);
> >
> > env.generateSequence(1, 100).map(t -> { Thread.sleep(1000); return t; })
> > .map(t -> t).print();
> > env.execute();
> >
> > The job will start executing and triggering checkpoints, but the
> > triggerCheckpoint method of the StreamTask will be stuck waiting for the
> > checkpoint lock. It will never take a snapshot...
> >
> > Any ideas?
> > This happens on any parallelism, and for other sources as well.
> >
> > Cheers,
> > Gyula
> >
>


Re: Scaling Flink

2015-10-21 Thread Till Rohrmann
Hi Greg,

there is no official guide for running Flink on large clusters. As far as I
know, the cluster we used for the matrix factorization was the largest
cluster we've run a serious job on. Thus, it would be highly interesting to
understand what made the JobManager slow down. At some point, though, this
is bound to happen since the JobManager always remains a single instance. Do
you have by chance access to the JobManager log file? This might be helpful.

Thanks for your help,
Till

On Tue, Oct 20, 2015 at 11:06 PM, Greg Hogan  wrote:

> Is there guidance for configuring Flink on large clusters? I have recently
> been working to benchmark and test some algorithms on AWS. I had no issues
> running on a 16-node cluster, but when moving to 64 nodes the JobManager
> struggled mightily. It did not appear to be parallelizing its workload. I was
> in the process of modifying my code to reduce the parallelism of earlier,
> smaller operations when I lost the cluster due to a spot price increase.
>
> The instances were c3.8xlarge and in the larger cluster one instance hosted
> the JobManager so the parallelism was 63 * 32 = 2016. The small cluster had
> parallelism of 512.
>
> I have seen the blog posts describing the performance of 640 core clusters
> on GCE. Is this a known limitation or can Flink scale much further?
>
>
> http://data-artisans.com/computing-recommendations-at-extreme-scale-with-apache-flink/
>
>
> http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/
>
> Thanks,
> Greg
>
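
A sketch of the per-operator parallelism reduction Greg describes, intended to
lighten the single JobManager's deployment and bookkeeping load. This is purely
illustrative: the numbers mirror Greg's setup, the functions are stand-ins, and
it assumes the DataSet API's setParallelism() on individual operators.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class MixedParallelismJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Cluster-wide default, as in Greg's setup: 63 workers * 32 slots.
        env.setParallelism(2016);

        // A cheap early step gains little from 2016 subtasks, but every subtask
        // costs the single JobManager deployment and tracking messages.
        DataSet<Long> prepared = env.generateSequence(1, 1_000_000)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value + 1; // stand-in for light preprocessing
                    }
                })
                .setParallelism(64); // run the small step narrow

        // The heavy step falls back to the full default parallelism of 2016.
        prepared.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long value) {
                return value * value; // stand-in for the expensive work
            }
        }).print();
    }
}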


Re: Scaling Flink

2015-10-21 Thread Maximilian Michels
Hi Greg,

It would be very interesting to profile the JobManager to see where it
spends most of its time. Did you run your experiments with 0.9.X or the
0.10-SNAPSHOT? It would be interesting to know whether there is a
regression.

Best,
Max

On Wed, Oct 21, 2015 at 10:08 AM, Till Rohrmann  wrote:
> Hi Greg,
>
> there is no official guide for running Flink on large clusters. As far as I
> know, the cluster we used for the matrix factorization was the largest
> cluster we've run a serious job on. Thus, it would be highly interesting to
> understand what made the JobManager slow down. At some point this is bound
> to happen, since the JobManager is always a single instance. Do you by
> chance have access to the JobManager log file? It might be helpful.
>
> Thanks for your help,
> Till
>
> On Tue, Oct 20, 2015 at 11:06 PM, Greg Hogan  wrote:
>
>> Is there guidance for configuring Flink on large clusters? I have recently
>> been working to benchmark and test some algorithms on AWS. I had no issues
>> running on a 16-node cluster, but when moving to 64 nodes the JobManager
>> struggled mightily. It did not appear to be parallelizing its workload. I was
>> in the process of modifying my code to reduce the parallelism of earlier,
>> smaller operations when I lost the cluster due to a spot price increase.
>>
>> The instances were c3.8xlarge and in the larger cluster one instance hosted
>> the JobManager so the parallelism was 63 * 32 = 2016. The small cluster had
>> parallelism of 512.
>>
>> I have seen the blog posts describing the performance of 640 core clusters
>> on GCE. Is this a known limitation or can Flink scale much further?
>>
>>
>> http://data-artisans.com/computing-recommendations-at-extreme-scale-with-apache-flink/
>>
>>
>> http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/
>>
>> Thanks,
>> Greg
>>


Re: [DISCUSS] Java code style

2015-10-21 Thread Till Rohrmann
I think the line length limitation and the space indentation are the two
most controversial rules in the Flink community, because so far things have
been done completely differently. They would therefore also cause most of
the changes. However, I think that at least the line length limitation
(largely independent of the actual length) has a positive effect on the
readability of the code. It is usually easier to find the next line if one
doesn't have to do too much horizontal parsing.

I think it would be best to discuss the Scala style guide in a separate
thread. Since no exhaustive official style guide exists for Scala, it will
require more work on our side to come up with one.

On Tue, Oct 20, 2015 at 6:12 PM, Maximilian Michels  wrote:

> I'm a little less excited about this. You might not be aware, but for
> a large portion of the source code we already follow the Google style
> guide. The main changes will be tabs->spaces and the 80/100-character
> line limit.
>
> Out of curiosity, I ran the official Google Style Checkstyle
> configuration to confirm my suspicion:
>
> https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/google_checks.xml
> The changes are very small if we turn off the line length limit and the
> tabs-to-spaces conversion.
>
> There are some things I really like about the Google style, e.g. every
> class has to have a JavaDoc and spaces after keywords (can't stand if
> there aren't any). I'm not sure if we should change tabs to spaces,
> because it means touching almost every single line of code. However,
> if we keep the tabs, we cannot make use of the different indentation for
> case statements or wrapped lines... maybe that's a compromise we can
> live with.
>
> If we introduce the Google Style for Java, will we also impose a
> stricter style check for Scala? IMHO the line length is the strictest
> part of the Scala Checkstyle.
>
>
> On Tue, Oct 20, 2015 at 4:14 PM, Henry Saputra 
> wrote:
> > 1) Yes. Been dancing around this issue for a while. Let's pull the trigger.
> > Did the exercise with Tachyon a while back and it did help readability and
> > homogeneity of the code.
> >
> > 2) +1 for Google Java style with documented exceptions and explanation on
> > why.
> >
> > On Tuesday, October 20, 2015, Ufuk Celebi  wrote:
> >
> >> DISCLAIMER: This is not my personal idea, but a community discussion from
> >> some time ago. Don't kill the messenger.
> >>
> >> In March we were discussing issues with heterogeneity of the code [1]. The
> >> summary is that we had a consensus to enforce a stricter code style on our
> >> Java code base in order to make it easier to switch between projects and to
> >> have clear rules for new contributions. The main proposal in the last
> >> discussion was to go with Google's Java code style. Not all were fully
> >> satisfied with this, but still everyone agreed on some kind of style.
> >>
> >> I think the upcoming 0.10 release is a good point to finally go through
> >> with these changes (right after the release/branch-off).
> >>
> >> I propose to go with Google's Java code style [2] as proposed earlier.
> >>
> >> PROs:
> >> - Clear style guide available
> >> - Tooling like checkstyle rules, IDE plugins already available
> >>
> >> CONs:
> >> - Fully breaks our current style
> >>
> >> The main problem with this will be open pull requests, which will be harder
> >> to merge after all the changes. On the other hand, should pull requests
> >> that have been open for a long time block this? Most of the important
> >> changes will be merged for the release anyway. I think in the long run we
> >> will gain more than we lose by this (more homogeneous code, clear rules).
> >> And it is questionable whether we will ever be able to do such a change in
> >> the future if we cannot do it now. The project will most likely grow and
> >> attract more contributors, at which point it will be even harder to do.
> >>
> >> Please make sure to answer the following points in the discussion:
> >>
> >> 1) Are you (still) in favour of enforcing stricter rules on the Java
> >> codebase?
> >>
> >> 2) If yes, would you be OK with Google's Java code style?
> >>
> >> – Ufuk
> >>
> >> [1]
> >> http://mail-archives.apache.org/mod_mbox/flink-dev/201503.mbox/%3ccanc1h_von0b5omnwzxchtyzwhakeghbzvquyk7s9o2a36b8...@mail.gmail.com%3e
> >>
> >> [2] https://google.github.io/styleguide/javaguide.html
> >>
>
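
Purely as an illustration of the two contested rules, here is a hypothetical
snippet formatted per the Google Java Style referenced above: two-space block
indentation, a +4 continuation indent for wrapped lines, indented case labels,
and a mandatory class-level JavaDoc. All names are made up.

/** Illustrates Google-style indentation rules on an otherwise meaningless class. */
public class StyleExample {

  private static final int WRAP_LIMIT = 100;

  /** Wrapped parameter lists get a four-space continuation indent. */
  public String summarize(
      String jobName, int parallelism, long checkpointInterval, boolean exactlyOnce) {
    switch (parallelism) {
      case 1:
        // Case labels sit one level inside the switch -- one of the indentation
        // patterns that pure tab indentation cannot express, per the discussion above.
        return jobName + " (sequential)";
      default:
        return jobName + " (parallel=" + parallelism + ")";
    }
  }
}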


Re: [DISCUSS] Java code style

2015-10-21 Thread Fabian Hueske
Thanks, Max, for checking the modifications required by the Google code style.
It is very good to know that the impact on the code base would not be too
massive. If the Google code style had touched almost every line, I would have
been in favor of converting to spaces. However, your assessment is a strong
argument to continue with tabs, IMO.

Regarding the line length limit, I personally find 100 chars too narrow but
would be +1 for having a limit.

+1 for discussing the Scala style in a separate thread.

Fabian

2015-10-20 18:12 GMT+02:00 Maximilian Michels :

> I'm a little less excited about this. You might not be aware, but for
> a large portion of the source code we already follow the Google style
> guide. The main changes will be tabs->spaces and the 80/100-character
> line limit.
>
> Out of curiosity, I ran the official Google Style Checkstyle
> configuration to confirm my suspicion:
>
> https://github.com/checkstyle/checkstyle/blob/master/src/main/resources/google_checks.xml
> The changes are very small if we turn off the line length limit and the
> tabs-to-spaces conversion.
>
> There are some things I really like about the Google style, e.g. every
> class has to have a JavaDoc and spaces after keywords (can't stand if
> there aren't any). I'm not sure if we should change tabs to spaces,
> because it means touching almost every single line of code. However,
> if we keep the tabs, we cannot make use of the different indentation for
> case statements or wrapped lines... maybe that's a compromise we can
> live with.
>
> If we introduce the Google Style for Java, will we also impose a
> stricter style check for Scala? IMHO the line length is the strictest
> part of the Scala Checkstyle.
>
>
> On Tue, Oct 20, 2015 at 4:14 PM, Henry Saputra 
> wrote:
> > 1) Yes. Been dancing around this issue for a while. Let's pull the trigger.
> > Did the exercise with Tachyon a while back and it did help readability and
> > homogeneity of the code.
> >
> > 2) +1 for Google Java style with documented exceptions and explanation on
> > why.
> >
> > On Tuesday, October 20, 2015, Ufuk Celebi  wrote:
> >
> >> DISCLAIMER: This is not my personal idea, but a community discussion from
> >> some time ago. Don't kill the messenger.
> >>
> >> In March we were discussing issues with heterogeneity of the code [1]. The
> >> summary is that we had a consensus to enforce a stricter code style on our
> >> Java code base in order to make it easier to switch between projects and to
> >> have clear rules for new contributions. The main proposal in the last
> >> discussion was to go with Google's Java code style. Not all were fully
> >> satisfied with this, but still everyone agreed on some kind of style.
> >>
> >> I think the upcoming 0.10 release is a good point to finally go through
> >> with these changes (right after the release/branch-off).
> >>
> >> I propose to go with Google's Java code style [2] as proposed earlier.
> >>
> >> PROs:
> >> - Clear style guide available
> >> - Tooling like checkstyle rules, IDE plugins already available
> >>
> >> CONs:
> >> - Fully breaks our current style
> >>
> >> The main problem with this will be open pull requests, which will be harder
> >> to merge after all the changes. On the other hand, should pull requests
> >> that have been open for a long time block this? Most of the important
> >> changes will be merged for the release anyway. I think in the long run we
> >> will gain more than we lose by this (more homogeneous code, clear rules).
> >> And it is questionable whether we will ever be able to do such a change in
> >> the future if we cannot do it now. The project will most likely grow and
> >> attract more contributors, at which point it will be even harder to do.
> >>
> >> Please make sure to answer the following points in the discussion:
> >>
> >> 1) Are you (still) in favour of enforcing stricter rules on the Java
> >> codebase?
> >>
> >> 2) If yes, would you be OK with Google's Java code style?
> >>
> >> – Ufuk
> >>
> >> [1]
> >> http://mail-archives.apache.org/mod_mbox/flink-dev/201503.mbox/%3ccanc1h_von0b5omnwzxchtyzwhakeghbzvquyk7s9o2a36b8...@mail.gmail.com%3e
> >>
> >> [2] https://google.github.io/styleguide/javaguide.html
> >>
>


[jira] [Created] (FLINK-2885) Path to Python resources is not constructed correctly

2015-10-21 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-2885:
-

 Summary: Path to Python resources is not constructed correctly
 Key: FLINK-2885
 URL: https://issues.apache.org/jira/browse/FLINK-2885
 Project: Flink
  Issue Type: Bug
  Components: Python API
Affects Versions: 0.9.1, 0.10
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Blocker
 Fix For: 0.9.2, 0.10


The paths set in {{config.sh}} changed, leading to incorrect construction of the 
path to the Python API resources. Executing even the most trivial Python program fails:

{noformat}
org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:315)
at 
org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:582)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:288)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:878)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:920)
Caused by: java.io.FileNotFoundException: File 
/Users/max/Downloads/flin/resources/python does not exist or the user running 
Flink ('max') has insufficient permissions to access it.
at 
org.apache.flink.core.fs.local.LocalFileSystem.getFileStatus(LocalFileSystem.java:107)
at org.apache.flink.runtime.filecache.FileCache.copy(FileCache.java:242)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.prepareFiles(PythonPlanBinder.java:138)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.runPlan(PythonPlanBinder.java:108)
at 
org.apache.flink.languagebinding.api.java.python.PythonPlanBinder.main(PythonPlanBinder.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
... 6 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)