Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Thank you! I will play around with it. On Wed, Jan 21, 2015 at 3:50 PM, Ufuk Celebi u...@apache.org wrote: Hey Gyula, On 21 Jan 2015, at 15:41, Gyula Fóra gyf...@apache.org wrote: Hey Guys, I think it would make sense to turn lazy operator execution off for streaming programs because

Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Felix Neutatz
Hi, my use case is the following: I have a Tuple2<String, Long>. I want to group by the String and sum up the Long values accordingly. This works fine with these lines: DataSet<Lineitem> lineitems = getLineitemDataSet(env); lineitems.project(new int[]{3,0}).groupBy(0).aggregate(Aggregations.SUM,

Re: [flink-streaming] Regarding loops in the Job Graph

2015-01-21 Thread Stephan Ewen
Hi Paris! The Streaming API allows you to define iterations, where parts of the stream are fed back. Do those work for you? In general, cyclic flows are a tricky thing, as the topological order of operators is needed for scheduling (may not be important for continuous streams) but also for a

[jira] [Created] (FLINK-1430) Add test for streaming scala api completeness

2015-01-21 Thread JIRA
Márton Balassi created FLINK-1430: Summary: Add test for streaming scala api completeness Key: FLINK-1430 URL: https://issues.apache.org/jira/browse/FLINK-1430 Project: Flink Issue Type:

Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Stephan Ewen
I think that this is a fairly delicate thing. The execution graph / scheduling is the most delicate part of the system. I would not feel too well about a quick fix there, so let's think this through a little bit. The logic currently does the following: 1) It schedules the sources (see

Re: Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Fabian Hueske
Chesnay is right. Right now, it is not possible to do what you want in a straightforward way because Flink does not support fully sorting a data set (there are several related issues in JIRA). A workaround would be to attach a constant value to each tuple, group on that (all tuples are sent to
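
To make the difference discussed in this thread concrete, here is a minimal sketch in plain Java collections (deliberately not the Flink API). It assumes, as the replies suggest, that first(n) on a grouped and sorted data set keeps the first n elements of every group, and it imitates the constant-key workaround by putting all elements into a single group before sorting. The class and method names (`FirstNDemo`, `firstNPerGroup`, `firstNGlobal`) and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.Comparator;
import java.util.stream.Collectors;

// Illustration only: plain Java, not Flink. All names here are hypothetical.
class FirstNDemo {

    // Sort by the Long value, largest first.
    static final Comparator<Map.Entry<String, Long>> BY_VALUE_DESC =
            Map.Entry.<String, Long>comparingByValue().reversed();

    // Grouped semantics: group by key, sort each group, keep the first n of EVERY group.
    static List<Map.Entry<String, Long>> firstNPerGroup(List<Map.Entry<String, Long>> input, int n) {
        Map<String, List<Map.Entry<String, Long>>> groups = input.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey, TreeMap::new, Collectors.toList()));
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (List<Map.Entry<String, Long>> group : groups.values()) {
            group.stream().sorted(BY_VALUE_DESC).limit(n).forEach(result::add);
        }
        return result;
    }

    // Workaround sketch: group on a constant key so that everything lands in ONE group,
    // then sort that single group and keep its first n elements.
    static List<Map.Entry<String, Long>> firstNGlobal(List<Map.Entry<String, Long>> input, int n) {
        Map<Integer, List<Map.Entry<String, Long>>> single = input.stream()
                .collect(Collectors.groupingBy(e -> 0));
        return single.getOrDefault(0, List.of()).stream()
                .sorted(BY_VALUE_DESC)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> data = List.of(
                Map.entry("a", 3L), Map.entry("a", 1L),
                Map.entry("b", 5L), Map.entry("b", 2L),
                Map.entry("c", 4L));
        System.out.println(firstNPerGroup(data, 1)); // [a=3, b=5, c=4]
        System.out.println(firstNGlobal(data, 2));   // [b=5, c=4]
    }
}
```

Note that a single group means a single consumer of all elements, so this sketch says nothing about how such a workaround would scale in a distributed job.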

Re: Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Chesnay Schepler
If I remember correctly, first() returns the first n values for every group. The Javadocs actually don't make this behaviour very clear. On 21.01.2015 19:18, Felix Neutatz wrote: Hi, my use case is the following: I have a Tuple2<String, Long>. I want to group by the String and sum up the Long

Re: Very strange behaviour of groupBy() - sort() - first()

2015-01-21 Thread Stephan Ewen
Chesnay is right. What you want is a non-grouped sort/first, which would need to be added... Stephan On 21.01.2015 11:25, Chesnay Schepler chesnay.schep...@fu-berlin.de wrote: If I remember correctly, first() returns the first n values for every group. The Javadocs actually don't make this

[jira] [Created] (FLINK-1428) Typos in Java code example for RichGroupReduceFunction

2015-01-21 Thread Felix Neutatz (JIRA)
Felix Neutatz created FLINK-1428: Summary: Typos in Java code example for RichGroupReduceFunction Key: FLINK-1428 URL: https://issues.apache.org/jira/browse/FLINK-1428 Project: Flink Issue

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Henry Saputra
Would it be better to use Github Jenkins plugin [1] to connect to ASF Jenkins cluster? [1] https://wiki.jenkins-ci.org/display/JENKINS/GitHub+pull+request+builder+plugin [2] http://events.linuxfoundation.org/sites/events/files/slides/Jenkins_at_ASF_2014.pdf On Tue, Jan 20, 2015 at 2:57 PM,

Re: Implementing a list accumulator

2015-01-21 Thread Stephan Ewen
True, that is tricky. The user code does not necessarily respect the non-reuse mode. That may be true for any user code. Can the list accumulator immediately serialize the objects and send over a byte array? That should fix it reliably without adding overhead (serialization will happen anyways).
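
The idea of serializing each object as soon as it is added can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it uses standard `java.io` serialization rather than Flink's own serializers, and the class `SerializingListAccumulator` with its `add` and `values` methods is hypothetical, not Flink's accumulator interface. The point it shows is that a byte-array snapshot is unaffected when user code later mutates and reuses the same object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: stores a serialized snapshot of every added value.
class SerializingListAccumulator<T extends Serializable> {

    private final List<byte[]> serialized = new ArrayList<>();

    // Serialize immediately, so later mutation of 'value' cannot change what was recorded.
    public void add(T value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            serialized.add(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize the snapshots, for example on the receiving side.
    @SuppressWarnings("unchecked")
    public List<T> values() {
        List<T> result = new ArrayList<>();
        for (byte[] snapshot : serialized) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
                result.add((T) in.readObject());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SerializingListAccumulator<ArrayList<String>> acc = new SerializingListAccumulator<>();
        ArrayList<String> reused = new ArrayList<>();
        reused.add("first");
        acc.add(reused);
        reused.set(0, "second"); // the same object is reused and mutated
        acc.add(reused);
        System.out.println(acc.values()); // [[first], [second]]
    }
}
```

Had the accumulator stored references instead of bytes, both entries in this usage example would show the last written value.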

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Robert Metzger
Is the git hook something we can control for everybody? I thought it's more like a personal thing everybody can set up if wanted? I'm against enforcing something like this for every committer. I don't want to wait for 15 minutes for pushing a typo fix to the documentation. On Wed, Jan 21, 2015

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Max Michels
Hi Robert, I like your solution using Travis and Google App Engine. However, I think there's a much simpler solution which can prevent committers from pushing code that does not even compile or that fails tests to the master in the first place. Committers could simply install a git pre-push hook in their git

[jira] [Created] (FLINK-1425) Turn lazy operator execution off for streaming programs

2015-01-21 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-1425: Summary: Turn lazy operator execution off for streaming programs Key: FLINK-1425 URL: https://issues.apache.org/jira/browse/FLINK-1425 Project: Flink Issue Type:

Re: Master not building and how to notice it faster in the future

2015-01-21 Thread Ufuk Celebi
Thanks for the nice script. I've just installed it :-) On 21 Jan 2015, at 13:57, Max Michels m...@data-artisans.com wrote: I've created a pre-push hook that does what I described (and a bit more). It only enforces a check for the remote flink master branch and doesn't disturb you on your

Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Hey Guys, I think it would make sense to turn lazy operator execution off for streaming programs because it would make life simpler for windowing. I also created a JIRA issue here https://issues.apache.org/jira/browse/FLINK-1425. Can anyone give me some quick pointers on how to do this? It's