Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
wondering whether the state inside iterations still makes sense without these in-flight elements. But I also don't know the King use-case, that's why I thought an example could be helpful. On Wed, Jun 10, 2015 at 12:37 PM, Gyula Fóra gyula.f...@gmail.com wrote: I don't understand the question, I vote

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
-clustering/blob/master/src/main/scala/stream/clustering/StreamClustering.scala (checkpointing is not turned on but you will get the point) Gyula Fóra gyula.f...@gmail.com wrote (on Wed, Jun 10, 2015, 12:47): You are right, to have consistent results we would need to persist the records

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
that correspond to this state or created this state is desirable. Maybe an example could help understand this better? On Wed, Jun 10, 2015 at 11:27 AM, Gyula Fóra gyula.f...@gmail.com wrote: The other tests verify that the checkpointing algorithm runs properly. That also ensures that it runs

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
within the loop and also slightly change the current protocol since it works only for DAGs. Before going into that direction though I would propose we first see whether there is a nice way to make iterations more structured. Paris From: Gyula Fóra

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
of feedback data should not matter. Let us not block prominent users until the next release on this. On Wed, Jun 10, 2015 at 8:09 AM, Gyula Fóra gyula.f...@gmail.com wrote: As for people currently suffering from it: An application King is developing requires iterations, and they need

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
...@apache.org wrote: We could add a method on the ExecutionConfig but mark it as deprecated and explain that it will go away once the interplay of iterations, state and so on is properly figured out. On Wed, Jun 10, 2015 at 2:36 PM, Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 14:29, Gyula

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
Done, I will merge it after travis passes. Maximilian Michels m...@apache.org wrote (on Wed, Jun 10, 2015, 15:25): Let's mark the method of the environment as deprecated like Aljoscha suggested. Then I think we could merge it. On Wed, Jun 10, 2015 at 2:50 PM, Gyula Fóra gyula.f

Re: Force enabling checkpoints for iterative streaming jobs

2015-06-10 Thread Gyula Fóra
vote to postpone this. – Ufuk On 10 Jun 2015, at 00:19, Gyula Fóra gyf...@apache.org wrote: Hey all, It is currently impossible to enable state checkpointing for iterative jobs, because an exception is thrown when creating the jobgraph. This behaviour is motivated by the lack of precise

Re: Send events to parallel operator instances

2015-06-04 Thread Gyula Fóra
not yet a clear understanding about how we actually want it to behave, exactly. What do you think? On Wed, Jun 3, 2015 at 2:14 PM, Gyula Fóra gyula.f...@gmail.com wrote: I am talking of course about global delta windows. On the full stream not on a partition. Delta windows per partition

Re: Send events to parallel operator instances

2015-06-04 Thread Gyula Fóra
as a coordinator between vertices on the same level. On Thu, Jun 4, 2015 at 2:55 PM, Gyula Fóra gyula.f...@gmail.com wrote: Thank you! I was aware of the iterations as a possibility, but I was wondering if we might have lateral communications. Ufuk Celebi u...@apache.org wrote (on

Re: Send events to parallel operator instances

2015-06-04 Thread Gyula Fóra
Thank you! I was aware of the iterations as a possibility, but I was wondering if we might have lateral communications. Ufuk Celebi u...@apache.org wrote (on Thu, Jun 4, 2015, 13:29): On 04 Jun 2015, at 12:46, Stephan Ewen se...@apache.org wrote: There is no lateral communication

Re: Send events to parallel operator instances

2015-06-03 Thread Gyula Fóra
and then it doesn't make sense to explore complicated workarounds for the current system. On Wed, Jun 3, 2015 at 11:07 AM, Gyula Fóra gyula.f...@gmail.com wrote: There are simple ways of implementing it in a non-distributed or inconsistent fashion. On Wed, Jun 3, 2015 at 8:55 AM Aljoscha

Re: Send events to parallel operator instances

2015-06-03 Thread Gyula Fóra
that starts the window can be processed on a different machine than the element that finishes the window? On Wed, Jun 3, 2015 at 12:11 PM, Gyula Fóra gyula.f...@gmail.com wrote: This is not connected to the current implementation. So let's not talk about that. This is about theoretical limits

Re: Send events to parallel operator instances

2015-06-03 Thread Gyula Fóra
, Gyula Fóra gyula.f...@gmail.com wrote: Hi Ufuk, In the concrete use case I have in mind I only want to send events to another subtask of the same task vertex. Specifically: if we want to do distributed delta based windows we need to send after every trigger the element that has triggered

Send events to parallel operator instances

2015-06-02 Thread Gyula Fóra
Hi, I am wondering, what is the suggested way to send some events directly to another parallel instance in a flink job? For example from one mapper to another mapper (of the same operator). Do we have any internal support for this? The first thing that we thought of is iterations but that is

Re: [DISCUSS] Streaming Sources (again)

2015-05-31 Thread Gyula Fóra
at 10:56 PM, Gyula Fóra gyula.f...@gmail.com wrote: Hey, It seems like both interfaces are pretty much capable of doing the same thing but work on slightly different assumptions. Isn't there a way that the kafka source can work with the interruptions? I think the reachedEnd

Re: [DISCUSS] Streaming Sources (again)

2015-05-29 Thread Gyula Fóra
Hey, It seems like both interfaces are pretty much capable of doing the same thing but work on slightly different assumptions. Isn't there a way that the kafka source can work with the interruptions? I think the reachedEnd/next interface is slightly easier to grasp than the run() with the locks.

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gyula Fóra
. Best regards, Gabor 2015-05-28 19:23 GMT+02:00 Gyula Fóra gyula.f...@gmail.com: Hi, Indeed a good catch, and a valid issue exactly because of the stateful nature of the trigger and eviction policies. I agree with the suggested approach that this should be configurable

Re: [DISCUSS] Canceling Streaming Jobs

2015-05-27 Thread Gyula Fóra
Hey, I would also strongly prefer the second option, users need to have the option to force cancel a program in case of some unwanted behaviour. Cheers, Gyula Matthias J. Sax mj...@informatik.hu-berlin.de wrote (on Wed, May 27, 2015, 1:20): Hi, currently, the only way to

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
and not break the correct/expected behaviour. Paris? On 20 May 2015, at 11:02, Márton Balassi mbala...@apache.org wrote: +1 for copying. On May 20, 2015 10:50 AM, Gyula Fóra gyf...@apache.org wrote: Hey, The latest streaming operator rework removed the copying

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
: Gyula Fóra gyf...@apache.org Sent: Wednesday, May 20, 2015 2:06 PM To: dev@flink.apache.org Subject: Re: [DISCUSS] Re-add record copy to chained operator calls Copy before putting it into a window buffer and any other group buffer. Exactly my point. Any stateful operator should be able

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
operator in a chain is repeatedly emitting the same object, and the succeeding operator is gathering the objects, then it is a problem. Or are there cases where the system itself repeatedly emits the same objects? On Wed, May 20, 2015 at 3:07 PM, Gyula Fóra gyf...@apache.org wrote: We

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
emit (or mutate) the same object for example in Spark, you get an RDD with completely messed up contents. On Wed, May 20, 2015 at 3:27 PM, Gyula Fóra gyf...@apache.org wrote: If the preceding operator is emitting a mutated object, or does something with the output object afterwards

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
a well informed decision, as this behavior is as much part of the API as the classname of the DataStream. On Wed, May 20, 2015 at 3:41 PM, Gyula Fóra gyula.f...@gmail.com wrote: I would go for the Failsafe option as a default behaviour with a clearly documented lightweight (no-copy) setting

Re: [DISCUSS] Re-add record copy to chained operator calls

2015-05-20 Thread Gyula Fóra
you are very keen on the failsafe variant. That is fine, I'd say let's go ahead. Then let us introduce a switch. The switch needs to work on copies for user functions only. Until the window buffers are serialized, we need to keep the copies there. On Wed, May 20, 2015 at 3:55 PM, Gyula Fóra

Re: [DISCUSS] Access to Time and Window in Streaming Operations

2015-05-13 Thread Gyula Fóra
I think this is that thread :) But as I said it is just a matter of what we want to add, and we can already do it. On Wed, May 13, 2015 at 11:37 AM, Stephan Ewen se...@apache.org wrote: This is a pretty central question, actually (timestamping the results of windows). Let us kick off a

Re: [DISCUSS] Access to Time and Window in Streaming Operations

2015-05-12 Thread Gyula Fóra
Hi, This was the exact need that motivated me to rework the windowing and introduce the StreamWindow abstraction which can hold any metadata that represents the current window. At this moment it only contains a unique id but this could be extended easily. When the user created a

Re: About Interplay of Merged Streams, Output Selectors and Checkpoint Barriers (and Watermarks)

2015-05-12 Thread Gyula Fóra
Hi, Checkpoint barriers are handled directly on top of the network layer and you are right they work similarly, by blocking input channels until it gets the barrier from all of them. A way of implementing this on the operator level would be by adding a way to ask the inputreader the channel

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

2015-05-08 Thread Gyula Fóra
Generally I am in favor of making these name changes. My only concern is regarding the one-input and multiple-input operators. There is a general problem with the n-ary operators regarding type safety, that's why we now have SingleInput and Co (two-input) operators. I think we should keep

Re: [DISCUSS] Change Streaming Operators to be Push-Only

2015-05-05 Thread Gyula Fóra
there would need to be some functionality there at some point. Or maybe it is not required there and can be handled in the StreamTask. Others might know this better than I do right now. On Tue, May 5, 2015 at 3:24 PM, Gyula Fóra gyula.f...@gmail.com wrote: What would

Re: [DISCUSS] Change Streaming Operators to be Push-Only

2015-05-05 Thread Gyula Fóra
I think this is a good idea in general. I would try to minimize the methods we include and make the ones that we keep very concrete. For instance I would not have the receive barrier method as that is handled on a totally different level already. And instead of punctuation I would directly add a

Re: [DISCUSS] Change Streaming Operators to be Push-Only

2015-05-05 Thread Gyula Fóra
to the TaskContext and ExecutionConfig and Serializers. The operator would emit values using an emit() method or the Collector interface, not sure about that yet. On Tue, May 5, 2015 at 3:12 PM, Gyula Fóra gyf...@apache.org wrote: I think this is a good idea in general. I would try

Re: Question about SlidingPreReducers

2015-04-29 Thread Gyula Fóra
:15 PM, Gyula Fóra gyf...@apache.org wrote: Hey, They actually work :P Although I have to admit I need to do some refactoring of the method names and parameters. I made some quick refactoring and added some comments for the key methods: https://github.com/mbalassi/flink/blob

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
On 20.04.2015 22:32, Gyula Fóra wrote: Hey all, I think we are missing a quite useful feature that could be implemented (with some slight modifications) on top of the current windowing api. We currently provide 2 ways of aggregating (or reducing) over streams: doing a continuous

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
, and max where you need to pass only a value to the next aggregation or also more complex data structures, e.g., a synopsis of the full stream, to compute an aggregation such as an approximate count distinct (item count)? Cheers, Bruno On 21.04.2015 15:18, Gyula Fóra wrote: You are right

Re: Periodic full stream aggregations

2015-04-21 Thread Gyula Fóra
)) I think that would be more consistent with the structure of the remaining API. Cheers, Fabian 2015-04-21 10:57 GMT+02:00 Gyula Fóra gyf...@apache.org: Hi Bruno, Of course you can do that as well. (That's the good part :p ) I will open a PR soon with the proposed changes (first

Periodic full stream aggregations

2015-04-20 Thread Gyula Fóra
Hey all, I think we are missing a quite useful feature that could be implemented (with some slight modifications) on top of the current windowing api. We currently provide 2 ways of aggregating (or reducing) over streams: doing a continuous aggregation and always output the aggregated value

Re: Merge Python API

2015-04-20 Thread Gyula Fóra
+1 On Mon, Apr 20, 2015 at 2:41 PM, Fabian Hueske fhue...@gmail.com wrote: +1 2015-04-20 14:39 GMT+02:00 Maximilian Michels m...@apache.org: +1 Let's merge it to flink-staging and get some people to use it. On Mon, Apr 20, 2015 at 2:21 PM, Kostas Tzoumas ktzou...@apache.org wrote:

Re: Rework of the window-join semantics

2015-04-18 Thread Gyula Fóra
). The timestamping for (intermediate) result tuples, is also an important question to be answered. -Matthias On 04/07/2015 11:37 AM, Gyula Fóra wrote: Hey, I agree with Kostas, if we define the exact semantics how this works, this is not more ad-hoc than any other stateful

Major Streaming refactoring

2015-04-13 Thread Gyula Fóra
Dear All, Today I did a major refactoring of some streaming components, with a lot of class renamings and some package restructuring. https://github.com/apache/flink/pull/594 1. I refactored the internal representation of the Streaming topologies (StreamGraph) to a more straightforward and less

Re: Rework of the window-join semantics

2015-04-03 Thread Gyula Fóra
, Márton Balassi balassi.mar...@gmail.com wrote: Big +1 for the proposal for Peter and Gyula. I'm really for bringing the windowing and window join API in sync. On Thu, Apr 2, 2015 at 6:32 PM, Gyula Fóra gyf...@apache.org wrote: Hey guys, As Aljoscha has highlighted earlier

Re: Storm compatibility layer for Flink (first beta available)

2015-04-02 Thread Gyula Fóra
This sounds amazing :) thanks Matthias! Tomorrow I will spend some time to look through your work and give some comments. Also I would love to help with this effort so once we merge an initial prototype let's open some Jiras and I will pick some up :) Gyula On Thursday, April 2, 2015, Márton

Rework of the window-join semantics

2015-04-02 Thread Gyula Fóra
Hey guys, As Aljoscha has highlighted earlier the current window join semantics in the streaming api doesn't follow the changes in the windowing api. More precisely, we currently only support joins over time windows of equal size on both streams. The reason for this is that we now take a window

Re: A small Project I've been working on

2015-04-01 Thread Gyula Fóra
Amazing! :) On Wed, Apr 1, 2015 at 9:41 AM, fhue...@gmail.com wrote: My ruby skills are a bit rust(y) but I’d love to contribute. Can you point me to a repository that I can fork? From: Till Rohrmann Sent: Wednesday, 1. April, 2015 09:29 To: dev@flink.apache.org

Re: GSoC proposal

2015-03-26 Thread Gyula Fóra
Hey Gabor, Thank you for the proposal. It has many interesting ideas and a good potential. My comments: We already have a large amount of ongoing work on the windowing optimizations, covering your suggestions in section 1. It would be better to drop that part from the project because that's very

Re: Question about Streaming Windows in Join

2015-03-25 Thread Gyula Fóra
Hey Aljoscha, This is not an accident :D The reason for only supporting time-window joins at the moment is that time-window joins can be implemented in a completely non-blocking fashion, arising from the global nature of time. All other windows would need to be matched properly which

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

2015-03-24 Thread Gyula Fóra
semantics you would like to provide can hugely affect (probably limit) the possibilities that you have afterwards in terms of optimizations. Let me know if you have further questions regarding this :) Gyula On Tue, Mar 24, 2015 at 12:01 PM, Gyula Fóra gyf...@apache.org wrote: Hey, Give me an hour

Re: [GSoc][flink-streaming] Interested in pursuing FLINK-1617 and FLINK-1534

2015-03-24 Thread Gyula Fóra
Hey, Give me an hour or so as I am in a meeting currently, but I will get back to you afterwards. Regards, Gyula On Tue, Mar 24, 2015 at 11:03 AM, Akshay Dixit akshayd...@gmail.com wrote: Hi, It'd really help if I got a reply soon. It'll be helpful in writing the proposal since the deadline

Re: Question about Flink Streaming

2015-03-24 Thread Gyula Fóra
Hey Matthias, Let's see if I get these things for you :) 1) The difference between setup and open is that setup is used to set things like collectors, runtimecontext and everything that will be used by the implemented invokable, and also by the rich functions. Open is called after setup, to actually

Re: Question about Flink Streaming

2015-03-24 Thread Gyula Fóra
is fine and should not be changed. Before opening a JIRA for it, you two should get in sync and decide what to do. -Matthias On 03/24/2015 05:38 PM, Gyula Fóra wrote: Hey Matthias, Let's see if I get these things for you :) 1) The difference between setup and open is that, setup

Re: Tests for the Steaming classes

2015-03-23 Thread Gyula Fóra
+1 for lots of streaming tests On Mon, Mar 23, 2015 at 11:23 AM, Márton Balassi balassi.mar...@gmail.com wrote: Thanks for looking into this, Stephan. +1 for the JIRAs. On Mon, Mar 23, 2015 at 10:55 AM, Ufuk Celebi u...@apache.org wrote: On 23 Mar 2015, at 10:44, Stephan Ewen

Re: Inconsistent git master

2015-03-11 Thread Gyula Fóra
. :( I don't think that anything is broken. Let's wait a little longer. – Ufuk On Wednesday, March 11, 2015, Gyula Fóra gyf...@apache.org wrote: Hey, I pushed some commits yesterday evening and it seems like the git repos somehow became inconsistent, the https://github.com/apache

Taskmanager memory error in Eclipse

2015-03-01 Thread Gyula Fóra
Hey All, I am getting a weird error in Eclipse when trying to run local flink programs, for example any batch or streaming example program. I tried cleaning, rebuilding, reimporting etc., but that didn't help. Please see the log and stack trace: 16:55:04,965 INFO

Re: Taskmanager memory error in Eclipse

2015-03-01 Thread Gyula Fóra
Setting the JVM heap larger solves this issue, but I still think that this is a misleading error. With -Xmx256m it works. On Sun, Mar 1, 2015 at 5:03 PM, Gyula Fóra gyf...@apache.org wrote: Hey All, I am getting a weird error in Eclipse when trying to run local flink programs for example

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Gyula Fóra
They should actually return different values in many cases. Datastream.env.getDegreeOfParallelism returns the environment parallelism (default). Datastream.getparallelism() returns the parallelism of the operator. There is a reason why one or the other is used. Please watch out when you try to

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Gyula Fóra
. The other ITCases ran properly, so I figured, the problem is with the windowing. Can you check it out for me? (WindowedDataStream, line 348) Peter 2015-02-27 10:06 GMT+01:00 Gyula Fóra gyf...@apache.org: They should actually return different values in many cases

Re: [jira] [Created] (FLINK-1594) DataStreams don't support self-join

2015-02-20 Thread Gyula Fóra
We have a join operator that is defined over windows in the data stream. So the problem is not with the join itself, but it seems that trying to apply this window-join with itself throws an error. On Fri, Feb 20, 2015 at 4:50 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: I

Re: [DISCUSS] Dedicated streaming mode and start scripts

2015-02-17 Thread Gyula Fóra
So the current setup is to share results between the two apis by files. So I don't see any reason why this couldn't work with the 2 cluster setup. It makes deployment a little trickier but still feasible. On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi u...@apache.org wrote: I think this separation

Re: Turn lazy operator execution off for streaming jobs

2015-01-23 Thread Gyula Fóra
Great, thanks! It was fast :) I will try to check it out in the afternoon! Gyula On Fri, Jan 23, 2015 at 11:37 AM, Ufuk Celebi u...@apache.org wrote: On 22 Jan 2015, at 18:10, Stephan Ewen se...@apache.org wrote: So the crux is that the JobManager has a location for the sender task (and

Re: Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Thank you! I will play around with it. On Wed, Jan 21, 2015 at 3:50 PM, Ufuk Celebi u...@apache.org wrote: Hey Gyula, On 21 Jan 2015, at 15:41, Gyula Fóra gyf...@apache.org wrote: Hey Guys, I think it would make sense to turn lazy operator execution off for streaming programs because

Turn lazy operator execution off for streaming jobs

2015-01-21 Thread Gyula Fóra
Hey Guys, I think it would make sense to turn lazy operator execution off for streaming programs because it would make life simpler for windowing. I also created a JIRA issue here https://issues.apache.org/jira/browse/FLINK-1425. Can anyone give me some quick pointers how to do this? Its
