Re: Monitoring backpressure

2015-12-07 Thread Gyula Fóra
method of the buffer pool... > > Stephan > > > On Mon, Dec 7, 2015 at 9:45 AM, Gyula Fóra <gyf...@apache.org> wrote: > > > Hey guys, > > > > Is there any way to monitor the backpressure in the Flink job? I find it > > hard to debug slow operators because of t

Re: Union a data stream with a product of itself

2015-11-25 Thread Gyula Fóra
Yes, I am not sure if this is the intentional behaviour. I think you are supposed to be able to do the things you described. stream.union(stream.map(..)) and things like this are fair operations. Also maybe stream.union(stream) should just give stream instead of an error. Could someone comment on

Re: Union a data stream with a product of itself

2015-11-25 Thread Gyula Fóra
> "stream.union(stream.map(..))" should definitely be possible. Not sure > why > > this is not permitted. > > > > "stream.union(stream)" would contain each element twice, so should either > > give an error or actually union (or duplicate) elements.

Re: Either left() vs left(value)

2015-11-24 Thread Gyula Fóra
5 Nov 23, Mon, 22:03): > Either is abstract already ;) > > On 23 November 2015 at 21:54, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > I think it is not too bad to only have the Right/Left classes. You can > then > > write it like this: > > > > Eith

Re: [VOTE] Release Apache Flink 0.10.1 (release-0.10.0-rc1)

2015-11-24 Thread Gyula Fóra
Hi, Regarding my previous comment for the Kafka/Zookeeper issue, let's discuss if this is critical enough so we want to include it in this release or the next bugfix. I will try to further investigate the reason the job failed in the first place (we suspect broker failure) Cheers, Gyula

Re: [VOTE] Release Apache Flink 0.10.1 (release-0.10.0-rc1)

2015-11-24 Thread Gyula Fóra
Hi, I vote -1 for the RC due to the fact that the zookeeper deadlock issue was not completely solved. Robert could find the problem with the dependency management plugin and has opened a PR: [FLINK-3067] Enforce zkclient 0.7 for Kafka https://github.com/apache/flink/pull/1399 Cheers, Gyula

Re: Either left() vs left(value)

2015-11-23 Thread Gyula Fóra
t and right values, no? > How about renaming to getLeft() / getRight()? > > -V. > > On 23 November 2015 at 09:55, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > Hey guys, > > > > I know this should have been part of the PR discussion but it kind of > >

Re: ReduceByKeyAndWindow in Flink

2015-11-23 Thread Gyula Fóra
Cheers, > > Konstantin > > On 23.11.2015 11:32, Gyula Fóra wrote: > > Hi, > > > > Alright it seems there are multiple ways of doing this. > > > > I would do something like: > > > > ds.keyBy(key) > > .timeWindow(w) > > .reduce(...)

Either left() vs left(value)

2015-11-23 Thread Gyula Fóra
Hey guys, I know this should have been part of the PR discussion but it kind of slipped through the cracks :) I think it might be useful to change the method name for Either.left(value) to Either.Left(value) (or drop the method completely). The reason is that it is slightly awkward to use it
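To illustrate the naming question raised in this message, here is a minimal sketch of an Either type with capitalized static factories. It is a hypothetical, simplified class written for this example and not Flink's actual Either; the helper names LeftImpl and RightImpl are assumptions. The thread subject suggests the concern is that a factory left(value) would share its name with the accessor left().

```java
import java.util.NoSuchElementException;

// Hypothetical, simplified Either for illustration; NOT Flink's actual class.
abstract class Either<L, R> {

    // Capitalized static factories, following the renaming proposed above.
    static <L, R> Either<L, R> Left(L value) {
        return new LeftImpl<>(value);
    }

    static <L, R> Either<L, R> Right(R value) {
        return new RightImpl<>(value);
    }

    abstract boolean isLeft();

    // Accessors. With lowercase factories, left(value) and left() would
    // differ only by their argument list.
    abstract L left();

    abstract R right();

    // Helper classes (names are assumptions made for this sketch).
    private static final class LeftImpl<L, R> extends Either<L, R> {
        private final L value;

        LeftImpl(L value) { this.value = value; }

        @Override boolean isLeft() { return true; }

        @Override L left() { return value; }

        @Override R right() {
            throw new NoSuchElementException("Left has no right value");
        }
    }

    private static final class RightImpl<L, R> extends Either<L, R> {
        private final R value;

        RightImpl(R value) { this.value = value; }

        @Override boolean isLeft() { return false; }

        @Override L left() {
            throw new NoSuchElementException("Right has no left value");
        }

        @Override R right() { return value; }
    }
}
```

With this shape, a call site reads `Either.Left("error")` for construction and `either.left()` for access, so the two roles are distinguishable at a glance.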

Re: Either left() vs left(value)

2015-11-23 Thread Gyula Fóra
aid in the PR, Left and Right make no sense on > their own, they're helper classes for Either. Hence, I believe they should > be private. Maybe we could rename the methods to createLeft() / > createRight() ? > > On 23 November 2015 at 20:58, Gyula Fóra <gyula.f...@gmail.com

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Gyula Fóra
Hi all, Wouldn't you think that it would make sense to wait a week or so to find all the hot issues with the current release? To me it feels a little bit like rushing this out and we will have almost the same situation afterwards. I might be wrong but I think people should get a chance to try

Re: [DISCUSS] Release Flink 0.10.1 soon

2015-11-20 Thread Gyula Fóra
that we want fixed). We can > always do a new bug fix release. > > – Ufuk > > On Fri, Nov 20, 2015 at 11:25 AM, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > Hi all, > > > > Wouldn't you think that it would make sense to wait a week or so to find > all >

Re: Failing kafka consumer unable to cancel

2015-11-17 Thread Gyula Fóra
ue, Nov 17, 2015 at 10:43 AM, Ufuk Celebi <u...@apache.org> wrote: > > > > > >> https://issues.apache.org/jira/browse/KAFKA-824 > > >> > > >> This has been fixed for Kafka’s 0.9.0 version. > > >> > > >> We should investiga

Re: Failing kafka consumer unable to cancel

2015-11-17 Thread Gyula Fóra
Should I open a JIRA for this? Gyula Fóra <gyula.f...@gmail.com> wrote (date: 2015 Nov 17, Tue, 11:30): > Thanks for the quick response and thorough explanation :) > > Gyula > > Robert Metzger <rmetz...@apache.org> wrote (date: 2015 Nov 17,

Failing kafka consumer unable to cancel

2015-11-17 Thread Gyula Fóra
Hey guys, I ran into some issue with the kafka consumers. I am reading from more than 50 topics with parallelism 1, and while running the job I got the following exception during the checkpoint notification (offset committing): java.lang.RuntimeException: Error while confirming checkpoint at

Re: Connection reset by peer

2015-11-15 Thread Gyula Fóra
thrown by the new web interface. Was it > > running with your job? > > > > My second best guess is that it was thrown by another component running > > Netty (maybe a Hadoop client?). > > > > – Ufuk > > > > PS Thanks for sharing the logs with me. :)

Connection reset by peer

2015-11-14 Thread Gyula Fóra
Hi guys, I have a Flink Streaming job running for about a day now without any errors and then I got this in the job manager log: 15:37:49,905 WARN io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually

Re: [DISCUSSION] Consistent shutdown of streaming jobs

2015-11-12 Thread Gyula Fóra
> 2) An option to shut down with external checkpoint would also be important, > to stop and resume from exactly there. > > > Stephan > > > On Wed, Nov 11, 2015 at 3:19 PM, Gyula Fóra <gyf...@apache.org> wrote: > > > Hey guys, > > > > With recen

Re: Flink managed memory in cluster mode

2015-11-12 Thread Gyula Fóra
aged memory is lazily allocated and the logged amount is an > upper bound. > > Cheers, Fabian > > 2015-11-12 13:37 GMT+01:00 Gyula Fóra <gyf...@apache.org>: > > > Hey guys, > > > > Is it normal that when I start the cluster with > start-cluster-streaming.sh

Error in during TypeExtraction

2015-11-12 Thread Gyula Fóra
Hey, I get a weird error when I try to execute my job on the cluster. Locally this works fine but running it from the command line fails during type extraction: input1.union(input2, input3).map(Either::Left).returns(eventOrLongType); This fails when trying to extract the output

Flink managed memory in cluster mode

2015-11-12 Thread Gyula Fóra
Hey guys, Is it normal that when I start the cluster with start-cluster-streaming.sh, 10.6 GB out of the 16 GB TM memory becomes Flink-managed? (I get pretty much the same number when I use start-cluster.sh) I thought that Flink would only use a very small fraction in streaming mode. Cheers,

Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc8)

2015-11-12 Thread Gyula Fóra
This seems to be an issue only occurring when using Java 8 lambdas, which is still super annoying but may not be a release blocker. Gyula Fóra <gyula.f...@gmail.com> wrote (date: 2015 Nov 12, Thu, 15:38): > I am not sure if this issue affects the release or maybe I am j

Re: Streaming statefull operator with hashmap

2015-11-11 Thread Gyula Fóra
Hey, Yes what you wrote should work. You can alternatively use TypeExtractor.getForObject(modelMapInit) to extract the type information. I also like to implement my custom type info for HashMaps and the other types and use that. Cheers, Gyula Martin Neumann wrote (date:

Function input type validation

2015-11-08 Thread Gyula Fóra
Hey All, I am wondering what is the reason why Function input types are validated? This might become an issue if the user wants to write his own TypeInfo for a type that Flink also handles natively. Let's say I want to implement my own TupleTypeinfo that handles null values, and I pass this

Accessing TM metrics

2015-11-07 Thread Gyula Fóra
Hey guys, I am trying to look at the throughput of my Flink Streaming job over time. Is there any way to extract this information from the dashboard or is it only possible to view the cumulative statistics at given time points? Also I am wondering whether there is any info about the latency in

Re: Accessing TM metrics

2015-11-07 Thread Gyula Fóra
special jobs where we could sample the records > that after two re-partitionings return to the same JVM, so we would not > have clock misalignment. Still thinking about good ways to have a general > purpose latency measurement mechanism. > > If you have any ideas there, let me know

Re: Error with window fold

2015-11-04 Thread Gyula Fóra
he.org> wrote: > > Hi Gyula, > > > > Trying to reproduce this error now. I'm assuming this is 0.10-SNAPSHOT? > > > > Cheers, > > Max > > > > On Wed, Nov 4, 2015 at 1:49 PM, Gyula Fóra <gyf...@apache.org> wrote: > >> Hey, > >

Error with window fold

2015-11-04 Thread Gyula Fóra
Hey, Running the following simple application gives me an error: //just counting by key, the streamOfIntegers.keyBy(x -> x).timeWindow(Time.milliseconds(3000)).fold(0, (c, next) -> c + 1).print(); Executing this gives the following error: "No initial value was serialized for the fold window
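For readers unfamiliar with fold, the following plain-Java sketch shows what a per-key fold with initial value 0 and folder (c, next) -> c + 1 is meant to compute over the contents of one window. It deliberately uses no Flink classes; the class and method names (WindowFoldSketch, foldByKey) are made up for this illustration, and it says nothing about the cause of the reported error.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Plain-Java illustration of a keyed fold over one window's elements.
// Hypothetical helper; this is not how Flink implements window folds.
final class WindowFoldSketch {

    static <T, K, A> Map<K, A> foldByKey(List<T> window,
                                         Function<T, K> keyOf,
                                         A initial,
                                         BiFunction<A, T, A> folder) {
        Map<K, A> accumulators = new LinkedHashMap<>();
        for (T element : window) {
            K key = keyOf.apply(element);
            // Every key starts from the same initial value.
            A current = accumulators.containsKey(key)
                    ? accumulators.get(key)
                    : initial;
            accumulators.put(key, folder.apply(current, element));
        }
        return accumulators;
    }
}
```

Calling `foldByKey(window, x -> x, 0, (c, next) -> c + 1)` counts how often each integer occurs in the window, which matches the "just counting by key" comment in the message.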

Failure in KafkaIT case

2015-11-04 Thread Gyula Fóra
Hey, I found an interesting failure in the KafkaITCase, I am not sure if this happened before. It received a duplicate record and failed on that (not the usual zookeeper timeout thing) Logs are here: https://s3.amazonaws.com/archive.travis-ci.org/jobs/89171477/log.txt Cheers, Gyula

Re: Failure in KafkaIT case

2015-11-04 Thread Gyula Fóra
done Till Rohrmann <trohrm...@apache.org> wrote (date: 2015 Nov 4, Wed, 11:19): > Could you please open or update the corresponding JIRA issue if existing. > > On Wed, Nov 4, 2015 at 11:14 AM, Gyula Fóra <gyf...@apache.org> wrote: > > > Hey, > >

Re: Rethink the "always copy" policy for streaming topologies

2015-10-24 Thread Gyula Fóra
Hey guys, Have we disabled the default input copying after all? I don't remember seeing a Jira or PR for this (maybe I just missed it). And if not, do we want this in the 0.10 release? Cheers, Gyula On Fri, Oct 2, 2015 at 7:57 PM, Till Rohrmann wrote: > Do we know what

Re: Kafka source stuck while canceling

2015-10-22 Thread Gyula Fóra
ate. It might also be stuck on a > lock, in which case it would be waiting for the lock holder to terminate. > > Do you have the traces from other threads as well, so we could look which > one actually is stuck while holding the lock? > > Greetings, > Stephan > >

Re: [VOTE] Release Apache Flink 0.10.0 (release-0.10.0-rc0)

2015-10-21 Thread Gyula Fóra
Thanks Max for the effort, this is going to be huge :) Unfortunately I have to say -1. FLINK-2888 and FLINK-2824 are blockers from my point of view. Cheers, Gyula Vasiliki Kalavri wrote (date: 2015 Oct 21, Wed, 20:07): > Awesome! Thanks Max :)) > > I have a

Re: [DISCUSS] Java code style

2015-10-21 Thread Gyula Fóra
I think the nice thing about a common code style is that everyone can set the template in the IDE and use the formatting commands. Matthias's suggestion makes this practically impossible so -1 for mixed tabs/spaces from my side. Matthias J. Sax wrote (date: 2015 Oct

Checkpoints keep waiting on source locks

2015-10-21 Thread Gyula Fóra
Hey All, I think there is some serious issue with the checkpoints. Running a simple program like this won't complete any checkpoints: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setParallelism(2); env.enableCheckpointing(5000);

Re: Checkpoints keep waiting on source locks

2015-10-21 Thread Gyula Fóra
> Greetings, > Stephan > > > On Wed, Oct 21, 2015 at 1:47 PM, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > Hey All, > > > > I think there is some serious issue with the checkpoints. Running a > simple > > program like this won't complete

Re: [DISCUSS] Java code style

2015-10-20 Thread Gyula Fóra
+1 for both :) Till Rohrmann wrote (date: 2015 Oct 20, Tue, 14:58): > I like the idea to have a bit stricter code style which will increase code > maintainability and makes it easier for people to go through the code. > Furthermore, it will relieve us from code

Kafka source stuck while canceling

2015-10-19 Thread Gyula Fóra
Hey guys, Has anyone ever got something similar working with the kafka sources? 11:52:48,838 WARN org.apache.flink.runtime.taskmanager.Task - Task 'Source: Kafka[***] (3/4)' did not react to cancelling signal, but is stuck in method:

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Gyula Fóra
hole? > > Greetings, > Stephan > > > > On Thu, Oct 8, 2015 at 4:42 PM, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > The feedback tuples might get rebalanced but the normal input should not. > > > > But still the main problem is the fact that partitio

Re: Iteration feedback partitioning does not work properly

2015-10-08 Thread Gyula Fóra
the iteration head. I think this > should break ordering as well, in your case. > > On Tue, 6 Oct 2015 at 10:39 Gyula Fóra <gyula.f...@gmail.com> wrote: > > > Hi, > > > > This is just a workaround, which actually breaks input order from my > > source.

Re: TM failure when deploying a large number of sources

2015-10-07 Thread Gyula Fóra
on one machine, the JVM often has not > enough memory reserved for the stack space to create enough threads (1-2 > threads per task)... > > On Wed, Oct 7, 2015 at 2:13 PM, Gyula Fóra <gyf...@apache.org> wrote: > > > Hey guys, > > > > I am writing a job which inv

TM failure when deploying a large number of sources

2015-10-07 Thread Gyula Fóra
Hey guys, I am writing a job which involves creating many different sources to read data from (in this case 80 sources with the parallelism of 8 each, running locally on my mac). I cannot create fewer unfortunately. The problem is that the job fails while deploying the tasks with the following

Re: Iteration feedback partitioning does not work properly

2015-10-06 Thread Gyula Fóra
in.map(IdentityMap).setParallelism(2).iterate() > DataStream mapped = it.map(...) > it.closeWith(mapped.partitionByHash(someField)) > > The input is rebalanced to the map inside the iteration as in your example > and the feedback should be partitioned by hash. > > Cheers, > Aljosc

Iteration feedback partitioning does not work properly

2015-10-05 Thread Gyula Fóra
Hey, This question is mainly targeted towards Aljoscha but maybe someone can help me out here: I think the way feedback partitioning is handled does not work, let me illustrate with a simple example: IterativeStream it = ... (parallelism 1) DataStream mapped = it.map(...) (parallelism 2) //

Re: Streaming KV store abstraction

2015-09-15 Thread Gyula Fóra
: 2015 Sep 9, Wed, 20:25): > Yes, pretty clear. I guess semantically it's still a co-group, but > implemented slightly differently. > > Thanks! > > -- > Gianmarco > > On 9 September 2015 at 15:37, Gyula Fóra <gyula.f...@gmail.com> wrote: > > > Hey Gianmarc

Re: Using event timestamps

2015-09-14 Thread Gyula Fóra
sources that you are looking for, but maybe I am > > also missing it. :) > > It would be a convenient addition though. > > > > Best, > > > > Marton > > > > On Sun, Sep 13, 2015 at 8:59 PM, Gyula Fóra <gyf...@apache.org> wrote: > > > > > He

Using event timestamps

2015-09-13 Thread Gyula Fóra
Hey All! Is there a proper way of using a Flink Streaming source with event timestamps and watermarks? What I mean here is instead of implementing a custom SourceFunction, use an existing one and provide some Timestamp extractor (like the one currently used for Time windows), which will also

Re: Releasing 0.10.0-milestone1

2015-09-09 Thread Gyula Fóra
This sounds good +1 from me as well :) Till Rohrmann wrote (date: 2015 Sep 9, Wed, 10:40): > +1 for a milestone release with the TypeInformation issues fixed. I'm > working on it. > > On Tue, Sep 8, 2015 at 9:32 PM, Stephan Ewen wrote: > > >

Re: Streaming KV store abstraction

2015-09-09 Thread Gyula Fóra
is to see what > > kind of throughput the partitioned state system can handle and also how > it > > behaves with larger numbers of keys. The KVStore is just an interface and > > the really heavy lifting is done by the state system so this would be a > > good test for it. >

Streaming KV store abstraction

2015-09-08 Thread Gyula Fóra
Hey All, The last couple of days I have been playing around with the idea of building a streaming key-value store abstraction using stateful streaming operators that can be used within Flink Streaming programs seamlessly. Operations executed on this KV store would be fault tolerant as it

Re: Streaming KV store abstraction

2015-09-08 Thread Gyula Fóra
same > > users in an other subsystem; mapping from IDs of phone calls to lists > > of IDs of participating users; etc. > > So I imagine they would like this a lot. (At least, if they were > > considering moving to Flink :)) > > > > Best, > > Gabor > > >

Re: [ANNOUNCE] New Committer Chesnay Schepler

2015-08-20 Thread Gyula Fóra
Welcome! :) On Thu, Aug 20, 2015 at 12:34 PM Matthias J. Sax mj...@informatik.hu-berlin.de wrote: Congrats! The squirrel army is growing fast. :) On 08/20/2015 11:18 AM, Robert Metzger wrote: The Project Management Committee (PMC) for Apache Flink has asked Chesnay Schepler to become a

Re: Failing Test again

2015-08-04 Thread Gyula Fóra
Honestly I don't think the partitioned state changes have anything to do with the stability, only the reworked test case, which now tests proper exactly-once, which was missing before. Stephan Ewen se...@apache.org wrote (date: 2015 Aug 4, Tue, 12:12): Yes, the build stability is super

Re: Question About Preserve Partitioning in Stream Iteration

2015-08-03 Thread Gyula Fóra
) iter.closeWith(tail1.union(tail2)) (Which is also tricky with the parallelism of the input stream) On Sun, 2 Aug 2015 at 21:22 Gyula Fóra gyula.f...@gmail.com wrote: In a streaming program when we create an IterativeDataStream, we practically mark the union point of some

Re: Question About Preserve Partitioning in Stream Iteration

2015-08-03 Thread Gyula Fóra
iterations should still be possible... On Mon, Aug 3, 2015 at 10:21 AM, Gyula Fóra gyula.f...@gmail.com wrote: It is critical for many applications (such as SAMOA or Storm compatibility) to build arbitrary cyclic flows. If your suggestion covers all cases (for instance nested iterations

Re: Question About Preserve Partitioning in Stream Iteration

2015-08-02 Thread Gyula Fóra
the discussion here, can you help me with what you mean by different iteration heads and tails ? An iteration does not have one parallel head and one parallel tail? On Fri, Jul 31, 2015 at 6:52 PM, Gyula Fóra gyula.f...@gmail.com wrote: Maybe you can reuse some of the logic that is currently

Re: Question About Preserve Partitioning in Stream Iteration

2015-07-31 Thread Gyula Fóra
(at least IMHO). Maybe I'm wrong there, though. To me it seems intuitive that I get the feedback at the head the way I specify it at the tail. But maybe that's also just me... :D On Fri, 31 Jul 2015 at 14:00 Gyula Fóra gyf...@apache.org wrote: Hey, I am not sure what is the intuitive

Re: Question About Preserve Partitioning in Stream Iteration

2015-07-31 Thread Gyula Fóra
this program: https://gist.github.com/aljoscha/45aaf62b2a7957cfafd5 It works, and the implementation is very simple, actually. On Fri, 31 Jul 2015 at 14:30 Gyula Fóra gyula.f...@gmail.com wrote: I mean that the head operators have different parallelism: IterativeDataStream ids = ... ids.map

Re: Question About Preserve Partitioning in Stream Iteration

2015-07-31 Thread Gyula Fóra
that it does translate and run. Your observation is true. :D I'm wondering whether it makes sense to allow users to have iteration heads with differing parallelism, in fact. On Fri, 31 Jul 2015 at 16:40 Gyula Fóra gyula.f...@gmail.com wrote: I still don't get how it could possibly work

Re: Question About Preserve Partitioning in Stream Iteration

2015-07-31 Thread Gyula Fóra
documented yet, but the base class for transformations is StreamTransformation. From there anyone who wants to check it out can find the other transformations. On Fri, 31 Jul 2015 at 17:17 Gyula Fóra gyula.f...@gmail.com wrote: There might be reasons why a user would want different

Re: Question about DataStream class hierarchy

2015-07-31 Thread Gyula Fóra
Hi Matthias, I think Aljoscha is preparing a nice PR that completely reworks the DataStream classes and the information they actually contain. I don't think it's a good idea to mess things up before he gets a chance to open the PR. Also I don't see a well supported reason for moving the

Re: Question About Preserve Partitioning in Stream Iteration

2015-07-31 Thread Gyula Fóra
Hey, I am not sure what is the intuitive behaviour here. As you are not applying a transformation on the feedback stream but pass it to a closeWith method, I thought it was somehow natural that it gets the partitioning of the iteration input, but maybe it's not intuitive. If others also think that

Re: Types in the Python API

2015-07-31 Thread Gyula Fóra
that a) disables all type checks b) creates serializers dynamically at runtime. a) should be fairly straightforward, b) on the other hand btw., the Python API itself doesn't require the type information, it already does the b part. On 30.07.2015 22:11, Gyula Fóra wrote

Re: Guide/design doc for streaming operator states

2015-07-30 Thread Gyula Fóra
at 15:55 Gyula Fóra gyf...@apache.org wrote: Hey! I started putting together a guide/design document for the streaming operator state interfaces and implementations. The idea would be to create a doc that contains all the details about the implementations so anyone can use

Types in the Python API

2015-07-30 Thread Gyula Fóra
Hey! Could anyone briefly tell me what exactly is the reason why we force the users in the Python API to declare types for operators? I don't really understand how this works in different systems but I am just curious why Flink has types and why Spark doesn't for instance. If you give me some

Re: On some GUI tools for building Flink Streaming data flows...

2015-07-29 Thread Gyula Fóra
Hi Slim, I totally agree with you that we should start working on supporting tools like these. StreamFlow is a really nice tool, I also like it a lot. I think it would not be too hard to integrate it with Flink as there are several projects that have already used Flink streaming in a compositional

Re: Revert 78fd2146dd until we have consensus for FLINK-2419?

2015-07-28 Thread Gyula Fóra
added a test here that is true. With FLINK-2423 I was fixing someone else's mistake who disregarded my message when merging a PR. We could now revert that PR that introduced that bug, but instead we are reverting my fix for that mistake. Gyula Fóra gyula.f...@gmail.com wrote (date: 2015 Jul

Re: Revert 78fd2146dd until we have consensus for FLINK-2419?

2015-07-28 Thread Gyula Fóra
Hey, I am sorry that you feel bad about this, I only did not add a test case for FLINK-2419 because I am adding a test in my upcoming PR which verifies the behaviour. As for FLINK-2423, it is actually very bad that the issue is still there. You introduced this in your PR

Re: Revert 78fd2146dd until we have consensus for FLINK-2419?

2015-07-28 Thread Gyula Fóra
be subject to consensus in a similar way as PRs. The right to commit does not mean that consensus should not be reached, and this is a clear case of not having consensus. Kostas On Tue, Jul 28, 2015 at 8:30 PM, Gyula Fóra gyula.f...@gmail.com wrote: What concerns me here is that for FLINK-2419 I

Re: Revert 78fd2146dd until we have consensus for FLINK-2419?

2015-07-28 Thread Gyula Fóra
at 9:09 PM, Gyula Fóra gyula.f...@gmail.com wrote: I agree that consensus should be reached in all changes to the system. Then Robert and you should reach consensus on FLINK-2419. What is not clear to me is what is the subject of consensus in this case. As for FLINK-2423

Re: streaming iteration

2015-07-27 Thread Gyula Fóra
Hey, The JobGraph that is executed cannot have cycles (due to scheduling reasons), that's why there is no edge between the head and tail operators. What we do instead is we add an extra source to the head and a sink to the tail (with the same parallelism), and the feedback data is passed outside
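The idea described in this message, an acyclic graph plus a feedback channel that lives outside of it, can be sketched in plain Java. This is a single-threaded toy model under loud assumptions: the queue stands in for the out-of-graph feedback channel, the loop body simply decrements a counter, and the class name FeedbackLoopSketch is invented for the example. It is not Flink's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model: the "head" reads from the regular input and from a feedback
// queue (acting as an extra source); the "tail" writes into that queue
// (acting as a sink). No edge from tail to head exists in the "graph".
final class FeedbackLoopSketch {

    static List<Integer> run(List<Integer> input) {
        Queue<Integer> pending = new ArrayDeque<>(input);
        Queue<Integer> feedback = new ArrayDeque<>(); // channel outside the graph
        List<Integer> output = new ArrayList<>();

        while (!pending.isEmpty() || !feedback.isEmpty()) {
            // Head: prefer fed-back elements, otherwise take regular input.
            Integer value = !feedback.isEmpty() ? feedback.poll() : pending.poll();

            // Loop body (an assumption for this sketch): count down by one.
            int next = value - 1;

            // Tail: either feed the element back or let it leave the loop.
            if (next > 0) {
                feedback.add(next);
            } else {
                output.add(next);
            }
        }
        return output;
    }
}
```

In this model every element circulates through the feedback queue until it reaches zero and is emitted.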

Re: Extending the streaming scala api with stateful functions

2015-07-25 Thread Gyula Fóra
runtime constructs... On Fri, Jul 24, 2015 at 5:43 PM, Aljoscha Krettek aljos...@apache.org wrote: Yes, this might be nice. Till and I had similar ideas about using the pattern to make broadcast variables more useable in Scala, in fact. :D On Fri, 24 Jul 2015 at 17:39 Gyula Fóra

Extending the streaming scala api with stateful functions

2015-07-24 Thread Gyula Fóra
Hey, I would like to propose a way to extend the standard Streaming Scala API methods (map, flatmap, filter etc) with versions that take stateful functions as lambdas. I think this would eliminate the awkwardness of implementing RichFunctions in Scala and make statefulness more explicit: *For

Guide/design doc for streaming operator states

2015-07-23 Thread Gyula Fóra
Hey! I started putting together a guide/design document for the streaming operator state interfaces and implementations. The idea would be to create a doc that contains all the details about the implementations so anyone can use it as a reference later.

Re: Thoughts About Streaming

2015-07-23 Thread Gyula Fóra
at 09:46 Gyula Fóra gyula.f...@gmail.com wrote: I agree, let's separate these topics from each other so we can get faster resolution. There is already a state discussion in the thread we started with Paris. On Wed, Jun 24, 2015 at 9:24

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
useful for this purpose. On Mon, Jul 13, 2015 at 6:30 PM, Gyula Fóra gyula.f...@gmail.com wrote: +1 On Mon, Jul 13, 2015 at 6:23 PM Stephan Ewen se...@apache.org wrote: If naming is the only concern, then we should go ahead, because we can change names easily (before the release

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
, Jul 14, 2015 at 10:48 AM, Gyula Fóra gyula.f...@gmail.com wrote: If we only want to have either keyBy or groupBy, why not keep groupBy? That would be more consistent with the batch api. On Tue, Jul 14, 2015 at 10:35 AM Stephan Ewen se...@apache.org wrote: Concerning your

Re: Design documents for consolidated DataStream API

2015-07-14 Thread Gyula Fóra
, Gyula Fóra gyula.f...@gmail.com wrote: I think Marton has some good points here. 1) Is KeyedDataStream a better name if this is only a renaming? 2) the discretize semantics is unclear indeed. Are we operating on a single or sequence of datasets? If the latter why not call it something

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
this soon, because basically all streaming patches will have to be revisited in light of this... On Tue, Jul 7, 2015 at 3:41 PM, Gyula Fóra gyula.f...@gmail.com wrote: You are right, that's an important issue. And I think we should also do some renaming with the iterations because

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
. Not part of the first prototype. Do we agree on those points? On Mon, Jul 13, 2015 at 4:50 PM, Gyula Fóra gyula.f...@gmail.com wrote: In general I like it, although the main difference between the current and the new one is the windowing and that is still not very clear. Where do we have

Re: Design documents for consolidated DataStream API

2015-07-13 Thread Gyula Fóra
parallel windows. Pick what you need and what works. On Mon, Jul 13, 2015 at 6:13 PM, Gyula Fóra gyula.f...@gmail.com wrote: I think we agree on everything; it's more of a naming issue :) I thought it might be misleading that global time windows are non-parallel windows. We don't want

Re: Github behind aft-git

2015-07-12 Thread Gyula Fóra
This happens quite often :/ Matthias J. Sax mj...@informatik.hu-berlin.de wrote (date: 2015 Jul 12, Sun, 14:28): Hi, Github master is behind asf-git by two days. The last 6 commits are not available at Github. Please compare: - https://github.com/apache/flink/commits/master -

Re: Redesigned Features page

2015-07-07 Thread Gyula Fóra
I think the content is pretty good, much better than before. But the page structure could be better (and this is very important in my opinion). Now it just looks like a long list of features without any ways to navigate between them. We should probably have something at the top that summarizes the

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
:57 Gyula Fóra gyf...@apache.org wrote: Hey, Along with the suggested changes to the streaming API structure I think we should also rework the iteration api. Currently the iteration api tries to mimic the syntax of the batch API while the runtime behaviour is quite different. What we

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
@Kostas: This new API is, I believe, equivalent in expressivity with the current one. We can define nested loops now as well. And I also don't see nested loops much worse generally than simple loops. Gyula Fóra gyula.f...@gmail.com wrote (date: 2015 Jul 7, Tue, 16:14): Sorry Stephan I

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
yes. I was just about to open an Issue for changing the Streaming Iteration API. :D Then we should also make the implementation very straightforward and simple, right now, the implementation of the iterations is all over the place. On Tue, 7 Jul 2015 at 15:57 Gyula Fóra gyf

Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
Hey, Along with the suggested changes to the streaming API structure I think we should also rework the iteration api. Currently the iteration api tries to mimic the syntax of the batch API while the runtime behaviour is quite different. What we create instead of iterations is really just cyclic

Re: Rework of streaming iteration API

2015-07-07 Thread Gyula Fóra
approach, because it is explicit where streams could be altered in hindsight (after their definition). On Tue, Jul 7, 2015 at 4:20 PM, Gyula Fóra gyula.f...@gmail.com wrote: @Aljoscha: Yes, that's basically my point as well. This is what happens now too but we give this mutable datastream

Re: Replacing Checkpointed interface with field annotations

2015-07-01 Thread Gyula Fóra
wrote: Wow, this looks pretty concise. I really like it! On Mon, Jun 29, 2015 at 3:27 PM Gyula Fóra gyf...@apache.org wrote: Hey all! Just to add something new to the end of the discussion list. After some discussion with Seif, and Paris, I have added a commit

Re: Replacing Checkpointed interface with field annotations

2015-07-01 Thread Gyula Fóra
tested. On Wed, Jul 1, 2015 at 11:15 AM, Ufuk Celebi u...@apache.org wrote: On 01 Jul 2015, at 10:57, Gyula Fóra gyula.f...@gmail.com wrote: Hey, Thanks for the feedback guys: @Max: You are right, this is not top priority to change, I was just mocking up some alternatives

Replacing Checkpointed interface with field annotations

2015-06-29 Thread Gyula Fóra
Hey all! Just to add something new to the end of the discussion list. After some discussion with Seif, and Paris, I have added a commit that replaces the use of the Checkpointed interface with field annotations. This is probably the most lightweight state declaration so far and it will probably

Re: Is there Any api that let DataStream join DataSet ?

2015-06-29 Thread Gyula Fóra
You are right, one cannot use the current window-join implementation for this. A workaround is to implement your custom binary stream operator that will wait until it receives the whole file, and then starts joining. For instance a filestream.connect(streamToJoinWith).flatMap( CustomCoFlatMap that
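The buffering workaround described here can be sketched as a small two-input state machine in plain Java. Everything in it is an assumption made for illustration: the class name BufferingJoinSketch, the String key/value types, and in particular the explicit onFileEnd() signal, since the message does not say how the operator learns that the whole file has arrived. It does not use Flink's CoFlatMapFunction interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-input operator: input 1 carries the file's records,
// input 2 carries the live stream. Stream records are buffered until the
// file is complete and are joined against the loaded table afterwards.
final class BufferingJoinSketch {

    private final Map<String, String> table = new HashMap<>();
    private final List<String> buffered = new ArrayList<>();
    private final List<String> output = new ArrayList<>();
    private boolean fileComplete = false;

    // Input 1: one record of the file.
    void onFileRecord(String key, String value) {
        table.put(key, value);
    }

    // Assumed end-of-file signal; flushes everything buffered so far.
    void onFileEnd() {
        fileComplete = true;
        for (String key : buffered) {
            join(key);
        }
        buffered.clear();
    }

    // Input 2: one record of the stream to join with.
    void onStreamRecord(String key) {
        if (fileComplete) {
            join(key);
        } else {
            buffered.add(key);
        }
    }

    // Inner join: emit only keys that exist in the file.
    private void join(String key) {
        String value = table.get(key);
        if (value != null) {
            output.add(key + "=" + value);
        }
    }

    List<String> output() {
        return output;
    }
}
```

The buffer is unbounded in this sketch, so a real operator would need to consider how much stream data can arrive before the file is fully read.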

Re: Storm Compatibility Improvement

2015-06-29 Thread Gyula Fóra
Hey, I didn't look through the whole code so I probably don't get something but why don't you just do what Storm does? Keep a map from the field names to indexes somewhere (make this accessible from the tuple) and then you can just use a simple Flink tuple. I think this is what's happening in
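The suggestion above, keeping a map from field names to positions next to a plain positional tuple, could look roughly like the following. This is a sketch only: NamedFieldsSketch is a made-up name, and an Object[] stands in for a Flink tuple so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical name-to-index schema kept alongside positional tuples.
final class NamedFieldsSketch {

    private final Map<String, Integer> indexByName = new HashMap<>();

    // The declared field order defines each field's position.
    NamedFieldsSketch(List<String> fieldNames) {
        for (int i = 0; i < fieldNames.size(); i++) {
            indexByName.put(fieldNames.get(i), i);
        }
    }

    // Resolve the name to a position, then read the positional tuple.
    Object getByField(Object[] tuple, String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return tuple[index];
    }
}
```

Because the schema is stored once per stream rather than once per record, the tuples themselves stay as small as ordinary positional tuples.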

Re: Storm Compatibility Improvement

2015-06-29 Thread Gyula Fóra
By declare I mean we assume a Flink Tuple datatype and the user declares the name mapping (sorry it's getting late). Gyula Fóra gyula.f...@gmail.com wrote (date: 2015 Jun 29, Mon, 23:57): Ah ok, now I get what I didn't get before :) So you want to take some input stream, and execute

Stream iteration head as ConnectedDataStream

2015-06-26 Thread Gyula Fóra
Hey! Now that we are implementing more and more applications for streaming that use iterations we realized a huge shortcoming of the current iteration API. Currently it only allows feeding back data of the same type to the iteration head. This makes sense because the operators are typed but makes

Re: Thoughts About Streaming

2015-06-23 Thread Gyula Fóra
useless. For the API, I proposed to restructure the interactions between all the different *DataStream classes and grouping/windowing. (See API section of the doc I posted.) On Mon, 22 Jun 2015 at 21:56 Gyula Fóra gyula.f...@gmail.com wrote: Hi Aljoscha, Thanks for the nice summary

Removing reduce/aggregations from non-grouped data streams

2015-06-22 Thread Gyula Fóra
Hey all, Currently we have reduce and aggregation methods for non-grouped DataStreams as well, which will produce local aggregates depending on the parallelism of the operator. This behaviour is neither intuitive nor useful as it only produces sensible results if the user specifically sets the
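The effect described in this message can be shown with a few lines of plain Java: summing a non-grouped stream with parallelism p yields p partial sums rather than one total. The round-robin distribution of elements to parallel instances is an assumption of this sketch, as is the class name LocalAggregateSketch, and only the final value of each instance is shown.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates local (per-instance) aggregates on a non-grouped stream.
final class LocalAggregateSketch {

    // Returns the final sum held by each parallel instance, assuming
    // elements are distributed round-robin (an assumption of this sketch).
    static List<Integer> partialSums(List<Integer> input, int parallelism) {
        int[] sums = new int[parallelism];
        for (int i = 0; i < input.size(); i++) {
            sums[i % parallelism] += input.get(i);
        }
        List<Integer> result = new ArrayList<>();
        for (int sum : sums) {
            result.add(sum);
        }
        return result;
    }
}
```

For the input 1, 2, 3, 4 the result is a single total of 10 with parallelism 1, but two partial sums (4 and 6) with parallelism 2, so the result depends on how the operator happens to be parallelized.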

Re: Removing reduce/aggregations from non-grouped data streams

2015-06-22 Thread Gyula Fóra
I opened a PR https://github.com/apache/flink/pull/860 for this. Stephan Ewen se...@apache.org wrote (date: 2015 Jun 22, Mon, 19:25): +1 totally agreed On Mon, Jun 22, 2015 at 5:32 PM, Gyula Fóra gyf...@apache.org wrote: Hey all, Currently we have reduce and aggregation methods

Re: Thoughts About Streaming

2015-06-22 Thread Gyula Fóra
Hi Aljoscha, Thanks for the nice summary, this is a very good initiative. I added some comments to the respective sections (where I didn't fully agree :).). At some point I think it would be good to have a public hangout session on this, which could make a more dynamic discussion. Cheers, Gyula

Re: Testing Apache Flink 0.9.0-rc2

2015-06-15 Thread Gyula Fóra
The checkpoint cleanup works for HDFS, right? I assume the job manager should see that as well. This is not a trivial problem in general, so the assumption we are making now is that the JM can actually execute the cleanup logic. Aljoscha Krettek aljos...@apache.org wrote (date: 2015 Jun
