Re: Flink Event Time order

2017-05-03 Thread Aljoscha Krettek
Ok, thanks for letting us know that you solved it! > On 4. May 2017, at 07:41, Björn Zachrisson wrote: > > Hi, > > No, that is only a subset of the code. > I realized it was the multiple sinks/parallelization that caused my "mysterious behavior". > So when sorting each
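
For readers hitting the same symptom: order is only guaranteed within a single parallel subtask, so one common fix is to pin the final sink to parallelism 1. A minimal sketch, where sortedStream and MySink are placeholders, not the code from this thread:

    sortedStream
        .addSink(new MySink())     // MySink stands in for whatever SinkFunction is actually used
        .setParallelism(1);        // a single sink subtask cannot interleave output from parallel instances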

Re: Tuning RocksDB

2017-05-03 Thread Jason Brelloch
Yep, that was exactly the issue. Thanks for the help! On Wed, May 3, 2017 at 2:44 PM, Stefan Richter wrote: > Ok, given the info that you are using ListState (which uses RocksDB’s > merge() internally) this is probably a case of this problem: >

Re: Problem: Please check that all IDs specified via `uid(String)` are unique.

2017-05-03 Thread Chesnay Schepler
Is the method called multiple times with different input sets? Does the issue occur regardless of which uid you set? On 03.05.2017 20:21, Rami Al-Isawi wrote: Hi, Nope, that is the only uid in the whole project. The “userRawDataStream” comes from a splitter: public static class

Re: Tuning RocksDB

2017-05-03 Thread Stefan Richter
Ok, given the info that you are using ListState (which uses RocksDB’s merge() internally) this is probably a case of this problem: https://github.com/facebook/rocksdb/issues/1988 We provide a custom version of RocksDB with Flink 1.2.1 (where we

Re: Problem: Please check that all IDs specified via `uid(String)` are unique.

2017-05-03 Thread Rami Al-Isawi
Hi, Nope, that is the only uid in the whole project. The “userRawDataStream” comes from a splitter: public static class Splitter implements OutputSelector<EventRaw> { @Override public Iterable<String> select(EventRaw value) { List<String> output = new ArrayList<>(); output.add(value.type);

Re: Tuning RocksDB

2017-05-03 Thread Jason Brelloch
So looking through the logs I found these lines (repeated same test again with a rocksDB backend, took 5m55s): 2017-05-03 12:52:24,131 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 2 @ 1493830344131 2017-05-03 12:52:24,132 INFO

Re: Tuning RocksDB

2017-05-03 Thread Stefan Richter
Sorry, just saw that your question was actually mainly about checkpointing, but it can still be related to my previous answer. I assume the checkpointing time is the time that is reported in the web interface? This would be the end-to-end runtime of the checkpoint which does not really tell you

Re: Tuning RocksDB

2017-05-03 Thread Stefan Richter
Hi, typically, I would expect that the bottleneck with the RocksDB backend is not RocksDB itself, but your TypeSerializers. I suggest first attaching a profiler/sampler to the process and checking whether the problematic methods are in serialization or in the actual accesses to RocksDB. The

Re: High Availability on Yarn

2017-05-03 Thread Jain, Ankit
Thanks for your reply Aljoscha. After building a better understanding of Yarn and spending a copious amount of time on the Flink codebase, I think I now get how Flink & Yarn interact – I plan to document this soon in case it could help somebody starting afresh with Flink-Yarn. Regarding ZooKeeper, in

Re: Monitoring memory usage of a Flink Job

2017-05-03 Thread Sendoh
Hi Robert, Could I ask which endpoint you use to get the memory statistics of a Flink job? I checked here but don't know which one to use. https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/rest_api.html Or should we put the memory in the metrics? Best, Sendoh -- View

Tuning RocksDB

2017-05-03 Thread Jason Brelloch
Hey all, I am looking for some advice on tuning rocksDB for better performance in Flink 1.2. I created a pretty simple job with a single kafka source and one flatmap function that just stores 5 events in a single key of managed keyed state and then drops everything else, to test checkpoint
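
For context, a minimal sketch of the kind of flatmap described here, assuming String events keyed by a String id (types and names are made up, imports from org.apache.flink.* omitted). With the RocksDB backend every ListState.add() goes through RocksDB's merge(), which is what the rest of the thread ends up discussing:

    // Hedged sketch only: keeps at most 5 events per key in managed keyed ListState, drops the rest.
    public static class BufferFiveEvents extends RichFlatMapFunction<Tuple2<String, String>, String> {

        private transient ListState<String> buffer;

        @Override
        public void open(Configuration parameters) {
            buffer = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-events", String.class));
        }

        @Override
        public void flatMap(Tuple2<String, String> value, Collector<String> out) throws Exception {
            int size = 0;
            Iterable<String> current = buffer.get();
            if (current != null) {
                for (String ignored : current) {
                    size++;
                }
            }
            if (size < 5) {
                buffer.add(value.f1);   // each add() is a RocksDB merge() with the RocksDB backend
            }
        }
    }

Applied after a keyBy() on the Kafka source stream.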

Re: Queries regarding Historical Reprocessing

2017-05-03 Thread Vinay Patil
Hi Guys, Can someone please help me in understanding this ? Regards, Vinay Patil On Thu, Apr 27, 2017 at 12:36 PM, Vinay Patil wrote: > Hi Guys, > > For historical reprocessing , I am reading the avro data from S3 and > passing these records to the same pipeline for

ElasticsearchSink on DataSet

2017-05-03 Thread Flavio Pompermaier
Hi to all, at the moment I have a Flink job that generates a DataSet that I write to a file that is read by Logstash to index data on ES. I'd like to use the new ElasticsearchSink to index that JSON directly from Flink, but ElasticsearchSink only works with the streaming environment. Is there any
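
One possible workaround (a sketch, not an official DataSet sink): keep writing the DataSet to a file and replace Logstash with a small streaming job that reads the JSON lines and indexes them with the ES 2.x connector. Host, cluster, index, type and path below are all assumptions; imports from org.apache.flink.*, org.elasticsearch.* and java.util.* are omitted:

    // Hedged sketch, ES 2.x connector on Flink 1.2; all names and paths are placeholders.
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<String> jsonLines = env.readTextFile("file:///path/to/dataset-output");

    Map<String, String> config = new HashMap<>();
    config.put("cluster.name", "my-cluster");            // assumption
    config.put("bulk.flush.max.actions", "1000");

    List<InetSocketAddress> transports = new ArrayList<>();
    transports.add(new InetSocketAddress("es-host", 9300));

    jsonLines.addSink(new ElasticsearchSink<>(config, transports,
        new ElasticsearchSinkFunction<String>() {
            @Override
            public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                indexer.add(Requests.indexRequest()
                    .index("my-index")                    // assumption
                    .type("my-type")                      // assumption
                    .source(element));
            }
        }));

    env.execute("index-json-into-es");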

Re: Methods that trigger execution

2017-05-03 Thread Aljoscha Krettek
Hi, Yes you’re right, there is no convenient list. Off the top of my head, your list seems exhaustive. (You could add printToErr()). As a general remark, I don’t think it’s wise to use these methods when handling large amounts of data because they ship everything back to the client. Best,
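
A tiny batch example of the eager methods mentioned above (purely illustrative, imports omitted):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5);

    long n = data.count();   // eager: runs the plan built so far and returns the count

    data.map(new MapFunction<Integer, Integer>() {
            @Override
            public Integer map(Integer value) {
                return value * 2;
            }
        })
        .print();            // eager: runs again and ships all results back to the client

    // env.execute() is only needed when the job ends in lazy sinks such as writeAsText().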

Re: RocksDB error with flink 1.2.0

2017-05-03 Thread Stephan Ewen
Multiplexing patterns seems like the right thing to do. Aside from not sharing rocksdb, having 300 separate operators also results in more threads, network connections, etc. That makes it all less efficient... On Tue, May 2, 2017 at 6:06 PM, Aljoscha Krettek wrote: > They

Re: Programmatic management of Flink jobs

2017-05-03 Thread Aljoscha Krettek
Hi, Yes, this would work even though it requires canceling/restarting the job whenever the patterns change. Best, Aljoscha > On 3. May 2017, at 12:36, Moiz S Jinia wrote: > > The kind of program I intend to submit would be one that sets up a > StreamExecutionEnvironment,

Re: Join two kafka topics

2017-05-03 Thread Aljoscha Krettek
Hi, You get that by having the two input streams keyed on the same key. Either by doing keyBy() on them individually or by using keyBy() on the ConnectedStream. Best, Aljoscha > On 3. May 2017, at 13:06, Tarek khal-letaief wrote: > > Hi Aljoscha, > Thanks for
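
A minimal sketch of that suggestion, assuming both topics deliver Tuple2<String, String> elements with the shared key in field 0 (element layout and the "matching" logic are assumptions, imports omitted). keyBy(0, 0) on the ConnectedStreams plays the role of where(key)/equalTo(key), and the keyed state lets a rule outlive any join window:

    // Hedged sketch only.
    public static DataStream<String> applyLatestRule(
            DataStream<Tuple2<String, String>> events,   // (key, payload) from topic A
            DataStream<Tuple2<String, String>> rules) {  // (key, rule) from topic B

        return events.connect(rules)
            .keyBy(0, 0)   // both inputs keyed on f0: the equivalent of where(key)/equalTo(key)
            .flatMap(new RichCoFlatMapFunction<Tuple2<String, String>, Tuple2<String, String>, String>() {

                private transient ValueState<String> currentRule;

                @Override
                public void open(Configuration parameters) {
                    currentRule = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("current-rule", String.class));
                }

                @Override
                public void flatMap1(Tuple2<String, String> event, Collector<String> out) throws Exception {
                    String rule = currentRule.value();
                    if (rule != null) {
                        out.collect("rule " + rule + " applied to " + event.f1);  // illustrative only
                    }
                }

                @Override
                public void flatMap2(Tuple2<String, String> rule, Collector<String> out) throws Exception {
                    currentRule.update(rule.f1);   // latest rule per key, kept indefinitely
                }
            });
    }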

Re: Problem: Please check that all IDs specified via `uid(String)` are unique.

2017-05-03 Thread Chesnay Schepler
Hello, was a uid set on "userRawDataStream", or any of its parent transformations? On 03.05.2017 12:59, Rami Al-Isawi wrote: Hi, I am trying to set uids. I keep getting this (Flink 1.2): Exception in thread "main" java.lang.IllegalArgumentException: Hash collision on user-specified ID.

Problem: Please check that all IDs specified via `uid(String)` are unique.

2017-05-03 Thread Rami Al-Isawi
Hi, I am trying to set uids. I keep getting this (Flink 1.2): Exception in thread "main" java.lang.IllegalArgumentException: Hash collision on user-specified ID. Most likely cause is a non-unique ID. Please check that all IDs specified via `uid(String)` are unique. Here is the code snippet.
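
For anyone landing here from a search: the rule this exception enforces is that every uid() string in the job graph must be unique. A hedged illustration only; the operator and class names below are made up and may not match the actual job in this thread:

    // Illustrative only: one distinct uid per operator, including operators fed by the same split/select.
    SplitStream<EventRaw> split = rawStream.split(new Splitter());

    split.select("typeA")
         .map(new TypeAMapper()).uid("map-type-a");

    split.select("typeB")
         .map(new TypeBMapper()).uid("map-type-b");   // reusing "map-type-a" here would trigger
                                                      // the hash-collision error quoted above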

Methods that trigger execution

2017-05-03 Thread Sebastian Neef
Hi, I've heard of some methods that trigger an execution when using the Batch API: - print - collect - count - execute Some of them are discussed in older docs [0], but I can't find a good list or hints in the newer ones. Are there any other methods? Best regards, Sebastian [0]

Re: Window Function on AllWindowed Stream - Combining Kafka Topics

2017-05-03 Thread G.S.Vijay Raajaa
Thanks for your input, will try to incorporate them in my implementation. Regards, Vijay Raajaa G S On Wed, May 3, 2017 at 3:28 PM, Aljoscha Krettek wrote: > The approach could work, but if it can happen that an event from stream A > is not matched by an event in stream B

Re: Programmatic management of Flink jobs

2017-05-03 Thread Moiz S Jinia
Not sure I understand Operators. What I need is to have a Pattern that starts consuming from a Kafka stream. And I need the Patterns to come and go. Another option that comes to mind is this - The Patterns I'll need are well known in advance. Only certain parameters such as the time duration of

Re: Window Function on AllWindowed Stream - Combining Kafka Topics

2017-05-03 Thread G.S.Vijay Raajaa
Sure. Thanks for the pointer, let me reorder the same. Any comments about the approach followed for merging topics and creating a single JSON? Regards, Vijay Raajaa G S On Wed, May 3, 2017 at 2:41 PM, Aljoscha Krettek wrote: > Hi, > An AllWindow operator requires an

Re: Programmatic management of Flink jobs

2017-05-03 Thread Aljoscha Krettek
What would the pattern be added to? An existing custom operator? The REST interface only allows for managing the lifecycle of a job, not modifying its graph structure. > On 3. May 2017, at 11:43, Moiz S Jinia wrote: > > Thanks for the references. Looking at the REST

Re: Join two kafka topics

2017-05-03 Thread Tarek khal
Hi Aljoscha, Thanks for the reply. I opted for the first solution: to use a connected FlatMap [1], but how can I simulate the where(key) and the equalTo(key) of a "join", because the function gets individual calls to flatMap1 and flatMap2? Best regards, Tarek -- View this message in context:

Re: Programmatic management of Flink jobs

2017-05-03 Thread Aljoscha Krettek
Hi, For managing a Job you can either use the bin/flink command-line tool or the Rest API [1]. As for dynamically adding patterns, that’s outside of the scope of Flink right now. There are, however, some users that implemented this on top of Flink, see for example RBEA [2]. The basic idea is to
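
A very rough sketch of that basic idea, under clearly labeled assumptions: "events" and "ruleUpdates" are two DataStream<String> inputs, the "ruleId=pattern" format and the matching are made up, and the local rule map is not checkpointed in this simplified form. Rule updates arrive on a second, broadcast input and are kept by the operator, so patterns can change without restarting the job:

    DataStream<String> alerts = events
        .connect(ruleUpdates.broadcast())   // every parallel instance sees every rule update
        .flatMap(new CoFlatMapFunction<String, String, String>() {

            private final Map<String, String> activeRules = new HashMap<>();

            @Override
            public void flatMap1(String event, Collector<String> out) {
                for (Map.Entry<String, String> rule : activeRules.entrySet()) {
                    if (event.contains(rule.getValue())) {       // matching logic is illustrative only
                        out.collect("rule " + rule.getKey() + " matched: " + event);
                    }
                }
            }

            @Override
            public void flatMap2(String ruleUpdate, Collector<String> out) {
                String[] parts = ruleUpdate.split("=", 2);        // e.g. "ruleId=pattern", made-up format
                if (parts.length == 2) {
                    activeRules.put(parts[0], parts[1]);
                }
            }
        });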

Re: Join two kafka topics

2017-05-03 Thread Aljoscha Krettek
Hi, Instead of a Join, I would suggest to use a connected FlatMap [1] (or a connected ProcessFunction [2]). The problem with a join is that the rules only “survive” for the length of the window while I suspect that you want them to survive longer than that so that they can be applied to events

Re: Periodic flush sink?

2017-05-03 Thread Ted Yu
bq. is the mutator thread safe? See HBASE-17361 On Wed, May 3, 2017 at 1:52 AM, Aljoscha Krettek wrote: > Hi Niels, > With any kind of buffering you need to be careful when it comes to fault > tolerance. In your case, you should make sure to flush the buffers when >

Re: Window Function on AllWindowed Stream - Combining Kafka Topics

2017-05-03 Thread Aljoscha Krettek
Hi, An AllWindow operator requires an AllWindowFunction, instead of a WindowFunction. In your case, the keyBy() seems to be in the wrong place; to get a keyed window you have to write something akin to: inputStream .keyBy(…) .window(…) .apply(…) // or reduce() In your case, you key the
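
Spelled out a bit more, with placeholder function names, the two variants look like this (a sketch only):

    // Keyed window: keyBy() comes before window() and the operator takes a WindowFunction.
    input
        .keyBy(new MyKeySelector())
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new MyWindowFunction());

    // Non-keyed window: windowAll() takes an AllWindowFunction and runs with parallelism 1.
    input
        .windowAll(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new MyAllWindowFunction());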

Re: High Availability on Yarn

2017-05-03 Thread Aljoscha Krettek
Hi, As a first comment, the work mentioned in the FLIP-6 doc you linked is still work-in-progress. You cannot use these abstractions yet without going into the code and setting up a cluster “by hand”. The documentation for one-step deployment of a Job to YARN is available here:

Re: Periodic flush sink?

2017-05-03 Thread Aljoscha Krettek
Hi Niels, With any kind of buffering you need to be careful when it comes to fault tolerance. In your case, you should make sure to flush the buffers when checkpointing, otherwise you might lose data because those elements will not be resend after a failure. With the periodic timer my only
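
A sketch of the flush-on-checkpoint advice, using the CheckpointedFunction interface available in Flink 1.2. The writeBatch() call is hypothetical (e.g. it could hand the batch to an HBase BufferedMutator), not the actual code from this thread; imports omitted:

    // Hedged sketch: a buffering sink that also flushes whenever a checkpoint is taken.
    public abstract class BufferingSink<T> extends RichSinkFunction<T> implements CheckpointedFunction {

        private final List<T> buffer = new ArrayList<>();
        private final int maxBufferSize;

        protected BufferingSink(int maxBufferSize) {
            this.maxBufferSize = maxBufferSize;
        }

        @Override
        public void invoke(T value) throws Exception {
            buffer.add(value);
            if (buffer.size() >= maxBufferSize) {
                flush();
            }
        }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            flush();   // nothing stays buffered across a checkpoint, so nothing is lost on restore
        }

        @Override
        public void initializeState(FunctionInitializationContext context) {
            // nothing to restore: the buffer is always empty at checkpoint time
        }

        private void flush() throws Exception {
            if (!buffer.isEmpty()) {
                writeBatch(buffer);   // hypothetical external write
                buffer.clear();
            }
        }

        protected abstract void writeBatch(List<T> batch) throws Exception;
    }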

Programmatic management of Flink jobs

2017-05-03 Thread Moiz S Jinia
Is there an API that allows remotely adding, modifying, and cancelling Flink jobs? Example - changing the time window of a deployed Pattern, adding new Patterns, etc. What's the best way to go about this? To the end user the Pattern would manifest as rules that can be updated anytime. Moiz

Re: Weird serialization bug?

2017-05-03 Thread Aljoscha Krettek
To elaborate on what Ted said: fooo is defined inside a method and probably has references to outer (non-serialisable) classes. > On 30. Apr 2017, at 01:15, Ted Yu wrote: > > Have you tried making fooo static? > > Cheers > > On Sat, Apr 29, 2017 at 4:26 AM, Sebastian
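
To make the point concrete, a small hypothetical example (the name fooo only loosely mirrors the thread, everything else is made up): an anonymous class created in an instance method that touches instance members holds a reference to its enclosing instance, so that instance must be serializable too; a static nested class does not.

    public class Outer {                              // not Serializable
        private final String prefix = "cleaned-";     // referencing this forces capture of Outer.this

        public DataStream<String> bad(DataStream<String> input) {
            MapFunction<String, String> fooo = new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return prefix + value.trim();     // uses Outer.this.prefix -> Outer must be serializable
                }
            };
            return input.map(fooo);                   // fails: Outer is not serializable
        }

        // A static nested class carries no hidden reference to Outer and serializes on its own:
        public static class TrimMapper implements MapFunction<String, String> {
            private final String prefix;

            public TrimMapper(String prefix) {
                this.prefix = prefix;
            }

            @Override
            public String map(String value) {
                return prefix + value.trim();
            }
        }
    }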