TimelyFlatMapFunction and DataStream

2016-11-01 Thread Ken Krugler
I’m curious why it seems like a TimelyFlatMapFunction can’t be used with a regular DataStream, but it can be used with a KeyedStream. Or maybe I’m missing something obvious (this is with 1.2-SNAPSHOT, pulled today). Also the documentation of TimelyFlatMapFunction

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Anchit Jatana
I've set the metric reporting frequency to InfluxDB as 10s. In the screenshot, I'm using Grafana query interval of 1s. I've tried 10s and more too, the graph shape changes a bit but the incorrect negative values are still plotted(makes no difference). Something to add: If the subtasks are less

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
Hmm. I can't recreate that behavior here. I have seen some issues like this if you're grouping by a time interval different from the metrics reporting interval you're using, though. How often are you reporting metrics to Influx? Are you using the same interval in your Grafana queries? I see

Re: Flink on YARN - Fault Tolerance | use case supported or not

2016-11-01 Thread Anchit Jatana
Yes, thank Stephan. Regards, Anchit -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-on-YARN-Fault-Tolerance-use-case-supported-or-not-tp9776p9817.html Sent from the Apache Flink User Mailing List archive. mailing list archive at

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Anchit Jatana
Hi Jamie, Thank you so much for your response. The below query: SELECT derivative(sum("count"), 1s) FROM "numRecordsIn" WHERE "task_name" = 'Sink: Unnamed' AND $timeFilter GROUP BY time(1s) behaves the same as with the use of the templating variable in the 'All' case i.e. shows a plots of

Re: watermark trigger doesn't check whether element's timestamp is passed

2016-11-01 Thread aj heller
Hi Manu, Aljoscha, I had been interested in implementing FLIP-2, but I haven't been able to make time for it. There is no implementation yet that I'm aware of, and I'll gladly step aside (or help out how I can) if you or anyone is interested to take charge of it. That said, I'm also not sure if

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Renjie Liu
Hi, Till: I think the multiple input should include the more general case where redistribution happens between subtasks, right? Since in this case we also need to align check barrier. Till Rohrmann 于2016年11月1日周二 下午11:05写道: > The tuples are not buffered until the snapshot is

Re: watermark trigger doesn't check whether element's timestamp is passed

2016-11-01 Thread Manu Zhang
Thanks. The ideal case is to fire after watermark past each element from the window but that requires a custom trigger and FLIP-2 as well. The enhanced window evictor will help to avoid the last firing. Are the discussions on FLIP-2 still going on ? Are there any opening JIRAs or PRs ? (The

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
Ahh.. I haven’t used templating all that much but this also works for your substask variable so that you don’t have to enumerate all the possible values: Template Variable Type: query query: SHOW TAG VALUES FROM numRecordsIn WITH KEY = "subtask_index" ​ On Tue, Nov 1, 2016 at 2:51 PM, Jamie

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
Another note. In the example the template variable type is "custom" and the values have to be enumerated manually. So in your case you would have to configure all the possible values of "subtask" to be 0-49. On Tue, Nov 1, 2016 at 2:43 PM, Jamie Grier wrote: > This

Re: Flink Metrics - InfluxDB + Grafana | Help with query influxDB query for Grafana to plot 'numRecordsIn' & 'numRecordsOut' for each operator/operation

2016-11-01 Thread Jamie Grier
This works well for me. This will aggregate the data across all sub-task instances: SELECT derivative(sum("count"), 1s) FROM "numRecordsIn" WHERE "task_name" = 'Sink: Unnamed' AND $timeFilter GROUP BY time(1s) You can also plot each sub-task instance separately on the same graph by doing:

Re: BoundedOutOfOrdernessTimestampExtractor and timestamps in the future

2016-11-01 Thread Konstantin Knauf
Hi Dominik, out of curiosity, how come that you receive timestamps from the future? ;) Depending on the semantics of these future events, it might also make sense to already "floor" the timestamp to processing time in the extractTimestamp()-Method. I am not sure, if I understand your follow up

Re: A custom FileInputFormat

2016-11-01 Thread Fabian Hueske
Hi Niklas, I don't know exactly what is going wrong there, but I have a few pointers for you: 1) in cluster setups, Flink redirects println() to ./log/*.out files, i.e, you have to search for the task manager that ran the DirReader and check its ./log/*.out file 2) you are using Java's File

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

2016-11-01 Thread Chesnay Schepler
Hello, the main issue that prevented us from writing batches is that there is a server-side limit as to how big a batch may be, however there was no way to tell how big the batch that you are currently building up actually is. Regarding locality, I'm not sure if a partitioner alone solves

Re: Can we do batch writes on cassandra using flink while leveraging the locality?

2016-11-01 Thread Stephan Ewen
Hi! I do not know the details of how Cassandra supports batched writes, but here are some thoughts: - Grouping writes that go to the same partition together into one batch write request makes sense. If you have some sample code for that, it should be not too hard to integrate into the Flink

Re: Kinesis Connector Dependency Problems

2016-11-01 Thread Robert Metzger
Hi Justin, thank you for sharing the classpath of the Flink container with us. It contains what Till was already expecting: An older version of the AWS SDK. If you have some spare time, could you quickly try to run your program with a newer EMR version, just to validate our suspicion? If the

Re: Kinesis Connector Dependency Problems

2016-11-01 Thread Justin Yan
Hi there, We're using EMR 4.4.0 -> I suppose this is a bit old, and I can migrate forward if you think that would be best. I've appended the classpath that the Flink cluster was started with at the end of this email (with a slight improvement to the formatting to make it readable). Willing to

Re: watermark trigger doesn't check whether element's timestamp is passed

2016-11-01 Thread Aljoscha Krettek
Ah, I finally understand it. You would a way to query the current watermark in the window function to only emit those elements where the timestamp is lower than the watermark. When the window fires again, do you want to emit elements that you emitted during the last firing again? If not, I think

Re: Looping over a DataSet and accesing another DataSet

2016-11-01 Thread Greg Hogan
By 'loop' do you refer to an iteration? The output of a bulk iteration is processed as the input of the following iteration. Values updated in an iteration are available in the next iteration just as values updated by an operator are available to the following operator. Your chosen algorithm may

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Till Rohrmann
The tuples are not buffered until the snapshot is globally complete (a snapshot is globally complete iff all operators have successfully taken a snapshot). They are only buffered until the corresponding checkpoint barrier on the second input is received. Once this is the case, the checkpoint

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Renjie Liu
Hi, Till: By operator with multiple inputs, do you mean inputs from multiple subtasks? On Tue, Nov 1, 2016 at 8:56 PM Till Rohrmann wrote: > Hi Li, > > the statement refers to operators with multiple inputs (two in this case). > With the current implementation you will

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Renjie Liu
Sorry the incorrect reply, please ignore this. On Tue, Nov 1, 2016 at 8:47 PM Renjie Liu wrote: > Essentially you are right, but the snapshot commit process is > asynchronous. That's what you have to pay for exactly once semantics. > > Li Wang

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Renjie Liu
Essentially you are right, but the snapshot commit process is asynchronous. That's what you have to pay for exactly once semantics. Li Wang 于2016年11月1日周二 下午3:05写道: > Hi all, > > I have a question regarding to the state checkpoint mechanism in Flink. I > find the statement

BoundedOutOfOrdernessTimestampExtractor and timestamps in the future

2016-11-01 Thread Dominik Bruhn
Hey, I'm using a BoundedOutOfOrdernessTimestampExtractor for assigning my timestamps and discarding to old events (which happens sometimes). Now my problem is that some events, by accident have timestamps in the future. If the timestamps are more in the future than my `maxOutOfOrderness`,

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Li Wang
Hi Till, Thanks for your prompt reply. I understand that input streams should be aligned such that a consistent state snapshot can be generated. In my opinion, that statement indicates that an operator will buffer its output tuples until the snapshot is committed. I am wondering if my

Re: Elasticsearch sink: Java.lang.NoSuchMethodError: org.elasticsearch.common.settings.Settings.settingsBuilder

2016-11-01 Thread Till Rohrmann
Hi Pedro, this looks like a version mismatch. Could you check which version of elasticsearch you've in your classpath respectively uber jar? It should be the version 2.3.5. Cheers, Till On Fri, Oct 28, 2016 at 6:59 PM, PedroMrChaves wrote: > Hello, > > I am using

Re: Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Till Rohrmann
Hi Li, the statement refers to operators with multiple inputs (two in this case). With the current implementation you will indeed block one of the inputs after receiving a checkpoint barrier n until you've received the corresponding checkpoint barrier n on the other input as well. This is what we

Re: Kinesis Connector Dependency Problems

2016-11-01 Thread Till Rohrmann
Hi Justin, I think this might be a problem in Flink's Kinesis consumer. The Flink Kinesis consumer uses the aws-java-sdk version 1.10.71 which indeed contains the afore mentioned methods. However, already version 1.10.46 no longer contains this method. Thus, I suspect, that Yarn puts some older

Question about the checkpoint mechanism in Flink.

2016-11-01 Thread Li Wang
Hi all, I have a question regarding to the state checkpoint mechanism in Flink. I find the statement "Once the last stream has received barrier n, the operator emits all pending outgoing records, and then emits snapshot n barriers itself” on the document