Re: Parallelism, registerEventTimeTimer and watermark problem

2017-10-17 Thread Fritz Budiyanto
Sorry, I left out the copy-paste of the exception thrown: 10/17/2017 20:21:30 dropDetection -> (aggFlowDropDetectPrintln -> Sink: Unnamed, aggFlowDropDetectPrintln -> Sink: Unnamed, Sink: kafkaSink)(3/4) switched to CANCELED 20:21:30,244 INFO

Parallelism, registerEventTimeTimer and watermark problem

2017-10-17 Thread Fritz Budiyanto
Hi All, If I have high parallelism and use a ProcessFunction to registerEventTimeTimer, the timer never fires. After debugging, I found out the watermark isn't updated because I have keyBy right after assignTimestampsAndWatermarks. And if I set assignTimestampsAndWatermarks right after the
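A minimal sketch of the pattern under discussion, assuming a hypothetical Event POJO with key and timestamp fields: timestamps/watermarks are assigned on the source stream before keyBy, so the downstream ProcessFunction's event-time timers can fire.

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.time.Time;
    import org.apache.flink.util.Collector;

    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

    // assign timestamps/watermarks on the source stream, before any keyBy,
    // so every parallel partition carries a watermark downstream
    DataStream<Event> withTs = source.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.seconds(5)) {
            @Override
            public long extractTimestamp(Event e) {
                return e.timestamp;
            }
        });

    withTs.keyBy("key")
        .process(new ProcessFunction<Event, Event>() {
            @Override
            public void processElement(Event value, Context ctx, Collector<Event> out) throws Exception {
                // fires once the watermark passes this timestamp
                ctx.timerService().registerEventTimeTimer(ctx.timestamp() + 10_000L);
                out.collect(value);
            }

            @Override
            public void onTimer(long ts, OnTimerContext ctx, Collector<Event> out) throws Exception {
                // never reached if the watermark doesn't advance
            }
        });

Note also that with high parallelism the overall watermark is the minimum across all partitions, so a single idle source partition is another common reason timers never fire.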

Re: Task Manager was lost/killed due to full GC

2017-10-17 Thread ShB
I just wanted to leave an update about this issue for anyone else who might come across it. The problem was still resource-related, but it was disk space rather than heap/off-heap memory. YARN was killing off my containers as they exceeded the threshold for disk utilization, and this was manifesting as Task
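For reference, the NodeManager's disk health checker is usually tuned via the yarn-site.xml property below; a hedged sketch, with 95.0 as an illustrative value only (the default is 90%):

    <property>
      <!-- disk utilization (percent) above which a disk is marked bad
           and containers running on it are killed -->
      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
      <value>95.0</value>
    </property>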

Stumped writing to KafkaJSONSink

2017-10-17 Thread Kenny Gorman
I am hoping you guys can help me. I am stumped as to how to actually write to Kafka with Kafka09JsonTableSink using the Table API. Here is my code below; I am hoping you guys can shed some light on how this should be done. I don’t see any methods for the actual write to Kafka. I am probably doing
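For anyone landing on this thread, a hedged sketch of the usual answer, assuming a registered table and a two-argument sink constructor (some Flink versions also require a partitioner argument): the sink exposes no write method of its own; the write is wired up via Table#writeToSink and happens when the job executes.

    import java.util.Properties;
    import org.apache.flink.streaming.connectors.kafka.Kafka09JsonTableSink;
    import org.apache.flink.table.api.Table;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");

    // "clicks" and the query are placeholders
    Table result = tableEnv.sql("SELECT user, cnt FROM clicks");
    result.writeToSink(new Kafka09JsonTableSink("output-topic", props));

    env.execute("write to kafka"); // the actual write to Kafka happens here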

Maven release

2017-10-17 Thread Biswajit Das
Hi Team, Is there any instruction anywhere on how to publish a release? I have been trying to publish a release to my own private Nexus repository, but somehow it always seems to try to upload to https://repository.apache.org/service/local/staging/deploy/
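A hedged guess at the cause: the staging URL is inherited from the Apache parent pom's <distributionManagement>, so deploying to a private Nexus means overriding it in your own pom. The id and URLs below are placeholders, and the id must match a <server> entry in settings.xml for credentials.

    <distributionManagement>
      <repository>
        <id>my-nexus</id>
        <url>https://nexus.example.com/repository/releases/</url>
      </repository>
      <snapshotRepository>
        <id>my-nexus</id>
        <url>https://nexus.example.com/repository/snapshots/</url>
      </snapshotRepository>
    </distributionManagement>

Alternatively, mvn deploy -DaltDeploymentRepository=my-nexus::default::https://nexus.example.com/repository/releases/ overrides the target without touching the pom.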

Re: Split a dataset

2017-10-17 Thread Fabian Hueske
Unfortunately, it's not possible to bridge the gap between the DataSet and DataStream APIs. However, you can also use a CsvInputFormat in the DataStream API. Since there's no built-in API to configure the CSV input, you would have to create (and configure) the CsvInputFormat yourself. Once you
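A hedged sketch of that approach (path, delimiter, and field types are placeholders):

    import org.apache.flink.api.java.io.TupleCsvInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.typeutils.TupleTypeInfo;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // create and configure the CsvInputFormat by hand
    TupleTypeInfo<Tuple2<String, Integer>> typeInfo =
        TupleTypeInfo.getBasicTupleTypeInfo(String.class, Integer.class);
    TupleCsvInputFormat<Tuple2<String, Integer>> format =
        new TupleCsvInputFormat<>(new Path("file:///tmp/input.csv"), typeInfo);
    format.setFieldDelimiter(",");

    // then hand it to the DataStream API
    DataStream<Tuple2<String, Integer>> stream = env.createInput(format, typeInfo);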

Re: GROUP BY TUMBLE on ROW range

2017-10-17 Thread Fabian Hueske
Hi Stefano, this is not supported in Flink's SQL and we would need new Group Window functions (like TUMBLE) for this. A TUMBLE_COUNT function would be somewhat similar to SESSION, which also requires checks on the sorted neighboring rows to identify the window of a row. Such a function would
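As a hedged aside: the Table API (as opposed to SQL) does offer row-count tumbling windows, which may cover the use case in the meantime. Table and field names below are placeholders, and T1 is assumed to declare a proctime attribute.

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.Tumble;

    // count-based tumbling window of 2 rows over processing time
    Table result = tableEnv
        .scan("T1")
        .window(Tumble.over("2.rows").on("proctime").as("w"))
        .groupBy("w")
        .select("a.count"); // 'a' is a placeholder column of T1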

Re: Empty directories left over from checkpointing

2017-10-17 Thread Elias Levy
Stephan, Thanks for taking care of this. We'll give it a try once 1.4 drops. On Sat, Oct 14, 2017 at 1:25 PM, Stephan Ewen wrote: > Some updates on this: > > Aside from reworking how the S3 directory handling is done, we also looked > into supporting S3 different than we

GROUP BY TUMBLE on ROW range

2017-10-17 Thread Stefano Bortoli
Hi all, Is there a way to use a tumble window GROUP BY with a row range in stream SQL? I mean, something like this: // "SELECT COUNT(*) " + // "FROM T1 " + // "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING)" However, even looking at the tests and at the "row

Re: hadoopcompatibility not in dist

2017-10-17 Thread eSKa
Bumping up this issue, as I have a similar problem now. We are running Flink on YARN and trying to submit a job via the Java API using YarnClusterClient (the run method with a PackagedProgram). The job starts to execute (we can see it on the Dashboard) but fails with: Caused by: java.lang.RuntimeException:
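For context, a hedged sketch of the submission path described (jar path, args, and parallelism are placeholders; how the YarnClusterClient is obtained depends on the session setup):

    import java.io.File;
    import org.apache.flink.client.program.ClusterClient;
    import org.apache.flink.client.program.PackagedProgram;

    // wrap the user jar; the manifest's entry class is used unless one is given
    PackagedProgram program =
        new PackagedProgram(new File("/path/to/user-job.jar"), "--input", "hdfs:///data/in");

    ClusterClient client = yarnClusterClient; // from the running YARN session
    client.run(program, 4); // submit with parallelism 4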

Re: Split a dataset

2017-10-17 Thread Magnus Vojbacke
Thank you, Fabian! If batch semantics are not important to my use case, is there any way to "downgrade" or convert a DataSet to a DataStream? BR /Magnus > On 17 Oct 2017, at 10:54, Fabian Hueske wrote: > > Hi Magnus, > > there is no Split operator on the DataSet API. > >

Re: Unbalanced job scheduling

2017-10-17 Thread AndreaKinn
I'm in contact with the founder of the library to deal with the problem. I'm also trying to understand how to implement slotSharingGroups myself.

Re: Split a dataset

2017-10-17 Thread Fabian Hueske
Hi Magnus, there is no Split operator on the DataSet API. As you said, this can be done using a FilterFunction. This also allows for non-binary splits: DataSet<T> setToSplit = ... DataSet<T> firstSplit = setToSplit.filter(new SplitCondition1()); DataSet<T> secondSplit = setToSplit.filter(new
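Spelled out end to end (Integer elements and the even/odd predicates stand in for arbitrary split conditions):

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Integer> setToSplit = env.fromElements(1, 2, 3, 4, 5, 6);

    // each split is simply a filter over the same input, so splits can
    // overlap or be non-binary depending on the predicates
    DataSet<Integer> firstSplit = setToSplit.filter(new FilterFunction<Integer>() {
        @Override
        public boolean filter(Integer v) { return v % 2 == 0; }
    });
    DataSet<Integer> secondSplit = setToSplit.filter(new FilterFunction<Integer>() {
        @Override
        public boolean filter(Integer v) { return v % 2 != 0; }
    });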

Split a dataset

2017-10-17 Thread Magnus Vojbacke
I'm looking for something like DataStream.split(), but for DataSets. I'd like to split my streaming data so messages go to different parts of an execution graph, based on arbitrary logic. DataStream.split() seems to be perfect, except that my source is a CSV file, and I have only found built

Re: Unbalanced job scheduling

2017-10-17 Thread Fabian Hueske
Setting the slot sharing group is Flink's mechanism to solve this issue. I'd consider this a limitation of the library that provides LEARN and SELECT. Did you consider opening an issue at (or contributing to) the library to support setting the slot sharing group? 2017-10-17 9:38 GMT+02:00

Re: Case Class TypeInformation

2017-10-17 Thread Fabian Hueske
Hi Joshua, that's a limitation of the Scala API. Row requires explicitly specifying a TypeInformation[Row], but it is not possible to inject custom types into a CaseClassTypeInfo, which is automatically generated by a Scala compiler plugin. Probably the easiest solution is to use Flink's Java
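A hedged sketch of the Java-API route (field names and the upstream stream are placeholders): the Row schema is declared explicitly via RowTypeInfo, sidestepping the Scala case-class type extraction.

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    // spell out the Row schema instead of relying on type extraction
    RowTypeInfo rowType = new RowTypeInfo(
        BasicTypeInfo.STRING_TYPE_INFO,
        BasicTypeInfo.INT_TYPE_INFO);

    DataStream<Row> rows = input
        .map(value -> Row.of(value.name, value.count))
        .returns(rowType); // the explicit TypeInformation[Row]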

Re: Unbalanced job scheduling

2017-10-17 Thread AndreaKinn
Yes, I considered them, but unfortunately I can't call the setSlotSharingGroup method on the LEARN and SELECT operators. I can call it on the other operators, but this means that the two LEARN operators will be constrained to the same "unnamed" slot.

Re: Unbalanced job scheduling

2017-10-17 Thread Fabian Hueske
Hi Andrea, have you looked into assigning slot sharing groups [1]? Best, Fabian [1] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#task-chaining-and-resource-groups 2017-10-16 18:01 GMT+02:00 AndreaKinn : > Hi all, > I want to expose
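From the linked docs, a hedged sketch of the mechanism (LearnFunction/SelectFunction and the group names are placeholders); an operator inherits its input's group unless one is set explicitly:

    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<String> learned = input
        .map(new LearnFunction())
        .slotSharingGroup("learn");  // runs in slots of group "learn"

    DataStream<String> selected = learned
        .map(new SelectFunction())
        .slotSharingGroup("select"); // later operators inherit "select" unless overridden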