Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Timo Walther
Thank you. Maybe we already identified the issue (see https://issues.apache.org/jira/browse/FLINK-9202). I will use your code to verify it. Regards, Timo Am 18.04.18 um 14:07 schrieb Petter Arvidsson: Hi Timo, Please find the generated class (for the second schema) attached. Regards,

Re: State-machine-based search logic in Flink ?

2018-04-18 Thread Fabian Hueske
As I said before, this is work in progress and there is a pending pull request (PR) to add this feature. So no, MATCH_RECOGNIZE is not supported by Flink yet and hence also not documented. Best, Fabian 2018-04-18 10:12 GMT+02:00 Esa Heikkinen : > Hi > > > > I did

Re: Flink job testing with

2018-04-18 Thread Tzu-Li (Gordon) Tai
Hi, The docs here [1] provide some example snippets of using the Kafka connector to consume from / write to Kafka topics. Once you consumed a `DataStream` from a Kafka topic using the Kafka consumer, you can use Flink transformations such as map, flatMap, etc. to perform processing on the

Re: CaseClassSerializer and/or TraversableSerializer may still not be threadsafe?

2018-04-18 Thread Stefan Richter
Hi, I agree that this looks like a serializer is shared between two threads, one of them being the event processing loop. I am doubting that the problem is with the async fs backend, because there is code in place that will duplicate all serializers for the async snapshot thread and this is

Outputting the content of in flight session windows

2018-04-18 Thread jelmer
I defined a session window and I would like to write the contents of the window to storage before the window closes Initially I was doing this by setting a CountTrigger.of(1) on the session window. But this leads to very frequent writes. To remedy this i switched to a

Jars uploaded to taskmanager are deleted but not free'ed by OS

2018-04-18 Thread Jeroen Steggink | knowsy
Hi, I'm having some troubles running the Flink taskmanager in a Docker container (OpenShift). The container's internal storage is filling up because the deleted jar files in blob storage are probably still in use and therefore resources are not free'ed. We are using Apache Beam to start an

Jars uploaded to jobmanager are deleted but not free'ed by OS

2018-04-18 Thread Jeroen Steggink | knowsy
Sorry, I meant the jobmanager, not the taskmanager. On 18-Apr-18 15:44, Jeroen Steggink | knowsy wrote: Hi, I'm having some troubles running the Flink taskmanager in a Docker container (OpenShift). The container's internal storage is filling up because the deleted jar files in blob storage

Substasks - Uneven allocation

2018-04-18 Thread PedroMrChaves
Hello, I have a job that has one async operational node (i.e. implements AsyncFunction). This Operational node will spawn multiple threads that perform heavy tasks (cpu bound). I have a Flink Standalone cluster deployed on two machines of 32 cores and 128 gb of RAM, each machine has one task

Re: Substasks - Uneven allocation

2018-04-18 Thread Ken Krugler
Hi Pedro, That’s interesting, and something we’d like to be able to control as well. I did a little research, and it seems like (with some stunts) there could be a way to achieve this via CoLocationConstraint

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Petter Arvidsson
Hi Timo, Please find the generated class (for the second schema) attached. Regards, Petter On Wed, Apr 18, 2018 at 11:32 AM, Timo Walther wrote: > Hi Petter, > > could you share the source code of the class that Avro generates out of > this schema? > > Thank you. > >

Re: Tracking deserialization errors

2018-04-18 Thread Tzu-Li (Gordon) Tai
Hi, These are valid concerns. And yes, AFAIK users have been writing to logs within the deserialization schema to track this. The connectors as of now have no logging themselves in case of a skipped record. I think we can implement both logging and metrics to track this, most of which you

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
nvm, I figure it out. The event is not process once it's arrived. It's registered to processed in event time. It make sense. best Yan From: Yan Zhou [FDS Science] Sent: Wednesday, April 18, 2018 12:56:58 PM To: Fabian Hueske Cc: user

Re: FlinkML

2018-04-18 Thread Christophe Salperwyck
Hi, You could try to plug MOA/Weka library too. I did some preliminary work with that: https://moa.cms.waikato.ac.nz/moa-with-apache-flink/ but then it is not anymore FlinkML algorithms. Best regards, Christophe 2018-04-18 21:13 GMT+02:00 shashank734 : > There are no

Re: FlinkML

2018-04-18 Thread Christophe Jolif
Szymon, The short answer is no. See: http://mail-archives.apache.org/mod_mbox/flink-user/201802.mbox/%3ccaadrtt39ciiec1uzwthzgnbkjxs-_h5yfzowhzph_zbidux...@mail.gmail.com%3E On Mon, Apr 16, 2018 at 11:25 PM, Szymon Szczypiński wrote: > Hi, > > i wonder if there are

Re: Tracking deserialization errors

2018-04-18 Thread Elias Levy
Either proposal would work. In the later case, at a minimum we'd need a way to identify the source within the metric. The basic error metric would then allow us to go into the logs to determine the cause of the error, as we already record the message causing trouble in the log. On Mon, Apr 16,

Re: FlinkML

2018-04-18 Thread shashank734
There are no active discussions or guide on that. But I found this example in the repo : https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/ml/IncrementalLearningSkeleton.java

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
Hi Fabian, Thanks for the reply. I think here is the problem. Currently, the timestamp of an event is compared with previous processed element's timestamp, instead of watermark, to determine if it's late. To my understanding, even the order of emitted event in preceding operator is

Re: rest.port is reset to 0 by YarnEntrypointUtils

2018-04-18 Thread Gary Yao
Hi Dongwon, I think the rationale was to avoid conflicts between multiple Flink instances running on the same YARN cluster. There is a ticket that proposes to allow configuring a port range instead [1]. Best, Gary [1] https://issues.apache.org/jira/browse/FLINK-5758 On Tue, Apr 17, 2018 at

Re: why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Fabian Hueske
The over window operates on an unbounded stream of data. Hence it is not possible to sort the complete stream. Instead we can sort ranges of the stream. Flink uses watermarks to define these ranges. The operator processes the records in timestamp order that are not late, i.e., have timestamps

masters file only needed when using start-cluster.sh script?

2018-04-18 Thread David Corley
The HA documentation is a little confusing in that it suggests JM registration and discovery is done via Zookeeper, but it also recommends creating a static `masters` file listing all JMs. The only use I can currently see for the masters file is by the `start-cluster.sh` script. Thus, if we're not

Managing state migrations with Flink and Avro

2018-04-18 Thread Petter Arvidsson
Hello everyone, I am trying to figure out how to set up Flink with Avro for state management (especially the content of snapshots) to enable state migrations (e.g. adding a nullable fields to a class). So far, in version 1.4.2, I tried to explicitly provide an instance of "new

Re: Need Help/Code Examples with reading/writing Parquet File with Flink ?

2018-04-18 Thread Shuyi Chen
AFA I remember, there is no ParquetInputFormat in Flink. But there is a JIRA logged and an attempt in this PR , but was never merged. We do have an internal implementation that is being used in our

Re: Need Help/Code Examples with reading/writing Parquet File with Flink ?

2018-04-18 Thread Jörn Franke
You can use the corresponding HadoopInputformat within Flink > On 18. Apr 2018, at 07:23, sohimankotia wrote: > > Hi .. > > I have file in hdfs in format file.snappy.parquet . Can someone please > point/help with code example of reading parquet files . > > > -Sohi >

RE: State-machine-based search logic in Flink ?

2018-04-18 Thread Esa Heikkinen
Hi I did mean like “finding series of consecutive events”, as it was described in [2]. Are these features already in Flink and how well they are documented ? Can I use Scala or only Java ? I would like some example codes, it they are exist ? Best, Esa From: Fabian Hueske

Re: assign time attribute after first window group when using Flink SQL

2018-04-18 Thread Fabian Hueske
This sounds like a windowed join between the raw stream and the aggregated stream. It might be possible to do the "lookup" in the second raw stream with another windowed join. If not, you can fall back to the DataStream API / ProcessFunction and implement the lookup logic as you need it. Best,

why doesn't the over-window-aggregation sort the element(considering watermark) before processing?

2018-04-18 Thread Yan Zhou [FDS Science]
Hi, I use bounded over-window aggregation in my application. However, sometimes some input elements are "discarded" and not generating output. By reading the source code of RowTimeBoundedRangeOver.scala, I realize the record is actually discarded if it is out of order. Please see the quoted

Re: rest.port is reset to 0 by YarnEntrypointUtils

2018-04-18 Thread Dongwon Kim
Hi Gary, Thanks a lot for replay. Hope the issue is resolved soon. I have a suggestion regarding the rest port. Considering the role of dispatcher, it needs to have its own port range that is not shared by job managers spawned by dispatcher. If I understand FLIP-6 correctly, only a few

Confusing debug level log output with Flink 1.5

2018-04-18 Thread Ken Krugler
Hi Till, I just saw https://issues.apache.org/jira/browse/FLINK-9215 I’ve been trying out 1.5, and noticed similar output in my logs, e.g. 18/04/18 17:33:47 DEBUG slotpool.SlotPool:751 - Releasing slot with slot request id

debug for Flink

2018-04-18 Thread Qian Ye
Hi I’m wondering if new debugging methods/tools are urgent for Flink development. I know there already exists some debug methods for Flink, e.g., remote debugging of flink clusters(https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters

Re: Help with OneInputStreamOperatorTestHarness

2018-04-18 Thread Chris Schneider
Hi Ted, I should have written that we’re using Flink 1.4.0. Thanks for the suggestion re: FLINK-8268 ; it could well be the issue (though the pull request appears fairly complex so I’ll need

Help with OneInputStreamOperatorTestHarness

2018-04-18 Thread Chris Schneider
Hi Gang, I’m having trouble getting my streaming unit test to work. The following code: @Test public void testDemo() throws Throwable { OneInputStreamOperatorTestHarness testHarness = new KeyedOneInputStreamOperatorTestHarness