Re: Question about parallelism

2018-04-16 Thread TechnoMage
1.4.2. I have since set the parallelism explicitly after creating the env and that is working. I also made the stream object serializable which may also be involved in this. I will retest without the explicit parallelism when I get a chance. Michael > On Apr 16, 2018, at 2:05 AM, Fabian

Re: Trying to understand KafkaConsumer_records_lag_max

2018-04-16 Thread Julio Biason
Hi Gordon (and list), Yes, that's probably what's going on. I got another message from 徐骁 which told me almost the same thing -- something I completely forgot (he also mentioned auto.offset.reset, which could be forcing Flink to keep reading from the top of Kafka instead of trying to go back and

Re: data enrichment with SQL use case

2018-04-16 Thread miki haiat
HI thanks for the reply i will try to break your reply to the flow execution order . First data stream Will use AsyncIO and select the table , Second stream will be kafka and the i can join the stream and map it ? If that the case then i will select the table only once on load ? How can i

Re: Annotation in UDF dropped

2018-04-16 Thread Fabian Hueske
Hi Viktor, Flink does not modify user code. It distributes the job JAR file to the cluster and serializes the function objects using Java serialization to ship them to the worker nodes where they are deserialized. What type of annotation gets dropped? Can you show us a small example of the code?

Re: Tiemrs and restore

2018-04-16 Thread Aljoscha Krettek
Gordon is correct: there was a bug on a very old version of Flink that caused processing-timers not to be invoked after restore but that was fixed. Aljoscha > On 16. Apr 2018, at 06:20, Tzu-Li (Gordon) Tai wrote: > > Hi Alberto, > > Looking at the code, I think the

Annotation in UDF dropped

2018-04-16 Thread Rosenfeld, Viktor
Hi, I have a UDF that uses an annotation to the loop variable inside a for loop. I noticed that this annotation gets dropped at some point when the UDF is shipped to the TaskManager. I was told that this happens in the Optimizer but I would like to know where this happens exactly and if there

Re: Unable to launch job with 100 SQL queries in yarn cluster

2018-04-16 Thread Fabian Hueske
Hi Adrian, Thanks reaching out to the community. I don't think that this is an issue with Flink's SQL support. SQL queries are translated into regular streaming (or batch) jobs. The JM might just be overloaded by too many jobs. Since you are running in a YARN environment, it might make sense to

答复: Slow flink checkpoint

2018-04-16 Thread ma ky
Fabian: thanks for u replay. I have create a jira issue: https://issues.apache.org/jira/browse/FLINK-9182?jql=project%20%3D%20FLINK%20AND%20issuetype%20%3D%20Improvement%20AND%20created%20%3E%3D%20-10m I'll pull the code ASAP. MaKeyang TIG.JD.COM

Re: User-defined aggregation function and parallelism

2018-04-16 Thread Fabian Hueske
Hi Bill, Flink's built-in aggregation functions are implemented against the same interface as UDAGGs and are applied in parallel. The performance depends of course on the implementation of the UDAGG. For example, you should try to keep the size of the accumulator as small as possible because it

Re: Tracking deserialization errors

2018-04-16 Thread Fabian Hueske
Thanks for starting the discussion Elias. I see two ways to address this issue. 1) Add an interface that a deserialization schema can implement to register metrics. Each source would need to check for the interface and call it to setup metrics. 2) Check for null returns in the source functions

Re: How to rebalance a table without converting to dataset

2018-04-16 Thread Fabian Hueske
Hi Darshan, You are right. there's currently no rebalancing operation on the Table API. I see that this might be a good feature, not sure though how easy it would be to integrate because we need to pass it through the Calcite optimizer and rebalancing is not a relational operation. For now,

Re: 答复: Slow flink checkpoint

2018-04-16 Thread Fabian Hueske
Thanks MaKeyang! I've given you contributor permissions and assigned the issue to you. Best, Fabian 2018-04-16 13:19 GMT+02:00 ma ky : > Fabian: > thanks for u replay. > I have create a jira issue: > https://issues.apache.org/jira/browse/FLINK-9182?jql= >

Re: Unsure how to further debug - operator threads stuck on java.lang.Thread.State: WAITING

2018-04-16 Thread Miguel Coimbra
Thanks for the suggestions Chesnay, I will try them out. However, I have already tried your suggestion with the dependency flink-runtime-web and nothing happened. If I understood you correctly, adding that dependency in the pom.xml would make it so the web front-end is running when I call the

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Juho Autio
Great. I'd be happy to contribute. I added 2 sub-tasks in https://issues.apache.org/jira/browse/FLINK-5479. Someone with the privileges could assign this sub-task to me: https://issues.apache.org/jira/browse/FLINK-9183? On Mon, Apr 16, 2018 at 3:14 PM, Fabian Hueske wrote: >

Detecting Patterns in CEP

2018-04-16 Thread Main Frame
Hi flink users! Are there any ways to force using custom comparator for pattern stream. In my use-case I have no event time but I have sequence number field(not a timestamp). Events can comes to platform with random short delay and can be of the sequence, I hope I can use constant watermark and

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Fabian Hueske
Fully agree Juho! Do you want to contribute the docs fix? If yes, we should update FLINK-5479 to make sure that the warning is removed once the bug is fixed. Thanks, Fabian 2018-04-12 9:32 GMT+02:00 Juho Autio : > Looks like the bug

Re: Unsure how to further debug - operator threads stuck on java.lang.Thread.State: WAITING

2018-04-16 Thread Chesnay Schepler
ah yes, currently when you use that method the UI is started on a random port. I'm currently fixing that in this PR that will be merged today. For now you will enable logging and search for something along the lines of "http://: was granted

Re: Kafka consumer to sync topics by event time?

2018-04-16 Thread Fabian Hueske
Awesome! I've given you contributor permissions and assigned FLINK-9183 to you. With the permissions you can also do that yourself in the future. Here's a guide for contributions to the documentation [1]. Best, Fabian [1] http://flink.apache.org/contribute-documentation.html 2018-04-16 15:38

Re: data enrichment with SQL use case

2018-04-16 Thread Ken Krugler
Hi Miki, I haven’t tried mixing AsyncFunctions with SQL queries. Normally I’d create a regular DataStream workflow that first reads from Kafka, then has an AsyncFunction to read from the SQL database. If there are often duplicate keys in the Kafka-based stream, you could keyBy(key) before the

Re: Question about parallelism

2018-04-16 Thread Fabian Hueske
The parallelism.default property that is configured in the flink-conf.yaml file is only considered if the config file belongs to the submitting client. If you configured the property in the config file of your cluster setup and used submitted from a client that used a different configuration file,

FlinkML

2018-04-16 Thread Szymon Szczypiński
Hi, i wonder if there are possibility to build FlinkML streaming job not a batch job. Examples on https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/libs/ml/ are only batch examples. Is there any possibility? Best regards.

Re: Question about parallelism

2018-04-16 Thread TechnoMage
The client was not using a config file, it is a stand-alone java app using the flink-client jar file. Thanks for the clarification. Michael > On Apr 16, 2018, at 2:11 PM, Fabian Hueske wrote: > > The parallelism.default property that is configured in the flink-conf.yaml >

Re: Scaling down Graphite metrics

2018-04-16 Thread ashish pok
Thanks for that tip about override, will give that a shot at some point. We are already using interval. -- Ashish On Sun, Apr 15, 2018 at 6:18 PM, Chesnay Schepler wrote: Hello, you can configure the rate at which metrics are reported by setting

Re: Slow flink checkpoint

2018-04-16 Thread makeyang
since flink forward SF has done. can you guys give some minutes to take a look at this issue and give some thoughts on it? help to review/comments on my desgin? or give us a design so that I can help to implement it. thanks a lot. -- Sent from:

Re: Volume question

2018-04-16 Thread Fabian Hueske
Sorry, I forgot to CC the user mailing list in my reply. 2018-04-12 17:26 GMT+02:00 Fabian Hueske : > Hi, > > Sorry for the long delay. Many contributors are traveling due to Flink > Forward. > > Your use case should be well supported by Flink. Flink will partition and >

Re: assign time attribute after first window group when using Flink SQL

2018-04-16 Thread Fabian Hueske
Sorry, I forgot to CC the user mailing list in my reply. 2018-04-12 17:27 GMT+02:00 Fabian Hueske : > Hi, > > Assuming you are using event time, the right function to generate a row > time attribute from a window would be "w1.rowtime" instead of "w1.start". > > The reason why

Re: Question about parallelism

2018-04-16 Thread Fabian Hueske
(re-adding user mailing list) A non-serializable function object should cause the job to fail, but not to ignore a parallelism setting. This might be a bug. Most users specify the parallelism directly in the application code (via StreamExecutionEnvironment) or when submitting the application.

Re: Slow flink checkpoint

2018-04-16 Thread 林德强
Hi Stefan , Fabian , Keyang is engineer in our team, he has do a lot of efforts on the timers' snapshot async. What do you think of his idea? Best, Deqiang TIG.JD.COM > 在 2018年4月1日,下午7:21,makeyang 写道: > > I have put a lot of

Re: Flink/Kafka POC performance issue

2018-04-16 Thread Niclas Hedhman
Have you checked memory usage? It could be as simple as either having memory leaks, or aggregating more than you think (sometimes not obvious how much is kept around in memory for longer than one first thinks). If possible, connect FlightRecorder or similar tool and keep an eye on memory.

Flink & Kafka multi-node config

2018-04-16 Thread TechnoMage
If I use defaults for the most part but configure flink to have parallelism 5 and kafka to have 5 brokers (one of each on 5 nodes) will the connector and kafka be smart enough to use the kafka partition on the same node as the flink task manager for the 5 partitions? Do I need to explicitly

Flink/Kafka POC performance issue

2018-04-16 Thread TechnoMage
I am doing a short Proof of Concept for using Flink and Kafka in our product. On my laptop I can process 10M inputs in about 90 min. On 2 different EC2 instances (m4.xlarge and m5.xlarge both 4core 16GB ram and ssd storage) I see the process hit a wall around 50min into the test and short of