Re: Use keyBy to deterministically hash each record to a processor/task/slot

2017-10-30 Thread Tony Wei
Hi Max, The way that Flink to assign key to which subtask is based on `KeyGroupRangeAssignment.assignKeyToParallelOperator`. Its first step is to assign key to a key group based on the max parallelism [2]. Then, assign each key group to a specific subtask based on the current parallelism [3].

Re: count and window question with kafka

2017-10-30 Thread Tony Wei
Hi, I think ProcessFunction[1] is what you want. You can add it after keyBy and emit the result to sink after timeout or buffer filled. The reference has a good example that show you how to use it. Best Regards, Tony Wei [1]

How to lock and wait, until checkpointing is completed

2017-10-30 Thread Rinat
Hi guys, got one more question for you, maybe someone already implemented such feature or found a good technique. I wrote an IT, that runs a flink job, that reads data from kafka topic, and flushes it onto fs using BucketingSink. I implemented some custom logic, that fires on

Job Manager Configuration

2017-10-30 Thread Chan, Regina
Flink Users, I have about 300 parallel flows in one job each with 2 inputs, 3 operators, and 1 sink which makes for a large job. I keep getting the below timeout exception but I've already set it to a 30 minute time out with a 6GB heap on the JobManager? Is there a heuristic to better

count and window question with kafka

2017-10-30 Thread Telco Phone
I have a process that will take 250,000 records from kafka and produce a file. (Using a CustomFileSync)  Currently I just have the following: DataStream stream =env.addSource(new FlinkKafkaConsumer010("topic"", schema, properties)).setParallelism(40).flatMap(new

Use keyBy to deterministically hash each record to a processor/task/slot

2017-10-30 Thread m@xi
Hi all, After trying to understand exactly how keyBy works internally, I did not get anything more than "it applies obj.hashcode() % n", where n is the number of tasks/processors. This post for example

Flink send checkpointing message in IT

2017-10-30 Thread Rinat
Hi guys, I’ve got a question about working with checkpointing. I would like to implement IT test, where source is a fixed collection of items and sink performs additional logic, when checkpointing is completed. I would like to force executing checkpointing, when all messages from my test

RE: state size effects latency

2017-10-30 Thread Sofer, Tovi
Hi Biplob, We have created our own latency meter histogram, which contains the latency from congestion time till last operator. This is shown in log below (99’th percentile and mean value), and our estimations are based on it. The latency you mentioned is from checkpoint tab- which shows

Re: Telling if a job has caught up with Kafka

2017-10-30 Thread aitozi
Hi, rmetzger0 Sorry to reply to this old question, i found that we use the kafka client 0.9 in class kafkaThread which lead to the lose of many other detail metrics add in kafka client 10 like per partition consumer lag mentioned by this issuse https://issues.apache.org/jira/browse/FLINK-7945. i

Re: Capacity Planning For Large State in YARN Cluster

2017-10-30 Thread ashish pok
Thanks Till, I will pull it out today then. Sent from Yahoo Mail on Android On Mon, Oct 30, 2017 at 3:48 AM, Till Rohrmann wrote: Hi Ashish, great to hear that things work better with the RocksDB state backend. I would only start playing with the

Using Flink Ml with DataStream

2017-10-30 Thread Adarsh Jain
Hi, Is there a way to use Stochastic Outlier Selection (SOS) and/or SVM using CoCoA with streaming data. Please suggest and give pointers. Regards, Adarsh ‌

Re: state size effects latency

2017-10-30 Thread Biplob Biswas
Hi Tovi, This might seem a really naive question (and its neither a solution or answer to your question ) but I am trying to understand how latency is viewed. You said you achieved less than 5 ms latency and say for the 99th percentile you achieved 0.3 and 9 ms respectively, what kind of latency

Re: Programmatic way to determine required number of slots for a job?

2017-10-30 Thread Till Rohrmann
Hi Jared, I think by adding up the parallelism of the operator with the maximum degree of parallelism in each slot sharing group, you will end up with the correct number of required slots (given that all operators have to run concurrently). Cheers, Till On Fri, Oct 27, 2017 at 8:34 PM, Jared

RE: state size effects latency

2017-10-30 Thread Sofer, Tovi
Thank you Joshi. We are using currently FsStateBackend since in version 1.3 it supports async snapshots, and no RocksDB. Does anyone else has feedback on this issues? From: Narendra Joshi [mailto:narendr...@gmail.com] Sent: יום א 29 אוקטובר 2017 12:13 To: Sofer, Tovi [ICG-IT]

Re: Capacity Planning For Large State in YARN Cluster

2017-10-30 Thread Till Rohrmann
Hi Ashish, great to hear that things work better with the RocksDB state backend. I would only start playing with the containerized.heap-cutoff-ratio if you see TMs failing due to exceeding the direct memory limit. Currently, not all of the cutoff memory is set as the direct memory limit. We have