Hi Max,
The way Flink assigns a key to a subtask is based on
`KeyGroupRangeAssignment.assignKeyToParallelOperator`.
Its first step is to assign the key to a key group based on the max parallelism
[2]. Then, each key group is assigned to a specific subtask based on the
current parallelism [3].
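The two steps above can be sketched in plain Java. This is a simplified illustration, not Flink's actual code: the real `KeyGroupRangeAssignment` additionally spreads the raw `hashCode` through a MurmurHash step, which this sketch omits, and the class and method names here are illustrative.

```java
// Hedged sketch of key -> key group -> subtask assignment.
public class KeyGroupSketch {

    // Step 1: key -> key group, based on the max parallelism.
    // (Flink additionally applies a MurmurHash spread, omitted here.)
    static int assignToKeyGroup(Object key, int maxParallelism) {
        // floorMod keeps the result non-negative for negative hash codes
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Step 2: key group -> operator subtask, based on the current parallelism.
    // Each subtask owns a contiguous range of key groups.
    static int computeOperatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int assignKeyToParallelOperator(Object key, int maxParallelism, int parallelism) {
        return computeOperatorIndex(assignToKeyGroup(key, maxParallelism), maxParallelism, parallelism);
    }

    public static void main(String[] args) {
        // With max parallelism 128 and 4 subtasks, each subtask owns 32 key groups.
        System.out.println(assignKeyToParallelOperator("some-key", 128, 4));
    }
}
```

Note that rescaling only changes step 2: the key-to-key-group mapping is fixed by the max parallelism, which is why state can be redistributed by moving whole key groups between subtasks.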
Hi,
I think ProcessFunction [1] is what you want. You can add it after keyBy and
emit the result to the sink after a timeout or when the buffer is filled.
The reference has a good example that shows you how to use it.
Best Regards,
Tony Wei
[1]
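The buffer-or-timeout decision that such a ProcessFunction would carry can be sketched in plain Java. This is only the core logic, runnable on its own; in a real Flink job the buffer would live in keyed state and the timeout would be a registered timer firing `onTimer`. The `TimedBuffer` class and its method names are illustrative, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

// Collect elements until either the buffer is full or a timeout elapses,
// then emit the whole batch downstream.
public class TimedBuffer<T> {
    private final int maxSize;
    private final long timeoutMillis;
    private final List<T> buffer = new ArrayList<>();
    private long firstElementTime = -1;

    public TimedBuffer(int maxSize, long timeoutMillis) {
        this.maxSize = maxSize;
        this.timeoutMillis = timeoutMillis;
    }

    // Returns the batch to emit, or null if we should keep buffering.
    public List<T> add(T element, long nowMillis) {
        if (buffer.isEmpty()) {
            // in a ProcessFunction this is where the timer would be registered
            firstElementTime = nowMillis;
        }
        buffer.add(element);
        if (buffer.size() >= maxSize || nowMillis - firstElementTime >= timeoutMillis) {
            List<T> batch = new ArrayList<>(buffer);
            buffer.clear();
            return batch;
        }
        return null;
    }
}
```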
Hi guys, got one more question for you; maybe someone has already implemented
such a feature or found a good technique.
I wrote an IT that runs a Flink job, reads data from a Kafka topic, and
flushes it onto the fs using BucketingSink.
I implemented some custom logic that fires on
Flink Users,
I have about 300 parallel flows in one job, each with 2 inputs, 3 operators,
and 1 sink, which makes for a large job. I keep getting the timeout exception
below, but I've already set a 30-minute timeout with a 6 GB heap on the
JobManager. Is there a heuristic to better
I have a process that will take 250,000 records from Kafka and produce a file.
(Using a CustomFileSync)
Currently I just have the following:
DataStream stream = env.addSource(new
FlinkKafkaConsumer010("topic", schema,
properties)).setParallelism(40).flatMap(new
Hi all,
After trying to understand exactly how keyBy works internally, I did not get
anything more than "it applies obj.hashCode() % n", where n is the number of
tasks/processors.
This post, for example,
Hi guys, I’ve got a question about working with checkpointing.
I would like to implement an IT test where the source is a fixed collection of
items and the sink performs additional logic when checkpointing is completed.
I would like to force checkpointing to execute when all messages from my test
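One way to frame such a test sink is to buffer everything the job emits and run the verification only once a checkpoint completes. In a real Flink job the sink would implement `CheckpointListener` (which provides `notifyCheckpointComplete(long)`); the sketch below models that hook directly as a plain class, with illustrative names, so the logic is runnable on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Test sink that only "confirms" records once a checkpoint has completed,
// mimicking the CheckpointListener callback contract.
public class CheckpointAwareTestSink<T> {
    private final List<T> pending = new ArrayList<>();
    private final List<T> confirmed = new ArrayList<>();

    // called once per record, like SinkFunction.invoke
    public void invoke(T value) {
        pending.add(value);
    }

    // called by the runtime after a checkpoint completes
    public void notifyCheckpointComplete(long checkpointId) {
        // everything received before this checkpoint is now covered by it
        confirmed.addAll(pending);
        pending.clear();
    }

    public List<T> confirmedRecords() {
        return confirmed;
    }
}
```

The IT assertion would then run against `confirmedRecords()` rather than against whatever the sink has seen so far.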
Hi Biplob,
We have created our own latency-meter histogram, which contains the latency
from ingestion time till the last operator.
This is shown in the log below (99th percentile and mean value), and our
estimations are based on it.
The latency you mentioned is from the checkpoint tab, which shows
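The two figures quoted from the log can be computed from raw samples as below. This is a generic sketch (nearest-rank percentile over sorted samples), not the poster's actual histogram implementation; the class name is illustrative.

```java
import java.util.Arrays;

// Compute the mean and a nearest-rank percentile over latency samples.
public class LatencyStats {

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // p in (0, 100]; nearest-rank: the value at ceil(p/100 * n) in sorted order.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }
}
```

Note that the p99 of such a histogram is very sensitive to where the timestamp is taken (ingestion vs. source), which is exactly the ambiguity being discussed in this thread.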
Hi, rmetzger0
Sorry to reply to this old question. I found that we use the Kafka client
0.9 in the class KafkaThread, which leads to the loss of many other detailed
metrics added in Kafka client 0.10, like the per-partition consumer lag
mentioned by this issue: https://issues.apache.org/jira/browse/FLINK-7945. I
Thanks Till, I will pull it out today then.
Sent from Yahoo Mail on Android
On Mon, Oct 30, 2017 at 3:48 AM, Till Rohrmann wrote:
Hi Ashish,
great to hear that things work better with the RocksDB state backend. I would
only start playing with the
Hi,
Is there a way to use Stochastic Outlier Selection (SOS) and/or SVM using
CoCoA with streaming data?
Please suggest and give pointers.
Regards,
Adarsh
Hi Tovi,
This might seem a really naive question (and it's neither a solution nor an
answer to your question), but I am trying to understand how latency is
viewed. You said you achieved less than 5 ms latency, and for the 99th
percentile and mean you achieved 0.3 and 9 ms respectively; what kind of latency
Hi Jared,
I think that by adding up, for each slot sharing group, the parallelism of
the operator with the maximum degree of parallelism, you will end up with the
correct number of required slots (given that all operators have to run
concurrently).
Cheers,
Till
On Fri, Oct 27, 2017 at 8:34 PM, Jared
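The slot arithmetic described above can be sketched as follows: for each slot sharing group, take the highest operator parallelism in that group, then sum across groups. The class name and input shape are illustrative.

```java
import java.util.List;
import java.util.Map;

// Required slots = sum over slot sharing groups of the max operator
// parallelism within each group.
public class SlotEstimator {

    static int requiredSlots(Map<String, List<Integer>> parallelismsPerGroup) {
        return parallelismsPerGroup.values().stream()
                .mapToInt(ps -> ps.stream().max(Integer::compare).orElse(0))
                .sum();
    }
}
```

For example, a job whose "default" group holds operators with parallelisms 40, 10, and 5 and whose "sinks" group holds two operators of parallelism 8 would need 40 + 8 = 48 slots.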
Thank you Joshi.
We are currently using FsStateBackend, since in version 1.3 it supports async
snapshots, and not RocksDB.
Does anyone else have feedback on this issue?
From: Narendra Joshi [mailto:narendr...@gmail.com]
Sent: Sunday, 29 October 2017 12:13
To: Sofer, Tovi [ICG-IT]
Hi Ashish,
great to hear that things work better with the RocksDB state backend. I
would only start playing with the containerized.heap-cutoff-ratio if you
see TMs failing due to exceeding the direct memory limit. Currently, not
all of the cutoff memory is set as the direct memory limit. We have