Noticing that Flink takes very long inside collect(..) due to Avro
serialization that relies on ReflectDatumWriter & GenericDatumWriter. The
object being serialized here is an Avro object that implements
SpecificRecordBase. It is somewhat large (~50 KB) and complex.
Looking for a way to speed this up.
Maybe a method that is overridden in AbstractStreamOperator in your code?
On Sat, Aug 10, 2019 at 4:06 AM Roshan Naik wrote:
> Have streaming use cases where it is useful & easier to generate the
> watermark in the Source (via ctx.emitWatermark() ) and assign timestamps
> in a downstream custom operator which calls output.collect(new StreamRecord(msg, time)).
Have streaming use cases where it is useful & easier to generate the watermark
in the Source (via ctx.emitWatermark() ) and assign timestamps in a downstream
custom operator which calls output.collect(new StreamRecord(msg, time)).
When doing so, I see that the watermark reaches the downstream operator.
I can't find any place to specify the parallelism for the join here.
stream1.join( stream2 )
.where( .. )
.equalTo( .. )
.window( .. )
.apply( .. );
How can we specify that?
-roshan
Sorry, not looked into it much... but it occurred to me that it would be great
to have it as a "throttling operator" that can be applied anywhere after a
source and before a sink. Each parallel instance of it can operate at the
specified rate limit divided by the number of parallel instances.
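The splitting described above can be sketched in plain Java. This is a hypothetical helper, not a Flink API: each parallel instance enforces globalLimit / parallelism permits per second on its own, so the aggregate stays at the global limit.

```java
import java.util.concurrent.locks.LockSupport;

// Plain-Java sketch (hypothetical, not a Flink operator): each parallel
// instance of the throttle enforces globalLimit / parallelism permits/sec.
public class Throttle {
    private final long nanosPerPermit;
    private long nextFreeNanos;

    Throttle(double globalLimitPerSec, int parallelism) {
        double perInstance = globalLimitPerSec / parallelism;
        this.nanosPerPermit = (long) (1_000_000_000L / perInstance);
        this.nextFreeNanos = System.nanoTime();
    }

    // Blocks until this instance's next permit is due.
    void acquire() {
        long waitNanos;
        while ((waitNanos = nextFreeNanos - System.nanoTime()) > 0) {
            LockSupport.parkNanos(waitNanos);
        }
        nextFreeNanos = Math.max(nextFreeNanos, System.nanoTime()) + nanosPerPermit;
    }

    public static void main(String[] args) {
        // A global limit of 100 records/sec over 4 instances => 25/sec each,
        // i.e. one permit every 40 ms per instance.
        Throttle t = new Throttle(100.0, 4);
        long start = System.nanoTime();
        for (int i = 0; i < 5; i++) {
            t.acquire();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("5 permits took ~" + elapsedMs + " ms"); // roughly 160 ms
    }
}
```

In a real Flink job the parallelism would come from the operator's runtime context rather than a constructor argument.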
When generating watermarks outside the source using
assignTimestampsAndWatermarks() .. once the source has finished generating all
the messages, I don't see evidence of a final watermark getting generated to
flush windows. Do the windows auto-flush on job termination without need for a
final watermark?
Sorry, resending with proper formatting.. Yahoo Mail defaults to rich text, and
that messes up formatting on this mailing list.
Based on my (streaming mode) experiments, I see that it's not simply on-heap and
off-heap memory. There are actually 3 divisions of memory:
1 - On heap (-Xmx).
2 - github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks
Best
Yun Tang
From: Roshan Naik
Sent: Saturday, February 2
For YARN deployments, let's say you have the container size = 10 GB and
containerized.heap-cutoff-ratio = 0.3 ( = 3 GB).
That means 7 GB is available for Flink's various subsystems, which include the
JVM heap, all the DirectByteBuffer allocations (Netty + network buffers + ..),
and JVM metadata.
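The cutoff arithmetic above can be written out as a small sketch. The class and method names are illustrative; only the config key containerized.heap-cutoff-ratio comes from Flink's configuration.

```java
// Sketch of the YARN heap-cutoff arithmetic described above; names are
// illustrative, only containerized.heap-cutoff-ratio is a real Flink key.
public class HeapCutoff {
    // Memory removed from the container before Flink's subsystems see it.
    static long cutoffMb(long containerMb, double cutoffRatio) {
        return Math.round(containerMb * cutoffRatio);
    }

    public static void main(String[] args) {
        long containerMb = 10 * 1024;              // 10 GB container
        long cutoff = cutoffMb(containerMb, 0.3);  // 3072 MB cut off
        long remaining = containerMb - cutoff;     // 7168 MB left for heap,
                                                   // direct buffers, metadata
        System.out.println(cutoff + " MB cut off, " + remaining + " MB remaining");
    }
}
```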
As per the docs, in Batch mode, dynamic memory allocation is avoided by storing
messages being processed in ByteBuffers via Unsafe methods.
Couldn't find any docs describing memory management in Streaming mode. So...
- Am wondering if this is also the case with Streaming?
- If so, how does Flink
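The technique the docs describe can be illustrated in plain Java. This is only a sketch of the general idea (length-prefixed records serialized into a pre-allocated off-heap buffer), not Flink's actual internals; Flink wraps its buffers in its own MemorySegment abstraction.

```java
import java.nio.ByteBuffer;

// Illustration of the general technique only: serialize records into a
// pre-allocated off-heap segment instead of creating per-record objects.
// Not Flink's internals; Flink uses its own MemorySegment abstraction.
public class SegmentDemo {
    static String roundTrip(String payload) {
        ByteBuffer segment = ByteBuffer.allocateDirect(32 * 1024); // fixed 32 KB segment
        byte[] record = payload.getBytes();
        segment.putInt(record.length); // length prefix
        segment.put(record);           // record bytes
        segment.flip();                // switch to reading
        int len = segment.getInt();
        byte[] out = new byte[len];
        segment.get(out);
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("example-record")); // prints example-record
    }
}
```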