Using Avro SpecificRecord serialization instead of slower ReflectDatumWriter/GenericDatumWriter

2019-08-29 Thread Roshan Naik
Noticing that Flink takes very long inside collect(..) due to Avro serialization that relies on ReflectDatumWriter & GenericDatumWriter. The object being serialized here is an Avro object that extends SpecificRecordBase. It is somewhat large (~50 KB) and complex. Looking for a way

Re: Watermarking in Src and Timestamping downstream

2019-08-12 Thread Roshan Naik
watermark? Maybe a method that is overridden in AbstractStreamOperator in your code? On Sat, Aug 10, 2019 at 4:06 AM Roshan Naik wrote: > Have streaming use cases where it is useful & easier to generate the > watermark in the Source (via ctx.emitWatermark() ) and assign

Watermarking in Src and Timestamping downstream

2019-08-09 Thread Roshan Naik
Have streaming use cases where it is useful & easier to generate the watermark in the Source (via ctx.emitWatermark()) and assign timestamps in a downstream custom operator which calls output.collect(new StreamRecord(msg, time)). When doing so, I see that the watermark reaches the downstream

Specifying parallelism on join operation

2019-06-21 Thread Roshan Naik
I can't find any place to specify the parallelism for the join here: stream1.join(stream2).where(..).equalTo(..).window(..).apply(..); How can we specify that? -roshan

Re: Recommended module for connector ratelimiting?

2019-02-27 Thread Roshan Naik
Sorry, I have not looked into it much... but it occurred to me that it would be great to have it as a "Throttling Operator" that can be applied anywhere after a source and before a sink. Each parallel instance of it can operate at the specified rate limit divided by the number of parallel instances
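The rate split described in this message can be sketched in plain Java. This is only an illustration of the arithmetic, not an existing Flink operator or API: the class name `SubtaskRateLimit` and both method names are hypothetical, and a real throttling operator would also need to obtain its parallelism from the runtime.

```java
// Hypothetical helper (not part of Flink): splits a job-wide rate limit
// evenly across the parallel instances of a throttling operator.
final class SubtaskRateLimit {

    // Records per second that a single parallel instance may emit.
    static double perSubtaskRate(double globalRecordsPerSec, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        return globalRecordsPerSec / parallelism;
    }

    // Minimum pause between two records, in nanoseconds, for one instance.
    static long nanosBetweenRecords(double recordsPerSec) {
        if (recordsPerSec <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return (long) (1_000_000_000L / recordsPerSec);
    }

    public static void main(String[] args) {
        double perSubtask = perSubtaskRate(1000.0, 4);
        System.out.println("per-instance rate (records/s): " + perSubtask);
        System.out.println("pause between records (ns): " + nanosBetweenRecords(perSubtask));
    }
}
```

An even split like this assumes the records are spread evenly over the parallel instances; with skewed input the effective overall rate would fall below the configured limit.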

Watermark at the end of job

2019-02-23 Thread Roshan Naik
When generating watermarks outside the source using assignTimestampsAndWatermarks(), once the source has finished generating all the messages, I don't see evidence of a final watermark getting generated to flush windows. Do the windows auto flush on job termination without need for the

Re: Where does RocksDB's mem allocation occur

2019-02-23 Thread Roshan Naik
Sorry, resending with proper formatting. Yahoo Mail defaults to rich text, and that messes up formatting on this mailing list. Based on my (streaming mode) experiments I see that it's not simply on-heap and off-heap memory. There are actually 3 divisions of memory: 1- On heap (-Xmx). 2-

Re: Where does RocksDB's mem allocation occur

2019-02-23 Thread Roshan Naik
ithub.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#indexes-and-filter-blocks> Best, Yun Tang From: Roshan Naik Sent: Saturday, February 2

Where does RocksDB's mem allocation occur

2019-02-22 Thread Roshan Naik
For YARN deployments, let's say you have container size = 10 GB and containerized.heap-cutoff-ratio = 0.3 (= 3 GB). That means 7 GB is available for Flink's various subsystems, which include the JVM heap, all the DirectByteBuffer allocations (netty + network buffers + ..), and Java metadata.
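The arithmetic in this message can be written out as a small Java sketch. It only reproduces the numbers quoted above (10 GB container, ratio 0.3); the class and method names are hypothetical, and the calculation deliberately ignores any other settings Flink may apply when sizing the cutoff.

```java
// Hypothetical helper (not Flink code): reproduces the container memory
// split described in the message, using megabytes as the unit.
final class ContainerMemorySplit {

    // Memory reserved by the cutoff ratio, in MB.
    static long cutoffMb(long containerMb, double cutoffRatio) {
        if (cutoffRatio < 0.0 || cutoffRatio >= 1.0) {
            throw new IllegalArgumentException("ratio must be in [0, 1)");
        }
        return Math.round(containerMb * cutoffRatio);
    }

    // Memory left for the JVM heap, direct buffers, and Java metadata, in MB.
    static long remainingMb(long containerMb, double cutoffRatio) {
        return containerMb - cutoffMb(containerMb, cutoffRatio);
    }

    public static void main(String[] args) {
        long containerMb = 10 * 1024; // 10 GB container
        double ratio = 0.3;           // containerized.heap-cutoff-ratio
        System.out.println("cutoff MB: " + cutoffMb(containerMb, ratio));
        System.out.println("remaining MB: " + remainingMb(containerMb, ratio));
    }
}
```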

Streaming - memory management

2016-08-30 Thread Roshan Naik
As per the docs, in Batch mode, dynamic memory allocation is avoided by storing messages being processed in ByteBuffers via Unsafe methods. Couldn't find any docs describing memory management in Streaming mode. So... - Am wondering if this is also the case with Streaming? - If so, how does Flink
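To make the idea of keeping records in buffers rather than as heap objects concrete, here is a minimal sketch using only `java.nio.ByteBuffer`. It is an illustration of the general technique, not Flink's implementation: the class `OffHeapRecordSketch` and the two-field record layout are assumptions made for the example, and Flink uses its own memory abstractions rather than this code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration (not Flink code): a record with a long key and
// an int value is written into a preallocated direct buffer instead of
// being kept as a Java object on the heap.
final class OffHeapRecordSketch {

    static final int RECORD_BYTES = Long.BYTES + Integer.BYTES;

    // Serializes one record into a direct (off-heap) buffer, ready for reading.
    static ByteBuffer store(long key, int value) {
        ByteBuffer segment = ByteBuffer.allocateDirect(RECORD_BYTES);
        segment.putLong(key).putInt(value);
        segment.flip(); // switch from writing to reading
        return segment;
    }

    public static void main(String[] args) {
        ByteBuffer segment = store(42L, 7);
        System.out.println("direct buffer: " + segment.isDirect());
        System.out.println("key: " + segment.getLong());
        System.out.println("value: " + segment.getInt());
    }
}
```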