Hi Shashwat,

Are you specifying the RocksDBStateBackend from the flink-conf.yaml or from
code?

If you are specifying it from code, you can try using
PredefinedOptions.FLASH_SSD_OPTIMIZED.
Also, you can try enabling incremental checkpointing (this feature is
available from Flink 1.3.0).
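As a sketch, setting both from code could look like this (assuming the Flink 1.3.x API, where the second constructor argument of RocksDBStateBackend enables incremental checkpointing; the checkpoint URI is a placeholder):

```java
// Sketch: configuring the RocksDB state backend from code (Flink 1.3.x APIs).
import org.apache.flink.contrib.streaming.state.PredefinedOptions;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Second argument = enable incremental checkpointing (available from 1.3.0).
        // "hdfs:///flink/checkpoints" is a placeholder checkpoint location.
        RocksDBStateBackend backend =
                new RocksDBStateBackend("hdfs:///flink/checkpoints", true);

        // Tune RocksDB for flash/SSD storage.
        backend.setPredefinedOptions(PredefinedOptions.FLASH_SSD_OPTIMIZED);

        env.setStateBackend(backend);
    }
}
```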

If the above does not solve your issue, you can control the memory usage of
RocksDB by tuning the following values and checking the performance:

*DBOptions:*
     (along with FLASH_SSD_OPTIMIZED, add the following)
     maxBackgroundCompactions(4)

*ColumnFamilyOptions:*
  write_buffer_size : 512 MB
  block_cache_size : 128 MB
  max_write_buffer_number : 5
  min_write_buffer_number_to_merge : 2
  cache_index_and_filter_blocks : true
  optimize_filters_for_hits : true
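The values above can be applied through Flink's OptionsFactory interface and passed to the backend with setOptions(). A sketch (setter names follow the RocksDB Java API; the class name TunedOptionsFactory is just an example):

```java
// Sketch: applying the tuning values above via Flink's OptionsFactory (1.3.x).
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class TunedOptionsFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // Allow up to 4 concurrent background compactions.
        return currentOptions.setMaxBackgroundCompactions(4);
    }

    @Override
    public ColumnFamilyOptions createColumnFamilyOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                .setWriteBufferSize(512 * 1024 * 1024)   // write_buffer_size: 512 MB
                .setMaxWriteBufferNumber(5)              // max_write_buffer_number
                .setMinWriteBufferNumberToMerge(2)       // min_write_buffer_number_to_merge
                .setOptimizeFiltersForHits(true)         // optimize_filters_for_hits
                .setTableFormatConfig(
                        new BlockBasedTableConfig()
                                .setBlockCacheSize(128 * 1024 * 1024) // block_cache_size: 128 MB
                                .setCacheIndexAndFilterBlocks(true));
    }
}
```

It would then be registered on the backend, e.g. backend.setOptions(new TunedOptionsFactory()). Note the predefined options and the factory compose: the factory receives the options pre-populated by FLASH_SSD_OPTIMIZED.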


I would recommend reading the following documents:

*Memory usage of RocksDB* :
https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB

*RocksDB Tuning Guide:*
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

Hope it helps.


Regards,
Vinay Patil

On Tue, Jul 25, 2017 at 6:51 PM, Shashwat Rastogi [via Apache Flink User
Mailing List archive.] <ml+s2336050n14439...@n4.nabble.com> wrote:

> Hi,
>
> We have several Flink jobs, all of which read data from Kafka, do some
> aggregations (over sliding windows of (1d, 1h)), and write data to
> Cassandra. Something like:
>
> ```
> DataStream<String> lines = env.addSource(new FlinkKafkaConsumer010( … ));
> DataStream<Event> events = lines.map(line -> parse(line));
> DataStream<Statistics> stats = events
>         .keyBy("id")
>         .timeWindow(Time.days(1), Time.hours(1))
>         .apply(new MyAggregateFunction());
> writeToCassandra(stats);
> ```
>
> We recently made a switch to RocksDBStateBackend, for its suitability for
> large states/long windows. However, after making the switch a memory issue
> has come up: the memory utilisation on the TaskManager gradually increases
> from 50 GB to ~63 GB until the container is killed. We are unable to figure
> out what is causing this behaviour; is there some memory leak in RocksDB?
>
> How much memory should we allocate to the Flink TaskManager? Since
> RocksDB is a native application and does not use the JVM, how much memory
> should we allocate/leave for RocksDB (out of 64 GB of total memory)?
> Is there a way to set the maximum amount of memory that will be used by
> RocksDB so that it doesn't overwhelm the system? Are there some
> recommended optimal settings for RocksDB for larger states (for a 1-day
> window the average state size is 3 GB)?
>
> Any help would be greatly appreciated. I am using Flink v1.2.1.
> Thanks in advance.
>
> Best,
> Shashwat
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Memory-Leak-Flink-RocksDB-tp14439.html



