Hi Zakelly,

Thanks for the information, that's interesting. Would you say that reading
a subset from RocksDB is fast enough to be pretty much negligible, or could
it be a bottleneck if the state of each key is "large"? Again, assuming the
number of distinct partition keys is large.

Regards,
Alexis.

On Sun, 18 Feb 2024, 05:02 Zakelly Lan, <zakelly....@gmail.com> wrote:

> Hi Alexis,
>
> Flink does need some heap memory to bridge requests to RocksDB and gather
> the results. In most cases, that memory is discarded immediately (and
> eventually collected by GC). In the case of timers, Flink does cache a
> limited subset of key-values on the heap to improve performance.
>
> In general you don't need to consider its heap consumption since it is
> minor.
>
>
> Best,
> Zakelly
>
> On Fri, Feb 16, 2024 at 4:43 AM Asimansu Bera <asimansu.b...@gmail.com>
> wrote:
>
>> Hello Alexis,
>>
>> I don't think data in RocksDB resides in the JVM, even during function calls.
>>
>> For more details, check the link below:
>>
>> https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#3-high-level-architecture
>>
>> RocksDB has three main components: memtable, SST files, and WAL (not used
>> in Flink, as Flink uses checkpointing). When a TM starts with RocksDB as the
>> state backend, the TM has its own RocksDB instance and the state is managed
>> as a column family by that TM. Any state changes go into memtable --> SST
>> --> persistent store. On reads, data goes to the buffers and cache of RocksDB.
>>
>> With RocksDB as the state backend, the JVM still holds thread stacks: with
>> a high degree of parallelism, there are many stacks, each maintaining
>> separate thread information.
>>
>> Hope this helps!!
>>
>>
>>
>>
>>
>> On Thu, Feb 15, 2024 at 11:21 AM Alexis Sarda-Espinosa <
>> sarda.espin...@gmail.com> wrote:
>>
>>> Hi Asimansu,
>>>
>>> The memory RocksDB manages is outside the JVM, yes, but the mentioned
>>> subsets must be bridged to the JVM somehow so that the data can be exposed
>>> to the functions running inside Flink, no?
>>>
>>> Regards,
>>> Alexis.
>>>
>>>
>>> On Thu, 15 Feb 2024, 14:06 Asimansu Bera, <asimansu.b...@gmail.com>
>>> wrote:
>>>
>>>> Hello Alexis,
>>>>
>>>> RocksDB resides off-heap, outside of the JVM. The small subset of data
>>>> ends up in off-heap memory.
>>>>
>>>> For more details, check the following link:
>>>>
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup_tm/#managed-memory
>>>>
>>>> I hope this addresses your inquiry.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Feb 15, 2024 at 12:52 AM Alexis Sarda-Espinosa <
>>>> sarda.espin...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Most info regarding RocksDB memory for Flink focuses on what's needed
>>>>> independently of the JVM (although the Flink process configures its limits
>>>>> and so on). I'm wondering if there are additional special considerations
>>>>> with regard to the JVM heap in the following scenario.
>>>>>
>>>>> Assuming a key used to partition a Flink stream and its state has a
>>>>> high cardinality, but that the state of each key is small, when Flink
>>>>> prepares the state to expose to a user function during a call (with a
>>>>> given partition key), I guess it loads only the required subset from
>>>>> RocksDB, but does this small subset end up (temporarily) on the JVM heap?
>>>>> And if it does, does it stay "cached" in the JVM for some time or is it
>>>>> immediately discarded after the user function completes?
>>>>>
>>>>> Maybe this isn't even under Flink's control, but I'm curious.
>>>>>
>>>>> Regards,
>>>>> Alexis.
>>>>>
>>>>
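
As a rough illustration of the "bridging" discussed in this thread, here is a
toy Python model. It is only a sketch under stated assumptions: ToyOffHeapStore
and everything in it are hypothetical names, pickle stands in for Flink's
serializers, and none of this is Flink's or RocksDB's actual API. It assumes
that the store keeps only serialized bytes and that each read deserializes
them into a fresh, short-lived object on the caller's side, so the per-read
cost grows with the serialized size of the state for that key.

```python
import pickle


class ToyOffHeapStore:
    """Hypothetical stand-in for a RocksDB-like store holding only serialized bytes."""

    def __init__(self):
        self._data = {}      # serialized key -> serialized value
        self.bytes_read = 0  # total bytes copied out of the store by reads

    def put(self, key, value):
        self._data[pickle.dumps(key)] = pickle.dumps(value)

    def get(self, key):
        raw = self._data.get(pickle.dumps(key))
        if raw is None:
            return None
        self.bytes_read += len(raw)
        # Each read builds a fresh object; nothing is cached on the caller's side.
        return pickle.loads(raw)


store = ToyOffHeapStore()
store.put("small-key", [0] * 10)
store.put("large-key", [0] * 100_000)

# Two reads of the same key yield equal but distinct temporary objects.
first = store.get("small-key")
second = store.get("small-key")
small_cost = store.bytes_read // 2  # bytes copied per read of the small state

# Reading a key with large state copies and deserializes far more bytes.
store.bytes_read = 0
store.get("large-key")
large_cost = store.bytes_read
```

In this model the temporary objects become garbage as soon as the caller drops
them, and the work per read scales with the size of the value rather than with
the number of distinct keys.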
