Thanks, Hangxiang and Alex, for the pointers. We added audit logs to
RocksDBValueState (GET call: value() and PUT call: update()) and found
nothing wrong on the RocksDB side. The GET call never returns null for a
key that was PUT earlier. We then added audit logs to the CX application
and found that it keeps a cache (HashMap) on top of RocksDBValueState to
speed up lookups, which is where the issue is. The application checks the
cache for the key first and, if the key is not there, reads it from
RocksDBValueState. There is a race condition in their code where they
overwrite the RocksDBValueState with a new entry that does not contain the
previous state, which causes the issue.
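
For anyone hitting something similar, here is a minimal sketch of how a
cache-first read followed by a full overwrite can drop earlier state. It
is not the CX application's code: the class, the method names, and the
HashMap standing in for RocksDBValueState are all hypothetical, and the
"race" is replayed as a fixed interleaving on one thread so that the
result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheOverStateSketch {
    // Hypothetical stand-in for the keyed state (NOT the Flink API).
    static final Map<String, List<String>> state = new HashMap<>();
    // Application-level cache layered on top of the state.
    static final Map<String, List<String>> cache = new HashMap<>();

    // Read path as described in the thread: cache first, then state.
    // Returns a private copy, or a new empty entry if the key is unknown.
    static List<String> read(String key) {
        List<String> value = cache.get(key);
        if (value == null) {
            value = state.get(key);
        }
        return value == null ? new ArrayList<>() : new ArrayList<>(value);
    }

    // Write path: replaces the whole entry in both cache and state.
    static void write(String key, List<String> value) {
        cache.put(key, value);
        state.put(key, value);
    }

    // Assumed interleaving: both workers read before either one writes,
    // so the second write replaces the entry with one that lacks the
    // first worker's update.
    static List<String> lostUpdate() {
        state.clear();
        cache.clear();
        List<String> a = read("k"); // worker A sees no entry
        List<String> b = read("k"); // worker B sees no entry either
        a.add("event-1");
        write("k", a);
        b.add("event-2");
        write("k", b);              // overwrites A's entry
        return read("k");
    }

    // Same work with each read-modify-write completed before the next starts.
    static List<String> serialized() {
        state.clear();
        cache.clear();
        List<String> a = read("k");
        a.add("event-1");
        write("k", a);
        List<String> b = read("k"); // now sees event-1
        b.add("event-2");
        write("k", b);
        return read("k");
    }

    public static void main(String[] args) {
        System.out.println("racy:       " + lostUpdate());
        System.out.println("serialized: " + serialized());
    }
}
```

In the racy run the final entry holds only event-2, while the serialized
run keeps both events. The state store behaves correctly in both cases;
the loss comes entirely from the application's read-then-overwrite logic.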

Sorry for the confusion; it turned out to be a problem on the Flink
Application side rather than the Framework side.




On Tue, Jun 27, 2023 at 2:53 PM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> Hi Prabhu,
>
> make sure that the key you use is the same for both records and try to
> reproduce the issue with the level of parallelism of 1.
>
> Best,
> Alex
>
> On Sun, 25 Jun 2023 at 04:29, Hangxiang Yu <master...@gmail.com> wrote:
>
>> Hi, Prabhu.
>>
>> This is a correctness issue. IIUC, it should not be related to the size
>> of the block cache, write buffer, or whether the bloom filter is enabled.
>>
>> Is your job a DataStream job? Does the job contain a custom Serializer?
>> You could check or share the logic of the Serializer, as this is one of the
>> main differences between RocksDBStateBackend and HashMapStateBackend
>> (HashMapStateBackend does not perform serialization and deserialization).
>>
>> On Wed, Jun 21, 2023 at 3:44 PM Prabhu Joseph <prabhujose.ga...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> With the RocksDB state backend, a GET call on a key that was PUT into the
>>> state about 100 ms earlier intermittently returns nothing. The issue never
>>> happened with the HashMap state backend. We are trying to increase the block
>>> cache size and write buffer size, and to enable the bloom filter, as per the doc:
>>> https://flink.apache.org/2021/01/18/using-rocksdb-state-backend-in-apache-flink-when-and-how/
>>>
>>> Any ideas on what could be wrong or how to debug this?
>>>
>>> Thanks,
>>> Prabhu Joseph
>>>
>>
>>
>> --
>> Best,
>> Hangxiang.
>>
>
