A. Sophie Blee-Goldman created KAFKA-14460:
----------------------------------------------

             Summary: In-memory store iterators can return results with null values
                 Key: KAFKA-14460
                 URL: https://issues.apache.org/jira/browse/KAFKA-14460
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: A. Sophie Blee-Goldman


Due to the thread-safety model we adopted in our in-memory stores to avoid 
scaling issues, we synchronize all read/write methods and, during range scans, 
copy the keyset of all results rather than returning a direct iterator over 
the underlying map. When users call #next to read out the iterator results, we 
issue a point lookup on the next key and simply return a new 
KeyValue<>(key, get(key)).
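
A minimal sketch of this pattern, assuming a TreeMap-backed store with String 
keys and values. SnapshotStore and its methods are hypothetical stand-ins for 
illustration, not the actual Kafka Streams classes, and Map.Entry is used in 
place of KeyValue to keep the example self-contained:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an in-memory store; not Kafka's implementation.
class SnapshotStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    // All read/write methods are synchronized on the store.
    synchronized void put(String key, String value) { map.put(key, value); }
    synchronized String get(String key) { return map.get(key); }
    synchronized void delete(String key) { map.remove(key); }

    // Copies the keys in [from, to] while holding the lock, then hands back
    // an iterator over the copy instead of over the underlying map.
    synchronized Iterator<Map.Entry<String, String>> range(String from, String to) {
        List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
        Iterator<String> keyIter = keys.iterator();
        return new Iterator<Map.Entry<String, String>>() {
            @Override
            public boolean hasNext() {
                return keyIter.hasNext();
            }

            @Override
            public Map.Entry<String, String> next() {
                String key = keyIter.next();
                // Point lookup at read time: the value is whatever the store
                // holds now, not what it held when the keys were copied.
                return new SimpleImmutableEntry<>(key, SnapshotStore.this.get(key));
            }
        };
    }
}
```

In this sketch the lock is held only while the keys are copied and during each 
point lookup, never for the lifetime of the iterator.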

This lets the range scan return results without blocking access to the store by 
other threads and without risk of a ConcurrentModificationException, as a 
writer can modify the real store without affecting the iterator's keyset copy. 
It also means that keys added or removed after the copy won't be reflected in 
the set of keys the iterator walks, which in itself is fine as we don't 
guarantee consistency semantics of any kind.

However, we _do_ guarantee that range scans "must not return null values", and 
this contract may be violated if the StreamThread deletes a record that the 
iterator was going to return: the key is still in the iterator's keyset copy, 
so the point lookup in #next returns null for it.

tl;dr: we should check get(key) for null and skip to the next result if 
necessary in the in-memory store iterators. See for example 
InMemoryKeyValueIterator (note that we'll probably need to buffer one record in 
advance before we return true from #hasNext).
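
One possible shape for the fix, as a rough sketch rather than a patch against 
InMemoryKeyValueIterator. It assumes a TreeMap-backed store with String keys 
and values. NullSkippingStore and NullSkippingIterator are hypothetical names, 
and Map.Entry again stands in for KeyValue:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for an in-memory store; not Kafka's implementation.
class NullSkippingStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    synchronized void put(String key, String value) { map.put(key, value); }
    synchronized String get(String key) { return map.get(key); }
    synchronized void delete(String key) { map.remove(key); }

    // Copies the keys in [from, to] under the lock, as in the current approach.
    synchronized Iterator<Map.Entry<String, String>> range(String from, String to) {
        List<String> keys = new ArrayList<>(map.subMap(from, true, to, true).keySet());
        return new NullSkippingIterator(keys.iterator());
    }

    private class NullSkippingIterator implements Iterator<Map.Entry<String, String>> {
        private final Iterator<String> keyIter;
        // The one record buffered in advance; null when nothing is buffered.
        private Map.Entry<String, String> buffered;

        NullSkippingIterator(Iterator<String> keyIter) {
            this.keyIter = keyIter;
        }

        @Override
        public boolean hasNext() {
            // Advance past keys deleted since the copy was taken, so that
            // true is returned only once a non-null value is in hand.
            while (buffered == null && keyIter.hasNext()) {
                String key = keyIter.next();
                String value = NullSkippingStore.this.get(key);
                if (value != null) {
                    buffered = new SimpleImmutableEntry<>(key, value);
                }
            }
            return buffered != null;
        }

        @Override
        public Map.Entry<String, String> next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Map.Entry<String, String> result = buffered;
            buffered = null;
            return result;
        }
    }
}
```

The buffering matters because #hasNext cannot know whether any remaining key 
still has a value without actually looking it up. One consequence in this 
sketch is that the returned value is the one observed when the record was 
buffered, so it may already be stale by the time #next hands it out, which 
seems acceptable given that we don't guarantee consistency anyway.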



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
