[ https://issues.apache.org/jira/browse/HDDS-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160023#comment-17160023 ]
Ethan Rose commented on HDDS-3976:
----------------------------------
Implementing seekToLast() correctly and efficiently with a filter requires
seeking to the end of the RocksIterator and stepping back over mismatched keys
using its prev() method. Because KeyValueBlockIterator wraps the RocksIterator
in a MetaStoreIterator, and MetaStoreIterator does not support prev(), this
method cannot be implemented with the code in its current state.
Options:
# Add support for prev() to MetaStoreIterator.
# Remove KeyValueBlockIterator#seekToLast since it is only used in tests.
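A minimal sketch of the first option, assuming a bidirectional iterator with RocksIterator-style isValid(), seekToLast(), prev() and key() methods. SortedKeyIterator is a hypothetical in-memory stand-in (not an Ozone or RocksDB class), and the keys and filter are made up for illustration; the point is only the loop that steps back over keys the filter rejects.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: a filtered seekToLast() built on an iterator that
 * supports prev(). SortedKeyIterator is an in-memory stand-in, not an
 * Ozone or RocksDB class.
 */
public class FilteredSeekToLast {

  /** Minimal stand-in for a bidirectional, RocksIterator-style iterator. */
  static final class SortedKeyIterator {
    private final List<String> keys; // assumed to be sorted already
    private int pos = -1;            // any out-of-range position means "not valid"

    SortedKeyIterator(List<String> sortedKeys) {
      this.keys = List.copyOf(sortedKeys);
    }

    void seekToLast() { pos = keys.size() - 1; }
    void prev() { pos--; }
    boolean isValid() { return pos >= 0 && pos < keys.size(); }
    String key() { return keys.get(pos); }
  }

  /**
   * Positions the iterator on the last key accepted by the filter and
   * returns it, or returns null if no key matches.
   */
  static String seekToLastMatching(SortedKeyIterator it, Predicate<String> filter) {
    it.seekToLast();
    // Step backwards over trailing keys that the filter rejects.
    while (it.isValid() && !filter.test(it.key())) {
      it.prev();
    }
    return it.isValid() ? it.key() : null;
  }

  public static void main(String[] args) {
    // Hypothetical keys: "del*" keys fail the filter and happen to sort last.
    SortedKeyIterator it =
        new SortedKeyIterator(List.of("blk1", "blk2", "blk3", "del4"));
    it.seekToLast();
    System.out.println("raw seekToLast: " + it.key()); // del4
    System.out.println("filtered: "
        + seekToLastMatching(it, k -> !k.startsWith("del"))); // blk3
  }
}
```

With these hypothetical keys, an unfiltered seekToLast() stops on del4, while the filtered version returns blk3. Stepping backwards only visits the trailing run of rejected keys instead of scanning the whole key set forward, which is why the approach depends on prev() being available.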
> KeyValueBlockIterator#nextBlock skips valid blocks
> --------------------------------------------------
>
> Key: HDDS-3976
> URL: https://issues.apache.org/jira/browse/HDDS-3976
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Ethan Rose
> Assignee: Ethan Rose
> Priority: Major
>
> HDDS-3854 fixed a bug in KeyValueBlockIterator#hasNext, but introduced
> another one in KeyValueBlockIterator#nextBlock, which depends on the behavior
> of that method. When the first key encountered does not pass the filter, the
> internal nextBlock field is never initialized. A call to nextBlock() then
> results in a call to hasNext(), which returns true, so nextBlock() is called
> recursively, again calling hasNext(), and so on until the end of the key set
> is reached and an exception is thrown. As a result, all valid keys that occur
> after the first invalid key are skipped.
> Additionally, the current implementation of KeyValueBlockIterator#seekToLast
> depends on the internal RocksDB iterator's seekToLast() method, which moves
> to the last key in the DB regardless of whether it matches the filter. This
> can differ from the last key that matches the filter.
> This bug was identified while working on HDDS-3869, which adds a strongly
> typed layer before objects are serialized into RocksDB on the datanode. Due to
> RocksDB internals, this changes the database layout so that all prefixed keys
> are returned at the beginning of the key set instead of at the end. Because
> the original layout returned all prefixed keys at the end of the key set, the
> behavior described above could not occur, so this bug was not evident in any
> of the original unit tests.
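For the nextBlock() problem described in the issue, one common way to avoid the recursion is to let hasNext() advance past rejected entries and cache the next match, and have next() simply consume that cache. The sketch below shows the pattern on a plain java.util.Iterator; FilteredIterator and its nextMatch field are hypothetical names, not the actual Ozone fix, and the sketch assumes the underlying iterator never yields null.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of a filtering iterator whose hasNext() caches the next
 * matching element, so next() never recurses and never skips matches.
 * Assumes the underlying iterator never returns null (null marks "no cache").
 */
public class FilteredIterator<T> implements Iterator<T> {
  private final Iterator<T> raw;
  private final Predicate<T> filter;
  private T nextMatch; // element that already passed the filter, or null

  public FilteredIterator(Iterator<T> raw, Predicate<T> filter) {
    this.raw = raw;
    this.filter = filter;
  }

  @Override
  public boolean hasNext() {
    if (nextMatch != null) {
      return true; // a match is already cached; do not advance again
    }
    // Skip every rejected element, including a rejected first element.
    while (raw.hasNext()) {
      T candidate = raw.next();
      if (filter.test(candidate)) {
        nextMatch = candidate;
        return true;
      }
    }
    return false;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T result = nextMatch;
    nextMatch = null; // consume the cached element
    return result;
  }

  public static void main(String[] args) {
    // The first key fails the filter; the later valid keys must still appear.
    Iterator<String> it = new FilteredIterator<>(
        List.of("del1", "blk2", "blk3").iterator(), k -> !k.startsWith("del"));
    while (it.hasNext()) {
      System.out.println(it.next()); // prints blk2, then blk3
    }
  }
}
```

Because hasNext() is idempotent while a match is cached, callers can invoke it any number of times before next() without losing or skipping elements, which is exactly the property the buggy nextBlock() relied on.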
--
This message was sent by Atlassian Jira
(v8.3.4#803005)