Wim Symons created OAK-7859:
-------------------------------
Summary: S3 Bucket iterator stops too early
Key: OAK-7859
URL: https://issues.apache.org/jira/browse/OAK-7859
Project: Jackrabbit Oak
Issue Type: Bug
Components: blob-cloud
Affects Versions: 1.6.6
Reporter: Wim Symons
Fixed a major bug in the S3 bucket iterator.
When the returned queue of records is empty because a full page of records
consists only of keys starting with META/, the iterator stops while there is
still data available in the bucket.
This causes problems with datastore GC and datastore consistency checks (both
online and offline), and possibly more.
A short explanation follows, based on a batch size of 2 instead of 1000.
Suppose your list of S3 keys looks as follows:
* 1
* 2
* 3
* 4
* META/1
* META/2
* 5
* 6
loadBatch would first load [1, 2], filter out no META/ keys and pass [1, 2] to
the caller.
Next time, loadBatch would load [3, 4], filter out no META/ keys and pass [3,
4] to the caller.
Then, loadBatch would load [META/1, META/2], filter out the META/ keys and pass
[] to the caller.
When that happens, traversing the bucket would stop, because the returned list
is empty, even if there are many more batches to load.
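The walk-through above can be reproduced with a small stand-alone model. This is only a sketch under assumptions: the class name, the in-memory key list and the offset-based paging are hypothetical stand-ins for the real S3 listing, and only the name loadBatch and the META/ prefix come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the behaviour described above; not the Oak source.
public class NaiveBucketTraversal {
    static final int BATCH_SIZE = 2;

    // Loads one page starting at offset and drops keys that start with META/.
    static List<String> loadBatch(List<String> keys, int offset) {
        List<String> batch = new ArrayList<>();
        int end = Math.min(offset + BATCH_SIZE, keys.size());
        for (String key : keys.subList(offset, end)) {
            if (!key.startsWith("META/")) {
                batch.add(key);
            }
        }
        return batch;
    }

    // Stops as soon as a filtered batch is empty, which is the bug.
    static List<String> traverse(List<String> keys) {
        List<String> seen = new ArrayList<>();
        int offset = 0;
        while (offset < keys.size()) {
            List<String> batch = loadBatch(keys, offset);
            offset += BATCH_SIZE;
            if (batch.isEmpty()) {
                break; // empty batch is mistaken for "end of bucket"
            }
            seen.addAll(batch);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "1", "2", "3", "4", "META/1", "META/2", "5", "6");
        System.out.println(traverse(keys)); // prints [1, 2, 3, 4]
    }
}
```

In this model, keys 5 and 6 are never visited, which matches the early stop described above.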
The fix checks whether the returned list is empty while more batches are
available. If so, it loads further batches until a batch contains data or no
more batches are available.
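The idea of the fix can be sketched as follows. This is a simplified illustration, not the code from the pull requests: the class, the hasMoreBatches helper and the in-memory key list are assumptions made for the example, and only loadBatch and the META/ prefix are taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the described fix; names are not from the Oak source.
public class SkippingBucketTraversal {
    static final int BATCH_SIZE = 2;

    private final List<String> keys;
    private int offset = 0;

    SkippingBucketTraversal(List<String> keys) {
        this.keys = keys;
    }

    boolean hasMoreBatches() {
        return offset < keys.size();
    }

    // Keeps loading pages until one has data after filtering,
    // or until no more pages are available.
    List<String> loadBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.isEmpty() && hasMoreBatches()) {
            int end = Math.min(offset + BATCH_SIZE, keys.size());
            for (String key : keys.subList(offset, end)) {
                if (!key.startsWith("META/")) {
                    batch.add(key);
                }
            }
            offset = end;
        }
        return batch;
    }

    // An empty batch now really means the bucket is exhausted.
    static List<String> traverse(List<String> keys) {
        SkippingBucketTraversal traversal = new SkippingBucketTraversal(keys);
        List<String> seen = new ArrayList<>();
        List<String> batch;
        while (!(batch = traversal.loadBatch()).isEmpty()) {
            seen.addAll(batch);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "1", "2", "3", "4", "META/1", "META/2", "5", "6");
        System.out.println(traverse(keys)); // prints [1, 2, 3, 4, 5, 6]
    }
}
```

With the example key list, the page containing only META/ keys is skipped inside loadBatch, so the caller never sees an empty list before the end of the bucket.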
We are currently running Oak 1.6.6 on AEM 6.3.1.2, but as the bug is still in
trunk, all previous versions of Oak are affected as well.
I provided 2 pull requests: one for trunk
(https://github.com/apache/jackrabbit-oak/pull/103) and one for the 1.6
branch (https://github.com/apache/jackrabbit-oak/pull/104).
CI failed on https://github.com/apache/jackrabbit-oak/pull/103, but I don't
think it's related to my changes.
For the record, the patch works: I was able to test it successfully on our
production repository using oak-run --id. Version 1.6.6 reported 800k items,
while my patched version reported 1.8m items, because our META/ nodes are
listed roughly half-way through the bucket.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)