Matt Ryan created OAK-7090:
------------------------------

             Summary: Use Bloom filters for composite data store blob ID lookup table
                 Key: OAK-7090
                 URL: https://issues.apache.org/jira/browse/OAK-7090
             Project: Jackrabbit Oak
          Issue Type: Technical task
            Reporter: Matt Ryan


The composite data store tries to keep a mapping from each blob ID to the 
delegate where that blob should be found.  We should use Bloom filters to make 
this mapping more efficient.
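
As a rough illustration of the idea, the sketch below keeps one small Bloom 
filter per delegate and asks each filter whether it might hold a given blob ID. 
The class names (BlobIdBloomFilter, DelegateLookup), the delegate names, and the 
FNV-1a/double-hashing scheme are all assumptions made for this example; none of 
them are existing Oak APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Oak API: a fixed-size Bloom filter over blob IDs.
class BlobIdBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BlobIdBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit FNV-1a over the UTF-8 bytes of the blob ID.
    private static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Double hashing: derive the i-th bit index from the two halves of one hash.
    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String blobId) {
        long hash = fnv1a64(blobId);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(hash, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String blobId) {
        long hash = fnv1a64(blobId);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(hash, i))) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical lookup table: delegate name -> Bloom filter of its blob IDs.
class DelegateLookup {
    private final Map<String, BlobIdBloomFilter> filters = new LinkedHashMap<>();

    void addDelegate(String name, int numBits, int numHashes) {
        filters.put(name, new BlobIdBloomFilter(numBits, numHashes));
    }

    void recordBlob(String delegate, String blobId) {
        filters.get(delegate).add(blobId);
    }

    // Delegates that may hold the blob; false positives are possible,
    // so the caller still has to confirm against the delegate itself.
    List<String> candidateDelegates(String blobId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, BlobIdBloomFilter> e : filters.entrySet()) {
            if (e.getValue().mightContain(blobId)) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

A filter can report a blob ID it never saw, but it never misses one it did see, 
so the result is best read as a list of delegates worth checking rather than a 
definitive answer.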

There are a couple of challenges with implementing Bloom filters for this 
purpose.
# Determining the appropriate size of the Bloom filter.  Assuming OAK-7089 is 
completed before this one, we should have a reasonable guess as to the number 
of blob IDs at startup time, but this number may change over time.  We may 
therefore need a task that rebuilds the table at a more appropriate size once 
it becomes too full (that is, once it yields too many false positives).
# Handling deletions.  Once a record has been deleted, the corresponding blob 
ID may also need to be removed (using an algorithm similar to data store GC).  
Bloom filters don't typically handle deletions, though.  This may require 
something like an 
[Invertible Bloom Filter|http://www.i-programmer.info/programming/theory/4641-the-invertible-bloom-filter.html],
or it may be as simple as using data store GC time to rebuild the Bloom 
filter appropriately.
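
Both challenges can be sketched together. The example below sizes a filter from 
an expected blob ID count and a target false-positive rate using the standard 
Bloom filter formulas (m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash 
functions), estimates the current false-positive rate from the fraction of bits 
set, and handles deletions by rebuilding from the live blob IDs, for example at 
data store GC time. The class name RebuildableBlobIdFilter, its methods, and the 
hashing scheme are assumptions for illustration, not Oak code.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch, not Oak API: a Bloom filter that is sized from an
// expected count and rebuilt from scratch instead of supporting deletes.
class RebuildableBlobIdFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RebuildableBlobIdFilter(long expectedIds, double targetFpp) {
        this.numBits = optimalNumBits(expectedIds, targetFpp);
        this.numHashes = optimalNumHashes(expectedIds, numBits);
        this.bits = new BitSet(numBits);
    }

    // m = -n * ln(p) / (ln 2)^2, rounded up and clamped to the int range.
    static int optimalNumBits(long n, double p) {
        double m = -Math.max(1, n) * Math.log(p) / (Math.log(2) * Math.log(2));
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1, Math.ceil(m)));
    }

    // k = (m / n) * ln 2, at least one hash function.
    static int optimalNumHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / Math.max(1, n) * Math.log(2)));
    }

    // 64-bit FNV-1a over the UTF-8 bytes of the blob ID.
    private static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    private int index(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String blobId) {
        long hash = fnv1a64(blobId);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(hash, i));
        }
    }

    boolean mightContain(String blobId) {
        long hash = fnv1a64(blobId);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(hash, i))) {
                return false;
            }
        }
        return true;
    }

    // Approximate false-positive rate: (fraction of bits set) ^ k.
    double estimatedFpp() {
        return Math.pow((double) bits.cardinality() / numBits, numHashes);
    }

    // "Deletion" by rebuild: size a fresh filter for the live IDs and re-add them.
    static RebuildableBlobIdFilter rebuild(Iterable<String> liveBlobIds,
                                           long liveCount, double targetFpp) {
        RebuildableBlobIdFilter fresh = new RebuildableBlobIdFilter(liveCount, targetFpp);
        for (String id : liveBlobIds) {
            fresh.add(id);
        }
        return fresh;
    }
}
```

Under these assumptions, a maintenance task could compare estimatedFpp() against 
a chosen threshold and call rebuild(...) with the current set of live blob IDs 
when the threshold is exceeded, which also drops any IDs that were deleted since 
the last rebuild.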



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
