Jukka Zitting created OAK-1392:
----------------------------------

             Summary: SegmentBlob.equals() optimization
                 Key: OAK-1392
                 URL: https://issues.apache.org/jira/browse/OAK-1392
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: core
            Reporter: Jukka Zitting


The current {{SegmentBlob.equals()}} method only checks for reference equality 
before falling back to the {{AbstractBlob.equals()}} method that just scans the 
entire byte stream.

This works well for the majority of cases where a binary won't change at all or 
at least not often. However, there are some cases where a client frequently 
updates a binary or even rewrites it with the exact same contents. We should 
optimize the handling of those cases as well.

Some ideas on different things we can/should do:

# Make {{AbstractBlob.equals()}} compare the blob lengths before scanning the 
byte streams. If a blob has changed, its length is likely also different, in 
which case the length check provides a quick shortcut.
# Keep a simple checksum like Adler-32 along with medium-sized value records 
and the block record references of a large value record. Compare those 
checksums before falling back to a full byte scan. This should catch 
practically all cases where the binaries differ despite having equal lengths, 
but it does not help when the binaries are actually equal, since matching 
checksums still require the full scan.
# When updating a binary value, do an equality check with the previous value 
and reuse the previous value if equal. The extra cost of this check should be 
recovered as soon as the commit hooks that look at the change no longer have 
to consider an unchanged binary.
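
To make the three ideas above concrete, here is a rough sketch of how they could fit together. It is not the Oak implementation: {{SketchBlob}}, {{contentEquals()}} and {{reuseIfEqual()}} are hypothetical names made up for illustration, the blob is backed by a plain byte array instead of segment records, and the Adler-32 checksum is assumed to be computed once when the value is written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;

/** Hypothetical stand-in for a blob; NOT Oak's SegmentBlob or AbstractBlob. */
final class SketchBlob {
    private final byte[] data;
    // Assumption: the checksum is stored alongside the value at write time.
    private final long checksum;

    SketchBlob(byte[] data) {
        this.data = data.clone();
        Adler32 adler = new Adler32();
        adler.update(this.data, 0, this.data.length);
        this.checksum = adler.getValue();
    }

    long length() { return data.length; }

    long checksum() { return checksum; }

    InputStream newStream() { return new ByteArrayInputStream(data); }

    /** Ideas 1 and 2: cheap checks first, full byte scan only as a last resort. */
    static boolean contentEquals(SketchBlob a, SketchBlob b) {
        if (a == b) {
            return true;                       // reference equality
        }
        if (a.length() != b.length()) {
            return false;                      // idea 1: length shortcut
        }
        if (a.checksum() != b.checksum()) {
            return false;                      // idea 2: checksum shortcut
        }
        // Equal length and checksum: the contents may still differ,
        // so a full scan is unavoidable here.
        try (InputStream x = a.newStream(); InputStream y = b.newStream()) {
            int p;
            int q;
            do {
                p = x.read();
                q = y.read();
                if (p != q) {
                    return false;
                }
            } while (p != -1);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Idea 3: keep the previous value when an update does not change it. */
    static SketchBlob reuseIfEqual(SketchBlob previous, SketchBlob updated) {
        if (previous != null && contentEquals(previous, updated)) {
            return previous;
        }
        return updated;
    }
}
```

With something like {{reuseIfEqual()}} on the write path, a rewrite with identical contents would hand back the previous instance, so later comparisons could stop at the reference check. Only the equal-contents case still pays for the full scan.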



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
