Jukka Zitting created OAK-1392:
----------------------------------
Summary: SegmentBlob.equals() optimization
Key: OAK-1392
URL: https://issues.apache.org/jira/browse/OAK-1392
Project: Jackrabbit Oak
Issue Type: Improvement
Components: core
Reporter: Jukka Zitting
The current {{SegmentBlob.equals()}} method only checks for reference equality
before falling back to the {{AbstractBlob.equals()}} method, which compares the
full byte streams of both blobs.
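For reference, the current behaviour amounts to roughly the following. This is a simplified sketch, not the actual Oak code: {{SimpleBlob}} and {{ArrayBlob}} are hypothetical stand-ins (loosely modelled on a blob exposing {{getNewStream()}} and {{length()}}), and the byte-at-a-time scan is for clarity only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CurrentEqualsSketch {

    // Hypothetical minimal blob abstraction (assumption, not Oak's actual API).
    interface SimpleBlob {
        InputStream getNewStream();
        long length();
    }

    // Hypothetical in-memory blob used for illustration only.
    static final class ArrayBlob implements SimpleBlob {
        private final byte[] data;
        ArrayBlob(byte[] data) { this.data = data; }
        public InputStream getNewStream() { return new ByteArrayInputStream(data); }
        public long length() { return data.length; }
    }

    // Mirrors the described flow: reference equality first, then a full stream scan.
    static boolean blobEquals(SimpleBlob a, SimpleBlob b) {
        if (a == b) {
            return true; // same object, no I/O needed
        }
        try (InputStream x = a.getNewStream(); InputStream y = b.getNewStream()) {
            while (true) {
                int cx = x.read();
                int cy = y.read();
                if (cx != cy) {
                    return false; // differing byte, or one stream ended early
                }
                if (cx == -1) {
                    return true; // both streams ended together
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleBlob a = new ArrayBlob("hello".getBytes(StandardCharsets.UTF_8));
        SimpleBlob b = new ArrayBlob("hello".getBytes(StandardCharsets.UTF_8));
        SimpleBlob c = new ArrayBlob("hello!".getBytes(StandardCharsets.UTF_8));
        System.out.println(blobEquals(a, b)); // true
        System.out.println(blobEquals(a, c)); // false
    }
}
```

Note that even when the lengths differ, this flow reads both streams up to the first mismatch or the end of the shorter one.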
This works well in the majority of cases, where a binary never changes or
changes only rarely. However, in some cases a client updates a binary
frequently, or even rewrites it with exactly the same contents. We should
optimize the handling of those cases as well.
Some ideas for things we can/should do:
# Make {{AbstractBlob.equals()}} compare the blob lengths before scanning the
byte streams. If a blob has changed, its length has likely changed as well, in
which case the length check provides a quick shortcut.
# Keep a simple checksum such as Adler-32 along with medium-sized value records
and with the block record references of a large value record. Compare those
checksums before falling back to a full byte scan. This should catch
practically all cases where binaries of equal length differ, but not the case
where they are equal: matching checksums still require a full scan to confirm
equality.
# When updating a binary value, check it for equality with the previous value
and reuse the previous value if they are equal. The extra cost of doing this
should already be recovered by the commit hooks that inspect the change, since
they won't have to consider an unchanged binary.
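The three ideas above could combine roughly as follows. This is a minimal sketch under stated assumptions: {{ChecksummedBlob}} is a hypothetical in-memory blob that computes an Adler-32 value with {{java.util.zip.Adler32}} at construction, standing in for a checksum stored with the value record; {{blobEquals()}} and {{updateValue()}} are illustrative names, not existing Oak methods.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;

public class OptimizedEqualsSketch {

    // Hypothetical blob carrying a precomputed Adler-32 checksum, standing in for
    // a checksum stored alongside a value record (assumption, not Oak's format).
    static final class ChecksummedBlob {
        private final byte[] data;
        private final long adler32;

        ChecksummedBlob(byte[] data) {
            this.data = data.clone();
            Adler32 checksum = new Adler32();
            checksum.update(this.data, 0, this.data.length);
            this.adler32 = checksum.getValue();
        }

        long length() { return data.length; }
        long checksum() { return adler32; }
        InputStream getNewStream() { return new ByteArrayInputStream(data); }
    }

    static boolean blobEquals(ChecksummedBlob a, ChecksummedBlob b) {
        if (a == b) return true;                        // existing reference check
        if (a.length() != b.length()) return false;     // idea 1: cheap length shortcut
        if (a.checksum() != b.checksum()) return false; // idea 2: different checksums => different contents
        // Equal checksums do not prove equality, so fall back to a full scan.
        return streamsEqual(a, b);
    }

    // Byte-at-a-time comparison for clarity; a real implementation would buffer.
    static boolean streamsEqual(ChecksummedBlob a, ChecksummedBlob b) {
        try (InputStream x = a.getNewStream(); InputStream y = b.getNewStream()) {
            int cx;
            do {
                cx = x.read();
                if (cx != y.read()) return false;
            } while (cx != -1);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Idea 3: on update, keep the previous blob if the new content is equal,
    // so that consumers of the change (e.g. commit hooks) see no modification.
    static ChecksummedBlob updateValue(ChecksummedBlob previous, ChecksummedBlob candidate) {
        if (previous != null && blobEquals(previous, candidate)) {
            return previous;
        }
        return candidate;
    }

    public static void main(String[] args) {
        ChecksummedBlob v1 = new ChecksummedBlob("abc".getBytes(StandardCharsets.UTF_8));
        ChecksummedBlob same = new ChecksummedBlob("abc".getBytes(StandardCharsets.UTF_8));
        ChecksummedBlob other = new ChecksummedBlob("abd".getBytes(StandardCharsets.UTF_8));
        System.out.println(blobEquals(v1, other));          // false (checksum mismatch)
        System.out.println(updateValue(v1, same) == v1);     // true: previous value reused
        System.out.println(updateValue(v1, other) == other); // true: new value kept
    }
}
```

The checksum only proves inequality: equal lengths and equal checksums still fall through to the full scan, which is exactly the path taken when a client rewrites a binary with identical contents. Idea 3 pays that cost once at update time instead of in every later consumer.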