[
https://issues.apache.org/jira/browse/OAK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jukka Zitting updated OAK-1392:
-------------------------------
Attachment: 0001-OAK-1392-SegmentBlob.equals-optimization.patch
See the attached 0001 patch for a way to handle #3. When encountering a
property that is being overwritten, we check if the value has really changed
and reuse the old value when possible.
Note that we didn't do this earlier in order to keep the property value records
in the same segment with the node records. However, with the better string
cache we now have, it's actually better to reuse old value records whenever
possible.
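To illustrate the idea (this is not the code from the attached patch), here is a minimal sketch of reusing the old value record when a property is overwritten with identical content. The {{ValueReuseSketch}} and {{ValueRecord}} names are hypothetical stand-ins and not part of the Oak API; object identity stands in for "which record is referenced".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical writer that keeps old value records when nothing changed. */
final class ValueReuseSketch {

    /** Hypothetical value record; its identity models where the value is stored. */
    static final class ValueRecord {
        final byte[] data;

        ValueRecord(byte[] data) {
            this.data = data.clone();
        }

        boolean sameContent(ValueRecord other) {
            return Arrays.equals(data, other.data);
        }
    }

    private final Map<String, ValueRecord> properties = new HashMap<>();

    /**
     * Sets a property and returns the record that ends up being referenced.
     * If the new value has the same content as the old one, the old record
     * is kept and the new one is discarded.
     */
    ValueRecord setProperty(String name, ValueRecord newValue) {
        ValueRecord old = properties.get(name);
        if (old != null && old.sameContent(newValue)) {
            return old; // unchanged: keep referencing the old record
        }
        properties.put(name, newValue);
        return newValue;
    }

    public static void main(String[] args) {
        ValueReuseSketch writer = new ValueReuseSketch();
        ValueRecord first = writer.setProperty("data", new ValueRecord(new byte[] {1, 2, 3}));
        ValueRecord again = writer.setProperty("data", new ValueRecord(new byte[] {1, 2, 3}));
        System.out.println("old record reused: " + (first == again));
    }
}
```

With the old record kept, a later comparison of the property can succeed on reference equality alone instead of scanning the bytes.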
Also, after this change it would actually make sense to refactor the
{{SegmentWriter.writeNode()}} method to use {{compareAgainstBaseState()}} when
working against a base {{SegmentNodeState}}. But that's best handled in a
separate issue.
> SegmentBlob.equals() optimization
> ---------------------------------
>
> Key: OAK-1392
> URL: https://issues.apache.org/jira/browse/OAK-1392
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Reporter: Jukka Zitting
> Attachments: 0001-OAK-1392-SegmentBlob.equals-optimization.patch,
> OAK-1392-v0.patch
>
>
> The current {{SegmentBlob.equals()}} method only checks for reference
> equality before falling back to the {{AbstractBlob.equals()}} method that
> just scans the entire byte stream.
> This works well for the majority of cases where a binary won't change at all
> or at least not often. However, there are some cases where a client
> frequently updates a binary or even rewrites it with the exact same contents.
> We should also optimize the handling of those cases.
> Some ideas on different things we can/should do:
> # Make {{AbstractBlob.equals()}} compare the blob lengths before scanning the
> byte streams. If a blob has changed, its length is likely also different, in
> which case the length check should provide a quick shortcut.
> # Keep a simple checksum like Adler-32 along with medium-sized value records
> and the block record references of a large value record. Compare those
> checksums before falling back to a full byte scan. This should capture
> practically all cases where the binaries are different even with equal
> lengths, but still not the case where they're equal.
> # When updating a binary value, do an equality check with the previous value
> and reuse the previous value if equal. The extra cost of doing this should
> get recovered already when the commit hooks that look at the change won't
> have to consider an unchanged binary.
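The first two ideas above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{BlobEqualitySketch}} and {{SimpleBlob}} are hypothetical in-memory stand-ins, not the actual {{SegmentBlob}} or {{AbstractBlob}} classes, and the checksum is computed eagerly here rather than stored with the value records. Only {{java.util.zip.Adler32}} and {{java.util.Arrays}} are real JDK APIs.

```java
import java.util.Arrays;
import java.util.zip.Adler32;

/** Hypothetical sketch of a tiered equality check for binary values. */
final class BlobEqualitySketch {

    /** Minimal in-memory blob that remembers an Adler-32 checksum of its bytes. */
    static final class SimpleBlob {
        final byte[] data;
        final long checksum;

        SimpleBlob(byte[] data) {
            this.data = data.clone();
            Adler32 adler = new Adler32();
            adler.update(this.data, 0, this.data.length);
            this.checksum = adler.getValue();
        }

        long length() {
            return data.length;
        }
    }

    /** Cheap checks first; the full byte scan is the last resort. */
    static boolean contentEquals(SimpleBlob a, SimpleBlob b) {
        if (a == b) {
            return true;  // reference equality
        }
        if (a.length() != b.length()) {
            return false; // idea 1: different lengths, so different content
        }
        if (a.checksum != b.checksum) {
            return false; // idea 2: different checksums, so different content
        }
        // Equal checksums do not prove equality, so scan the bytes.
        return Arrays.equals(a.data, b.data);
    }

    public static void main(String[] args) {
        SimpleBlob x = new SimpleBlob(new byte[] {1, 2, 3});
        SimpleBlob y = new SimpleBlob(new byte[] {1, 2, 3});
        SimpleBlob z = new SimpleBlob(new byte[] {3, 2, 1});
        System.out.println("x vs y: " + contentEquals(x, y));
        System.out.println("x vs z: " + contentEquals(x, z));
    }
}
```

Note that the full scan still runs whenever the two values really are equal, which is why the third idea (reusing the previous value) is needed to cover that case.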
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)