[ https://issues.apache.org/jira/browse/OAK-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16355158#comment-16355158 ]

Thomas Mueller commented on OAK-5272:
-------------------------------------

[~amitjain] what about the case where one blob has a SHA-1 content hash, and 
the other has a SHA-256 content hash?
The content hash is different, but the content could still be the same.
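
To illustrate the point, here is a minimal, self-contained sketch (not Oak code; the class name HashLengthDemo and the helper hexDigest are made up for this example) that hashes the same bytes with SHA-1 and SHA-256 using the standard java.security.MessageDigest API. The two ids differ in both value and length even though the content is identical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of Oak.
final class HashLengthDemo {

    /** Returns the lowercase hex digest of data for the given JCA algorithm name. */
    static String hexDigest(String algorithm, byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 and SHA-256 are mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "same content".getBytes(StandardCharsets.UTF_8);
        String sha1 = hexDigest("SHA-1", content);
        String sha256 = hexDigest("SHA-256", content);
        System.out.println(sha1.length());        // 40 hex characters (160 bits)
        System.out.println(sha256.length());      // 64 hex characters (256 bits)
        System.out.println(sha1.equals(sha256));  // false, although the content is the same
    }
}
```

So comparing ids of different length cannot tell us that the content differs; it can only tell us that we don't know.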

>  currently the BlobStore(s) are not aware of the Blob object.

Are blobs aware of the blob store? If yes, what about adding a method "compare 
content" to the blob, something like this:

{noformat}
    public enum Equality { EQUALS, DIFFERENT, UNKNOWN };

    public Equality compareContent(Blob other) {
        if (this == other) {
            return Equality.EQUALS;
        } else if (other == null) {
            return Equality.DIFFERENT;
        } 
        if (length() != other.length()) {
            return Equality.DIFFERENT;
        }
        if (!blobStore.hasContentAdressableBlobIds()) {
            return Equality.UNKNOWN;
        }
        // TODO is strict type check needed, or is "instanceof" sufficient?
        if (other.getClass() == getClass()) {
            BlobStoreBlob otherBlob = (BlobStoreBlob) other;
            if (!otherBlob.blobStore.hasContentAdressableBlobIds()) {
                return Equality.UNKNOWN;
            }
            // TODO maybe blobId contains the length? in this case, truncate
            // that part
            if (otherBlob.blobId.length() != blobId.length()) {
                return Equality.UNKNOWN;
            }
            return blobId.equals(otherBlob.blobId)
                    ? Equality.EQUALS : Equality.DIFFERENT;
        }
        return Equality.UNKNOWN;
    }
{noformat}

I know, many cases...
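
For the caller, the three-valued result could be consumed roughly as follows. This is only a sketch under simplifying assumptions: blobs are reduced to an id string plus an in-memory byte array, and ContentCompareSketch, compareIds and sameContent are hypothetical names, not existing Oak API. The idea is that only UNKNOWN forces the expensive full content comparison:

```java
import java.util.Arrays;

// Hypothetical sketch, not part of Oak.
final class ContentCompareSketch {

    enum Equality { EQUALS, DIFFERENT, UNKNOWN }

    /** Compares two content-hash ids; only ids of equal length are comparable. */
    static Equality compareIds(String idA, String idB) {
        if (idA.length() != idB.length()) {
            // Different lengths imply different hash algorithms: cannot decide.
            return Equality.UNKNOWN;
        }
        return idA.equals(idB) ? Equality.EQUALS : Equality.DIFFERENT;
    }

    /** Resolves UNKNOWN by falling back to a full byte-by-byte comparison. */
    static boolean sameContent(String idA, byte[] contentA,
                               String idB, byte[] contentB) {
        switch (compareIds(idA, idB)) {
            case EQUALS:
                return true;
            case DIFFERENT:
                return false;
            default:
                return Arrays.equals(contentA, contentB);
        }
    }
}
```

In real code the fallback would compare streams rather than byte arrays, and the length check from compareContent would come first.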

Your method is still needed, but we would need to extend the description a bit:

{noformat}
    /**
     *
     * Will return true if blob ids are generated from content hash.
     * Content hashes of the same length can be used for equality checks
     * (content hashes of different length are generated with different
     * algorithms).
     *
     * @return true if blobs are content addressable
     */
    boolean hasContentAdressableBlobIds();
{noformat}



> Expose BlobStore API to provide information whether blob id is content hashed
> -----------------------------------------------------------------------------
>
>                 Key: OAK-5272
>                 URL: https://issues.apache.org/jira/browse/OAK-5272
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob
>            Reporter: Amit Jain
>            Priority: Major
>
> As per discussion in OAK-5253 it's better to have some information from the 
> BlobStore(s) whether the blob id can be solely relied upon for comparison.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
