[ 
https://issues.apache.org/jira/browse/OAK-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316324#comment-15316324
 ] 

Amit Jain commented on OAK-4430:
--------------------------------

The method {{DataStoreBlobStore#getAllChunkIds}} also uses the fetched 
DataRecord to encode the length in the id. Since this method has only one 
consumer, the {{MarkSweepGarbageCollector}}, we could change the method itself 
to return blob ids without the encoded length and state this clearly in the 
javadocs. Alternatively, we could add an overloaded method that returns all raw 
blob ids.
Either way, the gc class would need a method to get the raw id from a 
length-encoded id, such as those the "node store referenced blobs" collection 
phase would return.
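
As a rough illustration of the helper mentioned above, here is a minimal sketch. It assumes the length-encoded id has the form {{<rawId>#<length>}}; the class name {{RawBlobIds}}, the separator constant and both method names are hypothetical and are not taken from the Oak code base.

```java
public class RawBlobIds {
    // Assumption: the length is appended to the raw id after this separator,
    // i.e. "<rawId>#<length>".
    static final char SEP = '#';

    /** Builds a length-encoded id from a raw id (hypothetical helper). */
    public static String encode(String rawId, long length) {
        return rawId + SEP + length;
    }

    /**
     * Returns the raw id for a possibly length-encoded id.
     * Ids without the separator are returned unchanged.
     */
    public static String toRawId(String id) {
        int idx = id.lastIndexOf(SEP);
        return idx < 0 ? id : id.substring(0, idx);
    }
}
```

With something like this, the gc class could normalise the referenced ids before comparing them with the raw ids returned by the blob store.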

[~chetanm] wdyt?

> DataStoreBlobStore#getAllChunkIds fetches DataRecord when not needed
> --------------------------------------------------------------------
>
>                 Key: OAK-4430
>                 URL: https://issues.apache.org/jira/browse/OAK-4430
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob
>            Reporter: Amit Jain
>            Assignee: Amit Jain
>              Labels: candidate_oak_1_0, candidate_oak_1_2, candidate_oak_1_4
>             Fix For: 1.5.3
>
>
> DataStoreBlobStore#getAllChunkIds loads the DataRecord to check that its 
> last modified time satisfies the given {{maxLastModifiedTime}}. 
> When {{maxLastModifiedTime}} is 0, it effectively means that any last 
> modified time check is ignored (this is currently the only usage, from 
> MarkSweepGarbageCollector). In that case the DataRecords should not be 
> fetched, as this can be very expensive, e.g. on calls to S3 with millions of 
> blobs.
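
The behaviour asked for in the description above could look roughly like the following sketch. It is not the Oak implementation: {{ChunkIdFilter}}, the nested {{Record}} interface and the {{loader}} function are hypothetical stand-ins for the data store and its records, and the exact comparison against {{maxLastModifiedTime}} is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class ChunkIdFilter {
    /** Hypothetical stand-in for a data record; only the timestamp matters here. */
    public interface Record {
        long getLastModified();
    }

    /**
     * Collects ids, loading a record only when a time check is requested.
     * The loader represents the potentially expensive per-id lookup (e.g. a remote call).
     */
    public static List<String> chunkIds(Iterator<String> ids,
                                        Function<String, Record> loader,
                                        long maxLastModifiedTime) {
        List<String> result = new ArrayList<>();
        while (ids.hasNext()) {
            String id = ids.next();
            if (maxLastModifiedTime == 0) {
                // 0 means "no last modified check": never touch the loader.
                result.add(id);
                continue;
            }
            Record record = loader.apply(id);
            // Assumption: a record qualifies if it is not newer than the limit.
            if (record != null && record.getLastModified() <= maxLastModifiedTime) {
                result.add(id);
            }
        }
        return result;
    }
}
```

The point of the sketch is only that the expensive lookup sits behind the {{maxLastModifiedTime == 0}} check, so a caller that passes 0 pays for the id listing alone.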



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
