Hi,

Didn't we talk once about defining a format for blob id references, so
that a value of the format "bin:{blobId}" (or similar) is a reference?
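
Roughly something like this (just a sketch; the "bin:" prefix and the
class name are made up for illustration):

    // Hypothetical: recognize and parse "bin:{blobId}" reference values.
    public final class BlobReference {

        private static final String PREFIX = "bin:";

        private final String blobId;

        private BlobReference(String blobId) {
            this.blobId = blobId;
        }

        // Returns the parsed reference, or null for a plain string value.
        public static BlobReference parse(String value) {
            if (value != null && value.startsWith(PREFIX)) {
                return new BlobReference(value.substring(PREFIX.length()));
            }
            return null;
        }

        public String getBlobId() {
            return blobId;
        }
    }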

Regards,
Thomas



On 11/7/12 10:17 AM, "Michael Dürig" <[email protected]> wrote:

>
>On a related note: how does the garbage collector even find out whether
>a binary is "referenced"? That is, on the Microkernel level, what does
>it actually mean for a binary to be referenced?
>
>Michael
>
>On 6.11.12 18:45, Michael Marth wrote:
>> this might be a weird question from left field, but are we actually
>> sure that the existing data store concept is worth the trouble? afaiu
>> it saves us from storing the same binary twice, but leads into the
>> DSGC topic. would it be possible to make it optional to store/address
>> binaries by hash (and thus not need DSGC for these configurations)?
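>>
>> To make that concrete, a minimal sketch of addressing binaries by
>> content hash, assuming a plain file-based store (all names here are
>> invented):
>>
>>     import java.io.*;
>>     import java.security.MessageDigest;
>>
>>     // Hypothetical content-addressed store: the blob id is the SHA-256
>>     // hash of the content, so identical binaries are stored only once.
>>     public class HashBlobStore {
>>
>>         private final File baseDir;
>>
>>         public HashBlobStore(File baseDir) {
>>             this.baseDir = baseDir;
>>         }
>>
>>         public String put(byte[] content) throws Exception {
>>             MessageDigest md = MessageDigest.getInstance("SHA-256");
>>             String id = toHex(md.digest(content));
>>             File file = new File(baseDir, id);
>>             if (!file.exists()) { // identical content collapses to one file
>>                 try (OutputStream out = new FileOutputStream(file)) {
>>                     out.write(content);
>>                 }
>>             }
>>             return id;
>>         }
>>
>>         private static String toHex(byte[] hash) {
>>             StringBuilder sb = new StringBuilder();
>>             for (byte b : hash) {
>>                 sb.append(String.format("%02x", b & 0xff));
>>             }
>>             return sb.toString();
>>         }
>>     }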
>>
>> In any case we should definitely avoid requiring repo traversal for
>> DSGC. This would operationally limit the repo sizes Oak can support.
>>
>>
>> --
>> Michael Marth | Engineering Manager
>> +41 61 226 55 22 | [email protected]
>> Barfüsserplatz 6, CH-4001 Basel, Switzerland
>>
>> On Nov 6, 2012, at 9:24 AM, Thomas Mueller wrote:
>>
>> Hi,
>>
>> 1- What's considered an "old" node or commit? Technically, anything
>> other than the head revision is old, but can we remove it right away,
>> or do we need to retain a number of revisions? If the latter, how far
>> back do we need to retain?
>>
>> we discussed this a while back; no good solution back then [1]
>>
>> Yes. Somebody has to decide which revisions are no longer needed.
>> Luckily it doesn't need to be us :-) We might set a default value (10
>> minutes or so), and then give the user the ability to change that,
>> depending on whether he cares more about disk space or the ability to
>> read old data / roll back to an old state.
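>>
>> A small sketch of what that could look like (names invented):
>>
>>     // Hypothetical: revisions younger than the configured retention
>>     // time are kept; older ones may be garbage collected.
>>     public class RevisionRetention {
>>
>>         // default: 10 minutes, configurable by the user
>>         private final long retentionMillis;
>>
>>         public RevisionRetention(long retentionMillis) {
>>             this.retentionMillis = retentionMillis;
>>         }
>>
>>         public boolean canCollect(long revisionTimestamp, long now) {
>>             return now - revisionTimestamp > retentionMillis;
>>         }
>>     }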
>>
>> To free up disk space, BlobStore garbage collection is actually more
>> important, because usually 90% of the disk space is used by the
>> BlobStore. So it would be nice if items (files) in the BlobStore were
>> deleted as soon as possible after old revisions are deleted. In
>> Jackrabbit 2.x we have seen that node and data store garbage
>> collection that has to traverse the whole repository is problematic if
>> the repository is large. So garbage collection can be a scalability
>> issue: if we have to traverse all revisions of all nodes in order to
>> delete unused data, we basically tie garbage collection speed to
>> repository size, unless we find a way to run it in parallel. But
>> running mark & sweep garbage collection completely in parallel is not
>> easy (is it even possible? if yes, I would have guessed modern JVMs
>> would have had it for a long time). So I think if we don't need to
>> traverse the repository to delete old nodes, but just traverse the
>> journal, this would be much less of a problem.
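>>
>> A rough sketch of journal-based mark & sweep (the interfaces here are
>> assumptions, not existing Oak API):
>>
>>     import java.util.HashSet;
>>     import java.util.Set;
>>
>>     // Hypothetical: mark the blob ids referenced from the journal of
>>     // retained revisions, then sweep unreferenced blobs, without
>>     // traversing every node of every revision.
>>     public class JournalBlobGc {
>>
>>         public interface BlobDeleter {
>>             void delete(String blobId);
>>         }
>>
>>         public int collect(Iterable<String> journalBlobIds,
>>                 Iterable<String> storedBlobIds, BlobDeleter deleter) {
>>             // mark: every blob id referenced by a retained revision
>>             Set<String> referenced = new HashSet<String>();
>>             for (String id : journalBlobIds) {
>>                 referenced.add(id);
>>             }
>>             // sweep: delete stored blobs that are no longer referenced
>>             int deleted = 0;
>>             for (String id : storedBlobIds) {
>>                 if (!referenced.contains(id)) {
>>                     deleter.delete(id);
>>                     deleted++;
>>                 }
>>             }
>>             return deleted;
>>         }
>>     }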
>>
>> Regards,
>> Thomas
>>
>>
>>
