On a related note, I think we also need NodeStore (nodes & commits) garbage collection in MongoMK. Otherwise, MongoDB will fill up with old node and commit data that provides no real benefit. The basic implementation idea is to have a background task periodically go through old nodes and commits and delete them, but this raises questions such as:
1- What's considered an "old" node or commit? Technically, anything other than the head revision is old, but can we remove them right away or do we need to retain a number of revisions? If the latter, then how far back do we need to retain?
2- How often should the NodeStore GC run and for how long? How should this be controlled?
3- Do other MicroKernel implementations handle this, and if so, how?

If you have any feedback on any of this, I'd like to hear it.

-Mete

On 11/2/12 4:38 PM, "Mete Atamel" <[email protected]> wrote:

>Thanks. Yes, I also think it's worthwhile to try implementing MongoDB
>BlobStore based on AbstractBlobStore. Do we have tests somewhere where we
>can compare different BlobStore implementations?
>
>-Mete
>
>On 11/2/12 3:50 PM, "Thomas Mueller" <[email protected]> wrote:
>
>>Hi,
>>
>>I would definitely at least *try* to implement a MongoDB BlobStore based
>>on the AbstractBlobStore. It should be quite simple (one class). Then, it
>>would be interesting to know which implementation is faster: the GridFS
>>one or an implementation based on AbstractBlobStore :-) Specially if the
>>difference is big. If GridFS is faster, maybe we could learn something
>>from them.
>>
>>It looks like GridFS uses md5 hashes, that sounds a bit risky to me,
>>specially if anonymous users can create binaries. An attacker could upload
>>two files with the same md5 hash, which would at least "confuse" Oak and
>>maybe GridFS, or maybe worse. I mean, using md5 for your own files is
>>fine, but it seems problematic for Oak, because it would somewhat limit
>>the use cases.
>>
>>Regards,
>>Thomas
>>
>>
>>On 11/2/12 10:30 AM, "Mete Atamel" <[email protected]> wrote:
>>
>>>Hi,
>>>
>>>One of the things I need to implement for MongoMK is BlobStore garbage
>>>collection. I see that there's an initial implementation for garbage
>>>collection in AbstractBlobStore in oak-mk and I also see this bug [0] to
>>>improve that initial implementation.
>>>
>>>MongoMK uses a GridFS based BlobStore, separate from AbstractBlobStore in
>>>oak-mk. I could potentially come up with my own GC, based on that GridFS
>>>implementation, or I could try a new AbstractBlobStore implementation for
>>>MongoMK (not GridFS based). With the second approach, I potentially get
>>>current and future garbage collection improvements for free.
>>>
>>>Not sure which path to follow yet but I wanted to see what others thought
>>>before starting to work on it.
>>>
>>>Thanks,
>>>Mete
>>>
>>>[0] https://issues.apache.org/jira/browse/OAK-377
