Hi Tomek,

To start with, I think a flat-file-based approach should be fine. While
working on [1], it was observed that 2M blob IDs consumed 500MB of memory.
As this logic is to be implemented in oak-run, it should probably be fine
for now to just use an in-memory HashSet.
Later, if it becomes a problem, we can think of some off-heap solution. You
can also look into using MVStore, which is used in DocumentNodeStore for
the persistent cache.

Chetan Mehrotra

[1] https://issues.apache.org/jira/browse/OAK-2882?focusedCommentId=14550198&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14550198

On Mon, Aug 24, 2015 at 5:17 PM, Tomek Rekawek <[email protected]> wrote:
> Hello,
>
> I started working on OAK-3148, which is a new feature that allows to
> gradually migrate blobs from one store to another, without turning off the
> instance. In order to create the SplitBlobStore I need a way to remember (and
> save) already transferred blob ids.
>
> So, basically I need a persistent and mutable set of strings. Do we have
> something like this in Oak already? I thought about a few custom solutions:
>
> 1. Saving blob ids in a file (at the beginning it can be a flat text file,
> then some b-tree), with a memory cache and/or bloom filter.
> - but it adds complexity, requires the maintenance, etc.
> 2. Creating SegmentNodeStore, with bucketing via the hashcode
> - but running the second segment node store just to persist a bunch of ids
> seems a little excessive.
> 3. Custom cache solution, like ehcache
> - but adding a new, big library just to support this feature doesn’t seem
> right as we have to deal with dependency versions, embedding, etc.
>
> So, maybe there is some lightweight and reliable “4” in the Oak already?
>
> Thanks,
> Tomek
>
> --
> Tomek Rękawek | Adobe Research | www.adobe.com
> [email protected]
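
For illustration only, the flat file plus in-memory HashSet idea from the reply could look roughly like the sketch below. FileBackedBlobIdSet is a hypothetical name, not an existing Oak class. The sketch assumes that blob ids never contain line breaks, that ids are only ever added (never removed), and that a single process writes the file. It uses only the JDK.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper (not part of Oak): a set of blob ids kept in memory and appended to a flat text file. */
class FileBackedBlobIdSet implements AutoCloseable {

    private final Set<String> ids = new HashSet<>();
    private final BufferedWriter writer;

    FileBackedBlobIdSet(Path file) throws IOException {
        if (Files.exists(file)) {
            // Reload the ids recorded by a previous run, one id per line.
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.isEmpty()) {
                        ids.add(line);
                    }
                }
            }
        }
        // Open in append mode so existing entries are preserved.
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Records the id; returns false if it was already present. */
    synchronized boolean add(String blobId) throws IOException {
        if (!ids.add(blobId)) {
            return false;
        }
        writer.write(blobId);
        writer.newLine();
        // flush() hands the line to the OS; it does not force it to disk.
        writer.flush();
        return true;
    }

    synchronized boolean contains(String blobId) {
        return ids.contains(blobId);
    }

    synchronized int size() {
        return ids.size();
    }

    @Override
    public synchronized void close() throws IOException {
        writer.close();
    }
}
```

Lookups are served from the HashSet, and the file is read only once at startup. Taking the figure quoted above at face value, 500MB for 2M ids is about 250 bytes per id, so memory use grows linearly with the number of migrated blobs; this is the point at which an off-heap or MVStore-based variant might be considered.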
