On Wed, Feb 23, 2011 at 10:40 AM, Les Mikesell <[email protected]> wrote:
> On 2/23/2011 9:11 AM, Bryan Fink wrote:
>>
>> On Fri, Feb 18, 2011 at 8:54 PM, Les Mikesell <[email protected]> wrote:
>>>
>>> What happens if there is a read of the object while it is in the process
>>> of being updated if the update is several different operations?
>>
>> Luwak streams work in an "all or nothing" fashion.  That is, no read
>> will see the result of any stream until that stream is flushed.  Luwak
>> blocks are immutable, so old file trees will still reference
>> completely valid old blocks while new ones are being written.  The
>> last action of flushing a stream is to point the file-metadata object
>> (in the luwak_tld bucket) at the head of the new tree.
>>
>> A flush will only occur when a stream closes, unless your program
>> explicitly calls luwak_put_stream:flush/1.
>
> Thanks!  A couple more somewhat related questions: is that atomic update
> nature hard to duplicate outside of luwak (say by a client that needs to
> keep several items in sync), and if the luwak blocks are immutable, how do
> you ever clean up the space used by data that has been deleted or modified
> and no longer referenced?

(Ryan Zezeski sent correct answers before I could finish this, but I'm
sending it anyway, hopefully with some extra information.)

Well, these two behaviors are partially related.

It's easy to duplicate this behavior: write the new versions of your
items without removing the old versions, then, when you're finished,
replace the object that says which version of those items is the
latest.  It's akin to the old filesystem trick of writing out a new
file, then using 'rename' to move it in place of the old one.  (In
reference to your followup email: yes, Luwak accomplishes this by
effectively (tree-wise) putting all the keys in one object, which is
updated last.)

But you've hit one of Luwak's major specializations: it was originally
designed for immutable data, and so it does nothing about cleaning up
unreferenced blocks.  At this point, it's a distributed online garbage
collection problem that we haven't written a solution for yet.

If you can pause all updates to Luwak, and be sure that the data is
stable (i.e. no conflicts hidden by unreachable nodes), it's relatively
simple to mark & sweep the luwak_node bucket, based on pointers from
the luwak_tld bucket.  There's even some history (look at the
"ancestors" property of the file object) that might help out.  But (as
both you and Ryan figured out) doing this live involves much more
bookkeeping, to keep track of not only blocks shared between files, but
also blocks that are not linked solely because a stream hasn't finished
flushing yet.

/me takes down a note to review Ryan's GC experiments

-Bryan

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
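
For illustration only, here is a rough Python sketch of the two ideas above: the "write new versions, then swap one pointer object last" update, and an offline mark & sweep driven by those pointers. It is a toy in-memory model, not Luwak's code or the Riak client API. The Store class, its two dicts (loosely standing in for the luwak_node and luwak_tld buckets), and the function names are all made up for this example, and a flat list of block ids stands in for Luwak's tree.

```python
import uuid


class Store:
    """Toy in-memory stand-in for two buckets (hypothetical, not a Riak API)."""

    def __init__(self):
        self.blocks = {}    # immutable blocks by unique id (role of luwak_node)
        self.pointers = {}  # name -> list of block ids (role of luwak_tld)


def put_items(store, name, items):
    """Write new versions first, then publish them with one final write."""
    new_ids = []
    for item in items:
        # Each new version gets a fresh key, so old blocks are never touched
        # and a concurrent reader still sees a complete old version.
        block_id = uuid.uuid4().hex
        store.blocks[block_id] = item
        new_ids.append(block_id)
    # The last action: repoint the single object that names the latest version.
    store.pointers[name] = new_ids
    return new_ids


def get_items(store, name):
    """Read whatever version the pointer object currently references."""
    return [store.blocks[block_id] for block_id in store.pointers.get(name, [])]


def mark_and_sweep(store):
    """Delete unreferenced blocks. Assumes all updates are paused: a block
    written by an unfinished put_items would look unreferenced here."""
    # Mark: everything reachable from a pointer object is live.
    live = set()
    for block_ids in store.pointers.values():
        live.update(block_ids)
    # Sweep: drop every block that was not marked.
    dead = [block_id for block_id in store.blocks if block_id not in live]
    for block_id in dead:
        del store.blocks[block_id]
    return len(dead)
```

Overwriting a name twice and then calling mark_and_sweep(store) should remove only the blocks from the first write. The comment in mark_and_sweep is the reason the live case is hard: without pausing writers, blocks from an in-progress write are indistinguishable from garbage unless you track them separately.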
