Hi,

On Tue, Jan 29, 2013 at 1:21 PM, Thomas Mueller <[email protected]> wrote:

> It's not clear to me how to support scalable concurrent writes. This
> is also a problem with the current MongoMK design, but in your design
> I actually see more problems in this area (concurrent writes to nodes
> in the same segment, for example). But maybe it's just that I don't
> understand this part of your design yet..
Segments are immutable, so a commit would create a new segment instead
of modifying an existing one. The new segment would contain just the
modified parts of the tree and refer to the older segment(s) for the
remaining tree. A quick estimate of the size overhead of a minimal
commit that updates just a single property is in the order of hundreds
of bytes, depending a bit on the content structure.

> The data format in your proposal seems to be binary and not Json. For
> me, using Json would have the advantage that we can use MongoDb
> features (queries, indexes, atomic operations, debugging, ...). With
> your design, only 1% of the MongoDb features could be used (store a
> record, read a record), so that basically we would need to implement
> the remaining features ourselves. On the other hand, it would be
> extremely simple to port to another storage engine. As far as I
> understand, all the data might as well be stored in the data store /
> blob store with very little changes.

Right. In addition to storage independence, the main reason for going
with a custom binary format instead of JSON was to avoid having to
parse an entire segment just to access an individual node or value.
Note that the proposed design actually does rely on lots of MongoDB
features beyond basic CRUD. Things like sharding, distributed access,
atomic updates, etc. are essential for the design to scale up well.

> As far as I understand, a commit where only one single value is
> changed would result in one journal entry and one segment. I was
> thinking, would it be possible to split a segment / journal into
> smaller blocks in such a case, but I'm not sure how complex that
> would be. And the reverse: merge small segments from time to time.

Indeed, see my response to Marcel's post.

BR,

Jukka Zitting
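P.S. To make some of the above a bit more concrete, here are a few
rough sketches. None of the class, method or field names below are part
of the actual proposal; they are made up just to illustrate the ideas.

First, the copy-on-write nature of commits. Only the changed node and
its ancestors up to the root get serialized into the new segment; all
unchanged subtrees are kept as references into older, immutable
segments, which is why a one-property change only costs a few small
records:

    import java.util.*;

    class RecordId {                  // points at a record in some segment
        final UUID segmentId;
        final int offset;
        RecordId(UUID segmentId, int offset) {
            this.segmentId = segmentId;
            this.offset = offset;
        }
    }

    class NodeRecord {                // properties plus child references
        final Map<String, String> properties;
        final Map<String, RecordId> children;
        NodeRecord(Map<String, String> p, Map<String, RecordId> c) {
            properties = p;
            children = c;
        }
    }

    class Segment {                   // immutable once written
        final UUID id = UUID.randomUUID();
        final List<NodeRecord> records = new ArrayList<NodeRecord>();
        RecordId add(NodeRecord r) {
            records.add(r);
            return new RecordId(id, records.size() - 1);
        }
    }

    class SegmentStore {
        final Map<UUID, Segment> segments = new HashMap<UUID, Segment>();

        NodeRecord read(RecordId id) {
            return segments.get(id.segmentId).records.get(id.offset);
        }

        // Set one property: write a new segment containing only the new
        // versions of the nodes on the path; everything else stays in
        // older segments and is merely referenced.
        RecordId setProperty(RecordId root, List<String> path,
                String name, String value) {
            Segment segment = new Segment();
            RecordId head = rewrite(segment, root, path, 0, name, value);
            segments.put(segment.id, segment);
            return head;              // the new head revision
        }

        private RecordId rewrite(Segment segment, RecordId node,
                List<String> path, int depth, String name, String value) {
            NodeRecord old = read(node);
            Map<String, String> props =
                    new HashMap<String, String>(old.properties);
            Map<String, RecordId> children =
                    new HashMap<String, RecordId>(old.children);
            if (depth == path.size()) {
                props.put(name, value);          // the actual change
            } else {
                String child = path.get(depth);
                children.put(child, rewrite(segment,
                        old.children.get(child), path,
                        depth + 1, name, value));
            }
            return segment.add(new NodeRecord(props, children));
        }
    }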
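Second, why a binary format rather than JSON: with fixed headers and
known offsets a reader can jump straight to a single record inside a
segment instead of parsing the whole thing. Again, the record layout
below (a 4-byte length followed by UTF-8 bytes) is just for
illustration, not the actual proposed format:

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;

    class SegmentReader {
        private final ByteBuffer data;   // one immutable segment,
                                         // e.g. memory-mapped
        SegmentReader(ByteBuffer data) {
            this.data = data;
        }

        // Read a single string value given the offset of its record.
        // No need to parse anything before or after the record.
        String readString(int offset) {
            int length = data.getInt(offset);       // length header
            byte[] utf8 = new byte[length];
            ByteBuffer slice = data.duplicate();
            slice.position(offset + 4);
            slice.get(utf8);
            return new String(utf8, Charset.forName("UTF-8"));
        }
    }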
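Third, the atomic updates mentioned above. The critical operation is a
conditional update of the journal head, which MongoDB can execute
atomically, so two concurrent committers can never both succeed against
the same expected head. A sketch with the Java driver (collection and
field names invented for illustration):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;

    class JournalHead {
        private final DBCollection journal;

        JournalHead(DB db) {
            this.journal = db.getCollection("journal");
        }

        // Advance the head revision only if nobody else moved it in
        // the meantime; returns false if the expected head no longer
        // matches and the commit needs to be rebased and retried.
        boolean casHead(String expectedHead, String newHead) {
            DBObject query = new BasicDBObject("_id", "root")
                    .append("head", expectedHead);
            DBObject update = new BasicDBObject("$set",
                    new BasicDBObject("head", newHead));
            return journal.findAndModify(query, update) != null;
        }
    }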
