>> So even adding a 2 MB chunk on a sharded system over a remote connection
>> would block reads for that complete duration. So at minimum we should be
>> avoiding that.
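To make the concern concrete, the write pattern being discussed looks roughly
like the following. This is only a minimal sketch with the MongoDB Java driver;
the collection name, id scheme and chunk size are illustrative and not Oak's
actual schema. Each insert takes the per-database write lock, so a slow 2 MB
chunk write also holds up readers of that database for its duration:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    import org.bson.types.Binary;

    public class ChunkedBlobWrite {

        private static final int CHUNK_SIZE = 2 * 1024 * 1024; // ~2 MB chunks, as in the discussion

        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoDatabase db = client.getDatabase("oak");
                MongoCollection<Document> blobs = db.getCollection("blobs");

                byte[] content = new byte[10 * 1024 * 1024]; // stand-in for the binary being added

                // split the binary into chunks and insert them one by one; each insert
                // takes the same per-database write lock that node reads/writes wait on
                for (int offset = 0, seq = 0; offset < content.length; offset += CHUNK_SIZE, seq++) {
                    int len = Math.min(CHUNK_SIZE, content.length - offset);
                    byte[] chunk = new byte[len];
                    System.arraycopy(content, offset, chunk, 0, len);
                    blobs.insertOne(new Document("_id", "blob-1#" + seq)
                            .append("data", new Binary(chunk)));
                }
            }
        }
    }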
I guess if there are read replicas in the shard replica set then it will
mitigate the effect to some extent.

On Wed, Oct 30, 2013 at 3:04 PM, Chetan Mehrotra <chetan.mehro...@gmail.com> wrote:
> > sounds reasonable. what is the impact of such a design when it comes
> > to map-reduce features? I was thinking that we could use it e.g. for
> > garbage collection, but I don't know if this is still an option when data
> > is spread across multiple databases.
>
> Would investigate that aspect further.
>
> > connecting to a second server would add quite some complexity to
>
> Yup. The option was just provided for completeness' sake, and something
> like this would probably never be required.
>
> > that was one of my initial thoughts as well, but I was wondering what
> > the impact of such a deployment is on data store garbage collection.
>
> Probably we can make a shadow node for the binary in the blob
> collection and keep the binary content within the DataStore itself.
> Stuff like garbage collection would be performed on the shadow node,
> and the logic would use results from that to perform the actual deletions.
>
> Chetan Mehrotra
>
> On Wed, Oct 30, 2013 at 1:13 PM, Marcel Reutegger <mreut...@adobe.com> wrote:
> > Hi,
> >
> >> Currently we are storing blobs by breaking them into small chunks and
> >> then storing those chunks in MongoDB as part of the blobs collection. This
> >> approach would cause issues as Mongo maintains a global exclusive
> >> write lock at the per-database level [1]. So even writing multiple
> >> small chunks of, say, 2 MB each would lead to write lock contention.
> >
> > so far we observed high lock contention primarily when there are a lot of
> > updates. inserts were not that big of a problem, because you can batch
> > them. it would probably be good to have a test to see how big the
> > impact is when blobs come into play.
> >
> >> Mongo also provides GridFS [2]. However, it uses a similar strategy to
> >> the one we are currently using, and the support is built into the
> >> driver. For the server they are just collection entries.
> >>
> >> So to minimize contention for write locks in use cases where big
> >> assets are being stored in Oak, we can opt for the following strategies:
> >>
> >> 1. Store the blobs collection in a different database. As Mongo write
> >> locks [1] are taken at the database level, storing the blobs in a
> >> different database would allow the read/write of node data (the
> >> majority use case) to continue.
> >
> > sounds reasonable. what is the impact of such a design when it comes
> > to map-reduce features? I was thinking that we could use it e.g. for
> > garbage collection, but I don't know if this is still an option when data
> > is spread across multiple databases.
> >
> >> 2. For more asset/binary-heavy use cases, use a separate database server
> >> itself to serve the binaries.
> >
> > connecting to a second server would add quite some complexity to
> > the system. wouldn't it be easier to just leverage standard mongodb
> > sharding to distribute the load?
> >
> >> 3. Bring back the JR2 DataStore implementation and just save metadata
> >> related to binaries in Mongo. We already have S3 based implementations
> >> there and they would continue to work with Oak also.
> >
> > that was one of my initial thoughts as well, but I was wondering what
> > the impact of such a deployment is on data store garbage collection.
> >
> > regards
> > marcel
> >
> >> Chetan Mehrotra
> >> [1] http://docs.mongodb.org/manual/faq/concurrency/#how-granular-are-locks-in-mongodb
> >> [2] http://docs.mongodb.org/manual/core/gridfs/
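To make option 1 concrete, the client-side change is essentially just a second
database handle on the same MongoDB deployment, so blob writes take a different
per-database write lock than the node data. A minimal sketch follows; the
database, collection and field names are illustrative only, not what Oak
actually uses:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;
    import org.bson.types.Binary;

    public class SeparateBlobDatabase {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                // node documents stay in the main database
                MongoCollection<Document> nodes =
                        client.getDatabase("oak").getCollection("nodes");
                // blob chunks go to their own database and therefore their own write lock
                MongoCollection<Document> blobs =
                        client.getDatabase("oak-blobs").getCollection("blobs");

                // a blob chunk insert no longer contends with node updates for the same lock
                blobs.insertOne(new Document("_id", "blob-1#0")
                        .append("data", new Binary(new byte[2 * 1024 * 1024])));
                nodes.insertOne(new Document("_id", "1:/content")
                        .append("lastBlob", "blob-1"));
            }
        }
    }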
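For comparison, the GridFS route mentioned above looks roughly like this from
the client side. The chunking is done by the driver and the server only sees
ordinary inserts into the bucket's files and chunks collections, so the locking
behaviour is essentially the same as with our own chunking. Again just a
sketch, with illustrative names:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.gridfs.GridFSBucket;
    import com.mongodb.client.gridfs.GridFSBuckets;

    import java.io.ByteArrayInputStream;

    public class GridFsWrite {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                // a GridFS "bucket" is just a pair of collections: blobs.files and blobs.chunks
                GridFSBucket bucket = GridFSBuckets.create(client.getDatabase("oak"), "blobs");

                byte[] content = new byte[10 * 1024 * 1024]; // stand-in for the binary
                // the driver splits the stream into fixed-size chunk documents for us
                bucket.uploadFromStream("blob-1", new ByteArrayInputStream(content));
            }
        }
    }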
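And on the read replica point at the top of this mail: the driver has to be
told explicitly that reading from the secondaries of a shard's replica set is
acceptable, for example along these lines. This is a sketch only and assumes
slightly stale reads are tolerable for the documents being read:

    import com.mongodb.ReadPreference;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    public class SecondaryReads {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                // route reads to a secondary when one is available, so they are not
                // stuck behind a long write on the primary
                MongoDatabase db = client.getDatabase("oak")
                        .withReadPreference(ReadPreference.secondaryPreferred());
                MongoCollection<Document> blobs = db.getCollection("blobs");

                Document chunk = blobs.find(new Document("_id", "blob-1#0")).first();
                System.out.println(chunk);
            }
        }
    }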