2013/11/26 MORITA Kazutaka <[email protected]>

> At Mon, 25 Nov 2013 17:02:06 +0800,
> Liu Yuan wrote:
> >
> > On Mon, Nov 25, 2013 at 05:43:19PM +0900, MORITA Kazutaka wrote:
> > > At Mon, 25 Nov 2013 15:03:46 +0800,
> > > Robin Dong wrote:
> > > >
> > > > The present implementation of http/swift is not perfect: it can't
> > > > create too many containers or objects. So we want to store all
> > > > objects in one hyper volume vdi and use a new structure, 'obj-inode',
> > > > to identify each object's offset and length in this vdi, just like
> > > > a local file system. To achieve this, we need distributed locks to
> > > > ensure that only one thread can create (or delete) an 'obj-inode'
> > > > in this vdi at a time.
> > > >
> > > > This patch set is an attempt to implement the distributed lock.
> > > >
> > > > If we add code in sheep/cluster/zookeeper.c and use the cluster
> > > > framework to implement this distributed lock, then we have to add
> > > > implementations for corosync, local and shepherd. That's too
> > > > complicated. So what we need is to add lock.c in sheep/http/ and
> > > > only use it in the http interface.
> > >
> > > If possible, I don't like to see zookeeper-specific code outside of
> > > sheep/cluster/zookeeper.c.  Can we use a SD_OP_TYPE_CLUSTER operation
> > > for your purpose?  It works like a cluster-wide distributed lock.
> > >
> > > For example, vdi creation works as follows.
> > >
> > >  1. When sheep receives a SD_OP_NEW_VDI operation, sheep calls
> > >     cdrv->block() to block all the other cluster operations.
> > >
> > >  2. Sheep calls cluster_new_vdi() in sd_block_handler().  It is
> > >     ensured that no other sheep calls sd_block_handler() at the same
> > >     time.  This is necessary here because sheepdog doesn't allow
> > >     concurrent vdi creation requests.
> > >
> > >  3. All the sheep in the cluster call post_cluster_new_vdi() in
> > >     sd_notify_handler().  It is usually used for notification or
> > >     cleanups.
> > >
> >
> > I don't think this approach is efficient, though it is simpler because
> > we can make use of the existing mechanism, since:
> >
> > - it can't scale, meaning there is only one lock in the cluster, and
> >   object creations from different containers will all compete for
> >   this lock.
> >
> > - it can be affected by operations not even related to http
> >   operations. For example, 'vdi create' will block the cluster, which
> >   means that until it unblocks the cluster, we can't create/delete
> >   objects or containers at all.
> >
> > I think a lock per operation is really needed. E.g., every container
> > has a lock to achieve concurrency when creating objects, and it won't
> > interfere with other containers.
>
> Getting a distributed lock is an expensive operation and it can cause
> a severe performance problem if we do it for each object creation.
> Can we find another way?  Sheepdog is not designed to allow concurrent
> write access.
>

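For reference, the per-container locking discussed above can be sketched
as follows. This is only a single-process illustration under stated
assumptions: threading.Lock stands in for the zookeeper lock, each
container is assumed to have its own allocation space, and the names
ContainerLocks, HyperVolume and create_obj_inode are hypothetical, not
existing sheepdog code.

```python
import threading
from collections import defaultdict


class ContainerLocks:
    """One lock per container name (local stand-in for a distributed lock)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = {}                # container name -> Lock

    def lock_for(self, container):
        with self._guard:
            return self._locks.setdefault(container, threading.Lock())


class HyperVolume:
    """Hypothetical model: objects stored back to back, indexed by obj-inode."""

    def __init__(self):
        self.locks = ContainerLocks()
        self.next_offset = defaultdict(int)  # container -> next free offset
        self.inodes = {}                     # (container, name) -> (offset, length)

    def create_obj_inode(self, container, name, length):
        # Only the allocation of (offset, length) is serialized.
        with self.locks.lock_for(container):
            offset = self.next_offset[container]
            self.next_offset[container] = offset + length
            self.inodes[(container, name)] = (offset, length)
        # The data upload itself would run here, outside the lock.
        return offset, length
```

Creations in different containers take different locks, so they do not
wait on each other; creations in the same container are serialized only
while the obj-inode is being allocated.
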
Taking a distributed lock will hurt performance if the objects are very
small, but for big objects (1GB, 10GB, 100GB) we only need to hold the
lock at the "create object inode" moment; after that, the object upload
itself does not need the lock.

I have tested this zookeeper lock: it can lock/unlock 200 times per
second (about 5 ms per lock/unlock cycle), which I think is not too slow
even for small objects.

> For example, how about determining one gateway based on the hash value
> of the requested container name, and forwarding write requests to the
> appropriate gateway so that all the objects in the same container are
> accessed from only one gateway?
>
> Thanks,
>
> Kazutaka
>

--
Best Regards
Robin Dong
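
P.S. A minimal sketch of the hash-based gateway idea quoted above, in
case it helps the discussion. It assumes every gateway sees the same
ordered list of gateways; the function names and the gateway names in
the example are hypothetical, and SHA-1 is just one possible hash.

```python
import hashlib


def gateway_for(container, gateways):
    """Pick the single gateway that owns all writes for this container."""
    digest = hashlib.sha1(container.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(gateways)
    return gateways[index]


def route_write(local, container, gateways):
    """Decide whether this gateway handles the write or forwards it."""
    owner = gateway_for(container, gateways)
    if owner == local:
        return "handle locally"
    return "forward to " + owner


if __name__ == "__main__":
    gws = ["gw-a", "gw-b", "gw-c"]
    print(route_write("gw-a", "photos", gws))
```

With this routing, writes to one container are serialized on one gateway
and need no distributed lock. Note that a plain modulo remaps most
containers when a gateway joins or leaves, so a real version would have
to handle membership changes.
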
--
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
