On Mon, Mar 16, 2015 at 03:39:17PM +0900, Hitoshi Mitake wrote:
> At Mon, 16 Mar 2015 14:31:44 +0800,
> Liu Yuan wrote:
> > 
> > On Mon, Mar 16, 2015 at 02:08:25PM +0800, Liu Yuan wrote:
> > > On Mon, Mar 16, 2015 at 02:36:57PM +0900, Hitoshi Mitake wrote:
> > > > At Mon, 16 Mar 2015 10:21:50 +0800,
> > > > Liu Yuan wrote:
> > > > > 
> > > > > On Thu, Mar 12, 2015 at 08:14:33PM +0900, Hitoshi Mitake wrote:
> > > > > > At Thu, 12 Mar 2015 14:41:56 +0800,
> > > > > > Liu Yuan wrote:
> > > > > > > 
> > > > > > > On Tue, Jan 13, 2015 at 10:37:40AM +0900, Hitoshi Mitake wrote:
> > > > > > > > Current sheepdog never recycles VIDs. But this causes problems,
> > > > > > > > e.g. VID space exhaustion and too many garbage inode objects.
> > > > > > > > 
> > > > > > > > Keeping deleted inode objects is required because living inodes
> > > > > > > > (snapshots or clones) can point to objects of the deleted
> > > > > > > > inodes. So only if every member of a VDI family is deleted is it
> > > > > > > > safe to remove the deleted inode objects.
> > > > > > > > 
> > > > > > > > v2:
> > > > > > > >  - update test scripts
> > > > > > > 
> > > > > > > All the nodes of our test cluster panicked due to the following
> > > > > > > problem:
> > > > > > > 
> > > > > > > Mar 12 00:05:03  DEBUG [main] zk_handle_notify(1216) NOTIFY
> > > > > > > Mar 12 00:05:03  DEBUG [main] sd_notify_handler(960) op 
> > > > > > > NOTIFY_VDI_ADD, size: 96, from: IPv4 ip:192.168.39.177 port:7000
> > > > > > > Mar 12 00:05:03  DEBUG [main] do_add_vdi_state(362) 7c2b2b, 3, 0, 
> > > > > > > 22, 0
> > > > > > > Mar 12 00:05:03  DEBUG [main] do_add_vdi_state(362) 7c2b2c, 3, 0, 
> > > > > > > 22, 7c2b2b
> > > > > > > Mar 12 00:05:03  EMERG [main] update_vdi_family(127) PANIC: 
> > > > > > > parent VID: 7c2b2b not found
> > > > > > > Mar 12 00:05:03  EMERG [main] crash_handler(286) sheep exits 
> > > > > > > unexpectedly (Aborted), si pid 4786, uid 0, errno 0, code -6
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(833) sheep.c:288: 
> > > > > > > crash_handler
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(847) 
> > > > > > > /lib64/libpthread.so.0() [0x338200f4ff]
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(847) 
> > > > > > > /lib64/libc.so.6(gsignal+0x34) [0x3381c328a4]
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(847) 
> > > > > > > /lib64/libc.so.6(abort+0x174) [0x3381c34084]
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(833) vdi.c:127: 
> > > > > > > update_vdi_family
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(833) vdi.c:398: 
> > > > > > > add_vdi_state
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(833) ops.c:711: 
> > > > > > > cluster_notify_vdi_add
> > > > > > > Mar 12 00:05:03  EMERG [main] sd_backtrace(833) group.c:975: 
> > > > > > > sd_notify_handler
> > > > > > > 
> > > > > > > So I tracked this back to this patch set. The problem this patch
> > > > > > > set tried to solve is very clear and has been with sheepdog since
> > > > > > > its birth. It actually reveals a deficiency of our VDI allocation
> > > > > > > algorithm: unfortunately it is not fixable in place, and we need
> > > > > > > to rethink a completely new algorithm to replace it.
> > > > > > > 
> > > > > > > One simple rule: we can't recycle any VID once it has been
> > > > > > > created, because of the current hash collision handling. Our
> > > > > > > current implementation forbids recycling.
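> > > > > > > 
> > > > > > > To illustrate roughly (this is only a sketch, not the real
> > > > > > > sheep code; hash(), test_bit() and read_inode_name() are
> > > > > > > placeholders):
> > > > > > > 
> > > > > > >   #include <stdbool.h>
> > > > > > >   #include <stdint.h>
> > > > > > >   #include <string.h>
> > > > > > > 
> > > > > > >   #define SD_NR_VDIS (1U << 24)  /* 24-bit VID space */
> > > > > > > 
> > > > > > >   extern bool test_bit(uint32_t vid);  /* VID allocated? */
> > > > > > >   extern const char *read_inode_name(uint32_t vid);
> > > > > > >   extern uint32_t hash(const char *name);
> > > > > > > 
> > > > > > >   /* Lookup probes linearly from hash(name) and stops at the
> > > > > > >    * first unallocated VID.  Clearing (recycling) a bit in
> > > > > > >    * the middle of a probe chain makes every name that
> > > > > > >    * collided past that point unreachable. */
> > > > > > >   static uint32_t vdi_lookup(const char *name)
> > > > > > >   {
> > > > > > >           uint32_t vid;
> > > > > > > 
> > > > > > >           for (vid = hash(name) % SD_NR_VDIS; test_bit(vid);
> > > > > > >                vid = (vid + 1) % SD_NR_VDIS)
> > > > > > >                   if (strcmp(read_inode_name(vid), name) == 0)
> > > > > > >                           return vid;  /* found */
> > > > > > > 
> > > > > > >           return 0;  /* unset bit ends the probe: not found */
> > > > > > >   }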
> > > > > > > 
> > > > > > > Instead of fixing the above panic bug, I'd suggest we revert this
> > > > > > > patch set. For the problem this patch set mentions, I think we
> > > > > > > need a new algorithm and implementation. But before that, we
> > > > > > > should stay with the old one; it is stable and reliable and
> > > > > > > should work for small clusters.
> > > > > > > 
> > > > > > > What do you think, Hitoshi and Kazutaka?
> > > > > > 
> > > > > > How about providing a switch to turn VID recycling on/off? e.g.
> > > > > > dog cluster format --enable-vid-recycle. The code can easily be
> > > > > > pushed into conditional branches. I can post a patch if this
> > > > > > approach works for you.
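> > > > > > 
> > > > > > Something like the following; the flag and helper names below
> > > > > > are just made-up placeholders:
> > > > > > 
> > > > > >   /* Gate the recycling code paths on a format-time flag so
> > > > > >    * the default behaviour stays exactly as today. */
> > > > > >   static uint32_t alloc_vid(const char *name)
> > > > > >   {
> > > > > >           if (sys->cinfo.flags & SD_CLUSTER_FLAG_RECYCLE_VID)
> > > > > >                   return alloc_vid_recycle(name);
> > > > > >           return alloc_vid_no_recycle(name);  /* current path */
> > > > > >   }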
> > > > > > 
> > > > > 
> > > > > This temporary workaround looks okay but is not good enough to me.
> > > > > What concerns me is that VID recycling will probably never be
> > > > > properly implemented if we stick to the current VDI allocation
> > > > > algorithm. Once a new VDI allocation algorithm is introduced someday
> > > > > in the future, it would not have this kind of problem at all. If
> > > > > that is the case, the code we leave here would also be useless.
> > > > > 
> > > > > I think we should focus on a new VDI allocation algorithm, e.g.
> > > > > storing {name, vid} directly in a kv engine, either implemented by
> > > > > sheep itself or with the help of other software like zookeeper.
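> > > > > 
> > > > > For example, a hypothetical sketch with the zookeeper C client
> > > > > (the path layout and names here are made up, just to show the
> > > > > idea):
> > > > > 
> > > > >   #include <zookeeper/zookeeper.h>
> > > > >   #include <stdint.h>
> > > > >   #include <stdio.h>
> > > > >   #include <string.h>
> > > > > 
> > > > >   /* Atomically claim a name -> vid mapping: zoo_create() fails
> > > > >    * with ZNODEEXISTS if another sheep already registered the
> > > > >    * name, so no hash probing is needed and VIDs could be
> > > > >    * recycled freely once the mapping is deleted. */
> > > > >   static int register_vdi(zhandle_t *zh, const char *name,
> > > > >                           uint32_t vid)
> > > > >   {
> > > > >           char path[256], value[16];
> > > > > 
> > > > >           snprintf(path, sizeof(path), "/sheepdog/vdi/%s", name);
> > > > >           snprintf(value, sizeof(value), "%x", (unsigned)vid);
> > > > >           return zoo_create(zh, path, value, strlen(value),
> > > > >                             &ZOO_OPEN_ACL_UNSAFE, 0, NULL, 0);
> > > > >   }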
> > > > > 
> > > > > I'm inclined to revert the above patch set, because:
> > > > > 1. it tries to fix a problem that is inherently not fixable
> > > > > 2. the code is problematic and can cause a catastrophic disaster
> > > > >    (all nodes die)
> > > > > 3. we might not need it in the future because it is specific to the
> > > > >    current VDI allocation algorithm.
> > > > 
> > > > We can simply employ a whole-range lookup of the bitmap as the VID
> > > > allocation algorithm for the recycling policy. Of course it would
> > > > hurt snapshot and clone creation, but it can work correctly (and we
> > > > have optimizations, e.g. parallelizing, caching, etc.). In addition,
> > > > performance degradation can potentially happen even with the existing
> > > > VID allocation algorithm (e.g. hash collisions, although of course
> > > > they happen rarely).
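> > > > 
> > > > Roughly, with the same placeholder helpers and constants as in the
> > > > earlier sketch:
> > > > 
> > > >   /* Correct even with recycled VIDs, since it no longer relies on
> > > >    * unbroken probe chains; the cost is that every miss is a full
> > > >    * O(SD_NR_VDIS) scan, hence the vdi_lookup() slowdown. */
> > > >   static uint32_t vdi_lookup_whole_range(const char *name)
> > > >   {
> > > >           uint32_t vid;
> > > > 
> > > >           for (vid = 1; vid < SD_NR_VDIS; vid++)
> > > >                   if (test_bit(vid) &&
> > > >                       strcmp(read_inode_name(vid), name) == 0)
> > > >                           return vid;
> > > > 
> > > >           return 0;  /* not found */
> > > >   }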
> > > 
> > > Do we really need VDI recycling if it brings in very complex code? The
> > > current algorithm can *reuse* deleted VDI IDs and inodes. So the real
> > > problem is actually space efficiency: you are trying to reclaim the
> > > space occupied by deleted VDIs.
> > > 
> > > If it were very easy to reclaim deleted inodes, I'd say great, let's
> > > go ahead. But it apparently is not: we have this patch set, and the
> > > lookup algorithm is heavily degraded.
> > > 
> > > I'm afraid looking up the whole range is too costly, considering how
> > > little deleted-inode space we reclaim. I think most users can bear the
> > > very small space overhead in exchange for better performance. So this
> > > patch set trades code complexity and performance for space efficiency.
> > > Note that we can reclaim inodes only when we delete the whole snapshot
> > > chain and its parent, which is actually a rare case.
> > 
> > Deleting the VDI & snapshot data objects is really good enough for me.
> > Your patch set is one of the efforts to perfect the current algorithm.
> > But the cost is too high, because the hottest path, vdi_lookup(), is
> > heavily degraded for general cases, even if we can later fix all the
> > bugs related to this patch set.
> > 
> > Please consider it.
> 
> Of course the increased cost of vdi_lookup() is a problem. So I'm
> posting a patch that provides an option for enabling/disabling VID
> recycling. By default, recycling will be disabled with the patch, so
> users can choose between two policies with different pros/cons.
> 
> VID recycling is an actual requirement from the development team of
> NTT DATA. I need to provide it at least as an option.
 
The requirement is for reclaiming the deleted inodes? The VID exhaustion
problem mentioned in your patch set is actually not a problem, no? We can
reuse deleted inodes and VIDs. If so, is reclaiming the deleted inodes,
which occupy very little space and are reusable, really so important?

Thanks,
Yuan
-- 
sheepdog mailing list
[email protected]
https://lists.wpkg.org/mailman/listinfo/sheepdog
