Hi Xu Fang,

Good catch!

This issue arises when the number of nodes is less than the number of VDIs in the cache on that host, because the push work for one VDI cache requires at least 2 threads. Right? I'm not familiar enough with the sheepdog code, but I agree that WQ_UNLIMITED is not rational. (2 * number of object caches) is a good choice, but first you need to get that number. I looked through the code quickly and haven't found anything in global_cache that records the number of VDIs in the cache, so this may need more work, and we should do it. We could avoid side effects on the other queues via a new option, WQ_CACHE, besides WQ_ORDERED, WQ_DYNAMIC and WQ_UNLIMITED. I hope other guys can suggest a more direct way. :)

For now, a quick and simple way to work around this issue is to deploy sheepdog on more nodes.
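To be concrete, the missing bookkeeping might look something like the following. This is purely a sketch: nr_object_caches and the three helpers are hypothetical names that do not exist in the current tree; only the uatomic_* helpers are ones sheepdog already uses.

/*
 * Hypothetical counter of live object caches, maintained by
 * global_cache, so that a WQ_CACHE queue has a number to size by.
 */
#include <urcu/uatomic.h>

static int nr_object_caches;

/* to be called wherever an object cache is created */
static inline void object_cache_created(void)
{
	uatomic_inc(&nr_object_caches);
}

/* to be called wherever an object cache is freed */
static inline void object_cache_freed(void)
{
	uatomic_dec(&nr_object_caches);
}

/* what a WQ_CACHE case in wq_get_roof() could read */
static inline int get_nr_object_caches(void)
{
	return uatomic_read(&nr_object_caches);
}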
Thanks & Regards,
Ivan

Message: 2
Date: Fri, 4 Jul 2014 12:21:21 +0800
From: Xu Fang <xufa...@gmail.com>
To: sheepdog@lists.wpkg.org, namei.u...@gmail.com
Subject: [sheepdog] BUG: dirty object cache stop pushing
Message-ID: <ca+wfgebsp-jk5ulrxwp6eaj1r+n_mhzyfj-iexpo7nm8svf...@mail.gmail.com>

Suppose 5 sheepdog nodes are running with object cache enabled, and more than 10 VMs are running on each node. I mount a tmpfs at the /cache directory and start sheep with:

sheep -l level=debug -n /home/admin/sheepdogmetadata,/disk1/sheepdogstoredata,/disk2/sheepdogstoredata,/disk3/sheepdogstoredata,/disk4/sheepdogstoredata,/disk5/sheepdogstoredata,/disk7/sheepdogstoredata,/disk8/sheepdogstoredata,/disk9/sheepdogstoredata -w size=20G dir=/cache -b 0.0.0.0 -y **.**.**.** -c zookeeper:**.**.**.**:2181

There is a possibility that all object push threads are running do_background_push work, and no thread is running do_push_object work. In my test environment this occurred:

[1] 13:09:30 [SUCCESS] vmsecdomainhost1
Name                   Tag     Total   Dirty   Clean
win7_type4_node8.img           4.7 GB  4.7 GB  4.0 MB
standard.img           images  0.0 MB  0.0 MB  0.0 MB
win7_type4_node1.img           4.8 GB  4.8 GB  28 MB
win7_type4_node10.img          5.0 GB  4.9 GB  32 MB
win7_type4_node2.img           4.7 GB  4.6 GB  68 MB
win7_type4_node3.img           4.7 GB  4.7 GB  4.0 MB
win7_type4_node6.img           4.8 GB  4.7 GB  40 MB
win7_type4_node4.img           4.8 GB  4.7 GB  20 MB
win7_type4_node7.img           4.8 GB  4.8 GB  24 MB
win7_type4_node9.img           4.7 GB  4.7 GB  32 MB
win7_type4_node5.img           4.2 GB  4.2 GB  8.0 MB
Cache size 20 GB, used 47 GB, non-directio

I found that 7 object push threads are working on the "oc_push" work queue, and their call stacks are:

Thread 37 (Thread 0x7f3c2a1fc700 (LWP 116747)):
#0  0x0000003916eda37d in read () from /lib64/libc.so.6
#1  0x0000003916ee7a1e in eventfd_read () from /lib64/libc.so.6
#2  0x000000000042a89d in eventfd_xread ()
#3  0x0000000000419acb in object_cache_push ()
*#4 0x0000000000419b83 in do_background_push ()*
#5  0x000000000042e56a in worker_routine ()
#6  0x0000003917207851 in start_thread () from /lib64/libpthread.so.0
#7  0x0000003916ee767d in clone () from /lib64/libc.so.6

Threads 36, 35, 34, 33, 32 and 31 (LWPs 116775, 116889, 116891, 117040, 117041 and 117044) show exactly the same stack: every one of the 7 threads is blocked in eventfd_xread() under do_background_push().

No thread is pushing objects, so no object_cache_push work ever finishes. In gdb we can see the state of each object cache inside object_cache_push:

vid = 9627038,  push_count = 26, dirty_count = 150,  total_count = 154
vid = 3508964,  push_count = 22, dirty_count = 1456, total_count = 1464
vid = 360229,   push_count = 18, dirty_count = 1437, total_count = 1444
vid = 9678955,  push_count = 34, dirty_count = 1462, total_count = 1470
vid = 9008538,  push_count = 17, dirty_count = 1490, total_count = 1493
vid = 2383510,  push_count = 28, dirty_count = 1494, total_count = 1498
vid = 16192623, push_count = 19, dirty_count = 1447, total_count = 1451

push_count is far less than dirty_count, and no thread is doing do_push_object work, so the completion notification in

static void do_push_object(struct work *work)
{
	...
	if (uatomic_sub_return(&oc->push_count, 1) == 0)
		eventfd_xwrite(oc->push_efd, 1);
}

will never be kicked. And in

static bool wq_need_grow(struct wq_info *wi)
{
	if (wi->nr_threads < uatomic_read(&wi->nr_queued_work) &&
	    wi->nr_threads * 2 <= wq_get_roof(wi)) {
		wi->tm_end_of_protection = get_msec_time() +
			WQ_PROTECTION_PERIOD;
		return true;
	}

	return false;
}

nr_threads is 7 and wq_get_roof(wi) returns 10 (2 * five nodes), so the second condition fails (7 * 2 = 14 > 10): no more threads will be created, and all existing threads are waiting for do_push_object works that can never run.
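To make the failure mode easy to reproduce outside sheepdog, here is a minimal standalone model of the hang. It is a sketch only: it borrows the function names but none of the real sheepdog code, and uses semaphores where sheepdog uses eventfds. Each "background push" queues a per-object work on the same bounded pool and blocks until it completes; with NR_CACHES >= POOL_SIZE every worker ends up blocked in the wait, nothing runs the per-object works, and the program hangs (build with gcc -pthread).

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define POOL_SIZE 2  /* the roof: wq_get_roof() == 2 * nr_nodes in sheepdog */
#define NR_CACHES 2  /* number of vdi caches; >= POOL_SIZE reproduces the hang */

struct work {
	void (*fn)(struct work *);
	struct work *next;
	sem_t *push_efd;          /* completion signal, like oc->push_efd */
};

static struct work *queue_head;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t queued;              /* counts pending works */
static sem_t done;                /* counts finished background pushes */

static void queue_work(struct work *w)
{
	pthread_mutex_lock(&queue_lock);
	w->next = queue_head;
	queue_head = w;
	pthread_mutex_unlock(&queue_lock);
	sem_post(&queued);
}

/* stands in for do_push_object(): "write" one object, then signal */
static void do_push_object(struct work *w)
{
	sem_post(w->push_efd);    /* eventfd_xwrite(oc->push_efd, 1) */
}

/* stands in for do_background_push()/object_cache_push(): queue a
 * per-object work on the SAME pool, then block until it completes */
static void do_background_push(struct work *w)
{
	sem_t push_efd;
	struct work sub = { do_push_object, NULL, &push_efd };

	(void)w;
	sem_init(&push_efd, 0, 0);
	queue_work(&sub);
	sem_wait(&push_efd);      /* eventfd_xread(): ties up a pool thread */
	sem_destroy(&push_efd);
	sem_post(&done);
}

static void *worker_routine(void *arg)
{
	(void)arg;
	for (;;) {
		struct work *w;

		sem_wait(&queued);
		pthread_mutex_lock(&queue_lock);
		w = queue_head;
		queue_head = w->next;
		pthread_mutex_unlock(&queue_lock);
		w->fn(w);
	}
	return NULL;
}

int main(void)
{
	struct work bg[NR_CACHES];
	pthread_t t;
	int i;

	sem_init(&queued, 0, 0);
	sem_init(&done, 0, 0);
	for (i = 0; i < POOL_SIZE; i++)
		pthread_create(&t, NULL, worker_routine, NULL);
	for (i = 0; i < NR_CACHES; i++) {
		bg[i] = (struct work){ do_background_push, NULL, NULL };
		queue_work(&bg[i]);
	}
	for (i = 0; i < NR_CACHES; i++)
		sem_wait(&done);  /* with NR_CACHES >= POOL_SIZE: hangs here */
	printf("all background pushes finished\n");
	return 0;
}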
Hope the above information is clear to everyone. Let's discuss the solution now.

The oc_push_wqueue is created with WQ_DYNAMIC:

	sys->oc_push_wqueue = create_work_queue("oc_push", WQ_DYNAMIC);

so the roof of the thread count is:

	case WQ_DYNAMIC:
		/* FIXME: 2 * nr_nodes threads. No rationale yet. */
		nr = nr_nodes * 2;
		break;

There are also other work queues created with WQ_DYNAMIC:

	wq = create_work_queue("vdi check", WQ_DYNAMIC);
	sys->http_wqueue = create_work_queue("http", WQ_DYNAMIC);

Creating oc_push with WQ_UNLIMITED would not be rational either. *I think the number of threads working on oc_push should be (2 * number of object caches), not (2 * nr_nodes), to ensure that there are always enough threads doing do_push_object work.*
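Following Ivan's WQ_CACHE suggestion, the sizing change could then be as small as one more case in wq_get_roof(), plus changing the queue's creation flag. A sketch only: WQ_CACHE and get_nr_object_caches() do not exist yet and would need the bookkeeping Ivan described.

	case WQ_CACHE:
		/* one thread may sit in do_background_push for each
		 * cache, so reserve at least one more per cache to
		 * run its do_push_object works */
		nr = get_nr_object_caches() * 2;
		break;

and:

	sys->oc_push_wqueue = create_work_queue("oc_push", WQ_CACHE);

That way the other WQ_DYNAMIC users (vdi check, http) keep their current behaviour.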
With your advice, I would like to submit patches to solve this problem. Thanks.

--
Xu Fang

--
sheepdog mailing list
sheepdog@lists.wpkg.org
http://lists.wpkg.org/mailman/listinfo/sheepdog