Re: Sv: Re: [Users] Infinite loop in __d_lookup ?
Hi Pavel (and others),

The loop is in __d_lookup, as the trace shows. Any ideas?

/Jakob

[76893.524305] __d_lookup: Abort on 5000 loop iteration in a chain
[76893.525411]
[76893.525412] Call Trace:
[76893.526538] [8020ae20] show_trace+0xae/0x360
[76893.527619] [8020b0e7] dump_stack+0x15/0x17
[76893.528677] [8029b343] __d_lookup+0x13a/0x187
[76893.529779] [8029105d] do_lookup+0x2c/0x193
[76893.530846] [80293122] __link_path_walk+0xb07/0x10ac
[76893.532066] [8029374e] link_path_walk+0x87/0x140
[76893.533230] [80293c76] do_path_lookup+0x2d3/0x2f8
[76893.534404] [802945e2] __user_walk_fd+0x41/0x62
[76893.535559] [80282a09] sys_faccessat+0xf4/0x1b5
[76893.536705] [80282add] sys_access+0x13/0x15
[76893.537873] [80209902] system_call+0x7e/0x83
[76893.538898] DWARF2 unwinder stuck at system_call+0x7e/0x83
[76893.539964] Leftover inexact backtrace:
[76893.540768]
[76893.541202] __d_lookup: Abort on 5000 loop iteration in a chain

On Thu, 2008-05-15 at 20:21 +0400, Pavel Emelyanov wrote:
> Jakob Goldbach wrote:
>> That would be great. Thanks. There are usually a few days between each time it gets stuck.
>
> Ok. Happily, I've managed to work out what I need to check first before it's too late here in Moscow ;)
> I presume that the infinite loop is really somewhere near __d_lookup. Please apply the attached patch (I made it against 2.6.18-028stab053.5, but it should fit all the other 028stab053 releases) and check for warnings in dmesg ;) Let's see whether this is really __d_lookup.
>
>> /jakob
>>
>> - original message -
>> Subject: Re: [Users] Infinite loop in __d_lookup ?
>> From: Pavel Emelyanov [EMAIL PROTECTED]
>> Date: 15-05-2008 12:34
>>
>>> Jakob Goldbach wrote:
>>>> Hi,
>>>> I regularly have processes that get stuck eating all CPU. SysRq-p says they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.
>>>
>>> If you can reproduce this in a reasonable time I can send you a debugging patch to find out what's going on there. Shall we try it?
>>>
>>>> I run vanilla 2.6.18 with 028stab053 and the lustre filesystem. I also run lustre on a non-OpenVZ kernel without problems, hence this mail to this group.
>>>> I believe I've found where the problem is, but I'm not a kernel hacker, so I don't know what to do with this information. I'd appreciate any hints on what to do next to get this solved. Below is what I could find out.
>>>> Thanks, Jakob
>>>> [...]
Re: Sv: Re: [Users] Infinite loop in __d_lookup ?
Does dump_stack kill the process?

Ah - there was a break; after the dump_stack().

___
Users mailing list
Users@openvz.org
https://openvz.org/mailman/listinfo/users
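The debugging patch itself is not included in the thread. Going only by the warning text ("Abort on 5000 loop iteration in a chain") and the mention of a break after dump_stack(), a check of that kind could be sketched in user space roughly as follows. The `struct node` type, the `bounded_lookup` name and the `aborted` flag are hypothetical stand-ins for the hash-chain walk in `__d_lookup`; this is an illustration, not Pavel's actual patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hash-chain node; not the kernel's hlist_node. */
struct node {
    struct node *next;
    unsigned int hash;
};

/* Same limit as in the warning message quoted in the thread. */
#define MAX_CHAIN_STEPS 5000

/*
 * Walk a singly linked chain looking for `hash`.
 * Returns the matching node, or NULL if none was found.
 * Sets *aborted to 1 when the walk gives up after MAX_CHAIN_STEPS,
 * which is what a chain that loops back on itself would trigger.
 */
static struct node *bounded_lookup(struct node *head, unsigned int hash,
                                   int *aborted)
{
    int steps = 0;

    *aborted = 0;
    for (struct node *n = head; n != NULL; n = n->next) {
        if (++steps > MAX_CHAIN_STEPS) {
            /* The kernel patch apparently calls dump_stack() here. */
            *aborted = 1;
            break;
        }
        if (n->hash == hash)
            return n;
    }
    return NULL;
}
```

With a break after the report, the walk simply ends and the caller sees a failed lookup, so the process is not killed; it just stops spinning on that chain.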
Re: [Users] Infinite loop in __d_lookup ?
Jakob Goldbach wrote:
> Hi,
> I regularly have processes that get stuck eating all CPU. SysRq-p says they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.

If you can reproduce this in a reasonable time I can send you a debugging patch to find out what's going on there. Shall we try it?

> I run vanilla 2.6.18 with 028stab053 and the lustre filesystem. I also run lustre on a non-OpenVZ kernel without problems, hence this mail to this group.
> I believe I've found where the problem is, but I'm not a kernel hacker, so I don't know what to do with this information. I'd appreciate any hints on what to do next to get this solved. Below is what I could find out.
> Thanks, Jakob
> [...]
[Users] Infinite loop in __d_lookup ?
Hi,

I regularly have processes that get stuck eating all CPU. SysRq-p says they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.

I run vanilla 2.6.18 with 028stab053 and the lustre filesystem. I also run lustre on a non-OpenVZ kernel without problems, hence this mail to this group.

I believe I've found where the problem is, but I'm not a kernel hacker, so I don't know what to do with this information. I'd appreciate any hints on what to do next to get this solved. Below is what I could find out.

Thanks,
Jakob

gdb finds that the process is in the hlist_for_each_entry_rcu loop:

(gdb) list *__d_lookup+0x10b
0x12f0 is in __d_lookup (fs/dcache.c:1153).
1148            struct dentry *dentry, *found;
1149
1150            rcu_read_lock();
1151
1152            found = NULL;
1153            hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
1154                    struct qstr *qstr;
1155
1156                    if (dentry->d_name.hash != hash)
1157                            continue;

I believe this is the relevant part (0x12f0) of the disassembled object:

    12e0:  4d 8b 24 24       mov    (%r12),%r12
    12e4:  4d 85 e4          test   %r12,%r12
    12e7:  74 2c             je     1315 __d_lookup+0x130
    12e9:  49 8b 04 24       mov    (%r12),%rax
    12ed:  0f 18 08          prefetcht0 (%rax)
    12f0:  49 8d 5c 24 d8    lea    0xffd8(%r12),%rbx
    12f5:  8b 45 cc          mov    0xffcc(%rbp),%eax
    12f8:  39 43 40          cmp    %eax,0x40(%rbx)
    12fb:  75 e3             jne    12e0 __d_lookup+0xfb

Dmesg after sysrq-p:

[186124.494329] SysRq: Show Regs
[186124.495218] --- IPI show regs ---
[186124.496136] CPU 3, VCPU 0:1
[186124.496804] Modules linked in: simfs vznetdev vzethdev vzrst ip_nat vzcpt ip_conntrack nfnetlink vzdquota vzmon vzdev xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos ipt_REJECT iptable_filter ip_tables x_tables 8021q osc mgc lustre lov lquota mdc ksocklnd ptlrpc obdclass lnet lvfs libcfs bonding xfs
[186124.503636] Pid: 22699, comm: find Not tainted 2.6.18.8-openvz-028stab053-bnx2-1.6.7b-arpannounce1 #3 028stab053
[186124.505535] RIP: 0060:[8029b314] [8029b314] __d_lookup+0x10b/0x142
[186124.507265] RSP: 0068:810073d63bc8 EFLAGS: 0282
[186124.508296] RAX: 8101016dc298 RBX: 8101016dc270 RCX: 0013
[186124.509768] RDX: 00025ff5 RSI: 00c38320c56a5ff5 RDI: 810118b056b0
[186124.511480] RBP: 810073d63c08 R08: 8100ac9e8000 R09: 810118b056b0
[186124.512963] R10: R11: R12: 8101016dc298
[186124.514452] R13: 810073d63e38 R14: 810118b056b0 R15: 810073d63c78
[186124.515931] FS: 2ba786cb56d0() GS:81012a693340() knlGS:
[186124.517538] CS: 0060 DS: ES: CR0: 80050033
[186124.518587] CR2: 00539938 CR3: 73f06000 CR4: 06e0
[186124.520022]
[186124.520023] Call Trace:
[186124.521245] [8029105d] do_lookup+0x2c/0x193
[186124.522363] [80293122] __link_path_walk+0xb07/0x10ac
[186124.523642] [8029374e] link_path_walk+0x87/0x140
[186124.524818] [80293c76] do_path_lookup+0x2d3/0x2f8
[186124.526000] [802945e2] __user_walk_fd+0x41/0x62
[186124.527156] [8028cecb] vfs_lstat_fd+0x24/0x5a
[186124.528278] [8028cf23] sys_newlstat+0x22/0x3c
[186124.529383] [80209902] system_call+0x7e/0x83
[186124.530362] DWARF2 unwinder stuck at system_call+0x7e/0x83
[186124.531460] Leftover inexact backtrace:
[186124.532563]
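A possible reading of the register dump, offered as a sketch rather than a diagnosis: the instruction bytes at 0x12f0 (`49 8d 5c 24 d8`) carry an 8-bit displacement of 0xd8, i.e. -0x28, so RBX is the list node address in R12 minus 0x28, which is the step from the `d_hash` node back to its enclosing dentry. The load at 0x12e9 puts the node's next pointer in RAX. The small program fragment below checks that arithmetic against the values printed in the dump. The helper names are hypothetical, and the addresses are used exactly as they appear in the archive, which seems to have dropped leading digits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offset of the list node inside its entry, as implied by the
 * displacement byte 0xd8 (-0x28) of the lea instruction at 0x12f0. */
#define NODE_OFFSET 0x28u

/* Hypothetical helper: address of the enclosing entry, given its node. */
static uint64_t entry_from_node(uint64_t node_addr)
{
    return node_addr - NODE_OFFSET;
}

/* Hypothetical helper: true when a node's next pointer is the node itself. */
static bool points_to_itself(uint64_t node_addr, uint64_t next_addr)
{
    return node_addr == next_addr;
}

/* Register values as printed in the dump (assumed truncated by the archive). */
static const uint64_t REG_R12 = 0x8101016dc298ULL; /* current node      */
static const uint64_t REG_RAX = 0x8101016dc298ULL; /* loaded at 0x12e9  */
static const uint64_t REG_RBX = 0x8101016dc270ULL; /* enclosing entry   */
```

Here `entry_from_node(REG_R12)` equals `REG_RBX`, and RAX equals R12. If RAX still holds the value loaded at 0x12e9, that would be consistent with a node whose next pointer refers to itself, which would keep the loop on the same entry forever. This is an inference from a single snapshot and would need confirming on the live system.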