Re: Sv: Re: [Users] Infinite loop in __d_lookup ?

2008-05-20 Thread Jakob Goldbach
Hi Pavel (and others)

The loop is in __d_lookup, as the trace shows. Any ideas?

/Jakob 


[76893.524305] __d_lookup: Abort on 5000 loop iteration in a chain
[76893.525411] 
[76893.525412] Call Trace:
[76893.526538]  [8020ae20] show_trace+0xae/0x360
[76893.527619]  [8020b0e7] dump_stack+0x15/0x17
[76893.528677]  [8029b343] __d_lookup+0x13a/0x187
[76893.529779]  [8029105d] do_lookup+0x2c/0x193
[76893.530846]  [80293122] __link_path_walk+0xb07/0x10ac
[76893.532066]  [8029374e] link_path_walk+0x87/0x140
[76893.533230]  [80293c76] do_path_lookup+0x2d3/0x2f8
[76893.534404]  [802945e2] __user_walk_fd+0x41/0x62
[76893.535559]  [80282a09] sys_faccessat+0xf4/0x1b5
[76893.536705]  [80282add] sys_access+0x13/0x15
[76893.537873]  [80209902] system_call+0x7e/0x83
[76893.538898] DWARF2 unwinder stuck at system_call+0x7e/0x83
[76893.539964] Leftover inexact backtrace:
[76893.540768] 
[76893.541202] __d_lookup: Abort on 5000 loop iteration in a chain




On Thu, 2008-05-15 at 20:21 +0400, Pavel Emelyanov wrote:
 Jakob Goldbach wrote:
  That would be great. Thanks. There are usually a few days between
  each time it gets stuck.
 
 OK. Happily, I've managed to work out what I need to check first
 before it gets too late here in Moscow ;)
 
 I presume that the infinite loop really is somewhere near
 __d_lookup. Please apply the attached patch (I made it against
 2.6.18-028stab053.5, but it should apply fine to all the other
 028stab053 releases) and check for warnings in dmesg ;)
 
 Let's see whether this is really __d_lookup.
 
  /jakob
  - original message -
  Subject: Re: [Users] Infinite loop in __d_lookup ?
  From:    Pavel Emelyanov [EMAIL PROTECTED]
  Date:    15-05-2008 12:34
  
  Jakob Goldbach wrote:
  Hi,
 
  I regularly have processes that get stuck, eating all CPU. SysRq-p says
  they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.
  
  If you can reproduce this in a reasonable time, I can send you
  a debugging patch to find out what's going on there.
  
  Shall we try it?
  

Re: Sv: Re: [Users] Infinite loop in __d_lookup ?

2008-05-20 Thread Jakob Goldbach

 . Does dump_stack kill the
 process ?
 

Ah - there was a break; after the dump_stack()
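
The debugging patch itself is not included in this thread. Judging only from the "Abort on 5000 loop iteration in a chain" message and the break; mentioned above, it seems to count the steps of the hash-chain walk and leave the loop once a limit is hit. A minimal user-space sketch of that idea follows; the node type, the helper name walk_chain_bounded() and the way the abort is reported are all assumptions for illustration, not the actual patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for a hash-chain node (assumption, not kernel code). */
struct node {
    struct node *next;
    unsigned int hash;
};

#define MAX_CHAIN_ITERATIONS 5000 /* limit taken from the log message */

/*
 * Walk a singly linked chain looking for `hash`.
 * Returns the matching node, or NULL if none was found.
 * Sets *aborted to 1 if the walk gave up after MAX_CHAIN_ITERATIONS
 * steps, and to 0 otherwise.
 */
static struct node *walk_chain_bounded(struct node *head, unsigned int hash,
                                       int *aborted)
{
    unsigned long steps = 0;
    struct node *n;

    *aborted = 0;
    for (n = head; n != NULL; n = n->next) {
        if (++steps > MAX_CHAIN_ITERATIONS) {
            fprintf(stderr, "walk: abort on %d loop iteration in a chain\n",
                    MAX_CHAIN_ITERATIONS);
            *aborted = 1;
            break; /* leave the loop instead of spinning forever */
        }
        if (n->hash == hash)
            return n;
    }
    return NULL;
}
```

With a break in place the lookup simply reports "not found" on a corrupted chain, which would explain why the process survives the dump_stack() call.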


___
Users mailing list
Users@openvz.org
https://openvz.org/mailman/listinfo/users


Re: [Users] Infinite loop in __d_lookup ?

2008-05-15 Thread Pavel Emelyanov
Jakob Goldbach wrote:
 Hi,
 
 I regularly have processes that get stuck, eating all CPU. SysRq-p says
 they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.

If you can reproduce this in a reasonable time, I can send you
a debugging patch to find out what's going on there.

Shall we try it?




[Users] Infinite loop in __d_lookup ?

2008-05-12 Thread Jakob Goldbach
Hi,

I regularly have processes that get stuck, eating all CPU. SysRq-p says
they are stuck in __d_lookup+0x10b, as seen in the dmesg output below.

I run vanilla 2.6.18 with 028stab053 and the Lustre filesystem. I also
run Lustre on a non-OpenVZ kernel without problems, hence this mail to
this group.

I believe I've found where the problem is, but I'm not a kernel hacker
so I don't know what to do about this information. 

I'd appreciate any hints on what to do next to get this solved.

Below is what I could find out. 

Thanks,
Jakob 

gdb finds that the process is in the hlist_for_each_entry_rcu loop:

(gdb) list *__d_lookup+0x10b
0x12f0 is in __d_lookup (fs/dcache.c:1153).
1148            struct dentry *dentry, *found;
1149
1150            rcu_read_lock();
1151
1152            found = NULL;
1153            hlist_for_each_entry_rcu(dentry, node, head, d_hash) {
1154                    struct qstr *qstr;
1155
1156                    if (dentry->d_name.hash != hash)
1157                            continue;
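
A loop of this shape only ends when it reaches a NULL next pointer or finds a match. If the chain has somehow been linked into a cycle and no entry matches, it never terminates. A small user-space sketch of how one might test a chain for that condition is shown here; the node type and the helper name chain_has_cycle() are hypothetical and only model the next pointer, not the kernel's hlist.

```c
#include <stddef.h>

/* Simplified stand-in for struct hlist_node; not the kernel definition. */
struct chain_node {
    struct chain_node *next;
};

/*
 * Floyd's tortoise-and-hare check: returns 1 if following ->next from
 * `head` never reaches NULL (the chain contains a cycle), 0 otherwise.
 */
static int chain_has_cycle(const struct chain_node *head)
{
    const struct chain_node *slow = head;
    const struct chain_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;       /* one step */
        fast = fast->next->next; /* two steps */
        if (slow == fast)
            return 1;            /* pointers met again: cycle */
    }
    return 0;                    /* reached the NULL terminator */
}
```

The check uses constant memory and visits each node a bounded number of times, so it is safe to run even on a chain that is already corrupted.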

I believe this is the relevant part (0x12f0) of the disassembled object:
 
12e0:   4d 8b 24 24             mov    (%r12),%r12
12e4:   4d 85 e4                test   %r12,%r12
12e7:   74 2c                   je     1315 <__d_lookup+0x130>
12e9:   49 8b 04 24             mov    (%r12),%rax
12ed:   0f 18 08                prefetcht0 (%rax)
12f0:   49 8d 5c 24 d8          lea    0xffffffffffffffd8(%r12),%rbx
12f5:   8b 45 cc                mov    0xffffffffffffffcc(%rbp),%eax
12f8:   39 43 40                cmp    %eax,0x40(%rbx)
12fb:   75 e3                   jne    12e0 <__d_lookup+0xfb>
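
The lea at 0x12f0 carries the displacement byte d8, which is -0x28 as a signed 8-bit value, and the cmp reads a 32-bit field at offset 0x40. That appears to match the container_of step hidden inside hlist_for_each_entry_rcu followed by the d_name.hash comparison on source line 1156. The sketch below reproduces that arithmetic with a toy structure padded to those two offsets; the real struct dentry layout of this kernel is not shown in the thread, so the type names, padding fields and helper are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_hlist_node {
    struct toy_hlist_node *next;
    struct toy_hlist_node **pprev;
};

/* Toy layout padded so the offsets match the disassembly (assumption). */
struct toy_dentry {
    unsigned char pad0[0x28];      /* placeholder for fields before d_hash */
    struct toy_hlist_node d_hash;  /* offset 0x28 */
    unsigned char pad1[0x40 - 0x28 - sizeof(struct toy_hlist_node)];
    uint32_t name_hash;            /* offset 0x40, stands in for d_name.hash */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Rough C equivalent of: lea -0x28(%r12),%rbx ; cmp %eax,0x40(%rbx) */
static int toy_hash_matches(struct toy_hlist_node *node, uint32_t hash)
{
    struct toy_dentry *dentry =
        toy_container_of(node, struct toy_dentry, d_hash);

    return dentry->name_hash == hash;
}
```

Read this way, the jne back to 0x12e0 is the continue on line 1157: the hash did not match, so the loop loads the next node and tries again.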


Dmesg after sysrq-p:




[186124.494329] SysRq: Show Regs
[186124.495218] --- IPI show regs ---
[186124.496136] CPU 3, VCPU 0:1
[186124.496804] Modules linked in: simfs vznetdev vzethdev vzrst ip_nat
vzcpt ip_conntrack nfnetlink vzdquota vzmon vzdev xt_length ipt_ttl xt_
tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos
ipt_REJECT iptable_filter ip_tables x_tables 8021q osc mgc lustre lov
lquota mdc
 ksocklnd ptlrpc obdclass lnet lvfs libcfs bonding xfs
[186124.503636] Pid: 22699, comm: find Not tainted
2.6.18.8-openvz-028stab053-bnx2-1.6.7b-arpannounce1 #3 028stab053
[186124.505535] RIP: 0060:[8029b314]  [8029b314]
__d_lookup+0x10b/0x142
[186124.507265] RSP: 0068:810073d63bc8  EFLAGS: 0282
[186124.508296] RAX: 8101016dc298 RBX: 8101016dc270 RCX:
0013
[186124.509768] RDX: 00025ff5 RSI: 00c38320c56a5ff5 RDI:
810118b056b0
[186124.511480] RBP: 810073d63c08 R08: 8100ac9e8000 R09:
810118b056b0
[186124.512963] R10:  R11:  R12:
8101016dc298
[186124.514452] R13: 810073d63e38 R14: 810118b056b0 R15:
810073d63c78
[186124.515931] FS:  2ba786cb56d0() GS:81012a693340()
knlGS:
[186124.517538] CS:  0060 DS:  ES:  CR0: 80050033
[186124.518587] CR2: 00539938 CR3: 73f06000 CR4:
06e0
[186124.520022] 
[186124.520023] Call Trace:
[186124.521245]  [8029105d] do_lookup+0x2c/0x193
[186124.522363]  [80293122] __link_path_walk+0xb07/0x10ac
[186124.523642]  [8029374e] link_path_walk+0x87/0x140
[186124.524818]  [80293c76] do_path_lookup+0x2d3/0x2f8
[186124.526000]  [802945e2] __user_walk_fd+0x41/0x62
[186124.527156]  [8028cecb] vfs_lstat_fd+0x24/0x5a
[186124.528278]  [8028cf23] sys_newlstat+0x22/0x3c
[186124.529383]  [80209902] system_call+0x7e/0x83
[186124.530362] DWARF2 unwinder stuck at system_call+0x7e/0x83
[186124.531460] Leftover inexact backtrace:
[186124.532563] 
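
One way to read the register dump against the disassembly, offered as an interpretation rather than something confirmed in this thread: RIP is at __d_lookup+0x10b (0x12f0), just after mov (%r12),%rax has loaded the current node's next pointer into RAX. In the dump RAX equals R12, and RBX equals R12 minus 0x28. If the values are taken at face value, that would mean the node's next pointer refers to the node itself, which would keep the loop on the same entry forever. The check below only redoes that arithmetic; the values are copied as printed in the dump (the archive appears to have dropped leading digits), and the constant and function names are made up for illustration.

```c
#include <stdint.h>

/* Register values copied as printed in the dump above. */
static const uint64_t REG_RAX = 0x8101016dc298ULL;
static const uint64_t REG_RBX = 0x8101016dc270ULL;
static const uint64_t REG_R12 = 0x8101016dc298ULL;

#define D_HASH_DISPLACEMENT 0x28 /* from the lea at 0x12f0 (byte d8 = -0x28) */

/*
 * RBX was written by the lea of the previous iteration, so it holds
 * (previous node - 0x28). If that equals (current node - 0x28), the
 * previous and current node are the same.
 */
static int rbx_matches_current_node(void)
{
    return REG_RBX == REG_R12 - D_HASH_DISPLACEMENT;
}

/* RAX was loaded from (%r12), i.e. node->next; equal to R12 means next == node. */
static int next_points_to_self(void)
{
    return REG_RAX == REG_R12;
}
```

Both checks hold for the printed values, which is consistent with (but does not prove) a hash-chain entry whose next pointer loops back to itself.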

