Re: [Cluster-devel] Re: [gfs2][RFC] readdir caused ls process into D (uninterruptible) state, under testing with Samba 3.0.25

2007-08-20 Thread Steven Whitehouse
Hi,

On Mon, 2007-08-20 at 17:36 +0800, rae l wrote:
> On 8/17/07, Steven Whitehouse <[EMAIL PROTECTED]> wrote:
> ...
> > > the stack trace of the 'D' state `ls`:
> > >
> > >  ===
> > > ls D F89B83F8  2200 12018  1 (NOTLB)
> > >    f3eeadd4 0082 f6a425c0 f89b83f8 f3eead9c f6a425d4 f6f32d80 f573a93c
> > >    0001 f89b83f3  c40a2030 c3fa9fa0 c40aaa70 c40aab7c 0e89
> > >    b2a4b036 02e4 c40a2030 f3eeae1c  c3f85e98 f8e11e09 f8e11e0e
> > > Call Trace:
> > >  [<f89b83f8>] gdlm_bast+0x0/0x93 [lock_dlm]
> > >  [<f89b83f3>] gdlm_ast+0x0/0x5 [lock_dlm]
> > >  [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
> > >  [<f8e11e0e>] holder_wait+0x5/0x8 [gfs2]
> >  This function doesn't exist in recent kernels, so I
> > guess you are using an older kernel. Which version is it?
> Sorry for the late reply.
> The kernel I'm testing is 2.6.21.7, simply because our cluster test
> suite dates from last month, before cluster-2.01 was released here:
> ftp://sources.redhat.com/pub/cluster/releases/
> 
> So for now we are continuing to test on the 2.6.21.y series for its
> stability. I don't know how stable 2.6.22.y is; I haven't tested it
> yet.
> 
> If the problem I described has been fixed in a kernel later than
> 2.6.22, please let me know.
> 
I suspect that it might have been, but I can't say for certain. We've
fixed a number of things which look very similar to, but not exactly
like, the bug you seem to have hit. Linus' latest kernels contain a fix
for a problem in the DLM which would be worth trying, so if you are in a
position to test something more recent, I would suggest that as a first
course of action.

Let me know if that doesn't solve the problem,

Steve.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Cluster-devel] Re: [gfs2][RFC] readdir caused ls process into D (uninterruptible) state, under testing with Samba 3.0.25

2007-08-20 Thread rae l
On 8/17/07, Steven Whitehouse <[EMAIL PROTECTED]> wrote:
...
> > the stack trace of the 'D' state `ls`:
> >
> >  ===
> > ls D F89B83F8  2200 12018  1 (NOTLB)
> >    f3eeadd4 0082 f6a425c0 f89b83f8 f3eead9c f6a425d4 f6f32d80 f573a93c
> >    0001 f89b83f3  c40a2030 c3fa9fa0 c40aaa70 c40aab7c 0e89
> >    b2a4b036 02e4 c40a2030 f3eeae1c  c3f85e98 f8e11e09 f8e11e0e
> > Call Trace:
> >  [<f89b83f8>] gdlm_bast+0x0/0x93 [lock_dlm]
> >  [<f89b83f3>] gdlm_ast+0x0/0x5 [lock_dlm]
> >  [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
> >  [<f8e11e0e>] holder_wait+0x5/0x8 [gfs2]
>  This function doesn't exist in recent kernels, so I
> guess you are using an older kernel. Which version is it?
Sorry for the late reply.
The kernel I'm testing is 2.6.21.7, simply because our cluster test
suite dates from last month, before cluster-2.01 was released here:
ftp://sources.redhat.com/pub/cluster/releases/

So for now we are continuing to test on the 2.6.21.y series for its
stability. I don't know how stable 2.6.22.y is; I haven't tested it
yet.

If the problem I described has been fixed in a kernel later than
2.6.22, please let me know.

>
> >  [<c0303adf>] __wait_on_bit+0x2c/0x51
> >  [<c0303b73>] out_of_line_wait_on_bit+0x6f/0x77
> >  [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
> >  [<c012dd7d>] wake_bit_function+0x0/0x3c
> >  [<c012dd7d>] wake_bit_function+0x0/0x3c
> >  [<f8e11e4d>] wait_on_holder+0x3c/0x40 [gfs2]
> >  [<f8e12a9a>] glock_wait_internal+0x81/0x1a3 [gfs2]
> >  [<f8e12d64>] gfs2_glock_nq+0x5e/0x79 [gfs2]
> >  [<f8e1fc02>] gfs2_getattr+0x72/0xb5 [gfs2]
> >  [<f8e1fbfb>] gfs2_getattr+0x6b/0xb5 [gfs2]
> >  [<c0166946>] do_path_lookup+0x17a/0x1c3
> >  [<f8e1fb90>] gfs2_getattr+0x0/0xb5 [gfs2]
> >  [<c0161f92>] vfs_getattr+0x3e/0x51
> >  [<c016201e>] vfs_lstat_fd+0x2b/0x3d
> >  [<c0166946>] do_path_lookup+0x17a/0x1c3
> >  [<c0171e40>] mntput_no_expire+0x11/0x6e
> >  [<c016260b>] sys_lstat64+0xf/0x23
> >  [<c01681a0>] sys_symlinkat+0x81/0xb5
> >  [<c01030b8>] sysenter_past_esp+0x5d/0x81
> >  [<c030>] __ipv6_addr_type+0x88/0xb8
> >
> > The system is still running, so the normal 'R' and 'S' state processes
> > are omitted. Judging from this call trace, though, it's not readdir's
> > fault after all, but a problem with gdlm_bast in the lock_dlm module.
> >
> Yes, it does look a bit odd. There was a bug fix for a problem in the
> DLM which only very recently made it into Linus' kernel, with the last
> GFS2 pull a few days ago; at least at first sight, though, this doesn't
> look like that problem.
>
> The other thing you can check is the glock state, which you can find
> in /sys/kernel/debug/gfs2/<fsname>/glocks on each node. The list is
> usually quite large, so it's best to just email a URL where it can be
> found. It will tell you which processes own which locks, and thus what
> is holding the lock that is causing the problem. There is also a
> debugfs file which contains the locks from the DLM's point of view.
I'll try it. Thanks.

>
> Steve.

-- 
Denis Cheng


Re: [Cluster-devel] Re: [gfs2][RFC] readdir caused ls process into D (uninterruptible) state, under testing with Samba 3.0.25

2007-08-17 Thread Steven Whitehouse
Hi,

On Fri, 2007-08-17 at 15:43 +0800, rae l wrote:
[some comments trimmed for brevity]
> > > Then I ran a simple ls command on the gfs2 mount point:
> > > $ ls /mnt/gfs2
> > > and the ls process also went into D state.
> > >
> > > I think it's a problem with the readdir implementation in gfs2, and I
> > > want to fix it. Could someone give me some pointers?
> > >
> > Can you get a stack trace? echo 't' >/proc/sysrq-trigger
> > That should show where Samba is getting stuck,
> >
> > Steve.
> the stack trace of the 'D' state `ls`:
> 
>  ===
> ls D F89B83F8  2200 12018  1 (NOTLB)
>    f3eeadd4 0082 f6a425c0 f89b83f8 f3eead9c f6a425d4 f6f32d80 f573a93c
>    0001 f89b83f3  c40a2030 c3fa9fa0 c40aaa70 c40aab7c 0e89
>    b2a4b036 02e4 c40a2030 f3eeae1c  c3f85e98 f8e11e09 f8e11e0e
> Call Trace:
>  [<f89b83f8>] gdlm_bast+0x0/0x93 [lock_dlm]
>  [<f89b83f3>] gdlm_ast+0x0/0x5 [lock_dlm]
>  [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
>  [<f8e11e0e>] holder_wait+0x5/0x8 [gfs2]
 This function doesn't exist in recent kernels, so I
guess you are using an older kernel. Which version is it?

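One way to answer the version question on a live system is to check whether the loaded gfs2 module still contains a holder_wait symbol. A minimal sketch, assuming /proc/kallsyms is readable and uses the usual `address type name [module]` line layout (without privileges the addresses may read as zeros, but the names are still listed). `module_has_symbol` is a hypothetical helper name.

```python
import platform


def module_has_symbol(kallsyms_text: str, module: str, symbol: str) -> bool:
    """Return True if `symbol` is listed for `module` in kallsyms-style text.

    Module symbols are expected as "<address> <type> <name> [<module>]";
    core-kernel symbols have no trailing [module] field and never match.
    """
    wanted = "[" + module + "]"
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == symbol and fields[3] == wanted:
            return True
    return False


if __name__ == "__main__":
    print("running kernel:", platform.release())
    try:
        with open("/proc/kallsyms") as f:
            text = f.read()
    except OSError as exc:
        print("cannot read /proc/kallsyms:", exc)
    else:
        print("gfs2 holder_wait present:",
              module_has_symbol(text, "gfs2", "holder_wait"))
```

A True result is consistent with the older gfs2 code recognised in the trace.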
>  [<c0303adf>] __wait_on_bit+0x2c/0x51
>  [<c0303b73>] out_of_line_wait_on_bit+0x6f/0x77
>  [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
>  [<c012dd7d>] wake_bit_function+0x0/0x3c
>  [<c012dd7d>] wake_bit_function+0x0/0x3c
>  [<f8e11e4d>] wait_on_holder+0x3c/0x40 [gfs2]
>  [<f8e12a9a>] glock_wait_internal+0x81/0x1a3 [gfs2]
>  [<f8e12d64>] gfs2_glock_nq+0x5e/0x79 [gfs2]
>  [<f8e1fc02>] gfs2_getattr+0x72/0xb5 [gfs2]
>  [<f8e1fbfb>] gfs2_getattr+0x6b/0xb5 [gfs2]
>  [<c0166946>] do_path_lookup+0x17a/0x1c3
>  [<f8e1fb90>] gfs2_getattr+0x0/0xb5 [gfs2]
>  [<c0161f92>] vfs_getattr+0x3e/0x51
>  [<c016201e>] vfs_lstat_fd+0x2b/0x3d
>  [<c0166946>] do_path_lookup+0x17a/0x1c3
>  [<c0171e40>] mntput_no_expire+0x11/0x6e
>  [<c016260b>] sys_lstat64+0xf/0x23
>  [<c01681a0>] sys_symlinkat+0x81/0xb5
>  [<c01030b8>] sysenter_past_esp+0x5d/0x81
>  [<c030>] __ipv6_addr_type+0x88/0xb8
> 
> The system is still running, so the normal 'R' and 'S' state processes
> are omitted. Judging from this call trace, though, it's not readdir's
> fault after all, but a problem with gdlm_bast in the lock_dlm module.
> 
Yes, it does look a bit odd. There was a bug fix for a problem in the
DLM which only very recently made it into Linus' kernel, with the last
GFS2 pull a few days ago; at least at first sight, though, this doesn't
look like that problem.

The other thing you can check is the glock state, which you can find
in /sys/kernel/debug/gfs2/<fsname>/glocks on each node. The list is
usually quite large, so it's best to just email a URL where it can be
found. It will tell you which processes own which locks, and thus what
is holding the lock that is causing the problem. There is also a
debugfs file which contains the locks from the DLM's point of view.

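To collect the glock dump from each node before publishing it at a URL, a small script can copy the debugfs file to a per-node filename. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug (for example with `mount -t debugfs none /sys/kernel/debug`) and that the caller supplies the name of the per-filesystem directory under /sys/kernel/debug/gfs2/. `glocks_path` and `save_glocks` are hypothetical helper names, and the dump is copied verbatim without interpreting its format.

```python
import os
import shutil
import socket


def glocks_path(fsname: str, debugfs_root: str = "/sys/kernel/debug") -> str:
    """Path of the per-filesystem glock dump in debugfs."""
    return os.path.join(debugfs_root, "gfs2", fsname, "glocks")


def save_glocks(fsname: str, out_dir: str,
                debugfs_root: str = "/sys/kernel/debug") -> str:
    """Copy the glock dump into out_dir and return the new file's path.

    The node's hostname goes into the file name so that dumps taken on
    different cluster nodes can be told apart.
    """
    src = glocks_path(fsname, debugfs_root)
    dest = os.path.join(out_dir, f"{socket.gethostname()}-{fsname}-glocks.txt")
    # Stream the contents: debugfs files usually report a size of 0,
    # so nothing here relies on the reported file size.
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print("saved", save_glocks(sys.argv[1], sys.argv[2]))
    else:
        print("usage: save_glocks.py <fsname> <out_dir>")
```

Running it on every node at roughly the same moment gives a set of dumps that can be compared to see which holder is blocking the others.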
Steve.




Re: [Cluster-devel] Re: [gfs2][RFC] readdir caused ls process into D (uninterruptible) state, under testing with Samba 3.0.25

2007-08-17 Thread rae l
On 8/16/07, Steven Whitehouse <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, 2007-08-16 at 16:20 +0800, 程任全 wrote:
> > It seems that gfs2 doesn't work well with Samba.
> >
> > I'm using gfs2 and the new cluster suite (cman with openais):
> >
> > 1. The test environment has 1 iSCSI target and 2 cluster nodes.
> > 2. Both nodes use an iSCSI initiator to connect to the target.
> > 3. They share the same physical iSCSI disk.
> > 4. LVM2 runs on top of that shared iSCSI disk.
> > 5. On one LV (logical volume), I created a gfs2 filesystem.
> > 6. The gfs2 filesystem is mounted at the same path on both nodes.
> > 7. Samba is started on both nodes to share the gfs2 mount point.
> >
> > Now, testing with Windows clients: when two or more clients connect to
> > Samba, everything is still normal, but when heavy writers or readers
> > start, the Samba server daemon goes into D state, i.e. uninterruptible
> > sleep in the kernel.
> > Could that be a problem in gfs2?
> >
> Which version of gfs2 are you using? GFS2 doesn't support leases, which I
> know Samba uses; however, only relatively recent kernels have been able
> to report that fact via the VFS.
>
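Whether a filesystem grants leases, the feature Samba relies on here, can be probed from user space with fcntl() and F_SETLEASE. A minimal sketch, assuming Linux semantics: a read lease requires a read-only descriptor, no open writers, and a caller who owns the file (or has CAP_LEASE). A filesystem that refuses leases typically fails with EINVAL, but since older kernels could not report the missing support, a lease granted on gfs2 there is not proof that leases work across the cluster. `probe_read_lease` is a hypothetical helper name.

```python
import errno
import fcntl
import os
import sys
from typing import Optional, Tuple

# Use the module's constant if present; otherwise fall back to the
# Linux value (F_LINUX_SPECIFIC_BASE + 0 == 1024).
F_SETLEASE = getattr(fcntl, "F_SETLEASE", 1024)


def probe_read_lease(path: str) -> Tuple[bool, Optional[int]]:
    """Try to take, then immediately drop, a read lease on `path`.

    Returns (True, None) if the lease was granted, else (False, errno).
    Raises OSError if the file itself cannot be opened.
    """
    fd = os.open(path, os.O_RDONLY)  # read leases need a read-only fd
    try:
        try:
            fcntl.fcntl(fd, F_SETLEASE, fcntl.F_RDLCK)
        except OSError as exc:
            return False, exc.errno
        fcntl.fcntl(fd, F_SETLEASE, fcntl.F_UNLCK)  # release the lease
        return True, None
    finally:
        os.close(fd)


if __name__ == "__main__":
    for p in sys.argv[1:]:
        ok, err = probe_read_lease(p)
        if ok:
            print(p, "read lease granted")
        else:
            print(p, "read lease refused:", errno.errorcode.get(err, err))
```

Passing one file on the gfs2 mount and one on a local filesystem makes the two results easy to compare.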
> > Then I ran a simple ls command on the gfs2 mount point:
> > $ ls /mnt/gfs2
> > and the ls process also went into D state.
> >
> > I think it's a problem with the readdir implementation in gfs2, and I
> > want to fix it. Could someone give me some pointers?
> >
> Can you get a stack trace? echo 't' >/proc/sysrq-trigger
> That should show where Samba is getting stuck,
>
> Steve.
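Besides the full SysRq-T dump, a quick first check is to list only the tasks stuck in D state. A minimal sketch, assuming the Linux /proc layout: the state letter in /proc/<pid>/stat follows the command name, which is wrapped in parentheses and may itself contain spaces or ')', so the line is split at the last ') '; /proc/<pid>/wchan, where readable, names the kernel function the task is sleeping in. Only the main thread of each process is scanned. `parse_stat` and `blocked_tasks` are hypothetical helper names.

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    pid_part, rest = stat_line.split(" (", 1)
    comm, after = rest.rsplit(") ", 1)  # comm may contain ") " itself
    return int(pid_part), comm, after.split()[0]


def read_wchan(proc: str, pid: int) -> str:
    """Kernel function the task is sleeping in, or '?' if unreadable."""
    try:
        with open(os.path.join(proc, str(pid), "wchan")) as f:
            return f.read().strip() or "-"
    except OSError:
        return "?"


def blocked_tasks(proc: str = "/proc") -> List[Tuple[int, str, str]]:
    """List (pid, comm, wchan) for every process whose state is 'D'."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return []  # no procfs here
    result = []
    for name in entries:
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the task exited while we were scanning
        if state == "D":
            result.append((pid, comm, read_wchan(proc, pid)))
    return result


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

`ps -eo pid,stat,wchan,comm` gives a similar view; the full kernel stack of a blocked task still comes from the SysRq-T dump.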
the stack trace of the 'D' state `ls`:

 ===
ls D F89B83F8  2200 12018  1 (NOTLB)
   f3eeadd4 0082 f6a425c0 f89b83f8 f3eead9c f6a425d4 f6f32d80 f573a93c
   0001 f89b83f3  c40a2030 c3fa9fa0 c40aaa70 c40aab7c 0e89
   b2a4b036 02e4 c40a2030 f3eeae1c  c3f85e98 f8e11e09 f8e11e0e
Call Trace:
 [<f89b83f8>] gdlm_bast+0x0/0x93 [lock_dlm]
 [<f89b83f3>] gdlm_ast+0x0/0x5 [lock_dlm]
 [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
 [<f8e11e0e>] holder_wait+0x5/0x8 [gfs2]
 [<c0303adf>] __wait_on_bit+0x2c/0x51
 [<c0303b73>] out_of_line_wait_on_bit+0x6f/0x77
 [<f8e11e09>] holder_wait+0x0/0x8 [gfs2]
 [<c012dd7d>] wake_bit_function+0x0/0x3c
 [<c012dd7d>] wake_bit_function+0x0/0x3c
 [<f8e11e4d>] wait_on_holder+0x3c/0x40 [gfs2]
 [<f8e12a9a>] glock_wait_internal+0x81/0x1a3 [gfs2]
 [<f8e12d64>] gfs2_glock_nq+0x5e/0x79 [gfs2]
 [<f8e1fc02>] gfs2_getattr+0x72/0xb5 [gfs2]
 [<f8e1fbfb>] gfs2_getattr+0x6b/0xb5 [gfs2]
 [<c0166946>] do_path_lookup+0x17a/0x1c3
 [<f8e1fb90>] gfs2_getattr+0x0/0xb5 [gfs2]
 [<c0161f92>] vfs_getattr+0x3e/0x51
 [<c016201e>] vfs_lstat_fd+0x2b/0x3d
 [<c0166946>] do_path_lookup+0x17a/0x1c3
 [<c0171e40>] mntput_no_expire+0x11/0x6e
 [<c016260b>] sys_lstat64+0xf/0x23
 [<c01681a0>] sys_symlinkat+0x81/0xb5
 [<c01030b8>] sysenter_past_esp+0x5d/0x81
 [<c030>] __ipv6_addr_type+0x88/0xb8

The system is still running, so the normal 'R' and 'S' state processes
are omitted. Judging from this call trace, though, it's not readdir's
fault after all, but a problem with gdlm_bast in the lock_dlm module.

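A trace like this is easier to read once each line is split into function, offset and module. A minimal sketch, assuming the usual `[<address>] function+offset/size [module]` layout (the address part is optional in the pattern). It also applies a rough heuristic, which is an assumption rather than something established in this thread: on i386 kernels built without frame pointers, the call trace is produced by scanning the stack for anything that looks like a kernel text address, so entries at offset +0x0 are usually function pointers that happen to be on the stack, since a real return address almost always lies past the start of its function. `parse_trace` and `likely_callers` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

FRAME_RE = re.compile(
    r"\[<?(?P<addr>[0-9a-fA-F]*)>?\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


class Frame(NamedTuple):
    func: str
    offset: int
    size: int
    module: Optional[str]  # None for core-kernel functions


def parse_trace(text: str) -> List[Frame]:
    """Extract one Frame per call-trace line, in the order printed."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(m["func"], int(m["off"], 16),
                                int(m["size"], 16), m["module"]))
    return frames


def likely_callers(frames: List[Frame]) -> List[Frame]:
    """Heuristic: drop entries at offset 0, which are usually function
    pointers found on the stack rather than return addresses."""
    return [f for f in frames if f.offset != 0]


if __name__ == "__main__":
    sample = """\
 [<f89b83f8>] gdlm_bast+0x0/0x93 [lock_dlm]
 [<f8e11e0e>] holder_wait+0x5/0x8 [gfs2]
 [<c0303adf>] __wait_on_bit+0x2c/0x51
 [<c012dd7d>] wake_bit_function+0x0/0x3c
 [<f8e12d64>] gfs2_glock_nq+0x5e/0x79 [gfs2]
"""
    for fr in likely_callers(parse_trace(sample)):
        print(fr.func, fr.module or "(core kernel)")
```

On the full trace, the filter drops gdlm_bast, gdlm_ast, wake_bit_function, holder_wait+0x0 and gfs2_getattr+0x0, leaving the sys_lstat64, vfs_getattr, gfs2_getattr, gfs2_glock_nq, glock_wait_internal, wait_on_holder and holder_wait frames. Some stale entries such as sys_symlinkat survive, so the result is a reading aid rather than a verdict.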


-- 
Denis Cheng

