Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-28 Jeffrey E. Hundstad
Stephen C. Tweedie wrote:

> Hi,
>
> On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote:
>
> > > > Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr 
> > > > problem fixed?
> > >
> > > Not sure about how much of -ac went in, but it has the xattr fix.
> >
> > I've had my machine that would crash daily if not hourly stay up for 10 
> > days now.  This is with the linux-2.6.10-ac10 kernel. 
>
> Good to know.  Are you using xattrs extensively (eg. for ACLs, SELinux
> or Samba 4)?
>
> --Stephen

On the machines that were having problems we really weren't using them 
for anything.  I think I may have been running into the BIO problem that 
was fixed in 2.6.10-ac10.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-28 Stephen C. Tweedie
Hi,

On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote:

> >>Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr 
> >>problem fixed?

> >Not sure about how much of -ac went in, but it has the xattr fix.

> I've had my machine that would crash daily if not hourly stay up for 10 
> days now.  This is with the linux-2.6.10-ac10 kernel. 

Good to know.  Are you using xattrs extensively (eg. for ACLs, SELinux
or Samba 4)?

--Stephen


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-28 Jeffrey E. Hundstad
Stephen C. Tweedie wrote:

> Hi,
>
> On Tue, 2005-01-25 at 15:09, Jeffrey Hundstad wrote:
>
> > > >   Bad things happening to journaled filesystem machines
> > > >   Oops in kjournald
>
> > I wonder if there are several problems.  Alan Cox claimed that there was 
> > a fix in linux-2.6.10-ac10 that might alleviate the problem.
>
> I'm not sure --- there are a couple of bio/bh-related fixes in that
> patch, but nothing against jbd/ext3 itself. 
>
> > Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr 
> > problem fixed?
>
> Not sure about how much of -ac went in, but it has the xattr fix.
>
> --Stephen

I've had my machine that would crash daily if not hourly stay up for 10 
days now.  This is with the linux-2.6.10-ac10 kernel.  I was wondering 
if anyone else is having similar results.


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-25 Stephen C. Tweedie
Hi,

On Tue, 2005-01-25 at 15:09, Jeffrey Hundstad wrote:

> >>  Bad things happening to journaled filesystem machines
> >>  Oops in kjournald

> I wonder if there are several problems.  Alan Cox claimed that there was 
> a fix in linux-2.6.10-ac10 that might alleviate the problem.

I'm not sure --- there are a couple of bio/bh-related fixes in that
patch, but nothing against jbd/ext3 itself. 

> Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr 
> problem fixed?

Not sure about how much of -ac went in, but it has the xattr fix.

--Stephen


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-25 Jeffrey Hundstad
Stephen C. Tweedie wrote:

> Hi,
>
> On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
>
> > For more of this look up subjects:
> >   Bad things happening to journaled filesystem machines
> >   Oops in kjournald
>
> That seems to have been due to the xattr problems recently fixed in
> Linus's tree.  The xattr race was allowing one process to delete an
> unshared xattr block while another was trying to share it, and the
> journaling code was getting upset when the second process then tried to
> commit the now-deleted block.
>
> --Stephen

Thanks for the update.

I wonder if there are several problems.  Alan Cox claimed that there was 
a fix in linux-2.6.10-ac10 that might alleviate the problem.

On linux-2.6.10-ac10 I've got one machine that's been up for 6 days now 
that would never last more than 1 day before.  On the other hand, I have one 
machine that did die after two days.

Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr 
problem fixed?  If so, I'll test there.

--
Jeffrey Hundstad


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-25 Stephen C. Tweedie
Hi,

On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
> For more of this look up subjects:
>   Bad things happening to journaled filesystem machines
>   Oops in kjournald

That seems to have been due to the xattr problems recently fixed in
Linus's tree.  The xattr race was allowing one process to delete an
unshared xattr block while another was trying to share it, and the
journaling code was getting upset when the second process then tried to
commit the now-deleted block.
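
Stephen's explanation describes a check-then-act race. Finding an existing xattr block and taking a reference on it are two separate steps, and the owner's delete can land between them. The C sketch below replays that interleaving deterministically. It is only a model that assumes a plain refcount plus a "freed" flag. `struct xblock`, `xblock_put()` and `replay()` are hypothetical names; this is not the ext3 code and not the actual fix. It shows why collapsing lookup and reference-taking into one step, for example under a lock the delete path also takes, closes the window.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an xattr block that several inodes may share.
 * Not the ext3 structures; only the ordering problem is modelled. */
struct xblock {
    int refcount;   /* inodes currently using the block */
    bool freed;     /* set when the last reference is dropped */
};

/* Drop one reference; the last holder frees the block. */
void xblock_put(struct xblock *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        b->freed = true;
}

/* Replay the interleaving step by step.
 * Process A holds the only reference (the block is unshared) and deletes
 * it; process B wants to share the same block.
 *   atomic_share == false: B's lookup and its refcount increment are two
 *     steps, and A's delete runs between them.
 *   atomic_share == true:  lookup and increment are one step, as if done
 *     under a lock the delete path also takes; A's delete is replayed
 *     right after it.  (Had A gone first, B's lookup would simply have
 *     missed the freed block.)
 * Returns true if B ends up committing a block that has been freed. */
bool replay(bool atomic_share)
{
    struct xblock blk = { .refcount = 1, .freed = false };
    bool b_found;

    if (atomic_share) {
        b_found = !blk.freed;          /* B: lookup ...                */
        if (b_found)
            blk.refcount++;            /* ... and get, in one step     */
        xblock_put(&blk);              /* A: 2 -> 1, block stays alive */
    } else {
        b_found = !blk.freed;          /* B: lookup sees a live block  */
        xblock_put(&blk);              /* A: 1 -> 0, block freed       */
        if (b_found)
            blk.refcount++;            /* B: takes a stale reference   */
    }

    /* B: the journal commit touches the block B believes it shares. */
    return b_found && blk.freed;
}
```

With the split lookup, `replay(false)` reports a commit of a freed block. With the combined step, `replay(true)` does not.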

--Stephen


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-20 Jeffrey E. Hundstad
Jeffrey Hundstad wrote:

> For more of this look up subjects:
>   Bad things happening to journaled filesystem machines
>   Oops in kjournald
> and from author:
>   Anders Saaby
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few 
> hours on a machine under real load.   Perhaps those of us with the 
> problem need to talk to the powers that be to come up with a strategy 
> to make a report they can use.  My guess is we're not sending 
> something that can be used.

I have found two servers in my operation that seem to do quite well on 
linux-2.6.7.  So I believe the breakage was introduced after 2.6.7 and 
before linux-2.6.8.1.
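
Those two data points bracket the regression: good on 2.6.7, bad by 2.6.8.1. Any releases or snapshots between them can be searched by halving, so each test boot cuts the remaining range in two. Below is a minimal C sketch of that search. It assumes the candidates are in release order and that the problem, once introduced, stays in every later candidate. `first_bad()` and the `bad_from()` test predicate are hypothetical names. With intermittent crashes like these, a "good" verdict is only as trustworthy as the uptime behind it.

```c
#include <stddef.h>

/* Returns nonzero if the kernel at position i in the ordered candidate
 * list shows the problem.  In practice: boot it under real load and wait. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Find the first bad position, given that position `good` is known to be
 * fine and position `bad` is known to crash (good < bad).  Each probe
 * halves the remaining range, so k candidates need about log2(k) test
 * boots instead of k. */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad, void *ctx)
{
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Illustrative predicate for exercising the search: every position at or
 * after *(const size_t *)ctx counts as "bad". */
int bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```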

...so far I'm not seeing problems after two days with 
linux-2.6.10-ac10.  I'm still crossing my fingers and knocking on wood.

--
jeffrey hundstad


Re: XFS: inode with st_mode == 0

2005-01-18 Jan Kasprzak
Christoph Hellwig wrote:
: I have a better patch than the one I gave you (attached below).  If you
: send me a mail with steps to reproduce your remaining problems I'll put
: this very high on my TODO list after christmas.  Btw, any chance you could
: try XFS CVS (which is at 2.6.9) + the patch below instead of plain 2.6.9,
: there have been various other fixes in the last months.
: 
Just FWIW, this patch (applied to 2.6.10) seems to fix the problem
for me. I was not able to reproduce it by running my test script for ~24 hours.

Thanks!

-Yenya

-- 
| Jan "Yenya" Kasprzak  <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839  Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/   Czech Linux Homepage: http://www.linux.cz/ |
> Whatever the Java applications and desktop dances may lead to, Unix will <
> still be pushing the packets around for a quite a while.  --Rob Pike <


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-17 Alan Cox
On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few 
> hours on a machine under real load.   Perhaps those of us with the problem 
> need to talk to the powers that be to come up with a strategy to make a 
> report they can use.  My guess is we're not sending something that can 
> be used.

I need a way to reproduce it, preferably on a hardware configuration
that is running 2.6.10-ac10 or later because of the bio and ACPI fixes.
I'm not interested in any report involving binary drivers, and to be
honest, the less complex the configuration, the better. I also care that
the hardware passes memtest86+!
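
One of those conditions, no binary drivers, can be checked mechanically. The kernel keeps a taint mask. An oops header says "Not tainted" (as in the Scenario 1 trace in this thread) when the mask is zero, and the same mask can be read as a decimal number from `/proc/sys/kernel/tainted`. Bit 0 is set once a proprietary module has been loaded. Below is a minimal C sketch; the helper names are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the decimal taint mask as read from /proc/sys/kernel/tainted.
 * Returns -1 if the text does not start with a non-negative number. */
long parse_taint(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text || v < 0)
        return -1;
    return v;
}

/* Bit 0 of the mask is set once a proprietary (non-GPL-compatible)
 * module, i.e. a binary driver, has been loaded. */
int has_proprietary_module(long taint)
{
    return taint > 0 && (taint & 1L) != 0;
}

/* Read the running kernel's taint mask; returns -1 on any error. */
long read_taint(void)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/kernel/tainted", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_taint(buf);
}
```

On a machine suitable for such a report, `read_taint()` should return 0.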

I also don't care about XFS although Christoph may well do.

Alan


journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-17 Jeffrey Hundstad
For more of this look up subjects:
 Bad things happening to journaled filesystem machines
 Oops in kjournald
and from author:
 Anders Saaby
I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few 
hours on a machine under real load.   Perhaps those of us with the problem 
need to talk to the powers that be to come up with a strategy to make a 
report they can use.  My guess is we're not sending something that can 
be used.

--
jeffrey hundstad


Re: XFS: inode with st_mode == 0

2005-01-17 Anders Saaby
Hi,

On Monday 17 January 2005 12:55, Jan-Frode Myklebust wrote:
>
> Guess we've been struggling with many of the same problems..

Seems like it. :)

> > ---
> > Scenario 2: Mailservers:
> >   Running XFS on mailqueue:
>
> The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on
> fs/nfs/file.c seems stable on our mailserver running XFS on
> mail queue and spool (mbox). 4 days of uptime!

Yes, we had those errors too:

"Kernel panic - not syncing: Attempting to free lock with active block list"

- on 2.6.10 on the webservers, which was fixed with that particular patch. But 
this is a different error, as our mailservers don't act as NFS clients. All 
use local XFS.

The sad thing is that the mailservers crash every 10-20 hours on 2.6.x, but 
I'm not able to reproduce it in a test environment, and at the time of the 
original post to LKML no one was able to do anything about it without a 
reproducible test case. :(

> > ===
> > Resolution to the storage server problem:
> >  2.6.8.1 UP is stable (but oopses regularly after memory allocation
> >  failures)
>
> My XFS-fileserver ran 2.6.9-rc3 stable since october 25. Got lots of
> "possible deadlock in kmem_alloc (mode:0xd0)" this weekend, so I
> upgraded to plain 2.6.10. Seems OK so far.
>

OK, as far as I remember, we had the same messages in the kernel log when 
running with SMP.

-- 
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer

Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [EMAIL PROTECTED] - http://www.cohaesio.com


Re: XFS: inode with st_mode == 0

2005-01-17 Jan-Frode Myklebust
On Mon, Jan 17, 2005 at 11:07:46AM +0100, Jakob Oestergaard wrote:
> 
> Where should I begin?  ;)

Guess we've been struggling with many of the same problems..

> ---
> Scenario 2: Mailservers:
>   Running XFS on mailqueue:

The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on
fs/nfs/file.c seems stable on our mailserver running XFS on
mail queue and spool (mbox). 4 days of uptime! 
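
As far as I understand the 2.6 file-locking helpers, the one-line change above swaps a call that hands a conflicting POSIX lock back to the caller for one that sleeps until the blocker is released and then retries. User space has the same split in `fcntl()`. `F_SETLK` fails at once with `EACCES` or `EAGAIN`, and `F_SETLKW` waits. The C sketch below shows only that user-space analogy; it is not the NFS client code or the Fedora patch. `lock_whole_file()` and `demo_lock_tmpfile()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Request a write lock on the whole file referred to by fd.
 *   wait == 0: F_SETLK, returns at once (EACCES or EAGAIN) if another
 *              process holds a conflicting lock.
 *   wait != 0: F_SETLKW, sleeps until the conflicting lock goes away;
 *              a signal can interrupt the wait (EINTR).
 * Returns 0 on success, otherwise the errno value. */
int lock_whole_file(int fd, int wait)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 = up to end of file, however large */
    if (fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl) == -1)
        return errno;
    return 0;
}

/* Usage: lock an anonymous temporary file.  No other process can hold a
 * conflicting lock on it, so both forms succeed immediately. */
int demo_lock_tmpfile(int wait)
{
    FILE *f = tmpfile();
    int rc;

    if (!f)
        return errno;
    rc = lock_whole_file(fileno(f), wait);
    fclose(f);               /* closing the file releases the lock */
    return rc;
}
```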

> 
> ===
> Resolution to the storage server problem:
>  2.6.8.1 UP is stable (but oopses regularly after memory allocation
>  failures)

My XFS fileserver had been running 2.6.9-rc3 stably since October 25. Got lots of
"possible deadlock in kmem_alloc (mode:0xd0)" this weekend, so I
upgraded to plain 2.6.10. Seems OK so far. 
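
The `mode:0xd0` in that message is a page-allocator flag mask. Going by the 2.6-era `include/linux/gfp.h` values, 0xd0 is `__GFP_WAIT | __GFP_IO | __GFP_FS`, which is plain `GFP_KERNEL`. That is an ordinary allocation that may sleep, start I/O and re-enter the filesystem. The message itself presumably comes from XFS's own `kmem_alloc()` wrapper. For callers that must not fail, it keeps retrying and periodically warns instead of returning NULL. The C sketch below decodes the mask and models that retry pattern. The flag values are quoted from memory. The 100-retry warning interval, the function names and the fake allocator are assumptions for illustration, not the XFS code.

```c
#include <stdio.h>
#include <string.h>

/* Allocation-mode bits as recalled from 2.6-era include/linux/gfp.h
 * (verify against the tree you are debugging). */
static const struct {
    unsigned bit;
    const char *name;
} gfp_bits[] = {
    { 0x01u, "__GFP_DMA" },
    { 0x02u, "__GFP_HIGHMEM" },
    { 0x10u, "__GFP_WAIT" },   /* caller may sleep */
    { 0x20u, "__GFP_HIGH" },   /* may dip into emergency reserves */
    { 0x40u, "__GFP_IO" },     /* may start disk I/O */
    { 0x80u, "__GFP_FS" },     /* may call back into filesystems */
};

/* Write the names of the known bits in mode into buf as "A|B|C".
 * len must be at least 1; output is truncated if buf is too short. */
void describe_gfp(unsigned mode, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof gfp_bits / sizeof gfp_bits[0]; i++) {
        if (!(mode & gfp_bits[i].bit))
            continue;
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s%s",
                 used ? "|" : "", gfp_bits[i].name);
    }
}

/* Model of a "must not fail" allocation loop: keep calling try_alloc
 * (a stand-in for the real allocator, NULL on failure) and complain every
 * 100 failures.  Real code would also wait for memory to be freed
 * between tries; that is left out here. */
void *alloc_retry(void *(*try_alloc)(void *ctx), void *ctx, unsigned mode,
                  unsigned *warnings)
{
    unsigned long retries = 0;

    for (;;) {
        void *p = try_alloc(ctx);
        if (p)
            return p;
        if (++retries % 100 == 0) {
            ++*warnings;
            fprintf(stderr, "possible deadlock in kmem_alloc (mode:0x%x)\n",
                    mode);
        }
    }
}

/* Test double: fails `remaining` times, then succeeds. */
struct flaky {
    unsigned remaining;
};

void *flaky_alloc(void *ctx)
{
    static char buf[16];
    struct flaky *f = ctx;

    if (f->remaining > 0) {
        f->remaining--;
        return NULL;
    }
    return buf;
}
```

An allocation that fails 250 times before succeeding produces two warnings under this model. That is the shape of a log that fills with these messages while the machine is short of reclaimable memory.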

> 
> Hardware on all servers: IBM x335 and x345.

Mail servers: Dell 2650, IBM ServeRAID 6M, EXP400.
File servers: IBM x330, qla2300, infortrend eonstor.

All running Whitebox/CentOS RHEL clones.


  -jf


Re: XFS: inode with st_mode == 0

2005-01-17 Jakob Oestergaard
On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
> On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > So apart from the general well known instability problems that will
> > occur when you actually start *using* the system, there should be no
> 
> What known instabilities?

Where should I begin?  ;)

Most of the following have already been posted to LKML - primarily by
Anders ([EMAIL PROTECTED]) - it seems that no one cares, but I'll repost a
summary that Anders sent me below:

---
Scenario 1: Mailservers:
  2.6.10 (~24-40 hours uptime):
  Running ext3 on mailqueue:


Unable to handle kernel NULL pointer dereference at virtual address 00000004
printing eip:
c018a095
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
CPU:2
EIP:0060:[<c018a095>]Not tainted
EFLAGS: 00010286   (2.6.8.1)
EIP is at journal_commit_transaction+0x535/0x10e5
eax: cac1e26c   ebx: 00000000   ecx: f7cec400   edx: f7cec400
esi: f65f3000   edi: cac1e26c   ebp: f65f3000   esp: f65f3dc0
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
Stack: f65f3e64 00000000 00000000 00000000 00000000 00000000 f7cec400 cda565fc
   0000149a 00000004 f65f3e48 c01132d8 00000002 c202ad20 00000001 f65f3e5c
   c202ad20 c202ad20 00000002 00000001 0000001e 01c1af60 f65f3e68 c0407dc0
Call Trace:
 [<c01132d8>] scheduler_tick+0x468/0x470
 [<c01127b5>] find_busiest_group+0x105/0x310
 [<c011db8e>] del_timer_sync+0x7e/0xa0
 [<c018cd4d>] kjournald+0xbd/0x230
 [<c0114b10>] autoremove_wake_function+0x0/0x40
 [<c0114b10>] autoremove_wake_function+0x0/0x40
 [<c0103f16>] ret_from_fork+0x6/0x14
 [<c018cc70>] commit_timeout+0x0/0x10
 [<c018cc90>] kjournald+0x0/0x230
 [<c01024bd>] kernel_thread_helper+0x5/0x18
Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
 <2>SoftDog: Initiating system reboot
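
The Scenario 1 trace can be cross-checked by hand. "EIP is at journal_commit_transaction+0x535/0x10e5" means the fault happened 0x535 bytes into a function that is 0x10e5 bytes long. Assume, as in i386 kernels of this vintage, that the Code: line starts at the faulting EIP. Then its first bytes `f0 ff 43 04` encode `lock incl 0x4(%ebx)`, an atomic increment of a field 4 bytes into whatever %ebx points to. If %ebx is 0, the write lands on address 4. That matches the NULL pointer dereference at virtual address 4 and the 0002 error code, which marks a write to a non-present page. The C sketch below performs both steps. The names are made up, the decoder handles only this one instruction form, and which structure and field sit at that offset is not identified here.

```c
#include <stdio.h>
#include <string.h>

/* A "symbol+0xOFFSET/0xSIZE" location as printed in 2.6 oops reports. */
struct oops_loc {
    char sym[128];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long size;     /* length of the function in bytes */
};

/* Parse e.g. "journal_commit_transaction+0x535/0x10e5".
 * Returns 0 on success, -1 if the text does not have that shape. */
int parse_oops_loc(const char *s, struct oops_loc *out)
{
    const char *plus = strrchr(s, '+');
    size_t n;

    if (!plus || plus == s)
        return -1;
    n = (size_t)(plus - s);
    if (n >= sizeof out->sym)
        return -1;
    if (sscanf(plus + 1, "0x%lx/0x%lx", &out->offset, &out->size) != 2)
        return -1;
    memcpy(out->sym, s, n);
    out->sym[n] = '\0';
    return 0;
}

/* Decode one narrow instruction form: optional LOCK prefix (f0), then
 * opcode ff with ModRM reg field 0 (INC r/m32) and mod 01 (8-bit
 * displacement), i.e. "lock incl disp8(%reg)".  regs[] holds
 * eax, ecx, edx, ebx, esp, ebp, esi, edi in x86 encoding order.
 * Stores the 32-bit address the instruction writes to and returns 0;
 * returns -1 for any other encoding.  Not a general decoder. */
int decode_inc_disp8(const unsigned char *code, size_t n,
                     const unsigned long regs[8], unsigned long *addr)
{
    size_t i = 0;
    unsigned char modrm;
    signed char disp;

    if (n > 0 && code[0] == 0xf0)      /* LOCK prefix */
        i = 1;
    if (i + 3 > n || code[i] != 0xff)
        return -1;
    modrm = code[i + 1];
    if ((modrm >> 6) != 1 || ((modrm >> 3) & 7) != 0 || (modrm & 7) == 4)
        return -1;                     /* need mod=01, reg=/0, no SIB byte */
    disp = (signed char)code[i + 2];
    *addr = (regs[modrm & 7] + (unsigned long)(long)disp) & 0xffffffffUL;
    return 0;
}
```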


---
Scenario 2: Mailservers:
  Running XFS on mailqueue:


Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that 
is not in the AIL
xfs_force_shutdown(sdb1,0x8) called from line 382 of file 
fs/xfs/xfs_trans_ail.c.  Return address = 0xc0216a56
@Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 20000731 (Red 
Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
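
The AIL is XFS's Active Item List: log items whose changes are committed to the on-disk log but not yet written back to their home location. The message above reports an attempt to remove an item that is not on that list, which means the in-memory state is already inconsistent. The forced shutdown that follows stops the filesystem rather than risk writing bad metadata. If I recall the flag values correctly, 0x8 is SHUTDOWN_CORRUPT_INCORE. Below is a minimal C model of that guard. The list, the `in_ail` membership flag and the `shut_down` field are hypothetical, not the XFS data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log item, linked into a circular doubly linked list. */
struct log_item {
    struct log_item *prev, *next;
    bool in_ail;             /* membership flag, checked before removal */
};

struct ail {
    struct log_item head;    /* sentinel node */
    bool shut_down;          /* stand-in for a forced filesystem shutdown */
};

void ail_init(struct ail *a)
{
    a->head.prev = a->head.next = &a->head;
    a->head.in_ail = false;
    a->shut_down = false;
}

/* Append an item at the tail of the list. */
void ail_insert(struct ail *a, struct log_item *li)
{
    li->prev = a->head.prev;
    li->next = &a->head;
    a->head.prev->next = li;
    a->head.prev = li;
    li->in_ail = true;
}

/* Remove an item.  If it is not on the list, unlinking it would follow
 * stale pointers and corrupt the list, so refuse and shut down instead. */
int ail_delete(struct ail *a, struct log_item *li)
{
    if (!li->in_ail) {
        a->shut_down = true;
        return -1;
    }
    li->prev->next = li->next;
    li->next->prev = li->prev;
    li->prev = li->next = NULL;
    li->in_ail = false;
    return 0;
}

bool ail_empty(const struct ail *a)
{
    return a->head.next == &a->head;
}
```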



===
Resolution to the mailserver problem:
 2.4.28 is perfectly stable on these machines.

---
Scenario 3: Webservers:

  2.6.10, 2.6.10-ac8 (~3-12 hours uptime):


Unable to handle kernel paging request
<2>SoftDog: Initiating system reboot.

(No more...) :(

===
Resolution to the webserver problem:
 2.4.28/2.4.29-rc2 are stable here

---
Scenario 4: Storageservers: 
  2.6.8.1:
Oopses after ~5-10 hours with SMP on. - Cannot find the actual Oopses 
anymore, and 2.6.8+ hasn't been tested as we cannot afford any more downtime on 
these servers.


===
Resolution to the storage server problem:
 2.6.8.1 UP is stable (but oopses regularly after memory allocation
 failures)



Hardware on all servers: IBM x335 and x345.

Mentioned errors seen on a total of 17 servers.

-- 

 / jakob

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: inode with st_mode == 0

2005-01-17 Thread Jakob Oestergaard
On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
 On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
  So apart from the general well known instability problems that will
  occur when you actually start *using* the system, there should be no
 
 What known instabilities?

Where should I begin?  ;)

Most of the following have already been posted to LKML - primarily by
Anders ([EMAIL PROTECTED]) - it seems that noone cares, but I'll repost a
summary that Anders sent me below:

---
Scenario 1: Mailservers:
  2.6.10 (~24-40 hours uptime):
  Running ext3 on mailqueue:

SNIP
Unable to handle kernel NULL pointer dereference at virtual address 0004
printing eip:
c018a095
*pde = 
Oops: 0002 [#1]
SMP
Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
CPU:2
EIP:0060:[c018a095]Not tainted
EFLAGS: 00010286   (2.6.8.1)
EIP is at journal_commit_transaction+0x535/0x10e5
eax: cac1e26c   ebx:    ecx: f7cec400   edx: f7cec400
esi: f65f3000   edi: cac1e26c   ebp: f65f3000   esp: f65f3dc0
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
Stack: f65f3e64      f7cec400 cda565fc
   149a 0004 f65f3e48 c01132d8 0002 c202ad20 0001 f65f3e5c
   c202ad20 c202ad20 0002 0001 001e 01c1af60 f65f3e68 c0407dc0
Call Trace:
 [c01132d8] scheduler_tick+0x468/0x470
 [c01127b5] find_busiest_group+0x105/0x310
 [c011db8e] del_timer_sync+0x7e/0xa0
 [c018cd4d] kjournald+0xbd/0x230
 [c0114b10] autoremove_wake_function+0x0/0x40
 [c0114b10] autoremove_wake_function+0x0/0x40
 [c0103f16] ret_from_fork+0x6/0x14
 [c018cc70] commit_timeout+0x0/0x10
 [c018cc90] kjournald+0x0/0x230
 [c01024bd] kernel_thread_helper+0x5/0x18
Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
 2SoftDog: Initiating system reboot
/SNIP

---
Scenario 2: Mailservers:
  Running XFS on mailqueue:

SNIP
Filesystem sdb1: xfs_trans_delete_ail: attempting to delete a log item that 
is not in the AIL
xfs_force_shutdown(sdb1,0x8) called from line 382 of file 
fs/xfs/xfs_trans_ail.c.  Return address = 0xc0216a56
@Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 2731 (Red 
Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
/SNIP


===
Resolution to the mailserver problem:
 2.4.28 is perfectly stable on these machines.

---
Scenario 3: Webservers:

  2.6.10, 2.6.10-ac8 (~3-12 hours uptime):

SNIP
Unable to handle kernel paging request
2SoftDog: Initiating system reboot.
SNIP
(No more...) :(

===
Resolution to the webserver problem:
 2.4.28/2.4.29-rc2 are stable here

---
Scenario 4: Storageservers: 
  2.6.8.1:
Oopses after ~5-10 hours whith SMP on. - Cannot find the actual Oopses 
anymore and 2.6.8+ havent been tested as we cannot afford anymore downtime on 
these servers.


===
Resolution to the storage server problem:
 2.6.8.1 UP is stable (but oopses regularly after memory allocation
 failures)



Hardware on all servers: IBM x335 and x345.

Mentioned errors seen on a total of 17 servers.

-- 

 / jakob

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: XFS: inode with st_mode == 0

2005-01-17 Thread Jan-Frode Myklebust
On Mon, Jan 17, 2005 at 11:07:46AM +0100, Jakob Oestergaard wrote:
 
 Where should I begin?  ;)

Guess we've been struggeling with much of the same problems..

 ---
 Scenario 2: Mailservers:
   Running XFS on mailqueue:

The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on
fs/nfs/file.c seems stable on our mailserver running XFS on
mail queue and spool (mbox). 4 days of uptime! 

 
 ===
 Resolution to the storage server problem:
  2.6.8.1 UP is stable (but oopses regularly after memory allocation
  failures)

My XFS-fileserver ran 2.6.9-rc3 stable since october 25. Got lots of
possible deadlock in kmem_alloc (mode:0xd0) this weekend, so I
upgraded to plain 2.6.10. Seems OK so far. 

 
> Hardware on all servers: IBM x335 and x345.

Mail servers: Dell 2650, IBM ServeRAID 6M, EXP400.
File servers: IBM x330, qla2300, infortrend eonstor.

All running Whitebox/CentOS RHEL clones.


  -jf


Re: XFS: inode with st_mode == 0

2005-01-17 Thread Anders Saaby
Hi,

On Monday 17 January 2005 12:55, Jan-Frode Myklebust wrote:

> Guess we've been struggling with many of the same problems..

Seems like it. :)

> > ---
> > Scenario 2: Mailservers:
> >   Running XFS on mailqueue:

> The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on
> fs/nfs/file.c seems stable on our mailserver running XFS on
> mail queue and spool (mbox). 4 days of uptime!

Yes, we had those errors too:

Kernel panic - not syncing: Attempting to free lock with active block list

That was on 2.6.10 on the webservers, and that particular patch fixed it. But
this is a different error, as our mailservers don't act as NFS clients; they
all use local XFS.

The sad thing is that the mailservers crash every 10-20 hours on 2.6.x, but I'm
not able to reproduce it in a test environment, and at the time of the original
post to LKML no one was able to do anything about it without a reproducible
test case. :(

> > ===
> > Resolution to the storage server problem:
> >  2.6.8.1 UP is stable (but oopses regularly after memory allocation
> >  failures)

> My XFS fileserver had run 2.6.9-rc3 stably since October 25. This weekend it
> logged lots of "possible deadlock in kmem_alloc (mode:0xd0)" messages, so I
> upgraded to plain 2.6.10. Seems OK so far.


OK, as far as I remember, we had the same messages in the kernel log when
running with SMP.

-- 
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer

Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [EMAIL PROTECTED] - http://www.cohaesio.com


journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-17 Thread Jeffrey Hundstad
For more of this, look up the subjects:
 Bad things happening to journaled filesystem machines
 Oops in kjournald
and posts from the author:
 Anders Saaby

I also can't keep a recent 2.6 or 2.6*-ac* kernel up for more than a few
hours on a machine under real load.  Perhaps those of us with the problem
need to talk to the powers that be to come up with a strategy for making a
report they can use.  My guess is we're not sending something that can
be used.

--
jeffrey hundstad
Jakob Oestergaard wrote:
> On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
> > On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > > So apart from the general well known instability problems that will
> > > occur when you actually start *using* the system, there should be no
> >
> > What known instabilities?
>
> Where should I begin?  ;)
>
> Most of the following have already been posted to LKML - primarily by
> Anders ([EMAIL PROTECTED]) - it seems that no one cares, but I'll repost a
> summary that Anders sent me below:
>
> ---
> Scenario 1: Mailservers:
>   2.6.10 (~24-40 hours uptime):
>   Running ext3 on mailqueue:
>
> <SNIP>
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> printing eip:
> c018a095
> *pde = 
> Oops: 0002 [#1]
> SMP
> Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
> CPU:    2
> EIP:    0060:[<c018a095>]    Not tainted
> EFLAGS: 00010286   (2.6.8.1)
> EIP is at journal_commit_transaction+0x535/0x10e5
> eax: cac1e26c   ebx:    ecx: f7cec400   edx: f7cec400
> esi: f65f3000   edi: cac1e26c   ebp: f65f3000   esp: f65f3dc0
> ds: 007b   es: 007b   ss: 0068
> Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
> Stack: f65f3e64      f7cec400 cda565fc
>   149a 0004 f65f3e48 c01132d8 0002 c202ad20 0001 f65f3e5c
>   c202ad20 c202ad20 0002 0001 001e 01c1af60 f65f3e68 c0407dc0
> Call Trace:
>  [<c01132d8>] scheduler_tick+0x468/0x470
>  [<c01127b5>] find_busiest_group+0x105/0x310
>  [<c011db8e>] del_timer_sync+0x7e/0xa0
>  [<c018cd4d>] kjournald+0xbd/0x230
>  [<c0114b10>] autoremove_wake_function+0x0/0x40
>  [<c0114b10>] autoremove_wake_function+0x0/0x40
>  [<c0103f16>] ret_from_fork+0x6/0x14
>  [<c018cc70>] commit_timeout+0x0/0x10
>  [<c018cc90>] kjournald+0x0/0x230
>  [<c01024bd>] kernel_thread_helper+0x5/0x18
> Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
> <2>SoftDog: Initiating system reboot
> </SNIP>
>
> ---
> Scenario 2: Mailservers:
>   Running XFS on mailqueue:
>
> <SNIP>
> Filesystem sdb1: xfs_trans_delete_ail: attempting to delete a log item that
> is not in the AIL
> xfs_force_shutdown(sdb1,0x8) called from line 382 of file
> fs/xfs/xfs_trans_ail.c.  Return address = 0xc0216a56
> @Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 2731 (Red
> Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
> </SNIP>
>
> ===
> Resolution to the mailserver problem:
>  2.4.28 is perfectly stable on these machines.
>
> ---
> Scenario 3: Webservers:
>
>   2.6.10, 2.6.10-ac8 (~3-12 hours uptime):
>
> <SNIP>
> Unable to handle kernel paging request
> <2>SoftDog: Initiating system reboot.
> </SNIP>
> (No more...) :(
>
> ===
> Resolution to the webserver problem:
>  2.4.28/2.4.29-rc2 are stable here
>
> ---
> Scenario 4: Storageservers:
>   2.6.8.1:
> Oopses after ~5-10 hours with SMP on. - Cannot find the actual Oopses
> anymore and 2.6.8+ haven't been tested as we cannot afford any more downtime on
> these servers.
>
> ===
> Resolution to the storage server problem:
>  2.6.8.1 UP is stable (but oopses regularly after memory allocation
>  failures)
>
> Hardware on all servers: IBM x335 and x345.
>
> Mentioned errors seen on a total of 17 servers.
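
Two fields in the Scenario 1 oops quoted above can be read mechanically. The sketch below assumes the i386 page-fault error-code layout: bit 0 distinguishes a protection violation from a not-present page, bit 1 a write from a read, and bit 2 user mode from kernel mode. It also assumes the symbol+offset/size form used in the "EIP is at" line. Both helper names are hypothetical.

```python
def decode_i386_pf_error(code):
    """Decode the low three bits of an i386 page-fault error code.

    Bit 0: 0 = page not present, 1 = protection violation
    Bit 1: 0 = read access,      1 = write access
    Bit 2: 0 = kernel mode,      1 = user mode
    """
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }


def parse_oops_symbol(text):
    """Split 'func+0xOFFSET/0xSIZE' into (func, offset, size) as integers."""
    name, plus, rest = text.partition("+")
    offset, slash, size = rest.partition("/")
    if not plus or not slash:
        raise ValueError("expected 'symbol+0xoffset/0xsize': %r" % text)
    # int(..., 16) accepts the leading "0x" that kallsyms prints.
    return name, int(offset, 16), int(size, 16)
```

Applied to the quoted oops, "Oops: 0002" decodes to a kernel-mode write to a page that was not present. That fits a write through a NULL pointer at the reported address 0004, whose leading zeros appear to have been lost in the archive. The "EIP is at" line also shows that the faulting instruction was 0x535 bytes into the 0x10e5-byte journal_commit_transaction.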
 


Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0

2005-01-17 Thread Alan Cox
On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up for more than a few
> hours on a machine under real load.  Perhaps those of us with the problem
> need to talk to the powers that be to come up with a strategy for making a
> report they can use.  My guess is we're not sending something that can
> be used.

I need a way to reproduce it, preferably on a hardware configuration
running 2.6.10-ac10 or later because of the bio and ACPI fixes.
I'm not interested in any report involving binary drivers, and to be
honest, the less complex the configuration, the better. I also care that
the hardware passes memtest86+!

I also don't care about XFS although Christoph may well do.

Alan
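
Alan asks for a reproducer on the simplest possible configuration. Nothing in the thread says which workload triggers these crashes, so the following is only a hypothetical starting point. It runs a mail-queue-like loop of create, write, fsync, rename and unlink against the filesystem under test. The journal_stress name and its parameters are assumptions for illustration. There is no evidence that it triggers the oopses reported above.

```python
import os
import tempfile


def journal_stress(directory=None, iterations=100, size=4096):
    """Run a mail-queue-like metadata workload; return iterations completed.

    Each iteration creates a temporary file, writes `size` bytes, fsyncs it
    (on a journaled filesystem this normally forces a journal/log commit),
    renames it into place and unlinks it. If `directory` is None, a scratch
    directory is used; that only tests the script itself. To exercise ext3
    or XFS, pass a directory on that filesystem.
    """
    if directory is None:
        with tempfile.TemporaryDirectory() as scratch:
            return journal_stress(scratch, iterations, size)

    payload = b"x" * size
    for i in range(iterations):
        tmp_path = os.path.join(directory, "tmp.%d" % i)
        final_path = os.path.join(directory, "msg.%d" % i)
        # Mode "xb" fails if the file already exists, like O_CREAT | O_EXCL.
        with open(tmp_path, "xb") as f:
            f.write(payload)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # then ask the kernel to commit it to disk
        os.rename(tmp_path, final_path)
        os.unlink(final_path)
    return iterations
```

Start with a long run in a scratch directory on the affected volume, for example journal_stress("/mnt/xfs-test", iterations=10**6), where the path is hypothetical. Per Alan's request, run it on 2.6.10-ac10 or later, without binary drivers, on hardware that passes memtest86+.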


Re: XFS: inode with st_mode == 0

2005-01-16 Thread Jakob Oestergaard
On Sat, Jan 15, 2005 at 01:09:08PM +1100, Nathan Scott wrote:
...
> > AFAIK the best you can do is to get the most recent XFS kernel from
> > SGI's CVS (this one is based on 2.6.10).
> 
> The -mm tree also has these fixes; we'll get them merged into
> mainline soon.

Okeydokey - good

> 
> > If you run that kernel, then most of the former problems will be gone;
> > *) I only have one undeletable directory on my system - so it seems that
> > this error is no longer common   ;)
> 
> You may need to run xfs_repair to clean that up..?  Or does
> the problem persist after a repair?

I'm running Debian Woody - the xfs_check/xfs_repair there didn't seem to
find anything last I tried. I have not re-checked for this last problem
though.

I figured I might need to run the CVS version of the xfs tools, and, well,
me being busy and all, I thought I'd just leave the 'delete_me'
directory hanging until I had more time on my hands  ;)

-- 

 / jakob


Re: XFS: inode with st_mode == 0

2005-01-16 Thread Christoph Hellwig
On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> So apart from the general well known instability problems that will
> occur when you actually start *using* the system, there should be no

What known instabilities?


