Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Stephen C. Tweedie wrote:

> Hi,
>
> On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote:
>
> > >>Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr
> > >>problem fixed?
> > >Not sure about how much of -ac went in, but it has the xattr fix.
> > I've had my machine that would crash daily if not hourly stay up for 10
> > days now. This is with the linux-2.6.10-ac10 kernel.
>
> Good to know. Are you using xattrs extensively (eg. for ACLs, SELinux or Samba 4)?
>
> --Stephen

On the machines that were having problems we really weren't using them for anything. I think I may have been running into the BIO problem that was fixed in 2.6.10-ac10.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Hi,

On Fri, 2005-01-28 at 20:15, Jeffrey E. Hundstad wrote:

> >>Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr
> >>problem fixed?
> >Not sure about how much of -ac went in, but it has the xattr fix.
> I've had my machine that would crash daily if not hourly stay up for 10
> days now. This is with the linux-2.6.10-ac10 kernel.

Good to know. Are you using xattrs extensively (eg. for ACLs, SELinux or Samba 4)?

--Stephen
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Stephen C. Tweedie wrote:

> Hi,
>
> On Tue, 2005-01-25 at 15:09, Jeffrey Hundstad wrote:
>
> > >> Bad things happening to journaled filesystem machines
> > >> Oops in kjournald
> > I wonder if there are several problems. Alan Cox claimed that there was
> > a fix in linux-2.6.10-ac10 that might alleviate the problem.
>
> I'm not sure --- there are a couple of bio/bh-related fixes in that
> patch, but nothing against jbd/ext3 itself.
>
> > Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr
> > problem fixed?
>
> Not sure about how much of -ac went in, but it has the xattr fix.
>
> --Stephen

I've had my machine that would crash daily if not hourly stay up for 10 days now. This is with the linux-2.6.10-ac10 kernel. I was wondering if anyone else is having similar results.
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Hi,

On Tue, 2005-01-25 at 15:09, Jeffrey Hundstad wrote:

> >> Bad things happening to journaled filesystem machines
> >> Oops in kjournald
> I wonder if there are several problems. Alan Cox claimed that there was
> a fix in linux-2.6.10-ac10 that might alleviate the problem.

I'm not sure --- there are a couple of bio/bh-related fixes in that patch, but nothing against jbd/ext3 itself.

> Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr
> problem fixed?

Not sure about how much of -ac went in, but it has the xattr fix.

--Stephen
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Stephen C. Tweedie wrote:

> Hi,
>
> On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
>
> > For more of this look up subjects:
> > Bad things happening to journaled filesystem machines
> > Oops in kjournald
>
> That seems to have been due to the xattr problems recently fixed in
> Linus's tree. The xattr race was allowing one process to delete an
> unshared xattr block while another was trying to share it, and the
> journaling code was getting upset when the second process then tried to
> commit the now-deleted block.

Thanks for the update.

I wonder if there are several problems. Alan Cox claimed that there was a fix in linux-2.6.10-ac10 that might alleviate the problem. On linux-2.6.10-ac10 I've got one machine that's been up for 6 days now that would never last more than 1 before. On the other hand I have one machine that did die after two days.

Does linux-2.6.11-rc2 have both the linux-2.6.10-ac10 fix and the xattr problem fixed? If so, I'll test there.

--
Jeffrey Hundstad
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Hi,

On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:

> For more of this look up subjects:
> Bad things happening to journaled filesystem machines
> Oops in kjournald

That seems to have been due to the xattr problems recently fixed in Linus's tree. The xattr race was allowing one process to delete an unshared xattr block while another was trying to share it, and the journaling code was getting upset when the second process then tried to commit the now-deleted block.

--Stephen
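The ordering Stephen describes can be replayed with a toy model. What follows is only a minimal sketch in Python, assuming a block with a plain reference count; `XattrBlock` and both functions are hypothetical names made up for illustration and are not taken from ext3 or jbd. The second function shows one generic way such a window is usually closed (making the lookup and the reference a single step), which is not necessarily how the fix in Linus's tree works.

```python
from dataclasses import dataclass


@dataclass
class XattrBlock:
    """Hypothetical stand-in for an xattr block; not a kernel structure."""
    refcount: int = 1      # 1 means unshared: exactly one inode points at it
    freed: bool = False


def racy_interleaving() -> bool:
    """Replay the bad ordering step by step.

    Returns True if process B ends up holding a block that process A
    has already freed.
    """
    block = XattrBlock()

    # B: finds the block and decides to share it, but has not yet
    # taken its reference.
    candidate = block

    # A: drops its xattrs, sees the block is unshared and frees it.
    if block.refcount == 1:
        block.freed = True

    # B: takes its reference and would now hand the block to the journal.
    candidate.refcount += 1
    return candidate.freed


def serialized_interleaving() -> bool:
    """Same two actors, but B's lookup and reference are one atomic step."""
    block = XattrBlock()

    # B: lookup and reference happen together (for example under a lock).
    candidate = block
    candidate.refcount += 1

    # A: now sees a shared block, so it only drops its own reference.
    if block.refcount == 1:
        block.freed = True
    else:
        block.refcount -= 1

    return candidate.freed


if __name__ == "__main__":
    print("racy ordering leaves B with a freed block:", racy_interleaving())
    print("serialized ordering leaves B with a freed block:", serialized_interleaving())
```

In the first ordering B is left committing a block that no longer exists, which matches the point at which the journaling code "was getting upset" in the description above.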
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
Jeffrey Hundstad wrote:

> For more of this look up subjects:
> Bad things happening to journaled filesystem machines
> Oops in kjournald
> and from author:
> Anders Saaby
>
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few
> hours on a machine under real load. Perhaps us folks with the problem
> need to talk to the powers who be to come up with a strategy to make a
> report they can use. My guess is we're not sending something that can
> be used.

I have found two servers in my operation that seem to do quite well on linux-2.6.7. So I believe the brokenness is after this point and before linux-2.6.8.1.

...so far I'm not seeing problems after two days with linux-2.6.10-ac10. I'm still crossing my fingers and knocking on wood.

--
jeffrey hundstad
Re: XFS: inode with st_mode == 0
Christoph Hellwig wrote:
: I have a better patch than the one I gave you (attached below). If you
: send me a mail with steps to reproduce your remaining problems I'll put
: this very high on my TODO list after christmas. Btw, any chance you could
: try XFS CVS (which is at 2.6.9) + the patch below instead of plain 2.6.9,
: there have been various other fixes in the last months.
:

Just FWIW, this patch (applied to 2.6.10) seems to fix the problem for me. I was not able to reproduce it by running my test script for ~24 hours.

Thanks!

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Czech Linux Homepage: http://www.linux.cz/ |
> Whatever the Java applications and desktop dances may lead to, Unix will <
> still be pushing the packets around for a quite a while. --Rob Pike <
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few
> hours on a machine under real load. Perhaps us folks with the problem
> need to talk to the powers who be to come up with a strategy to make a
> report they can use. My guess is we're not sending something that can
> be used.

I need a way to reproduce it. Preferably on a hardware configuration that is running 2.6.10-ac10 or later because of the bio and acpi fixes. I'm not interested in any report including binary drivers and to be honest the less complex the configuration the better. I also care that the hardware passes memtest86+!

I also don't care about XFS although Christoph may well do.

Alan
journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
For more of this look up subjects:
  Bad things happening to journaled filesystem machines
  Oops in kjournald
and from author:
  Anders Saaby

I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few hours on a machine under real load. Perhaps us folks with the problem need to talk to the powers who be to come up with a strategy to make a report they can use. My guess is we're not sending something that can be used.

--
jeffrey hundstad

Jakob Oestergaard wrote:

> On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
> > On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > > So apart from the general well known instability problems that will
> > > occur when you actually start *using* the system, there should be no
> >
> > What known instabilities?
>
> Where should I begin? ;)
>
> Most of the following have already been posted to LKML - primarily by
> Anders ([EMAIL PROTECTED]) - it seems that no one cares, but I'll repost a
> summary that Anders sent me below:
>
> ---
> Scenario 1: Mailservers:
> 2.6.10 (~24-40 hours uptime):
> Running ext3 on mailqueue:
>
> <SNIP>
> Unable to handle kernel NULL pointer dereference at virtual address 0004
>  printing eip:
> c018a095
> *pde =
> Oops: 0002 [#1]
> SMP
> Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
> CPU: 2
> EIP: 0060:[<c018a095>] Not tainted
> EFLAGS: 00010286 (2.6.8.1)
> EIP is at journal_commit_transaction+0x535/0x10e5
> eax: cac1e26c ebx: ecx: f7cec400 edx: f7cec400
> esi: f65f3000 edi: cac1e26c ebp: f65f3000 esp: f65f3dc0
> ds: 007b es: 007b ss: 0068
> Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
> Stack: f65f3e64 f7cec400 cda565fc 149a 0004 f65f3e48 c01132d8 0002 c202ad20 0001 f65f3e5c c202ad20 c202ad20 0002 0001 001e 01c1af60 f65f3e68 c0407dc0
> Call Trace:
>  [<c01132d8>] scheduler_tick+0x468/0x470
>  [<c01127b5>] find_busiest_group+0x105/0x310
>  [<c011db8e>] del_timer_sync+0x7e/0xa0
>  [<c018cd4d>] kjournald+0xbd/0x230
>  [<c0114b10>] autoremove_wake_function+0x0/0x40
>  [<c0114b10>] autoremove_wake_function+0x0/0x40
>  [<c0103f16>] ret_from_fork+0x6/0x14
>  [<c018cc70>] commit_timeout+0x0/0x10
>  [<c018cc90>] kjournald+0x0/0x230
>  [<c01024bd>] kernel_thread_helper+0x5/0x18
> Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
> <2>SoftDog: Initiating system reboot
> </SNIP>
>
> ---
> Scenario 2: Mailservers:
> Running XFS on mailqueue:
>
> <SNIP>
> Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that is not in the AIL
> xfs_force_shutdown(sdb1,0x8) called from line 382 of file fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
> @Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 2731 (Red Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
> </SNIP>
>
> ===
> Resolution to the mailserver problem:
> 2.4.28 is perfectly stable on these machines.
>
> ---
> Scenario 3: Webservers:
> 2.6.10, 2.6.10-ac8 (~3-12 hours uptime):
>
> <SNIP>
> Unable to handle kernel paging request
> <2>SoftDog: Initiating system reboot.
> <SNIP>
>
> (No more...) :(
>
> ===
> Resolution to the webserver problem:
> 2.4.28/2.4.29-rc2 are stable here
>
> ---
> Scenario 4: Storageservers:
> 2.6.8.1:
> Oopses after ~5-10 hours with SMP on. - Cannot find the actual Oopses
> anymore and 2.6.8+ haven't been tested as we cannot afford any more
> downtime on these servers.
>
> ===
> Resolution to the storage server problem:
> 2.6.8.1 UP is stable (but oopses regularly after memory allocation
> failures)
>
> Hardware on all servers: IBM x335 and x345.
>
> Mentioned errors seen on a total of 17 servers.
Re: XFS: inode with st_mode == 0
Hi,

On Monday 17 January 2005 12:55, Jan-Frode Myklebust wrote:
>
> Guess we've been struggling with much of the same problems..

Seems like it. :)

> > ---
> > Scenario 2: Mailservers:
> > Running XFS on mailqueue:
>
> The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on
> fs/nfs/file.c seems stable on our mailserver running XFS on
> mail queue and spool (mbox). 4 days of uptime!

Yes - We had those errors too: "Kernel panic - not syncing: Attempting to free lock with active block list" - on 2.6.10 on the webservers, which was fixed with that particular patch. But this is a different error, as our mailservers don't act as NFS clients. All use local XFS.

Sad thing is that the mailservers crash every 10-20 hours on 2.6.x, but I'm not able to reproduce it in a test environment, and at the time of the original post to LKML no one was able to do anything about it without a reproducible testcase. :(

> > ===
> > Resolution to the storage server problem:
> > 2.6.8.1 UP is stable (but oopses regularly after memory allocation
> > failures)
>
> My XFS-fileserver ran 2.6.9-rc3 stable since october 25. Got lots of
> "possible deadlock in kmem_alloc (mode:0xd0)" this weekend, so I
> upgraded to plain 2.6.10. Seems OK so far.

OK, as far as I remember, we had the same messages in the kernel log when running with SMP.

--
Med venlig hilsen - Best regards - Meilleures salutations

Anders Saaby
Systems Engineer

Cohaesio A/S - Maglebjergvej 5D - DK-2800 Lyngby
Phone: +45 45 880 888 - Fax: +45 45 880 777
Mail: [EMAIL PROTECTED] - http://www.cohaesio.com
Re: XFS: inode with st_mode == 0
On Mon, Jan 17, 2005 at 11:07:46AM +0100, Jakob Oestergaard wrote:
>
> Where should I begin? ;)

Guess we've been struggling with much of the same problems..

> ---
> Scenario 2: Mailservers:
> Running XFS on mailqueue:

The 2.6.10-1.737_FC3 + 's/posix_lock_file/posix_lock_file_wait/' on fs/nfs/file.c seems stable on our mailserver running XFS on mail queue and spool (mbox). 4 days of uptime!

> ===
> Resolution to the storage server problem:
> 2.6.8.1 UP is stable (but oopses regularly after memory allocation
> failures)

My XFS-fileserver ran 2.6.9-rc3 stable since October 25. Got lots of "possible deadlock in kmem_alloc (mode:0xd0)" this weekend, so I upgraded to plain 2.6.10. Seems OK so far.

> Hardware on all servers: IBM x335 and x345.

Mail servers: Dell 2650, IBM ServeRAID 6M, EXP400.
File servers: IBM x330, qla2300, infortrend eonstor.

All running Whitebox/centos RHEL clone.

-jf
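For anyone wanting to try the same one-line change, the 's/posix_lock_file/posix_lock_file_wait/' substitution mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: `apply_substitution` is a hypothetical helper, the sample line is made up and is not quoted from fs/nfs/file.c, and the word-boundary anchors are an addition that the literal sed expression does not have (they keep an existing `posix_lock_file_wait` from being rewritten a second time).

```python
import re

# Match the bare identifier only. "_" counts as a word character, so
# "posix_lock_file_wait" does not match and is left untouched.
PATTERN = re.compile(r"\bposix_lock_file\b")


def apply_substitution(source: str) -> str:
    """Return source with each bare posix_lock_file call renamed."""
    return PATTERN.sub("posix_lock_file_wait", source)


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    sample = "status = posix_lock_file(filp, fl);"
    print(apply_substitution(sample))
```

Running the helper over a copy of the file and diffing the result against the original is a cheap way to confirm that only the intended call sites changed before rebuilding.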
Re: XFS: inode with st_mode == 0
On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
> On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > So apart from the general well known instability problems that will
> > occur when you actually start *using* the system, there should be no
>
> What known instabilities?

Where should I begin? ;)

Most of the following have already been posted to LKML - primarily by Anders ([EMAIL PROTECTED]) - it seems that noone cares, but I'll repost a summary that Anders sent me below:

---
Scenario 1: Mailservers:
2.6.10 (~24-40 hours uptime):
Running ext3 on mailqueue:

<SNIP>
Unable to handle kernel NULL pointer dereference at virtual address 0004
 printing eip:
c018a095
*pde =
Oops: 0002 [#1]
SMP
Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
CPU: 2
EIP: 0060:[<c018a095>] Not tainted
EFLAGS: 00010286 (2.6.8.1)
EIP is at journal_commit_transaction+0x535/0x10e5
eax: cac1e26c ebx: ecx: f7cec400 edx: f7cec400
esi: f65f3000 edi: cac1e26c ebp: f65f3000 esp: f65f3dc0
ds: 007b es: 007b ss: 0068
Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
Stack: f65f3e64 f7cec400 cda565fc 149a 0004 f65f3e48 c01132d8 0002 c202ad20 0001 f65f3e5c c202ad20 c202ad20 0002 0001 001e 01c1af60 f65f3e68 c0407dc0
Call Trace:
 [<c01132d8>] scheduler_tick+0x468/0x470
 [<c01127b5>] find_busiest_group+0x105/0x310
 [<c011db8e>] del_timer_sync+0x7e/0xa0
 [<c018cd4d>] kjournald+0xbd/0x230
 [<c0114b10>] autoremove_wake_function+0x0/0x40
 [<c0114b10>] autoremove_wake_function+0x0/0x40
 [<c0103f16>] ret_from_fork+0x6/0x14
 [<c018cc70>] commit_timeout+0x0/0x10
 [<c018cc90>] kjournald+0x0/0x230
 [<c01024bd>] kernel_thread_helper+0x5/0x18
Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
<2>SoftDog: Initiating system reboot
</SNIP>

---
Scenario 2: Mailservers:
Running XFS on mailqueue:

<SNIP>
Filesystem "sdb1": xfs_trans_delete_ail: attempting to delete a log item that is not in the AIL
xfs_force_shutdown(sdb1,0x8) called from line 382 of file fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
@Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 2731 (Red Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
</SNIP>

===
Resolution to the mailserver problem:
2.4.28 is perfectly stable on these machines.

---
Scenario 3: Webservers:
2.6.10, 2.6.10-ac8 (~3-12 hours uptime):

<SNIP>
Unable to handle kernel paging request
<2>SoftDog: Initiating system reboot.
<SNIP>

(No more...) :(

===
Resolution to the webserver problem:
2.4.28/2.4.29-rc2 are stable here

---
Scenario 4: Storageservers:
2.6.8.1:
Oopses after ~5-10 hours whith SMP on. - Cannot find the actual Oopses anymore and 2.6.8+ havent been tested as we cannot afford anymore downtime on these servers.

===
Resolution to the storage server problem:
2.6.8.1 UP is stable (but oopses regularly after memory allocation failures)

Hardware on all servers: IBM x335 and x345.

Mentioned errors seen on a total of 17 servers.

--

 / jakob
journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
For more of this, look up the subjects:
  "Bad things happening to journaled filesystem machines"
  "Oops in kjournald"
and the author:
  Anders Saaby

I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few hours on a machine under real load. Perhaps us folks with the problem need to talk to the powers that be to come up with a strategy to make a report they can use. My guess is we're not sending something that can be used.

-- 
jeffrey hundstad

Jakob Oestergaard wrote:
> On Sun, Jan 16, 2005 at 01:51:12PM +, Christoph Hellwig wrote:
> > On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> > > So apart from the general well known instability problems that will
> > > occur when you actually start *using* the system, there should be no
> >
> > What known instabilities?
>
> Where should I begin? ;)
>
> Most of the following has already been posted to LKML - primarily by
> Anders ([EMAIL PROTECTED]) - it seems that no one cares, but I'll repost
> a summary that Anders sent me below:
>
> --- Scenario 1: Mailservers:
> 2.6.10 (~24-40 hours uptime):
> Running ext3 on mailqueue:
>
> SNIP
> Unable to handle kernel NULL pointer dereference at virtual address 0004
> printing eip: c018a095
> *pde =
> Oops: 0002 [#1]
> SMP
> Modules linked in: nfs e1000 iptable_nat ipt_connlimit rtc
> CPU:2
> EIP:0060:[c018a095]Not tainted
> EFLAGS: 00010286 (2.6.8.1)
> EIP is at journal_commit_transaction+0x535/0x10e5
> eax: cac1e26c ebx: ecx: f7cec400 edx: f7cec400
> esi: f65f3000 edi: cac1e26c ebp: f65f3000 esp: f65f3dc0
> ds: 007b es: 007b ss: 0068
> Process kjournald (pid: 174, threadinfo=f65f3000 task=c2308b70)
> Stack: f65f3e64 f7cec400 cda565fc 149a 0004 f65f3e48 c01132d8 0002 c202ad20 0001 f65f3e5c c202ad20 c202ad20 0002 0001 001e 01c1af60 f65f3e68 c0407dc0
> Call Trace:
> [c01132d8] scheduler_tick+0x468/0x470
> [c01127b5] find_busiest_group+0x105/0x310
> [c011db8e] del_timer_sync+0x7e/0xa0
> [c018cd4d] kjournald+0xbd/0x230
> [c0114b10] autoremove_wake_function+0x0/0x40
> [c0114b10] autoremove_wake_function+0x0/0x40
> [c0103f16] ret_from_fork+0x6/0x14
> [c018cc70] commit_timeout+0x0/0x10
> [c018cc90] kjournald+0x0/0x230
> [c01024bd] kernel_thread_helper+0x5/0x18
> Code: f0 ff 43 04 8b 03 83 e0 04 74 4c 8b 8c 24 b8 01 00 00 c6 81
> 2SoftDog: Initiating system reboot
> /SNIP
>
> --- Scenario 2: Mailservers:
> Running XFS on mailqueue:
>
> SNIP
> Filesystem sdb1: xfs_trans_delete_ail: attempting to delete a log item that is not in the AIL
> xfs_force_shutdown(sdb1,0x8) called from line 382 of file fs/xfs/xfs_trans_ail.c. Return address = 0xc0216a56
> @Linux version 2.6.9 ([EMAIL PROTECTED]) (gcc version 2.96 2731 (Red Hat Linux 7.3 2.96-113)) #1 SMP Tue Oct 19 16:04:55 CEST 2004
> /SNIP
>
> === Resolution to the mailserver problem:
> 2.4.28 is perfectly stable on these machines.
>
> --- Scenario 3: Webservers:
> 2.6.10, 2.6.10-ac8 (~3-12 hours uptime):
>
> SNIP
> Unable to handle kernel paging request
> 2SoftDog: Initiating system reboot.
> SNIP
> (No more...) :(
>
> === Resolution to the webserver problem:
> 2.4.28/2.4.29-rc2 are stable here
>
> --- Scenario 4: Storageservers:
> 2.6.8.1: Oopses after ~5-10 hours with SMP on. - Cannot find the actual
> Oopses anymore, and 2.6.8+ hasn't been tested as we cannot afford any
> more downtime on these servers.
>
> === Resolution to the storage server problem:
> 2.6.8.1 UP is stable (but oopses regularly after memory allocation failures)
>
> Hardware on all servers: IBM x335 and x345.
>
> Mentioned errors seen on a total of 17 servers.
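Since the same errors are reported on 17 servers, comparing call traces by symbol rather than by eye may help when putting together a report. A rough sketch in Python follows, assuming each frame looks like the entries in the trace above, i.e. "[address] symbol+0xoffset/0xsize". The Frame type and the parse_frames name are hypothetical helpers for illustration. Tolerating angle brackets around the address is an assumption about how the log looked before it was flattened in this archive.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: int  # return address printed in brackets
    symbol: str   # function name
    offset: int   # offset into the function
    size: int     # function size as printed after the slash


# One entry as it appears above: [c018cd4d] kjournald+0xbd/0x230
# The "<" and ">" are optional (assumed to have been lost in the archive).
_FRAME = re.compile(r"\[<?([0-9a-f]+)>?\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(text: str) -> List[Frame]:
    """Extract every call-trace frame found in a block of log text."""
    return [
        Frame(int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in _FRAME.findall(text)
    ]


# Two frames copied from the trace above.
trace = "[c01132d8] scheduler_tick+0x468/0x470 [c018cd4d] kjournald+0xbd/0x230"
for frame in parse_frames(trace):
    print(frame.symbol, hex(frame.offset))
```

The "EIP is at ..." line carries no bracketed address and is deliberately not matched; it would need separate handling.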
Re: journaled filesystems -- known instability; Was: XFS: inode with st_mode == 0
On Mon, 2005-01-17 at 21:31, Jeffrey Hundstad wrote:
> I also can't keep a recent 2.6 or 2.6*-ac* kernel up more than a few
> hours on a machine under real load. Perhaps us folks with the problem
> need to talk to the powers that be to come up with a strategy to make a
> report they can use. My guess is we're not sending something that can
> be used.

I need a way to reproduce it, preferably on a hardware configuration that is running 2.6.10-ac10 or later because of the bio and acpi fixes. I'm not interested in any report including binary drivers and, to be honest, the less complex the configuration the better. I also care that the hardware passes memtest86+!

I also don't care about XFS, although Christoph may well do.

Alan
Re: XFS: inode with st_mode == 0
On Sat, Jan 15, 2005 at 01:09:08PM +1100, Nathan Scott wrote:
...
> > AFAIK the best you can do is to get the most recent XFS kernel from
> > SGI's CVS (this one is based on 2.6.10).
>
> The -mm tree also has these fixes; we'll get them merged into
> mainline soon.

Okeydokey - good

> > If you run that kernel, then most of the former problems will be gone;
> > *) I only have one undeletable directory on my system - so it seems that
> > this error is no longer common ;)
>
> You may need to run xfs_repair to clean that up..? Or does
> the problem persist after a repair?

I'm running Debian Woody - the xfs_check/xfs_repair there didn't seem to find anything last I tried. I have not re-checked for this last problem though.

I figured I might need to run the CVS version of the xfs tools, and, well, me being busy and all, I thought I'd just leave the 'delete_me' directory hanging until I have more time on my hands ;)

-- 
/ jakob
Re: XFS: inode with st_mode == 0
On Fri, Jan 14, 2005 at 07:23:09PM +0100, Jakob Oestergaard wrote:
> So apart from the general well known instability problems that will
> occur when you actually start *using* the system, there should be no

What known instabilities?