Re: all processes waiting in TASK_UNINTERRUPTIBLE state
On Friday 29 June 2001 22:40, Jeff Dike wrote:
> The bug was UML-specific and specific in such a way that I don't think it's
> possible to find the bug in the native kernel by making analogies from the
> UML bug.

Heh, too bad, there goes that chance to show uml bagging a major kernel bug.
But it's just a matter of time...

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
On Tue, Jun 26, 2001 at 10:47:12AM -0400, Bulent Abali wrote:
> Andrea,
> I would like try your patch but so far I can trigger the bug only when
> running TUX 2.0-B6 which runs on 2.4.5-ac4. /bulent
>

to run tux you can apply those patches in `ls` order to 2.4.6pre5aa1:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre5aa1/30_tux/*

Andrea
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
>> I am running in to a problem, seemingly a deadlock situation, where almost
>> all the processes end up in the TASK_UNINTERRUPTIBLE state. All the
>
>could you try to reproduce with this patch applied on top of
>2.4.6pre5aa1 or 2.4.6pre5 vanilla?

Andrea,
I would like to try your patch but so far I can trigger the bug only when
running TUX 2.0-B6 which runs on 2.4.5-ac4. /bulent
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
On Tue, 26 Jun 2001, Christian Ehrhardt wrote:
> > I've seen this under UML, Rik van Riel has seen it on a physical box, and we
> > suspect that they're the same problem (i.e. mine isn't a UML-specific bug).
>
> Could it be smbfs?

No. I've seen the hang on pure ext2.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
On Mon, Jun 25, 2001 at 12:05:14PM -0500, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > I am running in to a problem, seemingly a deadlock situation, where
> > almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
> > All the process eventually stop responding, including login shell, no
> > screen updates, keyboard etc. Can ping and sysrq key works. I
> > traced the tasks through sysrq-t key. The processors are in the idle
> > state. Tasks all seem to get stuck in the __wait_on_page or
> > __lock_page.
>
> I've seen this under UML, Rik van Riel has seen it on a physical box, and we
> suspect that they're the same problem (i.e. mine isn't a UML-specific bug).
>
> I've done some poking at the problem, but haven't really learned anything
> except that something is locking pages and not unlocking them. Figuring out
> who that is was going to be my next step.

Could it be smbfs? The following piece of code from smb_writepage looks
like it could return with the page locked:

static int
smb_writepage(struct page *page)
{
	/* */

	/* easy case */
	if (page->index < end_index)
		goto do_it;
	/* things got complicated... */
	offset = inode->i_size & (PAGE_CACHE_SIZE-1);
	/* OK, are we completely out? */
	if (page->index >= end_index+1 || !offset)
		return -EIO;		<= This looks bad!
do_it:
	get_page(page);
	err = smb_writepage_sync(inode, page, 0, offset);

	SetPageUptodate(page);
	UnlockPage(page);
	put_page(page);
	return err;
}

regards Christian

--
THAT'S ALL FOLKS!
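For readers following the smb_writepage observation above, here is a minimal user-space sketch of the pattern Christian is pointing at: an early error return that skips the unlock. It is not kernel code. The `sim_` names, the `sim_page` struct and the simplified arguments are all hypothetical, and the sketch assumes (as the message does) that the caller hands the page in locked and expects it to come back unlocked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page: only the index and lock bit are modelled. */
struct sim_page {
    unsigned long index;
    bool locked;
};

void sim_lock_page(struct sim_page *page)
{
    assert(!page->locked);      /* this sketch never locks a page twice */
    page->locked = true;
}

void sim_unlock_page(struct sim_page *page)
{
    assert(page->locked);       /* unlocking an unlocked page is a bug */
    page->locked = false;
}

/* Same control flow as the quoted function: the -EIO path skips the unlock. */
int sim_writepage_leaky(struct sim_page *page, unsigned long end_index,
                        unsigned long offset)
{
    if (page->index < end_index)
        goto do_it;
    if (page->index >= end_index + 1 || !offset)
        return -EIO;            /* page->locked is still true here */
do_it:
    sim_unlock_page(page);      /* stands in for the write plus UnlockPage() */
    return 0;
}

/* One possible shape for a balanced version: every exit passes the unlock. */
int sim_writepage_balanced(struct sim_page *page, unsigned long end_index,
                           unsigned long offset)
{
    int err = 0;

    if (page->index >= end_index &&
        (page->index >= end_index + 1 || !offset))
        err = -EIO;             /* record the error, do not return yet */
    sim_unlock_page(page);
    return err;
}
```

Calling `sim_writepage_leaky()` on a locked page whose index is past `end_index` returns `-EIO` with `locked` still set, which is the state a later waiter would block on. Whether this path is actually reachable in smbfs is not established anywhere in this thread.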
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
Hi again,

I have a stack now:

#0 schedule () at sched.c:536
#1 0x1002f932 in __wait_on_buffer (bh=0x50eb16e4) at buffer.c:157
#2 0x10036f46 in block_read (filp=0x5009787c, buf=0x80c08f0 "¤\201",
   count=8192, ppos=0x5009789c)
   at /home/mistral/dev/kernel/linux-2.4.5-um9/include/linux/locks.h:20
#3 0x1002eb4b in sys_read (fd=3, buf=0x80c00f0 "¤\201", count=8192)
   at read_write.c:133
#4 0x100fb807 in execute_syscall (regs={regs = {3, 135004400, 8192,
   1283476480, 0, 3212835652, 4294967258, 43, 43, 0, 0, 3, 1074582884, 35,
   582, 3212835588, 43}}) at syscall_kern.c:332
#5 0x100fb926 in syscall_handler (unused=0x0) at syscall_user.c:80

This should still be on #umldebug on irc.openproject.net for the next few
hours if anybody is interested in taking a look through gdb via a bot.

James

--
-
Web: http://www.stev.org
Mobile: +44 07779080838
E-Mail: [EMAIL PROTECTED]
8:30pm up 2 days, 42 min, 4 users, load average: 1.00, 0.94, 0.64
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
Hi,

I have been looking at it a lot over the past few days; I seem to be the
person who can trigger it most easily. Over the past couple of days I have
been running with

#define WAITQUEUE_DEBUG 1

No problems seem to have appeared there, though, and the bug still triggers.

On Mon, 25 Jun 2001, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > I am running in to a problem, seemingly a deadlock situation, where
> > almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
> > All the process eventually stop responding, including login shell, no
> > screen updates, keyboard etc. Can ping and sysrq key works. I
> > traced the tasks through sysrq-t key. The processors are in the idle
> > state. Tasks all seem to get stuck in the __wait_on_page or
> > __lock_page.

I also seem to get it in __wait_on_buffer and ___wait_on_page.

James

--
-
Web: http://www.stev.org
Mobile: +44 07779080838
E-Mail: [EMAIL PROTECTED]
8:00pm up 2 days, 12 min, 4 users, load average: 1.41, 0.38, 0.40
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
On Mon, 25 Jun 2001, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > Can you give more details? Was there an aic7xxx scsi driver on the
> > box? run_task_queue(&tq_disk) should eventually unlock those pages but
> > they remain locked. I am trying to narrow it down to fs/buffer code
> > or the SCSI driver aic7xxx in my case.
>
> Rik would be the one to tell you whether there was an aic7xxx driver
> on the physical box. There obviously isn't one on UML, so if we're
> looking at the same bug, it's in the generic code.

The box has an AIC-7880U controller. OTOH, my dual P5 also
has an AIC7xxx controller and I've never seen the problem
there...

On our quad Xeon this problem really seems to be phase-of-moon
related; it hasn't shown up in the last 5 days or so under heavy
stress testing, but when the kernel is compiled just a little bit
different it doesn't happen. ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: all processes waiting in TASK_UNINTERRUPTIBLE state
>[EMAIL PROTECTED] said:
>> I am running in to a problem, seemingly a deadlock situation, where
>> almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
>> All the process eventually stop responding, including login shell, no
>> screen updates, keyboard etc. Can ping and sysrq key works. I
>> traced the tasks through sysrq-t key. The processors are in the idle
>> state. Tasks all seem to get stuck in the __wait_on_page or
>> __lock_page.
>
>I've seen this under UML, Rik van Riel has seen it on a physical box, and we
>suspect that they're the same problem (i.e. mine isn't a UML-specific bug).

Can you give more details? Was there an aic7xxx scsi driver on the box?
run_task_queue(&tq_disk) should eventually unlock those pages but they remain
locked. I am trying to narrow it down to fs/buffer code or the SCSI driver
aic7xxx in my case.
Thanks. /bulent
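To make the expectation in the message above concrete ("running the disk task queue should eventually unlock those pages"), the following is a small user-space model of a waiter that kicks a queue of pending I/O and rechecks the lock bit. Everything here is an assumption for illustration: the `sim_` names are hypothetical, the queue merely stands in for tq_disk, and the loop is bounded so that a page whose completion was never queued shows up as a return value of -1 instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_QUEUE_MAX 8

/* Hypothetical stand-in for a page: only the lock bit is modelled. */
struct sim_page {
    bool locked;
};

/* Hypothetical stand-in for tq_disk: pages whose I/O is still pending. */
struct sim_queue {
    struct sim_page *pending[SIM_QUEUE_MAX];
    size_t count;
};

/* Queue an I/O for a page; returns false if the queue is full. */
bool sim_queue_io(struct sim_queue *q, struct sim_page *page)
{
    if (q->count == SIM_QUEUE_MAX)
        return false;
    q->pending[q->count++] = page;
    return true;
}

/* "Run" the queue: every pending I/O completes and unlocks its page. */
void sim_run_queue(struct sim_queue *q)
{
    assert(q->count <= SIM_QUEUE_MAX);
    for (size_t i = 0; i < q->count; i++)
        q->pending[i]->locked = false;
    q->count = 0;
}

/*
 * Bounded model of the wait loop: returns the number of passes taken before
 * the page was seen unlocked, or -1 if it is still locked after max_passes.
 */
int sim_wait_on_page(struct sim_queue *q, struct sim_page *page,
                     int max_passes)
{
    for (int pass = 0; pass < max_passes; pass++) {
        if (!page->locked)
            return pass;
        sim_run_queue(q);       /* the real loop would then sleep */
    }
    return page->locked ? -1 : max_passes;
}
```

A page that was queued is unlocked after one pass. A locked page that nothing ever queued stays locked no matter how often the queue is run, which in this model corresponds to the reported symptom of tasks stuck in TASK_UNINTERRUPTIBLE. The sketch does not say which component failed to complete the I/O.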
all processes waiting in TASK_UNINTERRUPTIBLE state
keywords: tux, aic7xxx, 2.4.5-ac4, specweb99, __wait_on_page, __lock_page

Greetings,

I am running in to a problem, seemingly a deadlock situation, where almost
all the processes end up in the TASK_UNINTERRUPTIBLE state. All the
processes eventually stop responding, including the login shell; there are
no screen updates, no keyboard response, etc. Ping still works and so does
the sysrq key. I traced the tasks through the sysrq-t key. The processors
are in the idle state. Tasks all seem to get stuck in __wait_on_page or
__lock_page. It appears from the source that they are waiting for pages to
be unlocked. run_task_queue(&tq_disk) should eventually cause the pages to
unlock, but it doesn't happen. Is anybody familiar with this problem, or
has anybody seen it before?

Thanks for any comments.
Bulent

Here are the conditions:
Dual PIII, 1GHz, 1GB of memory, aic7xxx scsi driver, acenic eth.
This occurs while the TUX (2.4.5-B6) webserver is being driven by the
SPECWeb99 benchmark at a rate of 800 c/s. The system is very busy doing
disk and network I/O. The problem sometimes occurs within an hour and
sometimes 10-20 hours into the run.
Bulent

Process: 0, { swapper}
EIP: 0010:[] CPU: 1 EFLAGS: 0246
EAX: EBX: c0105220 ECX: c2afe000 EDX: 0025
ESI: c2afe000 EDI: c2afe000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049df0 CR3: 268e CR4: 06d0
Call Trace: [] [] []

SysRq : Show Regs

Process: 0, { swapper}
EIP: 0010:[] CPU: 0 EFLAGS: 0246
EAX: EBX: c0105220 ECX: c030a000 EDX:
ESI: c030a000 EDI: c030a000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049f7c CR3: 37a63000 CR4: 06d0
Call Trace: [] [] []

SysRq : Show Regs

EIP: 0010:[] CPU: 1 EFLAGS: 0246
Using defaults from ksymoops -t elf32-i386 -a i386
EAX: EBX: c0105220 ECX: c2afe000 EDX: 0025
ESI: c2afe000 EDI: c2afe000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049df0 CR3: 268e CR4: 06d0
Call Trace: [] [] []

EIP: 0010:[] CPU: 0 EFLAGS: 0246
EAX: EBX: c0105220 ECX: c030a000 EDX:
ESI: c030a000 EDI: c030a000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049f7c CR3: 37a63000 CR4: 06d0
Call Trace: [] [] []

>>EIP; c010524d<=
Trace; c01052d2
Trace; c0119186 <__call_console_drivers+46/60>
Trace; c01192fb

>>EIP; c010524d<=
Trace; c01052d2
Trace; c0105000
Trace; c01001cf

=

SysRq : Show Memory
Mem-info:
Free pages:4300kB ( 792kB HighMem)
( Active: 200434, inactive_dirty: 26808, inactive_clean: 1472, free: 1075 (574 1148 1722) )
24*4kB 15*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 728kB)
493*4kB 3*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2780kB)
0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 792kB)
Swap cache: add 2711, delete 643, find 5301/6721
Free swap: 2087996kB
253932 pages of RAM
24556 pages of HIGHMEM
7212 reserved pages
221419 pages shared
2068 pages swap cached
0 pages in page table cache
Buffer memory:12164kB
CLEAN: 2322 buffers, 9276 kbyte, 3 used (last=2322), 2 locked, 0 protected, 0 dirty
LOCKED: 405 buffers, 1608 kbyte, 39 used (last=404), 348 locked, 0 protected, 0 dirty
DIRTY: 322 buffers, 1288 kbyte, 0 used (last=0), 322 locked, 0 protected, 322 dirty

=

async IO 0/2 D 0013 0 1061 1059 1062 (NOTLB)
Call Trace: [] [] [] [] [] [] [] [] [] []
Trace; c012e121 <___wait_on_page+91/c0>
Trace; c012f059
Trace; c02614d7
Trace; c0258c44
Trace; c02588c0
Trace; c025c65a
Trace; c0256848
Trace; c0258478
Trace; c0105636
Trace; c02582a0

==

bash D C2AE541C 0 920912 (NOTLB)
Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [
Trace; c012e1e1 <__lock_page+91/c0>
Trace; c012e04d
Trace; c016b880
Trace; c012fdac
Trace; c012a49a
Trace; c012a76a
Trace; c012a8cb
Trace; c021814c
Trace; c0113ed0
Trace; c0114106
Trace; c0118aa5
Trace; c01417d2
Trace; c011e25b
Trace; c0113ed0
Trace; c01075b8

void ___wait_on_page(struct page *page)
{
	struct task_struct *tsk = current;
	DECLARE_WAITQUEUE(wait, tsk);

	add_wait_queue(&page->wait, &wait);
	do {
		sync_page(page);
		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
		if (!PageLocked(page))
			break;
		run_task_queue(&tq_disk);
		schedule();
	} while (PageLocked(page));
	tsk->state = TASK_RUNNING;
	remove_wait_queue(&page->wait, &wait);
}

static void __lock_page(struct page *page)
{
	struct task_struct *tsk = current;
	DECLARE_WAITQUEUE(wait, tsk);

	add_wait_queue_exclusive(&page->wait, &wait);
	for (;;) {
		sync_page(page);
		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
		if (PageLocked(page)) {