Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-29 Thread Daniel Phillips

On Friday 29 June 2001 22:40, Jeff Dike wrote:
> The bug was UML-specific and specific in such a way that I don't think it's
> possible to find the bug in the native kernel by making analogies from the
> UML bug.

Heh, too bad, there goes that chance to show uml bagging a major kernel bug.  
But it's just a matter of time...

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-26 Thread Andrea Arcangeli

On Tue, Jun 26, 2001 at 10:47:12AM -0400, Bulent Abali wrote:
> Andrea,
> I would like try your patch but so far I can trigger the bug only when
> running TUX 2.0-B6 which runs on 2.4.5-ac4.  /bulent
> 

to run tux you can apply those patches in `ls` order to 2.4.6pre5aa1:


ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.6pre5aa1/30_tux/*

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-26 Thread Bulent Abali



>> I am running in to a problem, seemingly a deadlock situation, where
almost
>> all the processes end up in the TASK_UNINTERRUPTIBLE state.   All the
>
>could you try to reproduce with this patch applied on top of
>2.4.6pre5aa1 or 2.4.6pre5 vanilla?

Andrea,
I would like try your patch but so far I can trigger the bug only when
running TUX 2.0-B6 which runs on 2.4.5-ac4.  /bulent



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-26 Thread Rik van Riel

On Tue, 26 Jun 2001, Christian Ehrhardt wrote:

> > I've seen this under UML, Rik van Riel has seen it on a physical box, and we 
> > suspect that they're the same problem (i.e. mine isn't a UML-specific bug).
> 
> Could it be smbfs?

No. I've seen the hang on pure ex2.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-26 Thread Christian Ehrhardt

On Mon, Jun 25, 2001 at 12:05:14PM -0500, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > I am running in to a problem, seemingly a deadlock situation, where
> > almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
> > All the process eventually stop responding, including login shell, no
> > screen updates, keyboard etc.  Can ping and sysrq key works.   I
> > traced the tasks through sysrq-t key.  The processors are in the idle
> > state.  Tasks all seem to get stuck in the __wait_on_page or
> > __lock_page.
> 
> I've seen this under UML, Rik van Riel has seen it on a physical box, and we 
> suspect that they're the same problem (i.e. mine isn't a UML-specific bug).
> 
> I've done some poking at the problem, but haven't really learned anything 
> except that something is locking pages and not unlocking them.  Figuring out 
> who that is was going to be my next step.

Could it be smbfs? The following piece of code from smb_writepage
looks like it could return with the page locked:


  static int
  smb_writepage(struct page *page)
  {

  /*   */

  /* easy case */
  if (page->index < end_index)
  goto do_it;
  /* things got complicated... */
  offset = inode->i_size & (PAGE_CACHE_SIZE-1);
  /* OK, are we completely out? */
  if (page->index >= end_index+1 || !offset)
  return -EIO;   <=   This looks bad!
  do_it:
  get_page(page);
  err = smb_writepage_sync(inode, page, 0, offset);
  SetPageUptodate(page);
  UnlockPage(page);
  put_page(page);
  return err;
  }


   regards Christian

-- 
THAT'S ALL FOLKS!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-25 Thread James Stevenson


Hi again

i have a stack now an

#0  schedule () at sched.c:536
#1  0x1002f932 in __wait_on_buffer (bh=0x50eb16e4) at  buffer.c:157
#2  0x10036f46 in block_read (filp=0x5009787c, buf=0x80c08f0
"¤\201", count=8192, ppos=0x5009789c) at
/home/mistral/dev/kernel/linux-2.4.5-um9/include/linux/locks.h:20
#3  0x1002eb4b in sys_read (fd=3, buf=0x80c00f0 "¤\201", count=8192)
at read_write.c:133
#4  0x100fb807 in execute_syscall (regs={regs = {3, 135004400, 8192,
1283476480, 0, 3212835652, 4294967258, 43, 43, 0, 0, 3,
1074582884, 35, 582, 3212835588, 43}}) at syscall_kern.c:332
#5  0x100fb926 in syscall_handler (unused=0x0) at
syscall_user.c:80

this should still be on #umldebug on irc.openproject.net
for the next few ours if anybodys intresting at taking a look though
gdb via a bot.

James

-- 
-
Web: http://www.stev.org
Mobile: +44 07779080838
E-Mail: [EMAIL PROTECTED]
  8:30pm  up 2 days, 42 min,  4 users,  load average: 1.00, 0.94, 0.64

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-25 Thread James Stevenson


Hi

i have been looking at it a lot over the past few days i seem to be the
person who can trigger it easyest.

over the past couple of days i have been running with the
#define WAITQUEUE_DEBUG 1
no problems seem to have appeared there though and the bug still triggers.

On Mon, 25 Jun 2001, Jeff Dike wrote:

> [EMAIL PROTECTED] said:
> > I am running in to a problem, seemingly a deadlock situation, where
> > almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
> > All the process eventually stop responding, including login shell, no
> > screen updates, keyboard etc.  Can ping and sysrq key works.   I
> > traced the tasks through sysrq-t key.  The processors are in the idle
> > state.  Tasks all seem to get stuck in the __wait_on_page or
> > __lock_page.

i also seem to get ut ub __wait_on_buffer and ___wait_on_page

James
-- 
-
Web: http://www.stev.org
Mobile: +44 07779080838
E-Mail: [EMAIL PROTECTED]
  8:00pm  up 2 days, 12 min,  4 users,  load average: 1.41, 0.38, 0.40

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-25 Thread Rik van Riel

On Mon, 25 Jun 2001, Jeff Dike wrote:
> [EMAIL PROTECTED] said:
> > Can you give more details?  Was there an aic7xxx scsi driver on the
> > box? run_task_queue(&tq_disk) should eventually unlock those pages but
> > they remain locked.  I am trying to narrow it down to fs/buffer code
> > or the SCSI driver aic7xxx in my case.
> 
> Rik would be the one to tell you whether there was an aic7xxx driver
> on the physical box.  There obviously isn't one on UML, so if we're
> looking at the same bug, it's in the generic code.

The box has as AIC-7880U controller. OTOH, my dual P5 also has
an AIC7xxx controller and I've never seen the problem there...

On our quad Xeon this problem really seems to be phase-of-moon
related; it hasn't shown up in the last 5 days or so under heavy
stress testing, but when the kernel is compiled just a little bit
different it doesn't happen. ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/ http://distro.conectiva.com/

Send all your spam to [EMAIL PROTECTED] (spam digging piggy)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-25 Thread Bulent Abali



>[EMAIL PROTECTED] said:
>> I am running in to a problem, seemingly a deadlock situation, where
>> almost all the processes end up in the TASK_UNINTERRUPTIBLE state.
>> All the process eventually stop responding, including login shell, no
>> screen updates, keyboard etc.  Can ping and sysrq key works.   I
>> traced the tasks through sysrq-t key.  The processors are in the idle
>> state.  Tasks all seem to get stuck in the __wait_on_page or
>> __lock_page.
>
>I've seen this under UML, Rik van Riel has seen it on a physical box, and
we
>suspect that they're the same problem (i.e. mine isn't a UML-specific
bug).

Can you give more details?  Was there an aic7xxx scsi driver on the box?
run_task_queue(&tq_disk) should eventually unlock those pages
but they remain locked.  I am trying to narrow it down to fs/buffer
code or the SCSI driver aic7xxx in my case. Thanks. /bulent



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



all processes waiting in TASK_UNINTERRUPTIBLE state

2001-06-25 Thread Bulent Abali


keywords:  tux, aic7xxx, 2.4.5-ac4, specweb99, __wait_on_page, __lock_page

Greetings,

I am running in to a problem, seemingly a deadlock situation, where almost
all the processes end up in the TASK_UNINTERRUPTIBLE state.   All the
process eventually stop responding, including login shell, no screen
updates, keyboard etc.  Can ping and sysrq key works.   I traced the tasks
through sysrq-t key.  The processors are in the idle state.  Tasks all seem
to get stuck in the __wait_on_page or __lock_page.  It appears from the
source that they are waiting for pages to be unlocked.   run_task_queue
(&tq_disk) should eventually cause pages to unlock but it doesn't happen.
Anybody familiar with this problem or have seen it before?  Thanks for any
comments.
Bulent

Here are the conditions:
Dual PIII, 1GHz, 1GB of memory,  aic7xxx scsi driver, acenic eth.
This occurs while TUX  (2.4.5-B6) webserver is being driven by SPECWeb99
benchmark at a rate of 800 c/s.  The system is very busy doing disk and
network I/O.  Problem occurs sometimes in an hour and sometimes 10-20 hours
in to the running.

Bulent


Process: 0, { swapper}
EIP: 0010:[] CPU: 1 EFLAGS: 0246
EAX:  EBX: c0105220 ECX: c2afe000 EDX: 0025
ESI: c2afe000 EDI: c2afe000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049df0 CR3: 268e CR4: 06d0
Call Trace: [] [] []
SysRq : Show Regs

Process: 0, { swapper}
EIP: 0010:[] CPU: 0 EFLAGS: 0246
EAX:  EBX: c0105220 ECX: c030a000 EDX: 
ESI: c030a000 EDI: c030a000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049f7c CR3: 37a63000 CR4: 06d0
Call Trace: [] [] []
SysRq : Show Regs

EIP: 0010:[] CPU: 1 EFLAGS: 0246
Using defaults from ksymoops -t elf32-i386 -a i386
EAX:  EBX: c0105220 ECX: c2afe000 EDX: 0025
ESI: c2afe000 EDI: c2afe000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049df0 CR3: 268e CR4: 06d0
Call Trace: [] [] []

EIP: 0010:[] CPU: 0 EFLAGS: 0246
EAX:  EBX: c0105220 ECX: c030a000 EDX: 
ESI: c030a000 EDI: c030a000 EBP: c0105220 DS: 0018 ES: 0018
CR0: 8005003b CR2: 08049f7c CR3: 37a63000 CR4: 06d0
Call Trace: [] [] []

>>EIP; c010524d<=
Trace; c01052d2 
Trace; c0119186 <__call_console_drivers+46/60>
Trace; c01192fb 

>>EIP; c010524d<=
Trace; c01052d2 
Trace; c0105000 
Trace; c01001cf 

=

SysRq : Show Memory
Mem-info:
Free pages:4300kB (   792kB HighMem)
( Active: 200434, inactive_dirty: 26808, inactive_clean: 1472, free: 1075
(574 1148 1722) )
24*4kB 15*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 728kB)
493*4kB 3*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB
0*2048kB 0*4096kB = 2780kB)
0*4kB 1*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB
0*4096kB = 792kB)
Swap cache: add 2711, delete 643, find 5301/6721
Free swap:   2087996kB
253932 pages of RAM
24556 pages of HIGHMEM
7212 reserved pages
221419 pages shared
2068 pages swap cached
0 pages in page table cache
Buffer memory:12164kB
CLEAN: 2322 buffers, 9276 kbyte, 3 used (last=2322), 2 locked, 0
protected, 0 dirty
   LOCKED: 405 buffers, 1608 kbyte, 39 used (last=404), 348 locked, 0
protected, 0 dirty
DIRTY: 322 buffers, 1288 kbyte, 0 used (last=0), 322 locked, 0
protected, 322 dirty

=

async IO 0/2  D 0013 0  1061   1059  1062   (NOTLB)
Call Trace: [] [] [] []
[]
   [] [] [] [] []

Trace; c012e121 <___wait_on_page+91/c0>
Trace; c012f059 
Trace; c02614d7 
Trace; c0258c44 
Trace; c02588c0 
Trace; c025c65a 
Trace; c0256848 
Trace; c0258478 
Trace; c0105636 
Trace; c02582a0 


==

bash  D C2AE541C 0   920912 (NOTLB)
Call Trace: [] [] [] []
[]
   [] [] [] [] []
[]
   [] [] [] [

Trace; c012e1e1 <__lock_page+91/c0>
Trace; c012e04d 
Trace; c016b880 
Trace; c012fdac 
Trace; c012a49a 
Trace; c012a76a 
Trace; c012a8cb 
Trace; c021814c 
Trace; c0113ed0 
Trace; c0114106 
Trace; c0118aa5 
Trace; c01417d2 
Trace; c011e25b 
Trace; c0113ed0 
Trace; c01075b8 



void ___wait_on_page(struct page *page)
{
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);

add_wait_queue(&page->wait, &wait);
do {
sync_page(page);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
if (!PageLocked(page))
break;
run_task_queue(&tq_disk);
schedule();
} while (PageLocked(page));
tsk->state = TASK_RUNNING;
remove_wait_queue(&page->wait, &wait);
}

static void __lock_page(struct page *page)
{
struct task_struct *tsk = current;
DECLARE_WAITQUEUE(wait, tsk);

add_wait_queue_exclusive(&page->wait, &wait);
for (;;) {
sync_page(page);
set_task_state(tsk, TASK_UNINTERRUPTIBLE);
if (PageLocked(page)) {