Re: Linuxulator: possible Giant pushdown victim
On 10-Sep-01 Dag-Erling Smorgrav wrote:
> Julian Elischer <[EMAIL PROTECTED]> writes:
>> Marcel Moolenaar wrote:
>> > BTW: Do we have handy functions for use in the remote debugger, such
>> > as show_proc, show_vm or whatever, that dump important information
>> > in a readable form?
>> Matt has a cool set of macros as does Grog.
>
> I have a couple of macros I've used for debugging KLDs, which may
> serve as templates or inspiration for someone to write e.g. a "ps"
> macro (it shouldn't be too different from the "kldstat" macro, just
> walk the process table and print formatted info for every process)

Grog has a ps macro.  Look in sys/modules/vinum IIRC.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
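The "walk the process table and print formatted info for every process" idea quoted above can be sketched in plain C. This is a userland illustration only: `struct fake_proc`, its fields and `ps_walk()` are hypothetical and far simpler than the kernel's real `struct proc` and its list linkage, and the column layout is an assumption, not ddb's exact `ps` format. A gdb macro would express the same loop with `set $p = ...` and `while ($p != 0)`, as the kldstat macro does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, heavily simplified process-table entry. The real
 * kernel struct proc has far more fields and different linkage. */
struct fake_proc {
    int pid;
    int ppid;
    int pgrp;
    int stat;                 /* 2 == SRUN, 3 == SSLEEP, per this thread */
    const char *wmesg;        /* wait message, NULL if not sleeping */
    const char *comm;         /* command name */
    struct fake_proc *next;   /* next entry, NULL at the end of the table */
};

/* Walk the table from head, print one formatted line per process to
 * out, and return the number of processes visited. */
static int
ps_walk(const struct fake_proc *head, FILE *out)
{
    int count = 0;

    assert(out != NULL);
    fprintf(out, "  pid  ppid  pgrp stat wmesg   cmd\n");
    for (const struct fake_proc *p = head; p != NULL; p = p->next) {
        fprintf(out, "%5d %5d %5d %4d %-7s %s\n",
            p->pid, p->ppid, p->pgrp, p->stat,
            p->wmesg != NULL ? p->wmesg : "", p->comm);
        count++;
    }
    return count;
}
```

Fed three entries modeled on pids 517, 518 and 519 from the ps output in this thread, it prints a header plus three rows and returns 3; an empty table prints only the header and returns 0.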
Re: Linuxulator: possible Giant pushdown victim
Julian Elischer <[EMAIL PROTECTED]> writes:
> Marcel Moolenaar wrote:
> > BTW: Do we have handy functions for use in the remote debugger, such
> > as show_proc, show_vm or whatever, that dump important information
> > in a readable form?
> Matt has a cool set of macros as does Grog.

I have a couple of macros I've used for debugging KLDs, which may
serve as templates or inspiration for someone to write e.g. a "ps"
macro (it shouldn't be too different from the "kldstat" macro, just
walk the process table and print formatted info for every process).

define kldstat
    set $kld = linker_files.tqh_first
    printf "Id Refs Address    Size     Name\n"
    while ($kld != 0)
        printf "%2d %4d 0x%08x %-8x %s\n", \
            $kld->id, $kld->refs, $kld->address, $kld->size, $kld->filename
        set $kld = $kld->link.tqe_next
    end
end
document kldstat
  Lists the modules that were loaded when the kernel crashed.
end

define kldstat-v
    set $kld = linker_files.tqh_first
    printf "Id Refs Address    Size     Name\n"
    while ($kld != 0)
        printf "%2d %4d 0x%08x %-8x %s\n", \
            $kld->id, $kld->refs, $kld->address, $kld->size, $kld->filename
        printf "Contains modules:\n"
        printf "Id Name\n"
        set $module = $kld->modules.tqh_first
        while ($module != 0)
            printf "%2d %s\n", $module->id, $module->name
            set $module = $module->link.tqe_next
        end
        set $kld = $kld->link.tqe_next
    end
end
document kldstat-v
  Lists modules with full information.
end

define kldload
    set $kld = linker_files.tqh_first
    set $done = 0
    while ($kld != 0 && $done == 0)
        if ($kld->filename == $arg0)
            set $done = 1
        else
            set $kld = $kld->link.tqe_next
        end
    end
    if ($done == 1)
        shell /usr/bin/objdump -h $arg0 | \
            awk '/ .text/ { print "set \$offset = 0x" $6 }' > .kgdb.temp
        source .kgdb.temp
        add-symbol-file $arg0 $kld->address + $offset
    end
end
document kldload
  Loads symbols for a module. The argument is the module file name;
  the offset of its text section is looked up with objdump.
end

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
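The kldload macro above relies on a small text-processing step: awk picks the .text row out of `objdump -h` output, takes its sixth field (the "File off" column), and the macro adds that value to the module's load address before calling add-symbol-file. A minimal C sketch of that arithmetic, assuming the usual `objdump -h` row layout (Idx, Name, Size, VMA, LMA, File off, Algn); the function names and the sample row in the test are made up for illustration, and matching the section name exactly is stricter than the macro's `/ .text/` regex.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of `objdump -h` output. If it describes the .text
 * section, store its sixth field ("File off", read as hex) in *off
 * and return 1; otherwise return 0 and leave *off untouched. */
static int
text_file_offset(const char *line, unsigned long *off)
{
    char idx[16], name[64], size[32], vma[32], lma[32];
    unsigned long fileoff;

    assert(line != NULL && off != NULL);
    if (sscanf(line, "%15s %63s %31s %31s %31s %lx",
        idx, name, size, vma, lma, &fileoff) != 6)
        return 0;                      /* header, flags line, or junk */
    if (strcmp(name, ".text") != 0)
        return 0;                      /* some other section */
    *off = fileoff;
    return 1;
}

/* Address handed to add-symbol-file: module load address plus the
 * .text offset, as in "$kld->address + $offset". */
static unsigned long
text_load_address(unsigned long kld_address, unsigned long text_off)
{
    return kld_address + text_off;
}
```

For a row whose File off column is 00000560 and a module loaded at 0xc1234000, the symbols would be added at 0xc1234560.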
Re: Linuxulator: possible Giant pushdown victim
Marcel Moolenaar wrote:
>
> BTW: Do we have handy functions for use in the remote debugger, such
> as show_proc, show_vm or whatever, that dump important information
> in a readable form?

Matt has a cool set of macros as does Grog.

--
Julian Elischer  [EMAIL PROTECTED]
hard at work in a very strange country!
presently in San Francisco
Re: Linuxulator: possible Giant pushdown victim
On Thu, Sep 06, 2001 at 11:55:19AM -0700, John Baldwin wrote:
>
>
> Note that 3 of these are runnable (stat of 2 == SRUN).  In top, see if they are
> chewing up lots of time.

Top doesn't update after the first mozilla process has started. Top's
own trace is:

    mi_switch()
    cv_timedwait_sig()
    select()
    syscall()
    syscall_with_err_pushed()
    --- syscall(93, .., select)

> > db> trace 517
> > mi_switch(0,cd193aa0,811f874,cd27cfa0,c02bead6) at mi_switch+0x1a0
> > _mtx_unlock_sleep(c039e860,0,c030b460,497) at _mtx_unlock_sleep+0x204
> > syscall(2f,2f,2f,811f874,1) at syscall+0x48a
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (514), eip = 0x285a31a7, esp = 0x811f858, ebp = 0x811f9b4 ---
>
> Weird syscall number (514).  This one was blocked on a mutex that was just
> released.  I'm betting that 0xc039e860 is Giant?  Perhaps not though?

Rien ne va plus ("no more bets")! It is Giant.

> > db> trace 520
> > mi_switch(cd193ee0) at mi_switch+0x1a0
> > userret(cd193ee0,cd257fa8,0,208,befffc00) at userret+0x395
> > syscall(2f,2f,2f,befffd24,befffc00) at syscall+0x3c9
> > syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> > --- syscall (0, Linux ELF, nosys), eip = 0x285b8bd4, esp = 0xbefffb24, ebp = 0xbefffbf4 ---
>
> Another instance of being preempted upon return to userland.  Possible that the
> regs in the trapframe are altered to hold return values and thus that the
> syscall number is invalid.  Hmm.

That certainly would explain it (see above).

> What locks do all these processes hold?

No locks are held by any of the processes. The question then is: what
are they waiting for? I've started playing with remote debugging, so
let me look around for a bit.

BTW: Do we have handy functions for use in the remote debugger, such
as show_proc, show_vm or whatever, that dump important information
in a readable form?

--
Marcel Moolenaar  USPA: A-39004  [EMAIL PROTECTED]
Re: Linuxulator: possible Giant pushdown victim
On 06-Sep-01 Marcel Moolenaar wrote:
> On Wed, Sep 05, 2001 at 02:47:28PM -0700, John Baldwin wrote:
>>
>> Yes, you can trace individual processes though, using 'trace' with a pid, and I'm
>> more curious about the traces of the Mozilla processes.
>
> Ok, here it is:
>
> db> ps
> pid  proc     addr     uid  ppid pgrp flag   stat wmesg  wchan    cmd
> 520  cd193ee0 cd256000 4152 517  514  02     2                    mozilla-bin
> 519  cd197840 cd1ab000 4152 517  514  202    3    pause  c17d3000 mozilla-bin
> 518  cd193880 cd27     4152 517  514  02     3    select c039bb24 mozilla-bin
> 517  cd193aa0 cd27b000 4152 514  514  02     2                    mozilla-bin
> 514  cd194100 cd244000 4152 505  514  004002 2                    mozilla-bin
> ...

Note that 3 of these are runnable (stat of 2 == SRUN).  In top, see if they are
chewing up lots of time.

> db> trace 514
> mi_switch(cd194100) at mi_switch+0x1a0
> userret(cd194100,cd245fa8,c5,a,bfbfeae0) at userret+0x395
> syscall(2f,2f,2f,282397c0,bfbfeae0) at syscall+0x3c9
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (148, Linux ELF, linux_fdatasync), eip = 0x285c2074, esp = 0xbfbfeac8, ebp = 0xbfbfeb98 ---

It was returning from a syscall but had to do a context switch due to
PS_NEEDRESCHED because it got preempted.

> db> trace 517
> mi_switch(0,cd193aa0,811f874,cd27cfa0,c02bead6) at mi_switch+0x1a0
> _mtx_unlock_sleep(c039e860,0,c030b460,497) at _mtx_unlock_sleep+0x204
> syscall(2f,2f,2f,811f874,1) at syscall+0x48a
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (514), eip = 0x285a31a7, esp = 0x811f858, ebp = 0x811f9b4 ---

Weird syscall number (514).  This one was blocked on a mutex that was just
released.  I'm betting that 0xc039e860 is Giant?  Perhaps not though?
> db> trace 518
> mi_switch(cd19399c,cd193880,0,2,0) at mi_switch+0x1a0
> cv_timedwait_sig(c039bb24,cd19399c,dad,1,bfbffeb8) at cv_timedwait_sig+0x65b
> poll(cd193880,cd271f44,cd19399c,cd193880,bf3ffa4c) at poll+0x656
> linux_poll(cd193880,cd271f80,bf3ffa4c,88b8,bf3ffa4c) at linux_poll+0x11f
> syscall(2f,2f,2f,bf3ffa4c,88b8) at syscall+0x339
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (168, Linux ELF, linux_poll), eip = 0x285c7894, esp = 0xbf3ff9e8, ebp = 0xbf3ff9f4 ---

Asleep in select as ps shows.

> db> trace 519
> mi_switch(cd19795c,cd197840,c17d3000,c02f3a60,2) at mi_switch+0x1a0
> msleep(c17d3000,cd19795c,168,c02f0f49,0) at msleep+0x71a
> sigsuspend(cd197840,cd1acf4c,cd1acf44,bfbffeb8,cd19795c) at sigsuspend+0x19f
> linux_rt_sigsuspend(cd197840,cd1acf80,bf1ff94c,bf1ff94c,28239fc8) at linux_rt_sigsuspend+0x8e
> syscall(2f,2f,2f,28239fc8,bf1ff94c) at syscall+0x339
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (179, Linux ELF, linux_rt_sigsuspend), eip = 0x2851c656, esp = 0xbf1ff92c, ebp = 0xbf1ff934 ---

Asleep in pause as ps shows.

> db> trace 520
> mi_switch(cd193ee0) at mi_switch+0x1a0
> userret(cd193ee0,cd257fa8,0,208,befffc00) at userret+0x395
> syscall(2f,2f,2f,befffd24,befffc00) at syscall+0x3c9
> syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
> --- syscall (0, Linux ELF, nosys), eip = 0x285b8bd4, esp = 0xbefffb24, ebp = 0xbefffbf4 ---

Another instance of being preempted upon return to userland.  Possible that the
regs in the trapframe are altered to hold return values and thus that the
syscall number is invalid.  Hmm.

What locks do all these processes hold?  I would expect the ones in stat 3
(SSLEEP) to hold none, but the others might hold locks.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
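The "3 of these are runnable" reading of the ps output can be double-checked mechanically. The sketch below uses only the two stat values spelled out in this thread (2 == SRUN, 3 == SSLEEP); the enum names and `count_with_stat()` are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Process states as quoted in this thread; hypothetical names chosen
 * so as not to clash with the kernel's own SRUN/SSLEEP. */
enum {
    STAT_SRUN = 2,     /* runnable */
    STAT_SSLEEP = 3    /* sleeping, e.g. in select or pause */
};

/* Count how many entries of a ps "stat" column equal the given state. */
static size_t
count_with_stat(const int *stats, size_t n, int state)
{
    size_t matches = 0;

    assert(stats != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (stats[i] == state)
            matches++;
    }
    return matches;
}
```

With the stat column for pids 520, 519, 518, 517 and 514, that is {2, 3, 3, 2, 2}, it counts 3 runnable and 2 sleeping processes, matching the reading above.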
Re: Linuxulator: possible Giant pushdown victim
On Wed, Sep 05, 2001 at 02:47:28PM -0700, John Baldwin wrote:
>
> Yes, you can trace individual processes though, using 'trace' with a pid, and I'm
> more curious about the traces of the Mozilla processes.

Ok, here it is:

db> ps
pid  proc     addr     uid  ppid pgrp flag   stat wmesg  wchan    cmd
520  cd193ee0 cd256000 4152 517  514  02     2                    mozilla-bin
519  cd197840 cd1ab000 4152 517  514  202    3    pause  c17d3000 mozilla-bin
518  cd193880 cd27     4152 517  514  02     3    select c039bb24 mozilla-bin
517  cd193aa0 cd27b000 4152 514  514  02     2                    mozilla-bin
514  cd194100 cd244000 4152 505  514  004002 2                    mozilla-bin
...
db> trace
Debugger(c0305de9) at Debugger+0x44
scgetc(c039a080,2,c1667a00,c0392da0,4) at scgetc+0x412
sckbdevent(c0392da0,0,c039a080,c1667a00,c1669780) at sckbdevent+0x1c9
atkbd_intr(c0392da0,0,cc475f7c,c01bd99b,c0392da0) at atkbd_intr+0x22
atkbd_isa_intr(c0392da0) at atkbd_isa_intr+0x18
ithread_loop(c1669780,cc475fa8) at ithread_loop+0x2bf
fork_exit(c01bd6dc,c1669780,cc475fa8) at fork_exit+0xb4
fork_trampoline() at fork_trampoline+0x8
db> trace 514
mi_switch(cd194100) at mi_switch+0x1a0
userret(cd194100,cd245fa8,c5,a,bfbfeae0) at userret+0x395
syscall(2f,2f,2f,282397c0,bfbfeae0) at syscall+0x3c9
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (148, Linux ELF, linux_fdatasync), eip = 0x285c2074, esp = 0xbfbfeac8, ebp = 0xbfbfeb98 ---
db> trace 517
mi_switch(0,cd193aa0,811f874,cd27cfa0,c02bead6) at mi_switch+0x1a0
_mtx_unlock_sleep(c039e860,0,c030b460,497) at _mtx_unlock_sleep+0x204
syscall(2f,2f,2f,811f874,1) at syscall+0x48a
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (514), eip = 0x285a31a7, esp = 0x811f858, ebp = 0x811f9b4 ---
db> trace 518
mi_switch(cd19399c,cd193880,0,2,0) at mi_switch+0x1a0
cv_timedwait_sig(c039bb24,cd19399c,dad,1,bfbffeb8) at cv_timedwait_sig+0x65b
poll(cd193880,cd271f44,cd19399c,cd193880,bf3ffa4c) at poll+0x656
linux_poll(cd193880,cd271f80,bf3ffa4c,88b8,bf3ffa4c) at linux_poll+0x11f
syscall(2f,2f,2f,bf3ffa4c,88b8) at syscall+0x339
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (168, Linux ELF, linux_poll), eip = 0x285c7894, esp = 0xbf3ff9e8, ebp = 0xbf3ff9f4 ---
db> trace 519
mi_switch(cd19795c,cd197840,c17d3000,c02f3a60,2) at mi_switch+0x1a0
msleep(c17d3000,cd19795c,168,c02f0f49,0) at msleep+0x71a
sigsuspend(cd197840,cd1acf4c,cd1acf44,bfbffeb8,cd19795c) at sigsuspend+0x19f
linux_rt_sigsuspend(cd197840,cd1acf80,bf1ff94c,bf1ff94c,28239fc8) at linux_rt_sigsuspend+0x8e
syscall(2f,2f,2f,28239fc8,bf1ff94c) at syscall+0x339
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (179, Linux ELF, linux_rt_sigsuspend), eip = 0x2851c656, esp = 0xbf1ff92c, ebp = 0xbf1ff934 ---
db> trace 520
mi_switch(cd193ee0) at mi_switch+0x1a0
userret(cd193ee0,cd257fa8,0,208,befffc00) at userret+0x395
syscall(2f,2f,2f,befffd24,befffc00) at syscall+0x3c9
syscall_with_err_pushed() at syscall_with_err_pushed+0x1b
--- syscall (0, Linux ELF, nosys), eip = 0x285b8bd4, esp = 0xbefffb24, ebp = 0xbefffbf4 ---

NOTE 1: process 517: this process seems to be the most active. Multiple
        breaks after continuing result in different traces.
NOTE 2: process 518: there's no linux_poll in the source tree. This is a
        local change.
NOTE 3: process 520: syscall 0 is an invalid Linux syscall (it used to be
        setup()).
NOTE 4: this is not reproducible on Alpha, because it panics even before
        loading mozilla, but that is for later.

I'll go with my hunch that it's linux_clone and see if I can find the
evidence. The system looks responsive, but everything that relates to
processes (creation, destruction) seems to queue up. At least that's how
it "feels"...

FYI,

--
Marcel Moolenaar  USPA: A-39004  [EMAIL PROTECTED]
Re: Linuxulator: possible Giant pushdown victim
On 05-Sep-01 Marcel Moolenaar wrote:
> On Wed, Sep 05, 2001 at 11:04:04AM -0700, John Baldwin wrote:
>>
>> On 05-Sep-01 Marcel Moolenaar wrote:
>> > Hi,
>> >
>> > I get consistent locks when trying to run Mozilla for Linux (RH 7.1).
>> >
>> > Breaking into the debugger, I see it hangs in fork_exit()+180. This
>> > should be the PROC_LOCK(p) in the source file (kern_fork.c).
>>
>> Can you do 'show locks' with the pid of the mozilla process as the argument?
>> Also, what does a 'trace' of the pid in question show?  (I take it this is how
>> you know where it locked up?)
>
> show locks gives nothing for all cloned mozilla processes. This
> strikes me as odd. Another strange thing is that it seems to have a
> local effect at first (i.e. only mozilla hangs), but when I try to
> compose an email on the same machine (for example), it locks up hard.
>
> I'll give you a complete trace when I call it a day at the office. In the
> meantime, this is roughly it (warning, from memory):
>
>     Debugger
>     ...
>     intr...kbd
>     intr...isa
>     ithread_loop
>     fork_exit
>     fork_trampoline
>
> My guess is that everything beginning with ithread_loop is related to
> me breaking into the debugger with CA-ESC.

Yes, you can trace individual processes though, using 'trace' with a pid, and I'm
more curious about the traces of the Mozilla processes.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Re: Linuxulator: possible Giant pushdown victim
On Wed, Sep 05, 2001 at 11:04:04AM -0700, John Baldwin wrote:
>
> On 05-Sep-01 Marcel Moolenaar wrote:
> > Hi,
> >
> > I get consistent locks when trying to run Mozilla for Linux (RH 7.1).
> >
> > Breaking into the debugger, I see it hangs in fork_exit()+180. This
> > should be the PROC_LOCK(p) in the source file (kern_fork.c).
>
> Can you do 'show locks' with the pid of the mozilla process as the argument?
> Also, what does a 'trace' of the pid in question show?  (I take it this is how
> you know where it locked up?)

show locks gives nothing for all cloned mozilla processes. This
strikes me as odd. Another strange thing is that it seems to have a
local effect at first (i.e. only mozilla hangs), but when I try to
compose an email on the same machine (for example), it locks up hard.

I'll give you a complete trace when I call it a day at the office. In the
meantime, this is roughly it (warning, from memory):

    Debugger
    ...
    intr...kbd
    intr...isa
    ithread_loop
    fork_exit
    fork_trampoline

My guess is that everything beginning with ithread_loop is related to
me breaking into the debugger with CA-ESC.

When I get back home again, I'll try this on Alpha as well. The Alpha
already has a serial console, so it's easier to experiment with at this
time. Please stand by... :-)

--
Marcel Moolenaar  USPA: A-39004  [EMAIL PROTECTED]
RE: Linuxulator: possible Giant pushdown victim
On 05-Sep-01 Marcel Moolenaar wrote:
> Hi,
>
> I get consistent locks when trying to run Mozilla for Linux (RH 7.1).
>
> Breaking into the debugger, I see it hangs in fork_exit()+180. This
> should be the PROC_LOCK(p) in the source file (kern_fork.c).

Can you do 'show locks' with the pid of the mozilla process as the argument?
Also, what does a 'trace' of the pid in question show?  (I take it this is how
you know where it locked up?)

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Linuxulator: possible Giant pushdown victim
Hi,

I get consistent locks when trying to run Mozilla for Linux (RH 7.1).

Breaking into the debugger, I see it hangs in fork_exit()+180. This
should be the PROC_LOCK(p) in the source file (kern_fork.c).

Since a deadlock in this place would also show up with FreeBSD binaries,
and it doesn't, it must be something specific to Mozilla. In the
Linuxulator, fork() and vfork() are implemented in terms of their FreeBSD
equivalents, so I don't think that's the problem. This leaves clone().

I'm in the office and can't try anything ATM, but if someone can tell
me whether my deductions make sense or not, I'll see if I can get it
resolved as soon as I'm home.

--
Marcel Moolenaar  USPA: A-39004  [EMAIL PROTECTED]