Re: Yet another crash in FreeBSD 5.1

2003-08-14 Thread Eivind Olsen
--On 7. august 2003 10:33 +0930 Greg 'groggy' Lehey [EMAIL PROTECTED] 
wrote:
>> Q: If you have a crash, please supply a backtrace from the dump analysis
>> as discussed below under "Kernel Panics". Please don't delete the crash
>> dump; it may be needed for further analysis.
>> A: Sorry, I don't have a crash dump. I tried creating one when the
>> computer had crashed by giving the commands "panic" and then "continue"
>> but that didn't help.
>> Was this of any help?
> Not much, unfortunately.  I think that these problems occur as the
> result of some hardware failure, but there's nothing in what you've
> supplied to indicate that.  If you can't repeat it, I fear that it's
> yet another of the ones that got away.
I have now managed to produce a crash dump, but I'm not sure whether it's
any good. For some reason I tried giving ddb the "panic" command twice in a
row, and that at least produced a crash dump, though I'm not sure it
contains any useful information. Here is a backtrace, at least. Keep in mind
that I'm not a C programmer and have no experience with gdb, so I need to be
told what to do to produce more information.

[EMAIL PROTECTED]:~/tmp/debug> gdb -k kernel.debug vmcore.0
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
panic: from debugger
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc02e8139
stack pointer           = 0x10:0xcac43a00
frame pointer           = 0x10:0xcac43a34
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5 (pagedaemon)
panic: from debugger

Fatal trap 3: breakpoint instruction fault while in kernel mode
instruction pointer     = 0x8:0xc048cd34
stack pointer           = 0x10:0xcac43780
frame pointer           = 0x10:0xcac4378c
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = IOPL = 0
current process         = 5 (pagedaemon)
panic: from debugger
Uptime: 1d13h38m55s
Dumping 191 MB
ata0: resetting devices ..
done
16 32 48 64 80 96 112 128 144 160 176
---
Reading symbols from /usr/obj/usr/src/sys/VIMES/modules/usr/src/sys/modules/vinum/vinum.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/VIMES/modules/usr/src/sys/modules/vinum/vinum.ko.debug
Reading symbols from /usr/obj/usr/src/sys/VIMES/modules/usr/src/sys/modules/ipfw/ipfw.ko.debug...done.
Loaded symbols for /usr/obj/usr/src/sys/VIMES/modules/usr/src/sys/modules/ipfw/ipfw.ko.debug
Reading symbols from /boot/kernel/dragon_saver.ko...done.
Loaded symbols for /boot/kernel/dragon_saver.ko
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:238
238 dumping++;
(kgdb) bt
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:238
#1  0xc031a8f9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:370
#2  0xc031abeb in panic () at /usr/src/sys/kern/kern_shutdown.c:543
#3  0xc0173e92 in db_panic () at /usr/src/sys/ddb/db_command.c:448
#4  0xc0173e12 in db_command (last_cmdp=0xc0527740, cmd_table=0x0, 
aux_cmd_tablep=0xc051da0c, aux_cmd_tablep_end=0xc051da24) at 
/usr/src/sys/ddb/db_command.c:346
#5  0xc0173f26 in db_command_loop () at /usr/src/sys/ddb/db_command.c:470
#6  0xc0176caa in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_trap.c:72
#7  0xc048ca95 in kdb_trap (type=12, code=0, regs=0xcac439c0) at 
/usr/src/sys/i386/i386/db_interface.c:170
#8  0xc049e772 in trap_fatal (frame=0xcac439c0, eva=0) at 
/usr/src/sys/i386/i386/trap.c:829
#9  0xc049e482 in trap_pfault (frame=0xcac439c0, usermode=0, eva=20) at 
/usr/src/sys/i386/i386/trap.c:748
#10 0xc049e05d in trap (frame=
 {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -1039907200, tf_esi = 
-978486016, tf_ebp = -893109708, tf_isp = -893109780, tf_ebx = 0, tf_edx = 
0, tf_ecx = 0, tf_eax = 23179264, tf_trapno = 12, tf_err = 2, tf_eip = 
-1070694087, tf_cs = 8, tf_eflags = 66054, tf_esp = -978486016, tf_ss = 
-893109736}) at /usr/src/sys/i386/i386/trap.c:433
#11 0xc048e3e8 in calltrap () at {standard input}:96
#12 0xc02e5bc6 in spec_xstrategy (vp=0xc2044680, bp=0xc5ad7d00) at 
/usr/src/sys/fs/specfs/spec_vnops.c:513
#13 0xc02e5c4b in spec_specstrategy (ap=0x0) at 
/usr/src/sys/fs/specfs/spec_vnops.c:550
#14 0xc02e4f18 in spec_vnoperate (ap=0x0) at 
/usr/src/sys/fs/specfs/spec_vnops.c:123
#15 0xc0465c4d in swapdev_strategy (ap=0x0) at vnode_if.h:1114
#16 0xc0452809 in swap_pager_putpages (object=0x0, m=0xcac43bd0, count=1, 
sync=0, rtvals=0xcac43b40) at 

Re: Yet another crash in FreeBSD 5.1

2003-08-03 Thread Greg 'groggy' Lehey
On Sunday,  3 August 2003 at  0:31:45 -0400, John Baldwin wrote:

> On 03-Aug-2003 Greg 'groggy' Lehey wrote:
>> On Saturday,  2 August 2003 at 16:47:13 +0200, Eivind Olsen wrote:
>>> [EMAIL PROTECTED]:~/tmp/debug> gdb -k kernel.debug
>>> (kgdb) list *(g_dev_strategy+29)
>>
>> This is almost certainly the wrong function.  At the very least you
>> should look at the arguments passed to it.
>
> Actually, this line can be very instructive.  Since 'bp' is valid
> it is probably the bp2 from g_clone_bio() that is NULL.  You might
> want to ask phk about that one.

I think you'll find that there's a null dev pointer in there.  As I
say, I've seen this scenario before (without GEOM), and I'd be
surprised if this were phk's problem.

>>> (kgdb) list *(launch_requests+448)
>>> No symbol launch_requests in current context.
>>> (kgdb) list *(vinumstart+2b2)
>>> No symbol vinumstart in current context.
>>> (kgdb)
>>
>> Read the links I just sent you.  You haven't loaded the Vinum symbols.
>
> Bah, this isn't hard for you to do either:

... once you've loaded the symbols.  That's why I pointed to the
links.

As I said to Terry, the real issue here is probably what was happening
at the time, not the contents of the dump.

Greg
--
See complete headers for address and phone numbers


pgp0.pgp
Description: PGP signature


Re: Yet another crash in FreeBSD 5.1

2003-08-03 Thread Eivind Olsen
--On 3. august 2003 00:31 -0400 John Baldwin [EMAIL PROTECTED] wrote:
> But you knew that.  Also, Eivind, you need to use hex, not decimal
> offsets from the functions.  You might want to redo the g_dev_strategy()
> line with 0x29 instead of 29.

I already thought about that, so I tested the commands both with and
without "0x" in front of those numbers, and I get exactly the same output,
so it looks like gdb interprets them as hex anyway:

(kgdb) list *(g_dev_strategy+0x29)
0xc02e8139 is in g_dev_strategy (/usr/src/sys/geom/geom_dev.c:415).
410             KASSERT(cp->acr || cp->acw,
411                 ("Consumer with zero access count in g_dev_strategy"));
412
413             bp2 = g_clone_bio(bp);
414             KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
415             bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
416             KASSERT(bp2->bio_offset >= 0,
417                 ("Negative bio_offset (%jd) on bio %p",
418                 (intmax_t)bp2->bio_offset, bp));
419             bp2->bio_length = (off_t)bp->bio_bcount;
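The two gdb sessions in this thread did not actually resolve to the same address: the `list *(g_dev_strategy+29)` session posted on 2 August reported 0xc02e812d, while `list *(g_dev_strategy+0x29)` here reports 0xc02e8139, the faulting instruction pointer. gdb reads a bare offset in its default input radix (decimal), so +29 means +0x1d; both addresses simply happen to fall on source line 415. The small C sketch below replays that arithmetic. The start address of g_dev_strategy is not printed anywhere in the thread; it is inferred here from the DDB line `eip 0xc02e8139 g_dev_strategy+0x29`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Start of g_dev_strategy, inferred (not printed by gdb) from the DDB output:
 * eip 0xc02e8139 == g_dev_strategy+0x29, so start = 0xc02e8139 - 0x29. */
static const uint32_t g_dev_strategy_start = 0xc02e8139u - 0x29u;

/* "function+offset" is plain addition; the only question is how the
 * offset was parsed: a bare 29 is decimal, 0x29 is hexadecimal. */
static uint32_t resolve(uint32_t func_start, uint32_t offset)
{
    return func_start + offset;
}

void show_offsets(void)
{
    uint32_t dec = resolve(g_dev_strategy_start, 29);   /* "+29"   (0x1d) */
    uint32_t hex = resolve(g_dev_strategy_start, 0x29); /* "+0x29" (41)   */

    assert(dec != hex); /* 29 decimal is not 0x29 */
    printf("g_dev_strategy+29   = 0x%08x\n", (unsigned)dec);
    printf("g_dev_strategy+0x29 = 0x%08x\n", (unsigned)hex);
}
```

Calling show_offsets() prints 0xc02e812d and 0xc02e8139, matching the two addresses gdb reported.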
--
Regards / Hilsen
Eivind Olsen
[EMAIL PROTECTED]
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Yet another crash in FreeBSD 5.1

2003-08-03 Thread Eivind Olsen
--On 3. august 2003 09:35 +0930 Greg 'groggy' Lehey [EMAIL PROTECTED] 
wrote:
> This is the real issue.  Until you supply the information I ask for in
> the man page or at http://www.vinumvm.org/vinum/how-to-debug.html,
> only Terry can help you.
Ok, I'll try to supply that information:

Q: What problems are you having?
A: FreeBSD RELENG_5_1 crashes with the following text shown on screen:
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc02e8139
stack pointer           = 0x10:0xcfb5284c
frame pointer           = 0x10:0xcfb52880
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 10785 (ctl_cyrusdb)
kernel: type 12 trap, code=0
Stopped at      g_dev_strategy+0x29:    movl    %eax,0x14(%ebx)
Q: Which version of FreeBSD are you running?
A: FreeBSD 5.1, tracking RELENG_5_1, cvsupped in the morning of the 27th of 
July if I'm not mistaken.

Q: Have you made any changes to the system sources, including Vinum?
A: No, it's all taken from the cvsup. I do have a custom kernel since I
need to use ipfilter, but that's really the only change. I've made the
following changes:

makeoptions   DEBUG=-g
options   DDB
options   IPFILTER
options   IPFILTER_LOG
options   IPFILTER_DEFAULT_BLOCK
Q: Supply the output of the vinum list command. If you can't start Vinum, 
supply the on-disk configuration, as described below. If you can't start 
Vinum, then (and only then) send a copy of the configuration file.
A: Here it is:
vimes# vinum
vinum -> list
2 drives:
D WHITE                 State: up       /dev/ad2s1e     A: 0/113046 MB (0%)
D BLACK                 State: up       /dev/ad0s1d     A: 0/113046 MB (0%)

6 volumes:
V var                   State: up       Plexes:       2 Size:       6144 MB
V usrlocal              State: up       Plexes:       2 Size:       6144 MB
V tmp                   State: up       Plexes:       1 Size:        255 MB
V usr                   State: up       Plexes:       2 Size:       6144 MB
V home                  State: up       Plexes:       2 Size:       8192 MB
V storage               State: up       Plexes:       1 Size:        168 GB
10 plexes:
P var.p0              C State: up       Subdisks:     1 Size:       6144 MB
P var.p1              C State: up       Subdisks:     1 Size:       6144 MB
P usrlocal.p0         C State: up       Subdisks:     1 Size:       6144 MB
P usrlocal.p1         C State: up       Subdisks:     1 Size:       6144 MB
P tmp.p0              S State: up       Subdisks:     2 Size:        255 MB
P usr.p0              C State: up       Subdisks:     1 Size:       6144 MB
P usr.p1              C State: up       Subdisks:     1 Size:       6144 MB
P home.p0             C State: up       Subdisks:     1 Size:       8192 MB
P home.p1             C State: up       Subdisks:     1 Size:       8192 MB
P storage.p0          S State: up       Subdisks:     2 Size:        168 GB
12 subdisks:
S var.p0.s0             State: up       D: BLACK        Size:       6144 MB
S var.p1.s0             State: up       D: WHITE        Size:       6144 MB
S usrlocal.p0.s0        State: up       D: BLACK        Size:       6144 MB
S usrlocal.p1.s0        State: up       D: WHITE        Size:       6144 MB
S tmp.p0.s0             State: up       D: BLACK        Size:        127 MB
S tmp.p0.s1             State: up       D: WHITE        Size:        127 MB
S usr.p0.s0             State: up       D: BLACK        Size:       6144 MB
S usr.p1.s0             State: up       D: WHITE        Size:       6144 MB
S home.p0.s0            State: up       D: BLACK        Size:       8192 MB
S home.p1.s0            State: up       D: WHITE        Size:       8192 MB
S storage.p0.s0         State: up       D: BLACK        Size:         84 GB
S storage.p0.s1         State: up       D: WHITE        Size:         84 GB
vinum ->
Q: Supply an extract of the Vinum history file. Unless you have explicitly 
renamed it, it will be /var/log/vinum_history. This file can get very big; 
please limit it to the time around when you have the problems. Each line 
contains a timestamp at the beginning, so you will have no difficulty in 
establishing which data is of relevance.
A: It's so small, I'll give the complete vinum_history log:
vimes# cat vinum_history
26 Jul 2003 18:43:38.056211 *** vinum started ***
26 Jul 2003 18:43:39.456133 list
26 Jul 2003 18:43:41.631830 list
26 Jul 2003 18:43:42.598409 list
26 Jul 2003 18:43:46.885029 quit
26 Jul 2003 18:43:48.450706 *** vinum started ***
26 Jul 2003 18:43:51.745079 help
26 Jul 2003 18:47:54.213327 *** vinum started ***
26 Jul 2003 18:47:54.216030 create install-vinum.conf
drive BLACK device /dev/ad0s1d
drive WHITE device /dev/ad2s1e
volume var setupstate
   plex org concat
   sd length 

Re: Yet another crash in FreeBSD 5.1

2003-08-03 Thread Eivind Olsen
--On 3. august 2003 09:37 +0930 Greg 'groggy' Lehey [EMAIL PROTECTED] 
wrote:
> Read the links I just sent you.  You haven't loaded the Vinum symbols.
I'm not sure exactly what to do here. I have absolutely no previous 
experience with kernel debugging, using gdb etc. so I'm lost without 
specific instructions on what to do, what to try etc.

The vinum.ko file is not stripped:

[EMAIL PROTECTED]:~/tmp/debug> file /boot/kernel/vinum.ko
/boot/kernel/vinum.ko: ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), not stripped
[EMAIL PROTECTED]:~/tmp/debug>

The web page mentions that I should either use the crash dump (which isn't
being created...) or use remote serial gdb to analyze the problem. I guess
I'll have to find a null-modem cable, install FreeBSD on another computer
here (I couldn't find a Windows version of gdb) and figure out exactly how
to do remote GDB debugging (I've looked around in the Developers' Handbook,
specifically
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-online-gdb.html).

--
Regards / Hilsen
Eivind Olsen
[EMAIL PROTECTED]


Re: Yet another crash in FreeBSD 5.1

2003-08-03 Thread Greg 'groggy' Lehey
On Sunday,  3 August 2003 at 11:17:49 +0200, Eivind Olsen wrote:
> --On 3. august 2003 09:37 +0930 Greg 'groggy' Lehey [EMAIL PROTECTED]
> wrote:
>> Read the links I just sent you.  You haven't loaded the Vinum symbols.
>
> I'm not sure exactly what to do here. I have absolutely no previous
> experience with kernel debugging, using gdb etc. so I'm lost without
> specific instructions on what to do, what to try etc.

Don't worry too much about that at the moment.  Let me analyze the
info you've sent me, and I'll ask some more questions.

Greg
--
See complete headers for address and phone numbers




Yet another crash in FreeBSD 5.1

2003-08-02 Thread Eivind Olsen
I've now had yet another crash under FreeBSD 5.1 (RELENG_5_1, cvsupped 5-6
days ago) and it looks almost the same as the crash I posted about
yesterday (or was it the day before?).

Here's some output from DDB:

Crash 2.7.2003:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x14
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc02e8139
stack pointer           = 0x10:0xcfb5284c
frame pointer           = 0x10:0xcfb52880
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 10785 (ctl_cyrusdb)
kernel: type 12 trap, code=0
Stopped at      g_dev_strategy+0x29:    movl    %eax,0x14(%ebx)
db> show reg
cs             0x8
ds            0x10
es            0x10
fs            0x18
ss            0x10
eax     0xfd235200
ecx              0
edx              0
ebx              0
esp     0xcfb5284c
ebp     0xcfb52880
esi     0xc2156024  _end+0x5ae4
edi     0xc2044900
eip     0xc02e8139  g_dev_strategy+0x29
efl        0x10286
dr0              0
dr1              0
dr2              0
dr3              0
dr4         0x0ff0
dr5          0x400
dr6         0x0ff0
dr7          0x400
g_dev_strategy+0x29:    movl    %eax,0x14(%ebx)
db> trace
g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6
spec_xstrategy(c215c5b4,c5ada2d0,cfb52968,c02e4f18,cfb52994) at spec_xstrategy+0x306
spec_specstrategy(cfb52994,cfb529b0,c044f7ad,cfb52994,0) at spec_specstrategy+0x1b
spec_vnoperate(cfb52994,0,c09719b0,f,c5ada2d0) at spec_vnoperate+0x18
ufs_strategy(cfb529d8,cfb52a0c,c0359a87,cfb529d8,1) at ufs_strategy+0xdd
ufs_vnoperate(cfb529d8,1,c0504f45,35e,cfb529f8) at ufs_vnoperate+0x18
bwrite(c5ada2d0,cfb52a5c,c0361aca,c5ada2d0,c5ada400) at bwrite+0x3a7
bawrite(c5ada2d0,c5ada400,10,3c6,20020080) at bawrite+0x1c
cluster_wbuild(c30c7124,4000,50,0,4) at cluster_wbuild+0x6ba
cluster_write(c5b735c0,9c7c64,0,55,c252b880) at cluster_write+0x571
ffs_write(cfb52be0,c21c2528,c22ab000,227,c2025e00) at ffs_write+0x5ff
vn_write(c21c2528,cfb52c7c,c252b880,0,c22ab000) at vn_write+0x192
dofilewrite(c22ab000,c21c2528,8,807e000,4000) at dofilewrite+0xe8
write(c22ab000,cfb52d10,c0518514,3fb,3) at write+0x69
syscall(2f,807002f,bfbf002f,0,807e000) at syscall+0x24e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (4, FreeBSD ELF32, write), eip = 0x282e08b3, esp = 0xbfbfec1c, ebp = 0xbfbfec38 ---
db>
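The faulting instruction above is `movl %eax,0x14(%ebx)`, a 32-bit store to the address %ebx + 0x14. With `ebx 0` in the register dump, that address is 0x14, exactly the reported fault virtual address, and the store is consistent with the "supervisor write" fault code. The C sketch below only replays that x86 base-plus-displacement arithmetic with register values copied from the dump; `struct ddb_regs` and the helper functions are hypothetical names for illustration, and which C variable ebx held at that point is not shown by the dump itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few registers copied from the "show reg" output. struct ddb_regs is a
 * hypothetical name for this sketch, not a kernel type. */
struct ddb_regs {
    uint32_t eax;
    uint32_t ebx;
    uint32_t esi;
    uint32_t eip;
};

/* x86 "disp(%base)" operand: effective address = base + displacement,
 * with 32-bit wrap-around as on i386. */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* The faulting instruction: movl %eax,0x14(%ebx), i.e. store eax at ebx+0x14. */
uint32_t faulting_store_address(const struct ddb_regs *r)
{
    assert(r != NULL);
    return effective_address(r->ebx, 0x14);
}

/* Heuristic only: a fault address inside the first page usually means a
 * NULL base pointer plus a small member offset. */
int looks_like_null_deref(uint32_t fault_va, uint32_t page_size)
{
    return fault_va < page_size;
}
```

With these values the store address comes out as 0x14, while esi (0xc2156024, also the first argument in the g_dev_strategy() frame of the trace) is nowhere near zero.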

I tried creating a crash dump by issuing the commands "panic" and then
"continue", but everything seemingly stopped then and nothing was dumped to
disk.

Can anyone suggest what I do next to find out about this crash?

-- 
Regards
Eivind Olsen
[EMAIL PROTECTED]


vinum bug? (Re: Yet another crash in FreeBSD 5.1)

2003-08-02 Thread Kris Kennaway
On Sat, Aug 02, 2003 at 10:11:24AM +0200, Eivind Olsen wrote:

> db> trace
> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
> vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6

Looks like a problem in vinum.  The other backtrace was the same, right?

Kris




Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Eivind Olsen wrote:
> Can anyone suggest what I do next to find out about this crash?
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x14

Dereference of NULL pointer; reference is for element at offset
0x14 in some structure; this is the equivalent of five 32-bit ints
or pointers into the structure.
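Terry's reading of the fault address can be checked with offsetof(): a member preceded by five 32-bit fields sits at byte offset 5 x 4 = 20 = 0x14, so a write to it through a NULL pointer lands at address 0x14. The structure below is purely hypothetical; it is not the real struct bio, whose 5.1 layout does not appear in this thread. It uses fixed-width uint32_t fields so the layout is the same on 64-bit hosts; on i386, ints and pointers are 4 bytes as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not the real struct bio: five 32-bit words come
 * before the member of interest. */
struct example {
    uint32_t w0, w1, w2, w3, w4;    /* 5 x 4 = 20 bytes */
    uint32_t target;                /* begins at byte 20 == 0x14 */
};

/* Compile-time check of the offset Terry describes. */
_Static_assert(offsetof(struct example, target) == 0x14,
               "target should sit at offset 0x14");

/* Address a write to p->target would hit if p were NULL: 0 + member offset.
 * Computed arithmetically; no pointer is dereferenced. */
uintptr_t null_member_address(void)
{
    return (uintptr_t)0 + offsetof(struct example, target);
}
```

null_member_address() returns 0x14, the same value as the "fault virtual address" line.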

> db> trace
> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2

gdb -k kernel.debug
(gdb) list *(g_dev_strategy+29)
[ ... ]
(gdb) list *(launch_requests+448)
[ ... ]
(gdb) list *(vinumstart+2b2)
[ ... ]

Will give you the exact source lines involved, assuming you
built a debug kernel.

You don't actually need a crash dump to debug a stack traceback.

-- Terry


Re: vinum bug? (Re: Yet another crash in FreeBSD 5.1)

2003-08-02 Thread Bernd Walter
On Sat, Aug 02, 2003 at 02:00:52AM -0700, Kris Kennaway wrote:
> On Sat, Aug 02, 2003 at 10:11:24AM +0200, Eivind Olsen wrote:
>>
>> db> trace
>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
>> vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6
>>
> Looks like a problem in vinum.  The other backtrace was the same, right?

Please take a look at an older thread named (IIRC) "vinum or geom bug?".
Greg asked for special debug output, but it never happened again for me.
A real Murphy bug - it happened on three machines once a day and after
Greg's response nothing happened for weeks.

-- 
B.Walter   BWCThttp://www.bwct.de
[EMAIL PROTECTED]  [EMAIL PROTECTED]



Re: vinum bug? (Re: Yet another crash in FreeBSD 5.1)

2003-08-02 Thread Eivind Olsen
[Sending to [EMAIL PROTECTED], and Kris copied in Greg so I'll also do that]

--On 2. august 2003 02:00 -0700 Kris Kennaway [EMAIL PROTECTED] wrote:
>> db> trace
>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
>> vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6
> Looks like a problem in vinum.  The other backtrace was the same, right?

Basically the same, yes. There are some differences (and many similarities)
in the addresses that were referenced, and almost the same output from the
trace command. I see that my first example is missing the dofilewrite()
between vn_write() and write(), but that might just be because I forgot to
write down that line (I wrote all this down by hand).

So, it looks like it's the same crash again (well, it does look like that 
to my untrained eye).

--
Regards / Hilsen
Eivind Olsen
[EMAIL PROTECTED]


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Eivind Olsen
--On 2. august 2003 02:11 -0700 Terry Lambert [EMAIL PROTECTED] 
wrote:
>> db> trace
>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
> gdb -k kernel.debug
> (gdb) list *(g_dev_strategy+29)
> [ ... ]
> (gdb) list *(launch_requests+448)
> [ ... ]
> (gdb) list *(vinumstart+2b2)
> [ ... ]
> Will give you the exact source lines involved, assuming you
> built a debug kernel.

I did. At least I've tried to. :)
(I have a kernel.debug which was compiled at the same time as the real
kernel I'm using, and it's approx. 30MB in size).

> You don't actually need a crash dump to debug a stack traceback.
This is what I found by using those commands you mentioned:

[EMAIL PROTECTED]:~/tmp/debug> gdb -k kernel.debug
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-undermydesk-freebsd"...
(kgdb) list *(g_dev_strategy+29)
0xc02e812d is in g_dev_strategy (/usr/src/sys/geom/geom_dev.c:415).
410             KASSERT(cp->acr || cp->acw,
411                 ("Consumer with zero access count in g_dev_strategy"));
412
413             bp2 = g_clone_bio(bp);
414             KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
415             bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
416             KASSERT(bp2->bio_offset >= 0,
417                 ("Negative bio_offset (%jd) on bio %p",
418                 (intmax_t)bp2->bio_offset, bp));
419             bp2->bio_length = (off_t)bp->bio_bcount;
(kgdb) list *(launch_requests+448)
No symbol launch_requests in current context.
(kgdb) list *(vinumstart+2b2)
No symbol vinumstart in current context.
(kgdb)

If anyone wants to take a look at this themselves, I've made the compressed
(gzip) debug kernel available at
http://eivind.aminor.no/debug/kernel.debug.gz
NOTE! It's approx. 13MB compressed!

--
Regards / Hilsen
Eivind Olsen
[EMAIL PROTECTED]


Re: vinum bug? (Re: Yet another crash in FreeBSD 5.1)

2003-08-02 Thread Eivind Olsen
--On 2. august 2003 11:16 +0200 Bernd Walter [EMAIL PROTECTED] 
wrote:
>> Looks like a problem in vinum.  The other backtrace was the same, right?
> Please take a look at an older thread named (IIRC) "vinum or geom bug?".
> Greg asked for special debug output, but it never happened again for me.
> A real Murphy bug - it happened on three machines once a day and after
> Greg's response nothing happened for weeks.

Are you thinking of the thread "vinum and/or geom panic on alpha" from the
10th of June? I forgot to mention this, but my system is an i386
uniprocessor (Pentium II at 450MHz).

In case it's relevant, yes I do run vinum:

vinum -> l
2 drives:
D WHITE                 State: up       /dev/ad2s1e     A: 0/113046 MB (0%)
D BLACK                 State: up       /dev/ad0s1d     A: 0/113046 MB (0%)
6 volumes:
V var                   State: up       Plexes:       2 Size:       6144 MB
V usrlocal              State: up       Plexes:       2 Size:       6144 MB
V tmp                   State: up       Plexes:       1 Size:        255 MB
V usr                   State: up       Plexes:       2 Size:       6144 MB
V home                  State: up       Plexes:       2 Size:       8192 MB
V storage               State: up       Plexes:       1 Size:        168 GB
10 plexes:
P var.p0              C State: up       Subdisks:     1 Size:       6144 MB
P var.p1              C State: up       Subdisks:     1 Size:       6144 MB
P usrlocal.p0         C State: up       Subdisks:     1 Size:       6144 MB
P usrlocal.p1         C State: up       Subdisks:     1 Size:       6144 MB
P tmp.p0              S State: up       Subdisks:     2 Size:        255 MB
P usr.p0              C State: up       Subdisks:     1 Size:       6144 MB
P usr.p1              C State: up       Subdisks:     1 Size:       6144 MB
P home.p0             C State: up       Subdisks:     1 Size:       8192 MB
P home.p1             C State: up       Subdisks:     1 Size:       8192 MB
P storage.p0          S State: up       Subdisks:     2 Size:        168 GB
12 subdisks:
S var.p0.s0             State: up       D: BLACK        Size:       6144 MB
S var.p1.s0             State: up       D: WHITE        Size:       6144 MB
S usrlocal.p0.s0        State: up       D: BLACK        Size:       6144 MB
S usrlocal.p1.s0        State: up       D: WHITE        Size:       6144 MB
S tmp.p0.s0             State: up       D: BLACK        Size:        127 MB
S tmp.p0.s1             State: up       D: WHITE        Size:        127 MB
S usr.p0.s0             State: up       D: BLACK        Size:       6144 MB
S usr.p1.s0             State: up       D: WHITE        Size:       6144 MB
S home.p0.s0            State: up       D: BLACK        Size:       8192 MB
S home.p1.s0            State: up       D: WHITE        Size:       8192 MB
S storage.p0.s0         State: up       D: BLACK        Size:         84 GB
S storage.p0.s1         State: up       D: WHITE        Size:         84 GB
vinum ->
--
Regards / Hilsen
Eivind Olsen
[EMAIL PROTECTED]


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at  2:11:24 -0700, Terry Lambert wrote:
> Eivind Olsen wrote:
>> Can anyone suggest what I do next to find out about this crash?
>>
>> Fatal trap 12: page fault while in kernel mode
>> fault virtual address   = 0x14
>
> Dereference of NULL pointer; reference is for element at offset
> 0x14 in some structure; this is the equivalent of five 32-bit ints
> or pointers into the structure.
>
>> db> trace
>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
>
> gdb -k kernel.debug
> (gdb) list *(g_dev_strategy+29)
> [ ... ]
> (gdb) list *(launch_requests+448)
> [ ... ]
> (gdb) list *(vinumstart+2b2)
> [ ... ]
>
> Will give you the exact source lines involved, assuming you
> built a debug kernel.
>
> You don't actually need a crash dump to debug a stack traceback.

Great!  So you know the answer?  Please submit a patch.

Seriously, this is nonsense.  Yes, it's a null pointer dereference.
What?  Why?  How do you fix it?  Finding the first step doesn't solve
the problem.

Greg
--
See complete headers for address and phone numbers




Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 17:00:59 +0200, Eivind Olsen wrote:
> --On 2. august 2003 11:16 +0200 Bernd Walter [EMAIL PROTECTED]
> wrote:
>>> Looks like a problem in vinum.  The other backtrace was the same, right?
>> Please take a look at an older thread named (IIRC) "vinum or geom bug?".
>> Greg asked for special debug output, but it never happened again for me.
>> A real Murphy bug - it happened on three machines once a day and after
>> Greg's response nothing happened for weeks.
>
> Are you thinking of the thread "vinum and/or geom panic on alpha" from 10th
> of June? I forgot to mention this but my system is i386 uniprocessor
> (Pentium2 at 450MHz).
>
> In case it's relevant, yes I do run vinum:

Yes, of course you do.  That's what the stack trace says, and that's
why people mentioned Vinum in the first place:

On Saturday,  2 August 2003 at 10:11:24 +0200, Eivind Olsen wrote:
> Here's some output from DDB:
>
> db> trace
> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
> vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6

On Saturday,  2 August 2003 at 11:16:21 +0200, Bernd Walter wrote:
> On Sat, Aug 02, 2003 at 02:00:52AM -0700, Kris Kennaway wrote:
>> On Sat, Aug 02, 2003 at 10:11:24AM +0200, Eivind Olsen wrote:
>>>
>>> db> trace
>>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
>>> vinumstrategy(c5ada2d0,0,c09719b0,40,0) at vinumstrategy+0xa6
>>
>> Looks like a problem in vinum.  The other backtrace was the same, right?
>
> Please take a look at an older thread named (IIRC) "vinum or geom bug?".
> Greg asked for special debug output, but it never happened again for me.
> A real Murphy bug - it happened on three machines once a day and after
> Greg's response nothing happened for weeks.

This is the real issue.  Until you supply the information I ask for in
the man page or at http://www.vinumvm.org/vinum/how-to-debug.html,
only Terry can help you.

Greg
--
See complete headers for address and phone numbers




Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 16:47:13 +0200, Eivind Olsen wrote:
> --On 2. august 2003 02:11 -0700 Terry Lambert [EMAIL PROTECTED]
> wrote:
>>> db> trace
>>> g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
>>> launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
>>> vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
>> gdb -k kernel.debug
>> (gdb) list *(g_dev_strategy+29)
>> [ ... ]
>> (gdb) list *(launch_requests+448)
>> [ ... ]
>> (gdb) list *(vinumstart+2b2)
>> [ ... ]
>> Will give you the exact source lines involved, assuming you
>> built a debug kernel.
>
> I did. At least I've tried to. :)
> (I have a kernel.debug which was compiled at the same time as the real
> kernel I'm using, and it's approx. 30MB in size).
>
>> You don't actually need a crash dump to debug a stack traceback.
>
> This is what I found by using those commands you mentioned:
>
> [EMAIL PROTECTED]:~/tmp/debug> gdb -k kernel.debug
> GNU gdb 5.2.1 (FreeBSD)
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-undermydesk-freebsd"...
> (kgdb) list *(g_dev_strategy+29)

This is almost certainly the wrong function.  At the very least you
should look at the arguments passed to it.

> (kgdb) list *(launch_requests+448)
> No symbol launch_requests in current context.
> (kgdb) list *(vinumstart+2b2)
> No symbol vinumstart in current context.
> (kgdb)

Read the links I just sent you.  You haven't loaded the Vinum symbols.

> If anyone wants to take a look at this themselves I've put the compressed
> (gzip) debug-kernel available on
> http://eivind.aminor.no/debug/kernel.debug.gz
> NOTE! It's approx. 13MB compressed!

The kernel's not much use by itself.  

Greg
--
See complete headers for address and phone numbers




Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Eivind Olsen wrote:
> (kgdb) list *(launch_requests+448)
> No symbol launch_requests in current context.
> (kgdb) list *(vinumstart+2b2)
> No symbol vinumstart in current context.
> (kgdb)
>
> If anyone wants to take a look at this themselves I've put the compressed
> (gzip) debug-kernel available on
> http://eivind.aminor.no/debug/kernel.debug.gz
> NOTE! It's approx. 13MB compressed!

If this is repeatable for you, it's recommended that you compile
Vinum statically into your kernel, so that you can look at the
other symbols in the traceback and obtain source lines for them,
as well.  It may be that this will be debuggable without that
information, but in my experience with similar problems, without
a list of arguments to the functions from a live remote debug
session and/or a crashdump, the problem is going to have to be
found by an engineer eyeballing the call graph and seeing how
that particular line could end up with a NULL in bp2 or bp.

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Greg 'groggy' Lehey wrote:
  You don't actually need a crash dump to debug a stack traceback.
 
 Great!  So you know the answer?  Please submit a patch.
 
 Seriously, this is nonsense.  Yes, it's a null pointer dereference.
 What?

That is precisely what doing what I suggested discovers, Greg.

If you haven't seen his response posting:

(kgdb) list *(g_dev_strategy+29)
0xc02e812d is in g_dev_strategy (/usr/src/sys/geom/geom_dev.c:415).
410 KASSERT(cp->acr || cp->acw,
411 ("Consumer with zero access count in g_dev_strategy"));
412
413 bp2 = g_clone_bio(bp);
414 KASSERT(bp2 != NULL, ("XXX: ENOMEM in a bad place"));
415 bp2->bio_offset = (off_t)bp->bio_blkno << DEV_BSHIFT;
416 KASSERT(bp2->bio_offset >= 0,
417 ("Negative bio_offset (%jd) on bio %p",
418 (intmax_t)bp2->bio_offset, bp));
419 bp2->bio_length = (off_t)bp->bio_bcount;


Clearly, bp2 or bp is NULL at the time of the dereference.


 Why?

Programmer error.  Either bp2 or bp is a NULL pointer.


 How do you fix it?

It depends on the root cause.  If the root cause is that the bp is
NULL, then I'd hope that it would have been caught higher up; if it
wasn't, then I'd hope that g_clone_bio(bp) would have returned NULL.

Is the KASSERT() active at the time of the problem?  I don't know;
if it isn't, it probably should be converted to an if()...panic().

If it is, then I'd have to expect that the validity fell out from
under it as a result of an interrupt, preemption, reentrancy (if
the locking didn't prevent it) or SMP races (if the locking didn't
prevent it).
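To make the KASSERT() question concrete, here is a userland sketch in C, not kernel code. It assumes the FreeBSD convention that KASSERT() only expands to a check-and-panic when the kernel is built with "options INVARIANTS" and to nothing otherwise. To keep the sketch runnable, panic() is replaced by a hypothetical stand-in that records the message and returns. struct fake_bio and strategy_checked() are illustrative names, not GEOM code.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's panic(): it only records the
 * message, so this userland sketch can keep running.
 */
static const char *last_panic;
#define panic(msg) ((void)(last_panic = (msg)))

/*
 * Modelled on the kernel convention: the assertion exists only when
 * INVARIANTS is defined and compiles to nothing otherwise.  The message
 * argument is parenthesized, as in KASSERT(exp, ("text")).
 */
#ifdef INVARIANTS
#define KASSERT(exp, msg) do { if (!(exp)) panic msg; } while (0)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

#define DEMO_DEV_BSHIFT 9   /* 512-byte sectors, as DEV_BSHIFT on FreeBSD */

/* Hypothetical, heavily reduced request structure (not struct bio). */
struct fake_bio {
    long long bio_offset;
    long      bio_blkno;
};

/*
 * The "if () ... panic()" form: the NULL test is compiled in whether or
 * not INVARIANTS is set.  Returns 0 on success, -1 if bp2 was NULL.
 */
static int strategy_checked(const struct fake_bio *bp, struct fake_bio *bp2)
{
    if (bp2 == NULL) {
        panic("strategy_checked: clone returned NULL");
        return -1;      /* reached only because the stand-in panic() returns */
    }
    bp2->bio_offset = (long long)bp->bio_blkno << DEMO_DEV_BSHIFT;
    return 0;
}
```

Built without INVARIANTS, a KASSERT() line expands to nothing, so the explicit test is the only guard left against dereferencing a NULL clone.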

I really can't answer it for the same reason that I couldn't locate
the line in the source code that was failing for him from his
posting of hex offsets into functions compiled from unknown source
code: I don't have his object set for the problem in question, nor
his debug kernel.


 Finding the first step doesn't solve the problem.

No.  Finding the first step is *necessary* to solving the problem,
but you are entirely correct in pointing out that it's not in
itself *sufficient*.

But it's one step farther along than he was.  I didn't see anyone
else helping him take that first step, so I did.

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Greg 'groggy' Lehey wrote:
  Please take a look at an older thread named (IIRC) vinum or geom bug?
  Greg asked for special debug output, but it never happened again for me.
  A real Murphy bug - it happened on three machines once a day and after
  Greg's response nothing happened over weeks.
 
 This is the real issue.  Until you supply the information I ask for in
 the man page or at http://www.vinumvm.org/vinum/how-to-debug.html,
 only Terry can help you.

This is BS, Greg.

I deal with about a traceback every other day, and sometimes as
high as 5 in a single day, if it's a busy day for it.

The information I gave him gets him to lines of source code,
instead of just function names with strange hexadecimal numbers
that resolve to instruction offsets that may be specific to his
compile flags, date of checkout of the sources from CVS, etc.

I don't know about you, but I can't easily write assembly
instructions to tape, run the tape through my teeth, and
read the bits using my dental fillings.


If it's a NULL pointer dereference, the place to find it is by
turning on what debugging there is, and, if that fails, which
it probably will, by eyeballing the lines of source code in
question and understanding the code around it well enough that
you can tell *how* a pointer there could be NULL.  My instructions
*get* him those lines of source.

If you'll notice from his followup posting of the source in
question, Vinum is loaded as a module, and it's the FreeBSD
code that Vinum calls, not Vinum, that's causing the crash.

There's no reason to be paranoid about your baby with me; unlike
some people, personally I like Vinum, so relax and realize that
I'm not trying to blame your code by trying to help him squeeze
more information out of the data he *is* able to gather.

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 17:54:03 -0700, Terry Lambert wrote:
 Eivind Olsen wrote:
 (kgdb) list *(launch_requests+448)
 No symbol launch_requests in current context.
 (kgdb) list *(vinumstart+2b2)
 No symbol vinumstart in current context.
 (kgdb)

 If anyone wants to take a look at this themselves I've put the compressed
 (gzip) debug-kernel available on
 http://eivind.aminor.no/debug/kernel.debug.gz
 NOTE! It's approx. 13MB compressed!

 If this is repeatable for you, it's recommended that you compile
 Vinum statically into your kernel, so that you can look at the
 other symbols in the traceback and obtain source lines for them,
 as well.

No.  It is explicitly discouraged.

 It may be that this will be debuggable without that information, but
 in my experience with similar problems, without a list of arguments
 to the functions from a live remote debug session and/or a
 crashdump, the problem is going to have to be found by an engineer
 eyeballing the call graph and seeing how that particular line could
 end up with a NULL in bp2 or bp.

Terry hasn't read the debug instructions.  You can load symbols from
klds.  See the links I pointed to.

Greg


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Terry Lambert wrote:
 There's no reason to be paranoid about your baby with me; unlike
 some people, personally I like Vinum, so relax and realize that
 I'm not trying to blame your code by trying to help him squeeze
 more information out of the data he *is* able to gather.

To follow this up:

Sometimes you have to work with the information you have available,
rather than the information you wish you had available.  In an
earlier post, he said that he was having problems collecting
system crash dumps.  So what he has is pretty much what we get to
work with.

If you think that's fun, try translating a traceback that's a set
of hexadecimal instruction addresses for a released product (at
least you get the symbol'ed kernel to look at in gdb) from a
blurry digital photograph of a computer monitor...
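For readers who have not done this by hand: turning a bare hexadecimal instruction address into function+offset, as ddb does for its traceback, amounts to finding the last symbol at or below the address in an address-sorted symbol table (for example the output of nm -n). The C sketch below does that lookup with a binary search; the symbol names and addresses in it are hypothetical. Because the table carries no symbol sizes, an address past the end of the last function would still be attributed to it, so a real tool would also check sizes.

```c
#include <stddef.h>

struct sym {
    unsigned long addr;   /* start address of the symbol */
    const char   *name;
};

/* Hypothetical symbol table, sorted by ascending address. */
static const struct sym demo_syms[] = {
    { 0xc0100000UL, "func_a" },
    { 0xc0100400UL, "func_b" },
    { 0xc0100a00UL, "func_c" },
};

/*
 * Find the last symbol whose address is <= pc and store pc's offset from
 * it.  Returns NULL if pc lies below the first symbol in the table.
 */
static const struct sym *resolve_pc(unsigned long pc, const struct sym *tab,
                                    size_t n, unsigned long *offset)
{
    size_t lo = 0, hi = n;              /* binary search over [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;               /* answer is at mid or later */
        else
            hi = mid;                   /* answer is before mid */
    }
    if (lo == 0)
        return NULL;                    /* pc precedes every symbol */
    *offset = pc - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```

With this table, 0xc0100429 resolves to func_b+0x29, the same name+offset form ddb prints.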

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 17:56:49 -0700, Terry Lambert wrote:
 Greg 'groggy' Lehey wrote:
 You don't actually need a crash dump to debug a stack traceback.

 Great!  So you know the answer?  Please submit a patch.

 Seriously, this is nonsense.  Yes, it's a null pointer dereference.
 What?

 That is precisely what doing what I suggested discovers, Greg.

Yes, that's what you said already.

 If you haven't seen his response posting:

I saw it and explained why it didn't help.

 Clearly, bp2 or bp is NULL at the time of the dereference.

 Why?

 Programmer error.  Either bp2 or bp is a NULL pointer.

You're repeating yourself.

 How do you fix it?

 It depends on the root cause.

*bingo*  Here you are having found the first (obvious) step and acting
as if the problem has been solved.

 I really can't answer it

OK, why don't you either:

1.  Find a way to answer it, or
2.  Keep quiet.

You're just confusing the issue here.

 Finding the first step doesn't solve the problem.

 No.  Finding the first step is *necessary* to solving the problem,
 but you are entirely correct in pointing out that it's not in
 itself *sufficient*.

 But it's one step farther along than he was.  I didn't see anyone
 else helping him take that first step, so I did.

Sorry, I don't hack in the middle of the night.  If you had read the
documentation at your disposal, you'd have discovered a lot of help,
and also that this is a known problem that crops up sporadically, and
that so far we can't find out why.

Greg


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 18:06:36 -0700, Terry Lambert wrote:
 Greg 'groggy' Lehey wrote:
 Please take a look at an older thread named (IIRC) vinum or geom bug?
 Greg asked for special debug output, but it never happened again for me.
 A real Murphy bug - it happened on three machines once a day and after
 Greg's response nothing happened over weeks.

 This is the real issue.  Until you supply the information I ask for in
 the man page or at http://www.vinumvm.org/vinum/how-to-debug.html,
 only Terry can help you.

 This is BS, Greg.

 I deal with about a traceback every other day, and sometimes as
 high as 5 in a single day, if it's a busy day for it.

Stack traces are pretty common stuff.  Your point?

 The information I gave him gets him to lines of source code, instead
 of just function names with strange hexadecimal numbers that resolve
 to instruction offsets that may be specific to his compile flags,
 date of checkout of the sources from CVS, etc.

The first step of the link above does the same thing.  But it's only
the first step.

 I don't know about you, but I can't easily write assembly
 instructions to tape, run the tape through my teeth, and read
 the bits using my dental fillings.

Terry, why don't you come to my debug tutorial at the BSDCon next
month?  I'll show you how to do this properly.  I'm not asking for
people to interpret hex.  I'm asking for people, you included, to find
out what debugging help is available.

 If it's a NULL pointer dereference, the place to find it is by
 turning on what debugging there is, and, if that fails, which it
 probably will,

No, that will find the null pointer dereference pretty quickly.

 by eyeballing the lines of source code in question and understanding
 the code around it well enough that you can tell *how* a pointer
 there could be NULL.  My instructions *get* him those lines of
 source.

You obviously still haven't read the reference.  Do that first, and
come back when you have either understood things or are having
difficulty understanding.  But don't shoot off your mouth without
knowing what's going on.

 If you'll notice from his followup posting of the source in
 question, Vinum is loaded as a module, and it's the FreeBSD code
 that Vinum calls, not Vinum, that's causing the crash.

The bug is almost certainly in Vinum.

 There's no reason to be paranoid about your baby with me; unlike
 some people, personally I like Vinum, so relax and realize that I'm
 not trying to blame your code by trying to help him squeeze more
 information out of the data he *is* able to gather.

This has nothing to do with being paranoid about babies.  This has to
do with people shooting off their mouths in a public forum without
bothering to check details first.

Greg


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Greg 'groggy' Lehey wrote:
  If this is repeatable for you, it's recommended that you compile
  Vinum statically into your kernel, so that you can look at the
  other symbols in the traceback and obtain source lines for them,
  as well.
 
 No.  It is explicitly discouraged.

It saves the dicking around with the .ko files.

  It may be that this will be debuggable without that information, but
  in my experience with similar problems, without a list of arguments
  to the functions from a live remote debug session and/or a
  crashdump, the problem is going to have to be found by an engineer
  eyeballing the call graph and seeing how that particular line could
  end up with a NULL in bp2 or bp.
 
 Terry hasn't read the debug instructions.  You can load symbols from
 klds.  See the links I pointed to.

I read them.  You didn't provide examples for a non-crashdump
debug session.  Rather than give him incorrect information, I
gave him a workaround that would guarantee that what information
he did obtain would, in fact, be correct.

If you would care to take over, without insisting that he be able
to produce a crash dump (which he has already stated that he has
had trouble doing), be my guest.

The best information I can get him, without finding some way to
fix his obtaining a crashdump issue (I myself have been unable to
obtain one off and on during long stretches, due to the changes
in that area by PHK), is to translate his ddb traceback into
source code line numbers.

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Terry Lambert
Greg 'groggy' Lehey wrote:
  The information I gave him gets him to lines of source code, instead
  of just function names with strange hexadecimal numbers that resolve
  to instruction offsets that may be specific to his compile flags,
  date of checkout of the sources from CVS, etc.
 
 The first step of the link above does the same thing.  But it's only
 the first step.

No, it does not.  The first step of your debugging link does
not deal with anything but having a vmcore lying around *which
he does not have*.


 Terry, why don't you come to my debug tutorial at the BSDCon next
 month?  I'll show you how to do this properly.  I'm not asking for
 people to interpret hex.  I'm asking for people, you included, to find
 out what debugging help is available.

I might do this; it depends on whether things die down at work
by then, or not.  Currently, though, I'm really busy fixing bugs
exactly like this one.  In the past 3 weeks, I've fixed 61 of them,
which averages out to 4 a day.

  If it's a NULL pointer dereference, the place to find it is by
  turning on what debugging there is, and, if that fails, which it
  probably will,
 
 No, that will find the null pointer dereference pretty quickly.

You'd hope the entirety of the kernel were that well instrumented...


  by eyeballing the lines of source code in question and understanding
  the code around it well enough that you can tell *how* a pointer
  there could be NULL.  My instructions *get* him those lines of
  source.
 
 You obviously still haven't read the reference.  Do that first, and
 come back when you have either understood things or are having
 difficulty understanding.  But don't shoot off your mouth without
 knowing what's going on.

I read the reference.

How does it apply in cases like this one, where you don't have a
vmcore file?


  If you'll notice from his followup posting of the source in
  question, Vinum is loaded as a module, and it's the FreeBSD code
  that Vinum calls, not Vinum, that's causing the crash.
 
 The bug is almost certainly in Vinum.

Most likely; I think that it's passing a bad argument to the
inferior function.  The way I would approach finding this, with
only:

1)  The line of code where the failure occurred
2)  The stack traceback, with no arguments
3)  The sources for the code in the stack traceback

would be to eyeball the code in #1, and try to figure out how
I could get to that point with that pointer having a NULL value,
given my a priori knowledge of the forward call graph.

I would examine every intermediate conditional and function call
that could affect the value of the pointer and cause it to be
NULL at the point in question.


 This has nothing to do with being paranoid about babies.  This has to
 do with people shooting off their mouths in a public forum without
 bothering to check details first.

It's really hard to talk to you about Vinum.

One of the details I wish you would check is whether or not he
has a vmcore file, or the ability to get one...

-- Terry


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Greg 'groggy' Lehey
On Saturday,  2 August 2003 at 18:36:24 -0700, Terry Lambert wrote:
 Greg 'groggy' Lehey wrote:
 The information I gave him gets him to lines of source code, instead
 of just function names with strange hexadecimal numbers that resolve
 to instruction offsets that may be specific to his compile flags,
 date of checkout of the sources from CVS, etc.

 The first step of the link above does the same thing.  But it's only
 the first step.
 by eyeballing the lines of source code in question and understanding
 the code around it well enough that you can tell *how* a pointer
 there could be NULL.  My instructions *get* him those lines of
 source.

 You obviously still haven't read the reference.  Do that first, and
 come back when you have either understood things or are having
 difficulty understanding.  But don't shoot off your mouth without
 knowing what's going on.

 I read the reference.

 How does it apply in cases like this one, where you don't have a
 vmcore file?

You don't seem to have read the reference very well.  It also asks for
other supporting information.  That's the most important thing at the
moment.  I know that because I've been there before, and I've looked
at a number of these dumps: it's almost certainly related to something
he's doing which is not normal.  You don't know that, and that's
excusable, but it's not excusable that after four or five requests,
you still haven't RTFM'd.

 The way I would approach finding this, with only:

 1) The line of code where the failure occurred
 2) The stack traceback, with no arguments
 3) The sources for the code in the stack traceback

 would be to eyeball the code in #1, and try to figure out how
 I could get to that point with that pointer having a NULL value,
 given my a priori knowledge of the forward call graph.

You have that?

 I would examine every intermediate conditional and function call
 that could affect the value of the pointer and cause it to be NULL
 at the point in question.

Go for it.  Once I get the log files, I'll start there.

 One of the details I wish you would check is whether or not he has a
 vmcore file, or the ability to get one...

We'll address that issue when it becomes necessary.

Greg


Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread Evan Dower
I fear we may have gotten a bit off-topic.
E



Re: Yet another crash in FreeBSD 5.1

2003-08-02 Thread John Baldwin

On 03-Aug-2003 Greg 'groggy' Lehey wrote:
 On Saturday,  2 August 2003 at 16:47:13 +0200, Eivind Olsen wrote:
 --On 2. august 2003 02:11 -0700 Terry Lambert [EMAIL PROTECTED]
 wrote:
 db> trace
 g_dev_strategy(c2156024,c2153800,0,cfb528d0,c2099eca) at g_dev_strategy+0x29
 launch_requests(c299bf00,0,1,,47) at launch_requests+0x448
 vinumstart(c5ada2d0,0,c22ab000,cfb5294c,c02e5bc6) at vinumstart+0x2b2
 gdb -k kernel.debug
 (gdb) list *(g_dev_strategy+29)
 [ ... ]
 (gdb) list *(launch_requests+448)
 [ ... ]
 (gdb) list *(vinumstart+2b2)
 [ ... ]
 Will give you the exact source lines involved, assuming you
 built a debug kernel.

 I did. At least I've tried to. :)
 (I have a kernel.debug which was compiled at the same time as the real
 kernel I'm using, and it's approx. 30MB in size).

 You don't actually need a crash dump to debug a stack traceback.

 This is what I found by using those commands you mentioned:

 [EMAIL PROTECTED]:~/tmp/debug  gdb -k kernel.debug
 GNU gdb 5.2.1 (FreeBSD)
 Copyright 2002 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain
 conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as i386-undermydesk-freebsd...
 (kgdb) list *(g_dev_strategy+29)
 
 This is almost certainly the wrong function.  At the very least you
 should look at the arguments passed to it.

Actually, this line can be very instructive.  Since 'bp' is valid
it is probably the bp2 from g_clone_bio() that is NULL.  You might
want to ask phk about that one.
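One way to see why bp2 is the likelier culprit: in the panic message posted earlier in this thread, the fault was a supervisor write at virtual address 0x14. A fault address that small usually means a structure member was accessed through a NULL pointer, so the address equals the member's offset, and line 415 writes only through bp2 while merely reading through bp. The C sketch below maps such an address back to a member name. The structure layout, the 4096-byte first-page cutoff and the helper names are assumptions for illustration; the real offsets of struct bio on that i386 kernel would have to be read from its kernel.debug.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real struct bio. */
struct demo_bio {
    uint32_t bio_cmd;
    uint32_t bio_flags;
    void    *bio_dev;
    int64_t  bio_offset;
    int64_t  bio_bcount;
};

struct member {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* Record a member's name, offset and size (sizeof does not evaluate the NULL). */
#define MEMBER(f) { #f, offsetof(struct demo_bio, f), \
                    sizeof(((struct demo_bio *)0)->f) }

static const struct member demo_members[] = {
    MEMBER(bio_cmd), MEMBER(bio_flags), MEMBER(bio_dev),
    MEMBER(bio_offset), MEMBER(bio_bcount),
};

/* Assumed cutoff: faults inside the first 4 KB page are read as NULL + offset. */
#define NULL_PAGE_LIMIT 4096u

/*
 * Return the member an access through a NULL struct demo_bio pointer
 * would touch at fault_addr, or NULL if the address is outside the first
 * page or falls into padding.
 */
static const struct member *member_at_fault(uintptr_t fault_addr)
{
    size_t n = sizeof(demo_members) / sizeof(demo_members[0]);

    if (fault_addr >= NULL_PAGE_LIMIT)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        const struct member *m = &demo_members[i];
        if (fault_addr >= m->offset && fault_addr < m->offset + m->size)
            return m;
    }
    return NULL;
}
```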

 (kgdb) list *(launch_requests+448)
 No symbol launch_requests in current context.
 (kgdb) list *(vinumstart+2b2)
 No symbol vinumstart in current context.
 (kgdb)
 
 Read the links I just sent you.  You haven't loaded the Vinum symbols.

Bah, this isn't hard for you to do either:

(gdb) l *(launch_requests+0x448)
0xad58 is in launch_requests (/usr/src/sys/dev/vinum/vinumrequest.c:448).
443 microtime(&rqe->launchtime); /* time we launched this request */
444 logrq(loginfo_rqe, (union rqinfou) rqe, rq->bp);
445 }
446 #endif
447 /* fire off the request */
448 DEV_STRATEGY(&rqe->b);
449 }
450 }
451 }
452 return 0;

But you knew that.  Also, Eivind, you need to use hex, not decimal,
offsets from the functions.  You might want to redo the g_dev_strategy()
line with 0x29 instead of 29.
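The decimal-versus-hex slip is easy to script away. The C sketch below (function and macro names are made up for illustration) turns a ddb frame such as g_dev_strategy+0x29 into the matching gdb command, always reading the offset as hexadecimal. It also redoes the address arithmetic with figures quoted in this thread, assuming both come from the same kernel.debug. gdb put g_dev_strategy+29 (decimal) at 0xc02e812d, so the function starts at 0xc02e8110. Adding 0x29 then lands on 0xc02e8139, which is the instruction pointer in the panic message posted earlier, so the decimal offset pointed gdb 12 bytes too early.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn one ddb frame such as "g_dev_strategy+0x29" into the gdb command
 * "list *(g_dev_strategy+0x29)".  ddb prints offsets in hex, so the offset
 * is parsed with base 16 (which also accepts a missing "0x", as in
 * "vinumstart+2b2") and written back with an explicit 0x prefix.
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int ddb_frame_to_gdb(const char *frame, char *out, size_t outlen,
                            unsigned long *offset)
{
    const char *plus = strchr(frame, '+');
    char *end;
    int n;

    if (plus == NULL || plus == frame)
        return -1;                       /* no "name+" prefix */
    *offset = strtoul(plus + 1, &end, 16);
    if (end == plus + 1 || *end != '\0')
        return -1;                       /* empty or trailing junk */
    n = snprintf(out, outlen, "list *(%.*s+0x%lx)",
                 (int)(plus - frame), frame, *offset);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Figures quoted in this thread, assumed to come from the same kernel.debug. */
#define GDB_ADDR_G_DEV_STRATEGY_PLUS_29  0xc02e812dUL  /* list *(g_dev_strategy+29) */
#define PANIC_EIP                        0xc02e8139UL  /* panic "instruction pointer" */

/* Start of g_dev_strategy implied by the decimal lookup above. */
static unsigned long g_dev_strategy_start(void)
{
    return GDB_ADDR_G_DEV_STRATEGY_PLUS_29 - 29;
}
```

For the Vinum frames, the generated command only helps once the module's symbols have been loaded into gdb.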

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/