Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Scott Rotondo [EMAIL PROTECTED] wrote:

>> Did you run a test with the original filesystem, or what do you like to
>> tell us here?
>
> I didn't test anything. I was just pointing out, based on simple
> examination of the source code, that line 944 is sure to panic if fsp
> contains random bits, but if it's set to NULL then line 943 will prevent
> 944 from executing at all.

If you didn't test anything, why do you just repeat well known facts?

Jörg

--
EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
      [EMAIL PROTECTED] (uni)
      [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Joerg Schilling wrote:
> Scott Rotondo [EMAIL PROTECTED] wrote:
>> Joerg Schilling wrote:
>>> Does it help to initialize the pointers to NULL?
>>
>> Sure. This code
>>
>>     943         if (fsp)
>>     944                 kmem_free(fsp, sizeof (*fsp));
>>     945         if (svp)
>>     946                 kmem_free(svp, sizeof (*svp));
>>     947         if (jvp)
>>     948                 kmem_free(jvp, sizeof (*jvp));
>>
>> will behave very differently if those pointers are NULL rather than
>> uninitialized.
>
> I was interested in a useful reply for the OP case

Sorry, I don't know what you're asking.

> Did you run a test with the original filesystem, or what do you like to
> tell us here?

I didn't test anything. I was just pointing out, based on simple
examination of the source code, that line 944 is sure to panic if fsp
contains random bits, but if it's set to NULL then line 943 will prevent
944 from executing at all.

Scott
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
To bring this to a close: the fix (as mentioned by ScottR, by the way) was
put back two days ago under 6715049, driven by JuergenKeil and Dan.McDonald.

--- frankB
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Scott Rotondo [EMAIL PROTECTED] wrote:

> Joerg Schilling wrote:
>> Juergen Keil [EMAIL PROTECTED] wrote:
>>
>>> Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
>>> whenever we use one of the first three |goto cleanup|, the local
>>> variables |svp| and |jvp| are uninitialized. That should corrupt the
>>> kernel heap when we kmem_free() with an uninitialized stack lock pointer
>>> in the cleanup section ...
>>>
>>>     struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
>>>     struct hs_volume *jvp;  /* Joliet VD */
>>
>> I have to admit that I am responsible for the uninitialized Joliet VD
>> pointer. Duplicating code is simple and in this case even passed 4 code
>> reviews.
>>
>> Does it help to initialize the pointers to NULL?
>
> Sure. This code
>
>     943         if (fsp)
>     944                 kmem_free(fsp, sizeof (*fsp));
>     945         if (svp)
>     946                 kmem_free(svp, sizeof (*svp));
>     947         if (jvp)
>     948                 kmem_free(jvp, sizeof (*jvp));
>
> will behave very differently if those pointers are NULL rather than
> uninitialized.

I was interested in a useful reply for the OP case

Did you run a test with the original filesystem, or what do you like to
tell us here?

Jörg
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Joerg Schilling wrote:
> Juergen Keil [EMAIL PROTECTED] wrote:
>
>> Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
>> whenever we use one of the first three |goto cleanup|, the local
>> variables |svp| and |jvp| are uninitialized. That should corrupt the
>> kernel heap when we kmem_free() with an uninitialized stack lock pointer
>> in the cleanup section ...
>>
>>     struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
>>     struct hs_volume *jvp;  /* Joliet VD */
>
> I have to admit that I am responsible for the uninitialized Joliet VD
> pointer. Duplicating code is simple and in this case even passed 4 code
> reviews.
>
> Does it help to initialize the pointers to NULL?

Sure. This code

    943         if (fsp)
    944                 kmem_free(fsp, sizeof (*fsp));
    945         if (svp)
    946                 kmem_free(svp, sizeof (*svp));
    947         if (jvp)
    948                 kmem_free(jvp, sizeof (*jvp));

will behave very differently if those pointers are NULL rather than
uninitialized.

Scott
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
On Mon, 16 Jun 2008, Robert William Fuller wrote:

> [EMAIL PROTECTED] wrote:
>>
>> Hi Kyle,
>>
>> given that what happens looks ever-so-slightly different each time, a
>> hardware glitch could be possible; to exclude this, would you happen to
>> know whether these panics occurred before build 78 as well? If they
>> occur if you use the b77 hsfs module on your post-b78 system?
>>
>> Does the machine you're using have a history of hardware issues, or
>> other symptoms that'd point at flaky hardware (such as e.g. ZFS block
>> checksumming errors)?
>
> Did anybody else notice they're all NULL pointer de-references???
>
> It's probably not a hardware problem.
>
> For example, if it's a memory problem, then you'll often see random
> pointers, but not 3 NULL pointers in a row.

They all look, from my first glance, like there's a vfs_t with a NULL
vfs_next field around. By the codepath in HSFS, that's impossible if the
mount succeeded, but would be normal if it failed.

A HW glitch that could cause this would only need to corrupt the return
code from mount (register bit flips). You're right that it wouldn't look
like an obvious HW issue (and a flip of a pointer-with-many-bits-set to a
NULL is not how hardware problems usually manifest themselves).

I'm not saying it's that. Just saying my mind could come up with a
mechanism that'd explain it that way, which is not too far off. It
wouldn't explain why now, and why only in these codepaths.

FrankH.
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
On Mon, 16 Jun 2008, Juergen Keil wrote:

> Robert William Fuller wrote:
>> [EMAIL PROTECTED] wrote:
>>>
>>> Hi Kyle,
>>>
>>> given that what happens looks ever-so-slightly different each time, a
>>> hardware glitch could be possible; to exclude this, would you happen to
>>> know whether these panics occurred before build 78 as well? If they
>>> occur if you use the b77 hsfs module on your post-b78 system?
>>>
>>> Does the machine you're using have a history of hardware issues, or
>>> other symptoms that'd point at flaky hardware (such as e.g. ZFS block
>>> checksumming errors)?
>>
>> Did anybody else notice they're all NULL pointer de-references???
>>
>> It's probably not a hardware problem.
>>
>> For example, if it's a memory problem, then you'll often see random
>> pointers, but not 3 NULL pointers in a row.
>
> Yep, I noticed that, too.
>
> IIRC a bug like ``kmem_free(NULL, size)'' somewhere in the kernel can
> have the effect that a subsequent ``kmem_alloc(size, KM_SLEEP)''
> somewhere else in the kernel will return with a NULL pointer! (Assuming
> you run release bits)

If this is so, then it's a bug and should be fixed. Quote kmem_alloc(9F):

    NOTES
         kmem_alloc(0, flag) always returns NULL. kmem_free(NULL, 0) is
         legal.

That's manpage - consider it a spec ...

> For that reason I did suggest to Kyle to try to reproduce this hsfs
> mount panic with kmem heap checking enabled. Add the following line to
> /etc/system, reboot, retry to reproduce the hsfs mount panic:
>
>     set kmem_flags=0xf

Good idea.

FrankH.
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
[EMAIL PROTECTED] wrote:
>
> Hi Kyle,
>
> given that what happens looks ever-so-slightly different each time, a
> hardware glitch could be possible; to exclude this, would you happen to
> know whether these panics occurred before build 78 as well? If they
> occur if you use the b77 hsfs module on your post-b78 system?
>
> Does the machine you're using have a history of hardware issues, or
> other symptoms that'd point at flaky hardware (such as e.g. ZFS block
> checksumming errors)?

Did anybody else notice they're all NULL pointer de-references???

It's probably not a hardware problem.

For example, if it's a memory problem, then you'll often see random
pointers, but not 3 NULL pointers in a row.

snip
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Robert William Fuller wrote:
> [EMAIL PROTECTED] wrote:
>>
>> Hi Kyle,
>>
>> given that what happens looks ever-so-slightly different each time, a
>> hardware glitch could be possible; to exclude this, would you happen to
>> know whether these panics occurred before build 78 as well? If they
>> occur if you use the b77 hsfs module on your post-b78 system?
>>
>> Does the machine you're using have a history of hardware issues, or
>> other symptoms that'd point at flaky hardware (such as e.g. ZFS block
>> checksumming errors)?
>
> Did anybody else notice they're all NULL pointer de-references???
>
> It's probably not a hardware problem.
>
> For example, if it's a memory problem, then you'll often see random
> pointers, but not 3 NULL pointers in a row.

Yep, I noticed that, too.

IIRC a bug like ``kmem_free(NULL, size)'' somewhere in the kernel can have
the effect that a subsequent ``kmem_alloc(size, KM_SLEEP)'' somewhere else
in the kernel will return with a NULL pointer! (Assuming you run release
bits)

For that reason I did suggest to Kyle to try to reproduce this hsfs mount
panic with kmem heap checking enabled. Add the following line to
/etc/system, reboot, retry to reproduce the hsfs mount panic:

    set kmem_flags=0xf
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Frank Hofmann wrote:

> On Mon, 16 Jun 2008, Juergen Keil wrote:
>
>> IIRC a bug like ``kmem_free(NULL, size)'' somewhere in the kernel can
>> have the effect that a subsequent ``kmem_alloc(size, KM_SLEEP)''
>> somewhere else in the kernel will return with a NULL pointer! (Assuming
>> you run release bits)
>
> If this is so, then it's a bug and should be fixed. Quote kmem_alloc(9F):
>
>     NOTES
>          kmem_alloc(0, flag) always returns NULL. kmem_free(NULL, 0) is
>          legal.
>
> That's manpage - consider it a spec ...

Well, it says kmem_free with a ptr == NULL and size == 0 is legal; but
what about ptr == NULL and size > 0?

Quick test with ::call in kmdb, when booted with kmem_flags=0xf:

- kmem_alloc::call 8 0
  kmem_free::call value_returned_from_the_above_kmem_alloc 8

  works, as expected

- kmem_free::call 0 8

  kmdb fails this call, with "caught a trap"
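The distinction drawn above (NULL with size 0 is documented as legal, NULL with a nonzero size is not covered by the man page and trapped in the kmdb test) can be illustrated with a small user-space sketch. Everything here is an assumption made for illustration: `sized_free()` is a hypothetical wrapper around `free()`, not how `kmem_free()` is implemented, and rejecting a non-NULL buffer with size 0 is a choice of this sketch, not something the thread or the man page states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical user-space wrapper (NOT the kernel's kmem_free()).
 * Returns 0 if the (buf, size) pair is accepted, -1 if it is rejected.
 */
int
sized_free(void *buf, size_t size)
{
	if (buf == NULL) {
		/* (NULL, 0) is the one case the man page calls legal. */
		return (size == 0 ? 0 : -1);
	}
	if (size == 0) {
		/* Assumption of this sketch: treat as inconsistent. */
		return (-1);
	}
	free(buf);
	return (0);
}
```

A caller that checks the return value would catch the `kmem_free(NULL, 8)` style of mistake at the call site instead of corrupting allocator state later.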
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Juergen Keil [EMAIL PROTECTED] wrote:

>> kmem_alloc(0, flag) always returns NULL. kmem_free(NULL, 0) is
>> legal.
>>
>> That's manpage - consider it a spec ...
>
> Well, it says kmem_free with a ptr == NULL and size == 0 is legal; but
> what about ptr == NULL and size > 0?

We had the second one 2.5 years ago in the ACPI code ;-)

Jörg
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Robert William Fuller [EMAIL PROTECTED] wrote:

> [EMAIL PROTECTED] wrote:
>>
>> Hi Kyle,
>>
>> given that what happens looks ever-so-slightly different each time, a
>> hardware glitch could be possible; to exclude this, would you happen to
>> know whether these panics occurred before build 78 as well? If they
>> occur if you use the b77 hsfs module on your post-b78 system?
>>
>> Does the machine you're using have a history of hardware issues, or
>> other symptoms that'd point at flaky hardware (such as e.g. ZFS block
>> checksumming errors)?
>
> Did anybody else notice they're all NULL pointer de-references???
>
> It's probably not a hardware problem.
>
> For example, if it's a memory problem, then you'll often see random
> pointers, but not 3 NULL pointers in a row.

Did you try to use mdb -k on the crash dump?

A stack trace with mdb would print function parameters.

Jörg
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Frank Hofmann wrote:

> On Mon, 16 Jun 2008, Juergen Keil wrote:
>
>> For that reason I did suggest to Kyle to try to reproduce this hsfs
>> mount panic with kmem heap checking enabled. Add the following line to
>> /etc/system, reboot, retry to reproduce the hsfs mount panic:
>>
>>     set kmem_flags=0xf
>
> Good idea.

Ok, I can actually reproduce that panic using last week's opensolaris
bits.

All I have to do is try and mount -F hsfs a non-existent slice; e.g. using
a CD containing OpenSolaris 2008.05, mount -F hsfs /dev/dsk/c1t1d0s4 /mnt
(mount -F hsfs /dev/dsk/c1t1d0p0 /mnt should work, though):

panic[cpu1]/thread=ff0348445720: BAD TRAP: type=e (#pf Page fault) rp=ff00108bb990 addr=40 occurred in module genunix due to a NULL pointer dereference

mount: #pf Page fault
Bad kernel fault at addr=0x40
pid=19108, pc=0xfba92633, sp=0xff00108bba80, eflags=0x10207
cr0: 80050033<pg,wp,ne,et,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 40 cr3: 22f819000 cr8: c
        rdi: fbca88a0 rsi: 1 rdx: 8
        rcx: 0 r8: fbca8a70 r9: 0
        rax: 0 rbx: 0 rbp: ff00108bbaa0
        r10: ff02d24a6500 r11: ff00108bb680 r12: 1b0103
        r13: ff00108bbc08 r14: 1b0103 r15: 10
        fsb: 0 gsb: ff02d2e75540 ds: 4b
         es: 4b fs: 0 gs: 1c3
        trp: e err: 0 rip: fba92633
         cs: 30 rfl: 10207 rsp: ff00108bba80
         ss: 38

ff00108bb870 unix:die+c8 ()
ff00108bb980 unix:trap+13c3 ()
ff00108bb990 unix:_cmntrap+e9 ()
ff00108bbaa0 genunix:vfs_devismounted+23 ()
ff00108bbbc0 hsfs:hs_getmdev+176 ()
ff00108bbc60 hsfs:hsfs_mount+195 ()
ff00108bbc90 genunix:fsop_mount+21 ()
ff00108bbe00 genunix:domount+9ff ()
ff00108bbe80 genunix:mount+d2 ()
ff00108bbec0 genunix:syscall_ap+8f ()
ff00108bbf10 unix:brand_sys_syscall32+197 ()

syncing file systems... done
dumping to /dev/dsk/c9t0d0s1, offset 860356608, content: kernel

> $C
ff00108bbaa0 vfs_devismounted+0x23(1b0103)
ff00108bbbc0 hs_getmdev+0x176(ff02dcf8a508, 804729e, 101, ff00108bbc08, ff00108bbc3c, ff0315246708)
ff00108bbc60 hsfs_mount+0x195(ff02dcf8a508, ff02ffea2c00, ff00108bbe30, ff0315246708)
ff00108bbc90 fsop_mount+0x21(ff02dcf8a508, ff02ffea2c00, ff00108bbe30, ff0315246708)
ff00108bbe00 domount+0x9ff(0, ff00108bbe30, ff02ffea2c00, ff0315246708, ff00108bbe28)
ff00108bbe80 mount+0xd2(ff0347a60fd8, ff00108bbeb8)
ff00108bbec0 syscall_ap+0x8f()
ff00108bbf10 sys_syscall32+0x101()

The panic with kmem_flags=0xf is more interesting:

> ::status
debugging crash dump vmcore.5 (64-bit) from tiger2
operating system: 5.11 snv_93_jk (i86pc)
panic message: kernel heap corruption detected
dump content: kernel pages only

kernel memory allocator:
invalid free: buffer not in cache
buffer=ff0010455e30 bufctl=0 cache: kmem_alloc_256

panic[cpu1]/thread=ff03a05ad060: kernel heap corruption detected

ff0010455a20 genunix:kmem_error+497 ()
ff0010455a40 genunix:kmem_free+d6 ()
ff0010455bb0 hsfs:hs_mountfs+8b9 ()
ff0010455c60 hsfs:hsfs_mount+1e9 ()
ff0010455c90 genunix:fsop_mount+21 ()
ff0010455e00 genunix:domount+9ff ()
ff0010455e80 genunix:mount+d2 ()
ff0010455ec0 genunix:syscall_ap+8f ()
ff0010455f10 unix:brand_sys_syscall32+197 ()

syncing file systems... done
dumping to /dev/dsk/c9t0d0s1, offset 860356608, content: kernel

> $C
ff0010455980 vpanic()
ff0010455a20 kmem_error+0x497(1, ff02ce62b020, ff0010455e30)
ff0010455a40 kmem_free+0xd6(ff0010455e30, e8)
ff0010455bb0 hs_mountfs+0x8b9(ff03a5096dc8, 1b0104, ff03a2b9f140, 6100, 0, ff034ed39978, 0)
ff0010455c60 hsfs_mount+0x1e9(ff03a5096dc8, ff02f09e8900, ff0010455e30, ff034ed39978)
ff0010455c90 fsop_mount+0x21(ff03a5096dc8, ff02f09e8900, ff0010455e30, ff034ed39978)
ff0010455e00 domount+0x9ff(0, ff0010455e30, ff02f09e8900, ff034ed39978, ff0010455e28)
ff0010455e80 mount+0xd2(ff02e97cce38, ff0010455eb8)
ff0010455ec0 syscall_ap+0x8f()
ff0010455f10 sys_syscall32+0x101()

> hs_mountfs+0x8b9::dis
hs_mountfs+0x88f:       movq   -0x78(%rbp),%r8
hs_mountfs+0x893:       xorq   %r9,%r9
hs_mountfs+0x896:       call   +0x34c9f65 <fop_close>
hs_mountfs+0x89b:       movq   0x30(%rsp),%rdi
hs_mountfs+0x8a0:       call   +0x34c700b <vn_rele>
hs_mountfs+0x8a5:       testq  %r13,%r13
hs_mountfs+0x8a8:       je     +0xf <hs_mountfs+0x8b9>
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
whenever we use one of the first three |goto cleanup|, the local variables
|svp| and |jvp| are uninitialized. That should corrupt the kernel heap
when we kmem_free() with an uninitialized stack lock pointer in the
cleanup section ...

        struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
        struct hs_volume *jvp;  /* Joliet VD */
        ...
        /*
         * Refuse to go any further if this
         * device is being used for swapping
         */
        if (IS_SWAPVP(common_specvp(devvp))) {
                error = EBUSY;
                goto cleanup;
        }

        vap.va_mask = AT_SIZE;
        if ((error = VOP_GETATTR(devvp, &vap, ATTR_COMM, cr, NULL)) != 0) {
                cmn_err(CE_NOTE, "Cannot get attributes of the CD-ROM driver");
                goto cleanup;
        }

        /*
         * Make sure we have a nonzero size partition.
         * The current version of the SD driver will *not* fail the open
         * of such a partition so we have to check for it here.
         */
        if (vap.va_size == 0) {
                error = ENXIO;
                goto cleanup;
        }

        /*
         * Init a new hsfs structure.
         */
        fsp = kmem_zalloc(sizeof (*fsp), KM_SLEEP);
        svp = kmem_zalloc(sizeof (*svp), KM_SLEEP);
        jvp = kmem_zalloc(sizeof (*jvp), KM_SLEEP);
        ...
cleanup:
        (void) VOP_CLOSE(devvp, FREAD, 1, (offset_t)0, cr, NULL);
        VN_RELE(devvp);
        if (fsp)
                kmem_free(fsp, sizeof (*fsp));
        if (svp)
                kmem_free(svp, sizeof (*svp));
        if (jvp)
                kmem_free(jvp, sizeof (*jvp));
        return (error);
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
I filed a bug at http://bugs.opensolaris.org/; Bug-ID is not yet known.

Fix is obvious:

diff --git a/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c b/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
--- a/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
+++ b/usr/src/uts/common/fs/hsfs/hsfs_vfsops.c
@@ -596,8 +596,8 @@ hs_mountfs(
        size_t pathbufsz = strlen(path) + 1;
        int redo_rootvp;
 
-       struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
-       struct hs_volume *jvp;  /* Joliet VD */
+       struct hs_volume *svp = NULL;   /* Supplemental VD for ISO-9660:1999 */
+       struct hs_volume *jvp = NULL;   /* Joliet VD */
 
        /*
         * The rules for which extension will be used are:

> Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
> whenever we use one of the first three |goto cleanup|, the local
> variables |svp| and |jvp| are uninitialized. That should corrupt the
> kernel heap when we kmem_free() with an uninitialized stack lock pointer
> in the cleanup section ...
>
>         struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
>         struct hs_volume *jvp;  /* Joliet VD */
>         ...
>         /*
>          * Refuse to go any further if this
>          * device is being used for swapping
>          */
>         if (IS_SWAPVP(common_specvp(devvp))) {
>                 error = EBUSY;
>                 goto cleanup;
>         }
>
>         vap.va_mask = AT_SIZE;
>         if ((error = VOP_GETATTR(devvp, &vap, ATTR_COMM, cr, NULL)) != 0) {
>                 cmn_err(CE_NOTE, "Cannot get attributes of the CD-ROM driver");
>                 goto cleanup;
>         }
>
>         /*
>          * Make sure we have a nonzero size partition.
>          * The current version of the SD driver will *not* fail the open
>          * of such a partition so we have to check for it here.
>          */
>         if (vap.va_size == 0) {
>                 error = ENXIO;
>                 goto cleanup;
>         }
>
>         /*
>          * Init a new hsfs structure.
>          */
>         fsp = kmem_zalloc(sizeof (*fsp), KM_SLEEP);
>         svp = kmem_zalloc(sizeof (*svp), KM_SLEEP);
>         jvp = kmem_zalloc(sizeof (*jvp), KM_SLEEP);
>         ...
> cleanup:
>         (void) VOP_CLOSE(devvp, FREAD, 1, (offset_t)0, cr, NULL);
>         VN_RELE(devvp);
>         if (fsp)
>                 kmem_free(fsp, sizeof (*fsp));
>         if (svp)
>                 kmem_free(svp, sizeof (*svp));
>         if (jvp)
>                 kmem_free(jvp, sizeof (*jvp));
>         return (error);
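For readers who want to see why the NULL initializers matter, the following is a minimal user-space sketch of the same goto-cleanup shape. It is not the hsfs code: `mount_sketch()`, `struct fs_state`, `struct volume_desc` and the `frees_done` counter are hypothetical names invented for this illustration, and `calloc()`/`free()` stand in for `kmem_zalloc()`/`kmem_free()`. With the pointers starting out as NULL, an early exit reaches the cleanup label and frees nothing; without the initializers the same guards would test whatever happened to be on the stack.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real struct hsfs / struct hs_volume differ. */
struct fs_state { int placeholder; };
struct volume_desc { int placeholder; };

/* Number of buffers released by the most recent call (illustration only). */
static int frees_done;

/*
 * fail_early != 0 models one of the first three "goto cleanup" exits, which
 * run before anything is allocated. Otherwise the sketch allocates all three
 * buffers and then reports EIO, so the cleanup path has real work to do.
 */
int
mount_sketch(int fail_early)
{
	int error;
	struct fs_state *fsp = NULL;
	struct volume_desc *svp = NULL;	/* NULL until allocated */
	struct volume_desc *jvp = NULL;	/* NULL until allocated */

	frees_done = 0;

	if (fail_early) {
		error = EBUSY;
		goto cleanup;
	}

	fsp = calloc(1, sizeof (*fsp));
	svp = calloc(1, sizeof (*svp));
	jvp = calloc(1, sizeof (*jvp));
	error = (fsp != NULL && svp != NULL && jvp != NULL) ? EIO : ENOMEM;

cleanup:
	/* The guards are only meaningful because every pointer began as NULL. */
	if (fsp) {
		free(fsp);
		frees_done++;
	}
	if (svp) {
		free(svp);
		frees_done++;
	}
	if (jvp) {
		free(jvp);
		frees_done++;
	}
	return (error);
}
```

Calling `mount_sketch(1)` takes the early exit and leaves `frees_done` at 0, while `mount_sketch(0)` allocates and then releases all three buffers.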
Re: [osol-discuss] [ufs-discuss] PANIC! mounting cdrom slice on b78
Juergen Keil [EMAIL PROTECTED] wrote:

> Hmm, in usr/src/uts/common/fs/hsfs/hsfs_vfsops.c function hs_mountfs(),
> whenever we use one of the first three |goto cleanup|, the local
> variables |svp| and |jvp| are uninitialized. That should corrupt the
> kernel heap when we kmem_free() with an uninitialized stack lock pointer
> in the cleanup section ...
>
>     struct hs_volume *svp;  /* Supplemental VD for ISO-9660:1999 */
>     struct hs_volume *jvp;  /* Joliet VD */

I have to admit that I am responsible for the uninitialized Joliet VD
pointer. Duplicating code is simple and in this case even passed 4 code
reviews.

Does it help to initialize the pointers to NULL?

Jörg