Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey [EMAIL PROTECTED] wrote:

>> So I did. Loaned two SCSI disks and 50-pin cable. Things haven't improved a bit, I'm very sorry to say it.
>
> Sorry for the slow reply to this. I thought it would make sense to try things out here, and so I kept trying to find time, but I have to admit I just don't have it yet for a while. I haven't forgotten, and I hope that in a few weeks time I can spend some time chasing down a whole lot of Vinum issues. This is definitely the worst I have seen, and I'm really puzzled why it always happens to you.
>
>> # simulate disk crash by forcing one arbitrary subdisk down
>> # seems that vinum doesn't return values for command completion status
>> # checking?
>> echo Stopping subdisk.. degraded mode
>> vinum stop -f r5.p0.s3 # assume it was successful
>
> I wonder if there's something relating to stop -f that doesn't happen during a normal failure. But this was exactly the way I tested it in the first place.

Thank you Greg, I really appreciate your ongoing effort for making vinum stable, trusted volume manager.

I have to add some facts to the mix. Raidframe on the same hardware does not have any problems. The later tests I conducted was done under -stable, because I couldn't get raidframe to work under -current, system did panic everytime at the end of initialisation of parity (raidctl -iv raid?). So I used the raidframe patch for -stable at
http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
Had to do some patching by hand, but otherwise works well.

Will it suffice to switch off power for one disk to simulate more real-world disk failure? Are there any hidden pitfalls for failing and restoring operation of non-hotswap disks?

--
Vallo Kallaste

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Friday, 14 March 2003 at 10:05:28 +0200, Vallo Kallaste wrote:
> On Fri, Mar 14, 2003 at 01:16:02PM +1030, Greg 'groggy' Lehey [EMAIL PROTECTED] wrote:
>
>>> So I did. Loaned two SCSI disks and 50-pin cable. Things haven't improved a bit, I'm very sorry to say it.
>>
>> Sorry for the slow reply to this. I thought it would make sense to try things out here, and so I kept trying to find time, but I have to admit I just don't have it yet for a while. I haven't forgotten, and I hope that in a few weeks time I can spend some time chasing down a whole lot of Vinum issues. This is definitely the worst I have seen, and I'm really puzzled why it always happens to you.
>>
>>> # simulate disk crash by forcing one arbitrary subdisk down
>>> # seems that vinum doesn't return values for command completion status
>>> # checking?
>>> echo Stopping subdisk.. degraded mode
>>> vinum stop -f r5.p0.s3 # assume it was successful
>>
>> I wonder if there's something relating to stop -f that doesn't happen during a normal failure. But this was exactly the way I tested it in the first place.
>
> Thank you Greg, I really appreciate your ongoing effort for making vinum stable, trusted volume manager.
>
> I have to add some facts to the mix. Raidframe on the same hardware does not have any problems. The later tests I conducted was done under -stable, because I couldn't get raidframe to work under -current, system did panic everytime at the end of initialisation of parity (raidctl -iv raid?). So I used the raidframe patch for -stable at
> http://people.freebsd.org/~scottl/rf/2001-08-28-RAIDframe-stable.diff.gz
> Had to do some patching by hand, but otherwise works well.

I don't think that problems with RAIDFrame are related to these problems with Vinum. I seem to remember a commit to the head branch recently (in the last 12 months) relating to the problem you've seen. I forget exactly where it went (it wasn't from me), and in cursory searching I couldn't find it. It's possible that it hasn't been MFC'd, which would explain your problem. If you have a 5.0 machine, it would be interesting to see if you can reproduce it there.

> Will it suffice to switch off power for one disk to simulate more real-world disk failure? Are there any hidden pitfalls for failing and restoring operation of non-hotswap disks?

I don't think so. It was more thinking aloud than anything else. As I said above, this is the way I tested things in the first place.

Greg
--
See complete headers for address and phone numbers
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Saturday, 1 March 2003 at 20:43:10 +0200, Vallo Kallaste wrote:
> On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste vallo wrote:
>
>>>> The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely.
>>>
>>> Yes, we've fixed a bug in that area. It had nothing to do with soft updates, though.
>>
>> Oh, that's very good news, thank you! Yes, it had nothing to do with soft updates at all and that's why I had the "remained after" in the sentence.
>>
>>>> Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.
>>>
>>> IIRC the rebuild bug bit any striped configuration.
>>
>> Ok, I definitely had problems only with R5, but you certainly know much better what it was exactly. I'll need to lend 50-pin SCSI cable and test vinum again. Will it matter on what version of FreeBSD I'll try on? My home system runs -current of Feb 5, but if you suggest -stable for consistent results, I'll do it.
>
> So I did. Loaned two SCSI disks and 50-pin cable. Things haven't improved a bit, I'm very sorry to say it.

Sorry for the slow reply to this. I thought it would make sense to try things out here, and so I kept trying to find time, but I have to admit I just don't have it yet for a while. I haven't forgotten, and I hope that in a few weeks time I can spend some time chasing down a whole lot of Vinum issues. This is definitely the worst I have seen, and I'm really puzzled why it always happens to you.

> # simulate disk crash by forcing one arbitrary subdisk down
> # seems that vinum doesn't return values for command completion status
> # checking?
> echo Stopping subdisk.. degraded mode
> vinum stop -f r5.p0.s3 # assume it was successful

I wonder if there's something relating to stop -f that doesn't happen during a normal failure. But this was exactly the way I tested it in the first place.

Greg
--
See complete headers for address and phone numbers
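The script comment above asks whether a step's completion status can be checked rather than assumed. The following is a minimal sketch of that idea, written in Python rather than the shell of the original script. The helper name `run_step` is hypothetical, and whether `vinum stop -f` returns a meaningful exit status is exactly what the thread leaves open, so the demonstration uses the Python interpreter itself as a stand-in command.

```python
import subprocess
import sys


def run_step(argv):
    """Run one command and return its exit status (0 conventionally means success)."""
    completed = subprocess.run(argv, check=False)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in commands only. A real test script might pass something like
    # ["vinum", "stop", "-f", "r5.p0.s3"], but it is an assumption that the
    # status vinum returns reflects success or failure.
    ok = run_step([sys.executable, "-c", "raise SystemExit(0)"])
    failed = run_step([sys.executable, "-c", "raise SystemExit(3)"])
    print("first step status:", ok)
    print("second step status:", failed)
```

A test harness built this way could stop at the first non-zero status instead of continuing with a volume in an unknown state.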
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Thu, Feb 27, 2003 at 11:53:02AM +0200, Vallo Kallaste vallo wrote:

>>> The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely.
>>
>> Yes, we've fixed a bug in that area. It had nothing to do with soft updates, though.
>
> Oh, that's very good news, thank you! Yes, it had nothing to do with soft updates at all and that's why I had the "remained after" in the sentence.
>
>>> Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.
>>
>> IIRC the rebuild bug bit any striped configuration.
>
> Ok, I definitely had problems only with R5, but you certainly know much better what it was exactly. I'll need to lend 50-pin SCSI cable and test vinum again. Will it matter on what version of FreeBSD I'll try on? My home system runs -current of Feb 5, but if you suggest -stable for consistent results, I'll do it.

So I did. Loaned two SCSI disks and 50-pin cable. Things haven't improved a bit, I'm very sorry to say it. The entire test session (script below) was done in single user. To be fair, I did tens of them, and the mode doesn't matter. Complete script:

Script started on Sat Mar 1 19:54:45 2003
# pwd
/root
# dmesg
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sun Feb 2 16:16:49 EET 2003
    [EMAIL PROTECTED]:/usr/home/vallo/Kevad-5.0
Preloaded elf kernel /boot/kernel/kernel at 0xc0516000.
Preloaded elf module /boot/kernel/vinum.ko at 0xc05160b4.
Preloaded elf module /boot/kernel/ahc_pci.ko at 0xc0516160.
Preloaded elf module /boot/kernel/ahc.ko at 0xc051620c.
Preloaded elf module /boot/kernel/cam.ko at 0xc05162b4.
Timecounter i8254 frequency 1193182 Hz
Timecounter TSC frequency 132955356 Hz
CPU: Pentium/P54C (132.96-MHz 586-class CPU)
  Origin = GenuineIntel Id = 0x526 Stepping = 6
  Features=0x1bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8>
real memory = 67108864 (64 MB)
avail memory = 59682816 (56 MB)
Intel Pentium detected, installing workaround for F00F bug
Initializing GEOMetry subsystem
VESA: v2.0, 4096k memory, flags:0x0, mode table:0xc037dec2 (122)
VESA: ATI MACH64
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Host to PCI bridge at pcibus 0 on motherboard
pci0: PCI bus on pcib0
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX ATA controller port 0xff90-0xff9f at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ahc0: Adaptec 2940 Ultra SCSI adapter port 0xf800-0xf8ff mem 0xffbee000-0xffbeefff irq 10 at device 13.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci0: display, VGA at device 14.0 (no driver attached)
atapci1: Promise ATA66 controller port 0xff00-0xff3f,0xffe0-0xffe3,0xffa8-0xffaf,0xffe4-0xffe7,0xfff0-0xfff7 mem 0xffbc-0xffbd irq 11 at device 15.0 on pci0
ata2: at 0xfff0 on atapci1
ata3: at 0xffa8 on atapci1
orm0: Option ROMs at iomem 0xed000-0xedfff,0xca000-0xca7ff,0xc8000-0xc9fff,0xc-0xc7fff on isa0
atkbdc0: Keyboard controller (i8042) at port 0x64,0x60 on isa0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
ed0 at port 0x300-0x31f iomem 0xd8000 irq 5 on isa0
ed0: address 00:80:c8:37:e2:a6, type NE2000 (16 bit)
fdc0: Enhanced floppy controller (i82077, NE72065 or clone) at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
ppc0: Parallel port at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
sc0: System console at flags 0x100 on isa0
sc0: VGA 5 virtual consoles, flags=0x300
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
unknown: PNP0303 can't assign resources (port)
unknown: PNP0700 can't assign resources (port)
unknown: PNP0401 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
unknown: PNP0501 can't assign resources (port)
Timecounters tick every 1.000 msec
ata0-slave: ATAPI identify retries exceeded
ad4: 2445MB QUANTUM FIREBALL EL2.5A [5300/15/63] at ata2-master UDMA33
ad6: 2423MB SAMSUNG WU32553A (2.54GB) [4924/16/63] at ata3-master UDMA33
acd0: CDROM WPI CDD-820 at ata0-master PIO3
Waiting 15 seconds for SCSI devices to settle
da0 at ahc0 bus 0 target 0
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Thu, Feb 27, 2003 at 11:59:59AM +1030, Greg 'groggy' Lehey [EMAIL PROTECTED] wrote:

>> The crashes and anomalies with filesystem residing on R5 volume were related to vinum(R5)/softupdates combo.
>
> Well, at one point we suspected that. But the cases I have seen were based on a misassumption. Do you have any concrete evidence that points to that particular combination?

Don't have any other evidence than the case I was describing. After changing my employer I hadn't had much time or motivation to try again.

>> The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely.
>
> Yes, we've fixed a bug in that area. It had nothing to do with soft updates, though.

Oh, that's very good news, thank you! Yes, it had nothing to do with soft updates at all and that's why I had the "remained after" in the sentence.

>> Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.
>
> IIRC the rebuild bug bit any striped configuration.

Ok, I definitely had problems only with R5, but you certainly know much better what it was exactly. I'll need to lend 50-pin SCSI cable and test vinum again. Will it matter on what version of FreeBSD I'll try on? My home system runs -current of Feb 5, but if you suggest -stable for consistent results, I'll do it.

Thanks
--
Vallo Kallaste
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Friday, 21 February 2003 at 10:00:46 +0200, Vallo Kallaste wrote:
> On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata [EMAIL PROTECTED] wrote:
>> Vallo Kallaste [EMAIL PROTECTED] wrote:
>>
>>> I'll second Brad's statement about vinum and softupdates interactions. My last experiments with vinum were more than half a year ago, but I guess it still holds. BTW, the interactions showed up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq Proliant 3000 and the system was very stable before I enabled softupdates.. and of course after I disabled softupdates. In between there were crashes and nasty problems with filesystem. Unfortunately it was production system and I hadn't chance to play.
>>
>> Did you believe that the crashes were caused by enabling softupdates on an R5 vinum volume, or were the crashes unrelated to vinum/softupdates? I can see how crashes unrelated to vinum/softupdates might trash vinum filesystems.
>
> The crashes and anomalies with filesystem residing on R5 volume were related to vinum(R5)/softupdates combo.

Well, at one point we suspected that. But the cases I have seen were based on a misassumption. Do you have any concrete evidence that points to that particular combination?

> The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely.

Yes, we've fixed a bug in that area. It had nothing to do with soft updates, though.

> Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.

IIRC the rebuild bug bit any striped configuration.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com. See http://www.lemis.com/yahoospam.html for further details.
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Friday, 21 February 2003 at 1:56:56 -0800, Terry Lambert wrote:
> Vallo Kallaste wrote:
>> The crashes and anomalies with filesystem residing on R5 volume were related to vinum(R5)/softupdates combo. The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely. Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.
>
> I think this is an expected problem with a lot of concatenation, whether through Vinum, GEOM, RAIDFrame, or whatever.

Can you be more specific? What you say below doesn't address any basic difference between virtual and real disks.

> This comes about for the same reason that you can't mount -u to turn Soft Updates from off to on: Soft Updates does not tolerate dirty buffers for which a dependency does not exist, and will crap out when a pending dirty buffer causes a write.

I don't understand what this has to do with virtual disks.

> This could be fixed in the mount -u case for Soft Updates, and it can also be fixed for Vinum (et al.). The key is the difference between a mount -u vs. a umount ; mount, which comes down to flushing and invalidating all buffers on the underlying device, e.g.:
>
>     vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
>     vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);
>     error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
>     error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
>     VOP_UNLOCK(devvp, 0, p);
>     ...
>
> Basically, after rebuilding, before allowing the mount to proceed, the Vinum (and GEOM and RAIDFrame, etc.) code needs to cause all the pending dirty buffers to be written. This will guarantee that there are no outstanding dirty buffers at mount time, which in turn guarantees that there will be no dirty buffers that the dependency tracking in Soft Updates does not know about.

I don't understand what you're assuming here. Certainly I can't see any relevance to Vinum, RAIDframe or any other virtual disk system.

Greg
--
See complete headers for address and phone numbers
Please note: we block mail from major spammers, notably yahoo.com. See http://www.lemis.com/yahoospam.html for further details.
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
Terry Lambert [EMAIL PROTECTED] wrote:

> I think this is an expected problem with a lot of concatenation, whether through Vinum, GEOM, RAIDFrame, or whatever.
>
> This comes about for the same reason that you can't mount -u to turn Soft Updates from off to on: Soft Updates does not tolerate dirty buffers for which a dependency does not exist, and will crap out when a pending dirty buffer causes a write.

Does this affect background fsck, too (on regular, non-vinum filesystems)? From what little I know of bg fsck, I'm guessing not, but I'd like to be sure.

Thanks.

--
Darryl Okahata [EMAIL PROTECTED]
DISCLAIMER: this message is the author's personal opinion and does not constitute the support, opinion, or policy of Agilent Technologies, or of the little green men that have been following him all day.
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
Darryl Okahata wrote:
> Terry Lambert [EMAIL PROTECTED] wrote:
>> I think this is an expected problem with a lot of concatenation, whether through Vinum, GEOM, RAIDFrame, or whatever.
>>
>> This comes about for the same reason that you can't mount -u to turn Soft Updates from off to on: Soft Updates does not tolerate dirty buffers for which a dependency does not exist, and will crap out when a pending dirty buffer causes a write.
>
> Does this affect background fsck, too (on regular, non-vinum filesystems)? From what little I know of bg fsck, I'm guessing not, but I'd like to be sure. Thanks.

No, it doesn't. Background fsck works by assuming that the only thing that could contain bad data is the cylinder group bitmaps, which means the worst case failure is some blocks are not available for reallocation.

It works by taking a snapshot, which is a feature that allows modification of the FS while the bgfsck's idea of the FS remains unchanged. Then it goes through the bitmaps, verifying that the blocks it thinks are allocated are in fact allocated by files within the snapshot.

Basically, its only job is really to clear bits in the bitmap that represent blocks for which there are no files referencing them.

There are situations where bgfsck can fail, sometimes catastrophically, but they are unrelated to having dirty blocks in memory for which no updates have been created.

-- Terry
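The bitmap pass described in this message can be modelled in a few lines. This is a minimal sketch under stated assumptions, not fsck's actual code: the function name `reconcile_bitmap`, the list-of-booleans bitmap, and the dictionary of files are all hypothetical stand-ins for the cylinder group bitmaps and the snapshot's view of the filesystem.

```python
def reconcile_bitmap(allocated, referenced):
    """Clear allocation bits for blocks that no file in the snapshot references.

    allocated:  list of booleans, one per block (True means marked allocated).
    referenced: set of block numbers claimed by files in the snapshot.
    Returns the corrected bitmap and the list of block numbers that were freed.
    """
    fixed = list(allocated)
    freed = []
    for block, marked in enumerate(allocated):
        if marked and block not in referenced:
            fixed[block] = False
            freed.append(block)
    return fixed, freed


if __name__ == "__main__":
    bitmap = [True, True, False, True, True]
    snapshot_files = {"a": [0, 1], "b": [4]}  # toy file -> block list mapping
    referenced = {b for blocks in snapshot_files.values() for b in blocks}
    fixed, freed = reconcile_bitmap(bitmap, referenced)
    print("corrected bitmap:", fixed)
    print("blocks returned to the free pool:", freed)
```

The sketch only ever clears bits. A block that is referenced but not marked allocated is left alone, matching the premise in the message that the worst case is blocks being unavailable for reallocation.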
Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
On Thu, Feb 20, 2003 at 02:28:45PM -0800, Darryl Okahata [EMAIL PROTECTED] wrote:
> Vallo Kallaste [EMAIL PROTECTED] wrote:
>
>> I'll second Brad's statement about vinum and softupdates interactions. My last experiments with vinum were more than half a year ago, but I guess it still holds. BTW, the interactions showed up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq Proliant 3000 and the system was very stable before I enabled softupdates.. and of course after I disabled softupdates. In between there were crashes and nasty problems with filesystem. Unfortunately it was production system and I hadn't chance to play.
>
> Did you believe that the crashes were caused by enabling softupdates on an R5 vinum volume, or were the crashes unrelated to vinum/softupdates? I can see how crashes unrelated to vinum/softupdates might trash vinum filesystems.

The crashes and anomalies with filesystem residing on R5 volume were related to vinum(R5)/softupdates combo. The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely. Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished.

I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.

--
Vallo Kallaste [EMAIL PROTECTED]
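For readers unfamiliar with what a RAID-5 rebuild computes, here is a minimal sketch of parity reconstruction for a single stripe. It is a generic illustration of XOR parity, not Vinum's implementation; the function names and the four-byte blocks are hypothetical. It also shows why writes that land during a rebuild are delicate in general: if a data block changes and the parity is not updated to match, the reconstructed block is wrong. Nothing here is a claim about what the Vinum bug actually was.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving):
    """Recover the one missing block of a stripe from all surviving blocks
    (the remaining data blocks plus the parity block)."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Degraded mode: the subdisk holding data[1] is gone.
    recovered = rebuild_missing([data[0], data[2], parity])
    print("recovered matches original:", recovered == data[1])

    # A write changes data[0] but the parity is left stale.
    stale = rebuild_missing([b"ZZZZ", data[2], parity])
    print("rebuild with stale parity matches original:", stale == data[1])
```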
Re: Vinum R5 [was: Re: background fsck deadlocks with ufs2 and big disk]
Vallo Kallaste wrote:
> The crashes and anomalies with filesystem residing on R5 volume were related to vinum(R5)/softupdates combo. The vinum R5 and system as a whole were stable without softupdates. Only one problem remained after disabling softupdates, while being online and user I/O going on, rebuilding of failed disk corrupt the R5 volume completely. Don't know is it fixed or not as I don't have necessary hardware at the moment. The only way around was to quiesce the volume before rebuilding, umount it, and wait until rebuild finished. I'll suggest extensive testing cycle for everyone who's going to work with vinum R5. Concat, striping and mirroring has been a breeze but not so with R5.

I think this is an expected problem with a lot of concatenation, whether through Vinum, GEOM, RAIDFrame, or whatever.

This comes about for the same reason that you can't mount -u to turn Soft Updates from off to on: Soft Updates does not tolerate dirty buffers for which a dependency does not exist, and will crap out when a pending dirty buffer causes a write.

This could be fixed in the mount -u case for Soft Updates, and it can also be fixed for Vinum (et al.).

The key is the difference between a mount -u vs. a umount ; mount, which comes down to flushing and invalidating all buffers on the underlying device, e.g.:

    vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, p);
    vinvalbuf(devvp, V_SAVE, NOCRED, p, 0, 0);
    error = VOP_CLOSE(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
    error = VOP_OPEN(devvp, ronly ? FREAD : FREAD|FWRITE, FSCRED, p);
    VOP_UNLOCK(devvp, 0, p);
    ...

Basically, after rebuilding, before allowing the mount to proceed, the Vinum (and GEOM and RAIDFrame, etc.) code needs to cause all the pending dirty buffers to be written. This will guarantee that there are no outstanding dirty buffers at mount time, which in turn guarantees that there will be no dirty buffers that the dependency tracking in Soft Updates does not know about.

FWIW: I've maintained for over 6 years now that the mount update code should be modified to do this automatically (and provided patches; see early 1997 mailing list archives), essentially turning a mount -u into a umount ; mount, without invalidating outstanding vnodes and in-core inodes or their references (so that open files do not break... they just get all their buffers taken away from them).

Of course, the only open files that matter for device layering are the device exporting the layered block store, and the underlying component block stores that make it up (i.e. no open files there).

-- Terry
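The flush-and-invalidate step argued for in this message can be pictured with a toy user-space model. This is a sketch under heavy assumptions and has nothing to do with the real kernel buffer cache: the class `ToyBufferCache` and its methods are hypothetical, and `flush_and_invalidate` only loosely mirrors the intent of the `vinvalbuf(devvp, V_SAVE, ...)` call quoted above, namely writing out dirty buffers and then discarding all cached ones so that nothing dirty is pending when the mount proceeds.

```python
class ToyBufferCache:
    """Toy model: a dict standing in for the device, plus cached buffers."""

    def __init__(self):
        self.device = {}   # block number -> bytes stored on the "disk"
        self.buffers = {}  # block number -> (data, dirty flag)

    def write(self, block, data):
        """Buffered write: the data stays in memory, marked dirty."""
        self.buffers[block] = (data, True)

    def dirty_blocks(self):
        """Block numbers that still have unwritten data."""
        return sorted(b for b, (_, dirty) in self.buffers.items() if dirty)

    def flush_and_invalidate(self):
        """Write every dirty buffer to the device, then drop all buffers."""
        for block, (data, dirty) in self.buffers.items():
            if dirty:
                self.device[block] = data
        self.buffers.clear()


if __name__ == "__main__":
    cache = ToyBufferCache()
    cache.write(7, b"rebuilt stripe")
    print("dirty before:", cache.dirty_blocks())
    cache.flush_and_invalidate()
    print("dirty after:", cache.dirty_blocks())
    print("on device:", cache.device[7])
```

In this model the invariant the message asks for is simply that `dirty_blocks()` is empty at the moment the filesystem is mounted.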
Re: background fsck deadlocks with ufs2 and big disk
Vallo Kallaste [EMAIL PROTECTED] wrote:

> I'll second Brad's statement about vinum and softupdates interactions. My last experiments with vinum were more than half a year ago, but I guess it still holds. BTW, the interactions showed up _only_ on R5 volumes. I had 6 disk (SCSI) R5 volume in Compaq Proliant 3000 and the system was very stable before I enabled softupdates.. and of course after I disabled softupdates. In between there were crashes and nasty problems with filesystem. Unfortunately it was production system and I hadn't chance to play.

Did you believe that the crashes were caused by enabling softupdates on an R5 vinum volume, or were the crashes unrelated to vinum/softupdates? I can see how crashes unrelated to vinum/softupdates might trash vinum filesystems.

--
Darryl Okahata [EMAIL PROTECTED]
DISCLAIMER: this message is the author's personal opinion and does not constitute the support, opinion, or policy of Agilent Technologies, or of the little green men that have been following him all day.
Re: background fsck deadlocks with ufs2 and big disk
At 2:28 PM -0800 2003/02/20, Darryl Okahata wrote:

> Did you believe that the crashes were caused by enabling softupdates on an R5 vinum volume, or were the crashes unrelated to vinum/softupdates? I can see how crashes unrelated to vinum/softupdates might trash vinum filesystems.

Using RAID-5 under vinum was always a somewhat tricky business for me, but in many cases I could get it to work reasonably well most of the time. But if I enabled softupdates on that filesystem, I was toast. Softupdates enabled on filesystems that were not on top of vinum RAID-5 logical devices seemed to be fine.

So, the interaction that I personally witnessed was specifically between vinum RAID-5 and softupdates.

--
Brad Knowles, [EMAIL PROTECTED]
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania.
Re: background fsck deadlocks with ufs2 and big disk
David Schultz [EMAIL PROTECTED] wrote:

> IIRC, Kirk was trying to reproduce this a little while ago in response to similar reports. He would probably be interested in any new information.

I don't have any useful information, but I do have a data point: My 5.0-RELEASE system recently mysteriously panic'd, which resulted in a partially trashed UFS1 filesystem, which caused bg fsck to hang. Details:

* The panic was weird, in that only the first 4-6 characters of the first function (in the panic stacktrace) was displayed on the console (sorry, forgot what it was). Nothing else past that point was shown, and the console was locked up. Ddb was compiled into the kernel, but ctrl-esc did nothing.

* The UFS1 filesystem in question (and I assume that it was UFS1, as I did not specify a filesystem type to newfs) is located on a RAID5 vinum volume, consisting of five 80GB disks.

* Softupdates is enabled.

* When bg fsck hung (w/no disk activity), I could break into the ddb. Unfortunately, I don't know how to use ddb, aside from ps.

* Disabling bg fsck allowed the system to boot. However, fg fsck failed, and I had to do a manual fsck, which spewed lots of nasty SOFTUPDATE INCONSISTENCY errors.

* Disturbingly (but fortunately), I then unmounted the filesystem (in multi-user mode) and re-ran fsck, and fsck still found errors. There should not have been any errors, as fg fsck just finished running.

  [ Unfortunately, I've forgotten what they were, and an umount/fsck done right now shows no problems. I think the errors were one of the incorrect block count errors. ]

* After the fsck, some files were partially truncated (corrupted?). After investigating, I believe these truncated files (which were NOT recently modified) were in a directory in which other files were being created/written at the time of the panic.

--
Darryl Okahata [EMAIL PROTECTED]
DISCLAIMER: this message is the author's personal opinion and does not constitute the support, opinion, or policy of Agilent Technologies, or of the little green men that have been following him all day.
Re: background fsck deadlocks with ufs2 and big disk
At 9:15 AM -0800 2003/02/19, Darryl Okahata wrote:

> * The UFS1 filesystem in question (and I assume that it was UFS1, as I did not specify a filesystem type to newfs) is located on a RAID5 vinum volume, consisting of five 80GB disks.
>
> * Softupdates is enabled.

You know, vinum & softupdates have had bad interactions with each other for as long as I can remember. Has this truly been a consistent thing (as I seem to recall), or has this been an on-again/off-again situation?

--
Brad Knowles, [EMAIL PROTECTED]
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety. -Benjamin Franklin, Historical Review of Pennsylvania.
Re: background fsck deadlocks with ufs2 and big disk
Brad Knowles [EMAIL PROTECTED] wrote:

> You know, vinum & softupdates have had bad interactions with each other for as long as I can remember. Has this truly been a consistent thing (as I seem to recall), or has this been an on-again/off-again situation?

Ah, yaaah. Hmm. This is the first I've heard of that, but I can see how that could be. Could vinum be considered to be a form of (unintentional) write-caching? That might explain how the filesystem got terribly hosed, but it doesn't help with the panic. Foo.

[ This is on a system that's been running in the current state for around a month. So far, it's panic'd once (a week or so ago), and so I don't have any feel for long-term stability. We'll see how it goes. ]

--
Darryl Okahata [EMAIL PROTECTED]
DISCLAIMER: this message is the author's personal opinion and does not constitute the support, opinion, or policy of Agilent Technologies, or of the little green men that have been following him all day.
Re: background fsck deadlocks with ufs2 and big disk
Thus spake Martin Blapp [EMAIL PROTECTED]:

> I just wanted to tell that I can deadlock one of my current boxes with a ufs2 filesystem on a 120GB ATA disk. I can reproduce the problem. The background fsck process hangs after some time, always at the same place; sometimes the box freezes after some time. The same box works fine with ufs1.

IIRC, Kirk was trying to reproduce this a little while ago in response to similar reports. He would probably be interested in any new information.
Re: background fsck did not create lost+found
On Mon, 20 Jan 2003, David Schultz wrote:

> > First two entries clearly correspond to the missing file, which should have been put in /home/lost+found. But, the problem is that no lost+found directory was created, while it should (as fsck_ffs(8) says). I guess it's a bug, probably in the background fsck code. Still, is there any way to reclaim the file now, besides running strings(1) on the whole partition?
>
> Consider what happens when you remove a large directory tree. Thousands of directory entries may be removed, but in the softupdates case, the inodes will stick around a bit longer. The same also applies to files that have been intentionally unlinked but are still open. To avoid a syndrome where all these thousands of files end up in lost+found after a crash or power failure, fsck just removes them on softupdates-enabled filesystems.

Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual? The default behaviour would be unchanged, yet there would be a way to reclaim lost files.

-- 
-- wrzask --= v =-- Winfried --=-- GG# 3838383 --=-- JS500-RIPE --
-- [EMAIL PROTECTED] --- [EMAIL PROTECTED] --===-- http://violent.dream.vg/ ---
--= Ride the wild wind - push the envelope, don't sit on the fence, ---
-- Ride the wild wind - live life on the razor's edge! =-- Queen --
Re: background fsck did not create lost+found
On Wed, 22 Jan 2003 11:14:47 +0100 (CET), Jan Srzednicki [EMAIL PROTECTED] said:

> Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?

It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.

-GAWollman
Re: background fsck did not create lost+found
On Wed, 22 Jan 2003, Garrett Wollman wrote:

> > Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?
>
> It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.

Still, in case you know some of your important files can be lost, you can boot the system to single user and run foreground fsck.

-- 
-- wrzask --= v =-- Winfried --=-- GG# 3838383 --=-- JS500-RIPE --
-- [EMAIL PROTECTED] --- [EMAIL PROTECTED] --===-- http://violent.dream.vg/ ---
--= Ride the wild wind - push the envelope, don't sit on the fence, ---
-- Ride the wild wind - live life on the razor's edge! =-- Queen --
Re: background fsck did not create lost+found
hi, there!

On Wed, Jan 22, 2003 at 07:18:44PM +0100, Jan Srzednicki wrote:

> > > Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?
> >
> > It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.
>
> Still, in case you know some of your important files can be lost, you can boot the system to single user and run foreground fsck.

this is not an option if the system was rebooted because of power loss or kernel panic

/fjoe
Re: background fsck did not create lost+found
Thus spake Garrett Wollman [EMAIL PROTECTED]:

> On Wed, 22 Jan 2003 11:14:47 +0100 (CET), Jan Srzednicki [EMAIL PROTECTED] said:
>
> > Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?
>
> It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.

Actually, that should work just fine. When background fsck notices an unreferenced inode in the snapshot, it could create a file in the underlying filesystem. The easy way to do this is to copy the data with the standard open(2)/write(2)/close(2) interfaces. After the copy, the original data blocks are deallocated as usual. A more efficient implementation might require a special kernel interface that creates a directory entry, given an inode number and path.

Unfortunately, I think it is possible that the unreferenced inode has not been initialized, even though it is allocated in the inode bitmap, so you could potentially get random junk.

Such a feature sounds reasonable, although I'm not sure how useful it would really be. If you have software that introduces a race window where you can lose data because it does updates incorrectly, hacking the operating system to make the race window slightly smaller is not the best solution.
Re: background fsck did not create lost+found
At 12:53 AM +0600 1/23/03, Max Khon wrote:

> On Wed, Jan 22, 2003 at 07:18:44PM +0100, Jan Srzednicki wrote:
>
> > > > Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?
> > >
> > > It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.
> >
> > Still, in case you know some of your important files can be lost, you can boot the system to single user and run foreground fsck.
>
> this is not an option if the system was rebooted because of power loss or kernel panic

Can't you just set the rc.conf option to not-do the background fsck?

-- 
Garance Alistair Drosehn            =   [EMAIL PROTECTED]
Senior Systems Programmer           or  [EMAIL PROTECTED]
Rensselaer Polytechnic Institute    or  [EMAIL PROTECTED]
Re: background fsck did not create lost+found
hi, there!

On Wed, Jan 22, 2003 at 02:43:37PM -0500, Garance A Drosihn wrote:

> > > > > Would that be a big problem to allow some fsck option not to erase all these softupdates-pending inodes, but to put them in lost+found as usual?
> > > >
> > > > It certainly couldn't be done with the background fsck, because background fsck works on a snapshot and not the running filesystem; thus, it cannot make any allocations -- it can only deallocate things.
> > >
> > > Still, in case you know some of your important files can be lost, you can boot the system to single user and run foreground fsck.
> >
> > this is not an option if the system was rebooted because of power loss or kernel panic
>
> Can't you just set the rc.conf option to not-do the background fsck?

I can, but the whole purpose of background fsck (faster startup times) will be lost.

/fjoe
Re: background fsck did not create lost+found
On Wed, 22 Jan 2003 11:32:12 -0800, David Schultz [EMAIL PROTECTED] said:

> Unfortunately, I think it is possible that the unreferenced inode has not been initialized, even though it is allocated in the inode bitmap, so you could potentially get random junk.

That is definitely true on UFS2, which I had forgotten. UFS2 inodes are only initialized when they are used.

-GAWollman
Re: background fsck did not create lost+found
In the last episode (Jan 20), Jan Srzednicki said:

> After building new world and installing new kernel, I rebooted my machine to launch a new kernel. The system mysteriously failed to flush 22 disk buffers, and after reboot fsck was launched.
> [...]
> This massive disk mangling occurred on /usr, but still, one file in /home got lost - which happened to be quite important file. Background fsck logged:
>
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065 OWNER=winfried MODE=100644
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20 15:57 2003 (CLEARED)
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8 files, 16439 fragments
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: 33802 files, 13109700 used, 6310697 free (11577 frags, 787390 blocks, 0.1% fragmentation)
>
> First two entries clearly correspond to the missing file, which should have been put in /home/lost+found. But, the problem is that no lost+found directory was created, while it should (as fsck_ffs(8) says). I guess it's a bug, probably in the background fsck code. Still, is there any way to reclaim the file now, besides running strings(1) on the whole partition?

It's not a bug. Softupdates works by guaranteeing that the only things that a background fsck needs to do are reduce link counts, clear inodes, and fix free-space bitmaps. Bgfsck will clear a file's space rather than put it in lost+found. This means that if you delete a file, immediately create a new one with the same name, and then reboot within 30 seconds, both files will be gone. You can minimize the risk by lowering the kern.metadelay, kern.dirdelay, and kern.filedelay sysctl values, but the lower you go, the less benefit you get.

String'ing the raw partition is probably your best bet for recovering the data.

-- 
Dan Nelson
[EMAIL PROTECTED]
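Since background fsck clears unreferenced files instead of moving them to lost+found, the syslog output is the only record of what was lost. As a rough sketch (not part of the thread), the two-line UNREF FILE entries quoted above could be collected like this. The function name `cleared_files` and the regular expressions are illustrative assumptions based solely on the log lines shown in this message; real fsck output may vary.

```python
import re

# Log lines as quoted in the thread (background fsck output via syslog).
LOG = """\
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065 OWNER=winfried MODE=100644
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20 15:57 2003 (CLEARED)
Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8 files, 16439 fragments
"""

# Patterns are assumptions derived from the two sample lines above.
UNREF = re.compile(
    r"(?P<dev>/dev/\S+): UNREF FILE I=(?P<inode>\d+) "
    r"OWNER=(?P<owner>\S+) MODE=(?P<mode>[0-7]+)"
)
DETAIL = re.compile(
    r"(?P<dev>/dev/\S+): SIZE=(?P<size>\d+) "
    r"MTIME=(?P<mtime>.+?) \((?P<action>[A-Z]+)\)"
)


def cleared_files(text):
    """Pair each UNREF FILE line with the SIZE/MTIME line that follows it."""
    results, pending = [], None
    for line in text.splitlines():
        m = UNREF.search(line)
        if m:
            pending = m.groupdict()
            continue
        m = DETAIL.search(line)
        if m and pending and m.group("dev") == pending["dev"]:
            pending.update(
                size=int(m.group("size")),
                mtime=m.group("mtime"),
                action=m.group("action"),
            )
            results.append(pending)
        # Any other line ends the current entry.
        pending = None
    return results


if __name__ == "__main__":
    for entry in cleared_files(LOG):
        print(entry)
```

Knowing the inode number, owner and size does not bring the data back, but it narrows down what to look for when running strings(1) over the raw partition.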
Re: background fsck did not create lost+found
Thus spake Jan Srzednicki [EMAIL PROTECTED]:

> This massive disk mangling occurred on /usr, but still, one file in /home got lost - which happened to be quite important file. Background fsck logged:
>
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: UNREF FILE I=1723065 OWNER=winfried MODE=100644
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: SIZE=23397 MTIME=Jan 20 15:57 2003 (CLEARED)
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: Reclaimed: 0 directories, 8 files, 16439 fragments
> Jan 20 16:06:30 stronghold root: /dev/ad1s1d: 33802 files, 13109700 used, 6310697 free (11577 frags, 787390 blocks, 0.1% fragmentation)
>
> First two entries clearly correspond to the missing file, which should have been put in /home/lost+found. But, the problem is that no lost+found directory was created, while it should (as fsck_ffs(8) says). I guess it's a bug, probably in the background fsck code. Still, is there any way to reclaim the file now, besides running strings(1) on the whole partition?

Consider what happens when you remove a large directory tree. Thousands of directory entries may be removed, but in the softupdates case, the inodes will stick around a bit longer. The same also applies to files that have been intentionally unlinked but are still open. To avoid a syndrome where all these thousands of files end up in lost+found after a crash or power failure, fsck just removes them on softupdates-enabled filesystems.

Unfortunately, this means that a newly-created file that has an inode but no directory entry will also be cleared. In some sense, this race is equivalent to the situation where something went wrong before the inode could be written.

However, when you are saving a new version of an important file, you need to be careful that the new version (and its directory entry) hits the disk before the old one goes away. I know that vi saves files in a safe way, whereas ee and emacs do not. (Emacs introduces only a small race, though.) Also, mv will DTRT only if the source and destination files live on the same filesystem.
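The ordering described above (new contents and directory entry on disk before the old version goes away) is commonly achieved by writing a temporary file, fsyncing it, and renaming it over the original. The following is a minimal sketch of that pattern, not something any poster in the thread supplied. The function name `atomic_save` and the `.tmp-` prefix are hypothetical, and it assumes a Unix-like system where a directory can be opened and fsynced.

```python
import os
import tempfile


def atomic_save(path, data):
    """Replace `path` with `data` (bytes) without exposing a truncated file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename stays on one
    # filesystem (a rename across filesystems is not atomic).
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # new contents reach the disk first
        os.replace(tmp, path)      # rename(2) over the old version
    except BaseException:
        os.unlink(tmp)
        raise
    # Also ask for the directory entry to be flushed (Unix assumption).
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "foo")
    atomic_save(target, b"old contents\n")
    atomic_save(target, b"new contents\n")
    with open(target, "rb") as f:
        print(f.read())
```

Note that mkstemp creates the file with mode 0600, so a real editor would also need to copy the original file's ownership and permissions; that step is left out here.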
Re: background fsck did not create lost+found
:However, when you are saving a new version of an important file,
:you need to be careful that the new version (and its directory
:entry) hits the disk before the old one goes away. I know that vi
:saves files in a safe way, whereas ee and emacs do not. (Emacs
:introduces only a small race, though.) Also, mv will DTRT only if
:the source and destination files live on the same filesystem.
:

I think you have that reversed. vi just overwrites the destination file (O_CREAT|O_TRUNC, try ktrace'ing a vi session and you will see). I believe emacs defaults to a mode where it creates a new file and renames it over the original.

This means that there is a period of time where a crash may result in the loss of the file if the vi session cannot be recovered (with vi -r) after the fact.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: background fsck did not create lost+found
Thus spake Matthew Dillon [EMAIL PROTECTED]:

> :However, when you are saving a new version of an important file,
> :you need to be careful that the new version (and its directory
> :entry) hits the disk before the old one goes away. I know that vi
> :saves files in a safe way, whereas ee and emacs do not. (Emacs
> :introduces only a small race, though.) Also, mv will DTRT only if
> :the source and destination files live on the same filesystem.
> :
>
> I think you have that reversed. vi just overwrites the destination file (O_CREAT|O_TRUNC, try ktrace'ing a vi session and you will see). I believe emacs defaults to a mode where it creates a new file and renames it over the original.
>
> This means that there is a period of time where a crash may result in the loss of the file if the vi session cannot be recovered (with vi -r) after the fact.

vi writes and fsyncs a recovery file when it opens a file for editing, and it fsyncs the real file before removing the recovery file. (I don't know how reliable vi's recovery mechanism is because I don't use vi, but at least it's ensuring that the recovery file is written to disk when it should be.)

In Emacs, if 'make-backup-files' is non-nil (the default), the original file ${FILE} is renamed to ${FILE}~. Then it writes out and fsyncs a new file, which is perfectly safe. If 'make-backup-files' is nil, emacs simply omits the renaming part, unsafely overwriting the original file. The behavior in the latter case appears to be a bug, or at least an undocumented feature. Emacs even causes data loss in this case when the disk fills up! It needs to either do an fsync/rename or write and fsync a backup file for the duration of the save.

Lastly, with ee, there's no backup file and no fsync.

Some ktrace snippets are below.

  3662 vi       CALL  open(0x808e260,0x2,0x180)
  3662 vi       NAMI  /var/tmp/vi.recover/vi.HjDlgO
  3662 vi       RET   open 4
  ...
  3662 vi       CALL  write(0x4,0x809a01c,0x400)
  3662 vi       GIO   fd 4 wrote 1024 bytes
       [...]old contents[...]
  ...
  3662 vi       CALL  fsync(0x4)
  3662 vi       RET   fsync 0
  ...
  [I edit the file from old contents to new contents]
  ...
  3662 vi       CALL  open(0x8095140,0x601,0x1b6)
  3662 vi       NAMI  foo
  3662 vi       RET   open 7
  ...
  3662 vi       CALL  write(0x7,0x80bb000,0xd)
  3662 vi       GIO   fd 7 wrote 13 bytes
       new contents
  3662 vi       RET   write 13/0xd
  ...
  3662 vi       CALL  fsync(0x7)
  3662 vi       RET   fsync 0
  3662 vi       CALL  close(0x7)
  3662 vi       RET   close 0
  ...
  3662 vi       CALL  lseek(0x4,0,0x400,0,0)
  3662 vi       RET   lseek 1024/0x400
  3662 vi       CALL  write(0x4,0x809a01c,0x400)
  3662 vi       GIO   fd 4 wrote 1024 bytes
       [...]new contents[...]
  ...
  3662 vi       CALL  fsync(0x4)
  3662 vi       RET   fsync 0

  [The following bit only happens if make-backup-files is non-nil]
  3799 emacs    CALL  rename(0x848c328,0x848fba8)
  3799 emacs    NAMI  /home/test/foo
  3799 emacs    NAMI  /home/test/foo~
  3799 emacs    RET   rename 0
  ...
  [This part happens unconditionally]
  3799 emacs    CALL  open(0x848c328,0x601,0x1b6)
  3799 emacs    NAMI  /home/test/foo
  3799 emacs    RET   open 3
  3799 emacs    CALL  write(0x3,0xbfbfae24,0x3)
  3799 emacs    GIO   fd 3 wrote 3 bytes
       new
  3799 emacs    RET   write 3
  3799 emacs    CALL  write(0x3,0xbfbfae24,0x9)
  3799 emacs    GIO   fd 3 wrote 9 bytes
       contents
  3799 emacs    RET   write 9
  3799 emacs    CALL  fsync(0x3)
  3799 emacs    RET   fsync 0
  3799 emacs    CALL  close(0x3)
  3799 emacs    RET   close 0
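The hexadecimal arguments to open(2) in the traces above can be decoded to see which files are opened with truncation. This is an illustrative sketch rather than part of the original message. The flag values are hard-coded from FreeBSD's <fcntl.h> as I understand them, because Python's own os.O_* constants reflect the host platform and may differ. The helper name `decode_open` is hypothetical.

```python
# Flag values assumed from FreeBSD's <fcntl.h>; only the flags relevant to
# the traces are listed.
FREEBSD_OPEN_FLAGS = {
    0x0008: "O_APPEND",
    0x0200: "O_CREAT",
    0x0400: "O_TRUNC",
    0x0800: "O_EXCL",
}
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}


def decode_open(flags, mode):
    """Return (symbolic flags, octal mode) for an open(2) call seen in ktrace."""
    names = [ACCESS_MODES.get(flags & 0x3, "O_ACCMODE?")]
    names += [name for bit, name in sorted(FREEBSD_OPEN_FLAGS.items())
              if flags & bit]
    return "|".join(names), oct(mode)


if __name__ == "__main__":
    # vi's recovery file: open(0x808e260,0x2,0x180)
    print(decode_open(0x2, 0x180))
    # vi and emacs writing the real file: open(...,0x601,0x1b6)
    print(decode_open(0x601, 0x1b6))
```

Under these assumptions, 0x601 decodes to O_WRONLY|O_CREAT|O_TRUNC with mode 0666 in both the vi and the emacs trace. Both editors therefore truncate the target in place. The difference lies in what is already safely on disk at that moment: vi's fsynced recovery file, or Emacs's renamed backup copy.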
Re: background fsck
On 19-May-01 Matthew Thyer wrote:

> Is it possible that background fsck is not the culprit here ?
>
> I think this may be fallout from the dirpref changes as Chris Knight recently emailed in 018b01c0c496$07ed13d0$[EMAIL PROTECTED].
>
> The solution is to unmount all your filesystems, fsck them and then use tunefs with -A to change something so that all superblock backups will be updated.
>
> Does this sound likely ?
>
> Jason Evans wrote:
> > I had exactly the same thing happen to /var on an SMP test box using -current as of 16 May. It happened once out of about a half dozen panics.
> >
> > Jason

My instances at least are not the result of dirpref, as my laptop tracks -current very closely and I navigated the dirpref waters a while back.

-- 
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: background fsck
I had exactly the same thing happen to /var on an SMP test box using -current as of 16 May. It happened once out of about a half dozen panics.

Jason
Re: background fsck
This happens to me ``almost all the time'' on my dev box:

Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s1a    254063    82600   151138    35%    /
devfs               1        1        0   100%    /dev
procfs              4        4        0   100%    /proc
/dev/ad0s1e    254063        7   233731     0%    /tmp
/dev/ad1s2a    496239    26424   430116     6%    /var
/dev/ad1s2e   4466254  1448160  2660794    35%    /usr
/dev/ad0s1f    775487   392540   320909    55%    /usr/obj
/dev/ad1s1a  10145116  5631076  3702432    60%    /usr/ports/distfiles
/dev/ad1s1e  10145116  4957632  4375876    53%    /usr/audio
/dev/ad1s1g   4963030  3621790   944198    79%    /usr/packages
/dev/ad1s1f  10145116  4790396  4543112    51%    /cvs
/dev/ad1s2f  33059676        1 30414901     0%    /spare1

The interesting thing is that it always happens on /usr and /cvs and no other partitions. Both of these partitions have large directory hierarchies.

Also, FWIW it now takes nearly 30 minutes to fsck my laptop's disk (20Gb 5400rpm). That's not good.

> Has anyone else been trying out the background fsck? Last night I was working on the ithread code some and managed to panic my laptop while ejecting a pccard. Anyways, the kernel ate itself while trying to flush its buffers and I ended up with a dirty filesystem. I rebooted and let fsck -p do its usual thing, except that it freaked out. The actual fsck of / proceeded fine (actual fs activity when I panic'd my machine was very low, so the filesystems weren't corrupted, just marked dirty). When it got to /usr and /var, however, fsck freaked out and claimed that the primary superblock didn't match the first alternate. At this point I first had a heart attack. Once I recovered from that, I attempted read-only mounts of /usr and /var which did succeed, except that each mount spewed out a message to the kernel console about losing x files and y blocks. Confident that my fs wasn't totally hosed after doing some ls's, I unmounted /usr and /var and ran a non-preen fsck on them, which insisted on using an alternate superblock, but otherwise proceeded fine (except that it seemed to take longer than usual). Once the fscks finished, it seemed to be all ok.
>
> Is anyone else seeing any weird stuff like this? I've never had fsck complain about the superblocks after a crash before.
>
> df -t ufs
> Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
> /dev/ad0s2a    148823    84717    52201    62%    /
> /dev/ad0s2f  10191770  7052563  2323866    75%    /usr
> /dev/ad0s2e     99183    14254    76995    16%    /var
>
> mount -t ufs
> /dev/ad0s2a on / (ufs, local)
> /dev/ad0s2f on /usr (ufs, local)
> /dev/ad0s2e on /var (ufs, local)
>
> grep ufs /etc/fstab
> /dev/ad0s2a  /     ufs  rw  1  1
> /dev/ad0s2f  /usr  ufs  rw  2  2
> /dev/ad0s2e  /var  ufs  rw  2  2
>
> Hmm, that's odd, I did have soft updates on on /usr and /var before the crash. It seems to be off now. :(
>
> -- 
> John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
> PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
> "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

-- 
Brian [EMAIL PROTECTED]    brian@[uk.]FreeBSD.org
      http://www.Awfulhak.org    brian@[uk.]OpenBSD.org
Don't _EVER_ lose your sense of humour !
Re: background fsck
> Date: Thu, 17 May 2001 14:31:55 -0700 (PDT)
> From: John Baldwin [EMAIL PROTECTED]

> Has anyone else been trying out the background fsck?

A little; despite my desire to help debug things, getting to a point where doing this is appropriate isn't something I am all too eager to do. Thus, it wasn't exactly voluntary. :-}

> Last night I was working on the ithread code some and managed to panic my laptop while ejecting a pccard. Anyways, the kernel ate itself while trying to flush its buffers and I ended up with a dirty filesystem.

I've had a couple of occasions when I'd boot my laptop (which resembles yours, as you may recall) from -STABLE into -CURRENT (or vice versa), and xdm would fire up, but not present a login window. Meanwhile, the fan kicks into high gear (indicating that the machine is a tad busy, thankyouverymuch), and I can't get its attention by any means I have been able to discover short of a power-cycle. (At least the button does the job; I didn't need to yank the batteries out.) But lid-closure just shut off the display, no key chord I could find had a noticeable effect, nor did removing and re-inserting a PCMCIA card.

> I rebooted and let fsck -p do its usual thing, except that it freaked out. The actual fsck of / proceeded fine (actual fs activity when I panic'd my machine was very low, so the filesystems weren't corrupted, just marked dirty). When it got to /usr and /var, however, fsck freaked out and claimed that the primary superblock didn't match the first alternate.

Well, I confess that the first couple of times I had been running -CURRENT and the box wanted a fsck more elaborate than the -p variety, I recalled that there had been recent activity, and I remembered one person's rather unfortunate experience of finding everything sitting in lost+found. Since I had no desire for that to happen, I booted -STABLE instead: single-user mode, fsck -p. Wasn't quite happy with a couple of file systems, so I did manual fsck (still under -STABLE) on each of those.

Finally, system said things were OK; I was able to do a mount -a, so after that, I did a reboot into -CURRENT. Much to my surprise (and some chagrin), those 2 file systems that needed the extra attention (/var and -CURRENT's /usr, if I recall correctly) didn't pass muster with -CURRENT's fsck; it wanted a manual fsck of those, no question about it.

Since they passed -STABLE's fsck, I figured they weren't likely in *too* terribly bad shape, so I went ahead and did the manual fsck, per request. And in each case, I had a similar symptom (re: primary and first alternate superblock mismatch). I did wonder about making a choice just between those two, without checking for one of the other alternates (some sort of voting protocol -- though I wouldn't be too terribly keen on making fsck unnecessarily complicated, certainly). But under the circumstances, I wanted to run -CURRENT, so I didn't see that I had a great deal of choice in the matter (regardless of what I was being asked), so I told it to go ahead.

Following those manual fscks, I re-booted into multi-user mode, and things worked normally (as far as I can tell).

> At this point I first had a heart attack.

I believe that a technical term for that literary device is hyperbole. :-)

> Once I recovered from that, I attempted read-only mounts of /usr and /var which did succeed, except that each mount spewed out a message to the kernel console about losing x files and y blocks. Confident that my fs wasn't totally hosed after doing some ls's, I unmounted /usr and /var and ran a non-preen fsck on them, which insisted on using an alternate superblock, but otherwise proceeded fine (except that it seemed to take longer than usual). Once the fscks finished, it seemed to be all ok.
>
> Is anyone else seeing any weird stuff like this? I've never had fsck complain about the superblocks after a crash before.

As noted, it's happened a couple of times for me. Generally, somewhat inopportune times (almost by definition), so I wasn't really able to take the time to sit back, take notes, and report back much that was coherent. And I was under the impression that much of this was under construction anyhow, so the value of any report I might make was somewhat open to question (from my perspective, anyhow).

> ...
> Hmm, that's odd, I did have soft updates on on /usr and /var before the crash. It seems to be off now. :(

That also happened to me. I thought it odd at the time, but forgot to mention it. At least I have some reason to believe I was unlikely to have been hallucinating about that.

Cheers,
david
-- 
David H. Wolfskill [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to advise, recommend, or support the use (save possibly for personal amusement) of any product that is or depends on any Microsoft product.
Re: background fsck
On Thu, 17 May 2001, David Wolfskill wrote:

:From: John Baldwin [EMAIL PROTECTED]
:
:Hmm, that's odd, I did have soft updates on on /usr and /var before the crash.
:It seems to be off now. :(
:
:That also happened to me. I thought it odd at the time, but forgot to
:mention it. At least I have some reason to believe I was unlikely to
:have been hallucinating about that.

Does tunefs update the alternate superblocks when it enables soft updates? It doesn't look like it does, but I might be missing something.

-- 
[EMAIL PROTECTED]
Bipedalism is only a fad.
Re: background fsck
> Date: Thu, 17 May 2001 22:30:03 -0500 (CDT)
> From: David Scheidt [EMAIL PROTECTED]

> Does tunefs update the alternate superblocks when it enables soft updates? It doesn't look like it does, but I might be missing something.

I could easily have overlooked something myself, but it doesn't appear to do so to me.

(I see it does want the file system clean when soft updates is enabled, but doesn't check for that for a disable request.)

Cheers,
david
-- 
David H. Wolfskill [EMAIL PROTECTED]
As a computing professional, I believe it would be unethical for me to advise, recommend, or support the use (save possibly for personal amusement) of any product that is or depends on any Microsoft product.
Re: background fsck
cc list trimmed.

On Thu, 17 May 2001, David Wolfskill wrote:

:(I see it does want the file system clean when soft updates is enabled,
:but doesn't check for that for a disable request.)
:

Right. fsck(8) can make assumptions about the state of the filesystem if it knows that softupdates were in use. (There's a smaller set of possible inconsistencies, but I don't remember what they are.) It's safe for fsck to assume that the filesystem could be in worse shape than it actually is, but not safe to assume it's cleaner.

David
-- 
[EMAIL PROTECTED]
Bipedalism is only a fad.