disklabel (?) issues during upgrade to 4.2
The problem I am facing happens during installation of OpenBSD 4.2 -release, -stable, or -current as of January 1st (both amd64 and i386). I can very easily reproduce this issue every time. I've been testing for the last 48 hours, and can confirm that it never happens on 4.0 or 4.1. It happens with SATA drives, never with PATA. See the dmesg with SATA at the bottom (dmesg is for -stable, no custom changes otherwise).

First, when I try to upgrade my 4.1/amd64 box to 4.2/amd64, the upgrade script tries to fsck /dev/wd0a, but gives me the following:

    wd0a: id not found reading fsbn 128 of 128-143 (wd0 bn 8755093022399; cn 547... tn 80 sn 4), retrying
    wd0: transfer error, downgrading to Ultra-DMA mode 4

It downgrades all the way to DMA mode 2, finally gives up as FAILED, and instructs me to fsck manually (which doesn't work either). And the disklabel output at this point looks strange:

    #        size        offset  fstype [fsize bsize  cpg]
      a:  4036512 8755093022271  4.2BSD      0     0  256

The expected offset for partition a is of course 63. (disklabel in 4.2/i386 does the same.)

Just to confirm my observations, I managed to replace the disklabel binary of the 4.2 installation (in /sbin of rd0a) with the one from 4.1, and it does not have this issue.

4.2 disklabel behaves the same in install mode too. I mean, if I give up on the upgrade (which is every time) and choose to install instead, after I drop to the disklabel editor and print the existing partitions, I see exactly the same huge number as the offset. But the difference is that if I continue with the install without changing anything in the disklabel editor, newfs cannot format the partition, and gives me the same id not found... downgrading ... DMA mode to... errors as above, and finally gives up.

Therefore, the only way to install 4.2 (on my system with a SATA HD already partitioned) is to zero out the partition table and recreate all the partitions in the disklabel editor, and everything works fine thereafter.
If this issue happened with 4.0 and 4.1 too, then I could blame my hardware (perhaps the NVIDIA chipset). But upgrading from 4.0 to 4.1 is fine. Upgrading/downgrading from 4.2 to 4.2 again works fine too. And otherwise this system has been running fine for more than a year now.

Also, trying a *fake* downgrade from 4.2 to 4.1 fails during fsck (I did not really downgrade, I just wanted to test disklabel, fsck, mount, and newfs of 4.1). But a downgrade from 4.2 to 4.0 seems to fsck fine.

I have tried many different partitioning schemes, enabled/disabled IDE and SATA ports in the BIOS, and used install42.iso (-release), cd42.iso (snapshot), etc. disklabel output after first boot seems fine, i.e. the issue I am reporting occurs only during installation of 4.2.

I have seen that there have been major changes to disklabel (and related tools) since June. Could this issue be related to those?

I would appreciate any help. I can file a bug report if this is really a bug.

(I myself have tried a patch before submitting this post, namely, a typecast to u_int64_t for the starting_sector in find_bounds() in editor.c, but it did not fix the disklabel offset. Lines 1649 and 1650 in -stable.)

OpenBSD 4.2-stable (STABLE) #6: Sun Dec 2 17:51:00 EET 2007
    [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/STABLE
real mem = 1073278976 (1023MB)
avail mem = 1030926336 (983MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf (75 entries)
bios0: vendor Phoenix Technologies, LTD version ASUS A8N5X ACPI BIOS Revision 1003 date 06/01/2006
bios0: ASUSTeK Computer INC. A8N5X
acpi at mainbus0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: AMD Athlon(tm) 64 Processor 3700+, 2211.58 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
cpu0: AMD erratum 89 present, BIOS upgrade may be required
cpu0: Cool'n'Quiet K8 2211 MHz: speeds: 2200 2000 1800 1000 MHz
pci0 at mainbus0 bus 0: configuration mode 1
NVIDIA nForce4 DDR rev 0xa3 at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0 NVIDIA nForce4 ISA rev 0xa3
nviic0 at pci0 dev 1 function 1 NVIDIA nForce4 SMBus rev 0xa2
iic0 at nviic0
iic1 at nviic0
pciide0 at pci0 dev 6 function 0 NVIDIA nForce4 IDE rev 0xf2: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: HL-DT-ST, CD-RW GCE-8527B, 1.02 SCSI0 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
pciide1 at pci0 dev 7 function 0 NVIDIA nForce4 SATA rev 0xf3: DMA
pciide1: using irq 11 for native-PCI interrupt
pciide2 at pci0 dev 8
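As a cross-check on the figures in the report above, here is a minimal Python sketch. It assumes that the block number the kernel reports as "wd0 bn" is the partition offset plus the filesystem-relative block ("fsbn"); that relationship is an assumption of this sketch, not something stated in the post. Under that assumption, the run-together disklabel figure 40365128755093022271 splits in exactly one way into a size and an offset. The function name `split_size_offset` is made up for illustration.

```python
# Values copied from the report above.
FUSED = "40365128755093022271"  # size and offset columns as they appear, run together
FSBN = 128                      # filesystem-relative block from the kernel message
DISK_BN = 8755093022399         # absolute block ("wd0 bn") from the same message


def split_size_offset(fused, fsbn, disk_bn):
    """Return (size, offset) for the split where offset + fsbn == disk_bn.

    ASSUMPTION: disk_bn = partition offset + fsbn. Returns None if no
    split of the digit string satisfies that relation.
    """
    for i in range(1, len(fused)):
        size, offset = int(fused[:i]), int(fused[i:])
        if offset + fsbn == disk_bn:
            return size, offset
    return None


print(split_size_offset(FUSED, FSBN, DISK_BN))  # (4036512, 8755093022271)
```

If the assumption holds, the failing read is simply the first block of partition a as located by the bogus offset, which is consistent with the drive answering "id not found" for a sector far beyond the end of the disk.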
Re: disklabel (?) issues during upgrade to 4.2
On Thu, Jan 03, 2008 at 06:56:18PM +0200, Soner Tari wrote:

> The problem I am facing happens during installation of OpenBSD 4.2 -release, -stable, or -current as of January 1st (both amd64 and i386). I can very easily reproduce this issue every time. I've been testing for the last 48 hours, and can confirm that it never happens on 4.0 or 4.1. Happens with SATA drives, never with PATA. See the dmesg with SATA at the bottom (dmesg is for -stable, no custom changes otherwise).
>
> Firstly, while I try to upgrade my 4.1/amd64 box to 4.2/amd64, the upgrade script tries to fsck /dev/wd0a, but gives me the following:
>
>     wd0a: id not found reading fsbn 128 of 128-143 (wd0 bn 8755093022399; cn 547... tn 80 sn 4), retrying
>     wd0: transfer error, downgrading to Ultra-DMA mode 4
>
> It downgrades down to DMA mode 2, finally gives up as FAILED, and instructs me to fsck manually (which doesn't work either). And, disklabel output at this point looks strange:
>
>     #        size        offset  fstype [fsize bsize  cpg]
>       a:  4036512 8755093022271  4.2BSD      0     0  256
>
> The expected offset for partition a is of course 63. (disklabel in 4.2/i386 does the same too.)

This can happen if you have a version mismatch between kernel and disklabel executable.

> Just to confirm my observations, I managed to replace disklabel binary of 4.2 installation (in /sbin of rd0a) with the one from 4.1, and it does not have this issue.

So that suggests you are NOT running a 4.2 kernel during your upgrade.

> 4.2 disklabel behaves the same in install mode too. I mean, if I give up on upgrade (which is every time) and choose to install instead, after I drop to disklabel editor, and print the existing partitions, I see exactly the same huge number as the offset. But the difference is that if I continue with install without changing anything in disklabel editor, newfs cannot format the partition, and gives me the same id not found... downgrading ... DMA mode to... errors as above, and finally gives up.
> Therefore, the only way to install 4.2 (on my system with SATA HD already partitioned) is to zero out the partition table and recreate all the partitions in disklabel editor, and everything works fine thereafter.
>
> If this issue did happen with 4.0 and 4.1 too, then I could blame my hardware (perhaps nvidia chipset). But upgrading from 4.0 to 4.1 is fine. Up/downgrading from 4.2 to again 4.2 works fine too. And, otherwise this system has been running fine for more than a year now.
>
> Also, trying to *fake* downgrade from 4.2 to 4.1 fails during fsck (did not really downgrade, just wanted to test disklabel, fsck, mount, and newfs of 4.1). But downgrade from 4.2 to 4.0 seems to fsck fine.

Downgrades are NOT supported.

Some background info: the disklabel format changed from 4.1 to 4.2. A 4.2 kernel makes sure to translate the format, and the new tools handle things fine too. We tested many upgrade scenarios, and they all work fine. So far, reports like yours have all boiled down to version conflicts. I'm pretty confident you mixed versions somewhere in your process, and with all your experimenting, you could very well have made the on-disk label faulty.

> I have tried with many different partitioning, enabled/disabled IDE and SATA ports in bios, and used install42.iso (-release), cd42.iso (snapshot), etc. disklabel output after first boot seems fine, i.e. the issue I am reporting is only during installation of 4.2.

Another indicator that you are NOT running a 4.2 kernel during install/upgrade.

> I have seen that there are major changes to disklabel (and related tools) since June. Could this issue be related with those?
>
> I would appreciate any help. I can file a bug report if this is really a bug.

My bets are on a user error.

	-Otto

> (I myself have tried a patch before submitting this post, namely, a typecast to u_int64_t for the starting_sector in find_bounds() in editor.c, but it did not fix the disklabel offset. Lines 1649 and 1650 in -stable.)
> OpenBSD 4.2-stable (STABLE) #6: Sun Dec 2 17:51:00 EET 2007
>     [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/STABLE
> real mem = 1073278976 (1023MB)
> avail mem = 1030926336 (983MB)
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf (75 entries)
> bios0: vendor Phoenix Technologies, LTD version ASUS A8N5X ACPI BIOS Revision 1003 date 06/01/2006
> bios0: ASUSTeK Computer INC. A8N5X
> acpi at mainbus0 not configured
> cpu0 at mainbus0: (uniprocessor)
> cpu0: AMD Athlon(tm) 64 Processor 3700+, 2211.58 MHz
> cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB 64b/line 16-way L2 cache
> cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
> cpu0: AMD
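To illustrate how a format mismatch of the kind Otto describes could produce an offset in the trillions, here is a hedged Python sketch. Both record layouts below are ASSUMPTIONS, simplified from memory of the partition entry in sys/disklabel.h before and after the change; they are not taken from this thread and should be checked against the actual headers. The fragment size of 2048 is a hypothetical value, and `misread_as_new` is a made-up name. The idea: if a 16-byte entry written in the old layout is decoded with the new one, the old 32-bit fsize field lands where the high 16 bits of the offset and size are expected.

```python
import struct

# ASSUMED 16-byte little-endian layouts (amd64 and i386 are little-endian).
# Old: p_size, p_offset, p_fsize (32 bits), p_fstype, p_frag, p_cpg
OLD_FMT = "<IIIBBH"
# New: p_size, p_offset, p_offseth, p_sizeh (16 bits each), p_fstype, p_fragblock, p_cpg
NEW_FMT = "<IIHHBBH"


def misread_as_new(size, offset, fsize, fstype=7, frag=8, cpg=256):
    """Pack an entry in the assumed old layout, decode it with the assumed new one.

    Returns the (size, offset) pair a new-format reader would compute.
    """
    raw = struct.pack(OLD_FMT, size, offset, fsize, fstype, frag, cpg)
    p_size, p_offset, p_offseth, p_sizeh, _fstype, _fragblock, _cpg = struct.unpack(NEW_FMT, raw)
    return (p_sizeh << 32) | p_size, (p_offseth << 32) | p_offset


# Size 4036512 and offset 63 are from the report; fsize=2048 is hypothetical.
print(misread_as_new(4036512, 63, 2048))  # (4036512, 8796093022271)
```

With these assumed inputs the size stays sane while the offset jumps to 63 + 2048 * 2^32. That is the same order of magnitude as the offset in the report but not the identical number, so this is only a plausible mechanism under the stated assumptions, not a diagnosis.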
Re: disklabel (?) issues during upgrade to 4.2
On Fri, Jan 04, 2008 at 12:06:04AM +0200, Soner Tari wrote:

> On Thu, 2008-01-03 at 19:15 +0100, Otto Moerbeek wrote:
> > Downgrades are NOT supported.
> >
> > Some background info: the disklabel format changed from 4.1 to 4.2. A 4.2 kernel makes sure to translate the format, and the new tools handle things fine too. We tested many upgrade scenarios, and they all work fine. So far, reports like yours have all boiled down to version conflicts. I'm pretty confident you mixed versions somewhere in your process, and with all your experimenting, you could very well have made the on-disk label faulty.
>
> Hmmm, I always thought that the phrase "Downgrades are NOT supported" referred to binaries, libraries, ports packages, and such. It never occurred to me that it refers to disk labels too. This must be especially true going from 4.1 to 4.2.

The disklabel format has changed to allow for large (>2TB) partitions and disks. A 4.2 kernel knows how to deal with both the old format and the new format, but a 4.1 kernel obviously does not know how to handle the new format. Installing 4.2 and writing a disklabel (which will happen if the label is edited and/or a newfs is done) will convert the on-disk label to the new format.

	-Otto

> Please note that I have never downgraded OpenBSD binaries and such, ever... But this was my test/development system, and I was testing the upgrade feature of my project. So I was reinstalling 4.1 over 4.2 (not downgrading), without touching the partitions. So, this is effectively an unintentional downgrade of the disk label.
>
> To continue my upgrading tests, I *have to* reinstall 4.1 over 4.2 again and again. In order not to have this label downgrading issue, do you think it is enough to 'z' on disklabel editor and recreate all partitions during 4.1 install (over 4.2 installed HD)? (I have dd'd the HD with /dev/zero after your comments above, so I am not sure if just 'z' on disklabel editor is enough.) Is there a faster way for my upgrade test cycles?
>
> Thank you very much.
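The 2TB figure in Otto's explanation can be made concrete with a little arithmetic. The sketch below assumes 512-byte sectors and assumes that the old label counted sectors in 32 bits while the new one has 48 bits available; neither number is stated in this thread, so treat both as assumptions of the example. `max_addressable_bytes` is a made-up helper name.

```python
SECTOR_BYTES = 512  # ASSUMED sector size


def max_addressable_bytes(sector_bits, sector_bytes=SECTOR_BYTES):
    """Largest capacity addressable with sector numbers of the given width."""
    return (1 << sector_bits) * sector_bytes


TIB = 1024 ** 4

old_limit = max_addressable_bytes(32)  # ASSUMED width of the old format
new_limit = max_addressable_bytes(48)  # ASSUMED width of the new format

print(old_limit // TIB)  # 2       (2 TiB)
print(new_limit // TIB)  # 131072  (128 PiB)
```

Under these assumptions, any sector number above 2^32 - 1 cannot be represented in the old format at all, which fits Otto's point that a 4.1 kernel cannot make sense of a label written by 4.2.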