disklabel (?) issues during upgrade to 4.2

2008-01-03 Thread Soner Tari
The problem I am facing happens during installation of OpenBSD 4.2
-release, -stable, or -current as of January 1st (both amd64 and i386).
I can very easily reproduce this issue every time. I've been testing for
the last 48 hours, and can confirm that it never happens on 4.0 or 4.1.
Happens with SATA drives, never with PATA. See the dmesg with SATA at
the bottom (dmesg is for -stable, no custom changes otherwise).

First, when I try to upgrade my 4.1/amd64 box to 4.2/amd64, the
upgrade script tries to fsck /dev/wd0a, but gives me the following:

wd0a: id not found reading fsbn 128 of 128 143 (wd0 bn 8755093022399; cn
547... tn 80 sn 4), retrying
wd0: transfer error, downgrading to Ultra-DMA mode 4

It keeps downgrading, down to DMA mode 2, then finally gives up as FAILED
and instructs me to fsck manually (which doesn't work either).

And, disklabel output at this point looks strange:

#        size           offset  fstype [fsize bsize  cpg]
  a:  4036512    8755093022271  4.2BSD      0     0  256

The expected offset for partition a is of course 63. (disklabel in
4.2/i386 does the same too.)
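
As a cross-check, the block number in the kernel error can be compared
with the offset that disklabel prints. The sketch below assumes the
failing block is simply the partition offset plus the filesystem block
number (fsbn) from the error message; the variable names are made up for
illustration and do not come from any OpenBSD tool.

```python
# Numbers copied from the kernel error message and the text above.
WD0_BN = 8755093022399    # absolute block number reported for wd0
FSBN = 128                # block number relative to the start of wd0a
EXPECTED_OFFSET = 63      # offset that partition a should have

# Assumption: absolute block = partition offset + fsbn.
implied_offset = WD0_BN - FSBN

print(implied_offset)                     # offset the kernel appears to use
print(implied_offset == EXPECTED_OFFSET)  # whether it is the expected 63
print(implied_offset.bit_length())        # bits needed to store the value
```

The implied offset, 8755093022271, appears in the disklabel line, so the
kernel and disklabel seem to agree on the same bogus value. It needs 43
bits, which is more than a 32-bit field can hold.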

Just to confirm my observations, I managed to replace disklabel binary
of 4.2 installation (in /sbin of rd0a) with the one from 4.1, and it
does not have this issue.

4.2 disklabel behaves the same in install mode too. If I give up on the
upgrade (which is every time) and choose to install instead, then drop
to the disklabel editor and print the existing partitions, I see exactly
the same huge number as the offset. The difference is that if I continue
with the install without changing anything in the disklabel editor,
newfs cannot format the partition: it gives me the same "id not found"
and "downgrading to ... DMA mode" errors as above, and finally gives up.

Therefore, the only way to install 4.2 (on my system with SATA HD
already partitioned) is to zero out the partition table and recreate all
the partitions in disklabel editor, and everything works fine
thereafter.

If this issue happened with 4.0 and 4.1 too, I could blame my hardware
(perhaps the nvidia chipset). But upgrading from 4.0 to 4.1 is fine.
Upgrading from 4.2 to 4.2 again works fine too. Otherwise, this system
has been running fine for more than a year now.

Also, trying to *fake* downgrade from 4.2 to 4.1 fails during fsck (did
not really downgrade, just wanted to test disklabel, fsck, mount, and
newfs of 4.1). But downgrade from 4.2 to 4.0 seems to fsck fine.

I have tried many different partitioning schemes, enabled/disabled the
IDE and SATA ports in the BIOS, and used install42.iso (-release),
cd42.iso (snapshot), etc.

disklabel output after first boot seems fine, i.e. the issue I am
reporting is only during installation of 4.2.

I have seen that there have been major changes to disklabel (and related
tools) since June. Could this issue be related to those?

I would appreciate any help. I can file a bug report if this is really a
bug.

(I tried a patch myself before submitting this post, namely a typecast
to u_int64_t for starting_sector in find_bounds() in editor.c, lines
1649 and 1650 in -stable, but it did not fix the disklabel offset.)

OpenBSD 4.2-stable (STABLE) #6: Sun Dec  2 17:51:00 EET 2007
[EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/STABLE
real mem = 1073278976 (1023MB)
avail mem = 1030926336 (983MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xf (75 entries)
bios0: vendor Phoenix Technologies, LTD version ASUS A8N5X ACPI BIOS
Revision 1003 date 06/01/2006
bios0: ASUSTeK Computer INC. A8N5X
acpi at mainbus0 not configured
cpu0 at mainbus0: (uniprocessor)
cpu0: AMD Athlon(tm) 64 Processor 3700+, 2211.58 MHz
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 1MB
64b/line 16-way L2 cache
cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully
associative
cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully
associative
cpu0: AMD erratum 89 present, BIOS upgrade may be required
cpu0: Cool'n'Quiet K8 2211 MHz: speeds: 2200 2000 1800 1000 MHz
pci0 at mainbus0 bus 0: configuration mode 1
NVIDIA nForce4 DDR rev 0xa3 at pci0 dev 0 function 0 not configured
pcib0 at pci0 dev 1 function 0 NVIDIA nForce4 ISA rev 0xa3
nviic0 at pci0 dev 1 function 1 NVIDIA nForce4 SMBus rev 0xa2
iic0 at nviic0
iic1 at nviic0
pciide0 at pci0 dev 6 function 0 NVIDIA nForce4 IDE rev 0xf2: DMA,
channel 0 configured to compatibility, channel 1 configured to
compatibility
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: HL-DT-ST, CD-RW GCE-8527B, 1.02 SCSI0
5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
pciide1 at pci0 dev 7 function 0 NVIDIA nForce4 SATA rev 0xf3: DMA
pciide1: using irq 11 for native-PCI interrupt
pciide2 at pci0 dev 8 

Re: disklabel (?) issues during upgrade to 4.2

2008-01-03 Thread Otto Moerbeek
On Thu, Jan 03, 2008 at 06:56:18PM +0200, Soner Tari wrote:

 The problem I am facing happens during installation of OpenBSD 4.2
 -release, -stable, or -current as of January 1st (both amd64 and i386).
 I can very easily reproduce this issue every time. I've been testing for
 the last 48 hours, and can confirm that it never happens on 4.0 or 4.1.
 Happens with SATA drives, never with PATA. See the dmesg with SATA at
 the bottom (dmesg is for -stable, no custom changes otherwise).
 
 Firstly, while I try to upgrade my 4.1/amd64 box to 4.2/amd64, the
 upgrade script tries to fsck /dev/wd0a, but gives me the following:
 
 wd0a: id not found reading fsbn 128 of 128 143 (wd0 bn 8755093022399; cn
 547... tn 80 sn 4), retrying
 wd0: transfer error, downgrading to Ultra-DMA mode 4
 
 It downgrades down to DMA mode 2, finally gives up as FAILED, and
 instructs me to fsck manually (which doesn't work either).
 
 And, disklabel output at this point looks strange:
 
 #        size           offset  fstype [fsize bsize  cpg]
   a:  4036512    8755093022271  4.2BSD      0     0  256
 
 The expected offset for partition a is of course 63. (disklabel in
 4.2/i386 does the same too.)

This can happen if you have a version mismatch between kernel and
disklabel executable.

 
 Just to confirm my observations, I managed to replace disklabel binary
 of 4.2 installation (in /sbin of rd0a) with the one from 4.1, and it
 does not have this issue.

So that suggests you are NOT running a 4.2 kernel during your upgrade.

 
 4.2 disklabel behaves the same in install mode too. I mean, if I give up
 on upgrade (which is every time) and choose to install instead, after I
 drop to disklabel editor, and print the existing partitions, I see
 exactly the same huge number as the offset. But the difference is that
 if I continue with install without changing anything in disklabel
 editor, newfs cannot format the partition, and gives me the same id not
 found... downgrading ... DMA mode to... errors as above, and finally
 gives up.
 
 Therefore, the only way to install 4.2 (on my system with SATA HD
 already partitioned) is to zero out the partition table and recreate all
 the partitions in disklabel editor, and everything works fine
 thereafter.
 
 If this issue did happen with 4.0 and 4.1 too, then I could blame my
 hardware (perhaps nvidia chipset). But upgrading from 4.0 to 4.1 is
 fine. Up/downgrading from 4.2 to again 4.2 works fine too. And,
 otherwise this system has been running fine for more than a year now.
 
 Also, trying to *fake* downgrade from 4.2 to 4.1 fails during fsck (did
 not really downgrade, just wanted to test disklabel, fsck, mount, and
 newfs of 4.1). But downgrade from 4.2 to 4.0 seems to fsck fine.

Downgrades are NOT supported.

Some background info: the disklabel format changed from 4.1 to 4.2.
A 4.2 kernel makes sure to translate the format, and the new tools handle
things fine too. We tested many upgrade scenarios, and they all work
fine. So far reports like yours all have boiled down to version conflicts.

I'm pretty confident you mixed versions somewhere in your process, and
with all your experimenting, you could very well have made the on-disk
label faulty.
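
To illustrate how a format mismatch could turn an offset of 63 into a
huge number, here is a minimal sketch. Both 16-byte layouts and all field
names are assumptions for illustration and are not copied from any
OpenBSD header: the "old" entry carries a 32-bit fragment size where the
"new" entry carries two 16-bit fields holding the high bits of offset
and size.

```python
import struct

# Hypothetical little-endian partition entries, 16 bytes each.
OLD_ENTRY = struct.Struct("<IIIBBH")   # size, offset, fsize, fstype, frag, cpg
NEW_ENTRY = struct.Struct("<IIHHBBH")  # size, offset, offset_hi, size_hi,
                                       # fstype, fragblock, cpg

def read_as_new(raw):
    """Decode raw bytes with the new layout, combining high and low parts."""
    size_lo, off_lo, off_hi, size_hi, _fstype, _fragblock, _cpg = \
        NEW_ENTRY.unpack(raw)
    return (size_hi << 32) | size_lo, (off_hi << 32) | off_lo

# An old-style entry: offset 63 and an example fragment size of 2048.
old_raw = OLD_ENTRY.pack(4036512, 63, 2048, 7, 8, 16)

size, offset = read_as_new(old_raw)
print(size)    # the size survives unchanged
print(offset)  # the old fsize bytes are taken as high bits of the offset
```

In this sketch the size is still 4036512, but the offset comes out as
2048 * 2**32 + 63, because the bytes of the old fragment size are read as
the high part of the offset. That value has the same order of magnitude
as the reported one, but the sketch is not a diagnosis of this particular
disk.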

 
 I have tried with many different partitioning, enabled/disabled IDE and
 SATA ports in bios, and used install42.iso (-release), cd42.iso
 (snapshot), etc.
 
 disklabel output after first boot seems fine, i.e. the issue I am
 reporting is only during installation of 4.2.

Another indicator that you are NOT running a 4.2 kernel during install/upgrade.

 
 I have seen that there are major changes to disklabel (and related
 tools) since June. Could this issue be related with those?
 
 I would appreciate any help. I can file a bug report if this is really a
 bug.

My bets are on a user error.

-Otto

 

Re: disklabel (?) issues during upgrade to 4.2

2008-01-03 Thread Otto Moerbeek
On Fri, Jan 04, 2008 at 12:06:04AM +0200, Soner Tari wrote:

 On Thu, 2008-01-03 at 19:15 +0100, Otto Moerbeek wrote:
  Downgrades are NOT supported.
  
  Some background info: the disklabel format changed from 4.1 to 4.2.
  A 4.2 kernel makes sure to translate the format, and the new tools handle
  things fine too. We tested many upgrade scenarios, and they all work
  fine. So far reports like yours all have boiled down to version conflicts.
  
  I'm pretty confident you mixed versions somewhere in your process, and
  with all your experimenting, you could very well have made the on-disk
  label faulty.
 
 Hmmm, I always thought that the phrase "Downgrades are NOT supported"
 referred to binaries, libraries, ports packages, and such. It never
 occurred to me that it refers to disk labels too. This must be
 especially true going from 4.1 to 4.2.

The disklabel format has changed to allow for large (over 2TB) partitions
and disks. A 4.2 kernel knows how to deal with both the old format and
the new format, but a 4.1 kernel obviously does not know how to handle
the new format. Installing 4.2 and writing a disklabel (which will happen
if the label is edited and/or a newfs is done) will convert the
on-disk label to the new format.
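
The 2TB figure follows from the width of the sector counters. A minimal
sketch, assuming 512-byte sectors; the 48-bit width used for the wider
counter is an assumption for illustration only.

```python
SECTOR_BYTES = 512  # assumed sector size

def max_addressable_bytes(sector_bits):
    """Largest size addressable with a sector counter of the given width."""
    return (2 ** sector_bits) * SECTOR_BYTES

TIB = 2 ** 40
old_limit = max_addressable_bytes(32)  # 32-bit sector counts
new_limit = max_addressable_bytes(48)  # hypothetical wider counter

print(old_limit // TIB)  # 2
print(new_limit // TIB)  # 131072
```

With 32-bit sector numbers nothing beyond 2 TiB can be described, which
is why larger disks need wider fields in the label.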

-Otto

 
 Please note that I have never downgraded OpenBSD binaries and such,
 ever... But this was my test/development system, and I was testing the
 upgrade feature of my project. So I was reinstalling 4.1 over 4.2 (not
 downgrading), without touching the partitions. So, this is effectively
 an unintentional downgrade of the disk label.
 
 To continue my upgrading tests, I *have to* reinstall 4.1 over 4.2 again
 and again. In order not to have this label downgrading issue, do you
 think it is enough to 'z' on disklabel editor and recreate all
 partitions during 4.1 install (over 4.2 installed HD)? (I have dd'd the
 HD with /dev/zero after your comments above, so I am not sure if just
 'z' on disklabel editor is enough.) Is there a faster way for my upgrade
 test cycles?
 
 Thank you very much.