Re: /usr/obj partition AWOL

2007-06-08 Thread Markus Lude
On Thu, Jun 07, 2007 at 09:06:32AM +0200, Otto Moerbeek wrote:
 
 On Wed, 6 Jun 2007, Otto Moerbeek wrote:
 
  On Wed, 6 Jun 2007, Markus Lude wrote:
  
   On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:

There were some validation checks added to partitions. If a bad
partition is found, it will be marked unused. The checks were a
little too strict for some cases. A fix for that went in yesterday, so
try a new snap.
   
   Thanks for your info.
   
   After rebuilding kernel and userland the problem still exists, but now
   the affected partitions are /var, /home and /data. Hmm. Unmounting /data
   and doing a manual fsck -f runs without problems.
   
If the problem persists, please report with full disklabel output.
   
   $ cat /etc/fstab
   /dev/wd0a / ffs rw 1 1
   /dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
   /dev/wd0e /usr ffs rw,nodev 1 2
   /dev/wd0f /var ffs rw,nodev,nosuid 1 2
   /dev/wd0g /home ffs rw,nodev,nosuid 1 2
   /dev/wd0h /data ffs rw,nodev,nosuid 1 2
   /dev/wd1d /backup ffs rw,nodev,nosuid 1 2
   
   with a current kernel:
   
   $ sudo disklabel wd0
   # /dev/rwd0c:
   type: ESDI
   disk: ESDI/IDE disk
   label: ST3120213A  
   flags:
   bytes/sector: 512
   sectors/track: 63
   tracks/cylinder: 16
   sectors/cylinder: 1008
   cylinders: 16383
   total sectors: 16514064
  ^^^
  
  1008 * 16383 = 16514064
  
   rpm: 3600
   interleave: 1
   trackskew: 0
   cylinderskew: 0
   headswitch: 0   # microseconds
   track-to-track seek: 0  # microseconds
   drivedata: 0 
   
   16 partitions:
   #        size    offset  fstype [fsize bsize  cpg]
     a:   1024128         0  4.2BSD   2048 16384   16 # Cyl     0 -   1015
     b:   3072384   1024128    swap                   # Cyl  1016 -   4063
     c: 234441648         0  unused      0     0      # Cyl     0 - 232580
  ^
  
  Your disk size and c partition size do not match. Can you send a
  dmesg, to see what the actual size of your disk is? This is really
  needed to see what is going on.
  
  Did you at any time edit the disk size by hand?

No, at least I can't remember it.

     d:   2048256   4096512  4.2BSD   2048 16384   16 # Cyl  4064 -   6095
     e:  20479536   6144768  4.2BSD   2048 16384   16 # Cyl  6096 -  26412
   disklabel: partition c: partition extends past end of unit
   disklabel: partition e: partition extends past end of unit
   
   older kernel:
   $ sudo disklabel wd0
   [...]
   16 partitions:
   #        size    offset  fstype [fsize bsize  cpg]
     a:   1024128         0  4.2BSD      0     0   16 # Cyl     0 -   1015
     b:   3072384   1024128    swap                   # Cyl  1016 -   4063
     c: 234441648         0  unused      0     0      # Cyl     0 - 232580
     d:   2048256   4096512  4.2BSD      0     0   16 # Cyl  4064 -   6095
     e:  20479536   6144768  4.2BSD      0     0   16 # Cyl  6096 -  26412
     f:   4095504  26624304  4.2BSD      0     0   16 # Cyl 26413 -  30475
     g:  20479536  30719808  4.2BSD      0     0   16 # Cyl 30476 -  50792
     h: 183242304  51199344  4.2BSD      0     0   16 # Cyl 50793 - 232580
   disklabel: partition c: partition extends past end of unit
   disklabel: partition e: partition extends past end of unit
   disklabel: partition f: offset past end of unit
   disklabel: partition f: partition extends past end of unit
   disklabel: partition g: offset past end of unit
   disklabel: partition g: partition extends past end of unit
   disklabel: partition h: offset past end of unit
   disklabel: partition h: partition extends past end of unit
   
   Any hints on how to fix this besides repartitioning and reinstalling?
  
  If possible, please leave the disk as is, until we've done further
  diagnosis.  If that is not possible, you can use the 'e' command in
  disklabel, to set the actual size of the disk to the size (in sectors)
  reported in the dmesg.  You might need to adjust the 'c' partition as
  well. 
 
 After having seen your dmesg, I see that your disk size is really
 234441648 sectors. The disklabel says 16514064 though.  The new
 consistency checks did not like that. The consistency checks have been
 disabled in two steps (rev 1.44. and rev 1.66 of
 sys/kern/subr_disk.c). So a current kernel should not trip on this
 anymore. 
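
The mismatch described above can be reproduced with plain arithmetic. The following is a minimal sketch in Python, using the sizes and offsets from the "older kernel" disklabel output in this thread. The helper name check_partitions is hypothetical, and the messages only imitate the disklabel warnings quoted earlier; this is not the kernel's or disklabel's actual check.

```python
# Partition table from the "older kernel" disklabel output:
# name -> (size, offset), both in sectors.
PARTITIONS = {
    "a": (1024128, 0),
    "b": (3072384, 1024128),
    "c": (234441648, 0),
    "d": (2048256, 4096512),
    "e": (20479536, 6144768),
    "f": (4095504, 26624304),
    "g": (20479536, 30719808),
    "h": (183242304, 51199344),
}


def check_partitions(partitions, total_sectors):
    """Return warnings for partitions that do not fit in total_sectors."""
    warnings = []
    for name, (size, offset) in sorted(partitions.items()):
        if offset > total_sectors:
            warnings.append(f"partition {name}: offset past end of unit")
        if offset + size > total_sectors:
            warnings.append(f"partition {name}: partition extends past end of unit")
    return warnings


# Size implied by the label geometry: sectors/track * tracks/cylinder * cylinders.
label_total = 63 * 16 * 16383
# Size of the disk as reported in the dmesg, according to the message above.
real_total = 234441648

print(label_total)
for line in check_partitions(PARTITIONS, label_total):
    print(line)
print(check_partitions(PARTITIONS, real_total))
```

Against the size stored in the label the sketch flags c, e, f, g and h; against the size from the dmesg every partition fits.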
 
 There remain two questions: how did the size end up being wrong in the
 disklabel, and how to repair.
 
 To the first question I can only guess; it could be you dd'ed an image
 from another disk, you edited the size by hand or we are seeing the
 results of an (old?) bug in disklabel handling that now surfaced
 because of the consistency checks.
 
 The second question I already answered: using the 'e' command in
 disklabel lets you set the size of the disk in the label. After that,
 

Invalid partition table (was /usr/obj partition AWOL)

2007-06-08 Thread Emilio Perea
On Thu, Jun 07, 2007 at 04:58:18PM -0500, Emilio Perea wrote:
 On Thu, Jun 07, 2007 at 07:50:24PM +0200, Otto Moerbeek wrote:
  I have been thinking a bit more about the problem, and it is very likely the
  following scenario happened:
  
  1. Kernel upgrade by source.
  
  2. Reboot
  
  3. Kernel reads old disklabel format and converts it in-memory to the
  new v1 format. 
  
  4. Run a newfs using the old executable that does not know about the
  new disklabel format. newfs writes the block and fragment size info
  the old way, on a spot that is used in v1 labels to store the high 16
  bits of the offset and size of a partition. The label is written with
  version = 1, since the in-memory copy is v1. 
  
  5. Reboot, the kernel now sees a v1 disklabel with very high offset
  and/or size, the new consistency code (which is now disabled) kicks in
  and marks the partition as unused. 
  
  So the lesson here is: keep userland and kernel in sync, or use a
  snapshot to upgrade. 
 
 I believe that's exactly what happened the first time.  The catch is
 that kernel and userland were being built from the same cvs update, and
 I thought I was keeping them in sync.  In this case it would probably
 have been better to skip the reboot between building the kernel and the
 userland.

It might have been better to start a whole new thread, but it seemed
logical to believe that the problems might be related.  Using recent
snapshots, last night's insecurity output showed another disklabel
change: 

==
sd1 diffs (-OLD  +NEW)
==
--- /var/backups/disklabel.sd1.current  Fri Apr 20 01:31:19 2007
+++ /var/backups/disklabel.sd1  Fri Jun  8 01:31:55 2007
@@ -1,4 +1,4 @@
-# Inside MBR partition 0: type A6 start 63 size 71681967
+disklabel: warning, DOS partition table with no valid OpenBSD partition
 # /dev/rsd1c:
 type: SCSI
 disk: da0s1
*--*

The full output of disklabel and dmesg follow, but as I was getting
ready to send it, I remembered that this same disk had problems with the
disklabel changes last October.  For some reason it was shown as having
a FreeBSD disklabel.  Most of the correspondence regarding it was off-list,
but involved several developers and ended with Ken Westerback suggesting
some tests before setting it to OpenBSD.

This was fdisk then:

Disk: sd1   geometry: 4462/255/63 [71682030 Sectors]
Offset: 0   Signature: 0xAA55
         Starting       Ending       LBA Info:
 #: id    C   H  S -    C   H  S [       start:      size   ]

*0: A6    0   1  1 - 4461 254 63 [          63:    71681967 ] OpenBSD
 1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 3: 00    0   0  0 -    0   0  0 [           0:           0 ] unused

This is now:

Disk: sd1   geometry: 4462/255/63 [71687370 Sectors]
Offset: 0   Signature: 0xAA55
         Starting       Ending       LBA Info:
 #: id    C   H  S -    C   H  S [       start:      size   ]

 0: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
*3: A5    0   0  1 -    3  28 41 [           0:           5 ] FreeBSD
*--*

It is currently working fine.  Should I just change the partition ID to
A6, or is there something else I should try first?

*--*
disklabel: warning, DOS partition table with no valid OpenBSD partition
# /dev/rsd1c:
type: SCSI
disk: da0s1
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 4462
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

15 partitions:
#        size    offset  fstype [fsize bsize  cpg]
  c:  71681967        63  unused      0     0      # Cyl     0*-  4461
  d:   2104452        63  4.2BSD   2048 16384  132 # Cyl     0*-   130
  e:   8385930   2104515  4.2BSD   2048 16384  328 # Cyl   131 -   652
  f:  23294250  48387780  4.2BSD   2048 16384  328 # Cyl  3012 -  4461
  h:   4112640  15936480  4.2BSD   2048 16384  256 # Cyl   992 -  1247
  i:   2104515  40933620  4.2BSD   2048 16384    1 # Cyl  2548 -  2678
  j:  18828180  20049120  4.2BSD   2048 16384  328 # Cyl  1248 -  2419
  k:   5349645  43038135  4.2BSD   2048 16384   16 # Cyl  2679 -  3011
  l:   2056320  38877300  4.2BSD   2048 16384  128 # Cyl  2420 -  2547
  m:   2104515  10490445  4.2BSD   2048 16384  132 # Cyl   653 -   783
 

Re: Invalid partition table (was /usr/obj partition AWOL)

2007-06-08 Thread Kenneth R Westerback
This is very odd on several fronts. First, someone has obviously
been writing on the MBR for no good reason. I just tested an fdisk
compiled today and noticed no oddities on my i386.

Second, the fact that you find a disklabel. Since we no longer store
or look for disklabels in FreeBSD partitions it is being
read from sector 1 if I recall the code correctly. But it should not
have been writing the disklabel there when there was an OpenBSD
partition to store it in.

Do you know if this is exactly the same disklabel you were using
before? Have you changed anything in the disklabel recently that
would identify this as an artifact that just happened to be lying in
sector 1 for a while?

Can you copy the MBR and send it to me? There might be a clue as to
what overwrote it. Then I would do fdisk -i and see what happens.
This will move the OpenBSD partition to partition 3, but cover the
entire disk as your original MBR did. Then see what the disklabel,
which should be read from the OpenBSD partition, says.

 Ken
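
For anyone who wants to inspect a saved copy of the MBR themselves, here is a minimal sketch in Python that decodes the four primary partition entries of a 512-byte sector. It relies on the standard MBR layout (partition table at byte 446, 16 bytes per entry, signature in the last two bytes). The function name parse_mbr is hypothetical, and the sector is built synthetically from values that appear elsewhere in this thread rather than read from a real disk.

```python
import struct


def parse_mbr(sector):
    """Decode the four primary partition entries of a 512-byte MBR sector."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    (signature,) = struct.unpack_from("<H", sector, 510)
    entries = []
    for index in range(4):
        base = 446 + 16 * index      # partition table starts at byte 446
        flag = sector[base]          # 0x80 marks the active partition
        ptype = sector[base + 4]     # partition id, e.g. 0xA6 or 0xA5
        start, size = struct.unpack_from("<II", sector, base + 8)
        entries.append({"index": index, "active": flag == 0x80,
                        "type": ptype, "start": start, "size": size})
    return signature, entries


# Synthetic sector (an assumption for illustration): slot 3 holds an
# active A6 partition with start 63 and size 71681967.
sector = bytearray(512)
base = 446 + 16 * 3
sector[base] = 0x80
sector[base + 4] = 0xA6
struct.pack_into("<II", sector, base + 8, 63, 71681967)
struct.pack_into("<H", sector, 510, 0xAA55)

signature, entries = parse_mbr(bytes(sector))
print(hex(signature))
for e in entries:
    marker = "*" if e["active"] else " "
    print(marker, e["index"], format(e["type"], "02X"), e["start"], e["size"])
```

To look at a real disk, the bytes would instead come from a file saved with dd; the CHS fields are skipped here because only the id, start and size matter for this discussion.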

On Fri, Jun 08, 2007 at 09:08:21PM -0500, Emilio Perea wrote:
 On Thu, Jun 07, 2007 at 04:58:18PM -0500, Emilio Perea wrote:
  On Thu, Jun 07, 2007 at 07:50:24PM +0200, Otto Moerbeek wrote:
   I have been thinking a bit more about the problem, and it is very likely the
   following scenario happened:
   
   1. Kernel upgrade by source.
   
   2. Reboot
   
   3. Kernel reads old disklabel format and converts it in-memory to the
   new v1 format. 
   
   4. Run a newfs using the old executable that does not know about the
   new disklabel format. newfs writes the block and fragment size info
   the old way, on a spot that is used in v1 labels to store the high 16
   bits of the offset and size of a partition. The label is written with
   version = 1, since the in-memory copy is v1. 
   
   5. Reboot, the kernel now sees a v1 disklabel with very high offset
   and/or size, the new consistency code (which is now disabled) kicks in
   and marks the partition as unused. 
   
   So the lesson here is: keep userland and kernel in sync, or use a
   snapshot to upgrade. 
  
  I believe that's exactly what happened the first time.  The catch is
  that kernel and userland were being built from the same cvs update, and
  I thought I was keeping them in sync.  In this case it would probably
  have been better to skip the reboot between building the kernel and the
  userland.
 
 It might have been better to start a whole new thread, but it seemed
 logical to believe that the problems might be related.  Using recent
 snapshots, last night's insecurity output showed another disklabel
 change: 
 
 ==
 sd1 diffs (-OLD  +NEW)
 ==
 --- /var/backups/disklabel.sd1.current  Fri Apr 20 01:31:19 2007
 +++ /var/backups/disklabel.sd1  Fri Jun  8 01:31:55 2007
 @@ -1,4 +1,4 @@
 -# Inside MBR partition 0: type A6 start 63 size 71681967
 +disklabel: warning, DOS partition table with no valid OpenBSD partition
  # /dev/rsd1c:
  type: SCSI
  disk: da0s1
 *--*
 
 The full output of disklabel and dmesg follow, but as I was getting
 ready to send it, I remembered that this same disk had problems with the
 disklabel changes last October.  For some reason it was shown as having
 a FreeBSD disklabel.  Most of the correspondence regarding it was off-list,
 but involved several developers and ended with Ken Westerback suggesting
 some tests before setting it to OpenBSD.
 
 This was fdisk then:
 
 Disk: sd1   geometry: 4462/255/63 [71682030 Sectors]
 Offset: 0   Signature: 0xAA55
          Starting       Ending       LBA Info:
  #: id    C   H  S -    C   H  S [       start:      size   ]
 
 *0: A6    0   1  1 - 4461 254 63 [          63:    71681967 ] OpenBSD
  1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
  2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
  3: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 
 This is now:
 
 Disk: sd1 geometry: 4462/255/63 [71687370 Sectors]
 Offset: 0 Signature: 0xAA55
          Starting       Ending       LBA Info:
  #: id    C   H  S -    C   H  S [       start:      size   ]
 
  0: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
  1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
  2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 *3: A5    0   0  1 -    3  28 41 [           0:           5 ] FreeBSD
 *--*
 
 It is currently working fine.  Should I just change the partition ID to
 A6, or is there something else I should try first?
 
 *--*
 disklabel: warning, DOS partition table with no valid OpenBSD partition
 # /dev/rsd1c:

Re: Invalid partition table (was /usr/obj partition AWOL)

2007-06-08 Thread Theo de Raadt
   c:  71681967        63  unused      0     0      # Cyl     0*-  4461
   d:   2104452        63  4.2BSD   2048 16384  132 # Cyl     0*-   130

Ah -- your 'c' partition does not start at 0.

It's an old FreeBSD partition on your disk.  That should not work; it
is bunk.  We are removing the code from the kernel that allows it to
work, because it requires extra stupid checks all over the place to
support an old 386BSD stupidity.

I hope that our new disklabel command, upon re-writing that label, will
repair that.

Todd?  That's the way to handle this, right?
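
One way to read the label quoted above: its 'c' entry has the same start and size as the MBR partition named in the earlier disklabel header ("start 63 size 71681967") instead of starting at sector 0. A minimal sketch in Python makes the comparison explicit. The helper name describe_c and its wording are hypothetical, not anything disklabel itself prints.

```python
def describe_c(c_size, c_offset, mbr_start, mbr_size):
    """Classify where the 'c' partition starts (all values in sectors)."""
    if c_offset == 0:
        return "c starts at sector 0"
    if (c_offset, c_size) == (mbr_start, mbr_size):
        return "c mirrors the MBR partition instead of starting at sector 0"
    return "c starts at an unexpected offset"


# Values from the sd1 disklabel and fdisk output in this thread.
print(describe_c(71681967, 63, 63, 71681967))

# The MBR partition ends exactly on the 4462/255/63 geometry boundary.
print(63 + 71681967 == 4462 * 255 * 63)
```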



Re: Invalid partition table (was /usr/obj partition AWOL)

2007-06-08 Thread Jimmy Mitchener

On 6/8/07, Theo de Raadt wrote:

   c:  71681967        63  unused      0     0      # Cyl     0*-  4461
   d:   2104452        63  4.2BSD   2048 16384  132 # Cyl     0*-   130

Ah -- your 'c' partition does not start at 0.

It's an old FreeBSD partition on your disk.  That should not work; it
is bunk.  We are removing the code from the kernel that allows it to
work, because it requires extra stupid checks all over the place to
support an old 386BSD stupidity.


It appears I have the very same issue, though with a much larger
offset. I created an OpenBSD partition on an existing partition table
towards the end of the drive.

$ sudo fdisk wd0
Disk: wd0   geometry: 11978/255/63 [192426570 Sectors]
Offset: 0   Signature: 0xAA55
         Starting         Ending         LBA Info:
 #: id      C   H  S -      C   H  S [       start:        size ]

 0: E8  15356  77  8 - 229721 118  4 [   246698998:  3443776305 ] Unknown ID
 1: 01      0   0  1 - 267349  89  4 [           0:           0 ] DOS FAT-12
 2: 00      0   0  0 -      0   0  0 [           0:           0 ] unused
 3: 3F      0   0  1 - 267349  89  4 [           0:           0 ] Unknown ID
$ sudo disklabel wd0
# /dev/rwd0c:
type: ESDI
disk: ad0s3
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 11978
total sectors: 192426570
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

8 partitions:
#        size    offset  fstype [fsize bsize  cpg]
 a:    208845     17395  4.2BSD   2048 16384   13 # Cyl  9683 -  9695
 b:   4192965 155766240    swap                   # Cyl  9696 -  9956
 c:  36869175     17395  unused      0     0      # Cyl  9683 - 11977
 d:    401625 159959205  4.2BSD   2048 16384   25 # Cyl  9957 -  9981
 e:  20964825 160360830  4.2BSD   2048 16384  328 # Cyl  9982 - 11286
 f:  11100915 181325655  4.2BSD   2048 16384  328 # Cyl 11287 - 11977
disklabel: warning, unused partition i: size 1413615339 offset -2147417768
disklabel: warning, unused partition j: size -196918 offset 402701520
disklabel: warning, unused partition k: size 503365533 offset 1463353529
disklabel: warning, unused partition l: size -1407327343 offset -1382830702
disklabel: warning, unused partition m: size -2013104760 offset -1065155243
disklabel: warning, unused partition n: size 402998726 offset 268977606
disklabel: warning, unused partition o: size -400023365 offset 17760440
disklabel: warning, unused partition p: size 1086332943 offset -356507121
$


Jimmy.



Re: Invalid partition table (was /usr/obj partition AWOL)

2007-06-08 Thread Emilio Perea
On Fri, Jun 08, 2007 at 10:41:40PM -0400, Kenneth R Westerback wrote:
 This is very odd on several fronts. First, someone has obviously
 been writing on the MBR for no good reason. I just tested an fdisk
 compiled today and noticed no oddities on my i386.
 
 Second, the fact that you find a disklabel. Since we no longer store
 or look for disklabels in FreeBSD partitions it is being
 read from sector 1 if I recall the code correctly. But it should not
 have been writing the disklabel there when there was an OpenBSD
 partition to store it in.
 
 Do you know if this is exactly the same disklabel you were using
 before? Have you changed anything in the disklabel recently that
 would identify this as an artifact that just happened to be lying in
 sector 1 for a while?

Other than reducing the size of the last partition a couple of months
ago, there has been no (intentional) change to that disklabel since:

 On Wed, Oct 11, 2006 at 08:09:08AM -0700, K WESTERBACK wrote:
  Darn. A perfectly good theory shot to hell. :-).
  
  It would seem that you have a 'valid' disklabel at
  sector 1 of that disk.
  
  First, if you could save the first two sectors of the disk
  with
  
  dd if=/dev/rsd1c of=SaveMySectors bs=512 count=2
  
  and send me that file, and do two experiments, I would
  appreciate it.
 
  If you can run fdisk against the disk and change the partition
  type to 'A6' (OpenBSD) the correct disklabel should be read
  in and you should get the 'old' info back again.
 
  Second, if you are the risk taking type, change partition type
  back to 'A5' (FreeBSD) and zero out sector 1 on the disk with
  something like
  
  dd if=/dev/zero of=/dev/rsd1c bs=512 count=1 seek=1
  
  Then see what disklabel says. You should get a simple
  spoofed disklabel with 'c' and 'i' partitions.
 
  Finally, changing the partition type to 'A6' again should give
  you access to the data.

That was the last change I'm aware of.

 Can you copy the MBR and send it to me? There might be a clue as to
 what overwrote it. Then I would do fdisk -i and see what happens.
 This will move the OpenBSD partition to partition 3, but cover the
 entire disk as your original MBR did. Then see what the disklabel,
 which should be read from the OpenBSD partition, says.

I'll send the file attached to the next message, since I assume it would
be stripped from the mailing list.

After running fdisk -i sd1:

# fdisk sd1
Disk: sd1   geometry: 4462/255/63 [71687370 Sectors]
Offset: 0   Signature: 0xAA55
         Starting       Ending       LBA Info:
 #: id    C   H  S -    C   H  S [       start:      size   ]

 0: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 1: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
 2: 00    0   0  0 -    0   0  0 [           0:           0 ] unused
*3: A6    0   1  1 - 4461 254 63 [          63:    71681967 ] OpenBSD

It's back as an OpenBSD disklabel, but the c partition still starts at
63 rather than 0:

# disklabel sd1
# Inside MBR partition 3: type A6 start 63 size 71681967
# /dev/rsd1c:
type: SCSI
disk: da0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 4462
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

15 partitions:
#        size    offset  fstype [fsize bsize  cpg]
  c:  71681967        63  unused      0     0      # Cyl     0*-  4461
  d:   2104452        63  4.2BSD   2048 16384  132 # Cyl     0*-   130
  e:   8385930   2104515  4.2BSD   2048 16384  328 # Cyl   131 -   652
  f:  23294250  48387780  4.2BSD   2048 16384  328 # Cyl  3012 -  4461
  h:   4112640  15936480  4.2BSD   2048 16384  256 # Cyl   992 -  1247
  i:   2104515  40933620  4.2BSD   2048 16384    1 # Cyl  2548 -  2678
  j:  18828180  20049120  4.2BSD   2048 16384  328 # Cyl  1248 -  2419
  k:   5349645  43038135  4.2BSD   2048 16384   16 # Cyl  2679 -  3011
  l:   2056320  38877300  4.2BSD   2048 16384  128 # Cyl  2420 -  2547
  m:   2104515  10490445  4.2BSD   2048 16384  132 # Cyl   653 -   783
  n:   2056320  12594960  4.2BSD   2048 16384    1 # Cyl   784 -   911

Emilio



Re: /usr/obj partition AWOL

2007-06-07 Thread Otto Moerbeek
On Wed, 6 Jun 2007, Otto Moerbeek wrote:

 On Wed, 6 Jun 2007, Markus Lude wrote:
 
  On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
   
   On Tue, 5 Jun 2007, Markus Lude wrote:
   
On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
 I follow -current on an i386 at work and an amd64 at home, and rarely
 run into any problem which is not self-inflicted.  So when I had a weird
 experience this weekend, I assumed it was my fault.
 
 What happened was that after the usual sequence of [build kernel;
 reboot; build userland; reboot] the system complained that it could not
 fsck wd1j and dropped into single-user mode.  wd1j is mounted on
 /usr/obj, and I thought that something in the last build had messed it
 up, so I ran newfs wd1j and got 
 
  newfs: /dev/rwd1j: Device not configured
 
 disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
 partition, ran newfs, and everything seemed fine.  This afternoon I
 installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
 on the work pc, and after reboot it was missing the /usr/obj partition
 (sd0g in this case).
 
 Everything seems to be working fine on both computers, but I didn't
 expect the partitions to disappear.  Did nobody else run into this
 problem?  Or did everybody else who saw it think it was too obvious
 to mention it to the mailing list?

I had a similar problem on sparc64 with a snapshot from jun 2. The
system was unable to fsck some partitions and dropped to single user
mode.
Here the problems were with the /usr, /var, /tmp and /home partitions.
Some further (and larger partitions) weren't affected.

I installed an older snapshot.

Any suggestions how to get this fixed or what to test/try?
   
   There were some validation checks added to partitions. If a bad
   partition is found, it will be marked unused. The checks were a
   little too strict for some cases. A fix for that went in yesterday, so
   try a new snap.
  
  Thanks for your info.
  
  After rebuilding kernel and userland the problem still exists, but now
  the affected partitions are /var, /home and /data. Hmm. Unmounting /data
  and doing a manual fsck -f runs without problems.
  
   If the problem persists, please report with full disklabel output.
  
  $ cat /etc/fstab
  /dev/wd0a / ffs rw 1 1
  /dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
  /dev/wd0e /usr ffs rw,nodev 1 2
  /dev/wd0f /var ffs rw,nodev,nosuid 1 2
  /dev/wd0g /home ffs rw,nodev,nosuid 1 2
  /dev/wd0h /data ffs rw,nodev,nosuid 1 2
  /dev/wd1d /backup ffs rw,nodev,nosuid 1 2
  
  with a current kernel:
  
  $ sudo disklabel wd0
  # /dev/rwd0c:
  type: ESDI
  disk: ESDI/IDE disk
  label: ST3120213A  
  flags:
  bytes/sector: 512
  sectors/track: 63
  tracks/cylinder: 16
  sectors/cylinder: 1008
  cylinders: 16383
  total sectors: 16514064
 ^^^
 
 1008 * 16383 = 16514064
 
  rpm: 3600
  interleave: 1
  trackskew: 0
  cylinderskew: 0
  headswitch: 0   # microseconds
  track-to-track seek: 0  # microseconds
  drivedata: 0 
  
  16 partitions:
  #        size    offset  fstype [fsize bsize  cpg]
    a:   1024128         0  4.2BSD   2048 16384   16 # Cyl     0 -   1015
    b:   3072384   1024128    swap                   # Cyl  1016 -   4063
    c: 234441648         0  unused      0     0      # Cyl     0 - 232580
 ^
 
 Your disk size and c partition size do not match. Can you send a
 dmesg, to see what the actual size of your disk is? This is really
 needed to see what is going on.
 
 Did you at any time edit the disk size by hand?
 
    d:   2048256   4096512  4.2BSD   2048 16384   16 # Cyl  4064 -   6095
    e:  20479536   6144768  4.2BSD   2048 16384   16 # Cyl  6096 -  26412
  disklabel: partition c: partition extends past end of unit
  disklabel: partition e: partition extends past end of unit
  
  older kernel:
  $ sudo disklabel wd0
  [...]
  16 partitions:
  #        size    offset  fstype [fsize bsize  cpg]
    a:   1024128         0  4.2BSD      0     0   16 # Cyl     0 -   1015
    b:   3072384   1024128    swap                   # Cyl  1016 -   4063
    c: 234441648         0  unused      0     0      # Cyl     0 - 232580
    d:   2048256   4096512  4.2BSD      0     0   16 # Cyl  4064 -   6095
    e:  20479536   6144768  4.2BSD      0     0   16 # Cyl  6096 -  26412
    f:   4095504  26624304  4.2BSD      0     0   16 # Cyl 26413 -  30475
    g:  20479536  30719808  4.2BSD      0     0   16 # Cyl 30476 -  50792
    h: 183242304  51199344  4.2BSD      0     0   16 # Cyl 50793 - 232580
  disklabel: partition c: partition extends past end of unit
  disklabel: partition e: partition extends past 

Re: /usr/obj partition AWOL

2007-06-07 Thread Otto Moerbeek
On Tue, 5 Jun 2007, Emilio Perea wrote:

 On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
  There were some validation checks added to partitions. If a bad
  partition is found, it will be marked unused. The checks were a
  little too strict for some cases. A fix for that went in yesterday, so
  try a new snap.
  
  If the problem persists, please report with full disklabel output.
 
 The problem showed up on the latest snapshot as of now, which may well
 have been built before the fix you mention was incorporated.  The home
 PC running -current has not had a problem since Saturday afternoon.
 
 The daily insecurity reports show four changes in this partition during
 the last couple of months.  (Note that since this is on /usr/obj on a PC
 running -current, newfs is run just about every day.)  It seems funny
 that on May 29 the fsize and bsize were changed to 0, but nothing weird
 happened until the day after they were changed to what appeared to be
 more reasonable numbers.
 
 Anyhow, in case the information is useful, the insecurity messages and
 current disklabel follow:
 
 ==
 sd0 diffs (-OLD  +NEW)
 ==
 --- /var/backups/disklabel.sd0.current  Fri Apr 21 01:31:35 2006
 +++ /var/backups/disklabel.sd0  Tue Apr 17 01:31:10 2007
 @@ -26,4 +26,4 @@
    d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
    e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
    f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
 -  g:   4139682  13628208  4.2BSD   2048 16384  480 # Cyl  5357 -  6984*
 +  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*

The cpg change is due to making newfs cylinder unaware.
 
 ==
 sd0 diffs (-OLD  +NEW)
 ==
 --- /var/backups/disklabel.sd0.current  Tue Apr 17 01:31:10 2007
 +++ /var/backups/disklabel.sd0  Wed May 30 01:32:08 2007
 @@ -26,4 +26,4 @@
    d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
    e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
    f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
 -  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*
 +  g:   4139682  13628208  4.2BSD      0     0    1 # Cyl  5357 -  6984*

Here you are running with a new kernel, but userland is still old.
Hence the 0 fsize and bsize.

 ==
 sd0 diffs (-OLD  +NEW)
 ==
 --- /var/backups/disklabel.sd0.current  Wed May 30 01:32:08 2007
 +++ /var/backups/disklabel.sd0  Fri Jun  1 01:32:15 2007
 @@ -26,4 +26,4 @@
    d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
    e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
    f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
 -  g:   4139682  13628208  4.2BSD      0     0    1 # Cyl  5357 -  6984*
 +  g:   4139682  13628208  4.2BSD   2048  8192    1 # Cyl  5357 -  6984*

newfs is run, but it is still using the old struct partition format.
Hence the wrong fsize and bsize.

 
 ==
 sd0 diffs (-OLD  +NEW)
 ==
 --- /var/backups/disklabel.sd0.current  Fri Jun  1 01:32:15 2007
 +++ /var/backups/disklabel.sd0  Tue Jun  5 01:32:10 2007
 @@ -26,4 +26,4 @@
    d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
    e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
    f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
 -  g:   4139682  13628208  4.2BSD   2048  8192    1 # Cyl  5357 -  6984*
 +  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*

And here things are back in shape.

 
 
 # Inside MBR partition 3: type A6 start 63 size 17767827
 # /dev/rsd0c:
 type: SCSI
 disk: SCSI disk
 label: ST39102LW
 flags:
 bytes/sector: 512
 sectors/track: 212
 tracks/cylinder: 12
 sectors/cylinder: 2544
 cylinders: 6962
 total sectors: 17783240
 rpm: 3600
 interleave: 1
 trackskew: 0
 cylinderskew: 0
 headswitch: 0   # microseconds
 track-to-track seek: 0  # microseconds
 drivedata: 0
 
 16 partitions:
 #        size    offset  fstype [fsize bsize  cpg]
   a:   2096193        63  4.2BSD   2048 16384  480 # Cyl     0*-   823
   b:   1048128   2096256    swap                   # Cyl   824 -  1235
   c:  17783240         0  unused      0     0      # Cyl     0 -  6990*
   d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
   e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
   f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
   g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*
 

We have seen some reports now on disappearing partitions. On sparc and

Re: /usr/obj partition AWOL

2007-06-07 Thread Otto Moerbeek
On Thu, 7 Jun 2007, Otto Moerbeek wrote:

 We have seen some reports now on disappearing partitions. On sparc and
 sparc64, there were actual bugs that have been fixed now. 
 
 For all platforms, the suspect new consistency checking code has now been
 disabled until we find out what is causing the mishap, and (very)
 recent kernels should be back to normal.
 
 Please report with disklabel info and dmesg if things are still going
 wrong. Preferably with fdisk (if applicable) and old disklabel
 information as well.

I have been thinking a bit more about the problem, and it is very likely the
following scenario happened:

1. Kernel upgrade by source.

2. Reboot

3. Kernel reads old disklabel format and converts it in-memory to the
new v1 format. 

4. Run a newfs using the old executable that does not know about the
new disklabel format. newfs writes the block and fragment size info
the old way, on a spot that is used in v1 labels to store the high 16
bits of the offset and size of a partition. The label is written with
version = 1, since the in-memory copy is v1. 

5. Reboot, the kernel now sees a v1 disklabel with very high offset
and/or size, the new consistency code (which is now disabled) kicks in
and marks the partition as unused. 

So the lesson here is: keep userland and kernel in sync, or use a
snapshot to upgrade. 

-Otto
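
Step 4 of the scenario above can be illustrated with a minimal sketch in Python. The two 16-byte layouts below are assumptions made for illustration (the field names and order are recalled from the old and v1 struct partition and are not confirmed anywhere in this thread), as are the little-endian byte order and the fstype value 7. The helper names write_old and read_v1 are hypothetical. The sketch writes an entry the old way, with a 32-bit fragment size, and reads it back as a v1 entry, where the same bytes hold the high 16 bits of offset and size.

```python
import struct

# Assumed old layout: size, offset, fsize (32 bits), fstype, frag, cpg.
OLD_FMT = "<IIIBBH"
# Assumed v1 layout: size, offset, offset-high, size-high (16 bits each),
# fstype, fragblock, cpg.
V1_FMT = "<IIHHBBH"


def write_old(size, offset, fsize, fstype, frag, cpg):
    """Pack a partition entry the way an old newfs would fill it in."""
    return struct.pack(OLD_FMT, size, offset, fsize, fstype, frag, cpg)


def read_v1(raw):
    """Interpret the same 16 bytes as a v1 entry with 48-bit size and offset."""
    size_lo, offset_lo, offset_hi, size_hi, _fstype, _fragblock, _cpg = \
        struct.unpack(V1_FMT, raw)
    return {"size": (size_hi << 32) | size_lo,
            "offset": (offset_hi << 32) | offset_lo}


# sd0g from the disklabel in this thread: size 4139682, offset 13628208,
# fsize 2048; fstype 7 and frag 8 are assumed values.
raw = write_old(4139682, 13628208, 2048, 7, 8, 1)
entry = read_v1(raw)
total_sectors = 17783240

print(entry)
print(entry["offset"] > total_sectors)
```

Under these assumptions the low half of the old fsize field lands in the high bits of the offset, so the partition appears to start far beyond the end of the disk, which is the kind of entry the consistency check marked as unused.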



Re: /usr/obj partition AWOL

2007-06-07 Thread Emilio Perea
On Thu, Jun 07, 2007 at 07:50:24PM +0200, Otto Moerbeek wrote:
 I have been thinking a bit more about the problem, and it is very likely the
 following scenario happened:
 
 1. Kernel upgrade by source.
 
 2. Reboot
 
 3. Kernel reads old disklabel format and converts it in-memory to the
 new v1 format. 
 
 4. Run a newfs using the old executable that does not know about the
 new disklabel format. newfs writes the block and fragment size info
 the old way, on a spot that is used in v1 labels to store the high 16
 bits of the offset and size of a partition. The label is written with
 version = 1, since the in-memory copy is v1. 
 
 5. Reboot, the kernel now sees a v1 disklabel with very high offset
 and/or size, the new consistency code (which is now disabled) kicks in
 and marks the partition as unused. 
 
 So the lesson here is: keep userland and kernel in sync, or use a
 snapshot to upgrade. 

I believe that's exactly what happened the first time.  The catch is
that kernel and userland were being built from the same cvs update, and
I thought I was keeping them in sync.  In this case it would probably
have been better to skip the reboot between building the kernel and the
userland.

I'll take newfs out of my build script (back to rm -rf /usr/obj/*) and
try to remember to use newfs before rebooting with a new kernel if I
want to avoid the wait.

Thanks again!

Emilio



Re: /usr/obj partition AWOL

2007-06-06 Thread Markus Lude
On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
 
 On Tue, 5 Jun 2007, Markus Lude wrote:
 
  On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
   I follow -current on an i386 at work and an amd64 at home, and rarely
   run into any problem which is not self-inflicted.  So when I had a weird
   experience this weekend, I assumed it was my fault.
   
   What happened was that after the usual sequence of [build kernel;
   reboot; build userland; reboot] the system complained that it could not
   fsck wd1j and dropped into single-user mode.  wd1j is mounted on
   /usr/obj, and I thought that something in the last build had messed it
   up, so I ran newfs wd1j and got 
   
newfs: /dev/rwd1j: Device not configured
   
   disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
   partition, ran newfs, and everything seemed fine.  This afternoon I
   installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
   on the work pc, and after reboot it was missing the /usr/obj partition
   (sd0g in this case).
   
   Everything seems to be working fine on both computers, but I didn't
   expect the partitions to disappear.  Did nobody else run into this
   problem?  Or did everybody else who saw it think it was too obvious
   to mention it to the mailing list?
  
  I had a similar problem on sparc64 with a snapshot from Jun 2. The
  system was unable to fsck some partitions and dropped to single user
  mode.
  Here the problems were with the /usr, /var, /tmp and /home partitions.
  Some further (and larger) partitions weren't affected.
  
  I installed an older snapshot.
  
  Any suggestions how to get this fixed or what to test/try?
 
 There were some validation checks added to partitions. If a bad
 partition is found, it will be marked unused. The checks were a
 little too strict for some cases. A fix for that went in yesterday, so
 try a new snap.

Thanks for your info.

After rebuilding kernel and userland the problem still exists, but now
the affected partitions are /var, /home and /data. Hmm. Unmounting /data
and doing a manual fsck -f runs without problems.

 If the problem persists, please report with full disklabel output.

$ cat /etc/fstab
/dev/wd0a / ffs rw 1 1
/dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
/dev/wd0e /usr ffs rw,nodev 1 2
/dev/wd0f /var ffs rw,nodev,nosuid 1 2
/dev/wd0g /home ffs rw,nodev,nosuid 1 2
/dev/wd0h /data ffs rw,nodev,nosuid 1 2
/dev/wd1d /backup ffs rw,nodev,nosuid 1 2

with a current kernel:

$ sudo disklabel wd0
# /dev/rwd0c:
type: ESDI
disk: ESDI/IDE disk
label: ST3120213A  
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 16383
total sectors: 16514064
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

16 partitions:
#         size    offset  fstype [fsize bsize  cpg]
  a:   1024128         0  4.2BSD   2048 16384   16 # Cyl     0 -  1015
  b:   3072384   1024128    swap                    # Cyl  1016 -  4063
  c: 234441648         0  unused      0     0      # Cyl     0 -232580
  d:   2048256   4096512  4.2BSD   2048 16384   16 # Cyl  4064 -  6095
  e:  20479536   6144768  4.2BSD   2048 16384   16 # Cyl  6096 - 26412
disklabel: partition c: partition extends past end of unit
disklabel: partition e: partition extends past end of unit

older kernel:
$ sudo disklabel wd0
[...]
16 partitions:
#         size    offset  fstype [fsize bsize  cpg]
  a:   1024128         0  4.2BSD      0     0   16 # Cyl     0 -  1015
  b:   3072384   1024128    swap                    # Cyl  1016 -  4063
  c: 234441648         0  unused      0     0      # Cyl     0 -232580
  d:   2048256   4096512  4.2BSD      0     0   16 # Cyl  4064 -  6095
  e:  20479536   6144768  4.2BSD      0     0   16 # Cyl  6096 - 26412
  f:   4095504  26624304  4.2BSD      0     0   16 # Cyl 26413 - 30475
  g:  20479536  30719808  4.2BSD      0     0   16 # Cyl 30476 - 50792
  h: 183242304  51199344  4.2BSD      0     0   16 # Cyl 50793 -232580
disklabel: partition c: partition extends past end of unit
disklabel: partition e: partition extends past end of unit
disklabel: partition f: offset past end of unit
disklabel: partition f: partition extends past end of unit
disklabel: partition g: offset past end of unit
disklabel: partition g: partition extends past end of unit
disklabel: partition h: offset past end of unit
disklabel: partition h: partition extends past end of unit
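
The warnings above follow from comparing each partition's offset and size against the label's "total sectors" value. The sketch below reproduces that comparison with the numbers from the older-kernel label. It is not disklabel's actual code; `check_partitions` and the data layout are hypothetical names chosen for this example.

```python
# Numbers copied from the label above: name -> (size, offset), in sectors.
TOTAL_SECTORS = 16514064
PARTITIONS = {
    "a": (1024128, 0),
    "b": (3072384, 1024128),
    "c": (234441648, 0),
    "d": (2048256, 4096512),
    "e": (20479536, 6144768),
    "f": (4095504, 26624304),
    "g": (20479536, 30719808),
    "h": (183242304, 51199344),
}

def check_partitions(partitions, total_sectors):
    """Return warning strings for partitions that do not fit on the unit."""
    warnings = []
    for name in sorted(partitions):
        size, offset = partitions[name]
        if offset > total_sectors:
            warnings.append(f"partition {name}: offset past end of unit")
        if offset + size > total_sectors:
            warnings.append(f"partition {name}: partition extends past end of unit")
    return warnings

if __name__ == "__main__":
    for line in check_partitions(PARTITIONS, TOTAL_SECTORS):
        print("disklabel:", line)
```

The label's total of 16514064 sectors equals 16383 * 16 * 63, roughly 8.5 GB at 512 bytes per sector, while partition c spans 234441648 sectors, roughly 120 GB. Every partition that reaches beyond the smaller figure is flagged.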

Any hints on how to fix this besides repartitioning and reinstalling?

Regards,
Markus



Re: /usr/obj partition AWOL

2007-06-06 Thread Otto Moerbeek
On Wed, 6 Jun 2007, Markus Lude wrote:

 On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
  
  On Tue, 5 Jun 2007, Markus Lude wrote:
  
   On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
I follow -current on an i386 at work and an amd64 at home, and rarely
run into any problem which is not self-inflicted.  So when I had a weird
experience this weekend, I assumed it was my fault.

What happened was that after the usual sequence of [build kernel;
reboot; build userland; reboot] the system complained that it could not
fsck wd1j and dropped into single-user mode.  wd1j is mounted on
/usr/obj, and I thought that something in the last build had messed it
up, so I ran newfs wd1j and got 

 newfs: /dev/rwd1j: Device not configured

disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
partition, ran newfs, and everything seemed fine.  This afternoon I
installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
on the work pc, and after reboot it was missing the /usr/obj partition
(sd0g in this case).

Everything seems to be working fine on both computers, but I didn't
expect the partitions to disappear.  Did nobody else run into this
problem?  Or did everybody else who saw it think it was too obvious
to mention it to the mailing list?
   
   I had a similar problem on sparc64 with a snapshot from Jun 2. The
   system was unable to fsck some partitions and dropped to single user
   mode.
   Here the problems were with the /usr, /var, /tmp and /home partitions.
   Some further (and larger) partitions weren't affected.
   
   I installed an older snapshot.
   
   Any suggestions how to get this fixed or what to test/try?
  
  There were some validation checks added to partitions. If a bad
  partition is found, it will be marked unused. The checks were a
  little too strict for some cases. A fix for that went in yesterday, so
  try a new snap.
 
 Thanks for your info.
 
 After rebuilding kernel and userland the problem still exists, but now
 the affected partitions are /var, /home and /data. Hmm. Unmounting /data
 and doing a manual fsck -f runs without problems.
 
  If the problem persists, please report with full disklabel output.
 
 $ cat /etc/fstab
 /dev/wd0a / ffs rw 1 1
 /dev/wd0d /tmp ffs rw,nodev,nosuid 1 2
 /dev/wd0e /usr ffs rw,nodev 1 2
 /dev/wd0f /var ffs rw,nodev,nosuid 1 2
 /dev/wd0g /home ffs rw,nodev,nosuid 1 2
 /dev/wd0h /data ffs rw,nodev,nosuid 1 2
 /dev/wd1d /backup ffs rw,nodev,nosuid 1 2
 
 with a current kernel:
 
 $ sudo disklabel wd0
 # /dev/rwd0c:
 type: ESDI
 disk: ESDI/IDE disk
 label: ST3120213A  
 flags:
 bytes/sector: 512
 sectors/track: 63
 tracks/cylinder: 16
 sectors/cylinder: 1008
 cylinders: 16383
 total sectors: 16514064
^^^

1008 * 16383 = 16514064

 rpm: 3600
 interleave: 1
 trackskew: 0
 cylinderskew: 0
 headswitch: 0   # microseconds
 track-to-track seek: 0  # microseconds
 drivedata: 0 
 
 16 partitions:
 #         size    offset  fstype [fsize bsize  cpg]
   a:   1024128         0  4.2BSD   2048 16384   16 # Cyl     0 -  1015
   b:   3072384   1024128    swap                    # Cyl  1016 -  4063
   c: 234441648         0  unused      0     0      # Cyl     0 -232580
^

Your disk size and c partition size do not match. Can you send a
dmesg, to see what the actual size of your disk is? This is really
needed to see what is going on.

Did you at any time edit the disk size by hand?

   d:   2048256   4096512  4.2BSD   2048 16384   16 # Cyl  4064 -  6095
   e:  20479536   6144768  4.2BSD   2048 16384   16 # Cyl  6096 - 26412
 disklabel: partition c: partition extends past end of unit
 disklabel: partition e: partition extends past end of unit
 
 older kernel:
 $ sudo disklabel wd0
 [...]
 16 partitions:
 #         size    offset  fstype [fsize bsize  cpg]
   a:   1024128         0  4.2BSD      0     0   16 # Cyl     0 -  1015
   b:   3072384   1024128    swap                    # Cyl  1016 -  4063
   c: 234441648         0  unused      0     0      # Cyl     0 -232580
   d:   2048256   4096512  4.2BSD      0     0   16 # Cyl  4064 -  6095
   e:  20479536   6144768  4.2BSD      0     0   16 # Cyl  6096 - 26412
   f:   4095504  26624304  4.2BSD      0     0   16 # Cyl 26413 - 30475
   g:  20479536  30719808  4.2BSD      0     0   16 # Cyl 30476 - 50792
   h: 183242304  51199344  4.2BSD      0     0   16 # Cyl 50793 -232580
 disklabel: partition c: partition extends past end of unit
 disklabel: partition e: partition extends past end of unit
 disklabel: partition f: offset past end of unit
 disklabel: partition f: partition extends past end of unit
 disklabel: partition g: offset past end of unit
 disklabel: partition g: 

Re: /usr/obj partition AWOL

2007-06-05 Thread Otto Moerbeek
On Tue, 5 Jun 2007, Markus Lude wrote:

 On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
  I follow -current on an i386 at work and an amd64 at home, and rarely
  run into any problem which is not self-inflicted.  So when I had a weird
  experience this weekend, I assumed it was my fault.
  
  What happened was that after the usual sequence of [build kernel;
  reboot; build userland; reboot] the system complained that it could not
  fsck wd1j and dropped into single-user mode.  wd1j is mounted on
  /usr/obj, and I thought that something in the last build had messed it
  up, so I ran newfs wd1j and got 
  
   newfs: /dev/rwd1j: Device not configured
  
  disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
  partition, ran newfs, and everything seemed fine.  This afternoon I
  installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
  on the work pc, and after reboot it was missing the /usr/obj partition
  (sd0g in this case).
  
  Everything seems to be working fine on both computers, but I didn't
  expect the partitions to disappear.  Did nobody else run into this
  problem?  Or did everybody else who saw it think it was too obvious
  to mention it to the mailing list?
 
 I had a similar problem on sparc64 with a snapshot from Jun 2. The
 system was unable to fsck some partitions and dropped to single user
 mode.
 Here the problems were with the /usr, /var, /tmp and /home partitions.
 Some further (and larger) partitions weren't affected.
 
 I installed an older snapshot.
 
 Any suggestions how to get this fixed or what to test/try?

There were some validation checks added to partitions. If a bad
partition is found, it will be marked unused. The checks were a
little too strict for some cases. A fix for that went in yesterday, so
try a new snap.

If the problem persists, please report with full disklabel output.

-Otto



Re: /usr/obj partition AWOL

2007-06-05 Thread Emilio Perea
On Tue, Jun 05, 2007 at 07:51:48AM +0200, Otto Moerbeek wrote:
 There were some validation checks added to partitions. If a bad
 partition is found, it will be marked unused. The checks were a
 little too strict for some cases. A fix for that went in yesterday, so
 try a new snap.
 
 If the problem persists, please report with full disklabel output.

The problem showed up on the latest snapshot as of now, which may well
have been built before the fix you mention was incorporated.  The home
PC running -current has not had a problem since Saturday afternoon.

The daily insecurity reports show four changes in this partition during
the last couple of months.  (Note that since this is on /usr/obj on a PC
running -current, newfs is run just about every day.)  It seems funny
that on May 29 the fsize and bsize were changed to 0, but nothing weird
happened until the day after they were changed to what appeared to be
more reasonable numbers.

Anyhow, in case the information is useful, the insecurity messages and
current disklabel follow:

==
sd0 diffs (-OLD  +NEW)
==
--- /var/backups/disklabel.sd0.current  Fri Apr 21 01:31:35 2006
+++ /var/backups/disklabel.sd0  Tue Apr 17 01:31:10 2007
@@ -26,4 +26,4 @@
   d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647 
   e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059 
   f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356 
-  g:   4139682  13628208  4.2BSD   2048 16384  480 # Cyl  5357 -  6984*
+  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*

==
sd0 diffs (-OLD  +NEW)
==
--- /var/backups/disklabel.sd0.current  Tue Apr 17 01:31:10 2007
+++ /var/backups/disklabel.sd0  Wed May 30 01:32:08 2007
@@ -26,4 +26,4 @@
   d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647 
   e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059 
   f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356 
-  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*
+  g:   4139682  13628208  4.2BSD      0     0    1 # Cyl  5357 -  6984*

==
sd0 diffs (-OLD  +NEW)
==
--- /var/backups/disklabel.sd0.current  Wed May 30 01:32:08 2007
+++ /var/backups/disklabel.sd0  Fri Jun  1 01:32:15 2007
@@ -26,4 +26,4 @@
   d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647 
   e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059 
   f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356 
-  g:   4139682  13628208  4.2BSD      0     0    1 # Cyl  5357 -  6984*
+  g:   4139682  13628208  4.2BSD   2048  8192    1 # Cyl  5357 -  6984*

==
sd0 diffs (-OLD  +NEW)
==
--- /var/backups/disklabel.sd0.current  Fri Jun  1 01:32:15 2007
+++ /var/backups/disklabel.sd0  Tue Jun  5 01:32:10 2007
@@ -26,4 +26,4 @@
   d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647 
   e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059 
   f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356 
-  g:   4139682  13628208  4.2BSD   2048  8192    1 # Cyl  5357 -  6984*
+  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*


# Inside MBR partition 3: type A6 start 63 size 17767827
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ST39102LW
flags:
bytes/sector: 512
sectors/track: 212
tracks/cylinder: 12
sectors/cylinder: 2544
cylinders: 6962
total sectors: 17783240
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

16 partitions:
#         size    offset  fstype [fsize bsize  cpg]
  a:   2096193        63  4.2BSD   2048 16384  480 # Cyl     0*-   823
  b:   1048128   2096256    swap                    # Cyl   824 -  1235
  c:  17783240         0  unused      0     0      # Cyl     0 -  6990*
  d:   1048128   3144384  4.2BSD   2048 16384  416 # Cyl  1236 -  1647
  e:   1048128   4192512  4.2BSD   2048 16384  416 # Cyl  1648 -  2059
  f:   8387568   5240640  4.2BSD   2048 16384  480 # Cyl  2060 -  5356
  g:   4139682  13628208  4.2BSD   2048 16384    1 # Cyl  5357 -  6984*
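
As a sanity check on the label above, the low 32-bit sizes and offsets can be walked in order to see whether the partitions tile the MBR partition without gaps. This is a sketch only: `walk_layout` is a hypothetical helper, and partition a is read here as size 2096193 at offset 63, which is an interpretation of the printed row.

```python
# From the header comment and label above, in sectors.
MBR_START = 63
MBR_SIZE = 17767827
TOTAL_SECTORS = 17783240

# name -> (size, offset); partition c (whole disk) is left out on purpose.
PARTITIONS = {
    "a": (2096193, 63),
    "b": (1048128, 2096256),
    "d": (1048128, 3144384),
    "e": (1048128, 4192512),
    "f": (8387568, 5240640),
    "g": (4139682, 13628208),
}

def walk_layout(partitions, start):
    """Return (gaps, end): gaps lists partitions not starting where the
    previous one ended; end is the first sector after the last partition."""
    expected = start
    gaps = []
    for name, (size, offset) in sorted(partitions.items(), key=lambda kv: kv[1][1]):
        if offset != expected:
            gaps.append((name, expected, offset))
        expected = offset + size
    return gaps, expected

if __name__ == "__main__":
    gaps, end = walk_layout(PARTITIONS, MBR_START)
    print("gaps:", gaps)
    print("end of g:", end, "end of MBR partition:", MBR_START + MBR_SIZE)
```

With these values the partitions are contiguous and g ends exactly where the MBR partition ends, inside the disk. That suggests the 32-bit values themselves are sane, so a rejected partition would have to come from other fields, such as the high bits discussed elsewhere in the thread.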



/usr/obj partition AWOL

2007-06-04 Thread Emilio Perea
I follow -current on an i386 at work and an amd64 at home, and rarely
run into any problem which is not self-inflicted.  So when I had a weird
experience this weekend, I assumed it was my fault.

What happened was that after the usual sequence of [build kernel;
reboot; build userland; reboot] the system complained that it could not
fsck wd1j and dropped into single-user mode.  wd1j is mounted on
/usr/obj, and I thought that something in the last build had messed it
up, so I ran newfs wd1j and got 

 newfs: /dev/rwd1j: Device not configured

disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
partition, ran newfs, and everything seemed fine.  This afternoon I
installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
on the work pc, and after reboot it was missing the /usr/obj partition
(sd0g in this case).

Everything seems to be working fine on both computers, but I didn't
expect the partitions to disappear.  Did nobody else run into this
problem?  Or did everybody else who saw it think it was too obvious
to mention it to the mailing list?

Emilio



Re: /usr/obj partition AWOL

2007-06-04 Thread Markus Lude
On Mon, Jun 04, 2007 at 06:02:59PM -0500, Emilio Perea wrote:
 I follow -current on an i386 at work and an amd64 at home, and rarely
 run into any problem which is not self-inflicted.  So when I had a weird
 experience this weekend, I assumed it was my fault.
 
 What happened was that after the usual sequence of [build kernel;
 reboot; build userland; reboot] the system complained that it could not
 fsck wd1j and dropped into single-user mode.  wd1j is mounted on
 /usr/obj, and I thought that something in the last build had messed it
 up, so I ran newfs wd1j and got 
 
  newfs: /dev/rwd1j: Device not configured
 
 disklabel wd1 showed partitions d-i and k-p, but no j.  I added the
 partition, ran newfs, and everything seemed fine.  This afternoon I
 installed the i386 snapshot downloaded this morning (dated Jun 3 19:19)
 on the work pc, and after reboot it was missing the /usr/obj partition
 (sd0g in this case).
 
 Everything seems to be working fine on both computers, but I didn't
 expect the partitions to disappear.  Did nobody else run into this
 problem?  Or did everybody else who saw it think it was too obvious
 to mention it to the mailing list?

I had a similar problem on sparc64 with a snapshot from Jun 2. The
system was unable to fsck some partitions and dropped to single user
mode.
Here the problems were with the /usr, /var, /tmp and /home partitions.
Some further (and larger) partitions weren't affected.

I installed an older snapshot.

Any suggestions how to get this fixed or what to test/try?

Regards,
Markus