Re: Disk Problem

2015-07-11 Thread Mateusz Lenik
On Fri, Jul 10, 2015 at 05:57:15PM +, Vijay Sankar wrote:
 Quoting Otto Moerbeek o...@drijf.net:
 
 On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote:
 
 My objective for this weekend was to follow the new dpb and build ports
 without using sudo. So I was hoping to upgrade to the latest snapshot on a
 system that I use for tests.
 
 The test system has a 2TB drive and it had two 300GB partitions in it for
 ports and vm; and a 120GB SSD for the OS and used to look as follows:
 
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/sd1a     1005M   55.0M    900M     6%    /
 /dev/sd1k     64.5G   20.9G   40.3G    34%    /home
 /dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
 /dev/sd1f      2.0G    966M    946M    51%    /usr
 /dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
 /dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
 /dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
 /dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
 /dev/sd1e     13.5G   26.5M   12.8G     0%    /var
 /dev/sd0h      298G    176G    107G    62%    /ports
 /dev/sd0f      298G   19.6G    263G     7%    /vm
 
 My /etc/fstab was
 
 4f0cd8b5e7fd8f6a.b none swap sw
 4f0cd8b5e7fd8f6a.a / ffs rw 1 1
 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2
 
 I am not sure what happened -- but when I rebooted the system this morning
 /ports and /vm would not mount; so I commented out the last two lines in
 /etc/fstab and rebooted. After reboot disklabel seems to have changed
 completely and it currently looks like this:
 
 # disklabel sd0
 # /dev/rsd0c:
 type: SCSI
 disk: SCSI disk
 label: ST2000DM001-1CH1
 duid: 
 flags:
 bytes/sector: 512
 sectors/track: 63
 tracks/cylinder: 255
 sectors/cylinder: 16065
 cylinders: 503
 total sectors: 8089950
 boundstart: 0
 boundend: 8089950
 drivedata: 0
 
 16 partitions:
 #                size           offset  fstype [fsize bsize  cpg]
   c:          8089950                0  unused
 
 
 Is there any way to fix the disklabel or is this an error that is impossible to
 recover from? duid used to show up as 4d43e3389228e319 and not
 .
 
 Please let me know if you have any suggestions.
 
 Get your old label from /var/backups and try to restore it with
 disklabel -R.  You don't tell what your platform is, it might be that
 you also need to do fdisk work first to restore the mbr partition
 table.
 
 But of course, it is also interesting to know what happened to your
 disk. But since you do not tell us what you did you are on your own
 here.
 
  -Otto
 
 Thank you very much. I am running an older snapshot OpenBSD 5.7 -current as
 of Mar 19, 2015. I thought of -R with disklabel but since the drive seems to
 show itself as a 3950MB drive instead of a 2TB drive, I was not sure how to
 do this.
 
 The problem truly is I am not sure what I did to cause all this problem!!!
 The sequence of actions was as follows. Since I had not looked at this box
 for a while I was just logging in to look at where I had kept everything. I
 did a cd /ports/packages/amd64/all and got an input error when I tried to
 edit a file. So I did a shutdown -h now; opened the 3.5 and 2.5 hotswap
 drive bays and pulled both drives out and pushed them back in. Powered the
 system on at which point I was dropped into the shell because /vm and /ports
 had errors. So I tried to do a fsck_ffs and that failed. At that point I
 looked at disklabel and noticed that the duid was gone. fdisk sd0 does not
 show anything other than:
 
 # fdisk sd0
 Disk: sd0   geometry: 503/255/63 [8089950 Sectors]
 
 I tried the disklabel -R as you suggested;
 
 # disklabel -R sd0 disklabel.sd0.current
 disklabel: partition a: partition extends past end of unit
 disklabel: partition c: partition extends past end of unit
 disklabel: partition d: offset past end of unit
 disklabel: partition d: partition extends past end of unit
 disklabel: partition e: offset past end of unit
 disklabel: partition e: partition extends past end of unit
 disklabel: partition f: offset past end of unit
 disklabel: partition f: partition extends past end of unit
 disklabel: partition g: offset past end of unit
 disklabel: partition g: partition extends past end of unit
 disklabel: partition h: offset past end of unit
 disklabel: partition h: partition extends past end of unit
 disklabel: partition i: offset past end of unit
 disklabel: partition i: partition extends past end of unit
 
 Also tried
 # fdisk -i sd0
 Do you wish to write new MBR and partition table? [n] y
 Writing 

Re: Disk Problem

2015-07-11 Thread Eric Furman
On Fri, Jul 10, 2015, at 09:15 PM, Vijay Sankar wrote:
 
 Quoting Eric Furman ericfur...@fastmail.net:
 
  On Fri, Jul 10, 2015, at 01:57 PM, Vijay Sankar wrote:
  Quoting Otto Moerbeek o...@drijf.net:
 
   On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote:
  
   My objective for this weekend was to follow the new dpb and build ports
   without using sudo. So I was hoping to upgrade to the latest snapshot on a
   system that I use for tests.
  
   The test system has a 2TB drive and it had two 300GB partitions in it for
   ports and vm; and a 120GB SSD for the OS and used to look as follows:
  
   Filesystem     Size    Used   Avail Capacity  Mounted on
   /dev/sd1a     1005M   55.0M    900M     6%    /
   /dev/sd1k     64.5G   20.9G   40.3G    34%    /home
   /dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
   /dev/sd1f      2.0G    966M    946M    51%    /usr
   /dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
   /dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
   /dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
   /dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
   /dev/sd1e     13.5G   26.5M   12.8G     0%    /var
   /dev/sd0h      298G    176G    107G    62%    /ports
   /dev/sd0f      298G   19.6G    263G     7%    /vm
  
   My /etc/fstab was
  
   4f0cd8b5e7fd8f6a.b none swap sw
   4f0cd8b5e7fd8f6a.a / ffs rw 1 1
   4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
   4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
   4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
   4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
   4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
   4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
   4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
   4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
   4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
   4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2
  
   I am not sure what happened -- but when I rebooted the system this morning
   /ports and /vm would not mount; so I commented out the last two lines in
   /etc/fstab and rebooted. After reboot disklabel seems to have changed
   completely and it currently looks like this:
  
   # disklabel sd0
   # /dev/rsd0c:
   type: SCSI
   disk: SCSI disk
   label: ST2000DM001-1CH1
   duid: 
   flags:
   bytes/sector: 512
   sectors/track: 63
   tracks/cylinder: 255
   sectors/cylinder: 16065
   cylinders: 503
   total sectors: 8089950
   boundstart: 0
   boundend: 8089950
   drivedata: 0
  
   16 partitions:
   #                size           offset  fstype [fsize bsize  cpg]
     c:          8089950                0  unused
  
  
   Is there any way to fix the disklabel or is this an error that is impossible to
   recover from? duid used to show up as 4d43e3389228e319 and not
   .
  
   Please let me know if you have any suggestions.
  
   Get your old label from /var/backups and try to restore it with
   disklabel -R.  You don't tell what your platform is, it might be that
   you also need to do fdisk work first to restore the mbr partition
   table.
  
   But of course, it is also interesting to know what happened to your
   disk. But since you do not tell us what you did you are on your own
   here.
  
-Otto
 
  Thank you very much. I am running an older snapshot OpenBSD 5.7
  -current as of Mar 19, 2015. I thought of -R with disklabel but since
  the drive seems to show itself as a 3950MB drive instead of a 2TB
  drive, I was not sure how to do this.
 
  The problem truly is I am not sure what I did to cause all this
  problem!!! The sequence of actions was as follows. Since I had not
  looked at this box for a while I was just logging in to look at where
  I had kept everything. I did a cd /ports/packages/amd64/all and got an
  input error when I tried to edit a file. So I did a shutdown -h now;
 
  -
  opened the 3.5 and 2.5 hotswap drive bays and pulled both drives out
  and pushed them back in. Powered the system on at which point I was
  ^^
 
  I am very curious to know why you did this.
  What am I missing here?
  -
 
  dropped into the shell because /vm and /ports had errors. So I tried
  to do a fsck_ffs and that failed. At that point I looked at disklabel
  and noticed that the duid was gone. fdisk sd0 does not show anything
  other than:
 
  # fdisk sd0
  Disk: sd0   geometry: 503/255/63 [8089950 Sectors]
 
  I tried the disklabel -R as you suggested;
 
  # disklabel -R sd0 disklabel.sd0.current
  disklabel: partition a: partition extends past end of unit
  disklabel: partition c: partition extends past end of unit
  disklabel: partition d: offset past end of unit
  disklabel: 

Re: Disk Problem

2015-07-10 Thread Vijay Sankar

Quoting Otto Moerbeek o...@drijf.net:


On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote:


My objective for this weekend was to follow the new dpb and build ports
without using sudo. So I was hoping to upgrade to the latest snapshot on a
system that I use for tests.

The test system has a 2TB drive and it had two 300GB partitions in it for
ports and vm; and a 120GB SSD for the OS and used to look as follows:

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1a     1005M   55.0M    900M     6%    /
/dev/sd1k     64.5G   20.9G   40.3G    34%    /home
/dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
/dev/sd1f      2.0G    966M    946M    51%    /usr
/dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
/dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
/dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
/dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
/dev/sd1e     13.5G   26.5M   12.8G     0%    /var
/dev/sd0h      298G    176G    107G    62%    /ports
/dev/sd0f      298G   19.6G    263G     7%    /vm

My /etc/fstab was

4f0cd8b5e7fd8f6a.b none swap sw
4f0cd8b5e7fd8f6a.a / ffs rw 1 1
4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2

I am not sure what happened -- but when I rebooted the system this morning
/ports and /vm would not mount; so I commented out the last two lines in
/etc/fstab and rebooted. After reboot disklabel seems to have changed
completely and it currently looks like this:

# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ST2000DM001-1CH1
duid: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 503
total sectors: 8089950
boundstart: 0
boundend: 8089950
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  c:          8089950                0  unused


Is there any way to fix the disklabel or is this an error that is impossible to
recover from? duid used to show up as 4d43e3389228e319 and not
.

Please let me know if you have any suggestions.


Get your old label from /var/backups and try to restore it with
disklabel -R.  You don't tell what your platform is, it might be that
you also need to do fdisk work first to restore the mbr partition
table.

But of course, it is also interesting to know what happened to your
disk. But since you do not tell us what you did you are on your own
here.

-Otto


Thank you very much. I am running an older snapshot OpenBSD 5.7  
-current as of Mar 19, 2015. I thought of -R with disklabel but since  
the drive seems to show itself as a 3950MB drive instead of a 2TB  
drive, I was not sure how to do this.
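
The 3950MB figure follows directly from the numbers in the disklabel output. A minimal Python sketch of that arithmetic, using only values quoted in this thread (the helper name size_mib is made up for illustration, and reading df's 298G as 298 GiB is an assumption):

```python
SECTOR_BYTES = 512        # "bytes/sector" in the disklabel output
TOTAL_SECTORS = 8089950   # "total sectors" in the disklabel output


def size_mib(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in whole MiB for a given sector count."""
    return sectors * sector_bytes // (1024 * 1024)


# Sectors needed by one 298G filesystem (assumption: df's G means GiB).
PARTITION_SECTORS = 298 * 1024**3 // SECTOR_BYTES

if __name__ == "__main__":
    print("unit size reported by the label:", size_mib(TOTAL_SECTORS), "MiB")
    print("sectors needed for one 298G partition:", PARTITION_SECTORS)
    print("fits inside the reported unit:", PARTITION_SECTORS <= TOTAL_SECTORS)
```

A single 298G partition needs far more sectors than the label now reports for the whole unit, which is consistent with disklabel -R rejecting the old partitions as "past end of unit".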


The problem truly is I am not sure what I did to cause all this  
problem!!! The sequence of actions was as follows. Since I had not
looked at this box for a while I was just logging in to look at where  
I had kept everything. I did a cd /ports/packages/amd64/all and got an  
input error when I tried to edit a file. So I did a shutdown -h now;  
opened the 3.5 and 2.5 hotswap drive bays and pulled both drives out  
and pushed them back in. Powered the system on at which point I was  
dropped into the shell because /vm and /ports had errors. So I tried  
to do a fsck_ffs and that failed. At that point I looked at disklabel  
and noticed that the duid was gone. fdisk sd0 does not show anything  
other than:


# fdisk sd0
Disk: sd0   geometry: 503/255/63 [8089950 Sectors]

I tried the disklabel -R as you suggested;

# disklabel -R sd0 disklabel.sd0.current
disklabel: partition a: partition extends past end of unit
disklabel: partition c: partition extends past end of unit
disklabel: partition d: offset past end of unit
disklabel: partition d: partition extends past end of unit
disklabel: partition e: offset past end of unit
disklabel: partition e: partition extends past end of unit
disklabel: partition f: offset past end of unit
disklabel: partition f: partition extends past end of unit
disklabel: partition g: offset past end of unit
disklabel: partition g: partition extends past end of unit
disklabel: partition h: offset past end of unit
disklabel: partition h: partition extends past end of unit
disklabel: partition i: offset past end of unit
disklabel: partition i: partition extends past end of unit

Also tried
# fdisk -i sd0
Do you wish to write new MBR and partition table? [n] y
Writing MBR at offset 0.
fdisk: error writing MBR: Input/output error

Not sure whether there is any other option but Thanks very much for  
the help and 

Re: Disk Problem

2015-07-10 Thread Otto Moerbeek
On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote:

 My objective for this weekend was to follow the new dpb and build ports
 without using sudo. So I was hoping to upgrade to the latest snapshot on a
 system that I use for tests.
 
 The test system has a 2TB drive and it had two 300GB partitions in it for
 ports and vm; and a 120GB SSD for the OS and used to look as follows:
 
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/sd1a     1005M   55.0M    900M     6%    /
 /dev/sd1k     64.5G   20.9G   40.3G    34%    /home
 /dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
 /dev/sd1f      2.0G    966M    946M    51%    /usr
 /dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
 /dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
 /dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
 /dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
 /dev/sd1e     13.5G   26.5M   12.8G     0%    /var
 /dev/sd0h      298G    176G    107G    62%    /ports
 /dev/sd0f      298G   19.6G    263G     7%    /vm
 
 My /etc/fstab was
 
 4f0cd8b5e7fd8f6a.b none swap sw
 4f0cd8b5e7fd8f6a.a / ffs rw 1 1
 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2
 
 I am not sure what happened -- but when I rebooted the system this morning
 /ports and /vm would not mount; so I commented out the last two lines in
 /etc/fstab and rebooted. After reboot disklabel seems to have changed
 completely and it currently looks like this:
 
 # disklabel sd0
 # /dev/rsd0c:
 type: SCSI
 disk: SCSI disk
 label: ST2000DM001-1CH1
 duid: 
 flags:
 bytes/sector: 512
 sectors/track: 63
 tracks/cylinder: 255
 sectors/cylinder: 16065
 cylinders: 503
 total sectors: 8089950
 boundstart: 0
 boundend: 8089950
 drivedata: 0
 
 16 partitions:
 #                size           offset  fstype [fsize bsize  cpg]
   c:          8089950                0  unused
 
 
 Is there any way to fix the disklabel or is this an error that is impossible to
 recover from? duid used to show up as 4d43e3389228e319 and not
 .
 
 Please let me know if you have any suggestions.

Get your old label from /var/backups and try to restore it with
disklabel -R.  You don't tell what your platform is, it might be that
you also need to do fdisk work first to restore the mbr partition
table. 

But of course, it is also interesting to know what happened to your
disk. But since you do not tell us what you did you are on your own
here. 

-Otto



Disk Problem

2015-07-10 Thread Vijay Sankar
My objective for this weekend was to follow the new dpb and build  
ports without using sudo. So I was hoping to upgrade to the latest  
snapshot on a system that I use for tests.


The test system has a 2TB drive and it had two 300GB partitions in it  
for ports and vm; and a 120GB SSD for the OS and used to look as  
follows:


Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1a     1005M   55.0M    900M     6%    /
/dev/sd1k     64.5G   20.9G   40.3G    34%    /home
/dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
/dev/sd1f      2.0G    966M    946M    51%    /usr
/dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
/dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
/dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
/dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
/dev/sd1e     13.5G   26.5M   12.8G     0%    /var
/dev/sd0h      298G    176G    107G    62%    /ports
/dev/sd0f      298G   19.6G    263G     7%    /vm

My /etc/fstab was

4f0cd8b5e7fd8f6a.b none swap sw
4f0cd8b5e7fd8f6a.a / ffs rw 1 1
4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2

I am not sure what happened -- but when I rebooted the system this  
morning /ports and /vm would not mount; so I commented out the last  
two lines in /etc/fstab and rebooted. After reboot disklabel seems to  
have changed completely and it currently looks like this:


# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ST2000DM001-1CH1
duid: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 503
total sectors: 8089950
boundstart: 0
boundend: 8089950
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  c:          8089950                0  unused


Is there any way to fix the disklabel or is this an error that is
impossible to recover from? duid used to show up as 4d43e3389228e319  
and not .


Please let me know if you have any suggestions.

Thanks very much,

Vijay
--
Vijay Sankar, M.Eng., P.Eng.
ForeTell Technologies Limited
vsan...@foretell.ca



Re: Weird disk problem

2014-06-10 Thread David Vasek

On Sun, 8 Jun 2014, Christian Weisgerber wrote:


On 2014-06-05, David Vasek va...@fido.cz wrote:


Did you try smartctl from smartmontools for a more detailed report?


I assume there is a 1000-page SMART spec somewhere that would come
in handy for interpreting the responses?


I'm not an expert. But I believe there are some reading this mailing list.

There is a description of the interface available, but I don't think it 
can help you to interpret the numbers.


ftp://ftp.t10.org/t13/docs2004/D1699-ATA8-ACS.pdf
http://www.hgst.com/tech/techlib.nsf/techdocs/EF593BD721D5D2768825782D000B8111/$file/DS7K3000_US7K3000_SATA_OEMSpecRev1.3.pdf
(beware of the $ character in the url)

What I usually care about are attributes like Reallocated_Sector_Ct, 
Reallocated_Event_Count, Current_Pending_Sector, Offline_Uncorrectable, 
Spin_Retry_Count, UDMA_CRC_Error_Count. I monitor my drives in the long 
term and watch if any of these values rises. And of course, the SMART 
Error Log is important.
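
One way to do this kind of long-term watching is to save the raw values from time to time and compare them. A rough Python sketch, assuming the usual ten-column attribute table that smartctl -A prints for ATA drives. The SAMPLE text and its numbers are invented for illustration and do not come from any drive in this thread; parse_attributes and risen are hypothetical helper names:

```python
# Attributes named in the message above as the ones worth watching.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Spin_Retry_Count",
    "UDMA_CRC_Error_Count",
}

# Invented sample in the layout of "smartctl -A" output (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       17520
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            try:
                raw[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value not a plain integer; skip it
    return raw


def risen(previous, current):
    """Return {name: (old, new)} for attributes whose raw value increased."""
    return {
        name: (previous[name], value)
        for name, value in current.items()
        if name in previous and value > previous[name]
    }


if __name__ == "__main__":
    earlier = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 2,
               "UDMA_CRC_Error_Count": 0}
    print(risen(earlier, parse_attributes(SAMPLE)))
```

In practice the text would come from a saved smartctl run rather than a string literal, and the earlier snapshot from a file kept between runs.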


As for the other attributes such as Raw_Read_Error_Rate, 
Throughput_Performance and Seek_Error_Rate, every vendor seems to use them in
a different way.


Btw, the model of Hitachi drive you have problems with is said to be one 
of the most reliable hard drives.


http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/
http://www.hgst.com/tech/techlib.nsf/techdocs/EC6D440C3F64DBCC8825782300026498/$file/US7K3000_ds.pdf.
http://www.hgst.com/tech/techlib.nsf/products/Ultrastar_7K3000


smartctl -t short /dev/sd1c


Not supported, it seems.


That is surprising; all the Hitachi hard drives I have support the short test. If it
isn't a secret, could I get the 'smartctl -a' output from your drive for 
comparison? Thanks.


Regards,
David



Re: Weird disk problem

2014-06-08 Thread Christian Weisgerber
On 2014-06-05, David Vasek va...@fido.cz wrote:

 Did you try smartctl from smartmontools for a more detailed report?

I assume there is a 1000-page SMART spec somewhere that would come
in handy for interpreting the responses?

 My favourites are:

 smartctl -a /dev/sd1c
 smartctl -l scttemp /dev/sd1c

Temperature is fine, never exceeded the limits.

 smartctl -t short /dev/sd1c

Not supported, it seems.

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: Weird disk problem

2014-06-08 Thread Christian Weisgerber
On 2014-06-05, STeve Andre' and...@msu.edu wrote:

 I think you are relying on the smart system too much.

Not at all, but I knew people would immediately direct me to it.

 Certainly try what David said, but it's obvious that the disk is
 sick despite what the smart system may say.

I got a replacement disk and I'm now trying to get the data off the
old one.  (Nothing really important.)  That is proceeding fitfully.
There are spurts of 65 MB/s and then there are stretches of XXX
kB/s, XX kB/s, down to 5 kB/s.  At the current average rate it will
be going for five or six days, assuming the disk survives that long.
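
For a rough sense of scale, the figures in this thread can be turned into copy times. A back-of-the-envelope Python sketch, assuming that the 2.5T "Used" figure from the df output is the amount to copy, that T means TiB, and that MB/s and kB/s are decimal units (all of these are assumptions, not stated in the thread):

```python
USED_BYTES = 2.5 * 1024**4   # "Used" column from df, assumed to be TiB
DAY = 86400                  # seconds per day


def days_to_copy(nbytes, bytes_per_second):
    """Days needed to copy nbytes at a constant rate."""
    return nbytes / bytes_per_second / DAY


def average_rate(nbytes, days):
    """Average bytes per second implied by copying nbytes in the given days."""
    return nbytes / (days * DAY)


if __name__ == "__main__":
    print("at 65 MB/s: %.2f days" % days_to_copy(USED_BYTES, 65e6))
    print("at 5 kB/s: %.0f days" % days_to_copy(USED_BYTES, 5e3))
    for days in (5, 6):
        rate = average_rate(USED_BYTES, days) / 1e6
        print("%d days implies about %.1f MB/s on average" % (days, rate))
```

Under those assumptions, five or six days corresponds to an average of roughly 5 to 6.5 MB/s, far below the 65 MB/s spurts.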

Whatever's wrong with it, it's a tenacious little bugger.  There
still hasn't been a single hard read error.  Anyway, I guess we can
close the topic.

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Weird disk problem

2014-06-05 Thread Christian Weisgerber
I have a 3TB disk here...

sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct 
fixed naa.5000cca225c5fbeb
sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors

... that's serving as a general media dump with a single FFS2 file
system on it.

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1d      2.7T    2.5T   63.7G    98%    /export

Yesterday, I experienced the odd effect that reading some files,
or parts of files, from that disk became excruciatingly slow.  We're
talking a few kB/s here.  Other files were fine.  There were no
kernel errors/warnings whatsoever.  There were no read errors, the
disk was just 100% busy and appeared to be returning data drip by
drip.

# atactl sd1 smartstatus
No SMART threshold exceeded

No change on reboot.  dd(1) from the raw device was initially fast,
then slowed to a crawl as it progressed.  I eventually fixed it
all by powering off the machine, jiggling the SATA connectors (all
fine), and powering the machine back up.

Tonight the problem is back.  Something is very wrong.  Given that
dd if=/dev/rsd1c also seems affected, the filesystem layer can be
excluded.  I won't cry too much over a dying disk, but why the heck
are there no error indications of any kind?
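
To see where on the disk the slowdown starts, a sequential read can be timed chunk by chunk, much like the dd run described above but with a rate per region. A minimal Python sketch; read_rates is a hypothetical helper, the chunk and block sizes are arbitrary choices, and reading a raw device such as /dev/rsd1c normally needs root:

```python
import sys
import time


def read_rates(path, chunk_bytes=64 * 1024 * 1024, block_bytes=1024 * 1024,
               max_chunks=None):
    """Yield (offset, bytes_per_second) for each chunk read sequentially."""
    with open(path, "rb", buffering=0) as dev:
        offset = 0
        chunks = 0
        while max_chunks is None or chunks < max_chunks:
            start = time.monotonic()
            got = 0
            while got < chunk_bytes:
                data = dev.read(min(block_bytes, chunk_bytes - got))
                if not data:
                    break  # end of file or device
                got += len(data)
            elapsed = time.monotonic() - start
            if got == 0:
                return
            rate = got / elapsed if elapsed > 0 else float("inf")
            yield offset, rate
            offset += got
            chunks += 1
            if got < chunk_bytes:
                return  # short chunk means the end was reached


if __name__ == "__main__":
    # Usage (path is whatever file or device you want to test):
    #   python read_rates.py /dev/rsd1c
    for offset, rate in read_rates(sys.argv[1]):
        print("%14d  %10.1f kB/s" % (offset, rate / 1e3))
```

A sharp drop in the rate at particular offsets, repeated across runs, would point at regions of the platter rather than at cabling, though that interpretation is only a heuristic.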

Any other ideas?

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: Weird disk problem

2014-06-05 Thread David Vasek

On Thu, 5 Jun 2014, Christian Weisgerber wrote:


I have a 3TB disk here...

sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct 
fixed naa.5000cca225c5fbeb
sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors

... that's serving as a general media dump with a single FFS2 file
system on it.

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1d      2.7T    2.5T   63.7G    98%    /export

Yesterday, I experienced the odd effect that reading some files,
or parts of files, from that disk became excruciatingly slow.  We're
talking a few kB/s here.  Other files were fine.  There were no
kernel errors/warnings whatsoever.  There were no read errors, the
disk was just 100% busy and appeared to be returning data drip by
drip.

# atactl sd1 smartstatus
No SMART threshold exceeded

No change on reboot.  dd(1) from the raw device was initially fast,
then slowed to a crawl as it progressed.  I eventually fixed it
all by powering off the machine, jiggling the SATA connectors (all
fine), and powering the machine back up.

Tonight the problem is back.  Something is very wrong.  Given that
dd if=/dev/rsd1c also seems affected, the filesystem layer can be
excluded.  I won't cry too much over a dying disk, but why the heck
are there no error indications of any kind?

Any other ideas?


Did you try smartctl from smartmontools for a more detailed report?

My favourites are:

smartctl -a /dev/sd1c
smartctl -l scttemp /dev/sd1c

smartctl -t short /dev/sd1c
smartctl -t long /dev/sd1c (will take several hours!!!)

smartctl -a /dev/sd1c (again after each of the tests)


Regards,
David



Re: Weird disk problem

2014-06-05 Thread STeve Andre'

On 06/05/14 17:38, Christian Weisgerber wrote:

I have a 3TB disk here...

sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct 
fixed naa.5000cca225c5fbeb
sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors

... that's serving as a general media dump with a single FFS2 file
system on it.

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1d      2.7T    2.5T   63.7G    98%    /export

Yesterday, I experienced the odd effect that reading some files,
or parts of files, from that disk became excruciatingly slow.  We're
talking a few kB/s here.  Other files were fine.  There were no
kernel errors/warnings whatsoever.  There were no read errors, the
disk was just 100% busy and appeared to be returning data drip by
drip.

# atactl sd1 smartstatus
No SMART threshold exceeded

No change on reboot.  dd(1) from the raw device was initially fast,
then slowed to a crawl as it progressed.  I eventually fixed it
all by powering off the machine, jiggling the SATA connectors (all
fine), and powering the machine back up.

Tonight the problem is back.  Something is very wrong.  Given that
dd if=/dev/rsd1c also seems affected, the filesystem layer can be
excluded.  I won't cry too much over a dying disk, but why the heck
are there no error indications of any kind?

Any other ideas?



I think you are relying on the smart system too much.  Certainly try
what David said, but it's obvious that the disk is sick despite what the
smart system may say.

I've had about seven disk failures in the last several years.  For three or
four of them the smart system was absolutely correct, with the others
being less informative.  I've also had a false notice that a disk was bad,
but it worked for several years, till it got too small for its task.

Smart is good, but it has its limitations.  It best deals with gradual
errors, not fast catastrophic ones.

--STeve Andre'



Re: Weird disk problem

2014-06-05 Thread Shawn K. Quinn
On Thu, Jun 5, 2014, at 05:24 PM, STeve Andre' wrote:
 On 06/05/14 17:38, Christian Weisgerber wrote:
  I have a 3TB disk here...
 
  sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct 
  fixed naa.5000cca225c5fbeb
  sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors
 
  ... that's serving as a general media dump with a single FFS2 file
  system on it.
 
  Filesystem     Size    Used   Avail Capacity  Mounted on
  /dev/sd1d      2.7T    2.5T   63.7G    98%    /export
 
  Yesterday, I experienced the odd effect that reading some files,
  or parts of files, from that disk became excruciatingly slow.  We're
  talking a few kB/s here.  Other files were fine.  There were no
  kernel errors/warnings whatsoever.  There were no read errors, the
  disk was just 100% busy and appeared to be returning data drip by
  drip.
 
  # atactl sd1 smartstatus
  No SMART threshold exceeded
 
  No change on reboot.  dd(1) from the raw device was initially fast,
  then slowed to a crawl as it progressed.  I eventually fixed it
  all by powering off the machine, jiggling the SATA connectors (all
  fine), and powering the machine back up.
 
  Tonight the problem is back.  Something is very wrong.  Given that
  dd if=/dev/rsd1c also seems affected, the filesystem layer can be
  excluded.  I won't cry too much over a dying disk, but why the heck
  are there no error indications of any kind?
 
  Any other ideas?

Anything in dmesg/kernel log about operations timing out?
 
 I think you are relying on the smart system too much.  Certainly try
 what David said, but it's obvious that the disk is sick despite what the
 smart system may say.
 
 I've had about seven disk failures in the last several years.  For three or
 four of them the smart system was absolutely correct, with the others
 being less informative.  I've also had a false notice that a disk was bad,
 but it worked for several years, till it got too small for its task.
 
 Smart is good, but it has its limitations.  It best deals with gradual
 errors, not fast catastrophic ones.

Running smartmontools should give you enough information to determine if
you have a sick disk, though it may require looking at the values and
seeing if you have a rise in e.g. the number of sectors remapped; I
would not trust atactl sd# smartstatus by itself. Failing that, there
are more time-honored empirical tests, such as assuming the worst for
the disk's health if it is making weird noises when it slows to a crawl.

It could also be either the SATA cabling or the SATA controller that is
having trouble after warming up (with specific bit patterns, or just in
general). I know that sounds weird, but SATA cables aren't that
expensive to replace and it's quite possible the OP got a dud.

-- 
  Shawn K. Quinn
  skqu...@rushpost.com



Disk problem with -current kernel

2006-10-10 Thread Emilio Perea
I ran into a problem when rebooting to a current kernel (i386 GENERIC)
due to a secondary disk without an 'a' partition.  Disk sd0 checked out
fine, but all the partitions on sd1 had bad magic numbers and failed
fsck:

/dev/rsd1d: BAD SUPER BLOCK: MAGIC NUMBER WRONG
/dev/rsd1d: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
 ...
/dev/rsd1n: BAD SUPER BLOCK: MAGIC NUMBER WRONG
/dev/rsd1n: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

Old disklabel sd1:

# Inside MBR partition 0: type A5 start 63 size 71681967
# /dev/rsd1c:
type: SCSI
disk: da0s1
label: 
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 4462
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

15 partitions:
#         size    offset  fstype [fsize bsize  cpg]
  c:  71681967        63  unused      0     0      # Cyl     0*-  4461
  d:   2104452        63  4.2BSD   2048 16384  132 # Cyl     0*-   130
  e:   8385930   2104515  4.2BSD   2048 16384  328 # Cyl   131 -   652 
  f:  23294250  48387780  4.2BSD   2048 16384  328 # Cyl  3012 -  4461 
  h:   4112640  15936480  4.2BSD   2048 16384  256 # Cyl   992 -  1247 
  i:   2104515  40933620  4.2BSD   2048 16384  132 # Cyl  2548 -  2678 
  j:  18828180  20049120  4.2BSD   2048 16384  328 # Cyl  1248 -  2419 
  k:   5349645  43038135  4.2BSD   2048 16384   16 # Cyl  2679 -  3011 
  l:   2056320  38877300  4.2BSD   2048 16384  128 # Cyl  2420 -  2547 
  m:   2104515  10490445  4.2BSD   2048 16384  132 # Cyl   653 -   783 
  n:   3341520  12594960  4.2BSD   2048 16384  208 # Cyl   784 -   991 

  New disklabel sd1:

# Inside MBR partition 0: type A5 start 63 size 71681967
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: ST336705LW 
flags:
bytes/sector: 512
sectors/track: 470
tracks/cylinder: 8
sectors/cylinder: 3760
cylinders: 19036
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0 

16 partitions:
#             size       offset  fstype [fsize bsize  cpg]
  c:      71687370            0  unused      0     0       # Cyl     0 - 19065*
  d:       2097000     34314840  4.2BSD   1024  8192   16  # Cyl  9126*-  9683
  e:       1049040     36411840  4.2BSD   1024  8192   16  # Cyl  9684 -  9962
  f:       4196160     37460880  4.2BSD   1024  8192   16  # Cyl  9963 - 11078
  g:       4196160     41657040  4.2BSD   1024  8192   16  # Cyl 11079 - 12194
  h:       8388560     45853200  4.2BSD   1024  8192   16  # Cyl 12195 - 14425
  i:        530082           63  ext2fs                    # Cyl     0*-   140*
  j:       1060290     16466625 unknown                    # Cyl  4379*-  4661*
  k:      16787925     17526915  ext2fs                    # Cyl  4661*-  9126*
  l:      15936480       530145  ext2fs                    # Cyl   140*-  4379*

  I assume this is due to using the new kernel with the old fsck and
  that installing the next snapshot will fix it.  If this is unexpected,
  please let me know if you want additional information.

Last dmesg, just in case...

OpenBSD 4.0-current (GENERIC) #1141: Sun Oct  8 13:54:04 MDT 2006
[EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 1500MHz (GenuineIntel 686-class) 1.50 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM
real mem  = 804384768 (785532K)
avail mem = 725274624 (708276K)
using 4256 buffers containing 40341504 bytes (39396K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 06/06/01, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xf0450 (97 entries)
bios0: Dell Computer Corporation Precision 330
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 30102 dobusy 0 doidle 1
pcibios0 at bios0: rev 2.1 @ 0xf/0x1
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfbbb0/176 (9 entries)
pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801BA LPC rev 0x00)
pcibios0: PCI bus #2 is the last bus
bios0: ROM list: 0xc/0xa800 0xca800/0x5800
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 Intel 82850 Host rev 0x02
ppb0 at pci0 dev 1 function 0 Intel 82850/82860 AGP rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 NVIDIA Vanta rev 0x15
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 Intel 82801BA AGP rev 0x04
pci2 at ppb1 bus 2
fxp0 at pci2 dev 8 function 0 Intel 8255x rev 0x05, i82558: irq 10, address 00:90:27:86:21:9c
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 0
ahc0 at pci2 dev 10 function 0 Adaptec AHA-2940U2 U2 rev 0x00: irq 11

Re: Disk problem with -current kernel

2006-10-10 Thread Nick Holland
Emilio Perea wrote:
 I ran into a problem when rebooting to a current kernel (i386 GENERIC)
 due to a secondary disk without an 'a' partition.  

I don't think the lack of an 'a' partition is your problem.  Goodness
knows, I've got a lot of machines with no 'a' partition on the second
and later disks.

 Disk sd0 checked out
 fine, but all the partitions on sd1 had bad magic numbers and failed
 fsck:
 
 /dev/rsd1d: BAD SUPER BLOCK: MAGIC NUMBER WRONG
 /dev/rsd1d: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
  ...
 /dev/rsd1n: BAD SUPER BLOCK: MAGIC NUMBER WRONG
 /dev/rsd1n: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
 
 Old disklabel sd1:
 
...
 15 partitions:
 #             size       offset  fstype [fsize bsize  cpg]
   c:      71681967           63  unused      0     0       # Cyl     0*-  4461

huh?
c should be the entire disk.  There shouldn't be an offset there.

I'm not sure that is your problem, but that doesn't look right at all.
Messing with the 'c' partition is going to break things.

   d:       2104452           63  4.2BSD   2048 16384  132  # Cyl     0*-   130
   e:       8385930      2104515  4.2BSD   2048 16384  328  # Cyl   131 -   652
   f:      23294250     48387780  4.2BSD   2048 16384  328  # Cyl  3012 -  4461
   h:       4112640     15936480  4.2BSD   2048 16384  256  # Cyl   992 -  1247
   i:       2104515     40933620  4.2BSD   2048 16384  132  # Cyl  2548 -  2678
   j:      18828180     20049120  4.2BSD   2048 16384  328  # Cyl  1248 -  2419
   k:       5349645     43038135  4.2BSD   2048 16384   16  # Cyl  2679 -  3011
   l:       2056320     38877300  4.2BSD   2048 16384  128  # Cyl  2420 -  2547
   m:       2104515     10490445  4.2BSD   2048 16384  132  # Cyl   653 -   783
   n:       3341520     12594960  4.2BSD   2048 16384  208  # Cyl   784 -   991
 
   New disklabel sd1:

new? old? I'm not following that...
...
 16 partitions:
 #             size       offset  fstype [fsize bsize  cpg]
   c:      71687370            0  unused      0     0       # Cyl     0 - 19065*
   d:       2097000     34314840  4.2BSD   1024  8192   16  # Cyl  9126*-  9683
   e:       1049040     36411840  4.2BSD   1024  8192   16  # Cyl  9684 -  9962
   f:       4196160     37460880  4.2BSD   1024  8192   16  # Cyl  9963 - 11078
   g:       4196160     41657040  4.2BSD   1024  8192   16  # Cyl 11079 - 12194
   h:       8388560     45853200  4.2BSD   1024  8192   16  # Cyl 12195 - 14425
   i:        530082           63  ext2fs                    # Cyl     0*-   140*
   j:       1060290     16466625 unknown                    # Cyl  4379*-  4661*
   k:      16787925     17526915  ext2fs                    # Cyl  4661*-  9126*
   l:      15936480       530145  ext2fs                    # Cyl   140*-  4379*
 

That's more like it...except you have a lot of partitions crossing
cylinder boundaries.  That's not a problem, but it makes checking for
overlapping partitions more difficult.  They may not grossly overlap,
but I didn't look for a few sector overlaps...which would really ruin
your day if they were there.
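
The overlap check mentioned above can be done mechanically rather than
by eye.  What follows is only a rough sketch in Python, not a tool from
the OpenBSD tree: the function name find_overlaps is made up for
illustration, and the size/offset pairs are read from the "New
disklabel sd1" listing on the assumption that each partition starts
where the previous one ends (the archive fused some of the columns).

```python
# Partition (letter, size, offset) in sectors, taken from the
# "New disklabel sd1" listing.  'c' is omitted because it always
# covers the whole disk.  The split of the fused size/offset columns
# for 'i' and 'l' is an assumption.
PARTITIONS = [
    ("d", 2097000, 34314840),
    ("e", 1049040, 36411840),
    ("f", 4196160, 37460880),
    ("g", 4196160, 41657040),
    ("h", 8388560, 45853200),
    ("i", 530082, 63),
    ("j", 1060290, 16466625),
    ("k", 16787925, 17526915),
    ("l", 15936480, 530145),
]


def find_overlaps(partitions):
    """Return (earlier, later, overlapping_sectors) for every clash.

    Partitions are visited in order of starting offset while tracking
    the partition that reaches furthest so far; any partition starting
    before that end point overlaps it.
    """
    overlaps = []
    far_name, far_end = None, 0
    for name, size, offset in sorted(partitions, key=lambda p: p[2]):
        end = offset + size
        if far_name is not None and offset < far_end:
            overlaps.append((far_name, name, min(far_end, end) - offset))
        if end > far_end:
            far_name, far_end = name, end
    return overlaps


if __name__ == "__main__":
    for earlier, later, sectors in find_overlaps(PARTITIONS):
        print(f"{earlier} and {later} overlap by {sectors} sectors")
```

With the numbers as read here, the function reports no overlaps, since
each partition begins exactly at the sector where its predecessor ends.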
...
 
   I assume this is due to using the new kernel with the old fsck and
   that installing the next snapshot will fix it.  If this is unexpected,
   please let me know if you want additional information.
 
 Last dmesg, just in case...
 
 OpenBSD 4.0-current (GENERIC) #1141: Sun Oct  8 13:54:04 MDT 2006
 [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
 cpu0: Intel(R) Pentium(R) 4 CPU 1500MHz (GenuineIntel 686-class) 1.50 GHz

gotta love a dmesg. :)

However, I'm confused by what you are showing me:
  a problem including an 'n' partition,
  an old, misconfigured drive with an 'n' partition,
  a new, seemingly properly configured drive without the 'n' partition.

Looks like your drive geometry changed between old and new.  I'm
curious about why.  Usually, that means you changed controllers.
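
To make the geometry difference concrete, here is a small sketch of how
the "Cyl" comments in the two labels relate to the reported geometry.
It assumes, as the figures in both listings suggest, that
sectors/cylinder is sectors/track times tracks/cylinder and that the
'*' marks an offset that is not on a cylinder boundary.  The helper
name cylinder_of is invented for this example.

```python
def cylinder_of(offset, sectors_per_track, tracks_per_cylinder):
    """Return (cylinder, aligned) for a sector offset.

    aligned is True when the offset falls exactly on a cylinder
    boundary under the given geometry.
    """
    sectors_per_cylinder = sectors_per_track * tracks_per_cylinder
    return offset // sectors_per_cylinder, offset % sectors_per_cylinder == 0


if __name__ == "__main__":
    # Old label: 63 sectors/track, 255 tracks/cylinder (16065 per cylinder).
    print(cylinder_of(2104515, 63, 255))   # start of old 'e'
    # New label: 470 sectors/track, 8 tracks/cylinder (3760 per cylinder).
    print(cylinder_of(34314840, 470, 8))   # start of new 'd'
    print(cylinder_of(36411840, 470, 8))   # start of new 'e'
```

The same sector offset lands on a different cylinder number, and may or
may not be aligned, depending on which geometry the label reports,
which is why the cylinder columns of the two listings cannot be
compared directly.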

So...if the problem is with the first drive configuration, I'd try again
with a proper 'c' partition.  Otherwise..I'm confused...which isn't to
say I'm not missing something.

Nick.



Re: Disk problem with -current kernel

2006-10-10 Thread Nick Holland
Nick Holland wrote:
 Otherwise..I'm confused...which isn't to
 say I'm not missing something.

I've been informed that I *was* missing something, that this is a
problem which is being dealt with, beatings are being applied (including
to me, for missing it...).  Disregard my comments...things will be fixed
shortly...

Nick.



Re: Disk problem with -current kernel

2006-10-10 Thread Emilio Perea
On Tue, Oct 10, 2006 at 07:01:21PM -0400, Nick Holland wrote:
 Emilio Perea wrote:
  I ran into a problem when rebooting to a current kernel (i386 GENERIC)
  due to a secondary disk without an 'a' partition.  
 
 I don't think the lack of an 'a' partition is your problem.  Goodness
 knows, I've got a lot of machines with no 'a' partition on the second
 and later disks.

No, the problem was due to sd1's MBR partition type being A5 rather than
A6.  My apologies for not checking that before posting.  Last night's
change to disksubr.c broke it.  Thanks to Thordur and Pedro and Ken and
you for help in tracking this down.
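
For anyone wanting to check the same thing on their own disk: the
partition type is a single byte in each of the four 16-byte MBR table
entries, which start at byte 446 of the first sector.  A5 is the type
used by FreeBSD and A6 the one used by OpenBSD.  The sketch below is
only an illustration in Python; the function names are made up, and it
parses a sector built in memory with the values from the label comment
("type A5 start 63 size 71681967") rather than reading a real device.

```python
import struct

# Only the two type bytes relevant to this thread.
MBR_TYPES = {0xA5: "FreeBSD", 0xA6: "OpenBSD"}


def mbr_partitions(sector):
    """Return (index, type, start, size) for each used MBR entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA signature, not an MBR sector")
    entries = []
    for index in range(4):
        raw = sector[446 + 16 * index:446 + 16 * (index + 1)]
        ptype = raw[4]                                   # type byte
        start, size = struct.unpack_from("<II", raw, 8)  # LBA start, length
        if ptype != 0:                                   # 0 means unused
            entries.append((index, ptype, start, size))
    return entries


def build_example_sector():
    """Build a fake MBR sector matching the label comment in this thread."""
    sector = bytearray(512)
    entry = bytearray(16)
    entry[4] = 0xA5
    struct.pack_into("<II", entry, 8, 63, 71681967)
    sector[446:462] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


if __name__ == "__main__":
    for index, ptype, start, size in mbr_partitions(build_example_sector()):
        name = MBR_TYPES.get(ptype, "other")
        print(f"partition {index}: type {ptype:02X} ({name}) "
              f"start {start} size {size}")
```

On a real system, fdisk(8) shows the same information without any
hand parsing.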

I borrowed this disk from a dead server over five years ago and had
forgotten that I had not fdisk'd it at the time.  It's been running
OpenBSD since 2.8...

Mea culpa!

Emilio