Re: Disk Problem
On Fri, Jul 10, 2015 at 05:57:15PM +, Vijay Sankar wrote: Quoting Otto Moerbeek o...@drijf.net: On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote: My objective for this weekend was to follow the new dpb and build ports without using sudo. So I was hoping to upgrade to the latest snapshot on a system that I use for tests. The test system has a 2TB drive and it had two 300GB partitions in it for ports and vm; and a 120GB SSD for the OS and used to look as follows: Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1a 1005M 55.0M900M 6%/ /dev/sd1k 64.5G 20.9G 40.3G34%/home /dev/sd1d 3.9G 10.0K3.7G 0%/tmp /dev/sd1f 2.0G966M946M51%/usr /dev/sd1g 1005M191M764M20%/usr/X11R6 /dev/sd1h 9.8G2.9G6.5G31%/usr/local /dev/sd1j 2.0G2.0K1.9G 0%/usr/obj /dev/sd1i 2.0G827M1.1G43%/usr/src /dev/sd1e 13.5G 26.5M 12.8G 0%/var /dev/sd0h 298G176G107G62%/ports /dev/sd0f 298G 19.6G263G 7%/vm My /etc/fstab was 4f0cd8b5e7fd8f6a.b none swap sw 4f0cd8b5e7fd8f6a.a / ffs rw 1 1 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2 I am not sure what happened -- but when I rebooted the system this morning /ports and /vm would not mount; so I commented out the last two lines in /etc/fstab and rebooted. 
After reboot disklabel seems to have changed completely and it currently looks like this: # disklabel sd0 # /dev/rsd0c: type: SCSI disk: SCSI disk label: ST2000DM001-1CH1 duid: flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 503 total sectors: 8089950 boundstart: 0 boundend: 8089950 drivedata: 0 16 partitions: #size offset fstype [fsize bsize cpg] c: 80899500 unused Is there any way fix the disklabel or is this an error that is impossible to recover from? duid used to show up as 4d43e3389228e319 and not . Please let me know if you have any suggestions. Get your old label from /var/backups and try to restore it with disklabel -R. You don't tell what your platform is, it might be that you also need to do fdisk work first to restore the mbr partition table. But of course, it is also interesting to know what happened to you disk. But since you do not tell us what you did you are on your own here. -Otto Thank you very much. I am running an older snapshot OpenBSD 5.7 -current as of Mar 19, 2015. I thought of -R with disklabel but since the drive seems to show itself as a 3950MB drive instead of a 2TB drive, I was not sure how to do this. The problem truly is I am not sure what I did to cause all this problem!!! The sequence of actions were as follows. Since I had not looked at this box for a while I was just logging in to look at where I had kept everything. I did a cd /ports/packages/amd64/all and got an input error when I tried to edit a file. So I did a shutdown -h now; opened the 3.5 and 2.5 hotswap drive bays and pulled both drives out and pushed them back in. Powered the system on at which point I was dropped into the shell because /vm and /ports had errors. So I tried to do a fsck_ffs and that failed. At that point I looked at disklabel and noticed that the duid was gone. 
fdisk sd0 does not show anything other than: # fdisk sd0 Disk: sd0 geometry: 503/255/63 [8089950 Sectors] I tried the disklabel -R as you suggested; # disklabel -R sd0 disklabel.sd0.current disklabel: partition a: partition extends past end of unit disklabel: partition c: partition extends past end of unit disklabel: partition d: offset past end of unit disklabel: partition d: partition extends past end of unit disklabel: partition e: offset past end of unit disklabel: partition e: partition extends past end of unit disklabel: partition f: offset past end of unit disklabel: partition f: partition extends past end of unit disklabel: partition g: offset past end of unit disklabel: partition g: partition extends past end of unit disklabel: partition h: offset past end of unit disklabel: partition h: partition extends past end of unit disklabel: partition i: offset past end of unit disklabel: partition i: partition extends past end of unit Also tried # fdisk -i sd0 Do you wish to write new MBR and partition table? [n] y Writing
Re: Disk Problem
On Fri, Jul 10, 2015, at 09:15 PM, Vijay Sankar wrote: Quoting Eric Furman ericfur...@fastmail.net: On Fri, Jul 10, 2015, at 01:57 PM, Vijay Sankar wrote: Quoting Otto Moerbeek o...@drijf.net: On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote: My objective for this weekend was to follow the new dpb and build ports without using sudo. So I was hoping to upgrade to the latest snapshot on a system that I use for tests. The test system has a 2TB drive and it had two 300GB partitions in it for ports and vm; and a 120GB SSD for the OS and used to look as follows: Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1a 1005M 55.0M900M 6%/ /dev/sd1k 64.5G 20.9G 40.3G34%/home /dev/sd1d 3.9G 10.0K3.7G 0%/tmp /dev/sd1f 2.0G966M946M51%/usr /dev/sd1g 1005M191M764M20%/usr/X11R6 /dev/sd1h 9.8G2.9G6.5G31%/usr/local /dev/sd1j 2.0G2.0K1.9G 0%/usr/obj /dev/sd1i 2.0G827M1.1G43%/usr/src /dev/sd1e 13.5G 26.5M 12.8G 0%/var /dev/sd0h 298G176G107G62%/ports /dev/sd0f 298G 19.6G263G 7%/vm My /etc/fstab was 4f0cd8b5e7fd8f6a.b none swap sw 4f0cd8b5e7fd8f6a.a / ffs rw 1 1 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2 I am not sure what happened -- but when I rebooted the system this morning /ports and /vm would not mount; so I commented out the last two lines in /etc/fstab and rebooted. 
After reboot disklabel seems to have changed completely and it currently looks like this: # disklabel sd0 # /dev/rsd0c: type: SCSI disk: SCSI disk label: ST2000DM001-1CH1 duid: flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 503 total sectors: 8089950 boundstart: 0 boundend: 8089950 drivedata: 0 16 partitions: #size offset fstype [fsize bsize cpg] c: 80899500 unused Is there any way fix the disklabel or is this an error that is impossible to recover from? duid used to show up as 4d43e3389228e319 and not . Please let me know if you have any suggestions. Get your old label from /var/backups and try to restore it with disklabel -R. You don't tell what your platform is, it might be that you also need to do fdisk work first to restore the mbr partition table. But of course, it is also interesting to know what happened to you disk. But since you do not tell us what you did you are on your own here. -Otto Thank you very much. I am running an older snapshot OpenBSD 5.7 -current as of Mar 19, 2015. I thought of -R with disklabel but since the drive seems to show itself as a 3950MB drive instead of a 2TB drive, I was not sure how to do this. The problem truly is I am not sure what I did to cause all this problem!!! The sequence of actions were as follows. Since I had not looked at this box for a while I was just logging in to look at where I had kept everything. I did a cd /ports/packages/amd64/all and got an input error when I tried to edit a file. So I did a shutdown -h now; - opened the 3.5 and 2.5 hotswap drive bays and pulled both drives out and pushed them back in. Powered the system on at which point I was ^^ I am very curious to know why you did this. What am I missing here? - dropped into the shell because /vm and /ports had errors. So I tried to do a fsck_ffs and that failed. At that point I looked at disklabel and noticed that the duid was gone. 
fdisk sd0 does not show anything other than: # fdisk sd0 Disk: sd0 geometry: 503/255/63 [8089950 Sectors] I tried the disklabel -R as you suggested; # disklabel -R sd0 disklabel.sd0.current disklabel: partition a: partition extends past end of unit disklabel: partition c: partition extends past end of unit disklabel: partition d: offset past end of unit disklabel:
Re: Disk Problem
Quoting Otto Moerbeek o...@drijf.net: On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote: My objective for this weekend was to follow the new dpb and build ports without using sudo. So I was hoping to upgrade to the latest snapshot on a system that I use for tests. The test system has a 2TB drive and it had two 300GB partitions in it for ports and vm; and a 120GB SSD for the OS and used to look as follows: Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1a 1005M 55.0M900M 6%/ /dev/sd1k 64.5G 20.9G 40.3G34%/home /dev/sd1d 3.9G 10.0K3.7G 0%/tmp /dev/sd1f 2.0G966M946M51%/usr /dev/sd1g 1005M191M764M20%/usr/X11R6 /dev/sd1h 9.8G2.9G6.5G31%/usr/local /dev/sd1j 2.0G2.0K1.9G 0%/usr/obj /dev/sd1i 2.0G827M1.1G43%/usr/src /dev/sd1e 13.5G 26.5M 12.8G 0%/var /dev/sd0h 298G176G107G62%/ports /dev/sd0f 298G 19.6G263G 7%/vm My /etc/fstab was 4f0cd8b5e7fd8f6a.b none swap sw 4f0cd8b5e7fd8f6a.a / ffs rw 1 1 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2 I am not sure what happened -- but when I rebooted the system this morning /ports and /vm would not mount; so I commented out the last two lines in /etc/fstab and rebooted. 
After reboot disklabel seems to have changed completely and it currently looks like this: # disklabel sd0 # /dev/rsd0c: type: SCSI disk: SCSI disk label: ST2000DM001-1CH1 duid: flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 503 total sectors: 8089950 boundstart: 0 boundend: 8089950 drivedata: 0 16 partitions: #size offset fstype [fsize bsize cpg] c: 80899500 unused Is there any way fix the disklabel or is this an error that is impossible to recover from? duid used to show up as 4d43e3389228e319 and not . Please let me know if you have any suggestions. Get your old label from /var/backups and try to restore it with disklabel -R. You don't tell what your platform is, it might be that you also need to do fdisk work first to restore the mbr partition table. But of course, it is also interesting to know what happened to you disk. But since you do not tell us what you did you are on your own here. -Otto Thank you very much. I am running an older snapshot OpenBSD 5.7 -current as of Mar 19, 2015. I thought of -R with disklabel but since the drive seems to show itself as a 3950MB drive instead of a 2TB drive, I was not sure how to do this. The problem truly is I am not sure what I did to cause all this problem!!! The sequence of actions were as follows. Since I had not looked at this box for a while I was just logging in to look at where I had kept everything. I did a cd /ports/packages/amd64/all and got an input error when I tried to edit a file. So I did a shutdown -h now; opened the 3.5 and 2.5 hotswap drive bays and pulled both drives out and pushed them back in. Powered the system on at which point I was dropped into the shell because /vm and /ports had errors. So I tried to do a fsck_ffs and that failed. At that point I looked at disklabel and noticed that the duid was gone. 
fdisk sd0 does not show anything other than: # fdisk sd0 Disk: sd0 geometry: 503/255/63 [8089950 Sectors] I tried the disklabel -R as you suggested; # disklabel -R sd0 disklabel.sd0.current disklabel: partition a: partition extends past end of unit disklabel: partition c: partition extends past end of unit disklabel: partition d: offset past end of unit disklabel: partition d: partition extends past end of unit disklabel: partition e: offset past end of unit disklabel: partition e: partition extends past end of unit disklabel: partition f: offset past end of unit disklabel: partition f: partition extends past end of unit disklabel: partition g: offset past end of unit disklabel: partition g: partition extends past end of unit disklabel: partition h: offset past end of unit disklabel: partition h: partition extends past end of unit disklabel: partition i: offset past end of unit disklabel: partition i: partition extends past end of unit Also tried # fdisk -i sd0 Do you wish to write new MBR and partition table? [n] y Writing MBR at offset 0. fdisk: error writing MBR: Input/output error Not sure whether there is any other option but Thanks very much for the help and
Re: Disk Problem
On Fri, Jul 10, 2015 at 04:04:04PM +, Vijay Sankar wrote: My objective for this weekend was to follow the new dpb and build ports without using sudo. So I was hoping to upgrade to the latest snapshot on a system that I use for tests. The test system has a 2TB drive and it had two 300GB partitions in it for ports and vm; and a 120GB SSD for the OS and used to look as follows: Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1a 1005M 55.0M900M 6%/ /dev/sd1k 64.5G 20.9G 40.3G34%/home /dev/sd1d 3.9G 10.0K3.7G 0%/tmp /dev/sd1f 2.0G966M946M51%/usr /dev/sd1g 1005M191M764M20%/usr/X11R6 /dev/sd1h 9.8G2.9G6.5G31%/usr/local /dev/sd1j 2.0G2.0K1.9G 0%/usr/obj /dev/sd1i 2.0G827M1.1G43%/usr/src /dev/sd1e 13.5G 26.5M 12.8G 0%/var /dev/sd0h 298G176G107G62%/ports /dev/sd0f 298G 19.6G263G 7%/vm My /etc/fstab was 4f0cd8b5e7fd8f6a.b none swap sw 4f0cd8b5e7fd8f6a.a / ffs rw 1 1 4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2 4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2 4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2 4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2 4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2 I am not sure what happened -- but when I rebooted the system this morning /ports and /vm would not mount; so I commented out the last two lines in /etc/fstab and rebooted. 
After reboot disklabel seems to have changed completely and it currently looks like this: # disklabel sd0 # /dev/rsd0c: type: SCSI disk: SCSI disk label: ST2000DM001-1CH1 duid: flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 503 total sectors: 8089950 boundstart: 0 boundend: 8089950 drivedata: 0 16 partitions: #size offset fstype [fsize bsize cpg] c: 80899500 unused Is there any way fix the disklabel or is this an error that is impossible to recover from? duid used to show up as 4d43e3389228e319 and not . Please let me know if you have any suggestions. Get your old label from /var/backups and try to restore it with disklabel -R. You don't tell what your platform is, it might be that you also need to do fdisk work first to restore the mbr partition table. But of course, it is also interesting to know what happened to you disk. But since you do not tell us what you did you are on your own here. -Otto
Disk Problem
My objective for this weekend was to follow the new dpb and build ports without using sudo. So I was hoping to upgrade to the latest snapshot on a system that I use for tests.

The test system has a 2TB drive with two 300GB partitions on it for ports and vm, and a 120GB SSD for the OS. It used to look as follows:

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1a     1005M   55.0M    900M     6%    /
/dev/sd1k     64.5G   20.9G   40.3G    34%    /home
/dev/sd1d      3.9G   10.0K    3.7G     0%    /tmp
/dev/sd1f      2.0G    966M    946M    51%    /usr
/dev/sd1g     1005M    191M    764M    20%    /usr/X11R6
/dev/sd1h      9.8G    2.9G    6.5G    31%    /usr/local
/dev/sd1j      2.0G    2.0K    1.9G     0%    /usr/obj
/dev/sd1i      2.0G    827M    1.1G    43%    /usr/src
/dev/sd1e     13.5G   26.5M   12.8G     0%    /var
/dev/sd0h      298G    176G    107G    62%    /ports
/dev/sd0f      298G   19.6G    263G     7%    /vm

My /etc/fstab was

4f0cd8b5e7fd8f6a.b none swap sw
4f0cd8b5e7fd8f6a.a / ffs rw 1 1
4f0cd8b5e7fd8f6a.k /home ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.d /tmp ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.f /usr ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.g /usr/X11R6 ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.h /usr/local ffs rw,nodev 1 2
4f0cd8b5e7fd8f6a.j /usr/obj ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.i /usr/src ffs rw,nodev,nosuid 1 2
4f0cd8b5e7fd8f6a.e /var ffs rw,nodev,nosuid 1 2
4d43e3389228e319.h /ports ffs rw,nodev,nosuid 1 2
4d43e3389228e319.f /vm ffs rw,nodev,nosuid 1 2

I am not sure what happened -- but when I rebooted the system this morning /ports and /vm would not mount; so I commented out the last two lines in /etc/fstab and rebooted.

After the reboot the disklabel seems to have changed completely, and it currently looks like this:

# disklabel sd0
# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: ST2000DM001-1CH1
duid:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 503
total sectors: 8089950
boundstart: 0
boundend: 8089950
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  c:          8089950                0  unused

Is there any way to fix the disklabel, or is this an error that is impossible to recover from?

duid used to show up as 4d43e3389228e319 and not . Please let me know if you have any suggestions.

Thanks very much,
Vijay

--
Vijay Sankar, M.Eng., P.Eng.
ForeTell Technologies Limited
vsan...@foretell.ca
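As a quick sanity check on the label above, the capacity implied by a sector count can be worked out directly. This is only a minimal Python sketch: it assumes the 512-byte sector size printed in the label, and the function name is made up for illustration.

```python
def sectors_to_mib(total_sectors: int, bytes_per_sector: int = 512) -> float:
    """Return the capacity implied by a sector count, in MiB (2**20 bytes)."""
    return total_sectors * bytes_per_sector / 2**20


# "total sectors: 8089950" and "bytes/sector: 512" are taken from the label above.
print(f"{sectors_to_mib(8089950):.0f} MiB")  # prints 3950 MiB
```

That is a little under 4GB, nowhere near the 2TB the drive is supposed to hold, which suggests the drive itself is reporting the wrong size rather than the label alone having been lost.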
Re: Weird disk problem
On Sun, 8 Jun 2014, Christian Weisgerber wrote:

> On 2014-06-05, David Vasek va...@fido.cz wrote:
>
>> Did you try smartctl from smartmontools for a more detailed report?
>
> I assume there is a 1000-page SMART spec somewhere that would come in handy for interpreting the responses?

I'm not an expert. But I believe there are some reading this mailing list. There is a description of the interface available, but I don't think it can help you to interpret the numbers.

ftp://ftp.t10.org/t13/docs2004/D1699-ATA8-ACS.pdf
http://www.hgst.com/tech/techlib.nsf/techdocs/EF593BD721D5D2768825782D000B8111/$file/DS7K3000_US7K3000_SATA_OEMSpecRev1.3.pdf
(beware of the $ character in the url)

What I usually care about are attributes like Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, Offline_Uncorrectable, Spin_Retry_Count, UDMA_CRC_Error_Count. I monitor my drives in the long term and watch whether any of these values rises. And of course, the SMART Error Log is important. As for the other attributes such as Raw_Read_Error_Rate, Throughput_Performance and Seek_Error_Rate, every vendor seems to use them in a different way.

Btw, the model of Hitachi drive you have problems with is said to be one of the most reliable hard drives.

http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/
http://www.hgst.com/tech/techlib.nsf/techdocs/EC6D440C3F64DBCC8825782300026498/$file/US7K3000_ds.pdf.
http://www.hgst.com/tech/techlib.nsf/products/Ultrastar_7K3000

>> smartctl -t short /dev/sd1c
>
> Not supported, it seems.

It is surprising; all the Hitachi hard drives I have support the short test. If it isn't a secret, could I get the 'smartctl -a' output from your drive for comparison? Thanks.

Regards,
David
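The long-term monitoring described above boils down to comparing raw attribute values between two points in time. A minimal Python sketch of that comparison follows; it assumes the raw values have already been copied out of `smartctl -a` output into dictionaries, and both the function name and the sample readings are made up for illustration.

```python
# Attributes named in the message above; a rise in any raw value is the signal.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Spin_Retry_Count",
    "UDMA_CRC_Error_Count",
)


def rising_attributes(previous: dict, current: dict) -> dict:
    """Return {name: (old, new)} for watched attributes whose raw value rose."""
    rises = {}
    for name in WATCHED:
        old = previous.get(name, 0)
        new = current.get(name, 0)
        if new > old:
            rises[name] = (old, new)
    return rises


# Hypothetical readings, not taken from any real drive.
last_month = {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0, "UDMA_CRC_Error_Count": 2}
today = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0, "UDMA_CRC_Error_Count": 2}
print(rising_attributes(last_month, today))  # prints {'Reallocated_Sector_Ct': (0, 8)}
```

Only rises are reported; a value that is non-zero but stable (the CRC count here) is left alone, which matches the idea of watching trends rather than absolute numbers.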
Re: Weird disk problem
On 2014-06-05, David Vasek va...@fido.cz wrote:

> Did you try smartctl from smartmontools for a more detailed report?

I assume there is a 1000-page SMART spec somewhere that would come in handy for interpreting the responses?

> My favourite are:
> smartctl -a /dev/sd1c
> smartctl -l scttemp /dev/sd1c

Temperature is fine, never exceeded the limits.

> smartctl -t short /dev/sd1c

Not supported, it seems.

--
Christian naddy Weisgerber na...@mips.inka.de
Re: Weird disk problem
On 2014-06-05, STeve Andre' and...@msu.edu wrote:

> I think you are relying on the smart system too much.

Not at all, but I knew people would immediately direct me to it.

> Certainly try what David said, but it's obvious that the disk is sick despite what the smart system may say.

I got a replacement disk and I'm now trying to get the data off the old one. (Nothing really important.) That is proceeding fitfully. There are spurts of 65 MB/s and then there are stretches of XXX kB/s, XX kB/s, down to 5 kB/s. At the current average rate it will be going for five or six days, assuming the disk survives that long.

Whatever's wrong with it, it's a tenacious little bugger. There still hasn't been a single hard read error.

Anyway, I guess we can close the topic.

--
Christian naddy Weisgerber na...@mips.inka.de
Weird disk problem
I have a 3TB disk here...

sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct fixed naa.5000cca225c5fbeb
sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors

... that's serving as a general media dump with a single FFS2 file system on it.

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/sd1d      2.7T    2.5T   63.7G    98%    /export

Yesterday, I experienced the odd effect that reading some files, or parts of files, from that disk became excruciatingly slow. We're talking a few kB/s here. Other files were fine. There were no kernel errors/warnings whatsoever. There were no read errors, the disk was just 100% busy and appeared to be returning data drip by drip.

# atactl sd1 smartstatus
No SMART threshold exceeded

No change on reboot. dd(1) from the raw device was initially fast, then slowed to a crawl as it progressed.

I eventually fixed it all by powering off the machine, jiggling the SATA connectors (all fine), and powering the machine back up.

Tonight the problem is back.

Something is very wrong. Given that dd if=/dev/rsd1c also seems affected, the filesystem layer can be excluded. I won't cry too much over a dying disk, but why the heck are there no error indications of any kind? Any other ideas?

--
Christian naddy Weisgerber na...@mips.inka.de
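The dd(1) observation above (fast at first, then a crawl) can be made more precise by timing a sequential read in windows, so the slow regions show up by offset. The following is a rough Python sketch under stated assumptions, not a replacement for dd: the function name, the 1 MiB read size and the reporting interval are all arbitrary choices made for illustration.

```python
import time


def read_throughput(path, chunk_size=1 << 20, report_every=64, max_chunks=None):
    """Read path sequentially; return (offset, bytes, seconds) per reporting window."""
    samples = []
    offset = 0          # byte offset at which the current window started
    window_bytes = 0    # bytes read in the current window
    chunks = 0          # number of successful read() calls so far
    # buffering=0 gives unbuffered reads, so each read() maps to one request.
    with open(path, "rb", buffering=0) as src:
        window_start = time.monotonic()
        while max_chunks is None or chunks < max_chunks:
            data = src.read(chunk_size)
            if not data:
                break  # end of file or device
            chunks += 1
            window_bytes += len(data)
            if chunks % report_every == 0:
                now = time.monotonic()
                samples.append((offset, window_bytes, now - window_start))
                offset += window_bytes
                window_bytes = 0
                window_start = now
    if window_bytes:
        # Record the final, partial window as well.
        samples.append((offset, window_bytes, time.monotonic() - window_start))
    return samples
```

Run as root against the raw device (for example `read_throughput("/dev/rsd1c", max_chunks=4096)`), dividing bytes by seconds for each sample would show where the rate collapses. Since nothing here touches the filesystem, a slowdown seen this way points at the disk, cabling or controller, the same reasoning used above for excluding the filesystem layer.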
Re: Weird disk problem
On Thu, 5 Jun 2014, Christian Weisgerber wrote:

> I have a 3TB disk here...
>
> sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct fixed naa.5000cca225c5fbeb
> sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors
>
> ... that's serving as a general media dump with a single FFS2 file system on it.
>
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/sd1d      2.7T    2.5T   63.7G    98%    /export
>
> Yesterday, I experienced the odd effect that reading some files, or parts of files, from that disk became excruciatingly slow. We're talking a few kB/s here. Other files were fine. There were no kernel errors/warnings whatsoever. There were no read errors, the disk was just 100% busy and appeared to be returning data drip by drip.
>
> # atactl sd1 smartstatus
> No SMART threshold exceeded
>
> No change on reboot. dd(1) from the raw device was initially fast, then slowed to a crawl as it progressed.
>
> I eventually fixed it all by powering off the machine, jiggling the SATA connectors (all fine), and powering the machine back up.
>
> Tonight the problem is back.
>
> Something is very wrong. Given that dd if=/dev/rsd1c also seems affected, the filesystem layer can be excluded. I won't cry too much over a dying disk, but why the heck are there no error indications of any kind? Any other ideas?

Did you try smartctl from smartmontools for a more detailed report?

My favourite are:

smartctl -a /dev/sd1c
smartctl -l scttemp /dev/sd1c
smartctl -t short /dev/sd1c
smartctl -t long /dev/sd1c (will take several hours!!!)
smartctl -a /dev/sd1c (again after each of the tests)

Regards,
David
Re: Weird disk problem
On 06/05/14 17:38, Christian Weisgerber wrote: I have a 3TB disk here... sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct fixed naa.5000cca225c5fbeb sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors ... that's serving as a general media dump with a single FFS2 file system on it. Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1d 2.7T2.5T 63.7G98%/export Yesterday, I experienced the odd effect that reading some files, or parts of files, from that disk became excruciatingly slow. We're talking a few kB/s here. Other files were fine. There were no kernel errors/warnings whatsoever. There were no read errors, the disk was just 100% busy and appeared to be returning data drip by drip. # atactl sd1 smartstatus No SMART threshold exceeded No change on reboot. dd(1) from the raw device was initially fast, then slowed to a crawl as it progressed. I eventually fixed it all by powering off the machine, jiggling the SATA connectors (all fine), and powering the machine back up. Tonight the problem is back. Something is very wrong. Given that dd if=/dev/rsd1c also seems affected, the filesystem layer can be excluded. I won't cry too much over a dying disk, but why the heck are there no error indications of any kind? Any other ideas? I think you are relying on the smart system too much. Certainly try what David said, but it's obvious that the disk is sick despite what the smart system may say. I've had about seven disk failures in the last several years. Three or four of them the smart system was absolutely correct, with the others being less informative. I've also had a false notice that a disk was bad, but worked for several years, till it got too small for its task. Smart is good, but it has its limitations. It best deals with gradual errors, not fast catastrophic ones. --STeve Andre'
Re: Weird disk problem
On Thu, Jun 5, 2014, at 05:24 PM, STeve Andre' wrote: On 06/05/14 17:38, Christian Weisgerber wrote: I have a 3TB disk here... sd1 at scsibus1 targ 1 lun 0: ATA, Hitachi HUA72303, MKAO SCSI3 0/direct fixed naa.5000cca225c5fbeb sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors ... that's serving as a general media dump with a single FFS2 file system on it. Filesystem SizeUsed Avail Capacity Mounted on /dev/sd1d 2.7T2.5T 63.7G98%/export Yesterday, I experienced the odd effect that reading some files, or parts of files, from that disk became excruciatingly slow. We're talking a few kB/s here. Other files were fine. There were no kernel errors/warnings whatsoever. There were no read errors, the disk was just 100% busy and appeared to be returning data drip by drip. # atactl sd1 smartstatus No SMART threshold exceeded No change on reboot. dd(1) from the raw device was initially fast, then slowed to a crawl as it progressed. I eventually fixed it all by powering off the machine, jiggling the SATA connectors (all fine), and powering the machine back up. Tonight the problem is back. Something is very wrong. Given that dd if=/dev/rsd1c also seems affected, the filesystem layer can be excluded. I won't cry too much over a dying disk, but why the heck are there no error indications of any kind? Any other ideas? Anything in dmesg/kernel log about operations timing out? I think you are relying on the smart system too much. Certainly try what David said, but it's obvious that the disk is sick despite what the smart system may say. I've had about seven disk failures in the last several years. Three or four of them the smart system was absolutely correct, with the others being less informative. I've also had a false notice that a disk was bad, but worked for several years, till it got too small for its task. Smart is good, but it has its limitations. It best deals with gradual errors, not fast catastrophic ones. 
Running smartmontools should give you enough information to determine if you have a sick disk, though it may require looking at the values and seeing if you have a rise in e.g. the number of sectors remapped; I would not trust atactl sd# smartstatus by itself. Failing that, there are more time-honored empirical tests, such as assuming the worst for the disk's health if it is making weird noises when it slows to a crawl. It could also be either the SATA cabling or the SATA controller that is having trouble after warming up (with specific bit patterns, or just in general). I know that sounds weird, but SATA cables aren't that expensive to replace and it's quite possible the OP got a dud. -- Shawn K. Quinn skqu...@rushpost.com
Disk problem with -current kernel
I ran into a problem when rebooting to a current kernel (i386 GENERIC) due to a secondary disk without an 'a' partition. Disk sd0 checked out fine, but all the partitions on sd1 had bad magic numbers and failed fsck:

/dev/rsd1d: BAD SUPER BLOCK: MAGIC NUMBER WRONG
/dev/rsd1d: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
...
/dev/rsd1n: BAD SUPER BLOCK: MAGIC NUMBER WRONG
/dev/rsd1n: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

Old disklabel sd1:

# Inside MBR partition 0: type A5 start 63 size 71681967
# /dev/rsd1c:
type: SCSI
disk: da0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 4462
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0

15 partitions:
#        size    offset  fstype [fsize bsize  cpg]
  c: 71681967        63  unused      0     0       # Cyl     0*-  4461
  d:  2104452        63  4.2BSD   2048 16384  132  # Cyl     0*-   130
  e:  8385930   2104515  4.2BSD   2048 16384  328  # Cyl   131 -   652
  f: 23294250  48387780  4.2BSD   2048 16384  328  # Cyl  3012 -  4461
  h:  4112640  15936480  4.2BSD   2048 16384  256  # Cyl   992 -  1247
  i:  2104515  40933620  4.2BSD   2048 16384  132  # Cyl  2548 -  2678
  j: 18828180  20049120  4.2BSD   2048 16384  328  # Cyl  1248 -  2419
  k:  5349645  43038135  4.2BSD   2048 16384   16  # Cyl  2679 -  3011
  l:  2056320  38877300  4.2BSD   2048 16384  128  # Cyl  2420 -  2547
  m:  2104515  10490445  4.2BSD   2048 16384  132  # Cyl   653 -   783
  n:  3341520  12594960  4.2BSD   2048 16384  208  # Cyl   784 -   991

New disklabel sd1:

# Inside MBR partition 0: type A5 start 63 size 71681967
# /dev/rsd1c:
type: SCSI
disk: SCSI disk
label: ST336705LW
flags:
bytes/sector: 512
sectors/track: 470
tracks/cylinder: 8
sectors/cylinder: 3760
cylinders: 19036
total sectors: 71687370
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0

16 partitions:
#        size    offset  fstype [fsize bsize  cpg]
  c: 71687370         0  unused      0     0       # Cyl     0 - 19065*
  d:  2097000  34314840  4.2BSD   1024  8192   16  # Cyl  9126*-  9683
  e:  1049040  36411840  4.2BSD   1024  8192   16  # Cyl  9684 -  9962
  f:  4196160  37460880  4.2BSD   1024  8192   16  # Cyl  9963 - 11078
  g:  4196160  41657040  4.2BSD   1024  8192   16  # Cyl 11079 - 12194
  h:  8388560  45853200  4.2BSD   1024  8192   16  # Cyl 12195 - 14425
  i:   530082        63  ext2fs                    # Cyl     0*-   140*
  j:  1060290  16466625  unknown                   # Cyl  4379*-  4661*
  k: 16787925  17526915  ext2fs                    # Cyl  4661*-  9126*
  l: 15936480    530145  ext2fs                    # Cyl   140*-  4379*

I assume this is due to using the new kernel with the old fsck and that installing the next snapshot will fix it. If this is unexpected, please let me know if you want additional information.

Last dmesg, just in case...

OpenBSD 4.0-current (GENERIC) #1141: Sun Oct 8 13:54:04 MDT 2006
    [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 1500MHz (GenuineIntel 686-class) 1.50 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM
real mem = 804384768 (785532K)
avail mem = 725274624 (708276K)
using 4256 buffers containing 40341504 bytes (39396K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 06/06/01, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xf0450 (97 entries)
bios0: Dell Computer Corporation Precision 330
apm0 at bios0: Power Management spec V1.2
apm0: AC on, battery charge unknown
apm0: flags 30102 dobusy 0 doidle 1
pcibios0 at bios0: rev 2.1 @ 0xf/0x1
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfbbb0/176 (9 entries)
pcibios0: PCI Interrupt Router at 000:31:0 (Intel 82801BA LPC rev 0x00)
pcibios0: PCI bus #2 is the last bus
bios0: ROM list: 0xc/0xa800 0xca800/0x5800
cpu0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 Intel 82850 Host rev 0x02
ppb0 at pci0 dev 1 function 0 Intel 82850/82860 AGP rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 NVIDIA Vanta rev 0x15
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 Intel 82801BA AGP rev 0x04
pci2 at ppb1 bus 2
fxp0 at pci2 dev 8 function 0 Intel 8255x rev 0x05, i82558: irq 10, address 00:90:27:86:21:9c
inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 0
ahc0 at pci2 dev 10 function 0 Adaptec AHA-2940U2 U2 rev 0x00: irq 11
Re: Disk problem with -current kernel
Emilio Perea wrote:
> I ran into a problem when rebooting to a current kernel (i386 GENERIC)
> due to a secondary disk without an 'a' partition.

I don't think the lack of an 'a' partition is your problem. Goodness knows, I've got a lot of machines with no 'a' partition on the second and later disks.

> Disk sd0 checked out fine, but all the partitions on sd1 had bad magic
> numbers and failed fsck:
>
> /dev/rsd1d: BAD SUPER BLOCK: MAGIC NUMBER WRONG
> /dev/rsd1d: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
> ...
> /dev/rsd1n: BAD SUPER BLOCK: MAGIC NUMBER WRONG
> /dev/rsd1n: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
>
> Old disklabel sd1:
> ...
> 15 partitions:
> #        size   offset  fstype [fsize bsize cpg]
>   c: 71681967       63  unused      0     0       # Cyl    0*- 4461

huh? c should be the entire disk. There shouldn't be an offset there. I'm not sure that is your problem, but that doesn't look right at all. Messing with the 'c' partition is going to break things.

>   d:  2104452       63  4.2BSD   2048 16384  132  # Cyl    0*-  130
>   e:  8385930  2104515  4.2BSD   2048 16384  328  # Cyl  131 -  652
>   f: 23294250 48387780  4.2BSD   2048 16384  328  # Cyl 3012 - 4461
>   h:  4112640 15936480  4.2BSD   2048 16384  256  # Cyl  992 - 1247
>   i:  2104515 40933620  4.2BSD   2048 16384  132  # Cyl 2548 - 2678
>   j: 18828180 20049120  4.2BSD   2048 16384  328  # Cyl 1248 - 2419
>   k:  5349645 43038135  4.2BSD   2048 16384   16  # Cyl 2679 - 3011
>   l:  2056320 38877300  4.2BSD   2048 16384  128  # Cyl 2420 - 2547
>   m:  2104515 10490445  4.2BSD   2048 16384  132  # Cyl  653 -  783
>   n:  3341520 12594960  4.2BSD   2048 16384  208  # Cyl  784 -  991
>
> New disklabel sd1:

new? old? I'm not following that...

> ...
> 16 partitions:
> #        size   offset  fstype [fsize bsize cpg]
>   c: 71687370        0  unused      0     0       # Cyl     0 - 19065*
>   d:  2097000 34314840  4.2BSD   1024  8192   16  # Cyl  9126*-  9683
>   e:  1049040 36411840  4.2BSD   1024  8192   16  # Cyl  9684 -  9962
>   f:  4196160 37460880  4.2BSD   1024  8192   16  # Cyl  9963 - 11078
>   g:  4196160 41657040  4.2BSD   1024  8192   16  # Cyl 11079 - 12194
>   h:  8388560 45853200  4.2BSD   1024  8192   16  # Cyl 12195 - 14425
>   i:   530082       63  ext2fs                    # Cyl     0*-   140*
>   j:  1060290 16466625  unknown                   # Cyl  4379*-  4661*
>   k: 16787925 17526915  ext2fs                    # Cyl  4661*-  9126*
>   l: 15936480   530145  ext2fs                    # Cyl   140*-  4379*

That's more like it...except you have a lot of partitions crossing cylinder boundaries. That's not a problem, but it makes checking for overlapping partitions more difficult. They may not grossly overlap, but I didn't look for a few sector overlaps...which would really ruin your day if they were there.

> ...
> I assume this is due to using the new kernel with the old fsck and that
> installing the next snapshot will fix it. If this is unexpected, please
> let me know if you want additional information.
>
> Last dmesg, just in case...
>
> OpenBSD 4.0-current (GENERIC) #1141: Sun Oct 8 13:54:04 MDT 2006
> [EMAIL PROTECTED]:/usr/src/sys/arch/i386/compile/GENERIC
> cpu0: Intel(R) Pentium(R) 4 CPU 1500MHz (GenuineIntel 686-class) 1.50 GHz

gotta love a dmesg. :)

However, I'm confused by what you are showing me:
 * a problem including an 'n' partition.
 * an old, misconfigured drive with an 'n' partition
 * a new, seemingly properly configured drive without the 'n' partition

Looks like your drive geometry changed between old and new. I'm curious about why. Usually, that means you changed controllers.

So...if the problem is with the first drive configuration, I'd try again with a proper 'c' partition. Otherwise..I'm confused...which isn't to say I'm not missing something.

Nick.
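Nick's point about hunting for few-sector overlaps can be checked mechanically rather than by eye. The following is a minimal Python sketch, not anything from disklabel itself: the function name `find_overlaps` and the `PARTITIONS` list are made up for illustration, and the sizes and offsets are copied from the "New disklabel sd1" listing, reading the run-together columns for 'i' and 'l' as size 530082 at offset 63 and size 15936480 at offset 530145 (an assumption, though it makes each partition start exactly where the previous one ends).

```python
# Partition entries as (letter, size, offset) in sectors, taken from the
# "New disklabel sd1" listing. 'c' is left out because it spans the whole disk.
# The 'i' and 'l' values assume the fused size/offset columns split as shown.
PARTITIONS = [
    ("d", 2097000, 34314840),
    ("e", 1049040, 36411840),
    ("f", 4196160, 37460880),
    ("g", 4196160, 41657040),
    ("h", 8388560, 45853200),
    ("i", 530082, 63),
    ("j", 1060290, 16466625),
    ("k", 16787925, 17526915),
    ("l", 15936480, 530145),
]


def find_overlaps(partitions):
    """Return (name_a, name_b, overlapping_sectors) for every overlapping pair."""
    ordered = sorted(partitions, key=lambda p: p[2])  # sort by starting offset
    overlaps = []
    for i, (name_a, size_a, off_a) in enumerate(ordered):
        end_a = off_a + size_a  # first sector past the end of partition a
        for name_b, size_b, off_b in ordered[i + 1:]:
            if off_b >= end_a:
                break  # sorted by offset, so nothing later can overlap a either
            overlaps.append((name_a, name_b, min(end_a, off_b + size_b) - off_b))
    return overlaps


if __name__ == "__main__":
    for a, b, sectors in find_overlaps(PARTITIONS):
        print(f"{a} and {b} overlap by {sectors} sectors")
```

With the values above the function returns an empty list, i.e. the new label has no overlaps even at single-sector granularity, cylinder boundaries or not.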
Re: Disk problem with -current kernel
Nick Holland wrote:
> Otherwise..I'm confused...which isn't to say I'm not missing something.

I've been informed that I *was* missing something, that this is a problem which is being dealt with, beatings are being applied (including to me, for missing it...). Disregard my comments...things will be fixed shortly...

Nick.
Re: Disk problem with -current kernel
On Tue, Oct 10, 2006 at 07:01:21PM -0400, Nick Holland wrote:
> Emilio Perea wrote:
> > I ran into a problem when rebooting to a current kernel (i386 GENERIC)
> > due to a secondary disk without an 'a' partition.
>
> I don't think the lack of an 'a' partition is your problem. Goodness
> knows, I've got a lot of machines with no 'a' partition on the second
> and later disks.

No, the problem was due to sd1's MBR partition type being A5 rather than A6. My apologies for not checking that before posting. Last night's change to disksubr.c broke it. Thanks to Thordur and Pedro and Ken and you for help in tracking this down.

I borrowed this disk from a dead server over five years ago and had forgotten that I had not fdisk'd it at the time. It's been running OpenBSD since 2.8...

Mea culpa!

Emilio
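For anyone wanting to see where the A5/A6 byte lives: in a classic MBR the partition table starts at byte 446 of the first sector, holds four 16-byte entries, and the type is the fifth byte of each entry (A6 is the OpenBSD id, A5 the FreeBSD/386BSD one). Below is a rough Python sketch of reading those bytes. The names `mbr_partition_types` and `make_sector` are invented for this example, and the demo sector is synthetic rather than a dump of the disk discussed here.

```python
import struct

MBR_SIZE = 512
TABLE_OFFSET = 446   # partition table position in a classic MBR sector
ENTRY_SIZE = 16      # four entries of 16 bytes each
KNOWN_TYPES = {0xA5: "FreeBSD/386BSD", 0xA6: "OpenBSD"}
ENTRY_FORMAT = "<B3sB3sII"  # flag, CHS start, type, CHS end, LBA start, sector count


def mbr_partition_types(sector):
    """Return (index, type_byte, start_lba, sector_count) for each used entry."""
    if len(sector) < MBR_SIZE or bytes(sector[510:512]) != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for index in range(4):
        begin = TABLE_OFFSET + index * ENTRY_SIZE
        raw = bytes(sector[begin:begin + ENTRY_SIZE])
        _flag, _chs_start, ptype, _chs_end, start, count = struct.unpack(ENTRY_FORMAT, raw)
        if ptype != 0:  # type 0 marks an unused slot
            entries.append((index, ptype, start, count))
    return entries


def make_sector(ptype, start, count):
    """Build a synthetic MBR sector with one active entry (illustration only)."""
    sector = bytearray(MBR_SIZE)
    sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY_SIZE] = struct.pack(
        ENTRY_FORMAT, 0x80, b"\0\0\0", ptype, b"\0\0\0", start, count)
    sector[510:512] = b"\x55\xaa"  # boot signature
    return bytes(sector)


if __name__ == "__main__":
    demo = make_sector(0xA5, 63, 71681967)  # made-up values for the demo
    for index, ptype, start, count in mbr_partition_types(demo):
        label = KNOWN_TYPES.get(ptype, "other")
        print(f"entry {index}: type {ptype:02X} ({label}), start {start}, size {count}")
```

On a live system the same information is easier to get from `fdisk sd1`, which prints the table with the id column; the sketch is only meant to show which byte was wrong.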