Re: Tool for detecting partition superblocks needed
> I can totally confirm Benjamin's analysis. I also have lost a disk (2.5" Seagate Momentus 5400.2) to the load/unload problem. It was used in a Mac Mini running as a server (thus powered on 24x7). As said, the solution to this is setting the power management configuration via hdparm's -B option. There are a few options:
> - laptop-mode-tools
> - /etc/default/hdparm
> - pbbuttonsd

I also recently lost a disk on a powerbook5,6 running lenny. I did not report it because I was using a vanilla kernel and assumed the failure was related to problems with suspend and Xorg. In fact, I lost it while trying to wake the machine up after suspend. If that was indeed also a load/unload problem, I think the problem should be described in red in the Debian README, with suggestions on how to avoid it.

Piotr
--
http://okle.pl

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: Tool for detecting partition superblocks needed
Hi Benjamin, Hermann Kaiser. Hi All

On Fri, Oct 03, 2008 at 12:16:17AM +0200, Benjamin Cama wrote:
> On Thursday, 2 October 2008 at 20:17 +0200, Wolfgang Pfeiffer wrote:
> > Could this also be a simple file system damage?
>
> Errors like that (i.e. in the middle of a DMA interrupt) are not simple FS damage, I am pretty sure.

You were right: I got my machine back from repair in the meantime, and the repair report confirmed your opinion. It said that SMART showed several defective sectors on the disk, and that the disk had to be replaced.

So luckily I'm back now with a fast, working machine and its new disk. Thanks a lot, Benjamin, Hermann and everyone else who was involved: it helped a lot.

Until then ...

Best Regards
Wolfgang
--
heelsbroke.blogspot.com
Re: Tool for detecting partition superblocks needed
Wow, Benjamin, your effort to help is definitely more than I could expect from a mailing list .. :) Thanks a lot: I think I learned a few things from your explanations ...

On Fri, Oct 03, 2008 at 12:16:17AM +0200, Benjamin Cama wrote:
> On Thursday, 2 October 2008 at 20:17 +0200, Wolfgang Pfeiffer wrote:
> > Could this also be a simple file system damage?
>
> Errors like that (i.e. in the middle of a DMA interrupt) are not simple FS damage, I am pretty sure.
>
> > I was hoping it was just something like that, because this hopefully could be fixed with a reinstall, and with a previous low-level formatting like dd if=/dev/urandom of=/dev/hda (not being sure whether the syntax is correct .. )
>
> For this kind of low-level formatting, I would advise you to dd from /dev/zero, as the hard drive controller can try to replace bad sectors if needed when it sees that an all-0 block is written. But when some sectors begin to fail, others will generally soon follow.

That's exactly why I will replace that disk. I shall not take any risks like having to re-install again in a few weeks just because other disk sectors start to break at some point in the future ...

> > Oh, and the Fedora smartctl found (surprise, surprise .. :) a failure on LBA 76724676 and 76724678, too (please see the old log above) ...
>
> Well, this confirms that the error lies in the hardware, not in the file system itself.
>
> > I attach the log made on the broken machine via smartctl -a /dev/hda
>
> So, the very high numbers like Raw_Read_Error_Rate and Seek_Error_Rate are meaningless, I think, but the numbers in Offline_Uncorrectable and UDMA_CRC_Error_Count show that some sectors have already been lost. But what worries me most is the Load_Cycle_Count value: 2898441 is far too high for a disk, but it may be real, as your disk has been spinning for quite some time (1+ hours). This roughly corresponds to a load/unload every 15 s: do you hear a light tick-tack sound from your disk every 15 s or so?

IIRC: yes, I think so.
Currently, though, with the machine booted from the Fedora CD, I don't hear any clicks, although hdparm -I reports the Advanced power management level as being set to 128 ...

> For reference, mine, which has a lifetime of 8000+ hours, has a count of 268656 (ten times less ...). If you take the specs from Seagate for your hard drive ( http://www.seagate.com/docs/pdf/datasheet/disc/ds_momentus5400.2_120gb.pdf ), you'll see it's made for at most 60 load/unload cycles.

That's why I'll want a Seagate again: Just think about what might have happened with disk settings changed to no power management, like you suggested below, with hdparm -B .. :)

> I am worried about it because this is what I saw on a lot of Apple laptops and, as you may have understood, on the hard drive of one of my lost iBooks. This made the news some time ago, not especially for Apple's disks, but when Ubuntu was said to be killing hard drives: some vendor BIOSes did not set the power-saving mode of the hard drive correctly, which led the drives to load/unload too often and die in a very short time.

The first disk shipped with my old Titanium IV PB broke after around 2 years, IIRC ... and the current one (a replacement disk) in that machine, with more than 7100 Power_On_Hours and a Load_Cycle_Count of 366793, will probably fail in the near future, judging by all the ugly sounds I hear from the machine while it is up .. and I have now disabled power management on this machine via hdparm ... let's see .. :)

> AFAIK, OpenFirmware does not set any power-saving mode at all, and neither does OS X, and by default a lot of disks are in a maximum power-saving mode, which unloads the hard drive head very often, thus consuming less energy but shortening the life of the hard drive.
> [ ... ]
> You can change these settings with the -B option of hdparm, for example: hdparm -B254 /dev/hda

Let's see whether this is preserved and kept over reboots ...
:)

> disables power management on most hard drives (the value is drive dependent; most of the time 255 or 254 disables power saving). I think this is what the laptop-mode package does when you set your hard drive to no-PM mode.
>
> > > What do you mean by see hda7 ?
> >
> > to see in the sense of to detect ... mac-fdisk detected the damaged partition in the Debian installer, IIRC ..
>
> What I meant is that, if you can see some partition in mac-fdisk, you can see them all. But this doesn't mean the FS on them is not failing.

parted detects that partition, but does not see any FS on it. That is, when I type print in parted for /dev/hda, the column where parted is supposed to report the FS is empty for 7 (which should be /dev/hda7). An FS is reported for 6 (/home) and 5 (/var), and swap for 4. Even hfs+ is detected correctly for #2, where the small OS X partition sits ...

> When you said you didn't see hda7 but you did see the others, it sounded strange to me.
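To compare figures like the Load_Cycle_Count and Power_On_Hours values quoted in this exchange, the raw value of a single attribute can be pulled out of the smartctl attribute table. This is only a sketch: the helper name smart_raw_value is made up for illustration, and it assumes the usual ten-column table printed by smartctl -A, with the raw value in the last column.

```shell
# Print the raw value (10th column) of one SMART attribute.
# Reads `smartctl -A` output on stdin; $1 is the attribute name.
# Assumption: the standard ten-column attribute table layout.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Typical use on the machine itself (needs root and smartmontools):
#   smartctl -A /dev/hda | smart_raw_value Load_Cycle_Count
#   smartctl -A /dev/hda | smart_raw_value Power_On_Hours
# The current APM level is reported by:
#   hdparm -I /dev/hda | grep -i 'advanced power management'
```

Running the two smartctl lines before and after changing the -B setting, a few hours apart, should show whether the load cycle counter has slowed down.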
Re: Tool for detecting partition superblocks needed
Hi all,

On 3 Oct, this message from Benjamin Cama echoed through cyberspace:
> But what worries me most is the Load_Cycle_Count value: 2898441 is far too high for a disk, but it may be real, as your disk has been spinning for quite some time (1+ hours). This roughly corresponds to a load/unload every 15 s

I can totally confirm Benjamin's analysis. I have also lost a disk (2.5" Seagate Momentus 5400.2) to the load/unload problem. It was used in a Mac Mini running as a server (thus powered on 24x7).

As said, the solution to this is setting the power management configuration via hdparm's -B option. There are a few options:
- laptop-mode-tools
- /etc/default/hdparm
- pbbuttonsd

All of them include ways to set hdparm options.

Cheers
michel

Michel Lanners          | Read Philosophy. Study Art.
23, Rue Paul Henkes     | Ask Questions. Make Mistakes.
L-1710 Luxembourg       |
email [EMAIL PROTECTED] |
http://www.cpu.lu/~mlan | Learn Always.
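For the /etc/default/hdparm route listed above, a minimal fragment might look like the following. Treat it as a sketch: the two variable names are given as recalled from the Debian hdparm package of that era and should be checked against the comments in your own copy of the file, and the -B254 value is the one suggested elsewhere in this thread (the right value is drive dependent).

```shell
# /etc/default/hdparm (sketch; verify the variable names against the
# comments shipped in your own copy of this file)

# Disks the options below should be applied to at boot.
harddisks="/dev/hda"

# Options passed to hdparm for each of those disks.
# -B254 is the APM level suggested in this thread.
hdparm_opts="-B254"
```

Since the setting is lost when the drive is power-cycled, something has to reapply it at boot; laptop-mode-tools and pbbuttonsd have their own configuration files for the same purpose.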
Re: Tool for detecting partition superblocks needed
Hi Benjamin, hi All

Benjamin, firstly: thanks for your effort ..

OK, and the question I was asking is answered: I took a Fedora 9 CD for ppc, booted the affected machine, and found that Fedora has the tools one needs for a disk recovery. smartctl is there, plus dumpe2fs, and even less (which I was badly missing on the Debian installer, IIRC) etc. etc. ... Looks nice ...

[note: the German keyboard on Fedora was missing the bar sign (|) .. so I chose a US English keyboard layout, which had some sign similar to that .. ]

On Thu, Oct 02, 2008 at 12:49:36AM +0200, Benjamin Cama wrote:
> On Wednesday, 1 October 2008 at 22:29 +0200, Wolfgang Pfeiffer wrote:
> > Excerpt from the Debian install syslog, cut:
> >
> > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
> > kernel: ide: failed opcode was: unknown
> > kernel: end_request: I/O error, dev hda, sector 76724680
> > kernel: Buffer I/O error on device hda7, logical block 1
> > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724676, high=4, low=9615812, sector=76724676
> > kernel: ide: failed opcode was: unknown
> > kernel: end_request: I/O error, dev hda, sector 76724676
> > kernel: Buffer I/O error on device hda7, logical block 0
> > kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> > kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
> > kernel: ide: failed opcode was: unknown
> > kernel: end_request: I/O error, dev hda, sector 76724680
> >
> > ... and the errors on sectors 76724676 and 76724680 are reported again and again ... and only these 2 sectors, IINM ..
>
> These are quite severe hardware errors,

Could this also be a simple file system damage?
I was hoping it was just something like that, because this hopefully could be fixed with a reinstall, and with a previous low-level formatting like dd if=/dev/urandom of=/dev/hda (not being sure whether the syntax is correct .. )

Oh, and the Fedora smartctl found (surprise, surprise .. :) a failure on LBA 76724676 and 76724678, too (please see the old log above) ...

I attach the log made on the broken machine via smartctl -a /dev/hda

> I would advise you to do some complete copy of your harddrive right now before it gets worse. Just dd your hda, or use some other recovery software (like http://www.garloff.de/kurt/linux/ddrescue/ ). If only hda7 is affected, I think this is only the beginning, and if you have backups it would be better.

I've done some backup already for /home on hda6 ... and /etc on hda7 (the failing partition) isn't that important ...

> > Failing hda7 is the root partition. hda6, not being listed here, with quite a few errors, was recoverable ... tho it's not quite clear whether these hda6 errors started while I was playing with the Debian Installer (Etch, IIRC) ...
>
> This may not be related. Maybe you could try to run smartctl -a from here,

As I said: smartctl is missing on the Debian installer, or at least I didn't find it ...

> to get some (hopefully) useful stats from it.
>
> > IIRC: the installer later on didn't even see hda7, whereas mac-fdisk did (took quite some time for it to finish the detection ...).
>
> What do you mean by see hda7 ?

to see in the sense of to detect ... mac-fdisk detected the damaged partition in the Debian installer, IIRC ..

> > Apple Hardware Test on the OS X install CD seems to hang/crash, or simply does not load even after 15 Minutes or so ...
>
> Well, being on the same IDE controller might not help. Can you remove the HDD from your laptop for further analysis ?

No, not that easily .. It's a Powerbook5,8: if I manage to remove the disk from it (there must be instructions somewhere on www) I'll install a new one.
No time to waste, because my old tibook, where I'm typing this email, is making strange noises already. Looks like I'm in need of quick decisions, besides working hardware ... :)

> BTW, how old is your powerbook ?

a little more than 2.5 years. Still with the same disk installed that was shipped with the machine ..

Regards, guys. And Thanks
Wolfgang
--
heelsbroke.blogspot.com

smartctl version 5.38 [powerpc-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.2 series
Device Model:     ST9808211A
Serial Number:    3LF2V7HB
Firmware Version: 3.07
User Capacity:    80,026,361,856 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Thu Oct 2 18:07:41 2008 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART
Re: Tool for detecting partition superblocks needed
On Thursday, 2 October 2008 at 20:17 +0200, Wolfgang Pfeiffer wrote:
> Could this also be a simple file system damage?

Errors like that (i.e. in the middle of a DMA interrupt) are not simple FS damage, I am pretty sure.

> I was hoping it was just something like that, because this hopefully could be fixed with a reinstall, and with a previous low-level formatting like dd if=/dev/urandom of=/dev/hda (not being sure whether the syntax is correct .. )

For this kind of low-level formatting, I would advise you to dd from /dev/zero, as the hard drive controller can try to replace bad sectors if needed when it sees that an all-0 block is written. But when some sectors begin to fail, others will generally soon follow.

> Oh, and the Fedora smartctl found (surprise, surprise .. :) a failure on LBA 76724676 and 76724678, too (please see the old log above) ...

Well, this confirms that the error lies in the hardware, not in the file system itself.

> I attach the log made on the broken machine via smartctl -a /dev/hda

So, the very high numbers like Raw_Read_Error_Rate and Seek_Error_Rate are meaningless, I think, but the numbers in Offline_Uncorrectable and UDMA_CRC_Error_Count show that some sectors have already been lost. But what worries me most is the Load_Cycle_Count value: 2898441 is far too high for a disk, but it may be real, as your disk has been spinning for quite some time (1+ hours). This roughly corresponds to a load/unload every 15 s: do you hear a light tick-tack sound from your disk every 15 s or so?

For reference, mine, which has a lifetime of 8000+ hours, has a count of 268656 (ten times less ...). If you take the specs from Seagate for your hard drive ( http://www.seagate.com/docs/pdf/datasheet/disc/ds_momentus5400.2_120gb.pdf ), you'll see it's made for at most 60 load/unload cycles.

I am worried about it because this is what I saw on a lot of Apple laptops and, as you may have understood, on the hard drive of one of my lost iBooks.
This made the news some time ago, not especially for Apple's disks, but when Ubuntu was said to be killing hard drives: some vendor BIOSes did not set the power-saving mode of the hard drive correctly, which led the drives to load/unload too often and die in a very short time. AFAIK, OpenFirmware does not set any power-saving mode at all, and neither does OS X, and by default a lot of disks are in a maximum power-saving mode, which unloads the hard drive head very often, thus consuming less energy but shortening the life of the hard drive.

For some years, I've been seeing a lot of Apple laptops (4 with my own eyes, more on internet forums) fail after a bit more than a year (mostly low-end laptops like the iBook, whose hard drive is made to bear no more than 30 load/unload cycles). Most of them were not running Linux, just OS X. I don't know if this should be publicized more loudly, because I wasn't able to verify it on the failing laptops that I didn't handle myself but only heard had failed quite early in their life. But this is, I think, one of the main reasons hard drives seem so fragile today: the autopsy shows that the number of load/unload cycles really exceeded what the vendor specifies, and since a lot of vendors want their laptops to save energy, they use aggressive settings to gain some battery life.

You can change these settings with the -B option of hdparm, for example: hdparm -B254 /dev/hda disables power management on most hard drives (the value is drive dependent; most of the time 255 or 254 disables power saving). I think this is what the laptop-mode package does when you set your hard drive to no-PM mode.

> > What do you mean by see hda7 ?
>
> to see in the sense of to detect ... mac-fdisk detected the damaged partition in the Debian installer, IIRC ..

What I meant is that, if you can see some partition in mac-fdisk, you can see them all. But this doesn't mean the FS on them is not failing.
When you said you didn't see hda7 but you did see the others, it sounded strange to me.

> No, not that easily .. It's a Powerbook5,8: if I manage to remove the disk from it (there must be instructions somewhere on www) I'll reinstall a new one. No time to waste, because my old tibook, where I'm typing this email, is making strange noises already. Looks like I'm in need of quick decisions, besides working hardware ... :)

Well, the instructions from macfixit are good; I already disassembled a 12" iBook and a 15" PB with them. And it looks like you will need them soon ...

> > BTW, how old is your powerbook ?
>
> a little more than 2.5 years. Still with the same disk installed that was shipped with the machine ..

My iBook lasted a bit more than a year, with a hard drive spec'ed for half the load/unload cycle count ... draw your own conclusions.

Regards,
benjamin
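The "load/unload every N seconds" estimate in this message is plain arithmetic on two SMART raw values: power-on time in seconds divided by the load cycle count. A small sketch, with a made-up function name and integer division (so the result is rounded down):

```shell
# Average number of seconds between head load/unload cycles.
#   $1 = Power_On_Hours, $2 = Load_Cycle_Count (both SMART raw values)
seconds_per_load_cycle() {
    echo $(( $1 * 3600 / $2 ))
}

# Benjamin's reference disk: roughly 8000 hours and 268656 cycles.
seconds_per_load_cycle 8000 268656    # prints 107

# To stop the aggressive unloading, as suggested in the message
# (needs root; the right value is drive dependent):
#   hdparm -B254 /dev/hda
```

So the reference disk unloads its heads about once every two minutes on average, against the one unload every 15 s or so estimated for the failing disk.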
Tool for detecting partition superblocks needed
Hi All

I can't e2fsck my root partition any more - it looks seriously like that partition went belly up on me. e2fsck tells me something like the superblock could not be read ...

The Debian install CD (Etch) does not have dumpe2fs, which normally should yield information on the partition's backup superblocks. Also, the CD does not seem to offer the option of installing missing tools/packages to RAM ..

Any ideas how to see or recover the superblocks? Tools on the Etch CD that might help with that?

Affected machine is a Powerbook5,8

Best Regards
Wolfgang
--
heelsbroke.blogspot.com
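Without dumpe2fs, the likely positions of the backup superblocks can still be worked out by hand, because ext2/ext3 place them at predictable offsets. The sketch below is not a replacement for dumpe2fs: it assumes a filesystem created with the sparse_super feature and the default number of blocks per group (8 times the block size), and the function names are made up for illustration.

```shell
# Succeeds if $1 is an exact power of $2 (1 counts as base^0).
is_power_of() {
    n=$1
    while [ $(( n % $2 )) -eq 0 ]; do
        n=$(( n / $2 ))
    done
    [ "$n" -eq 1 ]
}

# List candidate backup superblock locations.
#   $1 = filesystem block size in bytes
#   $2 = number of block groups to scan
# Assumes sparse_super: backups live in groups that are powers of 3, 5 or 7.
backup_superblocks() {
    per_group=$(( $1 * 8 ))        # default blocks per group
    first=0                        # first data block is 1 for 1 KiB blocks
    if [ "$1" -eq 1024 ]; then first=1; fi
    g=1
    while [ "$g" -lt "$2" ]; do
        if is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7; then
            echo $(( g * per_group + first ))
        fi
        g=$(( g + 1 ))
    done
}

backup_superblocks 4096 10
# One of the printed numbers can then be tried, for example:
#   e2fsck -b 32768 /dev/hda7
```

Where mke2fs is available, running it with the -n option and the same parameters as the original mkfs prints the same list without writing anything to the disk. None of this helps, of course, if the sectors themselves can no longer be read.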
Re: Tool for detecting partition superblocks needed
Hi

On Wed, Oct 01, 2008 at 04:46:11PM +0200, Wolfgang Pfeiffer wrote:
> Hi All
>
> I can't e2fsck my root partition any more - looks seriously like that partition went belly up on me. e2fsck tells me something like the superblock could not be read ...
>
> The Debian install CD (Etch) does not have dumpe2fs - which normally should yield information of the partition backup superblocks. Also the CD does not seem to offer the option of installing missing tools/packages to RAM ..
>
> Any ideas how to see or recover the superblocks? Tools on the Etch CD that might help for that?
>
> Affected machine is a Powerbook5,8a

If you are able to download and burn a CD, trying one of the Lenny beta CDs might help. They have a special rescue mode included. I'm not sure though if dumpe2fs is included.

Gaudenz
--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~
Re: Tool for detecting partition superblocks needed
Hi Gaudenz

Thanks a lot for your response. After quite a few tests I'm still stuck with a root partition that won't load. That is, I get to the first boot prompt, from where I can choose to start a CD, Linux or OS X. Both the CD and OS X can be booted. Linux doesn't boot.

On Wed, Oct 01, 2008 at 05:35:45PM +0200, Gaudenz Steinlin wrote:
> Hi
>
> On Wed, Oct 01, 2008 at 04:46:11PM +0200, Wolfgang Pfeiffer wrote:
> > Hi All
> >
> > I can't e2fsck my root partition any more - looks seriously like that partition went belly up on me.
> > [ ... ]
> > Affected machine is a Powerbook5,8a

Typo. Should say: Powerbook5,8

> If you are able to download and burn a CD, trying one of the Lenny beta CDs might help. They have a special rescue mode included. I'm not sure though if dumpe2fs is included.

Didn't help: neither dumpe2fs nor smartctl is on that CD ...

The strange thing is that OS X, on its own partition on the same drive, is still booting. I even successfully updated OS X on its disk partition ... So it's not clear to me yet whether this disk (hda) is really hosed or not.
Excerpt from the Debian install syslog, cut:

kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
kernel: ide: failed opcode was: unknown
kernel: end_request: I/O error, dev hda, sector 76724680
kernel: Buffer I/O error on device hda7, logical block 1
kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724676, high=4, low=9615812, sector=76724676
kernel: ide: failed opcode was: unknown
kernel: end_request: I/O error, dev hda, sector 76724676
kernel: Buffer I/O error on device hda7, logical block 0
kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
kernel: ide: failed opcode was: unknown
kernel: end_request: I/O error, dev hda, sector 76724680

... and the errors on sectors 76724676 and 76724680 are reported again and again ... and only these 2 sectors, IINM ..

Failing hda7 is the root partition. hda6, not being listed here, with quite a few errors, was recoverable ... though it's not quite clear whether these hda6 errors started while I was playing with the Debian Installer (Etch, IIRC) ...

IIRC: the installer later on didn't even see hda7, whereas mac-fdisk did (it took quite some time to finish the detection ...).

Apple Hardware Test on the OS X install CD seems to hang/crash, or simply does not load even after 15 minutes or so ...

Oh, and I was playing with libfreevec, on Linux, before this happened ...

Whatever. If everyone else understands as much as I do, I'll try to reinstall (someone ask me why I hate that idea ... :)

Thanks again, Gaudenz ... :)

Best Regards
Wolfgang
--
heelsbroke.blogspot.com
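When a log repeats the same errors over and over, it helps to reduce it to the distinct failing sectors. A small sketch (the function name is made up; it relies on grep -o, which GNU and BSD grep both provide, and on the "end_request: I/O error, dev X, sector N" wording seen in the excerpt above):

```shell
# Read kernel log text on stdin and print each distinct
# "device sector" pair found in end_request I/O error lines.
failing_sectors() {
    grep -o 'dev [a-z0-9]*, sector [0-9]*' \
        | tr -d ',' \
        | awk '{ print $2, $4 }' \
        | sort -u
}

# Typical use (the log path depends on the system):
#   failing_sectors < /var/log/syslog
```

Fed the excerpt above, this should list two sectors on hda, 76724676 and 76724680, which correspond to logical blocks 0 and 1 of hda7.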
Re: Tool for detecting partition superblocks needed
On Wednesday, 1 October 2008 at 22:29 +0200, Wolfgang Pfeiffer wrote:
> Excerpt from the Debian install syslog, cut:
>
> kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
> kernel: ide: failed opcode was: unknown
> kernel: end_request: I/O error, dev hda, sector 76724680
> kernel: Buffer I/O error on device hda7, logical block 1
> kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724676, high=4, low=9615812, sector=76724676
> kernel: ide: failed opcode was: unknown
> kernel: end_request: I/O error, dev hda, sector 76724676
> kernel: Buffer I/O error on device hda7, logical block 0
> kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=76724680, high=4, low=9615816, sector=76724680
> kernel: ide: failed opcode was: unknown
> kernel: end_request: I/O error, dev hda, sector 76724680
>
> ... and the errors on sectors 76724676 and 76724680 are reported again and again ... and only these 2 sectors, IINM ..

These are quite severe hardware errors. I would advise you to make a complete copy of your hard drive right now, before it gets worse. Just dd your hda, or use some other recovery software (like http://www.garloff.de/kurt/linux/ddrescue/ ). If only hda7 is affected, I think this is only the beginning, and it would be better to have backups.

> Failing hda7 is the root partition. hda6, not being listed here, with quite a few errors, was recoverable ... tho it's not quite clear whether these hda6 errors started while I was playing with the Debian Installer (Etch, IIRC) ...

This may not be related. Maybe you could try to run smartctl -a from here, to get some (hopefully) useful stats from it.
> IIRC: the installer later on didn't even see hda7, whereas mac-fdisk did (took quite some time for it to finish the detection ...).

What do you mean by "see hda7"? If (mac-)fdisk isn't even able to list your partitions, one of the first blocks of your hard drive may be damaged too.

> Apple Hardware Test on the OS X install CD seems to hang/crash, or simply does not load even after 15 Minutes or so ...

Well, being on the same IDE controller might not help. Can you remove the HDD from your laptop for further analysis?

BTW, how old is your powerbook? I have seen so many iBooks less than 2 years old fail; PBs might last longer, but not by much ...

benjamin
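The "just dd your hda" advice in this message can be sketched as follows. The wrapper name and the target path are hypothetical; the dd options themselves are standard. A dedicated rescue tool such as the one linked in the message copes better with bad sectors than plain dd, so this is only the fallback when nothing else is at hand.

```shell
# Copy a block device (or any file) to an image, continuing past read errors.
#   conv=noerror  keep going after a read error instead of aborting
#   conv=sync     pad short or failed reads with zeros so offsets stay aligned
# Note: with conv=sync the image is padded up to a multiple of the block size.
image_disk() {
    dd if="$1" of="$2" bs=4096 conv=noerror,sync
}

# Intended use (needs root, and enough free space on ANOTHER disk;
# the target path is only an example):
#   image_disk /dev/hda /mnt/backup/hda.img
```

A small block size keeps the amount of data lost around each unreadable sector low, at the cost of a slower copy.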