Re: read error on superblock
On Tue, 24 Jul 2012 09:52:18 +0200, dexen deVries wrote:
> Hi Vyacheslav,
>
> On Tuesday 24 of July 2012 10:26:37 you wrote:
> > I am afraid that this is not so good from the end user's point of view. First of all, the message
> >
> >   mount: /dev/sda3: can't read superblock
> >
> > can confuse the user. The real cause is bad sectors inside the volume, but the user is told that the superblock cannot be read.
> >
> > Secondly, there are situations where a volume really has to be used even though it contains bad sectors. And I think users can expect such behavior from NILFS because of its declared reliability. Unfortunately, as far as I understand, NILFS has no bad blocks table and cannot correctly handle bad blocks on a volume. That means NILFS treats bad blocks as an exceptional case. But from my point of view, it makes sense to treat bad blocks as a normal occurrence and to try to keep working in their presence. For example, fsck could potentially check a NILFS volume for bad blocks, construct a bad blocks table, and save it on the volume.

NILFS doesn't have a sector-based bad blocks table, but it has an error flag in the segment usage file (sufile). If a segment is marked 'erroneous', it will not be allocated. At present, this is integrated neither with badblocks (mkfs.nilfs2) nor with the recovery logic. However, it could be applied for this purpose if needed.

> > I suggest adding a virtual special file that describes bad blocks. It can be represented by an inode in the ifile, and all bad blocks can be described in the DAT file as parts of this virtual special file. As a result, the NILFS file system driver will have a bad blocks table that can serve as a basis for excluding bad blocks from operation and for trying to survive on an unreliable device. What do you think about this idea?
>
> I believe bad sectors to be mostly a thing of the past; any decent hard drive (probably also any decent SSD) should re-map them after some re-reads. Some data and meta-data loss is possible, but overall the FS should be accessible again.
I agree with this opinion.

If a sector-based bad blocks table is sorely needed, it is worth considering, but it should at least be optional, not mandatory. And even if it is well implemented as an option, it still looks like overkill, because most recent hard drives have alternate sectors internally and most recent flash-based drives have their own remap mechanism. Moreover, how a device gets corrupted depends heavily on the nature and configuration of the underlying block device. In this sense, an in-device or in-driver solution looks better to me. The bad blocks table is about to become a thing of the past; it is almost a relic of the floppy drive era.

> I have no idea why my particular HDD did not re-map; perhaps it just takes much longer than I gave it.
>
> As a point of reference, XFS does not do bad block management either; however, the partition driver of IRIX does bad sector management -- so it is implemented one layer below the FS.

Yes. If we implement some kind of redundancy mechanism in the FS layer, it absolutely should reflect how data integrity should be enhanced in the FS layer.

With regards,
Ryusuke Konishi

> I guess it /may be/ possible to use Linux' `dm' driver in such manner.
>
> Cheers,
> --
> dexen deVries
>
> [[[↓][→]]]
>
> all dichotomies are either true or false is a true paradox because it's paradoxical only if it is a paradox ;)

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
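The sufile error flag mentioned in this message can be illustrated with a toy model. This is only a sketch and not NILFS code: the `ToySufile` class, its method names, the 4 KB block size, and the 2048 blocks per segment are all assumptions chosen for illustration, and the model ignores where the first segment really starts on disk. It shows the idea that a damaged byte range flags the segments it overlaps, and that the allocator then skips flagged segments.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096          # assumption: 4 KB blocks
BLOCKS_PER_SEGMENT = 2048  # assumption: illustrative value only


@dataclass
class ToySufile:
    """Toy stand-in for a segment usage table with a per-segment error flag."""
    nsegments: int
    erroneous: set = field(default_factory=set)
    in_use: set = field(default_factory=set)

    def mark_bad_range(self, start: int, length: int) -> None:
        """Flag every segment overlapping the byte range [start, start + length)."""
        seg_bytes = BLOCK_SIZE * BLOCKS_PER_SEGMENT
        first = start // seg_bytes
        last = (start + length - 1) // seg_bytes
        for segnum in range(first, min(last, self.nsegments - 1) + 1):
            self.erroneous.add(segnum)

    def allocate(self) -> int:
        """Return the lowest free segment that is not flagged erroneous."""
        for segnum in range(self.nsegments):
            if segnum not in self.in_use and segnum not in self.erroneous:
                self.in_use.add(segnum)
                return segnum
        raise RuntimeError("no clean segment available")
```

With this granularity a single unreadable 4 KB block takes a whole segment out of service, which is coarser than a sector-based table but needs no new on-disk structure.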
RE: read error on superblock
Hello All,

I see in this thread, what I think is a misunderstanding of the role of the disk drive in the face of a hard read error. The drive cannot simply map an unreadable sector to a new sector based on a read failure. If the read has failed, the drive does not contain the correct contents for the sector. The read failure needs to persist until a write is received for the unreadable sector. When the write is received, the new data can be written to a good sector and the sector map adjusted. One of the jobs of RAID is to reconstruct the data from other sources and write the correct data back to the same sector of the drive allowing the drive to do this remapping. If you are not using RAID software or hardware, there is typically no way to reconstruct the data.

If the read error is correctable using ECC, the drive does know the proper contents for the sector and could choose to re-map it, but likely will not do so. This could be done without reporting the error to the host system.

It is my understanding that ECC errors detected within the drive are not at all uncommon. If the ECC can correct the error, the valid data is typically returned, and the drive moves on to the next request. If ECC cannot correct the error, the first thing the drive will do is attempt to re-read the media. If it is able to read the data the next time, even if it had to use ECC to correct it, it will still return the valid data and may move on to the next request. At the file system level, a slow read would be observed, not a read error.

The behavior of the drive firmware is vendor specific. Sometimes it is configurable. The behavior of the firmware will vary across different classes and generations of drives even from the same vendor. Drive firmware that makes up its own data and remaps the sector to correct a read error should never be sold by a reputable drive vendor.

The origin of the bad block table in the file system pre-dates drive hardware sector re-mapping.
When it was not likely that writing to a sector whose contents were previously unreadable would result in being able to read that sector back again, it was a good idea not to write anything there in the future. With modern drive technologies, it is likely that a write to a previously unreadable sector will result in being able to read back the newly written data. The value of a bad block map in the file system is now minimized.

In addition, with hardware and software RAID technology now available to everyone, many volumes will never in their lifetime return a single read error to a file system. Errors are hidden and corrected at lower levels. The file system observes perfect media or, in a catastrophic failure of a RAID system, media offline.

I recommend assuming modern storage devices and subsystems, and focusing development efforts on the file system issues that remain.

Thanks,
Nick Martin

-----Original Message-----
From: linux-nilfs-ow...@vger.kernel.org [mailto:linux-nilfs-ow...@vger.kernel.org] On Behalf Of Ryusuke Konishi
Sent: Tuesday, July 24, 2012 11:47 AM
To: dexen deVries; Vyacheslav Dubeyko
Cc: linux-nilfs@vger.kernel.org
Subject: Re: read error on superblock

On Tue, 24 Jul 2012 09:52:18 +0200, dexen deVries wrote:
> Hi Vyacheslav,
>
> On Tuesday 24 of July 2012 10:26:37 you wrote:
> > I am afraid that this is not so good from the end user's point of view. First of all, the message
> >
> >   mount: /dev/sda3: can't read superblock
> >
> > can confuse the user. The real cause is bad sectors inside the volume, but the user is told that the superblock cannot be read.
> >
> > Secondly, there are situations where a volume really has to be used even though it contains bad sectors. And I think users can expect such behavior from NILFS because of its declared reliability. Unfortunately, as far as I understand, NILFS has no bad blocks table and cannot correctly handle bad blocks on a volume. That means NILFS treats bad blocks as an exceptional case. But from my point of view, it makes sense to treat bad blocks as a normal occurrence and to try to keep working in their presence. For example, fsck could potentially check a NILFS volume for bad blocks, construct a bad blocks table, and save it on the volume.

NILFS doesn't have a sector-based bad blocks table, but it has an error flag in the segment usage file (sufile). If a segment is marked 'erroneous', it will not be allocated. At present, this is integrated neither with badblocks (mkfs.nilfs2) nor with the recovery logic. However, it could be applied for this purpose if needed.

> > I suggest adding a virtual special file that describes bad blocks. It can be represented by an inode in the ifile, and all bad blocks can be described in the DAT file as parts of this virtual special file. As a result, the NILFS file system driver will have a bad blocks table that can serve as a basis for excluding bad blocks from operation and for trying to survive in the not good
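The drive behavior Nick Martin describes (a failed read stays failed until the sector is rewritten, and only the write triggers remapping) can be sketched as a toy model. This is an illustration under stated assumptions, not a description of any real firmware: the `ToyDrive` class, its method names, and the 512-byte sector size are hypothetical.

```python
import errno

SECTOR_SIZE = 512  # assumption: illustrative sector size


class ToyDrive:
    """Toy model: an unreadable sector stays unreadable until it is rewritten."""

    def __init__(self, nsectors: int, spares: int) -> None:
        self.data = {lba: bytes(SECTOR_SIZE) for lba in range(nsectors)}
        self.unreadable = set()   # sectors whose contents are lost
        self.remapped = set()     # sectors now served from a spare
        self.spares_left = spares

    def fail_sector(self, lba: int) -> None:
        """Simulate media damage: the old contents are gone for good."""
        self.unreadable.add(lba)
        self.data.pop(lba, None)

    def read(self, lba: int) -> bytes:
        # The drive has no valid data to return, so it must keep reporting
        # the error instead of remapping and inventing contents.
        if lba in self.unreadable:
            raise OSError(errno.EIO, "unrecoverable read error")
        return self.data[lba]

    def write(self, lba: int, payload: bytes) -> None:
        # A write supplies known-good contents, so remapping is now safe.
        if lba in self.unreadable:
            if self.spares_left == 0:
                raise OSError(errno.EIO, "no spare sectors left")
            self.spares_left -= 1
            self.remapped.add(lba)
            self.unreadable.discard(lba)
        self.data[lba] = payload
```

In this model a RAID layer would be the component that reconstructs the lost payload from other members and calls `write()` on the failed sector; without such a layer nothing ever issues that write, and the read error persists.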
Re: read error on superblock
Hi,

On Mon, 2012-07-23 at 10:45 +0200, dexen deVries wrote:
> Hi list,
>
> a harddrive got some bad sectors and now one NILFS filesystem can't be mounted;
>
>   mount: /dev/sda3: can't read superblock
>
> I can (try to) copy this filesystem to another drive; how do I proceed from that point? Does it make any sense to substitute another superblock for this one? (either to use a spare superblock, if such exists, or put a new superblock on the damaged sector(s)).
>
> Regards,

There is a second superblock at the end of a NILFS volume. But it may not be fully synchronized with the primary one (as I guess). Theoretically, it is possible to copy the secondary superblock over the primary one. But I am afraid that the NILFS volume can be in an inconsistent state anyway.

Are you sure that this volume doesn't contain other damaged sectors?

With the best regards,
Vyacheslav Dubeyko.
Re: read error on superblock
Hi Dexen,

On Mon, 2012-07-23 at 11:24 +0200, dexen deVries wrote:
> Hi Vyacheslaw,
>
> On Monday 23 of July 2012 13:17:28 you wrote:
> > There is a second superblock at the end of a NILFS volume. But it may not be fully synchronized with the primary one (as I guess). Theoretically, it is possible to copy the secondary superblock over the primary one. But I am afraid that the NILFS volume can be in an inconsistent state anyway.
>
> any hints how to locate the superblock? what offset to look at, and what's the magic number(s)?

Usually, the secondary superblock is located in the last block (4 KB) of the volume. nilfs2_fs.h contains this definition:

  #define NILFS_SB2_OFFSET_BYTES(devsize)  ((((devsize) >> 12) - 1) << 12)

which defines the placement of the secondary superblock (devsize is the size of the device in bytes). The magic number of NILFS2 is 0x3434. It is located at byte offset 0x0006 from the beginning of the superblock.

> > Are you sure that this volume doesn't contain other damaged sectors?
>
> in my case, that doesn't matter: all the data i want to recover is either in Git (which does internal consistency checks) or in MySQL, which also does /some/ consistency checks.
>
> Cheers,

With the best regards,
Vyacheslav Dubeyko.
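The offset macro and magic number given above can be turned into a small check against a copied image. This is a minimal sketch: the function names are hypothetical, and reading the magic as a little-endian 16-bit value is an assumption based on the on-disk format using little-endian fields.

```python
import os
import struct

NILFS_SUPER_MAGIC = 0x3434  # magic number quoted in the message above
MAGIC_OFFSET = 0x0006       # byte offset of the magic inside the superblock


def sb2_offset_bytes(devsize: int) -> int:
    """Python equivalent of NILFS_SB2_OFFSET_BYTES: start of the last full 4 KB block."""
    return ((devsize >> 12) - 1) << 12


def has_nilfs_magic(block: bytes) -> bool:
    """Check for the NILFS2 magic (assumed little-endian) at offset 6."""
    if len(block) < MAGIC_OFFSET + 2:
        return False
    (magic,) = struct.unpack_from("<H", block, MAGIC_OFFSET)
    return magic == NILFS_SUPER_MAGIC


def find_secondary_superblock(image_path: str):
    """Return (offset, magic_ok) for the secondary superblock of an image or device."""
    with open(image_path, "rb") as f:
        devsize = f.seek(0, os.SEEK_END)  # works for files and block devices
        offset = sb2_offset_bytes(devsize)
        f.seek(offset)
        block = f.read(4096)
    return offset, has_nilfs_magic(block)
```

Run against a read-only copy rather than the failing drive, `find_secondary_superblock("nilfs.img")` reports where the backup superblock should be and whether the magic is present there; it does not validate the rest of the superblock.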
Re: read error on superblock
Hi,

On Mon, 23 Jul 2012 13:17:28 +0400, Vyacheslav Dubeyko wrote:
> Hi,
>
> On Mon, 2012-07-23 at 10:45 +0200, dexen deVries wrote:
> > Hi list,
> >
> > a harddrive got some bad sectors and now one NILFS filesystem can't be mounted;
> >
> >   mount: /dev/sda3: can't read superblock
> >
> > I can (try to) copy this filesystem to another drive; how do I proceed from that point? Does it make any sense to substitute another superblock for this one? (either to use a spare superblock, if such exists, or put a new superblock on the damaged sector(s)).

NILFS tries to use the second superblock automatically if the primary superblock is broken. NILFS even tries to recover the primary superblock by copying the secondary superblock over it.

> > mount: /dev/sda3: can't read superblock

Looks weird. mount.nilfs2 doesn't output this error message. Is mount.nilfs2 installed in the /sbin directory?

Could you try mount.nilfs2 as follows instead of the mount program?

  # mount.nilfs2 device mount-point

or

  # mount -t nilfs2 device mount-point

Regards,
Ryusuke Konishi

> > Regards,
>
> There is a second superblock at the end of a NILFS volume. But it may not be fully synchronized with the primary one (as I guess). Theoretically, it is possible to copy the secondary superblock over the primary one. But I am afraid that the NILFS volume can be in an inconsistent state anyway.
>
> Are you sure that this volume doesn't contain other damaged sectors?
>
> With the best regards,
> Vyacheslav Dubeyko.
Re: read error on superblock
Hi Ryusuke,

On Monday 23 of July 2012 18:39:07 you wrote:
> # mount -t nilfs2 device mount-point

that's what I've tried. I guess the problem is that the hard drive has not re-allocated the sector yet, so it is /unreadable/ rather than merely containing wrong data. I'll check again in a while whether the drive can re-allocate the sector.

--
dexen deVries

[[[↓][→]]]

all dichotomies are either true or false is a true paradox because it's paradoxical only if it is a paradox ;)
Re: read error on superblock
Hi again,

On Monday 23 of July 2012 18:39:07 you wrote:
> Looks weird. mount.nilfs2 doesn't output this error message.

another computer, same drive:

  coil!root!/mnt # mount.nilfs2 -v /dev/sdc3 x -o errors=continue,norecovery
  mount.nilfs2: Error while mounting /dev/sdc3 on x: Input/output error

also, in dmesg:

  NILFS warning: mounting unchecked fs
  ((lotsa ATA read error stuff))
  NILFS: error searching super root.

--
dexen deVries

[[[↓][→]]]

all dichotomies are either true or false is a true paradox because it's paradoxical only if it is a paradox ;)
Re: read error on superblock
Hi,

On Mon, 23 Jul 2012 13:06:57 +0200, dexen deVries wrote:
> Hi again,
>
> On Monday 23 of July 2012 18:39:07 you wrote:
> > Looks weird. mount.nilfs2 doesn't output this error message.
>
> another computer, same drive:
>
>   coil!root!/mnt # mount.nilfs2 -v /dev/sdc3 x -o errors=continue,norecovery
>   mount.nilfs2: Error while mounting /dev/sdc3 on x: Input/output error
>
> also, in dmesg:
>
>   NILFS warning: mounting unchecked fs
>   ((lotsa ATA read error stuff))
>   NILFS: error searching super root.

Hmm, the device seems to have a serious problem. Can you copy the contents of the device with the dd command?

  # dd if=/dev/sdc3 of=path-to-other/nilfs.img

Regards,
Ryusuke Konishi

> --
> dexen deVries
>
> [[[↓][→]]]
>
> all dichotomies are either true or false is a true paradox because it's paradoxical only if it is a paradox ;)
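A plain copy of a device with unreadable sectors normally stops at the first read error, so a rescue copy has to skip the damaged chunks and carry on. The following is a rough sketch of that idea and not a replacement for a dedicated rescue tool: the function name, the callback-style interface, and the 4 KB chunk size are assumptions made for illustration. It zero-fills every chunk that cannot be read and records its offset.

```python
def copy_skipping_errors(read_at, write_at, total_size, chunk=4096):
    """Copy total_size bytes chunk by chunk, zero-filling chunks that fail to read.

    read_at(nbytes, offset) returns bytes or raises OSError;
    write_at(data, offset) stores data at the same offset in the destination.
    Returns the list of offsets whose chunk could not be read.
    """
    bad_offsets = []
    offset = 0
    while offset < total_size:
        want = min(chunk, total_size - offset)
        try:
            buf = read_at(want, offset)
        except OSError:
            buf = b""
            bad_offsets.append(offset)
        if len(buf) < want:
            # Pad failed or short reads so later data keeps its original offset.
            buf = buf + bytes(want - len(buf))
        write_at(buf, offset)
        offset += want
    return bad_offsets
```

On a Unix system the callbacks could be thin wrappers such as `lambda n, off: os.pread(src_fd, n, off)` and `lambda data, off: os.pwrite(dst_fd, data, off)`. Keeping every byte at its original offset matters here, because the file system locates its structures, including the secondary superblock, by absolute position.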
Re: read error on superblock
On Mon, 23 Jul 2012 20:24:10 +0200, dexen deVries wrote:
> Hi again,
>
> I've copied the whole filesystem elsewhere (to a file) with `ddrescue'. It found one damaged area on the drive, but apparently neither at the start nor at the end of the partition.
>
> The FS on the drive was marked as `dirty' (requiring recovery upon mount). My guess is that the kernel attempted recovery, and gave up upon a read error. Unfortunately, the `norecovery' option did not help with the drive; it only helped once i've moved the whole FS to a file.
>
> Log from ddrescue:
>
>   # Rescue Logfile. Created by GNU ddrescue version 1.14
>   # Command line: ddrescue /dev/sdc3 sda3 sda3.log
>   # current_pos  current_status
>   0x149E0CCC00     +
>   #      pos        size  status
>   0x  0x149E0CC000  +
>   0x149E0CC000  0x1000  -
>   0x149E0CD000  0x11220D3000  +
>
> my understanding is, the following line describes the damaged area, format: start length status-marker (`-' for error)
>
>   0x149E0CC000  0x1000  -
>
> Once the FS was copied to a file, it mounted correctly:
>
>   # mount -o ro,loop,norecovery ./sda3.img ./some-mountpoint
>
> My gripe with the current (linux-3.5.0) NILFS2 driver is that I couldn't tell it to ignore read errors and thus force it to mount the filesystem.

Good point. The current recovery logic is intentionally implemented so that it aborts when it meets an I/O error. This treatment should not be applied, at least when the norecovery option is specified.

Thanks,
Ryusuke Konishi

> Only after I've moved some 160GB of FS to a file (that's a bit tedious :P) it opened the FS just fine.
>
> Cheers,
> --
> dexen deVries
>
> 1972 - Dennis Ritchie invents a powerful gun that shoots both forward and backward simultaneously. Not satisfied with the number of deaths and permanent maimings from that invention he invents C and Unix.
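The line format described in the quoted message (start, length, status marker, with `-' for an error) is simple enough to extract mechanically. This is a small sketch with hypothetical function names; it relies only on the three-column layout shown in the log above and silently skips comment lines, the two-column status line, and any line whose numbers cannot be parsed.

```python
def parse_block_lines(log_text):
    """Return (pos, size, status) tuples for every well-formed three-column line."""
    blocks = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0].startswith("#"):
            continue  # comments, blank lines, the current_pos status line
        pos, size, status = fields
        if len(status) != 1:
            continue
        try:
            blocks.append((int(pos, 16), int(size, 16), status))
        except ValueError:
            continue  # damaged or truncated line
    return blocks


def bad_areas(log_text):
    """Return (start, length) in bytes for every area marked '-' (read error)."""
    return [(pos, size) for pos, size, status in parse_block_lines(log_text)
            if status == "-"]
```

For the log in this thread the only reported area starts at 0x149E0CC000 and is 0x1000 bytes long, that is, a single 4 KB block.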