Re: read error on superblock

2012-07-24 Thread Ryusuke Konishi
On Tue, 24 Jul 2012 09:52:18 +0200, dexen deVries wrote:
 Hi Vyacheslav,
 
 
 On Tuesday 24 of July 2012 10:26:37 you wrote:
  I am afraid that this is not so good from the end user's point of view.
  
  First of all, the message "mount: /dev/sda3: can't read superblock" can
  confuse the user. The cause is bad sectors inside the volume, but the
  user is told that the superblock cannot be read.
  
  Secondly, there are situations where a volume really has to be used
  despite the presence of bad sectors. And I think that users can
  expect such behavior from NILFS because of its declared reliability.
  
  Unfortunately, as far as I understand, NILFS has no bad blocks table and
  can't correctly handle the presence of bad blocks on a volume. It
  means that NILFS treats bad blocks as an exceptional case. But from my
  point of view, it makes sense to treat bad blocks as a usual thing and
  try to work in their presence. For example, fsck could potentially
  check a NILFS volume for bad blocks, construct a bad blocks table,
  and save it on the volume.

NILFS doesn't have a sector-based bad blocks table, but it has an error
flag in the segment usage file (sufile).  If a segment is marked
'erroneous', it will not be allocated.

At present, this works together with neither badblocks (mkfs.nilfs2)
nor the recovery logic.  However, it could be applied for this purpose
if needed.
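
The allocation rule described above can be illustrated with a toy model. This is only a sketch, not the actual NILFS2 code: `SegmentUsage`, `allocate_segment`, and the `dirty`/`error` fields are hypothetical names standing in for the per-segment flags kept in the sufile.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentUsage:
    """Hypothetical stand-in for one sufile entry (not the on-disk layout)."""
    dirty: bool = False   # segment currently holds data
    error: bool = False   # segment is marked 'erroneous'


def allocate_segment(sufile: List[SegmentUsage]) -> Optional[int]:
    """Return the number of the first clean, non-erroneous segment, or None."""
    for segnum, usage in enumerate(sufile):
        if usage.error or usage.dirty:
            continue  # erroneous or in-use segments are never handed out
        usage.dirty = True
        return segnum
    return None


sufile = [SegmentUsage(dirty=True), SegmentUsage(error=True), SegmentUsage()]
first = allocate_segment(sufile)   # skips segment 0 (in use) and 1 (erroneous)
second = allocate_segment(sufile)  # nothing left to allocate
```

In this model a bad-block scan would only need to set `error` on the affected segments; the allocator itself stays unchanged.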

  I suggest adding a virtual special file that describes bad blocks. It
  can be described by an inode in the ifile, and all bad blocks can be
  described in the DAT file as parts of this virtual special file. As a
  result, the NILFS file system driver will have a bad blocks table which
  can be the basis for excluding bad blocks from operation and trying to
  survive on an unreliable device.
  
  What do you think about this idea?
 
 I believe bad sectors are mostly a thing of the past; any decent hard drive
 (probably also any decent SSD) should re-map them after some re-reads. Some
 data & meta-data loss is possible, but overall the FS should be accessible
 again.

I agree with this opinion.

If a sector-based bad blocks table is sorely needed, it is worth
considering, but at the very least it should be optional, not mandatory.

But even if it is well implemented as an option, it still looks like
overkill, because most recent hard drives have spare sectors internally
and most recent flash-based drives have their own remapping mechanism.

Moreover, how a device gets corrupted depends heavily on the nature and
configuration of the underlying block device.  In this sense, an in-device
or in-driver solution looks better to me.

The bad blocks table is about to become a thing of the past; it is almost
a relic of the floppy drive era.

 I have no idea why my particular HDD did not re-map; perhaps it just takes 
 much longer than I gave it.
 
 As a point of reference, XFS does not do bad block management either;
 however, the partition driver of IRIX does bad sector management -- so it
 is implemented one layer below the FS.

Yes.  If we implement some kind of redundancy mechanism in the FS layer,
it absolutely should reflect how the data integrity should be
enhanced in the FS layer.


With regards,
Ryusuke Konishi


 I guess it /may be/ possible to use Linux's `dm' driver in such a manner.
 
 
 Cheers,
 -- 
 dexen deVries
 
 [[[↓][→]]]
 
 all dichotomies are either true or false is a true paradox because it's 
 paradoxical only if it is a paradox ;)
 --
 To unsubscribe from this list: send the line unsubscribe linux-nilfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: read error on superblock

2012-07-24 Thread Nick Martin
Hello All,

I see in this thread what I think is a misunderstanding of the role of the 
disk drive in the face of a hard read error.
The drive cannot simply map an unreadable sector to a new sector based on a 
read failure.  
If the read has failed, the drive does not contain the correct contents for the 
sector.  
The read failure needs to persist until a write is received for the unreadable 
sector.
When the write is received, the new data can be written to a good sector and 
the sector map adjusted.
One of the jobs of RAID is to reconstruct the data from other sources and write 
the correct data back to the same sector of the drive allowing the drive to do 
this remapping.
If you are not using RAID software or hardware, there is typically no way to 
reconstruct the data.
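
The sequence above (a read error that persists, a write that triggers the remap, and RAID supplying the data for that write) can be sketched as a toy model. `ToyDrive` and `mirrored_read` are hypothetical names used only for illustration; real firmware and RAID implementations differ and are vendor specific.

```python
class ToyDrive:
    """Toy model of the behaviour described above; not real firmware."""

    def __init__(self, sectors, spare_count):
        self.sectors = list(sectors)   # contents as seen by the host, by LBA
        self.unreadable = set()        # LBAs whose media can no longer be read
        self.spare_count = spare_count
        self.remapped = set()          # LBAs now backed by a spare sector

    def read(self, lba):
        if lba in self.unreadable:
            # The correct contents are gone, so the error has to persist.
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        if lba in self.unreadable:
            if self.spare_count == 0:
                raise OSError("no spare sectors left")
            # The new contents are known, so a good spare sector can take over.
            self.spare_count -= 1
            self.remapped.add(lba)
            self.unreadable.discard(lba)
        self.sectors[lba] = data


def mirrored_read(primary, mirror, lba):
    """RAID-1 style read: on failure, rebuild from the mirror and write back."""
    try:
        return primary.read(lba)
    except OSError:
        data = mirror.read(lba)
        primary.write(lba, data)  # this write is what lets the drive remap
        return data


primary = ToyDrive([b"a", b"b", b"c"], spare_count=1)
mirror = ToyDrive([b"a", b"b", b"c"], spare_count=1)
primary.unreadable.add(1)  # simulate a sector going bad
```

Calling `primary.read(1)` keeps failing until `mirrored_read(primary, mirror, 1)` writes the reconstructed data back; without the mirror there is nothing to write, which is the non-RAID case described above.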

If the read error is correctable using ECC, the drive does know the proper 
contents for the sector and could choose to re-map it, but likely will not do 
so.
This could be done without reporting the error to the host system.  
It is my understanding that ECC errors detected within the drive are not at all 
uncommon.
If the ECC can correct the error, the valid data is typically returned, and the 
drive moves on to the next request.
If ECC cannot correct the error, the first thing the drive will do is attempt 
to re-read the media.  
If it is able to read the data the next time, even if it had to use ECC to 
correct it, it will still return the valid data and may move on to the next 
request.
At the file system level, a slow read would be observed, not a read error.
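
As a rough illustration of the read path described above, the following sketch models a single host read request. The function name, the status labels, and the retry limit are assumptions made for the example; actual retry policies are firmware specific.

```python
def read_with_retries(media_reads, max_retries=3):
    """Model one host read request.

    media_reads yields (data, status) per physical read attempt, where status
    is "clean", "ecc_corrected" or "uncorrectable" (hypothetical labels).
    Returns (data, attempts_used); raises OSError if every attempt fails.
    """
    for attempt, (data, status) in enumerate(media_reads, start=1):
        if status in ("clean", "ecc_corrected"):
            # Valid data is returned; the host never learns ECC was needed.
            return data, attempt
        if attempt > max_retries:
            break  # initial read plus max_retries re-reads all failed
    raise OSError("unrecoverable read error")


# One failed attempt followed by an ECC-corrected re-read.
data, attempts = read_with_retries(
    [(None, "uncorrectable"), (b"payload", "ecc_corrected")]
)
```

Here the host receives valid data after two media reads, so the only visible symptom is added latency.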

The behavior of the drive firmware is vendor specific.  Sometimes it is 
configurable.  The behavior of the firmware will vary across different classes 
and generations of drives even from the same vendor.
Drive firmware that makes up its own data and remaps the sector to correct a 
read error should never be sold by a reputable drive vendor.

The origin of the bad block table in the file system pre-dates drive hardware 
sector re-mapping.   
When it was not likely that writing to a sector whose contents were previously 
unreadable would result in being able to read that sector back again, then it 
was a good idea to not write anything there in the future.
With modern drive technologies, it is likely that a write to a previously 
unreadable sector will result in being able to read back the newly written 
data.   The value of a bad block map in the file system is now minimized.
In addition, with hardware and software RAID technology now available to 
everyone, many volumes will never in their lifetime return a single read error 
to a file system.  
Errors are hidden and corrected at lower levels.  The file system observes 
perfect media or, in a catastrophic failure of a RAID system, media offline.

I recommend assuming modern storage devices and subsystems, and focusing 
development efforts on file system issues that remain.

Thanks,
Nick Martin


Re: read error on superblock

2012-07-23 Thread Vyacheslav Dubeyko
Hi,

On Mon, 2012-07-23 at 10:45 +0200, dexen deVries wrote:
 Hi list,
 
 
 a harddrive got some bad sectors and now one NILFS filesystem can't be 
 mounted;
 
 mount: /dev/sda3: can't read superblock
 
 I can (try to) copy this filesystem to another drive; how do I proceed from 
 that point? Does it make any sense to substitute another superblock for this 
 one? (either to use a spare superblock, if such exists, or put a new 
 superblock on the damaged sector(s)).
 
 
 Regards,

A second superblock exists at the end of a NILFS volume. But it may not be
fully in sync with the primary one (as far as I can guess).
Theoretically, it is possible to copy the secondary superblock over the
primary one. But I am afraid that the NILFS volume can be in an
inconsistent state anyway.

Are you sure that this volume doesn't contain other damaged sectors?

With the best regards,
Vyacheslav Dubeyko.




Re: read error on superblock

2012-07-23 Thread Vyacheslav Dubeyko
Hi Dexen,

On Mon, 2012-07-23 at 11:24 +0200, dexen deVries wrote:
 Hi Vyacheslaw,
 
 
 On Monday 23 of July 2012 13:17:28 you wrote:
  A second superblock exists at the end of a NILFS volume. But it may not be
  fully in sync with the primary one (as far as I can guess).
  Theoretically, it is possible to copy the secondary superblock over the
  primary one. But I am afraid that the NILFS volume can be in an
  inconsistent state anyway.
 
 any hints how to locate the superblock? what offset to look at, and what's 
 the magic number(s)?

Usually, the secondary superblock is located in the last block (4 KB) of the
volume. nilfs2_fs.h contains the following macro:

#define NILFS_SB2_OFFSET_BYTES(devsize) ((((devsize) >> 12) - 1) << 12)

which defines the placement of the secondary superblock (devsize is the
size of the device in bytes).

The magic number of NILFS2 is 0x3434. It is located at byte offset 0x0006
from the beginning of the superblock.
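
To check a copied image by hand, the offset arithmetic and the magic test can be sketched as follows. This is a minimal sketch: the shift-based formula mirrors the NILFS_SB2_OFFSET_BYTES definition in nilfs2_fs.h, the helper names are made up for this example, and the little-endian 16-bit layout of the magic field is an assumption that should be verified against the header.

```python
import struct

NILFS_SUPER_MAGIC = 0x3434   # magic number quoted in the message above
MAGIC_OFFSET = 0x0006        # byte offset of the magic within the superblock


def sb2_offset_bytes(devsize: int) -> int:
    """Same arithmetic as NILFS_SB2_OFFSET_BYTES: start of the last full 4 KB block."""
    return ((devsize >> 12) - 1) << 12


def has_nilfs_magic(image: bytes, sb_offset: int) -> bool:
    """Check for the magic at sb_offset, assuming a little-endian 16-bit field."""
    field = image[sb_offset + MAGIC_OFFSET: sb_offset + MAGIC_OFFSET + 2]
    return len(field) == 2 and struct.unpack("<H", field)[0] == NILFS_SUPER_MAGIC


# Synthetic 16 KB "device" with the magic planted where the backup would be.
image = bytearray(16 * 1024)
backup_offset = sb2_offset_bytes(len(image))
struct.pack_into("<H", image, backup_offset + MAGIC_OFFSET, NILFS_SUPER_MAGIC)
```

For a real image, the same two helpers can be applied to the bytes read from the file; a matching magic only shows that a superblock candidate is present, not that the volume is consistent.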

 
  Are you sure that this volume doesn't contain other damaged sectors?
 
 
 in my case, that doesn't matter: all the data i want to recover is either in 
 Git (which does internal consistency checks) or in MySQL, which also does 
 /some/ consistency checks.
 
 
 Cheers,

With the best regards,
Vyacheslav Dubeyko.




Re: read error on superblock

2012-07-23 Thread Ryusuke Konishi
Hi,
On Mon, 23 Jul 2012 13:17:28 +0400, Vyacheslav Dubeyko wrote:
 Hi,
 
 On Mon, 2012-07-23 at 10:45 +0200, dexen deVries wrote:
  Hi list,
  
  
  a harddrive got some bad sectors and now one NILFS filesystem can't be 
  mounted;
  
  mount: /dev/sda3: can't read superblock
  
  I can (try to) copy this filesystem to another drive; how do I proceed from 
  that point? Does it make any sense to substitute another superblock for 
  this one? (either to use a spare superblock, if such exists, or put a new 
  superblock on the damaged sector(s)).

NILFS tries to use the second superblock automatically if the primary
superblock is broken.  NILFS even tries to recover the primary
superblock by copying the secondary superblock.

  mount: /dev/sda3: can't read superblock

Looks weird.  mount.nilfs2 doesn't output this error message.

Is mount.nilfs2 installed in the /sbin directory?

Could you try mount.nilfs2 as follows instead of the mount program?

# mount.nilfs2 device mount-point

 or

# mount -t nilfs2 device mount-point


Regards,
Ryusuke Konishi



Re: read error on superblock

2012-07-23 Thread dexen deVries
Hi Ryusuke,


On Monday 23 of July 2012 18:39:07 you wrote:
 
 # mount -t nilfs2 device mount-point

that's what I've tried.

I guess the problem is that the harddrive has not re-allocated the sector 
yet, so it is /unreadable/ rather than merely containing wrong data.


I'll check again in a while whether the drive can re-allocate the sector.

-- 
dexen deVries

[[[↓][→]]]

all dichotomies are either true or false is a true paradox because it's 
paradoxical only if it is a paradox ;)


Re: read error on superblock

2012-07-23 Thread dexen deVries
Hi again,

On Monday 23 of July 2012 18:39:07 you wrote:
 Looks weird.  mount.nilfs2 doesn't output this error message.


another computer, same drive:

coil!root!/mnt # mount.nilfs2 -v /dev/sdc3 x -o errors=continue,norecovery
mount.nilfs2: Error while mounting /dev/sdc3 on x: Input/output error


also, in dmesg:
 NILFS warning: mounting unchecked fs
 ((lotsa ATA read error stuff))
 NILFS: error searching super root.


-- 
dexen deVries

[[[↓][→]]]

all dichotomies are either true or false is a true paradox because it's 
paradoxical only if it is a paradox ;)


Re: read error on superblock

2012-07-23 Thread Ryusuke Konishi
Hi,
On Mon, 23 Jul 2012 13:06:57 +0200, dexen deVries wrote:
 Hi again,
 
 On Monday 23 of July 2012 18:39:07 you wrote:
  Looks weird.  mount.nilfs2 doesn't output this error message.
 
 
 another computer, same drive:
 
 coil!root!/mnt # mount.nilfs2 -v /dev/sdc3 x -o errors=continue,norecovery
 mount.nilfs2: Error while mounting /dev/sdc3 on x: Input/output error
 
 
 also, in dmesg:
  NILFS warning: mounting unchecked fs
  ((lotsa ATA read error stuff))
  NILFS: error searching super root.

Uum, the device seems to have a serious problem.
Can you copy the contents of the device with the dd command?

 # dd if=/dev/sdc3 of=path-to-other/nilfs.img

Regards,
Ryusuke Konishi



Re: read error on superblock

2012-07-23 Thread Ryusuke Konishi
On Mon, 23 Jul 2012 20:24:10 +0200, dexen deVries wrote:
 Hi again,
 
 
 I've copied the whole filesystem elsewhere (to a file) with `ddrescue'. It 
 found one damaged area on the drive, but apparently neither at the start nor 
 at the end of the partition.
 
 The FS on the drive was marked as `dirty' (requiring recovery upon mount). My 
 guess is that kernel attempted recovery, and gave up upon read error.
 
 Unfortunately, the `norecovery' option did not help with the drive; it only 
 helped once i've moved whole FS to file.
 
 
 Log from ddrescue:
 
 
 # Rescue Logfile. Created by GNU ddrescue version 1.14
 # Command line: ddrescue /dev/sdc3 sda3 sda3.log
 # current_pos  current_status
 0x149E0CCC00 +
 #      pos        size  status
 0x00000000  0x149E0CC000  +
 0x149E0CC000  0x1000  -
 0x149E0CD000  0x11220D3000  +
 
 
 my understanding is, the following line describes the damaged area, format: 
 start length status-marker (`-' for error)
 0x149E0CC000  0x1000  -
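
The reading of the log given above can be checked mechanically. The sketch below parses the block list and keeps the areas flagged `-`. The helper names are invented for this example, and the start position of the first block (`0x00000000`) is an assumption inferred from the next block's start, since that value is incomplete in the quoted log.

```python
def parse_ddrescue_log(text):
    """Return (pos, size, status) tuples from a ddrescue logfile's block list."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    # The first non-comment row is the "current_pos current_status" line.
    return [(int(pos, 16), int(size, 16), status) for pos, size, status in rows[1:]]


def damaged_areas(blocks):
    """Keep only the blocks flagged '-' (read error), as (start, length)."""
    return [(pos, size) for pos, size, status in blocks if status == "-"]


LOG = """\
# current_pos  current_status
0x149E0CCC00 +
#      pos        size  status
0x00000000  0x149E0CC000  +
0x149E0CC000  0x1000  -
0x149E0CD000  0x11220D3000  +
"""

blocks = parse_ddrescue_log(LOG)
bad = damaged_areas(blocks)
```

With this input, `bad` holds a single entry: 0x1000 bytes (4 KB) starting at 0x149E0CC000, which matches the line singled out above.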
 
 
 Once the FS was copied to a file, it mounted correctly:
 # mount -o ro,loop,norecovery ./sda3.img ./some-mountpoint 
 
 
 My gripe with current (linux-3.5.0) NILFS2 driver is that I couldn't tell it 
 to ignore read errors and thus force it to mount the filesystem.

Good point.  The current recovery logic is intentionally implemented
so that it aborts when it encounters an I/O error.

At the very least, this treatment should not be applied when the
norecovery option is specified.

Thanks,
Ryusuke Konishi

 Only after I've 
 moved some 160GB of FS to a file (that's a bit tedious :P) it opened the FS 
 just fine.
 
 
 Cheers,
 -- 
 dexen deVries
 
 1972 - Dennis Ritchie invents a powerful gun that shoots both forward and 
 backward simultaneously. Not satisfied with the number of deaths and 
 permanent maimings from that invention he invents C and Unix.