Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-20 Thread Zack Weinberg
 It could be caused by a hardware problem; or, if it's a RAID array
 that has fallen out of sync, two subsequent reads of the same block
 can return different data.

It's RAID0, which I *believe* can't get out of sync, but there is much
I do not understand about RAID.

 Can you take the two .gz files and reconstruct a file system on some
 other system with a known-good disk, and then try running e2fsck on
 the image?

e2fsck successfully repairs both the skeleton image and the complete
partition image when they are on a known-good disk.

Here's some more detail about the partition.  The "Partition does not
start on physical sector boundary" thing might be relevant.  (I can't
say I understand how that can even happen, though; my best guess is
below, after the md127 listing.)

md127 : active raid0 sde3[1] sdd3[0]
  556720128 blocks super 1.2 512k chunks

# sfdisk -Vl /dev/md127
Disk /dev/md127: 531 GiB, 570081411072 bytes, 1113440256 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disklabel type: dos
Disk identifier: 0x2c9d8483

Device            Boot     Start        End   Sectors   Size Id Type
/dev/md127p1                  63  128005919 128005857    61G 83 Linux
/dev/md127p2  128005920 1113433019 985427100 469.9G 83 Linux

Partition 1 does not start on physical sector boundary.
Partition 2 does not start on physical sector boundary.
Remaining 7236 unallocated 512-byte sectors.
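
For what it's worth, the warning is probably keyed off the array's
minimum I/O size (the 512 KiB chunk) rather than a literal physical
sector, since md127 reports 512-byte physical sectors above.  A quick
check in shell arithmetic, assuming that's the rule sfdisk applies:

$ echo $(( 63 * 512 % 524288 ))          # partition 1: 32256 bytes past
32256                                    # the last 512 KiB boundary
$ echo $(( 128005920 * 512 % 524288 ))   # partition 2 misses too
409600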

# sfdisk -Vl /dev/sdd
Disk /dev/sdd: 298.1 GiB, 320072933376 bytes, 625142448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3ce15391

Device     Boot    Start       End   Sectors   Size Id Type
/dev/sdd1  *        2048   1050623   1048576   512M 83 Linux
/dev/sdd2        1050624  68159487  67108864    32G 82 Linux swap / Solaris
/dev/sdd3   68159488 625142447 556982960 265.6G fd Linux raid autodetect

# sfdisk -Vl /dev/sde
Disk /dev/sde: 298.1 GiB, 320072933376 bytes, 625142448 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x75d309b0

Device     Boot    Start       End   Sectors   Size Id Type
/dev/sde1           2048   1050623   1048576   512M 83 Linux
/dev/sde2        1050624  68159487  67108864    32G 82 Linux swap / Solaris
/dev/sde3   68159488 625142447 556982960 265.6G fd Linux raid autodetect





Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-20 Thread Zack Weinberg
On Sat, Jun 20, 2015 at 11:35 AM, Theodore Ts'o ty...@mit.edu wrote:
 On Sat, Jun 20, 2015 at 11:05:31AM -0400, Zack Weinberg wrote:

 e2fsck successfully repairs both the skeleton image and the complete
 partition image when they are on a known-good disk.

 OK, so this is a storage device issue.  I'd be taking a very jaundiced
 look at the reliability/correctness of your drives.

 It could be that they have a firmware bug in how they handle 512e
 emulation.  (See below.)  Or maybe one or more is starting to go bad.
 (Not all drive failures are predicted by S.M.A.R.T.  In fact, only
 about 50-66% of drive failures are predicted by SMART.  Think about
 that the next time you are tempted to skimp on backups.  :-)

Either is possible.  These are an identical pair of Western Digital
drives and they're about five years old.  They *claim* to have
512-byte physical sectors (per hdparm -I -- full dump at the bottom)
but I would totally believe they are faking that.  Also, the
computer's power supply failed catastrophically in the middle of a
system upgrade, which is how the root filesystem got so very
corrupted.  That could certainly have caused physical damage.  (The
drives are currently attached to a different computer for data
recovery.)
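
As a cross-check on hdparm, here is the kernel's own idea of the
sector sizes (logical, then physical; the same numbers sfdisk printed
above, and sysfs agrees):

# blockdev --getss --getpbsz /dev/sdd
512
512
# cat /sys/block/sdd/queue/physical_block_size
512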

The fsck behavior I originally reported continues to be 100%
reproducible on the physical partition.  There are no hard errors in
the SMART logs for either drive.  (After I'm done copying data off the
/home partition, which was not corrupted, I will run extended
selftests.)  Before the catastrophic power supply failure, there were
no problems writing data to either filesystem inside the RAID array.
And the outer partitions are properly aligned.  Putting all of those
things together, I wonder whether this might be a bug in direct (not
filesystem) access to block devices on misaligned partitions within
an MD-RAID0 array.
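
(The SMART checks in question are the standard smartmontools ones; a
sketch, to be run against each drive in turn:

# smartctl -l error /dev/sdd      # dump the hard error log
# smartctl -t long /dev/sdd       # kick off an extended self-test
# smartctl -l selftest /dev/sdd   # read the result once it finishes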

Is it possible for you to construct a similarly-misaligned partition
within an MD-RAID0 array, unpack the skeleton image I sent you into
that partition, and then try to reproduce my original fsck report on
that?  Do you need more information from me first?
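
In case it saves time, something along these lines ought to recreate
the geometry -- a sketch using loop devices in place of the two
drives, with placeholder sizes, so treat it as untested:

# truncate -s 1G disk0.img disk1.img
# losetup /dev/loop0 disk0.img
# losetup /dev/loop1 disk1.img
# mdadm --create /dev/md127 --level=0 --raid-devices=2 --chunk=512 /dev/loop0 /dev/loop1
# echo '63,,83' | sfdisk /dev/md127    # partition 1 starting at sector 63
# mkfs.ext4 /dev/md127p1
# ... then unpack the skeleton image into md127p1 and run e2fsck -fy twice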

...
 Yeah, that's not good.  Congratulations: whatever software set up your
 RAID configuration is as intelligent (or as obsolete) as Windows XP.
 Which explains why hard drive vendors are still selling 512e drives,
 although they devoutly wish they could stop.

In this case, that would have been cfdisk as of roughly 9 months ago,
and I *think* the problem was it didn't know what to do with an MD
device.  Notice how the outer partitions start at offset 2048 but the
inner partitions start at offset 63?

(The disks are much older than the installation because the computer
is secondhand, and had been completely wiped.)

---
# hdparm -I /dev/sdd

/dev/sdd:

ATA device, with non-removable media
Model Number:   ST3320418AS
Serial Number:  9VM5KB8B
Firmware Revision:  CC44
Transport:  Serial
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical         max     current
cylinders       16383   16383
heads           16      16
sectors/track   63      63
--
CHS current addressable sectors:   16514064
LBA    user addressable sectors:  268435455
LBA48  user addressable sectors:  625142448
Logical/Physical Sector size:   512 bytes
device size with M = 1024*1024:  305245 MBytes
device size with M = 1000*1000:  320072 MBytes (320 GB)
cache/buffer size  = 16384 KBytes
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16  Current = 16
Recommended acoustic management value: 208, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
 Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
Enabled Supported:
   *SMART feature set
Security Mode feature set
   *Power Management feature set
   *Write cache
   *Look-ahead
   *Host Protected Area feature set
   *WRITE_BUFFER command
   *READ_BUFFER command
   *DOWNLOAD_MICROCODE
Power-Up In Standby feature set
SET_FEATURES required to spinup after power up
SET_MAX security extension
   *Automatic Acoustic Management feature set
   *48-bit Address feature set
   *Device Configuration Overlay feature set
   *Mandatory FLUSH_CACHE
   *FLUSH_CACHE_EXT
   *SMART error logging
   *SMART self-test
   *General Purpose Logging feature set
   *WRITE_{DMA|MULTIPLE}_FUA_EXT
   *64-bit World wide name
Write-Read-Verify 

Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-20 Thread Theodore Ts'o
On Sat, Jun 20, 2015 at 11:05:31AM -0400, Zack Weinberg wrote:
 
 e2fsck successfully repairs both the skeleton image and the complete
 partition image when they are on a known-good disk.

OK, so this is a storage device issue.  I'd be taking a very jaundiced
look at the reliability/correctness of your drives.

It could be that they have a firmware bug in how they handle 512e
emulation.  (See below.)  Or maybe one or more is starting to go bad.
(Not all drive failures are predicted by S.M.A.R.T.  In fact, only
about 50-66% of drive failures are predicted by SMART.  Think about
that the next time you are tempted to skimp on backups.  :-)

 Here's some more detail about the partition.  The "Partition does not
 start on physical sector boundary" thing might be relevant.  (I can't
 say I understand how that can even happen, though.)

Modern hard drives have either 512e or 4k sectors.  512-byte
emulation is provided for backwards compatibility with Windows XP.
It means the drive has a logical sector size of 512 and a physical
sector size of 4096.  You are therefore *allowed* to send writes
which are a multiple of 512 bytes but which are not aligned on a
4096-byte boundary, or are not a multiple of 4096 bytes; the drive
will just do a read-modify-write cycle, which is not the most
efficient thing in the world.  If the partition is not aligned on a
4k boundary, then *all* writes will be subject to a read-modify-write
cycle, which will of course trash your write performance.
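
Concretely: with 512-byte logical sectors, a partition is 4k-aligned
exactly when its start sector is a multiple of 8.  The inner
partitions here fail that test and the outer ones pass it:

$ echo $(( 63 % 8 ))      # md127p1: misaligned (RMW on a real 512e drive)
7
$ echo $(( 2048 % 8 ))    # sdd1/sde1: aligned
0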

For drives with a physical and logical sector size of 4k, the LBA
numbers sent to the hard drive are in units of 4k.  So a drive LBA of
2 refers to the physical sector 8192 bytes from the beginning of the
disk.  However, Linux internally always uses sector numbers in units
of 512 bytes, so when you see the term LBA thrown around, you need to
be careful about whether you are talking about LBA's from the POV of
the Linux kernel, or LBA's from the SATA/SCSI specification's point
of view.

In Linux, the device driver will take a request to read LBA #24 with
a sector count of 16 and turn that into a SATA command requesting a
read of 2 drive sectors starting at drive LBA #3.  Hence, on a drive
with 4k logical/physical sectors, it is *impossible* to send
misaligned reads or writes; we talk to the drive in units of 4k.
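
The conversion is just a divide by 8 at both ends; the example above,
in shell arithmetic:

$ linux_lba=24 count=16    # kernel units: 512-byte sectors
$ echo "drive LBA $(( linux_lba / 8 )), $(( count / 8 )) drive sectors"
drive LBA 3, 2 drive sectors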

 Device            Boot     Start        End   Sectors   Size Id Type
 /dev/md127p1                  63  128005919 128005857    61G 83 Linux
 /dev/md127p2  128005920 1113433019 985427100 469.9G 83 Linux
 
 Partition 1 does not start on physical sector boundary.
 Partition 2 does not start on physical sector boundary.
 Remaining 7236 unallocated 512-byte sectors.

Yeah, that's not good.  Congratulations: whatever software set up your
RAID configuration is as intelligent (or as obsolete) as Windows XP.
Which explains why hard drive vendors are still selling 512e drives,
although they devoutly wish they could stop.

It took them a decade longer to introduce native 4k sector drives than
they had originally wished, and most of this can be blamed on the
failure of Windows Vista and the fact that enterprises stuck with
Windows XP for much longer than anyone (including Microsoft) would
have wanted.

- Ted






Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-20 Thread Theodore Ts'o
On Sat, Jun 20, 2015 at 12:38:56PM -0400, Zack Weinberg wrote:
 Either is possible.  These are an identical pair of Western Digital
 drives and they're about five years old.  They *claim* to have
 512-byte physical sectors (per hdparm -I -- full dump at the bottom)
 but I would totally believe they are faking that.

I pulled the spec sheet for the drives; the copyright date is
2008-2009, so I suspect they are a bit older than five years.  It
does claim to be a 512-byte physical sector drive.  So it's possible
the comment about not being aligned is simply in error.

 Also, the
 computer's power supply failed catastrophically in the middle of a
 system upgrade, which is how the root filesystem got so very
 corrupted.  That could certainly have caused physical damage.  (The
 drives are currently attached to a different computer for data
 recovery.)

Or the disk could just be 6+ years old, and it's just too old.  If I
were you I would just replace the hard drives and be done with it.

 In this case, that would have been cfdisk as of roughly 9 months ago,
 and I *think* the problem was it didn't know what to do with an MD
 device.  Notice how the outer partitions start at offset 2048 but the
 inner partitions start at offset 63?

Or this was just the case where cfdisk didn't want to mess with a
preexisting partition table, and the original partition table as
shipped from the manufacturer was Windows XP compatible.

  - Ted





Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-20 Thread Zack Weinberg
On Sat, Jun 20, 2015 at 1:37 PM, Theodore Ts'o ty...@mit.edu wrote:

 Or the disk could just be 6+ years old, and it's just too old.  If I
 were you I would just replace the hard drives and be done with it.

That's probably going to happen in the near future, yes.

 In this case, that would have been cfdisk as of roughly 9 months ago,
 and I *think* the problem was it didn't know what to do with an MD
 device.  Notice how the outer partitions start at offset 2048 but the
 inner partitions start at offset 63?

 Or this was just the case where cfdisk didn't want to mess with a
 preexisting partition table, and the original partition table as
 shipped from the manufacturer was Windows XP compatible.

It's the partition table *inside* the MD container that's misaligned,
so that can't be it.

I'd like to make certain that there isn't an fsck or kernel bug here.
Is it possible for you to construct a similarly-misaligned partition
within an MD-RAID0 array, unpack the skeleton image I sent you
into that partition, and then try to reproduce my original fsck report
on that?  Do you need more information from me first?

zw





Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-19 Thread Theodore Ts'o
On Fri, Jun 19, 2015 at 12:09:11PM -0400, Zack Weinberg wrote:
 On Fri, Jun 19, 2015 at 11:53 AM, Theodore Ts'o ty...@mit.edu wrote:
 
  I can't reproduce the problem on my end (see attached).
 
 Still happens for me on the real filesystem (see attached).  We appear
 to be using the same version of e2fsprogs.  What could cause the
 divergence?

It could be caused by a hardware problem; or, if it's a RAID array
that has fallen out of sync, two subsequent reads of the same block
can return different data.

Can you take the two .gz files and reconstruct a file system on some
other system with a known-good disk, and then try running e2fsck on
the image?

- Ted





Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-19 Thread Zack Weinberg
I'm going to have to wipe out and recreate this filesystem in order to
continue repairing this computer, but I have saved a complete image of
the partition.  It's a bit too big to just send you (11GB after xz
compression) and also it contains /etc/shadow and similar.  But I'm
happy to do further tests on it.





Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-19 Thread Zack Weinberg
On Fri, Jun 19, 2015 at 11:53 AM, Theodore Ts'o ty...@mit.edu wrote:

 I can't reproduce the problem on my end (see attached).

Still happens for me on the real filesystem (see attached).  We appear
to be using the same version of e2fsprogs.  What could cause the
divergence?

zw


typescript.gz
Description: GNU Zip compressed data


Bug#789290: e2fsprogs: e2fsck claims to have fixed fs, but a second run finds all the same problems

2015-06-19 Thread Theodore Ts'o
On Fri, Jun 19, 2015 at 11:22:21AM -0400, Zack Weinberg wrote:
 Package: e2fsprogs
 Version: 1.42.13-1
 Severity: normal
 
 When e2fsck -yf is run on the filesystem that produced the attached image
 (qcow2 format, xz-compressed, split in half for attachment)
 it reports a big long list of errors and claims to have fixed them.
 If you run it again, it reports the *same* big long list of errors and
 claims to have fixed them.
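 
 In shell terms, assuming the root filesystem is md127p1 (a sketch,
 not a transcript):
 
 # e2fsck -yf /dev/md127p1 > pass1.log 2>&1
 # e2fsck -yf /dev/md127p1 > pass2.log 2>&1
 # diff pass1.log pass2.log   # non-empty: pass 2 repeats pass 1's errors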

I can't reproduce the problem on my end (see attached).

- Ted



typescript.gz
Description: application/gzip