Re: Likelihood of read error, recover device failure raid10

2016-08-14 Thread Wolfgang Mader
On Sunday, August 14, 2016 8:04:14 PM CEST you wrote:
> On Sunday, August 14, 2016 10:20:39 AM CEST you wrote:
> > On Sat, Aug 13, 2016 at 9:39 AM, Wolfgang Mader
> > 
> > <wolfgang_ma...@brain-frog.de> wrote:
> > > Hi,
> > > 
> > > I have two questions
> > > 
> > > 1) Layout of raid10 in btrfs
> > > btrfs pools all devices and then stripes and mirrors across this pool. Is
> > > it therefore correct that a raid10 layout consisting of 4 devices
> > > a,b,c,d is _not_
> > > 
> > >          raid0
> > >            |
> > >      |-----------|
> > >    raid1       raid1
> > >    |   |       |   |
> > >   |a| |b|     |c| |d|
> > > 
> > > Rather, there is no clear distinction at the device level between two
> > > devices which form a raid1 set that are then paired by raid0; instead,
> > > each bit is simply mirrored across two different devices. Is this correct?
> > 
> > All of the profiles apply to block groups (chunks), and that includes
> > raid10. They only incidentally apply to devices since of course block
> > groups end up on those devices, but which stripe ends up on which
> > device is not consistent, and that ends up making Btrfs raid10 pretty
> > much only able to survive a single device loss.
> > 
> > I don't know if this is really thoroughly understood. I just did a
> > test and I kinda wonder if the reason for this inconsistent assignment
> > is a difference between the initial stripe>devid pairing at mkfs time,
> > compared to subsequent pairings done by kernel code. For example, I
> > get this from mkfs:
> > 
> > item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15715 itemsize
> > 176 chunk length 16777216 owner 2 stripe_len 65536
> > type SYSTEM|RAID10 num_stripes 4
> > stripe 0 devid 4 offset 1048576
> > dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> > stripe 1 devid 3 offset 1048576
> > dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> > stripe 2 devid 2 offset 1048576
> > dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> > stripe 3 devid 1 offset 20971520
> > dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> > item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 37748736) itemoff 15539 itemsize
> > 176 chunk length 2147483648 owner 2 stripe_len 65536
> > type METADATA|RAID10 num_stripes 4
> > stripe 0 devid 4 offset 9437184
> > dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> > stripe 1 devid 3 offset 9437184
> > dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> > stripe 2 devid 2 offset 9437184
> > dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> > stripe 3 devid 1 offset 29360128
> > dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> > item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2185232384) itemoff 15363
> > itemsize 176
> > chunk length 2147483648 owner 2 stripe_len 65536
> > type DATA|RAID10 num_stripes 4
> > stripe 0 devid 4 offset 1083179008
> > dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> > stripe 1 devid 3 offset 1083179008
> > dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> > stripe 2 devid 2 offset 1083179008
> > dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> > stripe 3 devid 1 offset 1103101952
> > dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> > 
> > Here you can see every chunk type has the same stripe to devid
> > pairing. But once the kernel starts to allocate more data chunks, the
> > pairing is different from mkfs, yet always (so far) consistent for
> > each additional kernel allocated chunk.
> > 
> > item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 4332716032) itemoff 15187
> > itemsize 176
> > chunk length 2147483648 owner 2 stripe_len 65536
> > type DATA|RAID10 num_stripes 4
> > stripe 0 devid 2 offset 2156920832
> > dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> > stripe 1 devid 3 offset 2156920832
> > dev uuid: af95126a-

Re: Likelihood of read error, recover device failure raid10

2016-08-14 Thread Wolfgang Mader
On Sunday, August 14, 2016 10:20:39 AM CEST you wrote:
> On Sat, Aug 13, 2016 at 9:39 AM, Wolfgang Mader
> 
> <wolfgang_ma...@brain-frog.de> wrote:
> > Hi,
> > 
> > I have two questions
> > 
> > 1) Layout of raid10 in btrfs
> > btrfs pools all devices and then stripes and mirrors across this pool. Is
> > it therefore correct that a raid10 layout consisting of 4 devices
> > a,b,c,d is _not_
> > 
> >          raid0
> >            |
> >      |-----------|
> >    raid1       raid1
> >    |   |       |   |
> >   |a| |b|     |c| |d|
> > 
> > Rather, there is no clear distinction at the device level between two devices
> > which form a raid1 set that are then paired by raid0; instead, each
> > bit is simply mirrored across two different devices. Is this correct?
> 
> All of the profiles apply to block groups (chunks), and that includes
> raid10. They only incidentally apply to devices since of course block
> groups end up on those devices, but which stripe ends up on which
> device is not consistent, and that ends up making Btrfs raid10 pretty
> much only able to survive a single device loss.
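
As a back-of-the-envelope sketch of why that is (my own illustration, assuming
each chunk picks its mirror pairing independently and uniformly at random): a
given two-device failure on a four-device raid10 destroys any particular chunk
with probability 1/3, so the chance that every chunk survives falls off quickly
with the number of chunks:

    # Illustrative only; assumes uniformly random mirror pairing per chunk.
    awk 'BEGIN { for (n = 1; n <= 64; n *= 4) printf "chunks=%-3d P(no chunk lost)=%.4f\n", n, (2/3)^n }'

With a few dozen chunks that probability is effectively zero, which matches the
observation that Btrfs raid10 should be treated as tolerating a single device
failure only.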
> 
> I don't know if this is really thoroughly understood. I just did a
> test and I kinda wonder if the reason for this inconsistent assignment
> is a difference between the initial stripe>devid pairing at mkfs time,
> compared to subsequent pairings done by kernel code. For example, I
> get this from mkfs:
> 
> item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 15715 itemsize
> 176 chunk length 16777216 owner 2 stripe_len 65536
> type SYSTEM|RAID10 num_stripes 4
> stripe 0 devid 4 offset 1048576
> dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> stripe 1 devid 3 offset 1048576
> dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> stripe 2 devid 2 offset 1048576
> dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> stripe 3 devid 1 offset 20971520
> dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 37748736) itemoff 15539 itemsize
> 176 chunk length 2147483648 owner 2 stripe_len 65536
> type METADATA|RAID10 num_stripes 4
> stripe 0 devid 4 offset 9437184
> dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> stripe 1 devid 3 offset 9437184
> dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> stripe 2 devid 2 offset 9437184
> dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> stripe 3 devid 1 offset 29360128
> dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> item 6 key (FIRST_CHUNK_TREE CHUNK_ITEM 2185232384) itemoff 15363
> itemsize 176
> chunk length 2147483648 owner 2 stripe_len 65536
> type DATA|RAID10 num_stripes 4
> stripe 0 devid 4 offset 1083179008
> dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> stripe 1 devid 3 offset 1083179008
> dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> stripe 2 devid 2 offset 1083179008
> dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> stripe 3 devid 1 offset 1103101952
> dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> 
> Here you can see every chunk type has the same stripe to devid
> pairing. But once the kernel starts to allocate more data chunks, the
> pairing is different from mkfs, yet always (so far) consistent for
> each additional kernel allocated chunk.
> 
> 
> item 7 key (FIRST_CHUNK_TREE CHUNK_ITEM 4332716032) itemoff 15187
> itemsize 176
> chunk length 2147483648 owner 2 stripe_len 65536
> type DATA|RAID10 num_stripes 4
> stripe 0 devid 2 offset 2156920832
> dev uuid: 1c3038ca-2615-414e-9383-d326b942f647
> stripe 1 devid 3 offset 2156920832
> dev uuid: af95126a-e674-425c-af01-2599d66d9d06
> stripe 2 devid 4 offset 2156920832
> dev uuid: 736ba7b3-f21f-4643-8a59-9869b3526a82
> stripe 3 devid 1 offset 2176843776
> dev uuid: 969a95d3-d76d-44dc-9364-9d1f6e449a74
> 
> This volume now has about a dozen chunks created by kernel code, and
> the stripe X to devid Y mapping is identical. Using dd and hexdump,
> I'm finding that stripe 0 and 1 are mirrored pairs, they contain
> identical information. And stripe 2 and 3 are mirrored pairs. And the
> raid0 striping happens across 01 and 23 such that odd-numbered 64KiB
> (default) stripe elements go on 01, and even-numbered stripe elements
> go on 23. If the stripe
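
For reference, a minimal sketch of that dd comparison against the item 7 chunk
dumped above; the /dev/loopN paths are placeholders for whichever devices carry
devid 2 and devid 3 in this test setup:

    # Read one 64KiB stripe element at the physical offset the chunk tree
    # reports (2156920832 bytes) from the two devices and compare checksums.
    sudo dd if=/dev/loop2 bs=64K skip=$((2156920832 / 65536)) count=1 2>/dev/null | md5sum
    sudo dd if=/dev/loop3 bs=64K skip=$((2156920832 / 65536)) count=1 2>/dev/null | md5sum
    # Matching checksums mean the two devices hold the same stripe element,
    # i.e. they act as a mirrored pair for this chunk.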

Likelihood of read error, recover device failure raid10

2016-08-13 Thread Wolfgang Mader
Hi,

I have two questions

1) Layout of raid10 in btrfs
btrfs pools all devices and then stripes and mirrors across this pool. Is it
therefore correct that a raid10 layout consisting of 4 devices a,b,c,d is
_not_

         raid0
           |
     |-----------|
   raid1       raid1
   |   |       |   |
  |a| |b|     |c| |d|

Rather, there is no clear distinction at the device level between two devices
which form a raid1 set that are then paired by raid0; instead, each bit is
simply mirrored across two different devices. Is this correct?

2) Recover raid10 from a failed disk
Raid10 inherits its redundancy from the raid1 scheme. If I build a raid10 from
n devices, each bit is mirrored across two devices. Therefore, in order to
restore a raid10 after a single device failure, I need to read an amount of
data equal to the contents of that device from the remaining n-1 devices. If
the amount of data on the failed disk is on the order of the number of bits at
which I can expect an unrecoverable read error from a device, I will most
likely not be able to recover from the disk failure. Is this conclusion
correct, or am I missing something here?
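
As a rough sanity check of that concern, assuming the commonly quoted
consumer-drive figure of one unrecoverable read error (URE) per 1e14 bits and
roughly 1 TB re-read from the surviving copies during the rebuild (illustrative
numbers only, not from this thread):

    # Expected UREs while re-reading ~1 TB at an assumed 1e-14 per-bit error rate.
    awk 'BEGIN {
        bits   = 1e12 * 8;    # ~1 TB read back from the remaining devices
        rate   = 1e-14;       # assumed unrecoverable-read-error rate per bit
        expect = bits * rate;
        printf "expected UREs: %.2f  P(at least one): %.1f%%\n", expect, (1 - exp(-expect)) * 100
    }'

With these assumed numbers a 1 TB rebuild hits a URE with probability on the
order of 8%, so the risk is real but far from certain; larger disks or worse
error rates push it up quickly.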

Thanks,
Wolfgang



Concurrent write access

2015-07-09 Thread Wolfgang Mader
Hi,

I have a btrfs raid10 which is connected to a server hosting multiple virtual
machines. Does btrfs support connecting the same subvolumes of the same raid to
multiple virtual machines for concurrent read and write? The situation would
be the same as, say, mounting user homes from the same NFS share on different
machines.


Thanks,
Wolfgang



Re: Concurrent write access

2015-07-09 Thread Wolfgang Mader
On Thursday 09 July 2015 22:06:09 Hugo Mills wrote:
 On Thu, Jul 09, 2015 at 11:34:40PM +0200, Wolfgang Mader wrote:
  Hi,
  
  I have a btrfs raid10 which is connected to a server hosting
  multiple virtual machine. Does btrfs support connecting the same
  subvolumes of the same raid to multiple virtual machines for
  concurrent read and write? The situation would be the same as, say,
  mounting user homes from the same nfs share on different machines.
 
It'll depend on the protocol you use to make the subvolumes visible
 within the VMs.
 
btrfs subvolumes aren't block devices, so that rules out most of
 the usual approaches. However, there are two methods I've used which I
 can confirm will work well: NFS and 9p.
 
NFS will work as a root filesystem, and will work with any
 host/guest, as long as there's a network connection between the two.
 9p is, at least in theory, faster (particularly with virtio), but
 won't let you boot with the 9p device as your root FS. You'll need
 virtualiser support if you want to run a virtio 9p -- I know qemu/kvm
 supports this; I don't know if anything else supports it.


Thanks for the overview. It is qemu/kvm in fact, so this is an option. Right
now, however, I connect the disks as virtual disks rather than the file system,
and only to one virtual machine.

Best,
Wolfgang

 
You can probably use Samba/CIFS as well. It'll be slower than the
 virtualised 9p, and not be able to host a root filesystem. I haven't
 tried this one, because Samba and I get on like a house on fire(*).
 
Hugo.
 
 (*) Screaming, shouting, people running away, emergency services.
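
For anyone wanting to try the virtio-9p route Hugo describes, a minimal sketch
(qemu/kvm assumed; the path, ids and mount tag are made-up placeholders):

    # Host side: export a btrfs subvolume into the guest over virtio-9p.
    qemu-system-x86_64 ... \
        -fsdev local,id=homes,path=/srv/pool/homes,security_model=mapped-xattr \
        -device virtio-9p-pci,fsdev=homes,mount_tag=homes

    # Guest side: mount the shared tree.
    mount -t 9p -o trans=virtio,version=9p2000.L homes /home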




Re: How to get the devid of a missing device

2015-04-27 Thread Wolfgang Mader
On Monday, April 27, 2015 02:11:05 AM Duncan wrote:
 Wolfgang Mader posted on Sun, 26 Apr 2015 20:39:34 +0200 as excerpted:
  Hello,
  
  I have a raid10 with one device missing. I would like to use btrfs
  replace to replace it. However, I am unsure on how to obtain the devid
  of the missing device.
 
 The devid is available if the device is still active in the filesystem.  If it's
 missing...
 
 btrfs device delete missing
 
 That, along with a bunch of other likely helpful information, is covered
 on the wiki:
 
 https://btrfs.wiki.kernel.org
 
 Specifically for that (reassemble from the wrap, too lazy to fiddle with
 it on my end):
 
 https://btrfs.wiki.kernel.org/index.php/
 Using_Btrfs_with_Multiple_Devices#Replacing_failed_devices
 
 But since it was a four-device raid10 and four devices is the minimum for
 that, I don't believe it'll let you delete until you device add, first.

Thanks for your answer. I know the content of that page, but since it is
possible to use btrfs replace on a missing device, I wanted to try that
approach.
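
For reference, a hedged sketch of what that would look like once the devid of
the missing device is known, assuming the kernel accepts a devid as the source
(the devid 5 and the target device here are placeholders):

    # Replace the missing device by devid on the degraded-mounted filesystem;
    # -r avoids reading from the (absent/failing) source device where possible.
    btrfs replace start -r 5 /dev/sdf /mnt
    btrfs replace status /mnt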

 
 Of course that means probably some hours for the add, then some more
 hours for the delete missing, during which you obviously hope not to lose
 another device.  Of course the sysadmin's backup rule of thumb applies --
 if you don't have a backup, by definition, loss of the data isn't a big
 deal, or you'd have it backed up.  (And the corollary, it's not a backup
 until it's tested, applies as well.)  So you shouldn't have to worry
 about loss of a second device during that time, because it's either
 backed up or the data isn't worth the trouble to backup and thus loss of
 it with the loss of a second device isn't a big deal.

Well, of course, there is a backup on a different machine. It's in the same
room, but who has the luxury of off-site backups in a home-use setting! :)

 
 That page doesn't seem to cover direct replacement, probably because the
 replace command is new and it hasn't been updated.  But AFAIK replace
 doesn't work with a missing device anyway; it's the fast way to replace a
 still listed device, so you don't have to add and then delete, but the
 device has to still be there in ordered to use that shortcut.  (You could
 try using missing with the -r option tho, just in case it works now.  The
 wiki/manpage is a bit vague on that point.)
 
  Btw, the file system is too old for skinny metadata and extended inode
  refs. If I do a btrfs replace or a btrfs device add, must I myself
  ensure that the new features are not enabled for the new device which is
  to be added?
 
 I don't believe there's any way to set that manually even if you wanted
 to -- you don't use mkfs on it and the add/replace would overwrite
 existing if you did.  The new device should just take on the attributes
 of the filesystem you're adding it to.

 Good to know.

Thanks



Re: How to get the devid of a missing device

2015-04-27 Thread Wolfgang Mader
On Monday, April 27, 2015 12:48:07 PM Anand Jain wrote:
 On 04/27/2015 02:39 AM, Wolfgang Mader wrote:
  Hello,
  
  I have a raid10 with one device missing. I would like to use btrfs replace
  to replace it. However, I am unsure on how to obtain the devid of the
  missing device. Having the filesystem mounted in degraded mode under mnt,
  btrfs fs
   At the user end there is no way, unless you want to use gdb and dump
   the fs_uuids and check.
 
   I have submitted these patches for now so it can be obtained from the logs.
   They will help in the situation when the device is missing at the time of mount.
 
 Btrfs: check error before reporting missing device and add uuid
 Btrfs: log when missing device is created
 
   For the long term we have the sysfs interface; patches are on the ML if you
   want to test.
 
 Good luck.
 
 Anand

Great. Thank you for patching this in.

Best,
Wolfgang

 
 


How to get the devid of a missing device

2015-04-26 Thread Wolfgang Mader
Hello,

I have a raid10 with one device missing. I would like to use btrfs replace to
replace it. However, I am unsure how to obtain the devid of the missing
device. Having the filesystem mounted in degraded mode under /mnt, btrfs fs
show /mnt returns

sudo btrfs filesystem show /mnt
Label: 'dataPool'  uuid: b5f082e2-2ce0-4f91-b54b-c2d26185a635
Total devices 4 FS bytes used 665.88GiB
devid    2 size 931.51GiB used 336.03GiB path /dev/sdb
devid    4 size 931.51GiB used 336.03GiB path /dev/sdd
devid    5 size 931.51GiB used 336.03GiB path /dev/sde
*** Some devices missing

but does not mention the devid of the missing device. The device stat is

sudo btrfs device stats /mnt 
[/dev/sdb].write_io_errs   0 
[/dev/sdb].read_io_errs    0 
[/dev/sdb].flush_io_errs   0 
[/dev/sdb].corruption_errs 0 
[/dev/sdb].generation_errs 0 
[(null)].write_io_errs   1448 
[(null)].read_io_errs    0 
[(null)].flush_io_errs   0 
[(null)].corruption_errs 0 
[(null)].generation_errs 0 
[/dev/sdd].write_io_errs   0 
[/dev/sdd].read_io_errs    0 
[/dev/sdd].flush_io_errs   0 
[/dev/sdd].corruption_errs 0 
[/dev/sdd].generation_errs 0 
[/dev/sde].write_io_errs   0 
[/dev/sde].read_io_errs    0 
[/dev/sde].flush_io_errs   0 
[/dev/sde].corruption_errs 0 
[/dev/sde].generation_errs 0
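
One possible way to dig the missing devid out of the metadata itself (a hedged
sketch; it requires a reasonably recent btrfs-progs, and /dev/sdb stands for
any present member of the filesystem):

    # Dump the chunk tree and list the DEV_ITEM entries; every member device,
    # including the missing one, appears there with its devid and uuid.
    sudo btrfs inspect-internal dump-tree -t chunk /dev/sdb | grep -A 2 DEV_ITEM

Comparing the devids printed there with the ones "btrfs filesystem show" still
lists (2, 4 and 5 above) should single out the devid of the missing device.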


Btw, the file system is too old for skinny metadata and extended inode refs.
If I do a btrfs replace or a btrfs device add, must I myself ensure that the
new features are not enabled for the new device which is to be added?

Thanks!
Wolfgang




Scrub status: no stats available

2014-05-06 Thread Wolfgang Mader
Dear list,

I am running btrfs on Arch Linux ARM (Linux 3.14.2, Btrfs v3.14.1). I can run 
scrub w/o errors, but I never get stats from scrub status

What I get is
   btrfs scrub status /pools/dataPool
   
   scrub status for b5f082e2-2ce0-4f91-b54b-c2d26185a635
   no stats available
   total bytes scrubbed: 694.13GiB with 0 errors

Please mind the line "no stats available". Where can I start digging?

Thank you,
Wolfgang




Re: Understanding btrfs and backups

2014-03-07 Thread Wolfgang Mader
Duncan, thank you for this comprehensive post. Really helpful as always!

[...]

 As for restoring, since a snapshot is a copy of the filesystem as it
 existed at that point, and the method btrfs exposes for accessing them is
 to mount that specific snapshot, to restore an individual file from a
 snapshot, you simply mount the snapshot you want somewhere and copy the
 file as it existed in that snapshot over top of your current version
 (which will have presumably already been mounted elsewhere, before you
 mounted the snapshot to retrieve the file from), then unmount the
 snapshot and go about your day. =:^)
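
A minimal sketch of that restore flow, with made-up snapshot and file names:

    # Mount one specific read-only snapshot next to the live filesystem,
    # copy the wanted file back, and unmount the snapshot again.
    mount -o ro,subvol=snapshots/2014-03-01 /dev/sdb /mnt/snap
    cp -a /mnt/snap/home/wolfgang/notes.txt /home/wolfgang/notes.txt
    umount /mnt/snap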

Please, how do I list mounted snapshots only?

[...]

 
 Since a snapshot is an image of the filesystem as it was at that
 particular point in time, and btrfs by nature copies blocks elsewhere
 when they are modified, all (well, not all as there's metadata like
 file owner, permissions and group, too, but that's handled the same way)
 the snapshot does is map what blocks composed each file at the time the
 snapshot was taken.

Is it correct that e.g. ownership is recorded separately from the data
itself, so that if I changed the owner of all my files, the respective
snapshot would only store the old owner information?

[...]

 
 The first time you do this, there's no existing copy at the other end, so
 btrfs send sends a full copy and btrfs receive writes it out.  After
 that, the receive side has a snapshot identical to the one created on the
 send side and further btrfs send/receives to the same set simply
 duplicate the differences between the reference and the new snapshot from
 the send end to the receive end.  As with local snapshots, old ones can
 be deleted on both the send and receive ends, as long as at least one
 common reference snapshot is maintained on both ends, so diffs taken
 against the send side reference can be applied to an appropriately
 identical receive side reference, thereby updating the receive side to
 match the new read-only snapshot on the send side.

Is the receiving side a complete file system in its own right? If so, I only
need to maintain one common reference in order to apply the received snapshot,
right? If I somehow got the send and receive sides out of sync, such that they
no longer share a common reference, only the send/receive would fail, but I
would still have the complete filesystem on the receiving side and could copy
it all over (cp, rsync) to the send side in case of a disaster on the send
side. Is this correct?
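
For concreteness, a hedged sketch of the incremental flow described above
(paths and the ssh target are made-up placeholders):

    # Initial full transfer, then an incremental one against the common parent.
    btrfs subvolume snapshot -r /data /data/.snap/day1
    btrfs send /data/.snap/day1 | ssh backuphost btrfs receive /backup/data

    btrfs subvolume snapshot -r /data /data/.snap/day2
    btrfs send -p /data/.snap/day1 /data/.snap/day2 | ssh backuphost btrfs receive /backup/data

The received snapshots are ordinary subvolumes on an ordinary btrfs filesystem,
so even if the common parent is later lost, the data already received stays
readable and could be copied back with cp or rsync.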

Thank you!
Best,
Wolfgang

-- 
Wolfgang Mader
wolfgang.ma...@fdm.uni-freiburg.de
Telefon: +49 (761) 203-7710
Institute of Physics
Hermann-Herder Str. 3, 79104 Freiburg, Germany
Office: 207


Re: Read i/o errs and disk replacement

2014-02-19 Thread Wolfgang Mader
On Tuesday 18 February 2014 15:02:51 Chris Murphy wrote:
 On Feb 18, 2014, at 2:33 PM, Wolfgang Mader wolfgang_ma...@brain-frog.de 
wrote:
  Feb 18 13:14:09 deck kernel: ata2.00: failed command: READ DMA
  Feb 18 13:14:09 deck kernel: ata2.00: cmd
  c8/00:08:60:f2:30/00:00:00:00:00/e0 tag 0 dma 4096 in
  
   res
   51/04:08:60:f2:30/00:00:00:00:00/e0
  
  Emask 0x1 (device error)
  Feb 18 13:14:09 deck kernel: ata2.00: status: { DRDY ERR }
  Feb 18 13:14:09 deck kernel: ata2.00: error: { ABRT }
  Feb 18 13:14:09 deck kernel: ata2.15: hard resetting link
  Feb 18 13:14:14 deck kernel: ata2.15: link is slow to respond, please be
  patient (ready=0)
  Feb 18 13:14:19 deck kernel: ata2.15: SRST failed (errno=-16)
  Feb 18 13:14:19 deck kernel: ata2.15: hard resetting link
  Feb 18 13:14:24 deck kernel: ata2.15: link is slow to respond, please be
  patient (ready=0)
  Feb 18 13:14:29 deck kernel: ata2.15: SATA link up 3.0 Gbps (SStatus 123
  SControl F300)
  Feb 18 13:14:29 deck kernel:
  Feb 18 13:14:30 deck kernel: ata2.01: hard resetting link
  Feb 18 13:14:31 deck kernel: ata2.02: hard resetting link
  Feb 18 13:14:31 deck kernel: ata2.03: hard resetting link
  Feb 18 13:14:32 deck kernel: ata2.04: hard resetting link
  Feb 18 13:14:32 deck kernel: ata2.05: hard resetting link
  Feb 18 13:14:33 deck kernel: ata2.06: hard resetting link
  Feb 18 13:14:34 deck kernel: ata2.07: hard resetting link
  Feb 18 13:14:34 deck kernel: ata2.00: configured for UDMA/133
  Feb 18 13:14:34 deck kernel: ata2.01: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.02: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.03: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.04: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.05: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.06: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2.07: configured for UDMA/133
  Feb 18 13:14:35 deck kernel: ata2: EH complete
 
 Two things. The full dmesg includes useful information separate from the
 error messages, including the model drive to ata device mapping, and why
 there's a failed read to ATA2.00 yet there's a reset in sequence for
 ata2.01, 2.02, 2.03 and so on. So the entire dmesg would be useful.

You want it, you get it. I attach the full output to this mail, and hope that
the list allows for attachments. Here is a short summary.

The hd enclosure is a Sharkoon 8-Bay which is connected via e-SATA. The hds
are mostly Seagate Barracuda ES.2 (ST31000NSSUN1.0T) and one from Hitachi
(HUA7210SASUN1.0T). All of them are server-grade hds and have a device timeout
of 30. The hds are SATA 1.0, so they do not feature native command
queuing. I generally get low read/write performance, around 10MB/sec. I
tested the enclosure with a SATA 2 disk and got much better performance, around
70MB/sec. This is why I stuck with the Sharkoon. If the bad performance is
due to bad configuration, I would be happy to fix it. :-)

Now to the dmesg output.
The SRST error you spotted in the logs points to a wrong jumper setting
concerning master, slave, or cable select. My hds do not have a jumper for this
setting.

The order of the hds is ata2.00-sda ata2.01-sdc etc. The port multiplier 
through which all those devices are accessed is ata2.15.

I rebooted the system two times to see if the read error count goes up. It
didn't; it still sits at 2.




 
 In any case the actual problem might not be discoverable due to the hard
 resetting. I'm not finding any useful translation, in a 5-minute search, for
 SRST. But it makes me suspicious of a configuration problem, like maybe an
 unnecessary jumper setting on a drive or with the enclosure itself. So I'd
 check for that. Also, what model drives are being used? If they are
 consumer drives, they almost certainly have long error recoveries over 30
 minutes. And if the drive is trying to honor the read request for more than
 30 seconds, the default SCSI block layer will time out and produce messages
 like what we see here. So you probably need to change the SCSI block layer
 timeout. To set the command timer to something else use:
 
 echo value > /sys/block/device/device/timeout
 
 Where value is e.g. 121; since many consumer drives time out at 120 seconds,
 this means the kernel will wait 121 seconds before starting its error
 handling (which includes resetting the drive and then the bus).
  ---end---
  
  This output is repeated several times and then ends in this read error
  
  [Tue Feb 18 13:15:48 2014] btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0,
  corrupt 0, gen 0
  [Tue Feb 18 13:15:48 2014] ata2: EH complete
  [Tue Feb 18 13:15:48 2014] btrfs read error corrected: ino 1 off
  29184540672 (dev /dev/sdb sector 3207776)
 
 Well that reads like Btrfs knows what sector had a read problem, without
 corruption being the cause, and corrected it. So the question
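
A hedged sketch of applying the timeout change Chris suggests above to every
member of the enclosure (device names and the 121-second value are
illustrative; the setting resets on reboot unless made persistent, e.g. via a
udev rule):

    # Raise the SCSI command timeout so the kernel outlasts the drive's own
    # error recovery instead of resetting the link.
    for dev in sdb sdc sdd sde; do
        echo 121 > /sys/block/$dev/device/timeout
    done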

Read i/o errs and disk replacement

2014-02-18 Thread Wolfgang Mader
Hi all,

well, I hit the first incident where I really have to work with my btrfs
setup. To get things straight, I want to double-check here so as not to screw
things up right from the start. We are talking about a home server. There is no
time or user pressure involved, and there are backups, too.


Software
-
Linux 3.13.3
Btrfs v3.12


Hardware
---
5 1T hard drives configured to be a raid 10 for both data and metadata
Data, RAID10: total=282.00GiB, used=273.33GiB
System, RAID10: total=64.00MiB, used=36.00KiB
Metadata, RAID10: total=1.00GiB, used=660.48MiB


Error

This is not btrfs' fault but due to an hd error. I saw in the system logs
btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
and a subsequent check on btrfs showed
[/dev/sdb].write_io_errs   0
[/dev/sdb].read_io_errs    2
[/dev/sdb].flush_io_errs   0
[/dev/sdb].corruption_errs 0
[/dev/sdb].generation_errs 0

So, I have a read error on sdb.


Questions
---
1)
Do I have to take action immediately (shut down the system, unmount the file
system)? Can I even ignore the error? Unfortunately, I cannot access SMART
information through the SATA interface of the enclosure which hosts the hds.

2)
I can only replace the disk, not add a new one and then swap over. There is no
space left in the disk enclosure I am using. I also cannot guarantee that, if
I remove sdb and start the system up again, all the other disks will be named
the same as they are now, and that the newly added disk will be named sdb
again. Is this an issue?

3)
I know that btrfs can handle disks of different sizes. Is there a downside if I
go for a 3T disk and add it to the 1T disks? Is there, e.g., more data stored on
the 3T disk, and if this one fails, do I lose redundancy? Is a soft transition to
3T, where I replace every dying 1T disk with a 3T disk, advisable?


Proposed solution for the current issue
--
1)
Delete the faulted drive using
btrfs device delete /dev/sdb /path/to/pool
2)
Format the new disk with btrfs
mkfs.btrfs
3)
Add the new disk to the filesystem using
btrfs device add /dev/newdiskname /path/to/pool
4)
Balance the file system
btrfs filesystem balance /path/to/pool

Is this the proper way to deal with the situation?
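
If the kernel and btrfs-progs in use support it, a hedged alternative to steps
1-4 is the one-step replace; it needs either the old and new disk attached at
the same time or, with the old disk removed, a degraded mount and the old
disk's devid as the source (device names below are placeholders):

    # One-step replacement; no separate mkfs, add, delete or balance needed.
    btrfs replace start /dev/sdb /dev/sdX /path/to/pool
    btrfs replace status /path/to/pool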


Thank you for your advice.
Best,
Wolfgang


Re: Read i/o errs and disk replacement

2014-02-18 Thread Wolfgang Mader
On Tuesday 18 February 2014 11:48:49 Chris Murphy wrote:
 On Feb 18, 2014, at 6:19 AM, Wolfgang Mader wolfgang_ma...@brain-frog.de 
wrote:
  Hi all,
  
  well, I hit the first incident where I really have to work with my btrfs
  setup. To get things straight I want to double-check here to not screw
  things up right from the start. We are talking about a home server. There
  is no time or user pressure involved, and there are backups, too.
  
  
  Software
  -
  Linux 3.13.3
  Btrfs v3.12
  
  
  Hardware
  ---
  5 1T hard drives configured to be a raid 10 for both data and metadata
  
 Data, RAID10: total=282.00GiB, used=273.33GiB
 System, RAID10: total=64.00MiB, used=36.00KiB
 Metadata, RAID10: total=1.00GiB, used=660.48MiB
  
  Error
  
  This is not btrfs' fault but due to an hd error. I saw in the system logs
  
 btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
  
  and a subsequent check on btrfs showed
  
 [/dev/sdb].write_io_errs   0
 [/dev/sdb].read_io_errs    2
 [/dev/sdb].flush_io_errs   0
 [/dev/sdb].corruption_errs 0
 [/dev/sdb].generation_errs 0
  
  So, I have a read error on sdb.
  
  
  Questions
  ---
  1)
  Do I have to take action immediately (shutdown the system, umount the file
  system)? Can I even ignore the error? Unfortunately, I can not access
  SMART
  information through the sata interface of the enclosure which hosts the
  hds.
 A full dmesg should be sufficient to determine if this is due to the drive
 reporting a read error, in which case Btrfs is expected to get a copy of
 the missing data from a mirror, send it up to the application layer without
 error, and then write it to the LBAs of the device(s) that reported the
 original read error. It is kinda important to make sure that there wasn't a
 device reset, but an explicit read error. If the drive merely hangs while
 in recovery, upon reset any way of knowing what sectors were slow or bad is
 lost.

Thank you for your quick response.

The first read error occurs during system start-up, when the raid is
activated for the first time

[Tue Feb 18 13:02:08 2014] btrfs: use lzo compression
[Tue Feb 18 13:02:08 2014] btrfs: disk space caching is enabled
[Tue Feb 18 13:02:09 2014] btrfs: bdev /dev/sdb errs: wr 0, rd 1, flush 0, 
corrupt 0, gen 0

and then dmesg is silent for the next 10 minutes.


The second read error happens while the device is in use and is preceded by

---start--
Feb 18 13:14:09 deck kernel: ata2.15: exception Emask 0x1 SAct 0x0 SErr 0x0 
action 0x6
Feb 18 13:14:09 deck kernel: ata2.15: edma_err_cause=0084 
pp_flags=0001, dev error, EDMA self-disable
Feb 18 13:14:09 deck kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 
action 0x0
Feb 18 13:14:09 deck kernel: ata2.00: failed command: READ DMA
Feb 18 13:14:09 deck kernel: ata2.00: cmd c8/00:08:60:f2:30/00:00:00:00:00/e0 
tag 0 dma 4096 in
  res 51/04:08:60:f2:30/00:00:00:00:00/e0 
Emask 0x1 (device error)
Feb 18 13:14:09 deck kernel: ata2.00: status: { DRDY ERR }
Feb 18 13:14:09 deck kernel: ata2.00: error: { ABRT }
Feb 18 13:14:09 deck kernel: ata2.15: hard resetting link
Feb 18 13:14:14 deck kernel: ata2.15: link is slow to respond, please be 
patient (ready=0)
Feb 18 13:14:19 deck kernel: ata2.15: SRST failed (errno=-16)
Feb 18 13:14:19 deck kernel: ata2.15: hard resetting link
Feb 18 13:14:24 deck kernel: ata2.15: link is slow to respond, please be 
patient (ready=0)
Feb 18 13:14:29 deck kernel: ata2.15: SATA link up 3.0 Gbps (SStatus 123 
SControl F300)
Feb 18 13:14:29 deck kernel: 
Feb 18 13:14:30 deck kernel: ata2.01: hard resetting link
Feb 18 13:14:31 deck kernel: ata2.02: hard resetting link
Feb 18 13:14:31 deck kernel: ata2.03: hard resetting link
Feb 18 13:14:32 deck kernel: ata2.04: hard resetting link
Feb 18 13:14:32 deck kernel: ata2.05: hard resetting link
Feb 18 13:14:33 deck kernel: ata2.06: hard resetting link
Feb 18 13:14:34 deck kernel: ata2.07: hard resetting link
Feb 18 13:14:34 deck kernel: ata2.00: configured for UDMA/133
Feb 18 13:14:34 deck kernel: ata2.01: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.02: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.03: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.04: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.05: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.06: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2.07: configured for UDMA/133
Feb 18 13:14:35 deck kernel: ata2: EH complete
---end---

This output is repeated several times and then ends in this read error

[Tue Feb 18 13:15:48 2014] btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0, 
corrupt 0, gen 0
[Tue Feb 18 13:15:48 2014] ata2: EH complete
[Tue Feb 18 13:15:48 2014] btrfs read error corrected: ino 1 off 29184540672 
(dev /dev/sdb sector 3207776)

This might have to do with the fact, that my hds