Re: Is this normal? Should I use scrub?

2015-04-02 Thread Hugo Mills
On Thu, Apr 02, 2015 at 09:58:39AM +, Andy Smith wrote:
 Hi Hugo,
 
 Thanks for your help.

   Makes a change from you answering my questions. :)

 On Wed, Apr 01, 2015 at 03:42:02PM +, Hugo Mills wrote:
  On Wed, Apr 01, 2015 at 03:11:14PM +, Andy Smith wrote:
   Should I run a scrub as well?
  
 Yes. The output you've had so far will be just the pieces that the
  FS has tried to read, and where, as a result, it's been able to detect
  the out-of-date data. A scrub will check and fix everything.
 
 Thanks, things seem to be fine now. :)
 
 What's the difference between verify and csum here?

   verify would be where the internal consistency checks for metadata
failed. That might be, for example, where it's detected that a tree
node has a newer transaction ID (effectively a monotonic timestamp)
than its parent. This should never happen, so the parent is probably
out of date. If there's another copy of the metadata that doesn't have
the same problem, it can be used to repair the obviously-wrong copy.

   csum is where the checksum validation failed -- this would be, for
example, where some data was modified on one copy and left unchanged
on the older copy, but the metadata for both copies was updated. In
that case, the data on the out-of-date drive wouldn't match the
checksum, and needs to be updated from the good copy.
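
The two error classes described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not btrfs's actual on-disk logic: the function names (verify_ok, csum_ok, repair_from_mirror) are hypothetical, and zlib.crc32 is used purely as a stand-in checksum.

```python
import zlib
from typing import List


def verify_ok(parent_transid: int, child_transid: int) -> bool:
    """Toy 'verify' check: a child node must not be newer than its parent."""
    return child_transid <= parent_transid


def csum_ok(data: bytes, expected_csum: int) -> bool:
    """Toy 'csum' check: the data must match the checksum held in metadata."""
    return zlib.crc32(data) == expected_csum


def repair_from_mirror(copies: List[bytes], expected_csum: int) -> int:
    """Overwrite every bad copy with a good one; return how many were fixed."""
    good = next((c for c in copies if csum_ok(c, expected_csum)), None)
    if good is None:
        # No copy matches the checksum: this models an uncorrectable error.
        raise ValueError("no good copy available")
    fixed = 0
    for i, copy in enumerate(copies):
        if not csum_ok(copy, expected_csum):
            copies[i] = good
            fixed += 1
    return fixed


# Hypothetical example: the stale drive still holds the old data, while the
# metadata on both drives already records the checksum of the new data.
expected = zlib.crc32(b"new contents")
mirrors = [b"new contents", b"old contents"]

stale_parent_ok = verify_ok(parent_transid=100, child_transid=105)  # False
fixed_count = repair_from_mirror(mirrors, expected)                 # 1
```

In this model the stale parent fails the verify check because its child carries a newer transaction ID, and the stale data block fails the csum check and is rewritten from the good mirror.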

   Hugo.

 scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
 scrub device /dev/sdh (id 2) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 14642 seconds
 total bytes scrubbed: 383.42GiB with 0 errors
 scrub device /dev/sdg (id 3) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 14504 seconds
 total bytes scrubbed: 382.62GiB with 0 errors
 scrub device /dev/sdf (id 4) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 14436 seconds
 total bytes scrubbed: 383.00GiB with 0 errors
 scrub device /dev/sdk (id 5) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 21156 seconds
 total bytes scrubbed: 1.13TiB with 14530 errors
 error details: verify=10909 csum=3621
 corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
 scrub device /dev/sdj (id 6) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 5693 seconds
 total bytes scrubbed: 119.42GiB with 0 errors
 scrub device /dev/sde (id 7) history
 scrub started at Wed Apr  1 20:05:58 2015 and finished after 5282 seconds
 total bytes scrubbed: 114.45GiB with 0 errors
 
 Cheers,
 Andy

-- 
Hugo Mills | Debugging is like hitting yourself in the head with
hugo@... carfax.org.uk | hammer: it feels so good when you find the bug, and
http://carfax.org.uk/  | you're allowed to stop debugging.
PGP: 65E74AC0  |PotatoEngineer




Re: Is this normal? Should I use scrub?

2015-04-02 Thread Andy Smith
Hi Hugo,

Thanks for your help.

On Wed, Apr 01, 2015 at 03:42:02PM +, Hugo Mills wrote:
 On Wed, Apr 01, 2015 at 03:11:14PM +, Andy Smith wrote:
  Should I run a scrub as well?
 
Yes. The output you've had so far will be just the pieces that the
 FS has tried to read, and where, as a result, it's been able to detect
 the out-of-date data. A scrub will check and fix everything.

Thanks, things seem to be fine now. :)

What's the difference between verify and csum here?

scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
scrub device /dev/sdh (id 2) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 14642 seconds
total bytes scrubbed: 383.42GiB with 0 errors
scrub device /dev/sdg (id 3) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 14504 seconds
total bytes scrubbed: 382.62GiB with 0 errors
scrub device /dev/sdf (id 4) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 14436 seconds
total bytes scrubbed: 383.00GiB with 0 errors
scrub device /dev/sdk (id 5) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 21156 seconds
total bytes scrubbed: 1.13TiB with 14530 errors
error details: verify=10909 csum=3621
corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
scrub device /dev/sdj (id 6) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 5693 seconds
total bytes scrubbed: 119.42GiB with 0 errors
scrub device /dev/sde (id 7) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 5282 seconds
total bytes scrubbed: 114.45GiB with 0 errors
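
For anyone wanting to tally a report like this automatically, here is a minimal sketch of a parser. It assumes the exact text layout pasted in this message (other btrfs-progs versions may print the status differently), and the function name parse_scrub_status and the dictionary keys are hypothetical. The sample is a two-device excerpt of the output above.

```python
import re

SAMPLE = """\
scrub device /dev/sdh (id 2) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 14642 seconds
total bytes scrubbed: 383.42GiB with 0 errors
scrub device /dev/sdk (id 5) history
scrub started at Wed Apr  1 20:05:58 2015 and finished after 21156 seconds
total bytes scrubbed: 1.13TiB with 14530 errors
error details: verify=10909 csum=3621
corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
"""


def parse_scrub_status(text):
    """Return {device: {"id", "errors", "details", "uncorrectable"}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"scrub device (\S+) \(id (\d+)\)", line)
        if m:
            # A new per-device section starts here.
            current = {"id": int(m.group(2)), "errors": 0,
                       "details": {}, "uncorrectable": 0}
            devices[m.group(1)] = current
            continue
        if current is None:
            continue  # skip anything before the first device section
        m = re.search(r"with (\d+) errors", line)
        if m:
            current["errors"] = int(m.group(1))
        if line.startswith("error details:"):
            # Collect name=count pairs such as verify=10909 csum=3621.
            pairs = re.findall(r"(\w+)=(\d+)", line)
            current["details"] = {name: int(count) for name, count in pairs}
        m = re.search(r"uncorrectable errors: (\d+)", line)
        if m:
            current["uncorrectable"] = int(m.group(1))
    return devices


report = parse_scrub_status(SAMPLE)
sdk = report["/dev/sdk"]
detail_total = sum(sdk["details"].values())  # 10909 + 3621 = 14530
```

For /dev/sdk the verify and csum counts add up to the reported total of 14530 errors, all of which were corrected.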

Cheers,
Andy


Re: Is this normal? Should I use scrub?

2015-04-01 Thread Hugo Mills
   Hi, Andy,

On Wed, Apr 01, 2015 at 03:11:14PM +, Andy Smith wrote:
 I have a 6 device RAID-1 filesystem:

[snip tale of a filesystem with out-of-date data on one copy of the RAID]

 I have now got a new enclosure and put this system back together
 with all six devices. I was not expecting this filesystem to mount
 without assistance on boot because of /dev/sdk being stale
 compared to the other devices. I suppose this incorrect view is a
 holdover from my experience with mdadm.
 
 Anyway, I booted it and /srv/tank was mounted automatically with all
 six devices.  I got a bunch of these messages as soon as it was
 mounted:
 
 http://pastie.org/private/2ghahjwtzlcm6hwp66hkg
 
 There's lots more of it but it's all like that. That paste is from
 the end of the log and there haven't been any more such messages
 since; that's about 20 minutes now (the times are in GMT).
 
 Is that normal output indicating that btrfs is repairing the
 staleness of sdk from the other copy?

   Yes, exactly. That output you pasted looks pretty much exactly like
what I'd expect to see in the situation described above. You might
also expect to see some checksum errors corrected in the data, as well
as the metadata messages you're getting.

 I seem to be able to use the filesystem and a cursory inspection
 isn't turning up anything that I can't read or that seems
 corrupted. I will now run checksums against my last good backup.
 
 Should I run a scrub as well?

   Yes. The output you've had so far will be just the pieces that the
FS has tried to read, and where, as a result, it's been able to detect
the out-of-date data. A scrub will check and fix everything.
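
As a rough sketch of how the scrub could be driven from a script: the helper names below are hypothetical, /srv/tank is the mount point from your message, and the flags are the ones I'd expect from btrfs-progs (-B to stay in the foreground, -d for per-device statistics), so check them against your installed version. Running it for real needs root, which is why the sketch defaults to a dry run.

```python
import subprocess
from typing import List


def scrub_start_cmd(mountpoint: str, foreground: bool = True) -> List[str]:
    """Build the argument list for starting a scrub on a mounted filesystem."""
    cmd = ["btrfs", "scrub", "start"]
    if foreground:
        # -B: do not background; -d: print per-device statistics at the end.
        cmd += ["-B", "-d"]
    cmd.append(mountpoint)
    return cmd


def scrub_status_cmd(mountpoint: str) -> List[str]:
    """Build the argument list for a per-device scrub status report."""
    return ["btrfs", "scrub", "status", "-d", mountpoint]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Return the command line on a dry run, otherwise the command's stdout."""
    if dry_run:
        return " ".join(cmd)
    # check=False: the exit status may be non-zero when errors are found,
    # and the caller will usually still want to read the report.
    result = subprocess.run(cmd, check=False, capture_output=True, text=True)
    return result.stdout


start_line = run(scrub_start_cmd("/srv/tank"))
status_line = run(scrub_status_cmd("/srv/tank"))
```

With dry_run left at its default, the two calls only return the command lines that would be executed.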

   Hugo.

-- 
Hugo Mills | My karma has run over my dogma.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |

