On 7/9/2012 10:47 AM, Greg Troxel wrote:

Brad Rupp <[email protected]> writes:

I am running the following command:

~/tahoe/bin/tahoe deep-check --repair --verbose my-alias:

I would include --add-lease, because the servers might be doing expiration.

The servers should not be doing expiration. They should be all set to expire in 365 days. My data is only a few weeks old.

Having said that, dumber things have happened.  I will check.

Once per week, I do a deep-check with both --repair and --add-lease. I started running these repairs (--repair only) as a sanity check that my data was in fact safe.


The output from repair #1:

repair successful
done: 11801 objects checked
  pre-repair: 11725 healthy, 76 unhealthy
  76 repairs attempted, 76 successful, 0 failed
  post-repair: 11801 healthy, 0 unhealthy

The output from repair #2:

done: 11801 objects checked
  pre-repair: 11789 healthy, 12 unhealthy
  12 repairs attempted, 11 successful, 1 failed
  post-repair: 11800 healthy, 1 unhealthy

This is a clue that your servers are unstable somehow; it isn't normal.
I would use tcpdump and see if connections are coming and going.

To measure without changing, I would do deep-check (with --add-lease)
without using --repair and see if you get stable output.
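
For reference, a minimal sketch of building that check-only invocation so it can be scripted and repeated. The helper name deep_check_argv is hypothetical; the binary path and alias are taken from the command quoted earlier in this thread, and only the --verbose, --add-lease and --repair options already mentioned here are used.

```python
import os


def deep_check_argv(alias, repair=False, add_lease=True,
                    tahoe="~/tahoe/bin/tahoe"):
    """Build the argument list for a `tahoe deep-check` run.

    Hypothetical helper: the defaults give the measure-only variant
    (leases renewed, nothing repaired).
    """
    argv = [os.path.expanduser(tahoe), "deep-check", "--verbose"]
    if add_lease:
        argv.append("--add-lease")
    if repair:
        argv.append("--repair")
    argv.append(alias)
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without a Tahoe node; pass the list to subprocess.run() to execute.
    print(" ".join(deep_check_argv("my-alias:")))
```

Running the same argument list a few times, some hours apart, should show whether the unhealthy count is stable when nothing is being repaired.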

I will give this a try and let you know.


As you can see, the first repair found and fixed 76 unhealthy
objects. The second repair, approximately 12 hours later, found 12
unhealthy objects and fixed 11 of them.
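
To compare runs like these over time, here is a rough sketch that pulls the counts out of the summary lines. It assumes the summary looks exactly like the two runs pasted above (a run without --repair may print a different summary); parse_summary and the dictionary keys are names made up for illustration.

```python
import re

# Patterns for the summary lines as they appear in the pasted runs.
PATTERNS = {
    "checked": re.compile(r"done:\s+(\d+) objects checked"),
    "pre": re.compile(r"pre-repair:\s+(\d+) healthy,\s+(\d+) unhealthy"),
    "repairs": re.compile(
        r"(\d+) repairs attempted,\s+(\d+) successful,\s+(\d+) failed"),
    "post": re.compile(r"post-repair:\s+(\d+) healthy,\s+(\d+) unhealthy"),
}


def parse_summary(text):
    """Return a dict of integer tuples for each summary line found."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[key] = tuple(int(g) for g in match.groups())
    return result


# Summary of repair #2, copied from the output above.
RUN_2 = """\
done: 11801 objects checked
  pre-repair: 11789 healthy, 12 unhealthy
  12 repairs attempted, 11 successful, 1 failed
  post-repair: 11800 healthy, 1 unhealthy
"""

if __name__ == "__main__":
    summary = parse_summary(RUN_2)
    # (healthy, unhealthy) before and after the repair pass
    print(summary["pre"], summary["post"])  # (11789, 12) (11800, 1)
```

Logging the pre-repair unhealthy count from each run, alongside which servers were reachable at the time, would make it easier to see whether the new failures line up with servers dropping out.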

How many servers?  Are they all stably present, both uptime and
connectivity?

20 servers total, 17 up consistently. This is a public grid (Volunteer Grid 2), so I don't own most of the servers.


Why would the second repair find 12 unhealthy objects?  I would have
expected it to find 0 unhealthy objects given that the first repair
was performed only 12 hours earlier.

Absent servers not being reachable, you are right.

This is just one repair run out of many.  I can consistently get
similar results.  I guess the deeper question is: are the objects
stored in Tahoe safe?  Or, when I really need them after a
catastrophic event, will I lose a handful of objects because of this?

So far your objects were repairable, so you haven't lost data.  But
there is IMHO something wrong.

There have been cases where objects were not repairable. The runs that I copied and pasted just happened to have successful repairs both times.
_______________________________________________
tahoe-dev mailing list
[email protected]
https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
