Dear Humberto Ortiz-Zuazaga: I'm sorry that nobody replied to this when you posted it back on March 4:
On Thu, Mar 4, 2010 at 6:09 AM, Humberto Ortiz-Zuazaga <[email protected]> wrote:
>
> Not Healthy! : Unhealthy: multiple versions are recoverable
>
> * Report:
>
> Recoverable Versions: 6*seq24-wgxp/10*seq26-4tdb
> Unhealthy: there are multiple recoverable versions
> Best Recoverable Version: seq26-4tdb
>
> I can't see how to fix this. A deep check with the repair checkbox
> leaves the directories in the same state.

Did you find a solution on your own yet? I'm not sure what we intend the repair tool to do in a case like this. Apparently we intend for it to warn you and stop, thus failing safe.

Have you tried manually inspecting the directory (if you do, I think you will see the seq26 version of it, which I assume is newer than the seq24 version), deciding if it looks okay, and then saving it again? I guess if you do this, your gateway might detect the older versions when it is uploading your new version and go ahead and overwrite the older shares. I'm not sure if it does that, though.

Let's see... Looking at tickets marked "mutable"... Hm. There are a lot of tickets that show that mutable upload/download isn't as robust as we would like in the face of unusual situations. (Yours is an unusual situation: somehow the shares of an older version, seq24, weren't overwritten when a newer version was uploaded, possibly because those six shares were on servers that weren't reachable when you uploaded the newer version.)

Here are all the tickets that look vaguely related to the topic of "robust upload/download of mutables":

http://allmydata.org/trac/tahoe-lafs/ticket/232# Peer selection doesn't rebalance shares on overwrite of mutable file.
http://allmydata.org/trac/tahoe-lafs/ticket/474# uncaught exception in mutable-retrieve: UCW between mapupdate and retrieve
http://allmydata.org/trac/tahoe-lafs/ticket/540# inappropriate "uncoordinated write error" after handling a server failure
http://allmydata.org/trac/tahoe-lafs/ticket/541# foolscap 'reference'-token bug workaround in mutable publish
http://allmydata.org/trac/tahoe-lafs/ticket/546# mutable-file surprise shares raise inappropriate UCWE
http://allmydata.org/trac/tahoe-lafs/ticket/547# mapupdate(MODE_WRITE) triggers on a false boundary
http://allmydata.org/trac/tahoe-lafs/ticket/548# mutable publish sends queries to servers that have already been asked
http://allmydata.org/trac/tahoe-lafs/ticket/549# MODE_WRITE mapupdate: maybe increase epsilon to handle large batches of new servers better
http://allmydata.org/trac/tahoe-lafs/ticket/846# allmydata.test.test_system.SystemTest.test_mutable sometimes hangs on a slow machine
http://allmydata.org/trac/tahoe-lafs/ticket/893# UCWE when mapupdate gives up too early, then server errors require replacement servers

Oh man, we really need to focus on this stuff. Having all of these undesirable behaviors (even, or especially, if they crop up only in rare cases) saps some of the "aura of quality" feeling that I have about Tahoe-LAFS. However, everyone already has tasks underway for Tahoe-LAFS v1.7.0 (due in May), so I'm not sure when we're going to get a volunteer to fix this stuff.

Anyway, we definitely need to open a ticket for Humberto Ortiz-Zuazaga's problem, which might end up being cross-linked with some of these other tickets. I suggest the ticket title "how to fix 'multiple versions are recoverable'?". Humberto: would you please open that ticket?

It isn't clear to me what the repair tool, or a manual inspect-and-resave, *should* do in this case.
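To make the report above concrete, here is a small Python sketch that parses a "Recoverable Versions" line and applies the naive "highest sequence number wins" policy. It is not Tahoe-LAFS code; all helper names are hypothetical, and it assumes that the number before `*` is the count of shares found for that version and that the four characters after `seqN-` are a short prefix of that version's root hash.

```python
import re
from typing import List, NamedTuple


class VersionSummary(NamedTuple):
    """One entry from a 'Recoverable Versions' line (hypothetical helper type)."""
    shares: int       # assumed: number of shares found for this version
    seqnum: int       # sequence number of the mutable version
    hash_prefix: str  # assumed: short prefix of the version's root hash


# Matches entries such as "6*seq24-wgxp".
_ENTRY = re.compile(r"^(\d+)\*seq(\d+)-(\w+)$")


def parse_recoverable_versions(line: str) -> List[VersionSummary]:
    """Parse the text after 'Recoverable Versions:' into summaries."""
    entries = []
    for chunk in line.strip().split("/"):
        m = _ENTRY.match(chunk.strip())
        if m is None:
            raise ValueError(f"unrecognized version entry: {chunk!r}")
        entries.append(VersionSummary(int(m.group(1)), int(m.group(2)), m.group(3)))
    return entries


def is_unhealthy(versions: List[VersionSummary]) -> bool:
    """More than one recoverable version is what the checker flags as unhealthy."""
    return len(versions) > 1


def naive_best_version(versions: List[VersionSummary]) -> VersionSummary:
    """Naive policy: the version with the highest sequence number wins."""
    return max(versions, key=lambda v: v.seqnum)


def stale_versions(versions: List[VersionSummary]) -> List[VersionSummary]:
    """Versions whose shares a naive repair would overwrite."""
    best = naive_best_version(versions)
    return [v for v in versions if v != best]


if __name__ == "__main__":
    versions = parse_recoverable_versions("6*seq24-wgxp/10*seq26-4tdb")
    print(is_unhealthy(versions))                  # True
    best = naive_best_version(versions)
    print(f"seq{best.seqnum}-{best.hash_prefix}")  # seq26-4tdb
```

Note that the sketch looks only at sequence numbers, so it cannot tell whether the leftover seq24 shares come from servers that were unreachable during an upload or from a second, uncoordinated writer. That distinction is exactly what makes the right repair policy unclear.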
Just taking the version with the highest sequence number and overwriting all the shares of the older version might not be the right thing to do if the other version was caused by someone else simultaneously writing to that mutable thing, rather than by some of the servers being unavailable the last time the (single) writer wrote to that thing. On the other hand, simultaneous writers are not a supported use case, so maybe it is perfectly fine for the recovery process to blindly blow away anything with an older sequence number than the latest. Maybe we can discuss that on the ticket.

Thank you.

Regards,

Zooko
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
