On 2010-12-02 00:37, Brian Warner wrote:
> On 11/30/10 10:58 PM, Kyle Markley wrote:
[...]
>> allmydata.interfaces.UploadUnhappinessError: shares could be placed on
>> only 3 server(s) such that any 2 of them have enough shares to recover
>> the file, but we were asked to place shares on at least 4 such
>> servers. (placed all 4 shares, want to place shares on at least 4
>> servers such that any 2 of them have enough shares to recover the
>> file, sent 4 queries to 4 peers, 4 queries placed some shares, 0
>> placed none (of which 0 placed none due to the server being full and 0
>> placed none due to an error))
>
>> So it appears it's failing to upload the .login file. The specific
>> error message doesn't make sense to me -- if all 4 queries placed some
>> shares, and 0 queries placed none, then why hasn't the file become healthy?
>
> There are two confusing things going on here. The first is that I think
> (but I'd have to check the code to be sure) the "4 queries placed some
> shares" message is including any "I already have a share" responses. The
> second is that the UploadUnhappinessError criteria is more strict than
> simply getting all four shares into the grid: it wants the arrangement
> of those shares to meet the "servers-of-happiness" criteria. The "at
> least 4 such servers" means s-o-h (aka tahoe.cfg's misnamed
> "shares.happy") is equal to 4.
>
> Uploading consists of two phases: share placement, then share upload. If
> the proposed arrangement that comes out of the placement phase does not
> meet the s-o-h criteria, the upload stops before any shares are placed.
>
> The share-placement algorithm is usually expecting the
> file-doesn't-exist-in-grid-yet case. It sends "please accept share X
> (and by the way do you have any other shares?)" messages to each server
> in permuted order, all in parallel (I think), with shnums chosen to get
> exactly one share per server if everything goes well (i.e. each server
> accepts the share offered it, and no preexisting shares were found).
>
> I'm suspecting that something in the share-placement algorithm is
> getting stuck: the particular placement of preexisting shares and the
> order in which the queries are being sent/received is causing the
> placement algorithm to terminate, but which doesn't result in an
> arrangement that will pass the s-o-h test.
>
> David-Sarah, you know more than I do about s-o-h and the new placement
> algorithm.. could you take a look? Given the serverids and SI described
> here, I think the permuted order should have been (xxaj,juwm,vjqc,47cs),
> but I'd like to confirm that (maybe with a flog trace), because I can't
> make that order fit with the other evidence.

I can't look at this specific case right now, but the current placement
algorithm is known to be insufficient in several cases, which are
exercised by the following test cases in
allmydata.test.test_upload.EncodingParameters:

  test_problem_layout_comment_187
  test_problem_layout_ticket_1124
  test_problem_layout_ticket_1128

The exception message above is in fact identical to that in both #1124
and #1128. The latter is a duplicate of
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1130>.

I think we reached a consensus on how to fix this in
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1212>, starting at
comment:14 (kevan had previously suggested a similar algorithm in
<http://tahoe-lafs.org/trac/tahoe-lafs/ticket/778#comment:194>).

-- 
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com
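
To illustrate how "placed all 4 shares" and "only 3 server(s)" can both be
true, here is a minimal sketch that treats s-o-h as the size of a maximum
matching between servers and distinct shares, which is my understanding of
the servers-of-happiness definition. The function name, the server names,
and the layout are all hypothetical; this is not the code in allmydata and
not the actual layout from Kyle's grid.

```python
def servers_of_happiness(layout):
    """Return the size of a maximum matching between servers and shares.

    `layout` maps a (hypothetical) server name to the set of share
    numbers that server holds or has agreed to hold.
    """
    match = {}  # share number -> server currently matched to that share

    def try_assign(server, seen):
        # Augmenting-path search: give `server` a share of its own,
        # re-homing an already matched server if that frees one up.
        for shnum in sorted(layout[server]):
            if shnum in seen:
                continue
            seen.add(shnum)
            if shnum not in match or try_assign(match[shnum], seen):
                match[shnum] = server
                return True
        return False

    return sum(1 for server in layout if try_assign(server, set()))


# Hypothetical layout: all 4 shares are placed and every server holds
# something, but server_c and server_d both hold only share 3.
problem_layout = {
    "server_a": {0, 1},
    "server_b": {2},
    "server_c": {3},
    "server_d": {3},
}

# The layout the placement phase aims for: one distinct share per server.
ideal_layout = {
    "server_a": {0},
    "server_b": {1},
    "server_c": {2},
    "server_d": {3},
}

print(servers_of_happiness(problem_layout))
print(servers_of_happiness(ideal_layout))
```

With shares.happy set to 4, the first layout would be rejected under this
definition even though no share is missing, because only three servers can
each be credited with a distinct share.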
