While playing with a Tahoe storage grid running 1.2.0 storage nodes, a
trunk client and the blackmatch fuse implementation, I possibly did not
follow the "Don't Do That" coordination directive on mutable files.
The result is that now each time I try to create a new dirnode, even an
unlinked one with "tahoe mkdir", it fails with an
UncoordinatedWriteError and the following message gets logged.
WEIRD they had shares [0] that we didn't know about
I don't really understand what those surprise shares stand for ?
You'll find an excerpt of the incident logfile attached to this mail.
Thanks a lot for you help !
François
# 11:04:54.726 [495]: Publish(dt2ya): starting
* 11:04:54.727 [496]: starting publish, datalen is 0
* 11:04:54.727 [497]: new seqnum will be 1
* 11:04:54.727 [498]: _encrypt_and_encode
* 11:04:54.728 [499]: _generate_shares
* 11:04:54.729 [500]: my new root_hash is
7ahxby45gnrnamse7qgfh7ndw626ekkk4367v765lxpwdcdykesq
* 11:04:54.734 [501]: entering loop
* 11:04:54.734 [502]: current goal: before update:
* 11:04:54.734 [503]: we are planning to push new seqnum=#1
* 11:04:54.734 [504]: current goal: after update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.734 [505]: we are planning to push new seqnum=#1
* 11:04:54.734 [506]: need to send 10 new shares
* 11:04:54.734 [507]: _send_shares
* 11:04:54.735 [508]: sending 10 shares
* 11:04:54.756 [509]: 10 shares sent
* 11:04:54.789 [510]: _got_write_answer from fv4p67ck
* 11:04:54.789 [511]: entering loop
* 11:04:54.790 [512]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.790 [513]: we are planning to push new seqnum=#1
* 11:04:54.790 [514]: 9 queries still outstanding
* 11:04:54.813 [515]: _got_write_answer from 7vlsqics
* 11:04:54.813 [516]: entering loop
* 11:04:54.814 [517]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.814 [518]: we are planning to push new seqnum=#1
* 11:04:54.814 [519]: 8 queries still outstanding
* 11:04:54.815 [520]: _got_write_answer from vcqcuxod
* 11:04:54.815 [521]: entering loop
* 11:04:54.815 [522]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.815 [523]: we are planning to push new seqnum=#1
* 11:04:54.816 [524]: 7 queries still outstanding
* 11:04:54.819 [525]: _got_write_answer from dlbsgdt4
* 11:04:54.819 [526]: entering loop
* 11:04:54.819 [527]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.819 [528]: we are planning to push new seqnum=#1
* 11:04:54.820 [529]: 6 queries still outstanding
* 11:04:54.834 [530]: _got_write_answer from om7xqw7k
* 11:04:54.834 [531]: entering loop
* 11:04:54.835 [532]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.835 [533]: we are planning to push new seqnum=#1
* 11:04:54.835 [534]: 5 queries still outstanding
* 11:04:54.839 [538]: UNUSUAL error while writing shares [6] to peerid
3dszcosf FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most
recent call last):
File "build/bdist.linux-i686/egg/foolscap/eventual.py", line 26, in
_turn
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in
doNextCall
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 195, in addCallback
callbackKeywords=kw)
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 186, in addCallbacks
self._runCallbacks()
--- <exception caught here> ---
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 328, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in
<lambda>
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 571, in
_doCall
File "build/bdist.linux-i686/egg/foolscap/referenceable.py", line 59,
in doRemoteCall
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
line 1139, in remote_slot_testv_and_readv_and_writev
owner_num=0)
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
line 1159, in _allocate_slot_share
fileutil.make_dirs(bucketdir)
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/util/fileutil.py",
line 115, in make_dirs
raise tx
exceptions.OSError: [Errno 122] Disk quota exceeded:
'/home/tahoe/.tahoe/storage/shares/dt/dt2yapm26hivzo5rnb2wdmetsy'
]
* 11:04:54.839 [539]: entering loop
* 11:04:54.840 [540]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.840 [541]: we are planning to push new seqnum=#1
* 11:04:54.841 [542]: current goal: after update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.841 [543]: we are planning to push new seqnum=#1
* 11:04:54.841 [544]: need to send 1 new shares
* 11:04:54.841 [545]: _send_shares
* 11:04:54.841 [546]: sending 1 shares
* 11:04:54.843 [547]: 1 shares sent
* 11:04:54.879 [548]: _got_write_answer from cfy2dvty
* 11:04:54.879 [549]: entering loop
* 11:04:54.879 [550]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.879 [551]: we are planning to push new seqnum=#1
* 11:04:54.879 [552]: 4 queries still outstanding
* 11:04:54.887 [553]: _got_write_answer from qjmzhhda
* 11:04:54.887 [554]: entering loop
* 11:04:54.888 [555]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.888 [556]: we are planning to push new seqnum=#1
* 11:04:54.888 [557]: 3 queries still outstanding
* 11:04:54.891 [558]: _got_write_answer from j7bojlfb
* 11:04:54.891 [559]: entering loop
* 11:04:54.892 [560]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.892 [561]: we are planning to push new seqnum=#1
* 11:04:54.892 [562]: 2 queries still outstanding
* 11:04:54.964 [563]: _got_write_answer from stczs25o
* 11:04:54.964 [564]: entering loop
* 11:04:54.964 [565]: current goal: before update: , sh0 to [stczs25o], sh1
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to
[dlbsgdt4]
* 11:04:54.964 [566]: we are planning to push new seqnum=#1
* 11:04:54.964 [567]: 1 queries still outstanding
* 11:04:55.040 [568]: _got_write_answer from stczs25o
o 11:04:55.040 [569]: WEIRD they had shares [0] that we didn't know
about [INCIDENT-TRIGGER]
* 11:04:55.254 [570]: wrote: True
* 11:04:55.254 [571]: read_data: {0:
['\x00\x00\x00\x00\x00\x00\x00\x00\x01\xf8\x0fp\xe3\x9d3b\xd02D\xfc\x0cS\xfd\xa3\xb7\xb5\xe2)J\xe6\xfd\xfa\xff\xdd]\xdfa\x88xQ%\x9c\xc6Y\xb4\x97\x10\xe7}\xf9\n\x12\x12\x90r\xacz\x03\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']}
* 11:04:55.254 [572]: set(read_data.keys()): set([0])
* 11:04:55.254 [573]: set(shnums): set([6])
* 11:04:55.254 [574]: entering loop
* 11:04:55.254 [575]: currently surprised, so don't send any new shares
* 11:04:55.254 [576]: no queries outstanding, no placements needed: done
* 11:04:55.254 [577]: UNUSUAL Publish done, UncoordinatedWriteError
# 11:04:54.839 [535]: UNUSUAL an outbound callRemote (that we [z4e2] sent to
someone else [3dsz]) failed on the far end
* 11:04:54.839 [536]: reqID=3, rref=<RemoteReference at 0x293b950
[pb://[EMAIL PROTECTED]:34686,127.0.0.1:34686/ag6rXXXXX]>,
methname=RIStorageServer.tahoe.allmydata.com.slot_testv_and_readv_and_writev
* 11:04:54.839 [537]: the REMOTE failure was: FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most
recent call last):
File "build/bdist.linux-i686/egg/foolscap/eventual.py", line 26, in
_turn
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in
doNextCall
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 195, in addCallback
callbackKeywords=kw)
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 186, in addCallbacks
self._runCallbacks()
--- <exception caught here> ---
File
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
line 328, in _runCallbacks
self.result = callback(self.result, *args, **kw)
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in
<lambda>
File "build/bdist.linux-i686/egg/foolscap/broker.py", line 571, in
_doCall
File "build/bdist.linux-i686/egg/foolscap/referenceable.py", line 59,
in doRemoteCall
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
line 1139, in remote_slot_testv_and_readv_and_writev
owner_num=0)
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
line 1159, in _allocate_slot_share
fileutil.make_dirs(bucketdir)
File
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/util/fileutil.py",
line 115, in make_dirs
raise tx
exceptions.OSError: [Errno 122] Disk quota exceeded:
'/home/tahoe/.tahoe/storage/shares/dt/dt2yapm26hivzo5rnb2wdmetsy'
]
# 11:04:55.258 [578]: Unhandled Error Traceback (most recent call last):
Failure: allmydata.mutable.common.UncoordinatedWriteError: FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent
call last):
Failure: allmydata.mutable.common.UncoordinatedWriteError:
]
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev