[tahoe-dev] Surprise shares

Francois Deppierraz Tue, 09 Dec 2008 02:40:46 -0800

While playing with a Tahoe storage grid running 1.2.0 storage nodes, a
trunk client and the blackmatch fuse implementation, I possibly did not
follow the "Don't Do That" coordination directive on mutable files.


The result is that now each time I try to create a new dirnode, even an
unlinked one with "tahoe mkdir", it fails with an
UncoordinatedWriteError and the following message gets logged.

  WEIRD they had shares [0] that we didn't know about

I don't really understand what those surprise shares stand for ?

You'll find an excerpt of the incident logfile attached to this mail.

Thanks a lot for you help !

François

# 11:04:54.726 [495]: Publish(dt2ya): starting

    * 11:04:54.727 [496]: starting publish, datalen is 0
    * 11:04:54.727 [497]: new seqnum will be 1
    * 11:04:54.727 [498]: _encrypt_and_encode
    * 11:04:54.728 [499]: _generate_shares
    * 11:04:54.729 [500]: my new root_hash is 
7ahxby45gnrnamse7qgfh7ndw626ekkk4367v765lxpwdcdykesq
    * 11:04:54.734 [501]: entering loop
    * 11:04:54.734 [502]: current goal: before update:
    * 11:04:54.734 [503]: we are planning to push new seqnum=#1
    * 11:04:54.734 [504]: current goal: after update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.734 [505]: we are planning to push new seqnum=#1
    * 11:04:54.734 [506]: need to send 10 new shares
    * 11:04:54.734 [507]: _send_shares
    * 11:04:54.735 [508]: sending 10 shares
    * 11:04:54.756 [509]: 10 shares sent
    * 11:04:54.789 [510]: _got_write_answer from fv4p67ck
    * 11:04:54.789 [511]: entering loop
    * 11:04:54.790 [512]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.790 [513]: we are planning to push new seqnum=#1
    * 11:04:54.790 [514]: 9 queries still outstanding
    * 11:04:54.813 [515]: _got_write_answer from 7vlsqics
    * 11:04:54.813 [516]: entering loop
    * 11:04:54.814 [517]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.814 [518]: we are planning to push new seqnum=#1
    * 11:04:54.814 [519]: 8 queries still outstanding
    * 11:04:54.815 [520]: _got_write_answer from vcqcuxod
    * 11:04:54.815 [521]: entering loop
    * 11:04:54.815 [522]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.815 [523]: we are planning to push new seqnum=#1
    * 11:04:54.816 [524]: 7 queries still outstanding
    * 11:04:54.819 [525]: _got_write_answer from dlbsgdt4
    * 11:04:54.819 [526]: entering loop
    * 11:04:54.819 [527]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.819 [528]: we are planning to push new seqnum=#1
    * 11:04:54.820 [529]: 6 queries still outstanding
    * 11:04:54.834 [530]: _got_write_answer from om7xqw7k
    * 11:04:54.834 [531]: entering loop
    * 11:04:54.835 [532]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.835 [533]: we are planning to push new seqnum=#1
    * 11:04:54.835 [534]: 5 queries still outstanding
    * 11:04:54.839 [538]: UNUSUAL error while writing shares [6] to peerid 
3dszcosf FAILURE:

      [CopiedFailure instance: Traceback from remote host -- Traceback (most 
recent call last):
        File "build/bdist.linux-i686/egg/foolscap/eventual.py", line 26, in 
_turn
          
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in 
doNextCall
          
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 195, in addCallback
          callbackKeywords=kw)
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 186, in addCallbacks
          self._runCallbacks()
      --- <exception caught here> ---
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 328, in _runCallbacks
          self.result = callback(self.result, *args, **kw)
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in 
<lambda>
          
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 571, in 
_doCall
          
        File "build/bdist.linux-i686/egg/foolscap/referenceable.py", line 59, 
in doRemoteCall
          
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
 line 1139, in remote_slot_testv_and_readv_and_writev
          owner_num=0)
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
 line 1159, in _allocate_slot_share
          fileutil.make_dirs(bucketdir)
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/util/fileutil.py",
 line 115, in make_dirs
          raise tx
      exceptions.OSError: [Errno 122] Disk quota exceeded: 
'/home/tahoe/.tahoe/storage/shares/dt/dt2yapm26hivzo5rnb2wdmetsy'
      ]

    * 11:04:54.839 [539]: entering loop
    * 11:04:54.840 [540]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [3dszcosf], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.840 [541]: we are planning to push new seqnum=#1
    * 11:04:54.841 [542]: current goal: after update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.841 [543]: we are planning to push new seqnum=#1
    * 11:04:54.841 [544]: need to send 1 new shares
    * 11:04:54.841 [545]: _send_shares
    * 11:04:54.841 [546]: sending 1 shares
    * 11:04:54.843 [547]: 1 shares sent
    * 11:04:54.879 [548]: _got_write_answer from cfy2dvty
    * 11:04:54.879 [549]: entering loop
    * 11:04:54.879 [550]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.879 [551]: we are planning to push new seqnum=#1
    * 11:04:54.879 [552]: 4 queries still outstanding
    * 11:04:54.887 [553]: _got_write_answer from qjmzhhda
    * 11:04:54.887 [554]: entering loop
    * 11:04:54.888 [555]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.888 [556]: we are planning to push new seqnum=#1
    * 11:04:54.888 [557]: 3 queries still outstanding
    * 11:04:54.891 [558]: _got_write_answer from j7bojlfb
    * 11:04:54.891 [559]: entering loop
    * 11:04:54.892 [560]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.892 [561]: we are planning to push new seqnum=#1
    * 11:04:54.892 [562]: 2 queries still outstanding
    * 11:04:54.964 [563]: _got_write_answer from stczs25o
    * 11:04:54.964 [564]: entering loop
    * 11:04:54.964 [565]: current goal: before update: , sh0 to [stczs25o], sh1 
to [vcqcuxod], sh2 to [om7xqw7k], sh3 to [j7bojlfb], sh4 to [fv4p67ck], sh5 to 
[qjmzhhda], sh6 to [stczs25o], sh7 to [7vlsqics], sh8 to [cfy2dvty], sh9 to 
[dlbsgdt4]
    * 11:04:54.964 [566]: we are planning to push new seqnum=#1
    * 11:04:54.964 [567]: 1 queries still outstanding
    * 11:04:55.040 [568]: _got_write_answer from stczs25o
          o 11:04:55.040 [569]: WEIRD they had shares [0] that we didn't know 
about [INCIDENT-TRIGGER]
    * 11:04:55.254 [570]: wrote: True
    * 11:04:55.254 [571]: read_data: {0: 
['\x00\x00\x00\x00\x00\x00\x00\x00\x01\xf8\x0fp\xe3\x9d3b\xd02D\xfc\x0cS\xfd\xa3\xb7\xb5\xe2)J\xe6\xfd\xfa\xff\xdd]\xdfa\x88xQ%\x9c\xc6Y\xb4\x97\x10\xe7}\xf9\n\x12\x12\x90r\xacz\x03\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']}
    * 11:04:55.254 [572]: set(read_data.keys()): set([0])
    * 11:04:55.254 [573]: set(shnums): set([6])
    * 11:04:55.254 [574]: entering loop
    * 11:04:55.254 [575]: currently surprised, so don't send any new shares
    * 11:04:55.254 [576]: no queries outstanding, no placements needed: done
    * 11:04:55.254 [577]: UNUSUAL Publish done, UncoordinatedWriteError

# 11:04:54.839 [535]: UNUSUAL an outbound callRemote (that we [z4e2] sent to 
someone else [3dsz]) failed on the far end

    * 11:04:54.839 [536]: reqID=3, rref=<RemoteReference at 0x293b950 
[pb://[EMAIL PROTECTED]:34686,127.0.0.1:34686/ag6rXXXXX]>, 
methname=RIStorageServer.tahoe.allmydata.com.slot_testv_and_readv_and_writev
    * 11:04:54.839 [537]: the REMOTE failure was: FAILURE:

      [CopiedFailure instance: Traceback from remote host -- Traceback (most 
recent call last):
        File "build/bdist.linux-i686/egg/foolscap/eventual.py", line 26, in 
_turn
          
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in 
doNextCall
          
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 195, in addCallback
          callbackKeywords=kw)
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 186, in addCallbacks
          self._runCallbacks()
      --- <exception caught here> ---
        File 
"/home/tahoe/lib/python2.4/site-packages/Twisted-8.1.0-py2.4-linux-i686.egg/twisted/internet/defer.py",
 line 328, in _runCallbacks
          self.result = callback(self.result, *args, **kw)
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 536, in 
<lambda>
          
        File "build/bdist.linux-i686/egg/foolscap/broker.py", line 571, in 
_doCall
          
        File "build/bdist.linux-i686/egg/foolscap/referenceable.py", line 59, 
in doRemoteCall
          
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
 line 1139, in remote_slot_testv_and_readv_and_writev
          owner_num=0)
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/storage.py",
 line 1159, in _allocate_slot_share
          fileutil.make_dirs(bucketdir)
        File 
"/home/tahoe/lib/python2.4/site-packages/allmydata_tahoe-1.2.0-py2.4.egg/allmydata/util/fileutil.py",
 line 115, in make_dirs
          raise tx
      exceptions.OSError: [Errno 122] Disk quota exceeded: 
'/home/tahoe/.tahoe/storage/shares/dt/dt2yapm26hivzo5rnb2wdmetsy'
      ]

# 11:04:55.258 [578]: Unhandled Error Traceback (most recent call last): 
Failure: allmydata.mutable.common.UncoordinatedWriteError: FAILURE:

[CopiedFailure instance: Traceback from remote host -- Traceback (most recent 
call last):
Failure: allmydata.mutable.common.UncoordinatedWriteError: 
]

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev

[tahoe-dev] Surprise shares

Reply via email to