Nikolaus,

A while ago, in a thread called "strange llfuse exception", we discussed an 
llfuse exception that was not being formed correctly due to a Cython bug. 
With the workaround in llfuse 0.40 that issue was resolved, but the same 
trigger, Google Storage intermittently returning 403, still causes an 
exception that results in bad behavior. Specifically, after an AccessDenied 
exception (or two, as in the logs below), mount.s3ql hangs, and with it all 
file system operations hang as well (e.g., accessing the file system's 
extended attributes or running the 'ls' and 'df' commands).

First, some logs. Other threads make requests in the same timeframe without 
problems, but threads 11 and 12 are the ones that will encounter the 
problem. Their requests in the logs look like this:



2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread', 
module='s3ql.block_cache', fn='upload', line=419]: upload(<Dirty 
CacheEntry, inode=36994, blockno=0>): start
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread', 
module='s3ql.block_cache', fn='upload', line=463]: upload(<Dirty 
CacheEntry, inode=36994, blockno=0>): created new object 3834
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread', 
module='s3ql.block_cache', fn='upload', line=466]: upload(<Dirty 
CacheEntry, inode=36994, blockno=0>): created new block 3550
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread', 
module='s3ql.block_cache', fn='upload', line=467]: upload(<Dirty 
CacheEntry, inode=36994, blockno=0>): adding to upload queue
2014-02-27 20:50:49.511 [pid=23834, thread='Thread-12', 
module='s3ql.backends.s3c', fn='open_write', line=243]: 
open_write(s3ql_data_3834): start
2014-02-27 20:50:49.512 [pid=23834, thread='CommitThread', 
module='s3ql.block_cache', fn='_deref_block', line=527]: removing block 3549
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12', 
module='s3ql.backends.s3c', fn='close', line=709]: 
ObjectW(s3ql_data_3834).close(): start
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12', 
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request(): 
start with parameters ('PUT', '/prefixs3ql_data_3834', None, None, 
{'Content-Length': 143, 'x-amz-meta-data-01': 'fdjJxCnUu', 'Content-Type': 
'application/octet-stream', 'x-amz-meta-format': 'pickle', 
'x-amz-meta-data-00': 
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5GZGRubmJaOEw4VWtRL25Na1JvVGZZK3o3T1JjT3k2UVZHeFptWGRzNUowTW51TGVBPT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
 
<_io.FileIO name=62 mode='rb+'>)
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12', 
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request(): 
processing request for /bucketname/prefixs3ql_data_3834

[...]

2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='close', line=709]: 
ObjectW(s3ql_data_3836).close(): start
2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request(): 
start with parameters ('PUT', '/prefixs3ql_data_3836', None, None, 
{'Content-Length': 18055, 'x-amz-meta-data-01': 'fdjJxCnUu', 
'Content-Type': 'application/octet-stream', 'x-amz-meta-format': 'pickle', 
'x-amz-meta-data-00': 
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5wemhIUXQzSFpUTmgvK0p4YlM1cTR3OGJkOThwWTlOdFJRSnl0a3c5ekhCWEtMbXh3PT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
 
<_io.FileIO name=66 mode='rb+'>)
2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request(): 
processing request for /bucketname/prefixs3ql_data_3836
2014-02-27 20:50:49.559 [pid=23834, thread='Thread-11', 
module='s3ql.backends.common', fn='wrapped', line=87]: Encountered 
BadStatusLine exception (''), retrying call to ObjectW.close for the 1-th 
time...
2014-02-27 20:50:49.579 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='close', line=709]: 
ObjectW(s3ql_data_3836).close(): start
2014-02-27 20:50:49.579 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request(): 
start with parameters ('PUT', '/prefixs3ql_data_3836', None, None, 
{'Content-Length': 18055, 'Content-Type': 'application/octet-stream', 
'connection': 'keep-alive', 'x-amz-meta-format': 'pickle', 
'x-amz-meta-data-01': 'fdjJxCnUu', 'x-amz-meta-data-00': 
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5wemhIUXQzSFpUTmgvK0p4YlM1cTR3OGJkOThwWTlOdFJRSnl0a3c5ekhCWEtMbXh3PT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
 
<_io.FileIO name=66 mode='rb+'>)
2014-02-27 20:50:49.580 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request(): 
processing request for /bucketname/prefixs3ql_data_3836

[...]



A short while later, the response for thread 12 comes in as a 403, which 
triggers an AccessDenied exception:



2014-02-27 20:50:49.848 [pid=23834, thread='Thread-12', 
module='s3ql.backends.s3c', fn='_do_request', line=299]: status: 403, 
request-id: None
2014-02-27 20:50:49.848 [pid=23834, thread='Thread-12', 
module='s3ql.mount', fn='exchook', line=685]: recording exception 
AccessDenied: Access denied.
2014-02-27 20:50:49.849 [pid=23834, thread='Thread-12', module='root', 
fn='excepthook', line=158]: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/s3ql/mount.py", line 58, in 
run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib64/python3.3/threading.py", line 858, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 360, 
in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 380, 
in _do_upload
    % obj_id).get_obj_size()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
67, in wrapped
    return method(*a, **kw)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
430, in perform_write
    return fn(fh)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
853, in __exit__
    self.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
846, in close
    self.fh.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
999, in close
    self.fh.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
67, in wrapped
    return method(*a, **kw)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 717, 
in close
    headers=self.headers, body=self.fh)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 369, 
in _do_request
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
s3ql.backends.s3c.AccessDeniedError: AccessDenied: Access denied.
2014-02-27 20:50:49.850 [pid=23834, thread='MainThread', module='s3ql.fs', 
fn='forget', line=1134]: forget([(1, 1), (2, 1), (4, 1), (5, 1), (6, 1), 
(7, 1), (8, 1), (14, 1), (15, 1), (16, 1), (17, 1), (20, 1), (21, 1), (22, 
1), (23, 1), (24, 1), (25, 1), (27, 1), (30, 1), (31, 1), (42, 1), (48, 1), 
(49, 1), (50, 1), (51, 1), (52, 1), (53, 1), (54, 1), (55, 1), (56, 1), 
(57, 1), (36936, 1), (24905, 1), (36938, 1), (35939, 1), (36958, 1), 
(36959, 1), (36961, 1), (36963, 1), (36964, 1), (36965, 1), (36966, 1), 
(36967, 1), (36969, 1), (36971, 1), (36972, 1), (36974, 1), (36976, 1), 
(36977, 1), (36978, 1), (36979, 1), (36980, 1), (36981, 1), (36982, 1), 
(36983, 1), (36984, 1), (36994, 1), (36995, 1), (36996, 1), (34728, 1), 
(34729, 1), (34730, 1), (34731, 1), (34732, 1), (34733, 1), (180, 1)]): 
start
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread', 
module='s3ql.mount', fn='stop', line=757]: CommitThread: stop() called
2014-02-27 20:50:49.859 [pid=23834, thread='CommitThread', 
module='s3ql.mount', fn='run', line=752]: CommitThread: end
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread', 
module='s3ql.mount', fn='stop', line=663]: MetadataUploadThread: stop() 
called
2014-02-27 20:50:49.859 [pid=23834, thread='Metadata-Upload-Thread', 
module='s3ql.mount', fn='run', line=658]: MetadataUploadThread: end
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='destroy', line=315]: destroy(): clearing 
cache...
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='clear', line=812]: clear: start
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=687]: expire: start
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=714]: removing inode 20, block 
0 from cache

[...]



While S3ql is clearing its cache (at least, that's what I assume is 
happening based on the logs), the response for thread 11 comes in, also as a 
403:



2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11', 
module='s3ql.backends.s3c', fn='_do_request', line=299]: status: 403, 
request-id: None
2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11', 
module='s3ql.mount', fn='exchook', line=682]: Unhandled top-level exception 
during shutdown (will not be re-raised)
2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11', module='root', 
fn='excepthook', line=158]: Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/s3ql/mount.py", line 58, in 
run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib64/python3.3/threading.py", line 858, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 360, 
in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 380, 
in _do_upload
    % obj_id).get_obj_size()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
67, in wrapped
    return method(*a, **kw)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
430, in perform_write
    return fn(fh)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
853, in __exit__
    self.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
846, in close
    self.fh.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
999, in close
    self.fh.close()
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line 
67, in wrapped
    return method(*a, **kw)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 717, 
in close
    headers=self.headers, body=self.fh)
  File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 369, 
in _do_request
    raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
s3ql.backends.s3c.AccessDeniedError: AccessDenied: Access denied.
2014-02-27 20:50:50.362 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=709]: expire: uploading <Dirty 
CacheEntry, inode=36936, blockno=0>..
2014-02-27 20:50:50.362 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='upload', line=419]: upload(<Dirty 
CacheEntry, inode=36936, blockno=0>): start
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='upload', line=493]: upload(<CacheEntry, 
inode=36936, blockno=0>): unchanged, block_id=3552
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=729]: expire: waiting for 
transfer threads..
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=714]: removing inode 36936, 
block 0 from cache
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='expire', line=732]: expire: end
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread', 
module='s3ql.block_cache', fn='clear', line=818]: clear: end



This triggers another AccessDenied exception, but that is where the log 
ends. At this point S3ql is no longer responsive: 'ls' and 'df' hang, and 
every other file system operation seems to hang as well. The same thing 
happened consistently with all the S3ql processes I had running on other 
boxes that were uploading data during the window in which Google 
intermittently returned 403.

In the FUSE kernel documentation 
(https://www.kernel.org/doc/Documentation/filesystems/fuse.txt) I read an 
interesting tidbit about the FUSE control filesystem:


Control filesystem
~~~~~~~~~~~~~~~~~~

There's a control filesystem for FUSE, which can be mounted by:

  mount -t fusectl none /sys/fs/fuse/connections

Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.

Under the fuse control filesystem each connection has a directory
named by a unique number.

For each connection the following files exist within this directory:

 'waiting'

  The number of requests which are waiting to be transferred to
  userspace or being processed by the filesystem daemon.  If there is
  no filesystem activity and 'waiting' is non-zero, then the
  filesystem is hung or deadlocked.

 'abort'

  Writing anything into this file will abort the filesystem
  connection.  This means that all waiting requests will be aborted and
  an error returned for all aborted and new requests.

Only the owner of the mount may read or write these files.



So after mounting that, I found one connection under 
/sys/fs/fuse/connections, and the value in its 'waiting' file was 2, with 
no file system activity present. According to the FUSE docs this appears to 
mean that the file system is in fact hung.

Writing into the 'abort' file for that connection as per the docs does 
abort the pending requests, and commands like 'df' now work normally again.
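
In case it's useful, here is roughly what I did, written up as a small 
Python sketch (this is not part of S3ql; it assumes the control filesystem 
is already mounted at /sys/fs/fuse/connections and that it runs as the mount 
owner or root):

#!/usr/bin/env python3
# List FUSE connections and their 'waiting' counters, and optionally abort
# one of them. Assumes fusectl is mounted at /sys/fs/fuse/connections.
import os
import sys

FUSECTL = '/sys/fs/fuse/connections'

def list_connections():
    for conn in sorted(os.listdir(FUSECTL)):
        path = os.path.join(FUSECTL, conn, 'waiting')
        try:
            with open(path) as fh:
                waiting = int(fh.read().strip())
        except OSError as exc:
            print('%s: cannot read waiting (%s)' % (conn, exc))
            continue
        print('connection %s: %d request(s) waiting' % (conn, waiting))

def abort_connection(conn):
    # Writing anything into 'abort' aborts the connection, so waiting and
    # new requests get an error instead of hanging.
    with open(os.path.join(FUSECTL, conn, 'abort'), 'w') as fh:
        fh.write('1')

if __name__ == '__main__':
    if len(sys.argv) == 2:
        abort_connection(sys.argv[1])
    else:
        list_connections()

Run with no arguments, it just prints each connection and its 'waiting' 
count; run with a connection number, it writes to that connection's 'abort' 
file.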

The question is: is this due to the timing of those (repeated) exceptions 
and the way S3ql handles them, or is it a FUSE bug? I'd be fine with the 
in-progress file system operation being aborted and returning an error in 
place of the hang.

Thanks for any help.
