Nikolaus,
A while ago we discussed, in a thread called "strange llfuse exception", an
llfuse exception that was not being formed correctly because of a Cython bug.
With the workaround in llfuse 0.40 that exception is fixed, but the same
trigger, Google Storage intermittently returning 403, still leads to an
exception that causes bad behavior. Specifically, after an AccessDenied
exception (or two, as in the logs below), mount.s3ql hangs, and with it all
file system operations hang as well (e.g., reading the file system's extended
attributes, or running 'ls' and 'df').
First, some logs. Other threads make requests in the same timeframe without
any issue; threads 11 and 12 are the ones that run into the problem. Their
requests look like this in the logs:
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread',
module='s3ql.block_cache', fn='upload', line=419]: upload(<Dirty
CacheEntry, inode=36994, blockno=0>): start
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread',
module='s3ql.block_cache', fn='upload', line=463]: upload(<Dirty
CacheEntry, inode=36994, blockno=0>): created new object 3834
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread',
module='s3ql.block_cache', fn='upload', line=466]: upload(<Dirty
CacheEntry, inode=36994, blockno=0>): created new block 3550
2014-02-27 20:50:49.508 [pid=23834, thread='CommitThread',
module='s3ql.block_cache', fn='upload', line=467]: upload(<Dirty
CacheEntry, inode=36994, blockno=0>): adding to upload queue
2014-02-27 20:50:49.511 [pid=23834, thread='Thread-12',
module='s3ql.backends.s3c', fn='open_write', line=243]:
open_write(s3ql_data_3834): start
2014-02-27 20:50:49.512 [pid=23834, thread='CommitThread',
module='s3ql.block_cache', fn='_deref_block', line=527]: removing block 3549
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12',
module='s3ql.backends.s3c', fn='close', line=709]:
ObjectW(s3ql_data_3834).close(): start
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12',
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request():
start with parameters ('PUT', '/prefixs3ql_data_3834', None, None,
{'Content-Length': 143, 'x-amz-meta-data-01': 'fdjJxCnUu', 'Content-Type':
'application/octet-stream', 'x-amz-meta-format': 'pickle',
'x-amz-meta-data-00':
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5GZGRubmJaOEw4VWtRL25Na1JvVGZZK3o3T1JjT3k2UVZHeFptWGRzNUowTW51TGVBPT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
<_io.FileIO name=62 mode='rb+'>)
2014-02-27 20:50:49.512 [pid=23834, thread='Thread-12',
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request():
processing request for /bucketname/prefixs3ql_data_3834
[...]
2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='close', line=709]:
ObjectW(s3ql_data_3836).close(): start
2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request():
start with parameters ('PUT', '/prefixs3ql_data_3836', None, None,
{'Content-Length': 18055, 'x-amz-meta-data-01': 'fdjJxCnUu',
'Content-Type': 'application/octet-stream', 'x-amz-meta-format': 'pickle',
'x-amz-meta-data-00':
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5wemhIUXQzSFpUTmgvK0p4YlM1cTR3OGJkOThwWTlOdFJRSnl0a3c5ekhCWEtMbXh3PT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
<_io.FileIO name=66 mode='rb+'>)
2014-02-27 20:50:49.558 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request():
processing request for /bucketname/prefixs3ql_data_3836
2014-02-27 20:50:49.559 [pid=23834, thread='Thread-11',
module='s3ql.backends.common', fn='wrapped', line=87]: Encountered
BadStatusLine exception (''), retrying call to ObjectW.close for the 1-th
time...
2014-02-27 20:50:49.579 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='close', line=709]:
ObjectW(s3ql_data_3836).close(): start
2014-02-27 20:50:49.579 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='_do_request', line=280]: _do_request():
start with parameters ('PUT', '/prefixs3ql_data_3836', None, None,
{'Content-Length': 18055, 'Content-Type': 'application/octet-stream',
'connection': 'keep-alive', 'x-amz-meta-format': 'pickle',
'x-amz-meta-data-01': 'fdjJxCnUu', 'x-amz-meta-data-00':
'gAJ9cQAoWAQAAABkYXRhcQFjX2NvZGVjcwplbmNvZGUKcQJYTAAAAEVrb2ZwazV6TTNGc1gyUmhkR0ZmTXpnek5wemhIUXQzSFpUTmgvK0p4YlM1cTR3OGJkOThwWTlOdFJRSnl0a3c5ekhCWEtMbXh3PT1xA1gGAAAAbGF0aW4xcQSGcQVScQZYCwAAAGNvbXByZXNzaW9ucQdYBAAAAExaTUFxCFgKAAAAZW5jcnlwdGlvbnEJWAYAAABBRVN'},
<_io.FileIO name=66 mode='rb+'>)
2014-02-27 20:50:49.580 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='_send_request', line=483]: _send_request():
processing request for /bucketname/prefixs3ql_data_3836
[...]
A short while later, the response for thread 12 comes in as a 403, which
triggers an AccessDenied exception:
2014-02-27 20:50:49.848 [pid=23834, thread='Thread-12',
module='s3ql.backends.s3c', fn='_do_request', line=299]: status: 403,
request-id: None
2014-02-27 20:50:49.848 [pid=23834, thread='Thread-12',
module='s3ql.mount', fn='exchook', line=685]: recording exception
AccessDenied: Access denied.
2014-02-27 20:50:49.849 [pid=23834, thread='Thread-12', module='root',
fn='excepthook', line=158]: Uncaught top-level exception:
Traceback (most recent call last):
File "/usr/lib64/python3.3/site-packages/s3ql/mount.py", line 58, in
run_with_except_hook
run_old(*args, **kw)
File "/usr/lib64/python3.3/threading.py", line 858, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 360,
in _upload_loop
self._do_upload(*tmp)
File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 380,
in _do_upload
% obj_id).get_obj_size()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
67, in wrapped
return method(*a, **kw)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
430, in perform_write
return fn(fh)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
853, in __exit__
self.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
846, in close
self.fh.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
999, in close
self.fh.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
67, in wrapped
return method(*a, **kw)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 717,
in close
headers=self.headers, body=self.fh)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 369,
in _do_request
raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
s3ql.backends.s3c.AccessDeniedError: AccessDenied: Access denied.
2014-02-27 20:50:49.850 [pid=23834, thread='MainThread', module='s3ql.fs',
fn='forget', line=1134]: forget([(1, 1), (2, 1), (4, 1), (5, 1), (6, 1),
(7, 1), (8, 1), (14, 1), (15, 1), (16, 1), (17, 1), (20, 1), (21, 1), (22,
1), (23, 1), (24, 1), (25, 1), (27, 1), (30, 1), (31, 1), (42, 1), (48, 1),
(49, 1), (50, 1), (51, 1), (52, 1), (53, 1), (54, 1), (55, 1), (56, 1),
(57, 1), (36936, 1), (24905, 1), (36938, 1), (35939, 1), (36958, 1),
(36959, 1), (36961, 1), (36963, 1), (36964, 1), (36965, 1), (36966, 1),
(36967, 1), (36969, 1), (36971, 1), (36972, 1), (36974, 1), (36976, 1),
(36977, 1), (36978, 1), (36979, 1), (36980, 1), (36981, 1), (36982, 1),
(36983, 1), (36984, 1), (36994, 1), (36995, 1), (36996, 1), (34728, 1),
(34729, 1), (34730, 1), (34731, 1), (34732, 1), (34733, 1), (180, 1)]):
start
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread',
module='s3ql.mount', fn='stop', line=757]: CommitThread: stop() called
2014-02-27 20:50:49.859 [pid=23834, thread='CommitThread',
module='s3ql.mount', fn='run', line=752]: CommitThread: end
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread',
module='s3ql.mount', fn='stop', line=663]: MetadataUploadThread: stop()
called
2014-02-27 20:50:49.859 [pid=23834, thread='Metadata-Upload-Thread',
module='s3ql.mount', fn='run', line=658]: MetadataUploadThread: end
2014-02-27 20:50:49.859 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='destroy', line=315]: destroy(): clearing
cache...
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='clear', line=812]: clear: start
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=687]: expire: start
2014-02-27 20:50:49.860 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=714]: removing inode 20, block
0 from cache
[...]
While S3ql is clearing its cache (at least that's what I assume is happening,
based on the logs), the response for thread 11 comes in, also a 403:
2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11',
module='s3ql.backends.s3c', fn='_do_request', line=299]: status: 403,
request-id: None
2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11',
module='s3ql.mount', fn='exchook', line=682]: Unhandled top-level exception
during shutdown (will not be re-raised)
2014-02-27 20:50:50.361 [pid=23834, thread='Thread-11', module='root',
fn='excepthook', line=158]: Uncaught top-level exception:
Traceback (most recent call last):
File "/usr/lib64/python3.3/site-packages/s3ql/mount.py", line 58, in
run_with_except_hook
run_old(*args, **kw)
File "/usr/lib64/python3.3/threading.py", line 858, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 360,
in _upload_loop
self._do_upload(*tmp)
File "/usr/lib64/python3.3/site-packages/s3ql/block_cache.py", line 380,
in _do_upload
% obj_id).get_obj_size()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
67, in wrapped
return method(*a, **kw)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
430, in perform_write
return fn(fh)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
853, in __exit__
self.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
846, in close
self.fh.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
999, in close
self.fh.close()
File "/usr/lib64/python3.3/site-packages/s3ql/backends/common.py", line
67, in wrapped
return method(*a, **kw)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 717,
in close
headers=self.headers, body=self.fh)
File "/usr/lib64/python3.3/site-packages/s3ql/backends/s3c.py", line 369,
in _do_request
raise get_S3Error(tree.findtext('Code'), tree.findtext('Message'))
s3ql.backends.s3c.AccessDeniedError: AccessDenied: Access denied.
2014-02-27 20:50:50.362 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=709]: expire: uploading <Dirty
CacheEntry, inode=36936, blockno=0>..
2014-02-27 20:50:50.362 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='upload', line=419]: upload(<Dirty
CacheEntry, inode=36936, blockno=0>): start
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='upload', line=493]: upload(<CacheEntry,
inode=36936, blockno=0>): unchanged, block_id=3552
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=729]: expire: waiting for
transfer threads..
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=714]: removing inode 36936,
block 0 from cache
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='expire', line=732]: expire: end
2014-02-27 20:50:50.364 [pid=23834, thread='MainThread',
module='s3ql.block_cache', fn='clear', line=818]: clear: end
This triggers another AccessDenied exception, but that's where the log ends.
At this point S3ql is no longer responsive: 'ls' and 'df' hang, and every
other file system operation seems to hang as well. The same thing happened
consistently with every S3ql process I had running on other boxes that was
uploading data during the window in which Google intermittently returned 403.
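As an aside on what the logs seem to show: the BadStatusLine in Thread-11 is
retried by the wrapper in backends/common.py, while the AccessDenied is raised
straight through to the thread's excepthook, where the first one starts the
shutdown and the second one is only recorded ("will not be re-raised"). Below
is a minimal sketch of that kind of retry/propagate split, just to illustrate
my reading of the logs; the names are made up and this is not s3ql's actual
code.

import time
from http.client import BadStatusLine

class AccessDeniedError(Exception):
    """Stand-in for the backend's 403/AccessDenied exception."""

def is_temp_failure(exc):
    # Network-level hiccups are assumed transient and worth retrying;
    # a 403 is treated as permanent and handed back to the caller.
    return isinstance(exc, (BadStatusLine, ConnectionResetError, TimeoutError))

def retry(method):
    """Retry transient failures with backoff; let permanent errors propagate."""
    def wrapped(*args, **kwargs):
        interval = 0.02
        for attempt in range(1, 11):
            try:
                return method(*args, **kwargs)
            except Exception as exc:
                if not is_temp_failure(exc):
                    raise  # e.g. AccessDeniedError ends up in the thread's excepthook
                print('Encountered %s exception, retrying %s for the %d-th time...'
                      % (type(exc).__name__, method.__name__, attempt))
                time.sleep(interval)
                interval *= 2
        raise RuntimeError('giving up after repeated temporary failures')
    return wrapped

If that is roughly what happens, the second 403 simply arrives after the
shutdown triggered by the first one has already begun, which would match the
"Unhandled top-level exception during shutdown" message, but it doesn't
explain why the requests already queued in the kernel never get answered.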
In the FUSE kernel documentation (
https://www.kernel.org/doc/Documentation/filesystems/fuse.txt) I read an
interesting tidbit about the FUSE control filesystem:
Control filesystem
~~~~~~~~~~~~~~~~~~
There's a control filesystem for FUSE, which can be mounted by:
mount -t fusectl none /sys/fs/fuse/connections
Mounting it under the '/sys/fs/fuse/connections' directory makes it
backwards compatible with earlier versions.
Under the fuse control filesystem each connection has a directory
named by a unique number.
For each connection the following files exist within this directory:
'waiting'
The number of requests which are waiting to be transferred to
userspace or being processed by the filesystem daemon. If there is
no filesystem activity and 'waiting' is non-zero, then the
filesystem is hung or deadlocked.
'abort'
Writing anything into this file will abort the filesystem
connection. This means that all waiting requests will be aborted an
error returned for all aborted and new requests.
Only the owner of the mount may read or write these files.
So after mounting that, I found one connection under
/sys/fs/fuse/connections, and the value in its 'waiting' file was 2 even
though there was no file system activity. According to the FUSE docs, that
appears to mean the file system is indeed hung.
Writing into that connection's 'abort' file, as described in the docs, does
abort the pending requests, and commands like 'df' work normally again
afterwards.
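In case it helps anyone else debug a hang like this, here is a small Python
sketch of the check-and-abort procedure I used. The paths and semantics come
from the fuse.txt excerpt above; the helper names are mine, and you need to be
the owner of the mount (typically root) to read or write these files.

import glob
import os

# Assumes the control filesystem is already mounted:
#   mount -t fusectl none /sys/fs/fuse/connections
FUSECTL = '/sys/fs/fuse/connections'

def list_waiting():
    """Print the number of pending requests for each FUSE connection."""
    for conn_dir in sorted(glob.glob(os.path.join(FUSECTL, '[0-9]*'))):
        with open(os.path.join(conn_dir, 'waiting')) as fh:
            waiting = int(fh.read().strip() or '0')
        print('%s: waiting=%d' % (conn_dir, waiting))

def abort_connection(conn_dir):
    """Abort one connection: all pending and new requests then return an error.

    Only do this after confirming there is no file system activity, i.e.
    the non-zero 'waiting' value really indicates a hang.
    """
    with open(os.path.join(conn_dir, 'abort'), 'w') as fh:
        fh.write('1')

if __name__ == '__main__':
    list_waiting()
    # abort_connection('/sys/fs/fuse/connections/20')  # example; pick the hung one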
The question is: is this due to the timing of those (repeated) exceptions and
the way S3ql handles them, or is it a FUSE bug? I'd be fine with the in-flight
file system operations being aborted and returning an error instead of
hanging.
Thanks for any help.