On 2020-12-28 at 13:41 +0000, Nikolaus Rath wrote:
> On Dec 28 2020, Ivan Shapovalov <[email protected]> wrote:
> > 2020-12-27 19:04:33.819 211867 DEBUG Thread-1
> > s3ql.backends.b2.b2_backend._do_request: RESPONSE: POST 400 97
> > 2020-12-27 19:04:33.820 211867 DEBUG MainThread
> > s3ql.block_cache.with_event_loop: upload of 8652 failed
> > NoneType: None
> > 2020-12-27 19:04:33.827 211867 DEBUG Thread-1 s3ql.mount.exchook:
> > recording exception 400
> > : bad_request - Checksum did not match data received
> > zsh: terminated mount.s3ql b2://<mybucket> /mnt/b2/files -o
> > -- 8< --
> >
> > Leaving out the question of why journald eats the last line, the
> > situation is pretty clear. The backend (B2Backend._do_request) raises
> > an exception (B2Error) which is not considered a "temporary failure".
> >
> > I have just patched up error handling in the B2 backend to consider the
> > checksum mismatch a transient failure (testing now).
>
> Is B2 not using SSL for its data connection? That should make sure that
> there are no checksum errors....
Indeed it does. I have added some proper exception logging and found
the actual problem, which is (unsurprisingly) a combination of user
error, unclear system requirements and broken logging.
The B2 backend creates a temporary file for each object that is being
uploaded. My s3ql instance has an object size of 1 GiB, and with
threads=8 that means up to 8 GiB worth of temporary files at once. The
catch is that temporary files are created in /tmp, which is a tmpfs
with a size limit.
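To put numbers on this, here is a rough Python sketch comparing the
worst case (one full-size temporary file per upload thread) with the
free space in the directory Python's tempfile module would pick. It
assumes the B2 backend takes its temporary files from tempfile's
default location; the helper names are made up for illustration and
are not part of s3ql.

```python
import shutil
import tempfile

GiB = 1024 ** 3


def temp_space_needed(obj_size, threads):
    """Worst case: every upload thread holds one full-size temporary file."""
    return obj_size * threads


def check_temp_dir(obj_size, threads, tmpdir=None):
    """Return (needed, free) in bytes for the directory temp files land in.

    Assumption: the backend uses tempfile's default directory, which
    honours $TMPDIR, $TEMP and $TMP before falling back to e.g. /tmp.
    """
    tmpdir = tmpdir or tempfile.gettempdir()
    free = shutil.disk_usage(tmpdir).free
    return temp_space_needed(obj_size, threads), free


if __name__ == '__main__':
    needed, free = check_temp_dir(obj_size=1 * GiB, threads=8)
    print(f'need up to {needed / GiB:.0f} GiB, '
          f'{free / GiB:.1f} GiB free in {tempfile.gettempdir()}')
    if needed > free:
        print('temporary files may hit ENOSPC; point TMPDIR at a larger disk')
```

If that assumption holds, setting TMPDIR for the mount.s3ql process to
a disk-backed directory should move the temporary files off the tmpfs.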
-- 8< --
2020-12-28 16:39:12.924 340652 ERROR Thread-3 s3ql.mount.exchook: Unhandled exception in thread, terminating
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 279, in perform_write
    return fn(fh)
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 334, in do_write
    fh.write(buf)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 36, in write
    self.fh.write(buf)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/s3ql/mount.py", line 58, in run_with_except_hook
    run_old(*args, **kw)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 319, in _upload_loop
    self._do_upload(*tmp)
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 376, in _do_upload
    obj_size = backend.perform_write(do_write, 's3ql_data_%d'
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 279, in perform_write
    return fn(fh)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 79, in __exit__
    self.close()
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 108, in wrapped
    return method(*a, **kw)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 64, in close
    response = self.backend._do_upload_request(self.headers, self.fh)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/b2_backend.py", line 291, in _do_upload_request
    response, response_body = self._do_request(upload_url_info['connection'], 'POST', upload_url_info['path'], headers, body)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/b2_backend.py", line 235, in _do_request
    response = connection.read_response()
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 790, in read_response
    return eval_coroutine(self.co_read_response(), self.timeout)
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 1531, in eval_coroutine
    if not next(crt).poll(timeout=timeout):
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 803, in co_read_response
    raise StateError('No pending requests')
dugong.StateError: No pending requests
zsh: terminated mount.s3ql b2://<mybucket> /mnt/b2/files -o
-- 8< --
I suspect this was causing the 400 as well: we were sending the
temporary file including the last partial write, but the locally
computed hash did not include it.
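A toy reproduction of that theory in Python: the hypothetical writer
below updates its SHA-1 only after each write succeeds, while the
simulated full filesystem keeps whatever bytes fit before raising
ENOSPC. This is not s3ql's actual object_w.py, only an illustration of
how the uploaded file and the locally computed hash can diverge (B2
checks an upload against the SHA-1 sent in the X-Bz-Content-Sha1
header).

```python
import hashlib
import io


class LimitedFile(io.BytesIO):
    """Hypothetical stand-in for a file on a full tmpfs: accepts at most
    `limit` bytes in total, writes what fits, then raises ENOSPC."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit

    def write(self, buf):
        room = self.limit - self.tell()
        if len(buf) > room:
            super().write(buf[:room])  # the partial write still lands in the file
            raise OSError(28, 'No space left on device')
        return super().write(buf)


class HashingWriter:
    """Hypothetical writer that updates its SHA-1 only after a write succeeds."""

    def __init__(self, fh):
        self.fh = fh
        self.sha1 = hashlib.sha1()

    def write(self, buf):
        self.fh.write(buf)     # may raise after writing part of buf
        self.sha1.update(buf)  # skipped if the write above raised


fh = LimitedFile(limit=10)
w = HashingWriter(fh)
w.write(b'abcdef')
try:
    w.write(b'ghijkl')  # only b'ghij' fits
except OSError as exc:
    print('write failed:', exc)  # write failed: [Errno 28] No space left on device

on_disk = hashlib.sha1(fh.getvalue()).hexdigest()  # hash of b'abcdefghij'
claimed = w.sha1.hexdigest()                       # hash of b'abcdef'
print(on_disk == claimed)  # False
```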
There is still the problem that I now get a dugong.StateError instead
of the 400. I don't know why this started happening, even after rolling
back all my tentative patches, though I'm sure it will be something
trivial.
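For what it's worth, the second traceback shows where the StateError is
raised: ObjectW.__exit__ calls close(), which goes on to issue the
upload request even though the with-block already failed with ENOSPC,
so the exception that finally propagates is the secondary one. Below is
a minimal Python sketch of a write handle that only uploads on a clean
exit; all names are made up and it is not the actual s3ql class.

```python
class UploadOnCleanExit:
    """Hypothetical write handle that buffers data and uploads it on close()."""

    def __init__(self, upload):
        self.upload = upload  # callable that receives the complete object body
        self.chunks = []
        self.closed = False

    def write(self, buf):
        self.chunks.append(bytes(buf))

    def close(self):
        if not self.closed:
            self.closed = True
            self.upload(b''.join(self.chunks))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.close()  # clean exit: commit the object
        else:
            self.closed = True  # block failed: drop the partial data, no upload
        return False  # never swallow the original exception


# Failed write: nothing is uploaded and the OSError reaches the caller.
failed_uploads = []
try:
    with UploadOnCleanExit(failed_uploads.append) as w:
        w.write(b'partial')
        raise OSError(28, 'No space left on device')
except OSError as exc:
    print('caller sees:', exc)  # caller sees: [Errno 28] No space left on device

# Clean write: the object is uploaded exactly once.
ok_uploads = []
with UploadOnCleanExit(ok_uploads.append) as w:
    w.write(b'complete')
print(ok_uploads)  # [b'complete']
```

Returning False from __exit__ keeps the ENOSPC error as the one the
caller sees, instead of whatever the upload attempt would raise next.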
--
Ivan Shapovalov / intelfx /
--
You received this message because you are subscribed to the Google Groups
"s3ql" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/s3ql/c805df5f25ded87b785116989f0a2906f9f30457.camel%40intelfx.name.
