On 2020-12-28 at 13:41 +0000, Nikolaus Rath wrote:
> On Dec 28 2020, Ivan Shapovalov <[email protected]> wrote:
> > 2020-12-27 19:04:33.819 211867 DEBUG    Thread-1
> > s3ql.backends.b2.b2_backend._do_request: RESPONSE: POST 400  97
> > 2020-12-27 19:04:33.820 211867 DEBUG    MainThread
> > s3ql.block_cache.with_event_loop: upload of 8652 failed
> > NoneType: None
> > 2020-12-27 19:04:33.827 211867 DEBUG Thread-1 s3ql.mount.exchook:
> > recording exception 400
> > : bad_request - Checksum did not match data received
> > zsh: terminated  mount.s3ql b2://<mybucket> /mnt/b2/files -o
> > -- 8< --
> > 
> > Leaving out the question of why journald eats the last line, the
> > situation is pretty clear. The backend (B2Backend._do_request)
> > raises
> > an exception (B2Error) which is not considered a "temporary
> > failure".
> > 
> > I have just patched up error handling in the B2 backend to consider
> > the
> > checksum mismatch a transient failure (testing now).
> 
> Is B2 not using SSL for its data connection? That should make sure
> that
> there are no checksum errors....

Indeed it does. I have added some proper exception logging and found
the actual problem, which is — unsurprisingly — combination of user
error, unclear system requirements and broken logging.

The B2 backend creates a temporary file for each object that is being
uploaded. My s3ql instance has object size = 1 GiB, and with threads=8,
that means at most 8 GiB worth of temporary files at once. Thing is,
temporary files are created in /tmp, which is a tmpfs and has a size
limit.

-- 8< --
2020-12-28 16:39:12.924 340652 ERROR    Thread-3 s3ql.mount.exchook: Unhandled 
exception in thread, terminating
Traceback (most recent call last):                                             
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 279, in 
perform_write
    return fn(fh)                                                              
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 334, in 
do_write
    fh.write(buf)                                                              
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 
36, in write
    self.fh.write(buf)                                                         
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):                                             
  File "/usr/lib/python3.9/site-packages/s3ql/mount.py", line 58, in 
run_with_except_hook
    run_old(*args, **kw)                                                       
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 319, in 
_upload_loop
    self._do_upload(*tmp)                                                      
  File "/usr/lib/python3.9/site-packages/s3ql/block_cache.py", line 376, in 
_do_upload
    obj_size = backend.perform_write(do_write, 's3ql_data_%d'
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 108, in 
wrapped
    return method(*a, **kw)                                                    
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 279, in 
perform_write
    return fn(fh)                                                              
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 
79, in __exit__
    self.close()                                                               
  File "/usr/lib/python3.9/site-packages/s3ql/backends/common.py", line 108, in 
wrapped
    return method(*a, **kw)                                                    
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/object_w.py", line 
64, in close
    response = self.backend._do_upload_request(self.headers, self.fh)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/b2_backend.py", line 
291, in _do_upload_request
    response, response_body = self._do_request(upload_url_info['connection'], 
'POST', upload_url_info['path'], headers, body)
  File "/usr/lib/python3.9/site-packages/s3ql/backends/b2/b2_backend.py", line 
235, in _do_request
    response = connection.read_response()
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 790, in 
read_response
    return eval_coroutine(self.co_read_response(), self.timeout)
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 1531, in 
eval_coroutine
    if not next(crt).poll(timeout=timeout):
  File "/usr/lib/python3.9/site-packages/dugong/__init__.py", line 803, in 
co_read_response
    raise StateError('No pending requests')
dugong.StateError: No pending requests                                         
zsh: terminated  mount.s3ql b2://<mybucket> /mnt/b2/files -o
-- 8< --

I guess this was causing the 400 as well, because we were sending the
temporary file including the last partial write, but the locally
computed hash did not include it.

There is still a problem of getting the dugong.StateError instead of a
400. I don't know why it started giving me this, even after rolling
back all the tentative patches. Though I'm sure it will be something
trivial.

-- 
Ivan Shapovalov / intelfx /

-- 
You received this message because you are subscribed to the Google Groups 
"s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/s3ql/c805df5f25ded87b785116989f0a2906f9f30457.camel%40intelfx.name.

Attachment: signature.asc
Description: This is a digitally signed message part

Reply via email to