On 2020-12-27 at 11:44 +0000, Nikolaus Rath wrote:
> <snip>

Thank you for the reply, Nikolaus.

> What does your kernel log say at this time (dmesg)?
>
> Could it be that you're running out of memory, and the OOM killer is
> killing mount.s3ql to free up memory?

Kernel log is silent. It's definitely not an OOM (an OOM would have
SIGKILLed s3ql anyway).

> The TERM signal does not make sense to me, to this a non-fatal signal
> that should result in S3QL gracefully exiting.
>
> Could you try what happens when you manually send SIGTERM to a running
> mount.s3ql process? Does it terminate properly with full logging until
> the end?

Nope. It dies immediately. Which is sort of expected, because I
actually see no SIGTERM handler in s3ql. And, on that matter, I see
where it comes from. See below.

> So, in summary:
>
> - Run standalone under gdb (and not as a systemd service)
> - Check kernel logs
> - Check memory usage
> - Try to send SIGTERM to a non-problematic mount

OK, so I did not yet try to run s3ql under gdb, but I think I
(partially) know what happens.

Running mount.s3ql in a plain shell session:

-- 8< --
mount.s3ql b2://<mybucket> /mnt/b2/files -o fg,log=none,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,debug,allow-other,compress=none,cachesize=10485760,threads=8,keep-cache,backend-options=disable-versions
-- 8< --

Produces this log:

-- 8< --
2020-12-27 19:04:33.819 211867 DEBUG Thread-1 s3ql.backends.b2.b2_backend._do_request: RESPONSE: POST 400 97
2020-12-27 19:04:33.820 211867 DEBUG MainThread s3ql.block_cache.with_event_loop: upload of 8652 failed
NoneType: None
2020-12-27 19:04:33.827 211867 DEBUG Thread-1 s3ql.mount.exchook: recording exception 400 : bad_request - Checksum did not match data received
zsh: terminated mount.s3ql b2://<mybucket> /mnt/b2/files -o
-- 8< --

Leaving out the question of why journald eats the last line, the
situation is pretty clear. The backend (B2Backend._do_request) raises
an exception (B2Error) which is not considered a "temporary failure".
It bubbles all the way through ObjectW.close(),
AbstractBackend.perform_write(), BlockCache._do_upload(),
BlockCache._upload_loop() and is never caught. Finally, exchook() from
mount.py:setup_exchook() gets called and sends SIGTERM to the mount
process (mount.py:687).

Does that sound plausible?

I have just patched up error handling in the B2 backend to consider the
checksum mismatch a transient failure (testing now). But I take it the
whole SIGTERM thing is also unexpected?

-- 
Ivan Shapovalov / intelfx /
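
For illustration, here is a minimal Python sketch of the failure path described above. It is not S3QL's code: BackendError, is_temp_failure, upload_with_retry, make_exchook and demo are all hypothetical names, and the "only 5xx is temporary" rule is an assumption made for the example. It shows an error that is not classified as temporary escaping a retry loop in a worker thread, and a thread exception hook that records it and then sends SIGTERM to its own process. With no SIGTERM handler installed, the default action terminates the process at once, which would match the "zsh: terminated" line.

```python
import os
import signal
import threading


class BackendError(Exception):
    """Hypothetical stand-in for a backend error such as B2Error."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} : {code} - {message}")
        self.status = status
        self.code = code


def is_temp_failure(exc):
    # Assumption for this sketch: only 5xx responses are temporary,
    # so a 400 bad_request is treated as fatal.
    return isinstance(exc, BackendError) and exc.status >= 500


def upload_with_retry(upload, attempts=3):
    """Retry errors classified as temporary; re-raise everything else."""
    for attempt in range(attempts):
        try:
            return upload()
        except BackendError as exc:
            if not is_temp_failure(exc) or attempt == attempts - 1:
                raise


def make_exchook(recorded, kill=os.kill):
    """Build a threading.excepthook that records the exception and
    then sends SIGTERM to the current process."""

    def exchook(args):
        recorded.append(args.exc_value)
        kill(os.getpid(), signal.SIGTERM)

    return exchook


def demo(kill=os.kill):
    """Run a failing upload in a worker thread under the hook."""
    recorded = []
    old_hook = threading.excepthook
    threading.excepthook = make_exchook(recorded, kill)
    try:
        def failing_upload():
            raise BackendError(
                400, "bad_request", "Checksum did not match data received"
            )

        worker = threading.Thread(target=upload_with_retry, args=(failing_upload,))
        worker.start()
        worker.join()
    finally:
        threading.excepthook = old_hook
    return recorded
```

Calling demo() with the default kill really does terminate the interpreter, so pass a stub such as kill=lambda pid, sig: None to observe the recorded exception instead. In this sketch, making is_temp_failure return True for the checksum-mismatch case is what would turn the fatal path into a retry.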
