Re: [s3ql] S3QL crashes when uploading large files with inconclusive logs (yet again)

Nikolaus Rath Sun, 27 Dec 2020 03:44:45 -0800

On Dec 27 2020, Ivan Shapovalov <[email protected]> wrote:
> Full mount.s3ql invocation is:
>
> -- 8< --
> /usr/bin/mount.s3ql b2://<mybucket> /mnt/b2/files -o 
> systemd,log=none,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,debug,allow-other,compress=none,cachesize=10485760,threads=8,keep-cache,backend-options=disable-versions
> -- 8< --


Please also run this under gdb:

$ gdb python3
> run /usr/bin/mount.s3ql b2://<mybucket> /mnt/b2/files -o 
> systemd,log=none,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,debug,allow-other,compress=none,cachesize=10485760,threads=8,keep-cache,backend-options=disable-versions


When the process terminates, run:

> thread apply all bt


>
> Full s3ql log (captured stdout/stderr) is available here: 
> https://intelfx.name/files/persistent/2020-12-27%20s3ql/s3ql.2.log.zst
> https://intelfx.name/files/persistent/2020-12-27%20s3ql/s3ql.3.log.zst
> (caution -- each log is ~150 MB unpacked, which is a separate concern
> of mine but irrelevant to this problem.)
>
> Manual excerpt from s3ql log relevant to the block that failed:
> -- 8< --
> mount.s3ql[117923]: 2020-12-27 13:02:03.135 117923 DEBUG    MainThread
> s3ql.block_cache.upload_if_dirty: started with <Dirty CacheEntry,
> inode=11545, blockno=109>

Debug logging won't help with this problem, you can safely disable that.

> Strangely, systemd tells me that mount.s3ql does not just exit
> normally, but is terminated by SIGTERM:
>
> -- 8< --
> $ systemctl status [email protected]              
> ● [email protected] - s3ql file system at /mnt/b2/files
>      Loaded: loaded (/etc/systemd/system/[email protected]; enabled; vendor 
> preset: disabled)
>      Active: inactive (dead) since Sun 2020-12-27 13:02:05 MSK; 1h 27min ago
>     Process: 117923 ExecStart=/usr/bin/mount.s3ql ${What} /mnt/b2/files -o 
> systemd,log=none,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,${Options}
>  (code=killed, signal=TERM)
>     Process: 135718 ExecStop=/usr/bin/umount /mnt/b2/files (code=exited, 
> status=32)
>    Main PID: 117923 (code=killed, signal=TERM)
>          IP: 40.5M in, 8.5G out
>         CPU: 6min 11.868s
> -- 8< --

What does your kernel log say at this time (dmesg)?

Could it be that you're running out of memory, and the OOM killer is
killing mount.s3ql to free up memory?


The TERM signal does not make sense to me, to this a non-fatal signal
that should result in S3QL gracefully exiting.


Could you try what happens when you manually send SIGTERM to a running
mount.s3ql process? Does it terminate properly with full logging until
the end?

> mount.s3ql[117923]: NoneType: None
>
> I'd gladly go debug it myself, but I don't know where to start with
> this async stuff. Any pointers? How do I read this log? What is this
> strange "NoneType: None"?

This doesn't make any sense to me either.

Normally I would suggest to add e.g. line numbers and file names to
https://github.com/s3ql/s3ql/blob/master/src/s3ql/logging.py#L86 to see
where this message is generated, but looking at it it seems that it
should already inculde the function name.. So whatever logs this is
bypassing the regular logging code path.


So, in summary:

- Run standalone under gdb (and not as a systemd service)
- Check kernel logs
- Check memory usage
- Try to send SIGTERM to a non-problematic mount



Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

-- 
You received this message because you are subscribed to the Google Groups 
"s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/s3ql/87blefwk0s.fsf%40vostro.rath.org.

Re: [s3ql] S3QL crashes when uploading large files with inconclusive logs (yet again)

Reply via email to