Greetings again. (I haven't had time to follow up on my idea of partial object caching; in the meantime, I've been using s3ql with a ridiculously low object size as a workaround. But this post is about another file system.)
Today I was rsyncing my NAS to the cloud when s3ql suddenly exited in the middle of a transfer:

==== 8< ====
Aug 16 08:07:15 stratofortress.nexus.i.intelfx.name systemd[1]: Starting s3ql-mnt-b2-files.service...
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Using cached metadata.
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.mount.get_metadata: Using cached metadata.
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Loaded 0 entries from cache
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.block_cache.__init__: Loaded 0 entries from cache
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Mounting b2://intelfx-name-stratofortress-files at /mnt/b2/files...
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.mount.main_async: Mounting b2://intelfx-name-stratofortress-files at /mnt/b2/files...
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name systemd[1]: Started s3ql-mnt-b2-files.service.
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: /usr/lib/python3.8/site-packages/_pyfuse3.py:30: TrioDeprecationWarning: trio.hazmat is deprecated since Trio 0.15.0; use trio.lowlevel instead (https://github.com/python-trio/trio/issues/476)
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]:   await fn(*args, **kwargs)
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] py.warnings._showwarnmsg: /usr/lib/python3.8/site-packages/_pyfuse3.py:30: TrioDeprecationWarning: trio.hazmat is deprecated since Trio 0.15.0; use trio.lowlevel instead (https://github.com/python-trio/trio/issues/476) await fn(*args, **kwargs)
Aug 16 08:09:16 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Succeeded.
Aug 16 08:09:16 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Consumed 1min 32.062s CPU time.
==== 8< ====

On the rsync side, it looked like this:

==== 8< ====
     19.70G   1%  214.51MB/s    0:01:34 (xfr#19, to-chk=410/1602)
>f..t...... Backups/Hosts/able/borg/data/39/19782
rsync: [receiver] close failed on "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb": Transport endpoint is not connected (107)
rsync: [receiver] stat "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb" failed: Transport endpoint is not connected (107)
rsync: [receiver] rename "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb" -> "Backups/Hosts/able/borg/data/39/19781": Transport endpoint is not connected (107)
rsync error: error in file IO (code 11) at receiver.c(868) [receiver=v3.2.3]
rsync: [sender] write error: Broken pipe (32)
==== 8< ====

s3ql command line:

==== 8< ====
$ systemctl cat s3ql-mnt-b2-files.service
# /etc/systemd/system/s3ql-mnt-b2-files.service
[Unit]
Wants=network-online.target
After=network.target network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/mount.s3ql b2://intelfx-name-stratofortress-files /mnt/b2/files -o systemd,log=syslog,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,cachesize=104857600,max-cache-entries=524088,allow-other,compress=none,threads=16,backend-options=disable-versions,debug
ExecStop=/usr/bin/umount /mnt/b2/files
LimitNOFILE=524288
MemoryHigh=1G
==== 8< ====

Here are the last few lines of the debug log:

==== 8< ====
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.fs.getxattr: started with 4113, b'security.capability'
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache.with_event_loop: upload of 3439 failed
NoneType: None
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.fs.getxattr: started with 4113, b'security.capability'
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:Thread-10] s3ql.mount.exchook: recording exception No pending requests
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name umount[44705]: umount: /mnt/b2/files: target is busy.
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Control process exited, code=exited, status=32/n/a
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Failed with result 'exit-code'.
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Consumed 1min 27.740s CPU time, read 2.3M from disk, written 70.6M to disk.
==== 8< ====

For some reason, the first line of this excerpt (s3ql.fs.getxattr) is spammed all over the log -- I'm not sure what is issuing several hundred getxattr calls per second.
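As a side note on those lookups: each of the `s3ql.fs.getxattr: ... b'security.capability'` entries corresponds to a getxattr(2) call for the `security.capability` attribute, which tools (and the kernel itself, when deciding whether file capabilities need clearing on write) issue per file. This is my guess at the mechanism, not something confirmed by the log; the userspace equivalent of such a probe looks roughly like this sketch (the function name is mine, not s3ql's):

```python
import errno
import os


def read_capability_xattr(path):
    """Return the raw security.capability xattr of *path*, or None if unset.

    Illustrative only: this mirrors the kind of getxattr(2) lookup that
    surfaces as 's3ql.fs.getxattr: ... b'security.capability'' in the
    debug log -- one probe per file being checked for capabilities.
    """
    try:
        return os.getxattr(path, "security.capability")
    except OSError as e:
        # ENODATA: the attribute is simply not set on this file;
        # ENOTSUP: the filesystem does not support this xattr namespace.
        if e.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
```

For ordinary files with no capabilities set (the common case during a backup), the kernel answers ENODATA, so the probe returns None; the cost is one round trip through FUSE per call, which is why a bulk transfer can generate them at a high rate.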
Anyway, here is the actual log (17M): https://intelfx.name/files/persistent/2020-08-16%20s3ql%20crash/s3ql-mnt-b2-files.log

I can reliably reproduce this crash by running the same rsync command; it fails after writing a specific amount of data. The same thing also happened with another file system (the borg one), although I have not tried reproducing it there, as I need to repair the borg repository first.

Any ideas what could be going wrong, or how to debug this further?

Thanks,
--
Ivan Shapovalov / intelfx /

--
You received this message because you are subscribed to the Google Groups "s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/s3ql/8c53b4f0b018199862a00f99f7b1bdb2152b053b.camel%40intelfx.name.
