Greetings again. (I haven't had time to follow up on my idea of partial object caching; in the meantime, I've been using s3ql with a ridiculously low object size as a workaround. But this post is about another file system.)
Today I was rsyncing my NAS to the cloud when s3ql suddenly exited in the middle of a transfer:

==== 8< ====
Aug 16 08:07:15 stratofortress.nexus.i.intelfx.name systemd[1]: Starting s3ql-mnt-b2-files.service...
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Using cached metadata.
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.mount.get_metadata: Using cached metadata.
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Loaded 0 entries from cache
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.block_cache.__init__: Loaded 0 entries from cache
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: Mounting b2://intelfx-name-stratofortress-files at /mnt/b2/files...
Aug 16 08:07:18 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] s3ql.mount.main_async: Mounting b2://intelfx-name-stratofortress-files at /mnt/b2/files...
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name systemd[1]: Started s3ql-mnt-b2-files.service.
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: /usr/lib/python3.8/site-packages/_pyfuse3.py:30: TrioDeprecationWarning: trio.hazmat is deprecated since Trio 0.15.0; use trio.lowlevel instead (https://github.com/python-trio/trio/issues/476)
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]:   await fn(*args, **kwargs)
Aug 16 08:07:19 stratofortress.nexus.i.intelfx.name mount.s3ql[40268]: mount.s3ql[40268:MainThread] py.warnings._showwarnmsg: /usr/lib/python3.8/site-packages/_pyfuse3.py:30: TrioDeprecationWarning: trio.hazmat is deprecated since Trio 0.15.0; use trio.lowlevel instead (https://github.com/python-trio/trio/issues/476) await fn(*args, **kwargs)
Aug 16 08:09:16 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Succeeded.
Aug 16 08:09:16 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Consumed 1min 32.062s CPU time.
==== 8< ====

On the rsync side, it looked like this:

==== 8< ====
     19.70G   1%  214.51MB/s    0:01:34 (xfr#19, to-chk=410/1602)
>f..t...... Backups/Hosts/able/borg/data/39/19782
rsync: [receiver] close failed on "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb": Transport endpoint is not connected (107)
rsync: [receiver] stat "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb" failed: Transport endpoint is not connected (107)
rsync: [receiver] rename "/mnt/b2/files/Backups/Hosts/able/borg/data/39/.19781.C3Bufb" -> "Backups/Hosts/able/borg/data/39/19781": Transport endpoint is not connected (107)
rsync error: error in file IO (code 11) at receiver.c(868) [receiver=v3.2.3]
rsync: [sender] write error: Broken pipe (32)
==== 8< ====

s3ql command line:

==== 8< ====
$ systemctl cat s3ql-mnt-b2-files.service
# /etc/systemd/system/s3ql-mnt-b2-files.service
[Unit]
Wants=network-online.target
After=network.target network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/mount.s3ql b2://intelfx-name-stratofortress-files /mnt/b2/files -o systemd,log=syslog,authfile=/etc/s3ql/authinfo2,cachedir=/var/tmp/s3ql,cachesize=104857600,max-cache-entries=524088,allow-other,compress=none,threads=16,backend-options=disable-versions,debug
ExecStop=/usr/bin/umount /mnt/b2/files
LimitNOFILE=524288
MemoryHigh=1G
==== 8< ====

Here are the last few lines of the debug log:

==== 8< ====
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.fs.getxattr: started with 4113, b'security.capability'
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache.with_event_loop: upload of 3439 failed
NoneType: None
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.block_cache._get_entry: started with 4113, 0
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:MainThread] s3ql.fs.getxattr: started with 4113, b'security.capability'
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name mount.s3ql[43764]: mount.s3ql[43764:Thread-10] s3ql.mount.exchook: recording exception No pending requests
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name umount[44705]: umount: /mnt/b2/files: target is busy.
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Control process exited, code=exited, status=32/n/a
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Failed with result 'exit-code'.
Aug 16 08:28:27 stratofortress.nexus.i.intelfx.name systemd[1]: s3ql-mnt-b2-files.service: Consumed 1min 27.740s CPU time, read 2.3M from disk, written 70.6M to disk.
==== 8< ====

For some reason, the first line of this excerpt (s3ql.fs.getxattr) is spammed all over the log -- I'm not sure what is issuing several hundred getxattr calls per second.
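As a side note on those lookups: each of the `s3ql.fs.getxattr: ... b'security.capability'` entries corresponds to a getxattr(2) call for the `security.capability` attribute, which tools (and the kernel itself, when deciding whether file capabilities need clearing on write) issue per file. This is my guess at the mechanism, not something confirmed by the log; the userspace equivalent of such a probe looks roughly like this sketch (the function name is mine, not s3ql's):

```python
import errno
import os


def read_capability_xattr(path):
    """Return the raw security.capability xattr of *path*, or None if unset.

    Illustrative only: this mirrors the kind of getxattr(2) lookup that
    surfaces as 's3ql.fs.getxattr: ... b'security.capability'' in the
    debug log -- one probe per file being checked for capabilities.
    """
    try:
        return os.getxattr(path, "security.capability")
    except OSError as e:
        # ENODATA: the attribute is simply not set on this file;
        # ENOTSUP: the filesystem does not support this xattr namespace.
        if e.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
```

For ordinary files with no capabilities set (the common case during a backup), the kernel answers ENODATA, so the probe returns None; the cost is one round trip through FUSE per call, which is why a bulk transfer can generate them at a high rate.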
Anyway, here is the actual log (17M): https://intelfx.name/files/persistent/2020-08-16%20s3ql%20crash/s3ql-mnt-b2-files.log

I can reliably reproduce this crash by running the same rsync command; it fails after writing a specific amount of data. The same thing also happened with another file system (the borg one), although I have not tried reproducing it there, as I need to repair the borg repository first.

Any ideas what could be going wrong, or how to debug this further?

Thanks,
--
Ivan Shapovalov / intelfx /

--
You received this message because you are subscribed to the Google Groups "s3ql" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/s3ql/8c53b4f0b018199862a00f99f7b1bdb2152b053b.camel%40intelfx.name.
