Hello, I'm using OVH Cloud with a large S3QL filesystem. (This is not hubiC.) OVH Cloud is OpenStack-based, so I'm using the swiftks backend. Normally the response time from OVH Cloud is exceedingly fast, but recently I've been getting a very large number of timeouts after needing to run an fsck on the filesystem.
This is the command:

    fsck.s3ql --cachedir /autofs/cache/s3ql --log /autofs/cache/s3ql/fsck.GRA1_Live.log \
        --backend-options tcp-timeout=90 swiftks://auth.cloud.ovh.net/GRA1:XXXXXX/

Here is a sample from the log file:

    2018-03-27 18:25:16.413 1483:MainThread s3ql.fsck.main: Starting fsck of swiftks://auth.cloud.ovh.net/GRA1:XXXXXX/
    2018-03-27 18:25:16.786 1483:MainThread s3ql.fsck.main: Using cached metadata.
    2018-03-27 18:25:17.037 1483:MainThread s3ql.fsck.main: Remote metadata is outdated.
    2018-03-27 18:25:17.037 1483:MainThread s3ql.fsck.main: Checking DB integrity...
    [...everything is fine here, it just takes a very long time...]
    2018-03-27 19:52:33.499 1483:MainThread s3ql.metadata.upload_metadata: Compressing and uploading metadata...
    2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.upload_metadata: Wrote 857 MiB of compressed metadata.
    2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.upload_metadata: Cycling metadata backups...
    2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.cycle_metadata: Backing up old metadata...
    2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.cycle_metadata: - [CJD] copy old metadata 9...
    2018-03-27 20:19:44.250 1483:MainThread s3ql.backends.common.wrapped: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend._copy_helper (attempt 3)...
    ...
    2018-03-27 20:52:26.376 1483:MainThread s3ql.backends.common.wrapped: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend._copy_helper (attempt 16)...
    2018-03-27 20:59:49.105 1483:MainThread s3ql.backends.common.wrapped: Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying Backend._copy_helper (attempt 17)...
    ...

Recently the retries have been reaching 70-80 for each metadata backup. (The "CJD" reference comes from a log line I added to try to track a little more detail, but I'm not familiar enough with Python to know where or how to debug further.) It isn't a crash as such, so I didn't think a stack dump would be a useful thing to provide at this stage, but please do feel free to tell me otherwise.

This is the S3QL package from Debian, s3ql:amd64 2.21+dfsg-3. I looked at upgrading to your more recent 2.26, which is also available as a Debian package, but it crashed out horribly (and no, unfortunately I didn't take a screen capture). I assume there is a broken dependency somewhere, but a production system is not the place to investigate that.

I'm aware that I have a very large metadata database (the uncompressed size is now 12 GB) and I want to split the filesystem, but until I can get it mounted I can't copy the data off, and so I can't reduce its size.

I've increased --backend-options tcp-timeout=90 from the default of 20, but that doesn't seem to make any difference. What controls the timeout on the metadata copy, and can I increase it enough to get my filesystem mounted?

Thanks,
Chris
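PS: In case it helps, the "[CJD]" marker above is just an extra log call I added to cycle_metadata() in the installed s3ql/metadata.py, so I could see which backup object each run of retries belongs to. I'm paraphrasing from memory, so the surrounding code below is only approximate; only the marked line is my addition:

    # Inside cycle_metadata() in s3ql/metadata.py (paraphrased, approximate;
    # 'log' and 'backend' come from the surrounding module and function).
    for i in range(9, -1, -1):
        log.info('- [CJD] copy old metadata %d...', i)   # <-- the line I added
        backend.copy('s3ql_metadata_bak_%d' % i,
                     's3ql_metadata_bak_%d' % (i + 1))

As far as I can tell from the log, it is these copy calls that are hitting ConnectionTimedOut and being retried via Backend._copy_helper.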
