Hello,

I'm using OVH Cloud with a large S3QL filesystem. (This is not hubiC.) OVH 
Cloud is OpenStack-based, so I'm using the swiftks backend. Normally 
response times from OVH Cloud are exceedingly fast, but since needing to 
run an fsck on the filesystem I've been getting a very large number of 
timeouts.

fsck.s3ql --cachedir /autofs/cache/s3ql --log 
/autofs/cache/s3ql/fsck.GRA1_Live.log --backend-options tcp-timeout=90 
swiftks://auth.cloud.ovh.net/GRA1:XXXXXX/


Here is a sample from the log file:

2018-03-27 18:25:16.413 1483:MainThread s3ql.fsck.main: Starting fsck of 
swiftks://auth.cloud.ovh.net/GRA1:XXXXXX/
2018-03-27 18:25:16.786 1483:MainThread s3ql.fsck.main: Using cached 
metadata.
2018-03-27 18:25:17.037 1483:MainThread s3ql.fsck.main: Remote metadata is 
outdated.
2018-03-27 18:25:17.037 1483:MainThread s3ql.fsck.main: Checking DB 
integrity...
...everything is fine...but it just takes a very long time...
2018-03-27 19:52:33.499 1483:MainThread s3ql.metadata.upload_metadata: 
Compressing and uploading metadata...
2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.upload_metadata: 
Wrote 857 MiB of compressed metadata.
2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.upload_metadata: 
Cycling metadata backups...
2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.cycle_metadata: 
Backing up old metadata...
2018-03-27 20:15:13.078 1483:MainThread s3ql.metadata.cycle_metadata: - 
[CJD] copy old metadata 9...
2018-03-27 20:19:44.250 1483:MainThread s3ql.backends.common.wrapped: 
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying 
Backend._copy_helper (attempt 3)...
...
2018-03-27 20:52:26.376 1483:MainThread s3ql.backends.common.wrapped: 
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying 
Backend._copy_helper (attempt 16)...
2018-03-27 20:59:49.105 1483:MainThread s3ql.backends.common.wrapped: 
Encountered ConnectionTimedOut (send/recv timeout exceeded), retrying 
Backend._copy_helper (attempt 17)...
...

Recently the retry count has been reaching 70-80 for each metadata backup. 
(You'll see from the "CJD" reference that I've been trying to track down a 
little more detail, but I'm not familiar enough with Python to know where 
or how to debug further.) Since it's not a crash as such, I didn't think a 
stack dump would be a useful thing to provide at this stage - but please 
do feel free to tell me otherwise.
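From reading the log, my understanding of the wrapper's behaviour is roughly 
the sketch below. This is my own reconstruction of the pattern, not s3ql's 
actual code - the exception class, logger name, and attempt limit are all 
assumptions on my part:

```python
# My reconstruction of the retry behaviour seen in the log -- NOT s3ql's
# actual code. The wrapper appears to catch the timeout exception and
# retry the same operation, logging each attempt with a counter.
import logging

log = logging.getLogger('s3ql.backends.common.wrapped')

class ConnectionTimedOut(Exception):
    """Stand-in for the backend's send/recv timeout exception."""

def retry_wrapper(fn, *args, max_attempts=100):
    # Keep calling fn until it succeeds or max_attempts is exhausted
    # (the real limit, if any, is an assumption here).
    attempt = 0
    while True:
        try:
            return fn(*args)
        except ConnectionTimedOut:
            attempt += 1
            if attempt >= max_attempts:
                raise
            log.warning('Encountered ConnectionTimedOut, retrying %s '
                        '(attempt %d)...', fn.__name__, attempt)

# Toy demonstration: an operation that times out twice, then succeeds.
calls = {'n': 0}
def flaky_copy():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionTimedOut()
    return 'ok'

result = retry_wrapper(flaky_copy)  # succeeds on the third call
```

If the real wrapper behaves like this, then with 70-80 timeouts per object 
the copy eventually completes only because the loop keeps going long enough.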

This is the S3QL package from Debian, s3ql:amd64 2.21+dfsg-3. I looked at 
upgrading to your recent 2.26, which is available as a Debian package, but 
it crashed out horribly (and no, unfortunately I didn't take a screen 
capture). I assume there's a broken dependency somewhere, but a production 
system is not the place to investigate that.

I'm aware that I have a very large database (the uncompressed size is now 
12 GB) and I want to split the filesystem up. But until I can get it 
mounted I can't copy off the data, so I can't reduce its size. I've 
increased --backend-options tcp-timeout from its default of 20 to 90, but 
that doesn't seem to be making any difference.
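For the record, this is how I'd expect to pass a larger timeout when 
mounting - the mountpoint and the value of 300 are examples only, not 
something I've confirmed works:

```shell
# Hypothetical mount invocation with a larger tcp-timeout; same
# --backend-options syntax as the fsck.s3ql command above.
mount.s3ql --cachedir /autofs/cache/s3ql \
    --backend-options tcp-timeout=300 \
    swiftks://auth.cloud.ovh.net/GRA1:XXXXXX/ /mnt/s3ql
```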

What controls the metadata copy timeout, and can I increase it enough to 
get my filesystem mounted?

Thanks,
Chris
