Nikolaus,

After fscking and upgrading my volume to the new DB revision, I was able to mount the filesystem and access all the files without problems (even though I stress the filesystem a lot, I've never lost data using s3ql).
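For reference, the sequence was roughly the following (the storage URL and mount point here are placeholders, not my real ones):

    fsck.s3ql s3://my-bucket
    s3qladm upgrade s3://my-bucket
    mount.s3ql s3://my-bucket /mnt/s3ql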
With the volume back online, I started to rsync about 280 GB of files *from* S3QL to an EBS volume (everything running inside AWS, in the same region). I had three jobs in parallel: one syncing small videos (15,000 files of ~100 MB), one syncing photos (15,000 files of ~5 MB), and one syncing a 3 GB volume with about 100,000 small (10-500 KB) files. The invocations are sketched in the P.S. below.

The high I/O wait situation occurred after about 3 hours of processing. I use New Relic to monitor the server, so I could see a spike of writes on my local cache disk, which I documented here:

https://docs.google.com/document/d/1S927JPyMG4SCxkiIcReQDHWACM1qozYOMTa86pGQ1k4/edit?usp=sharing

As I'm not serving any content directly from s3ql at the moment, this didn't affect the other services.

I know this is a quite peculiar way of using s3ql, but here's my analysis. In brief, the three rsync processes started around 8:45pm. Everything went well until around 12:00am, when the I/O wait started to spike. This seems to be the same scenario that happened on Monday and led me to reboot the server. Also after 12:00am, the network I/O dropped from around 100 MB/s to 2 MB/s, and there was a constant stream of writes to my local cache disk until nearly 5am. The rsync processes finished without errors, but they took about 12 hours to complete. So mount.s3ql never crashed, and it went back to normal operation after a few hours. It looks as if some race condition in the cache, triggered by too many concurrent reads from the filesystem, is delaying all I/O.

If there's any other debug info I can provide, I'll be glad to do so. I can also try tuning the local cache (it ran with the default parameters; see the P.P.S. below), or kill mount.s3ql when the wait starts so I can test fsck as well.

Thanks a lot for your help,

Guilherme

On Tuesday, February 17, 2015 at 3:25:11 PM UTC-2, Nikolaus Rath wrote:
>
> >> Actually that doesn't sound like a solution at all. mount.s3ql should
> >> never crash, no matter how high the load is. Can you still reproduce
> >> that? If so, it'd be great if you could post the backtrace.
> >>
> > This time it happened while I was performing 2 full backups with lots of
> > small files, from my mounted s3qlfs to another s3 bucket via duplicity.
> > I've experienced this when apache was under high load. When mount.s3ql
> > stops, all other processes start to wait for io, increasing the load
> > constantly. I can try to force this behaviour on another bucket after I
> > restore this one.
>
> Please do!
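P.S. In case it helps with reproducing this, the three rsync jobs looked roughly like the following (the source and destination paths are placeholders):

    # videos: ~15,000 files of ~100 MB each
    rsync -a /mnt/s3ql/videos/ /mnt/ebs/videos/ &
    # photos: ~15,000 files of ~5 MB each
    rsync -a /mnt/s3ql/photos/ /mnt/ebs/photos/ &
    # ~100,000 small files (10-500 KB), ~3 GB total
    rsync -a /mnt/s3ql/small/ /mnt/ebs/small/ &
    wait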
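P.P.S. For the cache tuning, this is the kind of thing I had in mind for the next attempt (the numbers are guesses on my part, not tested values):

    # raise the cache to 10 GiB (--cachesize takes KiB) and allow more
    # cached entries than the default, to see if it changes the behaviour
    mount.s3ql --cachesize 10485760 --max-cache-entries 100000 \
        s3://my-bucket /mnt/s3ql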
