Nikolaus,

After fscking and upgrading my volume to the new DB revision, I was able to mount the filesystem and access all the files without problems (even though I stress the filesystem a lot, I've never lost data using s3ql).
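For reference, the sequence was roughly the following (the storage URL and mount point here are placeholders, not my real ones):

    fsck.s3ql s3://my-bucket
    s3qladm upgrade s3://my-bucket
    mount.s3ql s3://my-bucket /mnt/s3ql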
With the volume back online, I started to rsync about 280 GB of files *from* S3QL to an EBS volume (everything running inside AWS, in the same region). I had three jobs in parallel: one syncing small videos (15,000 files of ~100 MB), one syncing photos (15,000 files of ~5 MB), and one syncing a 3 GB volume with about 100,000 small (10-500 KB) files. The invocations are sketched in the P.S. below.

The high I/O wait situation occurred after about 3 hours of processing. I use New Relic to monitor the server, so I could see a spike of writes on my local cache disk, which I documented here:

https://docs.google.com/document/d/1S927JPyMG4SCxkiIcReQDHWACM1qozYOMTa86pGQ1k4/edit?usp=sharing

As I'm not serving any content directly from s3ql at the moment, this didn't affect the other services.

I know this is a quite peculiar way of using s3ql, but here's my analysis. In brief, the three rsync processes started around 8:45pm. Everything went well until around 12:00am, when the I/O wait started to spike. This seems to be the same scenario that happened on Monday and led me to reboot the server. Also after 12:00am, the network I/O dropped from around 100 MB/s to 2 MB/s, and there was a constant stream of writes to my local cache disk until nearly 5am. The rsync processes finished without errors, but they took about 12 hours to complete. So mount.s3ql never crashed, and it went back to normal operation after a few hours. It looks as if some race condition in the cache, triggered by too many concurrent reads from the filesystem, is delaying all I/O.

If there's any other debug info I can provide, I'll be glad to do so. I can also try tuning the local cache (it ran with the default parameters; see the P.P.S. below), or kill mount.s3ql when the wait starts so I can test fsck as well.

Thanks a lot for your help,

Guilherme

On Tuesday, February 17, 2015 at 3:25:11 PM UTC-2, Nikolaus Rath wrote:
>
> >> Actually that doesn't sound like a solution at all. mount.s3ql should
> >> never crash, no matter how high the load is. Can you still reproduce
> >> that? If so, it'd be great if you could post the backtrace.
> >>
> > This time it happened while I was performing 2 full backups with lots of
> > small files, from my mounted s3qlfs to another s3 bucket via duplicity.
> > I've experienced this when apache was under high load. When mount.s3ql
> > stops, all other processes start to wait for io, increasing the load
> > constantly. I can try to force this behaviour on another bucket after I
> > restore this one.
>
> Please do!
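P.S. In case it helps with reproducing this, the three rsync jobs looked roughly like the following (the source and destination paths are placeholders):

    # videos: ~15,000 files of ~100 MB each
    rsync -a /mnt/s3ql/videos/ /mnt/ebs/videos/ &
    # photos: ~15,000 files of ~5 MB each
    rsync -a /mnt/s3ql/photos/ /mnt/ebs/photos/ &
    # ~100,000 small files (10-500 KB), ~3 GB total
    rsync -a /mnt/s3ql/small/ /mnt/ebs/small/ &
    wait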
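P.P.S. For the cache tuning, this is the kind of thing I had in mind for the next attempt (the numbers are guesses on my part, not tested values):

    # raise the cache to 10 GiB (--cachesize takes KiB) and allow more
    # cached entries than the default, to see if it changes the behaviour
    mount.s3ql --cachesize 10485760 --max-cache-entries 100000 \
        s3://my-bucket /mnt/s3ql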
