[radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, I've just noticed something rather worrying on our cluster. Some files are apparently truncated. From the first look I had at it, it happened on files where there was a metadata update right after the file was stored. The exact sequence was: - PUT to store the file - GET to get the file

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Yehuda Sadeh
On Mon, Mar 18, 2013 at 2:50 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, I've just noticed something rather worrying on our cluster. Some files are apparently truncated. From the first look I had at it, it happened on files where there was a metadata update right after the

Re: CephFS Space Accounting and Quotas

2013-03-18 Thread Jim Schutt
On 03/15/2013 05:17 PM, Greg Farnum wrote: [Putting list back on cc] On Friday, March 15, 2013 at 4:11 PM, Jim Schutt wrote: On 03/15/2013 04:23 PM, Greg Farnum wrote: As I come back and look at these again, I'm not sure what the context for these logs is. Which test did they come from,

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, What version are you using? Do you have logs? I'm running a custom build 0.56.3 + some patches (basically up to 7889c5412 + fixes for #4150 and #4177). I don't have any radosgw log (debug level is set to 0 and it didn't output anything). I have the HTTP logs: 10.0.0.253 s3.svc -

Re: Ceph availability test recovering question

2013-03-18 Thread Andrey Korolyov
Hello, I'm experiencing the same long-lasting problem: during recovery ops, some percentage of read I/O remains in-flight for seconds, rendering the upper-level filesystem on the qemu client very slow and almost unusable. Different striping has almost no effect on visible delays and reads may be

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Yehuda Sadeh
On Mon, Mar 18, 2013 at 7:40 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, What version are you using? Do you have logs? I'm running a custom build 0.56.3 + some patches (basically up to 7889c5412 + fixes for #4150 and #4177). I don't have any radosgw log (debug level is

Re: [radosgw] Race condition corrupting data on COPY ?

2013-03-18 Thread Sylvain Munaut
Hi, Can't make much out of it, will probably need rgw logs (and preferably with also 'debug ms = 1') for this issue. Well, the problem is that I can't make it happen again ... it happened 4 times during an import of ~3000 files ... I'm trying to reproduce this on a test cluster but so far, no

corruption of active mmapped files in btrfs snapshots

2013-03-18 Thread Alexandre Oliva
For quite a while, I've experienced oddities with snapshotted Firefox _CACHE_00?_ files, whose checksums (and contents) would change after the btrfs snapshot was taken, and would even change depending on how the file was brought to memory (e.g., rsyncing it to backup storage vs checking its md5sum

Re: Direct IO on CephFS for blocks larger than 8MB

2013-03-18 Thread Greg Farnum
On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote: The following patch should fix the problem. -Henry diff --git a/fs/ceph/file.c b/fs/ceph/file.c index e51558f..4bcbcb6 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -608,7 +608,7 @@ out: pos += len; written += len;

Re: Direct IO on CephFS for blocks larger than 8MB

2013-03-18 Thread Sage Weil
On Mon, 18 Mar 2013, Greg Farnum wrote: On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote: The following patch should fix the problem. -Henry diff --git a/fs/ceph/file.c b/fs/ceph/file.c index e51558f..4bcbcb6 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@

Re: corruption of active mmapped files in btrfs snapshots

2013-03-18 Thread Alexandre Oliva
While I wrote the previous email, a smoking gun formed in one of my servers: a snapshot that had passed a database consistency check turned out to be corrupted when I tried to roll back to it! Since the snapshot was not modified in any way between the initial scripted check and the later manual

Re: corruption of active mmapped files in btrfs snapshots

2013-03-18 Thread Chris Mason
A few questions. Does leveldb use O_DIRECT and mmap together? (the source of a write being pages that are mmap'd from somewhere else) That's the most likely place for this kind of problem. Also, you mention crc errors. Are those reported by btrfs or are they application-level crcs? Thanks for

[PATCH] ceph: fix buffer pointer advance in ceph_sync_write

2013-03-18 Thread Henry C Chang
We should advance the user data pointer by _len_ instead of _written_. _len_ is the data length written in each iteration while _written_ is the accumulated data length we have written out. Signed-off-by: Henry C Chang henry.cy.ch...@gmail.com --- fs/ceph/file.c | 2 +- 1 file changed, 1
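
The difference between advancing by _len_ and by _written_ can be illustrated with a small userspace sketch. This is not the code from fs/ceph/file.c: the function name `chunk_offsets`, its parameters, and the fixed chunk size are hypothetical and exist only for illustration. It records the source offset that each iteration of a chunked write loop would read from, without touching any data buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a chunked write loop (not the kernel code).
 * For each iteration it records the offset into the user data that the
 * iteration would read from. If advance_by_written is nonzero, the source
 * offset is advanced by the accumulated total (the behaviour the patch
 * description calls wrong); otherwise it is advanced by the per-iteration
 * length. Returns the number of iterations recorded.
 */
size_t chunk_offsets(size_t count, size_t max_chunk, int advance_by_written,
                     size_t *offsets, size_t max_offsets)
{
    size_t src = 0;      /* offset of the user data pointer */
    size_t written = 0;  /* accumulated bytes written so far */
    size_t left = count; /* bytes still to write */
    size_t n = 0;

    assert(offsets != NULL);

    while (left > 0 && n < max_offsets) {
        /* len: bytes handled in this iteration only */
        size_t len = left < max_chunk ? left : max_chunk;

        offsets[n++] = src;
        written += len;
        left -= len;
        src += advance_by_written ? written : len;
    }
    return n;
}
```

In this model, a 12-byte write split into 4-byte chunks reads from offsets 0, 4, 8 when the pointer advances by the per-iteration length, and from offsets 0, 4, 12 when it advances by the accumulated total. The two variants agree for the first two chunks and diverge from the third one onward.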

Re: Direct IO on CephFS for blocks larger than 8MB

2013-03-18 Thread Henry C Chang
I just sent out the patch with sign-off. Thanks for testing. 2013/3/19 Sage Weil s...@inktank.com: On Mon, 18 Mar 2013, Greg Farnum wrote: On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote: The following patch should fix the problem. -Henry diff --git a/fs/ceph/file.c

Re: Ceph availability test recovering question

2013-03-18 Thread Wolfgang Hennerbichler
On 03/17/2013 05:18 AM, kelvin_hu...@wiwynn.com wrote: Hi, all Hi, ... My question is: 1. Is the I/O pause normal while ceph is recovering? I have experienced the same issue. This works as designed, and is probably because of the heartbeat-timeout in osd heartbeat grace period set to

Re: corruption of active mmapped files in btrfs snapshots

2013-03-18 Thread Alexandre Oliva
On Mar 18, 2013, Chris Mason chris.ma...@fusionio.com wrote: A few questions. Does leveldb use O_DIRECT and mmap together? No, it doesn't use O_DIRECT at all. Its I/O interface is very simplified: it just opens each new file (database chunks limited to 2MB) with O_CREAT|O_RDWR|O_TRUNC, and