Hi,
I've just noticed something rather worrying on our cluster.
Some files are apparently truncated. From the first look I had at it,
it happened on files where there was a metadata update right after the
file was stored. The exact sequence was:
- PUT to store the file
- GET to get the file
On Mon, Mar 18, 2013 at 2:50 AM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
Hi,
I've just noticed something rather worrying on our cluster.
Some files are apparently truncated. From the first look I had at it,
it happened on files where there was a metadata update right after the
On 03/15/2013 05:17 PM, Greg Farnum wrote:
[Putting list back on cc]
On Friday, March 15, 2013 at 4:11 PM, Jim Schutt wrote:
On 03/15/2013 04:23 PM, Greg Farnum wrote:
As I come back and look at these again, I'm not sure what the context
for these logs is. Which test did they come from,
Hi,
What version are you using? Do you have logs?
I'm running a custom build 0.56.3 + some patches ( basically up
to 7889c5412 + fixes for #4150 and #4177 ).
I don't have any radosgw log ( debug level is set to 0 and it didn't
output anything ).
I have the HTTP logs :
10.0.0.253 s3.svc -
Hello,
I'm experiencing the same long-lasting problem - during recovery ops, some
percentage of read I/O remains in-flight for seconds, rendering
upper-level filesystem on the qemu client very slow and almost
unusable. Different striping has almost no effect on visible delays
and reads may be
On Mon, Mar 18, 2013 at 7:40 AM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
Hi,
What version are you using? Do you have logs?
I'm running a custom build 0.56.3 + some patches ( basically up
to 7889c5412 + fixes for #4150 and #4177 ).
I don't have any radosgw log ( debug level is
Hi,
Can't make much out of it, will probably need rgw logs (and preferably
with also 'debug ms = 1') for this issue.
Well, the problem is that I can't make it happen again ... it happened
4 times during an import of ~3000 files ... I'm trying to reproduce
this on a test cluster but so far, no
For quite a while, I've experienced oddities with snapshotted Firefox
_CACHE_00?_ files, whose checksums (and contents) would change after the
btrfs snapshot was taken, and would even change depending on how the
file was brought to memory (e.g., rsyncing it to backup storage vs
checking its md5sum
On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote:
The following patch should fix the problem.
-Henry
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index e51558f..4bcbcb6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -608,7 +608,7 @@ out:
pos += len;
written += len;
On Mon, 18 Mar 2013, Greg Farnum wrote:
On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote:
The following patch should fix the problem.
-Henry
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index e51558f..4bcbcb6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@
While I wrote the previous email, a smoking gun formed in one of my
servers: a snapshot that had passed a database consistency check turned
out to be corrupted when I tried to rollback to it! Since the snapshot
was not modified in any way between the initial scripted check and the
later manual
A few questions. Does leveldb use O_DIRECT and mmap together? (the
source of a write being pages that are mmap'd from somewhere else)
That's the most likely place for this kind of problem. Also, you
mention crc errors. Are those reported by btrfs or are they application
level crcs.
Thanks for
We should advance the user data pointer by _len_ instead of _written_.
_len_ is the data length written in each iteration while _written_ is the
accumulated data length we have written out.
Signed-off-by: Henry C Chang henry.cy.ch...@gmail.com
---
fs/ceph/file.c |2 +-
1 file changed, 1
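
To illustrate the difference between advancing by _len_ and by _written_, here is a minimal sketch. It is not the code from fs/ceph/file.c: the function `source_offsets` and its parameters are hypothetical, and it only records which source offset each loop iteration would start reading from, so that both variants can be compared without touching real buffers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only; hypothetical helper, not taken from the kernel.
 * Simulates a chunked write loop and records the source offset used at
 * the start of each iteration.
 *
 * advance_by_len != 0: offset grows by len (this iteration's length).
 * advance_by_len == 0: offset grows by written (the running total).
 *
 * Returns the number of iterations recorded (at most max_offsets).
 */
static size_t source_offsets(size_t count, size_t chunk, int advance_by_len,
                             size_t *offsets, size_t max_offsets)
{
    size_t off = 0;      /* current offset into the user data */
    size_t written = 0;  /* accumulated length written so far */
    size_t left = count; /* bytes still to write */
    size_t n = 0;

    while (left > 0 && n < max_offsets) {
        size_t len = left < chunk ? left : chunk;

        offsets[n++] = off;
        written += len;
        left -= len;
        off += advance_by_len ? len : written;
    }
    return n;
}
```

With 12 bytes written in 4-byte chunks, advancing by _len_ gives source offsets 0, 4, 8. Advancing by _written_ gives 0, 4, 12, so the third iteration starts past the data it should be sending. The two variants agree for the first two iterations, which is why only writes that need three or more iterations would be affected in this sketch.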
I just sent out the patch with sign-off. Thanks for testing.
2013/3/19 Sage Weil s...@inktank.com:
On Mon, 18 Mar 2013, Greg Farnum wrote:
On Saturday, March 16, 2013 at 5:38 AM, Henry C Chang wrote:
The following patch should fix the problem.
-Henry
diff --git a/fs/ceph/file.c
On 03/17/2013 05:18 AM, kelvin_hu...@wiwynn.com wrote:
Hi, all
Hi,
...
My question is:
1. Is the I/O pause normal while ceph is recovering?
I have experienced the same issue. This works as designed, and is
probably because of the heartbeat-timeout in osd heartbeat grace
period set to
On Mar 18, 2013, Chris Mason chris.ma...@fusionio.com wrote:
A few questions. Does leveldb use O_DIRECT and mmap together?
No, it doesn't use O_DIRECT at all. Its I/O interface is very
simplified: it just opens each new file (database chunks limited to 2MB)
with O_CREAT|O_RDWR|O_TRUNC, and