On Thu, Jul 19, 2012 at 1:28 AM, Gregory Farnum g...@inktank.com wrote:
On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote:
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
It's actually the sum of the latencies of all 3971 asynchronous reads,
in seconds, so the average latency was ~200ms, which is still pretty
high.
OK. I did realize it later that day when I noticed that the sum only goes
up. So the sum is the number of seconds spent, and dividing it by avgcount
gives an average.
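To make the arithmetic concrete, here is a minimal editor's sketch of how the two counter fields relate; only the 3971 read count and the ~200ms average appear in the thread, so the numeric sum below is an assumed value chosen to match them.

    /* Hedged sketch: "sum" is the total seconds spent across all ops,
     * "avgcount" is the number of ops; the sum value is assumed here
     * purely for illustration. */
    #include <stdio.h>

    int main(void)
    {
            double sum = 794.2;            /* seconds (assumed)            */
            unsigned long avgcount = 3971; /* async reads, from the thread */

            printf("avg latency: %.3f s\n", sum / avgcount); /* ~0.200 s */
            return 0;
    }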
I'd just like to report the same behaviour on my test cluster with 0.48.
I've set up a single box (SL6.1 - 2.6.32-220.23.1 kernel) with 1 mds,
mon and osd, and replication set to '1' for both data and metadata.
Having mounted using ceph-fuse, I'm running a simple fio job to create load:
Try to determine how much of the 200ms avg latency comes from osds vs
the qemu block driver.
It looks like osd.0 performs with low latency, but osd.1 latency is way
too high, and on average it appears as 200ms. The osd is backed by btrfs over
LVM2. Maybe the issue lies in the backing fs selection? All
Late last year Josh Durgin had put together a series of
fixes for rbd that never got committed. I told him I
would get them in, and this series represents the last
six that remain.
Here's a summary:
[PATCH 1/6] rbd: return errors for mapped but deleted snapshot
This adds code to distinguish
When a snapshot is deleted, the OSD will return ENOENT when reading
from it. This is normally interpreted as a hole by rbd, which will
return zeroes. To minimize the time in which this can happen, stop
requests early when we are notified that our snapshot no longer
exists.
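To illustrate the idea (an editor's sketch, not the actual patch), the driver can check whether its mapped snapshot id is still present in the snapshot context delivered by a header update, and start failing requests with -ENOENT as soon as it is not; the helper name below is hypothetical.

    /* Hedged sketch: return true if snap_id is still listed in the
     * snapshot context received with a header update. If it is not,
     * the caller can fail further requests with -ENOENT instead of
     * letting them read zeroes. */
    static bool rbd_snap_still_exists(const struct ceph_snap_context *snapc,
                                      u64 snap_id)
    {
            u32 i;

            for (i = 0; i < snapc->num_snaps; i++)
                    if (snapc->snaps[i] == snap_id)
                            return true;
            return false;
    }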
[el...@inktank.com:
Snapshots cannot be resized, and the new capacity of head should not
be reflected by the snapshot.
Signed-off-by: Josh Durgin josh.dur...@inktank.com
Reviewed-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |7 ++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git
If an image was mapped to a snapshot, the size of the head version
would be shown. Protect capacity with header_rwsem, since it may
change.
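As a rough illustration of taking header_rwsem around the size read (an editor's sketch; the mapped_size field name is an assumption, not the driver's real field):

    /* Hedged sketch: read the image size under header_rwsem so a
     * concurrent header refresh cannot change it mid-read; the result
     * can then be handed to set_capacity(), which expects 512-byte
     * sectors. "mapped_size" is an assumed field name. */
    static u64 rbd_size_sectors(struct rbd_device *rbd_dev)
    {
            u64 bytes;

            down_read(&rbd_dev->header_rwsem);
            bytes = rbd_dev->mapped_size;
            up_read(&rbd_dev->header_rwsem);

            return bytes >> 9;      /* bytes to 512-byte sectors */
    }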
Signed-off-by: Josh Durgin josh.dur...@dreamhost.com
Reviewed-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 11 ---
1 files changed,
The image may have been resized.
Signed-off-by: Josh Durgin josh.dur...@dreamhost.com
Reviewed-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 9c3a1db..a6bbda2 100644
This prevents a race between requests with a given snap context and
header updates that free it. The osd client was already expecting the
snap context to be reference counted, since it get()s it in
ceph_osdc_build_request and put()s it when the request completes.
Also remove the second
Previously the original header version was sent. Now, we update it
when the header changes.
Signed-off-by: Josh Durgin josh.dur...@dreamhost.com
Reviewed-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |7 +--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git
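The get/put pattern referred to above can be sketched roughly as follows (an editor's illustration, not the patch itself; it assumes the snap context's reference count is the atomic nref field, and the function names are made up):

    /* Hedged sketch: hold a reference on the snap context for the
     * lifetime of a request so a concurrent header update cannot free
     * it underneath the request. */
    static struct ceph_snap_context *snapc_get(struct ceph_snap_context *snapc)
    {
            if (snapc)
                    atomic_inc(&snapc->nref);
            return snapc;
    }

    static void snapc_put(struct ceph_snap_context *snapc)
    {
            if (snapc && atomic_dec_and_test(&snapc->nref))
                    kfree(snapc);
    }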
On Tue, Jul 17, 2012 at 01:18:50PM -0700, Yehuda Sadeh wrote:
On Wed, Jul 11, 2012 at 5:34 AM, Guangliang Zhao gz...@suse.com wrote:
The bio_pair allocated in bio_chain_clone would not be freed;
this will cause a memory leak. It could actually be freed only
after 3 releases, because
The header_rwsem of rbd_dev is initialized twice in
function rbd_add().
Signed-off-by: Guangliang Zhao gz...@suse.com
---
drivers/block/rbd.c |2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 013c7a5..50117dd 100644
---
Hi Cephers!
I'm working with rbd mapping. I figured out that the block device size
of the rbd device is not updated while the device is mounted. Here are my
tests:
1. Pick up a device and check its size
# rbd ls
size
# rbd info test
rbd image 'test':
size 1 MB in 2500 objects
order 22 (4096 KB
Hi,
On 19-07-12 16:55, Sébastien Han wrote:
Hi Cephers!
I'm working with rbd mapping. I figured out that the block device size
of the rbd device is not updated while the device is mounted. Here are my
tests:
IIRC this is not something RBD-specific, but since the device is in use
it can't be
OK, I got your point, it seems logical, but why is this possible with LVM, for example?
You can easily do this with LVM without unmounting the device.
Cheers.
On Thu, Jul 19, 2012 at 5:15 PM, Wido den Hollander w...@widodh.nl wrote:
Hi,
On 19-07-12 16:55, Sébastien Han wrote:
Hi Cephers!
I'm
On Thu, Jul 19, 2012 at 8:26 AM, Sébastien Han han.sebast...@gmail.com wrote:
OK, I got your point, it seems logical, but why is this possible with LVM, for
example?
You can easily do this with LVM without unmounting the device.
Do your LVM volumes have partition tables inside them? That might be
On Thu, Jul 19, 2012 at 8:38 AM, Tommi Virtanen t...@inktank.com wrote:
Do your LVM volumes have partition tables inside them? That might be
the difference.
Of course, you can put your filesystem straight on the RBD; that would
be a good experiment to run.
Oops, I see you did put your fs
On 19-07-12 17:26, Sébastien Han wrote:
OK, I got your point, it seems logical, but why is this possible with LVM, for example?
You can easily do this with LVM without unmounting the device.
LVM volumes run through the device mapper and are not regular block devices.
If you resize the disk underneath
Hum ok, I see. Thanks!
But if you have any clue how to force the kernel to re-read without
unmounting/mounting :)
On Thu, Jul 19, 2012 at 5:47 PM, Wido den Hollander w...@widodh.nl wrote:
On 19-07-12 17:26, Sébastien Han wrote:
OK, I got your point, it seems logical, but why is this possible with LVM, for
On Thu, Jul 19, 2012 at 5:19 AM, Vladimir Bashkirtsev
vladi...@bashkirtsev.com wrote:
It looks like osd.0 performs with low latency, but osd.1 latency is way too
high, and on average it appears as 200ms. The osd is backed by btrfs over LVM2.
Maybe the issue lies in the backing fs selection? All four osds
This series of patches changes the way the snap context seq field
is used. Currently it is used in a way that isn't really useful, and
as such is a bit confusing. This behavior seems to be a hold over
from a time when there was no snap_id field maintained for an rbd_dev.
Summary:
[PATCH 1/4]
In what appears to be an artifact of a different way of encoding
whether an rbd image maps a snapshot, __rbd_refresh_header() has
code that arranges to update the seq value in an rbd image's
snapshot context to point to the first entry in its snapshot
array if that's where it was pointing
In rbd_header_add_snap() there is code to set snapc->seq to the
just-added snapshot id. This is the only remnant left of the
use of that field for recording which snapshot an rbd_dev was
associated with. That functionality is no longer supported,
so get rid of that final bit of code.
Doing so
The snap_seq field in an rbd_image_header structure held the value
from the rbd image header when it was last refreshed. We now
maintain this value in the snapc->seq field. So get rid of the
other one.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |2 --
1 files
I've had a little more luck using cfdisk than vanilla fdisk when it
comes to detecting changes. You might try running partprobe and then
cfdisk and seeing if you get anything different.
Calvin
On Thu, Jul 19, 2012 at 9:50 AM, Sébastien Han han.sebast...@gmail.com wrote:
Hum ok, I see. Thanks!
On Thu, Jul 19, 2012 at 9:52 AM, Tommi Virtanen t...@inktank.com wrote:
On Thu, Jul 19, 2012 at 5:19 AM, Vladimir Bashkirtsev
vladi...@bashkirtsev.com wrote:
It looks like osd.0 performs with low latency, but osd.1 latency is way too
high, and on average it appears as 200ms. The osd is backed
On 07/19/2012 01:06 PM, Calvin Morrow wrote:
On Thu, Jul 19, 2012 at 9:52 AM, Tommi Virtanent...@inktank.com wrote:
On Thu, Jul 19, 2012 at 5:19 AM, Vladimir Bashkirtsev
vladi...@bashkirtsev.com wrote:
It looks like osd.0 performs with low latency, but osd.1 latency is way too
high and on
With LVM, you can re-scan the scsi bus to extend a physical drive and
then run a pvextend.
@Calvin: I tried your solution:
# partprobe /dev/rbd1
Unfortunately nothing changed.
Did you get it working?
Cheers!
On Thu, Jul 19, 2012 at 5:50 PM, Sébastien Han han.sebast...@gmail.com wrote:
Hum
I haven't tried resizing an rbd yet, but I was changing partitions on
a non-ceph two-node cluster with shared storage yesterday while
certain partitions were in use (partitions 1, 2, and 5 were mounted;
I was deleting partition ids 6+ and adding new ones) and fdisk wasn't
re-reading disk changes. Partprobe
On 07/19/2012 09:44 PM, Sébastien Han wrote:
With LVM, you can re-scan the scsi bus to extend a physical drive and
then run a pvextend.
@Calvin: I tried your solution
# partprobe /dev/rbd1
Did you try blockdev?
# blockdev --rereadpt /dev/rbd1
Regards,
Andreas
Unfortunately nothing
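For reference, blockdev --rereadpt is a thin wrapper around the BLKRRPART ioctl, and the kernel refuses that request with EBUSY while the device (or a partition on it) is still in use, e.g. mounted. A minimal editor's sketch, using the device path from the thread:

    /* Hedged sketch: issue the same request blockdev --rereadpt makes. */
    #include <fcntl.h>
    #include <linux/fs.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/rbd1", O_RDONLY);

            if (fd < 0) {
                    perror("open /dev/rbd1");
                    return 1;
            }
            if (ioctl(fd, BLKRRPART) < 0)
                    perror("BLKRRPART");    /* EBUSY if still in use */
            close(fd);
            return 0;
    }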
This series includes a bunch of relatively small cleanups.
They're grouped a bit below, but they apply together in
this sequence and the later ones may have dependencies on
those earlier in the series.
Summaries:
[PATCH 01/12] rbd: drop extra header_rwsem init
[PATCH 02/12] rbd: simplify
On 07/19/2012 10:11 AM, Alex Elder wrote:
We now use rbd_dev->snap_id to record the snapshot id, using
the special value SNAP_NONE to indicate the rbd_dev is not mapping
a snapshot at all.
That's CEPH_NOSNAP, not SNAP_NONE, right? In any case,
Reviewed-by: Josh Durgin josh.dur...@inktank.com
Commit c01a inadvertently added an extra
initialization of rbd_dev->header_rwsem. This gets rid of the
duplicate.
(Guangliang Zhao also offered up the same fix.)
Reported-by: Guangliang Zhao gz...@suse.com
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |
This just replaces a while loop with list_for_each_entry_safe()
in __rbd_remove_all_snaps().
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index
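For readers less familiar with the macro, the safe-iteration pattern the patch switches to looks roughly like this (an editor's sketch with illustrative type and field names, not the actual rbd code):

    /* Hedged sketch: list_for_each_entry_safe() keeps a "next" cursor,
     * so entries may be unlinked and freed while walking the list. */
    struct snap_entry {                     /* illustrative type */
            struct list_head node;
            /* ... */
    };

    static void remove_all_snaps(struct list_head *snaps)
    {
            struct snap_entry *snap, *next;

            list_for_each_entry_safe(snap, next, snaps, node) {
                    list_del(&snap->node);
                    kfree(snap);
            }
    }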
There was a dout() call in rbd_do_request() that was reporting
the offset as the length and vice versa. While
fixing that I did a quick scan of other dout() calls and fixed
a couple of other minor things.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |7
There are two structures in which a count of snapshots is
maintained:

    struct ceph_snap_context {
            ...
            u32 num_snaps;
            ...
    }

and

    struct ceph_snap_realm {
            ...
            u32 num_prior_parent_snaps; /* had prior to parent_since */
            ...
            u32
The snapc parameter to rbd_req_sync_read() is not used, so
get rid of it.
Reported-by: Josh Durgin josh.dur...@inktank.com
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c |3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/drivers/block/rbd.c
The function rbd_header_from_disk() is only called in one spot, and
it passes GFP_KERNEL as its value for the gfp_flags parameter.
Just drop that parameter and substitute GFP_KERNEL everywhere it had
been used within that function. (If we find we need the parameter
again in the future it's easy
Both rbd_register_snap_dev() and __rbd_remove_snap_dev() have
rbd_dev parameters that are unused. Remove them.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 19 +++
1 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/block/rbd.c
The only place that passes a version pointer to rbd_req_sync_exec()
is in rbd_header_add_snap(), and that spot ignores the result.
The only thing rbd_req_sync_exec() does with its ver parameter is
pass it directly to rbd_req_sync_op(). So we can just use a null
pointer there, and drop the ver
It's not obvious whether the snapshot pointer whose address is
provided to __rbd_add_snap_dev() will be assigned by that function.
Change it to return the snapshot, or a pointer-coded errno in the
event of a failure.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 37
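The pointer-coded errno convention mentioned above is the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom; a hedged editor's sketch with made-up names:

    /* Hedged sketch: return either a valid pointer or a pointer-coded
     * errno; "snap_entry" and "add_snap" are illustrative names only. */
    static struct snap_entry *add_snap(const char *name)
    {
            struct snap_entry *snap;

            snap = kzalloc(sizeof(*snap), GFP_KERNEL);
            if (!snap)
                    return ERR_PTR(-ENOMEM);
            /* ... fill in the entry from "name" ... */
            return snap;
    }

    /* Caller side:
     *      snap = add_snap(name);
     *      if (IS_ERR(snap))
     *              return PTR_ERR(snap);
     */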
Either rbd_create_rw_ops() will succeed, or it will fail because a
memory allocation failed. Have it just return a valid pointer or
null rather than stuffing a pointer into a provided address and
returning an errno.
Signed-off-by: Alex Elder el...@inktank.com
---
drivers/block/rbd.c | 68
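The contrasting convention described here, returning the allocation directly or NULL on failure, might look like this (an editor's sketch with assumed names, not the driver's actual code):

    /* Hedged sketch: callers only need a NULL check, no errno plumbing. */
    static struct ceph_osd_req_op *create_rw_ops(unsigned int num_ops)
    {
            return kcalloc(num_ops, sizeof(struct ceph_osd_req_op),
                           GFP_KERNEL);
    }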
All of the callers of rbd_req_sync_op() except one pass a non-null
ops pointer. The only one that does not is rbd_req_sync_read(),
which passes CEPH_OSD_OP_READ as its opcode and CEPH_OSD_FLAG_READ
for flags.
By allocating the ops array in rbd_req_sync_read() and moving the
special case code
This fixes a few issues in rbd_header_from_disk():
- The memcmp() call at the beginning of the function is really
looking at the text field of struct rbd_image_header_ondisk.
While it does lie at the beginning of the structure, the
comparison should be done against the field,
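Written against the field explicitly, the check might look roughly like this (editor's sketch; it assumes the text field of the on-disk struct and the RBD_HEADER_TEXT constant from the rbd headers):

    /* Hedged sketch: validate the on-disk header by comparing its text
     * field, rather than relying on that field being first in the
     * structure. */
    static bool header_text_valid(const struct rbd_image_header_ondisk *ondisk)
    {
            return memcmp(ondisk->text, RBD_HEADER_TEXT,
                          sizeof(RBD_HEADER_TEXT)) == 0;
    }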
On 07/19/2012 04:02 PM, Josh Durgin wrote:
On 07/19/2012 10:11 AM, Alex Elder wrote:
We now use rbd_dev->snap_id to record the snapshot id, using
the special value SNAP_NONE to indicate the rbd_dev is not mapping
a snapshot at all.
That's CEPH_NOSNAP, not SNAP_NONE, right? In any case,
Yes.
On 07/19/2012 10:09 AM, Alex Elder wrote:
This series of patches changes the way the snap context seq field
is used. Currently it is used in a way that isn't really useful, and
as such is a bit confusing. This behavior seems to be a hold over
from a time when there was no snap_id field
Hi Linus,
Please pull these last minute fixes for Ceph from:
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus
The important one fixes a bug in the socket failure handling behavior that
was turned up in some recent failure injection testing. The other two are
I finally figured out how to make objdump interleave the source code in
the .ko file dumps on our qa machines. The problem is that the debug info
references the path where the kernel was compiled (which is non-obvious
since the info is compressed). For our environment, this is a quick
I am trying to get Ceph running on an ARM system, currently one quad
core node, running Ubuntu 12.04.
It compiles fine, currently without tcmalloc and google perf tools, but
I am running into a problem with mkcephfs. 'mkcephfs -a -c ceph.conf'
didn't work so I did it piece by piece until I got
On 20/07/2012 1:22 AM, Tommi Virtanen wrote:
On Thu, Jul 19, 2012 at 5:19 AM, Vladimir Bashkirtsev
vladi...@bashkirtsev.com wrote:
It looks like osd.0 performs with low latency, but osd.1 latency is way too
high, and on average it appears as 200ms. The osd is backed by btrfs over LVM2.
Maybe the issue
Hi Noah,
Thank you for the fixes and suggestions on compiling java-rados;
it's working fine now.
Thanks,
Ramu.
We are seeing degradation at 64k node/leaf sizes as well. So far the
degradation is most obvious with small writes. It affects XFS as
well, though not as severely. We are vigorously looking into it. :)
Just confirming that one of our clients has run a fair amount (on
a gigabytes scale) of
What node/leaf size are you using on your btrfs volume?
Default 4K.
Yes, they can hold up reads to the same object. Depending on where
they're stuck, they may be blocking other requests as well if they're
e.g. taking up all the filestore threads. Waiting for subops means
they're waiting for replicas to acknowledge the write and commit it to
disk. The real cause