Re: problems creating new ceph cluster when using journal on block device

2012-11-08 Thread Wido den Hollander
On 08-11-12 08:29, Travis Rhoden wrote: Hey folks, I'm trying to set up a brand new Ceph cluster, based on v0.53. My hardware has SSDs for journals, and I'm trying to get mkcephfs to initialize everything for me. However, the command hangs forever and I eventually have to kill it. After
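A minimal sketch of the setup being described, assuming an SSD partition such as /dev/sdb1 holds the journal; the device names, OSD id, and paths are illustrative, not taken from the thread:

    # Append a per-OSD section pointing the journal at the raw SSD partition;
    # with a block-device journal, no "osd journal size" entry is needed.
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [osd.0]
        host = node01
        osd data = /srv/osd.0
        osd journal = /dev/sdb1
    EOF
    mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.keyring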

Re: problems creating new ceph cluster when using journal on block device

2012-11-08 Thread Mark Kirkwood
On 08/11/12 21:08, Wido den Hollander wrote: On 08-11-12 08:29, Travis Rhoden wrote: Hey folks, I'm trying to set up a brand new Ceph cluster, based on v0.53. My hardware has SSDs for journals, and I'm trying to get mkcephfs to initialize everything for me. However, the command hangs forever

Re: less cores more iops / speed

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 01:59, Mark Nelson wrote: There's also the context switching overhead. It'd be interesting to know how much the writer processes were shifting around on cores. What do you mean by that? I'm talking about the KVM guest, not about the ceph nodes. Stefan, what tool were you
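Not from the thread itself, but one way to quantify how much a writer process is bounced between cores, assuming the benchmark inside the guest is fio and pidstat/perf are installed:

    PID=$(pgrep -o fio)                                                # hypothetical benchmark process
    pidstat -w -p "$PID" 1 10                                          # voluntary/involuntary context switches per second
    perf stat -e context-switches,cpu-migrations -p "$PID" sleep 10    # how often it migrates between cores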

Re: syncfs slower than without syncfs

2012-11-08 Thread Stefan Priebe - Profihost AG
done: http://tracker.newdream.net/issues/3461 On 08.11.2012 04:09, Josh Durgin wrote: On 11/07/2012 08:26 AM, Stefan Priebe wrote: On 07.11.2012 16:04, Mark Nelson wrote: Whew, glad you found the problem Stefan! I was starting to wonder what was going on. :) Do you mind filing a bug

Re: less cores more iops / speed

2012-11-08 Thread Alexandre DERUMIER
What do you mean by that? I'm talking about the KVM guest, not about the ceph nodes. Have you tried comparing virtio-blk and virtio-scsi? Have you tried directly from the host with the rbd kernel module? - Original mail - From: Stefan Priebe - Profihost AG

Re: less cores more iops / speed

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 09:58, Alexandre DERUMIER wrote: What do you mean by that? I'm talking about the KVM guest, not about the ceph nodes. Have you tried comparing virtio-blk and virtio-scsi? How to change? Right now I'm using the PVE defaults = scsi-hd. Have you tried directly from the

clock synchronisation

2012-11-08 Thread Stefan Priebe - Profihost AG
Hello list, is there any preferred way to do clock synchronisation? I've tried running openntpd and ntpd on all servers but I'm still getting: 2012-11-08 09:55:38.255928 mon.0 [WRN] message from mon.2 was stamped 0.063136s in the future, clocks not synchronized 2012-11-08 09:55:39.328639 mon.0

Re: less cores more iops / speed

2012-11-08 Thread Alexandre DERUMIER
Have you tried comparing virtio-blk and virtio-scsi? How to change? Right now I'm using the PVE defaults = scsi-hd. (virtio-blk is the classic virtio ;) Have you tried directly from the host with the rbd kernel module? No, I don't know how to use it ;-)
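A hedged sketch of what the two options look like at the QEMU level; the image path, ids, and cache setting are made up, and PVE generates its own command line rather than these:

    # classic virtio block device
    qemu-system-x86_64 ... \
      -drive file=rbd:rbd/vm-disk,if=none,id=drive0,cache=writeback \
      -device virtio-blk-pci,drive=drive0
    # virtio-scsi controller with a scsi-hd attached to it
    qemu-system-x86_64 ... \
      -drive file=rbd:rbd/vm-disk,if=none,id=drive1,cache=writeback \
      -device virtio-scsi-pci,id=scsi0 \
      -device scsi-hd,bus=scsi0.0,drive=drive1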

Re: less cores more iops / speed

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 10:05, Alexandre DERUMIER wrote: Have you tried comparing virtio-blk and virtio-scsi? How to change? Right now I'm using the PVE defaults = scsi-hd. (virtio-blk is the classic virtio ;) Have you tried directly from the host with the rbd kernel module? No, I don't know how
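Since the question of how to use the kernel client came up, a rough sketch; the pool/image names and the fio parameters are assumptions, not from the thread:

    modprobe rbd
    rbd map test --pool rbd            # image appears as /dev/rbd0 (or under /dev/rbd/)
    fio --name=randwrite --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --iodepth=32 --ioengine=libaio --direct=1 --runtime=60
    rbd unmap /dev/rbd0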

Re: problem with hanging cluster

2012-11-08 Thread Adam Ochmański
On 08.11.2012 12:14, Adam Ochmański wrote: Hi, our test cluster gets stuck every time one of our OSD hosts goes down; even after the missing OSD comes back up and recovery reaches 100%, the cluster still does not work properly. I forgot to add the version of ceph I use: 0.53-422-g2d20f3a -- Best, blink --
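Not part of the report, but the usual first checks when a cluster stays degraded after an OSD host returns:

    ceph -s              # overall health and PG states
    ceph health detail   # which PGs are stuck, and on which OSDs
    ceph osd tree        # confirm the returned OSDs are really marked up/in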

Re: clock synchronisation

2012-11-08 Thread Wido den Hollander
On 08-11-12 10:04, Stefan Priebe - Profihost AG wrote: Hello list, is there any preferred way to do clock synchronisation? I've tried running openntpd and ntpd on all servers but I'm still getting: 2012-11-08 09:55:38.255928 mon.0 [WRN] message from mon.2 was stamped 0.063136s in the future,

Re: clock synchronisation

2012-11-08 Thread Andrey Korolyov
On Thu, Nov 8, 2012 at 4:00 PM, Wido den Hollander w...@widodh.nl wrote: On 08-11-12 10:04, Stefan Priebe - Profihost AG wrote: Hello list, is there any preferred way to do clock synchronisation? I've tried running openntpd and ntpd on all servers but I'm still getting: 2012-11-08

Re: clock synchronisation

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 13:00, Wido den Hollander wrote: On 08-11-12 10:04, Stefan Priebe - Profihost AG wrote: Hello list, is there any preferred way to do clock synchronisation? I've tried running openntpd and ntpd on all servers but I'm still getting: 2012-11-08 09:55:38.255928 mon.0 [WRN]
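One possible ntpd layout, not necessarily what was recommended in the thread: point every node at the same upstream servers and let the monitor hosts peer with one another so they at least agree among themselves (hostnames are placeholders):

    cat > /etc/ntp.conf <<'EOF'
    server 0.pool.ntp.org iburst
    server 1.pool.ntp.org iburst
    peer mon1.example.com
    peer mon2.example.com
    driftfile /var/lib/ntp/ntp.drift
    EOF
    service ntp restart
    ntpq -p    # check offsets; the mon warning fires when skew exceeds "mon clock drift allowed" (0.05s by default)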

[PATCH 1/2] mds: Don't expire log segment before it's fully flushed

2012-11-08 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com Expiring log segment before it's fully flushed may cause various issues during log replay. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/leveldb | 2 +- src/mds/MDLog.cc | 8 +--- 2 files changed, 6 insertions(+), 4 deletions(-) diff

Re: less cores more iops / speed

2012-11-08 Thread Mark Nelson
On 11/08/2012 02:45 AM, Stefan Priebe - Profihost AG wrote: On 08.11.2012 01:59, Mark Nelson wrote: There's also the context switching overhead. It'd be interesting to know how much the writer processes were shifting around on cores. What do you mean by that? I'm talking about the KVM guest

Re: SSD journal suggestion

2012-11-08 Thread Atchley, Scott
On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2012/11/8 Mark Nelson mark.nel...@inktank.com: I haven't done much with IPoIB (just RDMA), but my understanding is that it tends to top out at like 15Gb/s. Some others on this mailing list can probably
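Not from the thread: a quick way to see what a given IPoIB link actually delivers, assuming the IPoIB interface is ib0 and the peer address is 192.168.10.2 (both hypothetical):

    cat /sys/class/net/ib0/mode          # datagram vs connected mode
    ip link set ib0 mtu 65520            # connected mode allows a large MTU
    iperf -s                             # on the receiving host
    iperf -c 192.168.10.2 -P 4 -t 30     # on the sender, four parallel streams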

[PATCH 0/3] rbd: a few picky changes

2012-11-08 Thread Alex Elder
These three changes are pretty trivial. -Alex [PATCH 1/3] rbd: standardize rbd_request variable names [PATCH 2/3] rbd: standardize ceph_osd_request variable names [PATCH 3/3] rbd: be picky about osd request status type

[PATCH 1/3] rbd: standardize rbd_request variable names

2012-11-08 Thread Alex Elder
There are two names used for items of rbd_request structure type: req and req_data. The former name is also used to represent items of pointers to struct ceph_osd_request. Change all variables that have these names so they are instead called rbd_req consistently. Signed-off-by: Alex Elder

[PATCH 2/3] rbd: standardize ceph_osd_request variable names

2012-11-08 Thread Alex Elder
There are spots where a ceph_osd_request pointer variable is given the name req. Since we're dealing with (at least) three types of requests (block layer, rbd, and osd), I find this slightly distracting. Change such instances to use osd_req consistently to make the abstraction represented a

[PATCH 3/3] rbd: be picky about osd request status type

2012-11-08 Thread Alex Elder
The result field in a ceph osd reply header is a signed 32-bit type, but rbd code often casually uses int to represent it. The following changes the types of variables that handle this result value to be s32 instead of int to be completely explicit about it. Only at the point we pass that result

[PATCH 0/2] rbd: clean up rbd_rq_fn()

2012-11-08 Thread Alex Elder
Some refactoring to improve readability. -Alex [PATCH 1/2] rbd: encapsulate handling for a single request [PATCH 2/2] rbd: a little more cleanup of rbd_rq_fn()

[PATCH 2/2] rbd: a little more cleanup of rbd_rq_fn()

2012-11-08 Thread Alex Elder
Now that a big hunk in the middle of rbd_rq_fn() has been moved into its own routine we can simplify it a little more. Signed-off-by: Alex Elder el...@inktank.com --- drivers/block/rbd.c | 50 +++--- 1 file changed, 23 insertions(+), 27 deletions(-)

[PATCH] rbd: end request on error in rbd_do_request() caller

2012-11-08 Thread Alex Elder
Only one of the three callers of rbd_do_request() provide a collection structure to aggregate status. If an error occurs in rbd_do_request(), have the caller take care of calling rbd_coll_end_req() if necessary in that one spot. Signed-off-by: Alex Elder el...@inktank.com ---

Re: SSD journal suggestion

2012-11-08 Thread Mark Nelson
On 11/08/2012 07:55 AM, Atchley, Scott wrote: On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2012/11/8 Mark Nelson mark.nel...@inktank.com: I haven't done much with IPoIB (just RDMA), but my understanding is that it tends to top out at like 15Gb/s.

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Stefan Priebe - Profihost AG
Is there any way to find out why a ceph-osd process takes around 10 times more load on rand 4k writes than on 4k reads? Stefan On 07.11.2012 21:41, Stefan Priebe wrote: Hello list, while benchmarking I was wondering why the ceph-osd load is so extremely high during random 4k write

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Sage Weil
On Thu, 8 Nov 2012, Stefan Priebe - Profihost AG wrote: Is there any way to find out why a ceph-osd process takes around 10 times more load on rand 4k writes than on 4k reads? Something like perf or oprofile is probably your best bet. perf can be tedious to deploy, depending on where your
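A minimal perf session against one OSD along those lines; the 60-second window and output file name are arbitrary:

    PID=$(pidof ceph-osd | awk '{print $1}')
    perf record -g -p "$PID" -o osd.perf.data -- sleep 60   # sample while the 4k write test runs
    perf report -i osd.perf.data --sort symbol | head -40   # top symbols by CPU time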

Re: problems creating new ceph cluster when using journal on block device

2012-11-08 Thread Travis Rhoden
[osd] osd journal size = 4000 Not sure if this is the problem, but when using a block device you don't have to specify the size for the journal. So happy to know that, Wido! I had hoped there was a way to skip that. Tried without it -- only difference in the logs was seeing that

Re: SSD journal suggestion

2012-11-08 Thread Atchley, Scott
On Nov 8, 2012, at 10:00 AM, Scott Atchley atchle...@ornl.gov wrote: On Nov 8, 2012, at 9:39 AM, Mark Nelson mark.nel...@inktank.com wrote: On 11/08/2012 07:55 AM, Atchley, Scott wrote: On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2012/11/8

Re: less cores more iops / speed

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 14:19, Mark Nelson wrote: On 11/08/2012 02:45 AM, Stefan Priebe - Profihost AG wrote: On 08.11.2012 01:59, Mark Nelson wrote: There's also the context switching overhead. It'd be interesting to know how much the writer processes were shifting around on cores. What do you

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Stefan Priebe - Profihost AG
On 08.11.2012 16:01, Mark Nelson wrote: Hi Stefan, You might want to try running sysprof or perf while the OSDs are running during the tests and see where CPU time is being spent. Also, how are you determining how much CPU is being used? Hi Mark, have a 300MB perf.data file and no

Re: less cores more iops / speed

2012-11-08 Thread Alexandre DERUMIER
So it is a problem of KVM which lets the processes jump between cores a lot. Maybe numad from Red Hat can help? http://fedoraproject.org/wiki/Features/numad It tries to keep processes on the same NUMA node and I think it also does some dynamic pinning. - Original mail - From: Stefan
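Two hedged options for keeping a guest's threads in place, numad as suggested above or plain manual pinning; the package manager, process name, and core list are assumptions:

    yum install numad && service numad start                         # automatic NUMA placement of running VMs
    taskset -cp 0-3 $(pidof qemu-system-x86_64 | awk '{print $1}')   # or pin one guest's process by hand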

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Mark Nelson
On 11/08/2012 09:45 AM, Stefan Priebe - Profihost AG wrote: On 08.11.2012 16:01, Sage Weil wrote: On Thu, 8 Nov 2012, Stefan Priebe - Profihost AG wrote: Is there any way to find out why a ceph-osd process takes around 10 times more load on rand 4k writes than on 4k reads? Something like

Re: SSD journal suggestion

2012-11-08 Thread Andrey Korolyov
On Thu, Nov 8, 2012 at 7:02 PM, Atchley, Scott atchle...@ornl.gov wrote: On Nov 8, 2012, at 10:00 AM, Scott Atchley atchle...@ornl.gov wrote: On Nov 8, 2012, at 9:39 AM, Mark Nelson mark.nel...@inktank.com wrote: On 11/08/2012 07:55 AM, Atchley, Scott wrote: On Nov 8, 2012, at 3:22 AM,

Re: some snapshot problems

2012-11-08 Thread Sage Weil
Hi Liu, Sorry for the late reply; I have had a very busy week. :) On Thu, 1 Nov 2012, liu yaqi wrote: Dear Mr. Weil, I am a student at the Institute of Computing Technology, Chinese Academy of Sciences, and I am studying the implementation of snapshots in the Ceph system. There are some things that

Review request for branch wip-java-tests

2012-11-08 Thread Joe Buck
I have a branch for review that reworks the tests for the java bindings and builds them if both --enable-cephfs-java and --with-debug are specified. The tests can also be built and run via ant. Branch name is wip-java-tests. Regards, -Joe Buck
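A sketch of building and running the bindings' tests as described; the assumption that the ant build lives under src/java is mine, not stated in the message:

    ./autogen.sh
    ./configure --enable-cephfs-java --with-debug
    make
    cd src/java && ant test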

Re: Review request for branch wip-java-tests

2012-11-08 Thread Sage Weil
Merged, thanks! sage On Thu, 8 Nov 2012, Joe Buck wrote: I have a branch for review that reworks the tests for the java bindings and builds them if both --enable-cephfs-java and --with-debug are specified. The tests can also be built and run via ant. Branch name is wip-java-tests.

Re: problems creating new ceph cluster when using journal on block device

2012-11-08 Thread Travis Rhoden
Solved! I stumbled into the solution while switching from a block device to a file. I was being bitten by running mkcephfs multiple times -- it wasn't really failing on the journal, it was failing because the OSD data disk had been initialized before. I couldn't see that until I used a file for the
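Illustrative only, with made-up device names: wiping the previously initialized OSD data disk before re-running mkcephfs avoids tripping over the earlier run's leftovers.

    umount /dev/sdc1 2>/dev/null
    dd if=/dev/zero of=/dev/sdc1 bs=1M count=100   # clobber the old superblock/labels
    mkfs.xfs -f /dev/sdc1                          # fresh filesystem for the OSD data dir
    mkcephfs -a -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.keyring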

Re: problems creating new ceph cluster when using journal on block device

2012-11-08 Thread Mark Nelson
On 11/08/2012 11:36 AM, Travis Rhoden wrote: Solved! I stumbled into the solution while switching from a block device to a file. I was being bitten by running mkcephfs multiple times -- it wasn't really failing on the journal, it was failing because the OSD data disk had been initialized before. I

Re: SSD journal suggestion

2012-11-08 Thread Atchley, Scott
On Nov 8, 2012, at 11:19 AM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Nov 8, 2012 at 7:02 PM, Atchley, Scott atchle...@ornl.gov wrote: On Nov 8, 2012, at 10:00 AM, Scott Atchley atchle...@ornl.gov wrote: On Nov 8, 2012, at 9:39 AM, Mark Nelson mark.nel...@inktank.com wrote: On

Re: Ignoresync hack no longer applies on 3.6.5

2012-11-08 Thread Nick Bartos
Sorry about that, I think it got chopped. Here's a full trace from another run, using kernel 3.6.6 which definitely has the patch applied: https://gist.github.com/4041120 There are no instances of sync_fs_one_sb skipping in the logs. On Mon, Nov 5, 2012 at 1:29 AM, Sage Weil s...@inktank.com

Re: SSD journal suggestion

2012-11-08 Thread Joseph Glanville
On 9 November 2012 02:00, Atchley, Scott atchle...@ornl.gov wrote: On Nov 8, 2012, at 9:39 AM, Mark Nelson mark.nel...@inktank.com wrote: On 11/08/2012 07:55 AM, Atchley, Scott wrote: On Nov 8, 2012, at 3:22 AM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2012/11/8 Mark

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Stefan Priebe
On 08.11.2012 17:06, Mark Nelson wrote: On 11/08/2012 09:45 AM, Stefan Priebe - Profihost AG wrote: On 08.11.2012 16:01, Sage Weil wrote: On Thu, 8 Nov 2012, Stefan Priebe - Profihost AG wrote: Is there any way to find out why a ceph-osd process takes around 10 times more load on rand 4k

Re: unexpected problem with radosgw fcgi

2012-11-08 Thread Yehuda Sadeh
On Wed, Nov 7, 2012 at 6:16 AM, Sławomir Skowron szi...@gmail.com wrote: I have realized that requests from fastcgi in nginx from radosgw return: HTTP/1.1 200, not HTTP/1.1 200 OK. Any other cgi that I run, for example php via fastcgi, returns this as the RFC says, with OK. Is someone
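Not from the thread, just a quick way to inspect the raw status line the gateway emits (the hostname is hypothetical):

    curl -s -I http://radosgw.example.com/ | head -1    # e.g. "HTTP/1.1 200" vs "HTTP/1.1 200 OK"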

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Josh Durgin
On 11/08/2012 01:27 PM, Stefan Priebe wrote: On 08.11.2012 17:06, Mark Nelson wrote: On 11/08/2012 09:45 AM, Stefan Priebe - Profihost AG wrote: On 08.11.2012 16:01, Sage Weil wrote: On Thu, 8 Nov 2012, Stefan Priebe - Profihost AG wrote: Is there any way to find out why a ceph-osd

Re: less cores more iops / speed

2012-11-08 Thread Andrey Korolyov
On Thu, Nov 8, 2012 at 7:53 PM, Alexandre DERUMIER aderum...@odiso.com wrote: So it is a problem of KVM which lets the processes jump between cores a lot. Maybe numad from Red Hat can help? http://fedoraproject.org/wiki/Features/numad It tries to keep processes on the same NUMA node and I think

Re: SSD journal suggestion / rsockets

2012-11-08 Thread Joseph Glanville
On 9 November 2012 08:21, Dieter Kasper d.kas...@kabelmail.de wrote: Joseph, I've downloaded and read the presentation from 'Sean Hefty / Intel Corporation' about rsockets, which sounds very promising to me. Can you please teach me how to get access to the rsockets source ? Thanks,

Re: unexpected problem with radosgw fcgi

2012-11-08 Thread Sławomir Skowron
Ok, I will dig into nginx, thanks. On 8 Nov 2012, at 22:48, Yehuda Sadeh yeh...@inktank.com wrote: On Wed, Nov 7, 2012 at 6:16 AM, Sławomir Skowron szi...@gmail.com wrote: I have realized that requests from fastcgi in nginx from radosgw return: HTTP/1.1 200, not HTTP/1.1 200 OK

Re: extreme ceph-osd cpu load for rand. 4k write

2012-11-08 Thread Stefan Priebe
On 08.11.2012 22:58, Mark Nelson wrote: Also, I'm not sure what version you are running, but you may want to try testing master and see if that helps. Sam has done some work on our threading and locking code that might help. This is git master (two hours old). Stefan

rbd map command hangs for 15 minutes during system start up

2012-11-08 Thread Mandell Degerness
We are seeing a somewhat random, but frequent hang on our systems during startup. The hang happens at the point where an rbd map rbdvol command is run. I've attached the ceph logs from the cluster. The map command happens at Nov 8 18:41:09 on server 172.18.0.15. The process which hung can be
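A hedged way to reproduce the hang by hand and capture kernel-client state while it sits there; only the image name comes from the report, the rest is generic:

    rbd map rbdvol &
    sleep 30
    dmesg | tail -50                      # libceph/rbd messages, mon/osd session state
    cat /sys/kernel/debug/ceph/*/osdc     # requests the kernel client is still waiting on
    echo w > /proc/sysrq-trigger          # dump blocked tasks to the kernel log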

Review request for branch wip-java-test

2012-11-08 Thread Joe Buck
I have a 3 line change to the file qa/workunits/libcephfs-java/test.sh that tweaks how LD_LIBRARY_PATH is set for the test execution. The branch is wip-java-test in ceph.git. Best, -Joe Buck

Re: trying to import crushmap results in max_devices osdmap max_osd

2012-11-08 Thread Josh Durgin
On 11/07/2012 07:28 AM, Stefan Priebe - Profihost AG wrote: Hello, I've added two nodes with 4 devices each and modified the crushmap. But importing the new map results in: crushmap max_devices 55 osdmap max_osd 35 What's wrong? I think this is an obsolete check since
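The usual decompile/edit/recompile cycle being discussed (file names are arbitrary):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt   # edit devices/buckets/rules in the text form
    crushtool -c crush.txt -o crush.new
    ceph osd setcrushmap -i crush.new     # the step that runs into the max_devices check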

Re: SSD journal suggestion / rsockets

2012-11-08 Thread Dieter Kasper
Joseph, I've downloaded and read the presentation from 'Sean Hefty / Intel Corporation' about rsockets, which sounds very promising to me. Can you please teach me how to get access to the rsockets source ? Thanks, -Dieter On Thu, Nov 08, 2012 at 09:12:45PM +0100, Joseph Glanville wrote: On 9

Re: bobtail timing

2012-11-08 Thread Yehuda Sadeh
On Wed, Oct 31, 2012 at 1:46 PM, Sage Weil s...@inktank.com wrote: I would like to freeze v0.55, the bobtail stable release, at the end of next week. If there is any functionality you are working on that should be included, we need to get it into master (preferably well before that). There

Re: rbd map command hangs for 15 minutes during system start up

2012-11-08 Thread Josh Durgin
On 11/08/2012 02:10 PM, Mandell Degerness wrote: We are seeing a somewhat random, but frequent hang on our systems during startup. The hang happens at the point where an rbd map rbdvol command is run. I've attached the ceph logs from the cluster. The map command happens at Nov 8 18:41:09 on

Re: bobtail timing

2012-11-08 Thread Samuel Just
I've got wip_recovery_qos and wip_persist_missing that should go into bobtail. wip_recovery_qos passed regression (mostly, failures due to fsx, a bug fixed in master, and timeouts waiting for machines), and is waiting on review. wip_persist_missing has a teuthology test I'll push tomorrow

[PATCH] vstart: allow minimum pool size of one

2012-11-08 Thread Noah Watkins
I needed this patch after some simple 1 OSD vstart environments refused to allow clients to connect. -- A minimum pool size of 2 was introduced by 13486857cf. This sets the minimum to one so that basic vstart environments work. Signed-off-by: Noah Watkins noahwatk...@gmail.com diff
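A sketch of the single-OSD dev setup in question, using standard vstart variables and flags; the pool tweaks are the sort of workaround this patch removes, and they assume the min_size pool option exists in the build:

    CEPH_NUM_MON=1 CEPH_NUM_OSD=1 CEPH_NUM_MDS=1 ./vstart.sh -n -l -d
    ./ceph -c ceph.conf osd pool set data size 1
    ./ceph -c ceph.conf osd pool set data min_size 1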