Re: OSD crash on 0.48.2argonaut

2012-11-15 Thread Josh Durgin
On 11/14/2012 11:31 PM, eric_yh_c...@wiwynn.com wrote: Dear all: I hit this issue on one of our OSD nodes. Is this a known issue? Thanks! ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe) 1: /usr/bin/ceph-osd() [0x6edaba] 2: (()+0xfcb0) [0x7f08b112dcb0] 3:

Re: improve speed with auth supported=none

2012-11-15 Thread Stefan Priebe - Profihost AG
On 14.11.2012 14:24, Soporte wrote: On 13/11/2012 04:52 a.m., Stefan Priebe wrote: On 13.11.2012 08:42, Josh Durgin wrote: On 11/12/2012 01:57 PM, Stefan Priebe wrote: Thanks, this gives another boost in IOPS. I'm now at 23,000 IOPS ;-) So for random 4k IOPS, ceph auth and especially
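For reference, here is a minimal ceph.conf sketch of the setting named in this thread's subject. It assumes the line goes in the [global] section of the ceph.conf used by every daemon and client. It turns off cephx authentication entirely, so it only makes sense on a fully trusted network.

```ini
[global]
        # Disable cephx authentication (the setting named in the thread subject).
        # Assumption: the same value is applied on all daemons and clients.
        auth supported = none
```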

Re: endless flying slow requests

2012-11-15 Thread Stefan Priebe - Profihost AG
On 14.11.2012 15:59, Sage Weil wrote: Hi Stefan, It would be nice to confirm that no clients are waiting on replies for these requests; currently we suspect that the OSD request tracking is the buggy part. If you query the OSD admin socket you should be able to dump requests and see the

Re: endless flying slow requests

2012-11-15 Thread Josh Durgin
On 11/15/2012 12:09 AM, Stefan Priebe - Profihost AG wrote: On 14.11.2012 15:59, Sage Weil wrote: Hi Stefan, It would be nice to confirm that no clients are waiting on replies for these requests; currently we suspect that the OSD request tracking is the buggy part. If you query the OSD admin

Re: OSD network failure

2012-11-15 Thread Josh Durgin
On 11/13/2012 06:15 AM, Gandalf Corvotempesta wrote: Hi, what happens in case of an OSD network failure? Is Ceph smart enough to isolate OSDs that are not in sync? Should I use LACP on the OSD network, or is a single 10GbE link per server OK? LACP would need stackable switches and a much larger hardware investment.

Re: problem with ceph and btrfs patch: set journal_info in async trans commit worker

2012-11-15 Thread Stefan Priebe - Profihost AG
Hi Miao, On 15.11.2012 06:18, Miao Xie wrote: Hi Stefan, On Wed, 14 Nov 2012 14:42:07 +0100, Stefan Priebe - Profihost AG wrote: Hello list, I wanted to try out Ceph with the latest vanilla kernel 3.7-rc5. I was seeing a massive performance degradation. I see around 22x btrfs-endio-write

Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible

2012-11-15 Thread Danny Al-Gaaf
Hi Sage, On 15.11.2012 01:12, Sage Weil wrote: Hi Danny, Have you had a chance to work on this? I'd like to include this in bobtail. If you don't have time we can go ahead and implement it, but I'd like to avoid duplicating effort. I'm already working on it. Do you have a deadline for

Re: ceph-osd cpu usage

2012-11-15 Thread Alexandre DERUMIER
Is CPU usage the same for read and write? ----- Original Message ----- From: Stefan Priebe - Profihost AG s.pri...@profihost.ag To: ceph-devel@vger.kernel.org Sent: Thursday, 15 November 2012 11:56:37 Subject: ceph-osd cpu usage Hello list, my main problem right now is that Ceph does not scale for

Re: endless flying slow requests

2012-11-15 Thread Sage Weil
On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote: On 14.11.2012 15:59, Sage Weil wrote: Hi Stefan, It would be nice to confirm that no clients are waiting on replies for these requests; currently we suspect that the OSD request tracking is the buggy part. If you query the OSD

Re: [PATCH] make mkcephfs and init-ceph osd filesystem handling more flexible

2012-11-15 Thread Sage Weil
On Thu, 15 Nov 2012, Danny Al-Gaaf wrote: Hi Sage, On 15.11.2012 01:12, Sage Weil wrote: Hi Danny, Have you had a chance to work on this? I'd like to include this in bobtail. If you don't have time we can go ahead and implement it, but I'd like to avoid duplicating effort. I

Re: ceph-osd cpu usage

2012-11-15 Thread Mark Nelson
Out of curiosity, does it help much if you disable crc32c calculations? Use the nocrc option in your ceph.conf file. I've had my eye on crcutil as an alternative to how we do crc32c now. http://code.google.com/p/crcutil/ Mark On 11/15/2012 06:19 AM, Stefan Priebe - Profihost AG wrote: On
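Stefan later asks how and where to set this option. Here is a minimal ceph.conf sketch. It assumes the option Mark means is the messenger setting spelled "ms nocrc" and that it belongs in the [global] section. Both the name and the placement are assumptions, so check them against the configuration reference for your release.

```ini
[global]
        # Assumption: messenger option that skips crc32c checksums on messages.
        # This trades data-integrity checking on the wire for lower CPU usage.
        ms nocrc = true
```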

Re: ceph-osd cpu usage

2012-11-15 Thread Sage Weil
On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote: Hello list, my main problem right now is that Ceph does not scale for me (more VMs using RBD). It does not scale because ceph-osd is using all my CPU cores all the time (8 cores) with just 4 SSDs. The SSDs are far from being fully loaded.

Re: ceph-osd cpu usage

2012-11-15 Thread Stefan Priebe - Profihost AG
On 15.11.2012 16:14, Sage Weil wrote: On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote: Hello list, my main problem right now is that Ceph does not scale for me (more VMs using RBD). It does not scale because ceph-osd is using all my CPU cores all the time (8 cores) with just 4 SSDs.

Re: ceph-osd cpu usage

2012-11-15 Thread Stefan Priebe
On 15.11.2012 16:12, Mark Nelson wrote: Out of curiosity, does it help much if you disable crc32c calculations? Use the nocrc option in your ceph.conf file. I've had my eye on crcutil as an alternative to how we do crc32c now. http://code.google.com/p/crcutil/ Will try that how and where

new process: cherry-picking to stable releases

2012-11-15 Thread Sage Weil
If you are *ever* cherry-picking something to an older stable branch, please use git cherry-pick -x sha1. That will append a '(cherry-picked from )' message to the bottom of the commit, allowing us to always find the original commit that we are duplicating. This implies that we are
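With -x, git appends a line of the form "(cherry picked from commit <full sha1>)" to the new commit's message. Below is a minimal Python sketch of how that trailer lets a tool recover the original commit. The helper name original_commits is hypothetical and is not part of git or Ceph.

```python
import re

# git cherry-pick -x appends exactly this line, with the full 40-hex sha1.
CHERRY_PICK_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{40})\)$", re.MULTILINE
)


def original_commits(message: str) -> list:
    """Return the sha1s that a commit message says it was cherry-picked from.

    Hypothetical helper: a commit that was cherry-picked more than once
    can carry several such lines, so all matches are returned in order.
    """
    return CHERRY_PICK_RE.findall(message)


if __name__ == "__main__":
    fake_sha = "deadbeef" * 5  # made-up 40-character sha1 for the demo
    msg = "some fix\n\n(cherry picked from commit " + fake_sha + ")\n"
    print(original_commits(msg))
```

A backport audit could run this over each commit message on a stable branch and check that every referenced sha1 exists on master.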

Re: new process: cherry-picking to stable releases

2012-11-15 Thread Alex Elder
On 11/15/2012 11:30 AM, Sage Weil wrote: If you are *ever* cherry-picking something to an older stable branch, please use git cherry-pick -x sha1. That will append a '(cherry-picked from )' message to the bottom of the commit, allowing us to always find the original commit that we

master = next

2012-11-15 Thread Stefan Priebe
Hello list, maybe I do not understand the difference between master and next, but is it correct that the following commits are in next but NOT in master? b40387d msg/Pipe: fix leak of Authorizer 0fb23cf Merge remote-tracking branch 'gh/wip-3477' into next 12c2b7f msg/DispatchQueue: release

Re: master = next

2012-11-15 Thread Yehuda Sadeh
On Thu, Nov 15, 2012 at 11:17 AM, Stefan Priebe s.pri...@profihost.ag wrote: Hello list, maybe I do not understand the difference between master and next, but is it correct that the following commits are in next but NOT in master? b40387d msg/Pipe: fix leak of Authorizer 0fb23cf Merge

Re: master = next

2012-11-15 Thread Sage Weil
On Thu, 15 Nov 2012, Stefan Priebe wrote: Hello list, maybe I do not understand the difference between master and next, but is it correct that the following commits are in next but NOT in master? b40387d msg/Pipe: fix leak of Authorizer 0fb23cf Merge remote-tracking branch 'gh/wip-3477'
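In git terms, "in next but not in master" is the range master..next, and git log master..next lists exactly those commits. The Python sketch below shows what that range means: every commit reachable from next, minus every commit reachable from master. The commit names and the parents map are a made-up toy history, not the real Ceph branches.

```python
def reachable(parents, tip):
    """All commits reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def only_in(parents, exclude, include):
    """Commits reachable from include but not from exclude (git's exclude..include)."""
    return reachable(parents, include) - reachable(parents, exclude)


# Hypothetical history: both branches fork from "base"; next has two
# commits that have not been merged back into master yet.
parents = {
    "base": [],
    "m1": ["base"],  # master tip
    "n1": ["base"],
    "n2": ["n1"],    # next tip
}

if __name__ == "__main__":
    print(sorted(only_in(parents, "m1", "n2")))  # ['n1', 'n2']
```

Such commits are expected whenever fixes land on next first and have not yet been merged into master.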

Re: ceph-osd cpu usage

2012-11-15 Thread Stefan Priebe
Hi Mark, On 15.11.2012 16:12, Mark Nelson wrote: Out of curiosity, does it help much if you disable crc32c calculations? Use the nocrc option in your ceph.conf file. I've had my eye on crcutil as an alternative to how we do crc32c now. http://code.google.com/p/crcutil/ Mark This changes

Re: ceph-osd cpu usage

2012-11-15 Thread Stefan Priebe
On 15.11.2012 16:14, Sage Weil wrote: On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote: Hmm, most significant time seems to be in the allocator and doing fsetxattr(2) (10%!). Also some path traversal stuff. Yes, fsetxattr seems to be CPU-hungry. Can you try the wip-fd-simple-cache
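The branch name suggests caching open file descriptors, so that repeated operations on the same object file can skip the path traversal and open(2). The Python sketch below shows that idea as an LRU-bounded cache of fds keyed by path. It is hypothetical: it is not the code in wip-fd-simple-cache, and the class name and default capacity are assumptions.

```python
import os
from collections import OrderedDict


class SimpleFDCache:
    """Hypothetical LRU cache mapping a file path to an already-open fd."""

    def __init__(self, capacity=128):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._fds = OrderedDict()  # path -> fd, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, path):
        fd = self._fds.get(path)
        if fd is not None:
            # Hit: no path lookup or open(2); mark as most recently used.
            self._fds.move_to_end(path)
            self.hits += 1
            return fd
        # Miss: pay for the lookup and open once, then keep the fd around.
        self.misses += 1
        fd = os.open(path, os.O_RDWR)
        self._fds[path] = fd
        if len(self._fds) > self.capacity:
            # Evict and close the least recently used fd (never the new one).
            _, old_fd = self._fds.popitem(last=False)
            os.close(old_fd)
        return fd

    def invalidate(self, path):
        """Drop a cached fd, e.g. before the file is renamed or removed."""
        fd = self._fds.pop(path, None)
        if fd is not None:
            os.close(fd)

    def close_all(self):
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
```

On Linux, a caller could pass the cached fd to fd-based calls such as os.setxattr(fd, name, value), so the path is resolved only on a cache miss. The invalidate hook matters because a cached fd keeps pointing at the old inode after a rename or unlink.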

Re: poor performance

2012-11-15 Thread Gregory Farnum
On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin nrg3...@gmail.com wrote: What are the possible solutions? Update CentOS to 6.3? From what I've heard the RHEL libc doesn't support the syncfs syscall (even though the kernel does have it). :( So you'd need to make sure the kernel supports it and

ceph-osd crashing (os/FileStore.cc: 4500: FAILED assert(replaying))

2012-11-15 Thread Stefan Priebe
Hello list, current master including upstream/wip-fd-simple-cache results in this crash when I try to start some of my OSDs (others work fine), today on multiple nodes: -2 2012-11-15 22:04:09.226945 7f3af1c7a780 0 osd.52 pg_epoch: 657 pg[3.3b( v 632'823 (632'823,632'823] n=5 ec=17 les/c

Re: mon can't start

2012-11-15 Thread Gregory Farnum
Sorry we missed this — everybody's been very busy! If you're still having trouble, can you install the ceph debug symbol packages and get this again? The backtrace isn't very helpful without that, unfortunately. -Greg On Wed, Oct 24, 2012 at 7:21 PM, jie sun 0maid...@gmail.com wrote: Hi, My

RFC: incompatible change to rbd tool behavior on copy

2012-11-15 Thread Dan Mick
A user has noticed some surprising behavior with the rbd command-line tool: with rbd copy, if the destination pool is not set (either with --dest-pool or by specifying destpool/image), then it is assumed to be the source pool name. This seems to me only marginally convenient, and much more
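The Python sketch below models the two defaulting rules under discussion for rbd copy. Under the current rule, an unqualified destination inherits the source pool. Under the proposed rule, it falls back to the default pool, rbd, as other rbd commands do. The helper is hypothetical and is not the rbd tool's code. Letting an explicit "pool/" prefix win over --dest-pool is also an assumption.

```python
from typing import Optional, Tuple

DEFAULT_POOL = "rbd"


def split_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'pool/image' into (pool, image); a bare 'image' has no pool."""
    pool, sep, image = spec.partition("/")
    return (pool, image) if sep else (None, spec)


def resolve_copy_dest(src: str, dest: str, dest_pool: Optional[str] = None,
                      inherit_source_pool: bool = True) -> Tuple[str, str]:
    """Return (pool, image) for the destination of a hypothetical 'rbd copy src dest'.

    inherit_source_pool=True models the current behavior; False models the
    proposed one. Assumption: an explicit 'pool/' prefix wins over --dest-pool.
    """
    src_pool, _ = split_spec(src)
    pool, image = split_spec(dest)
    if pool is None:
        pool = dest_pool
    if pool is None:
        if inherit_source_pool:
            pool = src_pool or DEFAULT_POOL  # current: follow the source pool
        else:
            pool = DEFAULT_POOL              # proposed: same default as elsewhere
    return pool, image


if __name__ == "__main__":
    print(resolve_copy_dest("images/vm1", "vm1-copy"))  # ('images', 'vm1-copy')
    print(resolve_copy_dest("images/vm1", "vm1-copy", inherit_source_pool=False))  # ('rbd', 'vm1-copy')
```

Under either rule, a fully qualified destination such as backup/vm1-copy, or an explicit --dest-pool, behaves the same. Only the unqualified case changes, which is what makes the proposal an incompatible change.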

Re: Authorization issues in the 0.54

2012-11-15 Thread Andrey Korolyov
On Thu, Nov 15, 2012 at 5:03 PM, Andrey Korolyov and...@xdel.ru wrote: On Thu, Nov 15, 2012 at 5:12 AM, Yehuda Sadeh yeh...@inktank.com wrote: On Wed, Nov 14, 2012 at 4:20 AM, Andrey Korolyov and...@xdel.ru wrote: Hi, In 0.54 cephx is probably broken somehow: $ ceph auth add

Re: changed rbd cp behavior in 0.53

2012-11-15 Thread Dan Mick
It's a bit different with rbd, as there's no current dir, but I do tend to agree that like every other place pool defaults, which means 'rbd' literally is more correct. See my RFC from today: RFC: incompatible change to rbd tool behavior on copy On 11/15/2012 08:43 AM, Deb Barba wrote:

Re: changed rbd cp behavior in 0.53

2012-11-15 Thread Andrey Korolyov
On Thu, Nov 15, 2012 at 8:43 PM, Deb Barba deb.ba...@inktank.com wrote: This is not common UNIX/POSIX behavior. If you just give the source a file name, it should assume . (current directory) as its location, not whatever path you started from. I would expect most UNIX users would be losing

Re: rbd map command hangs for 15 minutes during system start up

2012-11-15 Thread Nick Bartos
Sorry I guess this e-mail got missed. I believe those patches came from the ceph/linux-3.5.5-ceph branch. I'm now using the wip-3.5 branch patches, which seem to all be fine. We'll stick with 3.5 and this backport for now until we can figure out what's wrong with 3.6. I typically ignore the

Re: rbd map command hangs for 15 minutes during system start up

2012-11-15 Thread Sage Weil
On Thu, 15 Nov 2012, Nick Bartos wrote: Sorry I guess this e-mail got missed. I believe those patches came from the ceph/linux-3.5.5-ceph branch. I'm now using the wip-3.5 branch patches, which seem to all be fine. We'll stick with 3.5 and this backport for now until we can figure out

Re: osd not in tree

2012-11-15 Thread Josh Durgin
On 11/15/2012 11:21 PM, Drunkard Zhang wrote: I installed mon x1, mds x1 and osd x11 on one host, then added some OSDs from other hosts, but they are not in the OSD tree and are not usable either. How can I fix this? The crush command I used: ceph osd crush set 11 osd.11 3 pool=data datacenter=dh-1L,

Re: poor performance

2012-11-15 Thread Aleksey Samarin
Thanks for your reply! It was easier to switch from RHEL to Ubuntu. Now everything is fast and stable! :) If you're interested, I can attach logs. All the best, Alex! 2012/11/16 Gregory Farnum g...@inktank.com: On Sun, Nov 4, 2012 at 7:13 AM, Aleksey Samarin nrg3...@gmail.com wrote: What may be possible