Re: Is Ceph recovery able to handle massive crash

2013-01-09 Thread Denis Fondras
Hello, Le 09/01/2013 00:36, Gregory Farnum a écrit : It looks like it's taking approximately forever for writes to complete to disk; it's shutting down because threads are going off to write and not coming back. If you set osd op thread timeout = 60 (or 120) it might manage to churn through,
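For reference, the timeout Gregory mentions is an OSD option set in ceph.conf. A sketch of where it would go (placement and value are illustrative; the value is in seconds):

```ini
; illustrative ceph.conf fragment -- raising the op thread timeout gives
; slow disk writes more time to complete before the OSD gives up on the thread
[osd]
    osd op thread timeout = 120
```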

RE: OSD's slow down to a crawl

2013-01-09 Thread Matthew Anderson
Hi Sage, Sorry for the late follow up, I've been on a bit of a testing rampage and managed to somewhat sort the problem. Most of the problems appear to stem from the 3.7.1 kernel. It seems to have a fairly big issue with its networking stack that was causing Ceph's network operations to hang.

Re: Crushmap Design Question

2013-01-09 Thread Wido den Hollander
Hi, On 01/09/2013 01:53 AM, Chen, Xiaoxi wrote: Hi, Setting rep size to 3 only makes the data triple-replicated; that means that when you fail all OSDs in 2 out of 3 DCs, the data is still accessible. But the monitor is another story: for monitor clusters with 2N+1 nodes, it requires at
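The 2N+1 monitor rule Xiaoxi alludes to can be sketched as follows (a minimal illustration, not Ceph code: monitors use Paxos, which needs a strict majority of the monitor map alive to form a quorum):

```python
# Sketch of the 2N+1 monitor quorum arithmetic discussed above.

def quorum_size(total_mons: int) -> int:
    """Smallest number of live monitors that is still a strict majority."""
    return total_mons // 2 + 1

def tolerated_failures(total_mons: int) -> int:
    """With 2N+1 monitors, N monitor failures are survivable."""
    return total_mons - quorum_size(total_mons)

for total in (3, 5, 7):
    print(f"{total} mons: quorum={quorum_size(total)}, "
          f"tolerates {tolerated_failures(total)} failures")
```

So losing an entire DC's monitors is fine only as long as the surviving DCs still hold a majority of the monitor map.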

Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Lachfeld, Jutta
Hi all, in expectation of better performance, we are just switching from CEPH version 0.48 to 0.56.1 for comparisons between Hadoop with HDFS and Hadoop with CEPH FS. We are now wondering whether there are currently any development activities concerning further significant performance

Re: Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Wido den Hollander
On 01/09/2013 01:51 PM, Lachfeld, Jutta wrote: Hi all, in expectation of better performance, we are just switching from CEPH version 0.48 to 0.56.1 for comparisons between Hadoop with HDFS and Hadoop with CEPH FS. We are now wondering whether there are currently any development activities

Re: Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Christopher Kunz
Hi, Yes, 0.56(.1) has a significant performance increase compared to 0.48 That is not exactly the OP's question, though. If I understand correctly, she is concerned about ongoing performance improvements within the bobtail branch, i.e. between 0.56.1 and 0.56.X (with X > 1). Jutta, what kind of

Re: Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Dennis Jacobfeuerborn
On 01/09/2013 01:51 PM, Lachfeld, Jutta wrote: Hi all, in expectation of better performance, we are just switching from CEPH version 0.48 to 0.56.1 for comparisons between Hadoop with HDFS and Hadoop with CEPH FS. We are now wondering whether there are currently any development

Re: OSD's slow down to a crawl

2013-01-09 Thread Mark Nelson
On 01/09/2013 02:52 AM, Matthew Anderson wrote: Hi Sage, Sorry for the late follow up, I've been on a bit of a testing rampage and managed to somewhat sort the problem. Most of the problems appear to stem from the 3.7.1 kernel. It seems to have a fairly big issue with its networking stack

Re: Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Mark Kampe
Performance work is always ongoing, but I am not aware of any significant imminent enhancements. We are just wrapping up an investigation of the effects of various file system and I/O options on different types of traffic, and the next major area of focus will be RADOS Block Device and VMs over

Re: Are there significant performance enhancements in 0.56.x to be expected soon or planned in the near future?

2013-01-09 Thread Mark Nelson
On 01/09/2013 06:51 AM, Lachfeld, Jutta wrote: Hi all, in expectation of better performance, we are just switching from CEPH version 0.48 to 0.56.1 for comparisons between Hadoop with HDFS and Hadoop with CEPH FS. We are now wondering whether there are currently any development activities

Re: Windows port

2013-01-09 Thread Florian Haas
On Tue, Jan 8, 2013 at 3:00 PM, Dino Yancey dino2...@gmail.com wrote: Hi, I am also curious if a Windows port, specifically the client-side, is on the roadmap. This is somewhat OT from the original post, but if all you're interested in is using RBD block storage from Windows, you can already do

RE: Crushmap Design Question

2013-01-09 Thread Moore, Shawn M
Correct, it never went below N+1 (3 total mons and 2 of them never went down). Several times in the past I verified that a pg was actually mapped to valid dc's with that command. I just wrote a quick script that will do this on the fly and after recovering the cluster last night, every pg has
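The kind of per-PG check Shawn describes can be sketched like this (a hypothetical illustration: the osd_to_dc layout and PG acting sets below are made up, and a real script would pull the acting sets from the cluster rather than hard-code them):

```python
# Hypothetical sketch: given each PG's acting set and an assumed
# osd -> datacenter map, flag any PG whose replicas fail to span
# at least two DCs.

osd_to_dc = {0: "dc1", 1: "dc1", 2: "dc2", 3: "dc2", 4: "dc3", 5: "dc3"}

def pg_spans_dcs(acting, osd_to_dc, min_dcs=2):
    """True if the acting set touches at least min_dcs datacenters."""
    return len({osd_to_dc[osd] for osd in acting}) >= min_dcs

# Example acting sets, keyed by PG id (made up for illustration).
pgs = {"1.0": [0, 2, 4], "1.1": [0, 1, 2], "1.2": [0, 1]}

bad = [pgid for pgid, acting in pgs.items()
       if not pg_spans_dcs(acting, osd_to_dc)]
print(bad)  # PGs whose replicas are confined to a single DC
```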

Re: Crushmap Design Question

2013-01-09 Thread Joao Eduardo Luis
On 01/09/2013 08:59 AM, Wido den Hollander wrote: Hi, On 01/09/2013 01:53 AM, Chen, Xiaoxi wrote: Hi, Setting rep size to 3 only makes the data triple-replicated; that means that when you fail all OSDs in 2 out of 3 DCs, the data is still accessible. But the monitor is another story, for

RE: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2013-01-09 Thread Lachfeld, Jutta
Hi Noah, the current content of the web page http://ceph.com/docs/master/cephfs/hadoop shows a configuration parameter ceph.object.size. Is it the CEPH equivalent to the HDFS block size parameter which I have been looking for? Does the parameter ceph.object.size apply to version 0.56.1? I
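If ceph.object.size does behave like the HDFS block size, it would presumably be set in the Hadoop configuration; a hypothetical fragment (the property name is taken from the docs page cited above, and the 64 MB value is only an example, chosen to match a common HDFS block size):

```xml
<!-- hypothetical core-site.xml fragment; value chosen for illustration -->
<property>
  <name>ceph.object.size</name>
  <value>67108864</value> <!-- 64 MB -->
</property>
```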

Re: OSD memory leaks?

2013-01-09 Thread Dave Spano
Thank you. I appreciate it! Dave Spano Optogenics Systems Administrator - Original Message - From: Sébastien Han han.sebast...@gmail.com To: Dave Spano dsp...@optogenics.com Cc: ceph-devel ceph-devel@vger.kernel.org, Samuel Just sam.j...@inktank.com Sent: Wednesday, January

Re: OSD crash, ceph version 0.56.1

2013-01-09 Thread Ian Pye
On Wed, Jan 9, 2013 at 4:38 PM, Sage Weil s...@inktank.com wrote: On Wed, 9 Jan 2013, Ian Pye wrote: Hi, Every time I try to bring up an OSD, it crashes and I get the following: error (121) Remote I/O error not handled on operation 20 This error code (EREMOTEIO) is not used by Ceph. What

Re: recoverying from 95% full osd

2013-01-09 Thread Roman Hlynovskiy
Hello again! I left the system in working state overnight and got it in a weird state this morning: chef@ceph-node02:/var/log/ceph$ ceph -s health HEALTH_OK monmap e4: 3 mons at {a=192.168.7.11:6789/0,b=192.168.7.12:6789/0,c=192.168.7.13:6789/0}, election epoch 254, quorum 0,1,2 a,b,c

Re: [PATCH] configure.ac: check for org.junit.rules.ExternalResource

2013-01-09 Thread Danny Al-Gaaf
On 10.01.2013 05:32, Gary Lowell wrote: I have this patch, and the ones from Friday in the wip-rpm-update branch. Everything looks good except that we have the following new warning from configure: …. checking for kaffe... no checking for java... java checking for uudecode... no