Re: Large numbers of OSD per node

2012-11-06 Thread Stefan Kleijkers
On 11/06/2012 11:24 AM, Gandalf Corvotempesta wrote: 2012/11/6 Wido den Hollander w...@widodh.nl: The setup described on that page has 90 nodes, so one node failing is a little over 1% of the cluster failing. I think I'm missing something. In case of a failure, they will always have to

Re: Large numbers of OSD per node

2012-11-06 Thread Stefan Kleijkers
On 11/06/2012 12:31 PM, Gandalf Corvotempesta wrote: 2012/11/6 Stefan Kleijkers ste...@unilogicnetworks.net: Well, you have to keep in mind that when a node fails, the PGs that resided on that node have to be redistributed over all the other nodes. So you begin moving about 1% of the data
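
A rough back-of-the-envelope check, assuming data is spread evenly over the 90 nodes mentioned earlier in the thread:

    data on the failed node:        1 / 90          ≈ 1.1% of the cluster's data
    re-replication work per node:   1.1% / 89 nodes ≈ 0.013% of the cluster's data
                                    written by each of the 89 surviving nodes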

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello, On 10/31/2012 10:24 PM, Tren Blackburn wrote: On Wed, Oct 31, 2012 at 2:18 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: In a multi-replica cluster (for example, replica = 3), is it safe to put the journal on a tmpfs? As far as I understand, with the journal enabled all writes are
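
For reference, the journal location is just a path in ceph.conf, so a tmpfs-backed journal would look roughly like the sketch below; the mount point and size are illustrative, not taken from the thread:

    [osd]
        # assumed tmpfs mount point; $id expands to the OSD's id
        osd journal = /mnt/tmpfs/osd.$id.journal
        # journal size in MB
        osd journal size = 1024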

Re: Ceph journal

2012-10-31 Thread Stefan Kleijkers
Hello, On 10/31/2012 10:58 PM, Gandalf Corvotempesta wrote: 2012/10/31 Tren Blackburn t...@eotnetworks.com: Unless you're using btrfs, which writes to the journal and the OSD filesystem concurrently, if you lose the journal device (such as due to a reboot), you've lost the OSD device, requiring it to be
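
For completeness, the usual way to take a journal away from a non-btrfs OSD without losing the OSD is to flush it first while the daemon is stopped; a sketch for OSD id 0 (the id and ordering are illustrative, not from the thread):

    # write any transactions still in the journal out to the OSD's data store
    ceph-osd -i 0 --flush-journal
    # after pointing 'osd journal' at the new device or file, create the new journal
    ceph-osd -i 0 --mkjournal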

Re: SAS disks for OSDs

2012-04-23 Thread Stefan Kleijkers
Hello, I don't think you will see a great difference in performance between SAS and SATA disks, because the connection type is only a minor factor in performance. There are other factors that impact performance a lot more. What RPM are the disks? The higher the RPM, the lower the average
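
To put numbers on the RPM point: average rotational latency is half a revolution, i.e. 60 / (2 × RPM) seconds, so for example:

    7200 RPM  (common for SATA): 60 / (2 × 7200)  ≈ 4.2 ms
    15000 RPM (common for SAS):  60 / (2 × 15000) ≈ 2.0 ms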

Re: Partition for OSD journal

2012-04-12 Thread Stefan Kleijkers
Hello, Just partition the disk, like sda1, sda2, etc., and add the following to your config: [osd.0] osd journal = /dev/sda1 [osd.1] osd journal = /dev/sda2 etc. So you can point the osd journal to a file or to a device. The negative side to this is that you have to
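
Laid out as it would appear in ceph.conf (same devices as in the mail; with a raw partition the whole device is used as the journal, so no separate journal size should be needed, but check the documentation for your version):

    [osd.0]
        osd journal = /dev/sda1
    [osd.1]
        osd journal = /dev/sda2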

Re: Can we have multiple OSDs in a single machine

2012-04-11 Thread Stefan Kleijkers
Hello, Yes, that's no problem. I've been using that configuration for some time now. Just generate a config with multiple OSD clauses for the same node/host. With the newer Ceph versions, mkcephfs is smart enough to detect the OSDs on the same node and will generate a crushmap whereby the objects
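
A minimal sketch of such a config, with two OSDs on one host; the hostname and paths are illustrative, not from the mail:

    [osd.0]
        host = node01
        osd data = /srv/osd.0
        osd journal = /srv/osd.0.journal
    [osd.1]
        host = node01
        osd data = /srv/osd.1
        osd journal = /srv/osd.1.journal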

Re: OSD suicide

2012-04-03 Thread Stefan Kleijkers
Hello Vladimir, Well, in that case you could try BTRFS. With BTRFS it's possible to combine all the disks in a node into a RAID0/RAID1/RAID10 configuration, so you can run just one or a few OSDs per node. But I would recommend the newest kernel possible. I haven't tried the 3.3 range, but
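
For illustration, a multi-device btrfs filesystem of the kind described could be created roughly like this; the device names and RAID profiles are examples, not from the mail:

    # stripe data over three disks (raid0) and mirror metadata (raid1)
    mkfs.btrfs -d raid0 -m raid1 /dev/sdb /dev/sdc /dev/sdd
    # mounting any member device mounts the whole multi-device filesystem
    mount /dev/sdb /srv/osd.0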

Re: OSD suicide

2012-04-02 Thread Stefan Kleijkers
Hello, A while back I had the same errors you are seeing. I had these problems only when using mdraid. After doing IO for some time, the IO stalled, and in most cases, if you look at the ceph-osd daemon, it's in D state (waiting for IO). Also, if you look with top, you notice a very high load and IO
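
One way to confirm that symptom, shown as an illustration rather than the poster's exact commands:

    # list processes stuck in uninterruptible sleep (D state) and what they block on
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
    # extended per-device I/O statistics, refreshed every 5 seconds
    iostat -x 5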

Re: OSD hit suicide timeout

2011-11-11 Thread Stefan Kleijkers
Hello Christian, I have had the same problem for some time now, but I'm not sure whether it's Ceph-related. With my setup it looks like the IO stalls; of course, after some time the OSD kills itself. Do you use BTRFS as the local filesystem? If so, do you get WARNING: at fs/btrfs/inode.c:2198
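
Those btrfs warnings land in the kernel log, so a quick (illustrative) way to check for them:

    # look for the WARNING: at fs/btrfs/... messages in the kernel ring buffer
    dmesg | grep -i btrfs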

Re: 0.37 crash

2011-10-20 Thread Stefan Kleijkers
Hello, I hit the exact same problem. I upgraded from 0.36 to 0.37 and one of the two OSDs wouldn't start. In the OSD's log I also found the same error as below. The ceph-osd process had status D (in ps, which is uninterruptible sleep) and I see a high IO wait in top. Also, I noticed a lot of