On 10/13/2016 12:56 PM, Marco Gaiarin wrote:
> Hello! Alwin Antreich
> On that day, it was said...
>> I have to ask a more general question here, why are you putting the journal
>> on a RAID1?
> For safety?
>> For better performance and less complexity the journal should reside on
>> standalone SSDs. With the RAID1 you limit the speed of the journal; it
>> would be better to place the journal on the OSD disks themselves.
> I know that. But I'm setting up a little ceph cluster, using only gigabit
> ethernet as the network backend, so my bottleneck is mostly the ~125MB/s of
> the network.
> Also, I cannot afford to buy an SSD for every OSD, and using the same SSD
> for many/all the OSDs in the box is a big SPoF.
Using an SSD for OSD journals is no SPoF. Of course, when the SSD fails, the
OSDs connected to that SSD will be down, but the cluster will recover the data
that is no longer redundant onto the remaining OSDs in the cluster,
auto-magically. That's the same thing as if the whole machine died: everything
would need to recover the same way.
The ratio of SSD journal to OSD disks can be determined by a simple dd test.
Take the write speed of the SSD and divide it by the write speed of the OSD;
that gives you the maximum number of disks that can be used on one SSD without
expecting the SSD to become the bottleneck.
E.g.: 500MB/s write speed for the SSD divided by 100MB/s OSD write speed = 5
OSDs per journal SSD.
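The dd test and the ratio above can be sketched like this (paths and the two
speed numbers are illustrative, not measurements):

```shell
# Measure sequential write speed first, e.g. with a scratch file on a mounted
# filesystem (never dd onto a raw device that holds data):
#   dd if=/dev/zero of=/mnt/test/scratch bs=1M count=1024 oflag=direct conv=fsync

# Plug in the measured numbers (illustrative values from the example above):
ssd_mb=500   # SSD sequential write speed, MB/s
osd_mb=100   # single OSD spinner write speed, MB/s

echo $(( ssd_mb / osd_mb ))   # max OSDs per journal SSD -> 5
```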
> So I'm using (software) RAID1, confident enough that the penalty of
> the raid won't have much overall impact.
As in the text above, I wouldn't recommend adding another layer in between; it
makes troubleshooting harder.
>>> The proxmox correctly see the 4 OSD candidate disks, but does not see the
>>> journal partition. So i've used commandline:
>> pveceph is a wrapper around the ceph tools, and a dependency for pveceph is
>> smartmontools. mdadm devices don't list SMART attributes, and this might be
>> why it's not seeing them. But this is more of a guess and should be verified
>> by someone who knows better.
> I suppose that.
> I'm a bit unconfident about the error/warning messages printed and the
> general behaviour.
> As far as I've understood, ceph can use disks, partitions and even files for
> the journal.
> Probably 'md' devices are neither partitions nor disks, and ceph gets
> confused.
> Would it be better, for example, to simply put the journal on files? E.g.,
> format the md device, mount it and create the journal files inside it?
You can use files as journal disks, but if I recall right, there was a thread
on the ceph mailing list discussing this and its limitations.
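For what it's worth, in FileStore setups a journal file can be pointed at
explicitly in ceph.conf; a minimal sketch, assuming a hypothetical mount point
/srv/journal for the formatted md device:

```
[osd]
# hypothetical path on the mounted md device; $id expands to the OSD number
osd journal = /srv/journal/osd.$id/journal
osd journal size = 5120     # journal size in MB
```

The journal still has to be (re)created for the OSD after changing this, so
check the ceph documentation for the exact procedure on your release.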
pve-user mailing list