Re: [ceph-users] Issue with journal on another drive

2015-09-30 Thread Jan Schermer
I have some experience with Kingstons - which model do you plan to use?

Shorter version: don't use Kingstons. For anything. Ever.

Jan

> On 30 Sep 2015, at 11:24, Andrija Panic wrote:
> [...]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-30 Thread Jiri Kanicky

Thanks to all for the responses. Great thread with a lot of info.

I will go with 3 partitions on the Kingston SSD for the 3 OSDs on each
node.

Thanks
Jiri

On 30/09/2015 00:38, Lionel Bouton wrote:

[...]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-30 Thread Andrija Panic
Make sure to check this blog page, since I'm not sure whether you are just
playing around with Ceph or planning it for production and good performance:
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/

My experience with SSDs as journals: a Samsung 850 PRO = 200 IOPS sustained
writes, vs an Intel S3500 = 18,000 IOPS sustained writes - so you can see
the difference.
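For reference, the kind of test that blog post describes is sustained 4k
synchronous direct writes, which is roughly what a Ceph journal does. A
minimal sketch with fio (the device name is only an example, and the test
overwrites data on the device, so point it at a scratch drive):

  fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
      --numjobs=1 --iodepth=1 --runtime=60 --time_based \
      --group_reporting --name=ssd-journal-test

Good journal SSDs sustain thousands of IOPS on this test; consumer models
often collapse to a few hundred, as the numbers above show.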

regards

On 30 September 2015 at 11:17, Jiri Kanicky  wrote:

> [...]



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-30 Thread J David
On Tue, Sep 29, 2015 at 7:32 AM, Jiri Kanicky  wrote:
> Thank you for your reply. In this case I am considering creating a
> separate partition on the SSD drive for each disk. It would be good to
> know what the performance difference is, because creating partitions
> feels like a waste of space.

It may be worth pointing out with SSDs as journals that "wasted
space" is not necessarily a bad thing.  First, unless you buy tiny
SSDs, which generally have poorer performance characteristics due to
fewer chips, you will be limited by IOPS and latency, not available
space.  Second, due to SSD wear leveling algorithms, any wasted/unused
space that never gets accessed translates directly into longer life,
and that is no bad thing.


On Wed, Sep 30, 2015 at 6:04 AM, Jan Schermer  wrote:
> I have some experience with Kingstons - which model do you plan to use?
>
> Shorter version: don't use Kingstons. For anything. Ever.

The Kingston SSDNow E100 series are the only Kingston products to ever
pass our internal qualifications, and appear to be pretty decent.
That said, our production Ceph clusters are all Intel all the time.
SSD vendors that are not Intel pretty much exist for the sole purpose
of keeping Intel honest.

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Andrija Panic
Jiri,

if you colocate multiple journals on 1 SSD (we do...), make sure you
understand the following:

- if the SSD dies, all OSDs that had their journals on it are lost...
- the more journals you put on a single SSD (1 journal being 1 partition),
the worse the per-journal performance, since the total SSD performance is
no longer dedicated to a single journal: colocate 6 journals on 1 SSD and
each journal gets roughly 1/6 of it...

Latency will go up and bandwidth will go down the more journals you
colocate... XFS recommended...

I suggest you balance the performance you want against the $$$ for SSDs...
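As a rough sketch (device name and sizes are examples only), carving one
SSD into three journal partitions with GPT could look like:

  # three 10 GB journal partitions on a hypothetical SSD /dev/sdf
  sgdisk --new=1:0:+10G --change-name=1:'ceph journal osd.0' /dev/sdf
  sgdisk --new=2:0:+10G --change-name=2:'ceph journal osd.1' /dev/sdf
  sgdisk --new=3:0:+10G --change-name=3:'ceph journal osd.2' /dev/sdf

Leaving the rest of the SSD unpartitioned does no harm - as noted elsewhere
in this thread, untouched space helps the drive's wear leveling.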

best

On 29 September 2015 at 13:32, Jiri Kanicky  wrote:

> [...]



-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Jiri Kanicky

Hi Lionel.

Thank you for your reply. In this case I am considering creating a
separate partition on the SSD drive for each disk. It would be good to
know what the performance difference is, because creating partitions
feels like a waste of space.


One more question: is it a good idea to move the journals for 3 OSDs to
a single SSD, considering that if the SSD fails the whole node with its
3 HDDs will be down? Thinking about it, leaving the journal on each OSD
might be safer, because a journal on one disk does not affect the other
disks (OSDs). Or do you think that having the journals on the SSD is the
better trade-off?


Thank you
Jiri

On 29/09/2015 21:10, Lionel Bouton wrote:

On 29/09/2015 07:29, Jiri Kanicky wrote:

Hi,

Is it possible to create the journal in a directory as explained here:
http://wiki.skytech.dk/index.php/Ceph_-_howto,_rbd,_lvm,_cluster#Add.2Fmove_journal_in_running_cluster

Yes, the general idea (stop, flush, move, update ceph.conf, mkjournal,
start) is valid for moving your journal wherever you want.
That said, it probably won't perform as well on a filesystem (LVM has
lower overhead than a filesystem).


1. Create BTRFS over /dev/sda6 (assuming this is the SSD partition
allocated for the journal) and mount it to /srv/ceph/journal

BTRFS is probably the worst idea for hosting journals. If you must use
BTRFS, you'll have to make sure that the journals are created NoCoW
before the first byte is ever written to them.


2. Add OSD: ceph-deploy osd create --fs-type btrfs
ceph1:sdb:/srv/ceph/journal/osd$id/journal

I've no experience with ceph-deploy...

Best regards,

Lionel
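To make the NoCoW warning above concrete, here is a minimal sketch of
moving an existing journal into a file on BTRFS, following the stop,
flush, move, update ceph.conf, mkjournal, start steps (the OSD id, paths
and service commands are examples only and will differ per setup):

  # stop the OSD and flush its current journal
  service ceph stop osd.0            # or: systemctl stop ceph-osd@0
  ceph-osd -i 0 --flush-journal

  # create the new journal file NoCoW *before* the first byte is written
  mkdir -p /srv/ceph/journal/osd0
  touch /srv/ceph/journal/osd0/journal
  chattr +C /srv/ceph/journal/osd0/journal

  # point "osd journal" in ceph.conf (or a symlink) at the new path, then:
  ceph-osd -i 0 --mkjournal
  service ceph start osd.0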



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Lionel Bouton
Hi,

On 29/09/2015 13:32, Jiri Kanicky wrote:
> Hi Lionel.
>
> Thank you for your reply. In this case I am considering creating a
> separate partition on the SSD drive for each disk. It would be good to
> know what the performance difference is, because creating partitions
> feels like a waste of space.

The difference is hard to guess: filesystems need more CPU power than
raw block devices, for example, so if you don't have much CPU power this
can make a significant difference. Filesystems might put more load on
your storage too (for example, ext3/4 with data=journal will at least
double the disk writes). So there's a lot to consider, and nothing will
be faster for journals than a raw partition. LVM logical volumes come a
close second because usually (if you simply use LVM to create your
logical volumes and don't try to use anything else like snapshots) they
don't change access patterns and need almost no CPU power.
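As a sketch of what the raw-partition case looks like in ceph.conf (the
device path and size below are examples, not a recommendation):

  [osd.0]
      # a raw SSD partition used directly as the journal; prefer a stable
      # /dev/disk/by-partlabel or by-id path over /dev/sdX names
      osd journal = /dev/disk/by-partlabel/journal-osd0
      osd journal size = 10240    # MB; mainly relevant when the journal is a file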

>
> One more question: is it a good idea to move the journals for 3 OSDs to
> a single SSD, considering that if the SSD fails the whole node with its
> 3 HDDs will be down?

If your SSDs are working well with Ceph and aren't cheap models dying
under heavy writes, yes. I use one 200GB DC3710 SSD for 6 7200rpm SATA
OSDs (using 60GB of it for the 6 journals) and it works very well (they
were a huge performance boost compared to our previous use of internal
journals).
Some SSDs are slower than HDDs for Ceph journals though (there has been
a lot of discussion on this subject on this mailing list).

> Thinking about it, leaving the journal on each OSD might be safer,
> because a journal on one disk does not affect the other disks (OSDs). Or
> do you think that having the journals on the SSD is the better trade-off?

You will put significantly more stress on your HDDs by leaving the
journals on them, and good SSDs are far more robust than HDDs, so if you
pick Intel DC or equivalent SSDs for the journals your infrastructure
might even be more robust than one using internal journals (HDDs drop
like flies when you have hundreds of them). There are other components
able to take down all your OSDs: the disk controller, the CPU, the
memory, the power supply, ... So adding one robust SSD shouldn't change
the overall availability much (you must check their wear level and
choose the models according to the amount of writes you want them to
support over their lifetime, though).
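Checking the wear level is easy to script; something along these lines
works for most drives (the SMART attribute names vary by vendor, and the
device name is just an example):

  # attributes such as Media_Wearout_Indicator or Wear_Leveling_Count show
  # how much of the rated write endurance has been consumed
  smartctl -a /dev/sdf | egrep -i 'wear|media_wearout|percent.*used'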

The main reason for journals on SSD is performance anyway. If your setup
is already fast enough without them, I wouldn't try to add SSDs.
Otherwise, if you can't reach the level of performance needed by adding
the OSDs already needed for your storage capacity objectives, go SSD.

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-09-29 Thread Bill Sharer
I think I got over 10% improvement when I changed from a cooked journal
file on the btrfs-based system SSD to a raw partition on the same SSD.
The cluster I've been testing with is all consumer-grade stuff running
on top of AMD Piledriver and Kaveri based motherboards with the on-board
SATA.  My SSDs are a hodgepodge of OCZ Vertex 4 and Samsung 840 and 850
(non-pro).  I'm also seeing a performance win by merging individual OSDs
into btrfs mirror sets after doing that, and dropping the replica count
from 3 to 2.  I also consider this a better defense-in-depth strategy,
since btrfs self-heals when it hits bit rot on the mirrors and raid sets.


That boost was probably aio and dio kicking in because of the raw versus
cooked journal.  Note that I'm running Hammer on Gentoo and my current
WIP is moving kernels from 3.8 to 4.0.5 everywhere.  It will be
interesting to see what happens with that.
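If you want to check or pin that behaviour explicitly, the FileStore
journal has ceph.conf settings for it; a sketch (the defaults quoted here
are from memory, so verify them against your release's documentation):

  [osd]
      journal dio = true   # O_DIRECT journal writes; needs a raw device
                           # or a file that supports it
      journal aio = true   # libaio journal writes; only used when dio is on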


Regards
Bill

On 09/29/2015 07:32 AM, Jiri Kanicky wrote:

[...]



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-07-13 Thread Rimma Iontel

Thank you Lionel,

This was very helpful.  I actually chose to split the partition and then 
recreated the OSDs.  Everything is up and running now.


Rimma

On 7/13/15 6:34 PM, Lionel Bouton wrote:

[...]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issue with journal on another drive

2015-07-13 Thread Lionel Bouton
On 07/14/15 00:08, Rimma Iontel wrote:
 Hi all,

 [...]
 Is there something that needs to be done to the journal partition to
 enable sharing between multiple OSDs?  Or is there something else
 that's causing the issue?


IIRC you can't share a volume between multiple OSDs. What you could do
if splitting this partition isn't possible is create an LVM volume group
with it as a single physical volume (change the partition type to LVM,
pvcreate /dev/sda6, vgcreate journal_vg /dev/sda6). Then you can create
a logical volume in it for each of your OSDs (lvcreate -n
osdN_journal -L one_third_of_available_space journal_vg) and use
them (/dev/journal_vg/osdN_journal) in your configuration.
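Spelled out for three OSDs (the sizes and names are examples only), that
would be something like:

  pvcreate /dev/sda6
  vgcreate journal_vg /dev/sda6
  lvcreate -n osd0_journal -L 10G journal_vg
  lvcreate -n osd1_journal -L 10G journal_vg
  lvcreate -n osd2_journal -L 10G journal_vg
  # then set "osd journal = /dev/journal_vg/osdN_journal" for each OSD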

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com