Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-14 Thread Mark Nelson

On 08/14/2017 02:42 PM, Nick Fisk wrote:

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Ronny Aasen
Sent: 14 August 2017 18:55
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements

On 10.08.2017 17:30, Gregory Farnum wrote:

This has been discussed a lot in the performance meetings so I've
added Mark to discuss. My naive recollection is that the per-terabyte
recommendation will be more realistic  than it was in the past (an
effective increase in memory needs), but also that it will be under
much better control than previously.



Is there any way to tune or reduce the memory footprint, perhaps by
sacrificing performance? Our jewel cluster's OSD servers are maxed out on
memory, and with the added memory requirements I fear we may not be
able to upgrade to luminous/bluestore.


Check out this PR; it shows the settings that control how much memory is used
for cache, and their defaults:

https://github.com/ceph/ceph/pull/16157


Hey guys, sorry for the late reply.  The gist of it is that memory is
used in bluestore in a couple of different ways:


1) various internal buffers and such
2) bluestore specific cache (unencoded onodes, extents, etc)
3) rocksdb block cache
  3a) encoded data from bluestore
  3b) bloom filters and table indexes
4) other rocksdb memory/etc

Right now, when you set the bluestore cache size, it first favors the rocksdb
block cache up to 512MB and then starts favoring the bluestore onode cache
after that.  Even without bloom filters that seems to improve bluestore
performance with small cache sizes.  With bloom filters it's likely even
more important to feed whatever you can to rocksdb's block cache to keep
the indexes and bloom filters in memory as much as possible.  It's unclear
right now how quickly we should let the block cache grow as the number
of objects increases.  Prior to using bloom filters it appeared that
favoring the onode cache was better.  Now we probably want to favor both
the bloom filters and bluestore's onode cache.
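
For anyone who wants to experiment with this, the knobs live in ceph.conf on
the OSDs.  A minimal sketch, assuming the luminous-era option names from the
PR linked above -- verify them against `ceph daemon osd.N config show` on
your build before relying on them, and treat the values as illustrative only:

    [osd]
    # total per-OSD cache budget; shrink this on memory-constrained nodes
    bluestore_cache_size = 536870912      # 512MB, illustrative only
    # portion of that budget rocksdb's block cache may claim before
    # bluestore starts favoring its own onode cache
    bluestore_cache_kv_max = 268435456    # 256MB, illustrative only

Lowering these trades cache hit rate (and therefore latency) for RSS, which
is exactly the trade-off described above.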


So the first order of business is to see how much reducing the bluestore
cache size hurts you.  Bluestore's default behavior of favoring the
rocksdb block cache (and specifically the bloom filters) first is
probably still decent, but you may want to play around with it if you
expect a lot of small objects and limited memory.  For really low-memory
scenarios you could also try reducing the rocksdb buffer sizes, but
smaller buffers are going to give you higher write-amp.  It's possible
this PR may help though:


https://github.com/ceph/rocksdb/pull/19
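
If you do go down the road of shrinking the rocksdb write buffers, the usual
knob is the bluestore_rocksdb_options string.  A sketch only -- note that
setting this overrides the entire default option string, so start from
whatever `config show` reports on your version and only adjust the
buffer-related fields; the numbers here are placeholders, not
recommendations:

    [osd]
    # smaller/fewer memtables = less memory, but more write-amp
    bluestore_rocksdb_options = compression=kNoCompression,write_buffer_size=33554432,max_write_buffer_number=2,min_write_buffer_number_to_merge=1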

You might be able to lower memory further with smaller PG/OSD maps, but 
at some point you start hitting diminishing returns.


Mark






kind regards
Ronny Aasen


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-14 Thread Nick Fisk
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Ronny Aasen
> Sent: 14 August 2017 18:55
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements
> 
> On 10.08.2017 17:30, Gregory Farnum wrote:
> > This has been discussed a lot in the performance meetings so I've
> > added Mark to discuss. My naive recollection is that the per-terabyte
> > recommendation will be more realistic  than it was in the past (an
> > effective increase in memory needs), but also that it will be under
> > much better control than previously.
> 
> 
> Is there any way to tune or reduce the memory footprint, perhaps by
> sacrificing performance? Our jewel cluster's OSD servers are maxed out on
> memory, and with the added memory requirements I fear we may not be
> able to upgrade to luminous/bluestore.

Check out this PR; it shows the settings that control how much memory is used
for cache, and their defaults:

https://github.com/ceph/ceph/pull/16157
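
If you just want to see what your OSDs are currently running with before
changing anything, something along these lines against the admin socket
should do it (adjust the OSD id to taste):

    ceph daemon osd.0 config show | grep -E 'bluestore_cache|rocksdb'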


> 
> kind regards
> Ronny Aasen


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-14 Thread Ronny Aasen

On 10.08.2017 17:30, Gregory Farnum wrote:
This has been discussed a lot in the performance meetings so I've 
added Mark to discuss. My naive recollection is that the per-terabyte 
recommendation will be more realistic  than it was in the past (an 
effective increase in memory needs), but also that it will be under 
much better control than previously.



Is there any way to tune or reduce the memory footprint, perhaps by
sacrificing performance? Our jewel cluster's OSD servers are maxed out on
memory, and with the added memory requirements I fear we may not be
able to upgrade to luminous/bluestore.


kind regards
Ronny Aasen


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-14 Thread Lars Täuber
Hi there,

can someone share her/his experiences regarding this question? Maybe 
differentiated according to the different available algorithms?

Sat, 12 Aug 2017 14:40:05 +0200
Stijn De Weirdt ==> Gregory Farnum, Mark Nelson, "ceph-users@lists.ceph.com":
> also any indication how much more cpu EC uses (10%,
> 100%, ...)?


I would also be interested in hardware recommendations for the newly introduced
ceph-mgr daemon. The big search engines don't tell me anything about this yet.

Thanks in advance
Lars


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-13 Thread Nick Fisk
Hi David,

 

No serious testing, but I have had various disks fail, nodes go offline, etc. over
the last 12 months, and I'm still only seeing 15-20% CPU max for user+system.

 

From: David Turner [mailto:drakonst...@gmail.com] 
Sent: 12 August 2017 21:20
To: n...@fisk.me.uk; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements

 

Did you do any of that testing to involve a degraded cluster, backfilling, 
peering, etc? A healthy cluster running normally uses sometimes 4x less memory 
and CPU resources as a cluster consistently peering and degraded.

 

On Sat, Aug 12, 2017, 2:40 PM Nick Fisk <n...@fisk.me.uk> wrote:

I was under the impression the memory requirements for Bluestore would be
around 2-3GB per OSD regardless of capacity.

CPU-wise, I would lean towards working out how much total GHz you require
and then getting whatever CPU you need to get there, but with a preference
for GHz over cores. Yes, there will be a slight overhead to having more
threads running on a lower number of cores, but I believe this is fairly
minimal compared to the speed boost the single-threaded portion of the
data path in each OSD gets from running on a faster core. Each PG takes a
lock for each operation, so any other requests for the same PG will queue
up and be processed sequentially; the faster you can get through this
stage the better. I'm pretty sure that if you graphed PG activity on an
average cluster, you would see a heavy skew towards certain PGs being hit
more often than others. I think Mark N has been seeing the effects of
the PG locking in recent tests.

Also, don't forget to make sure your CPUs are running at C-state C1 and max
frequency. This can sometimes give up to a 4x reduction in latency.

Also, if you look at the number of threads running on an OSD node, it will be
in the tens to hundreds of threads; each OSD process itself has several
threads. So don't assume that 12 OSDs means a 12-core processor.

I did some tests to measure CPU usage per IO, which you may find useful:

http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/

I can max out 12x7.2k disks on an E3 1240 CPU and it's only running at about
15-20%.

I haven't done any proper Bluestore tests, but from some rough testing the
CPU usage wasn't too dissimilar from Filestore.

Depending on whether you are running HDDs or SSDs and how many per node, I
would possibly look at the single-socket E3s or E5s.

That said, the recent AMD and Intel announcements also put some potentially
interesting single-socket options for Ceph into the mix.

Hope that helps.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> Of Stijn De Weirdt
> Sent: 12 August 2017 14:41
> To: David Turner <drakonst...@gmail.com>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements
>
> hi david,
>
> sure i understand that. but how bad does it get when you oversubscribe
> OSDs? if context switching itself is dominant, then using HT should
> allow to run double the amount of OSDs on same CPU (on OSD per HT
> core); but if the issue is actual cpu cycles, HT won't help that much
> either (1 OSD per HT core vs 2 OSD per phys core).
>
> i guess the reason for this is that OSD processes have lots of threads?
>
> maybe i can run some tests on a ceph test cluster myself ;)
>
> stijn
>
>
> On 08/12/2017 03:13 PM, David Turner wrote:
> > The reason for an entire core peer osd is that it's trying to avoid
> > context switching your CPU to death. If you have a quad-core
> > processor with HT, I wouldn't recommend more than 8 osds on the box.
> > I probably would go with 7 myself to keep one core available for
> > system operations. This recommendation has nothing to do with GHz.
> > Higher GHz per core will likely improve your cluster latency. Of
> > course if your use case says that you only need very minimal
> > through-put... There is no need to hit or exceed the recommendation.
> > The number of cores recommendation is not changing for bluestore. It
> > might add a recommendation of how fast your processor should be...
> > But making it based on how much GHz per TB is an invitation to context
switch to death.
> >
> > On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt
> > > <stijn.dewei...@ugent.be>
> > wrote:
> >
> >> hi all,
> >>
> >> thanks for all the feedback. it's clear we should stick to the
> >> 1GB/TB for the memory.
> >>
> >> any (changes to) recommendation for the CPU? in particular, i

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread David Turner
Did you do any of that testing to involve a degraded cluster, backfilling,
peering, etc? A healthy cluster running normally uses sometimes 4x less
memory and CPU resources as a cluster consistently peering and degraded.

On Sat, Aug 12, 2017, 2:40 PM Nick Fisk <n...@fisk.me.uk> wrote:

> I was under the impression the memory requirements for Bluestore would be
> around 2-3GB per OSD regardless of capacity.
>
> CPU wise, I would lean towards working out how much total Ghz you require
> and then get whatever CPU you need to get there, but with a preference of
> Ghz over cores. Yes, there will be a slight overhead to having more threads
> running on a lower number of cores, but I believe this is fairly minimal in
> comparison to the speed boost obtained by the single threaded portion of
> the
> data path in each OSD from running on a faster Ghz core. Each PG takes a
> lock for each operation and so any other requests for the same PG will
> queue
> up and be processed sequentially. The faster you can process through this
> stage the better. I'm pretty sure if you graphed PG activity on an average
> cluster, you would see a high skew to a certain number of PG's being hit
> more often than others. I think Mark N has been experiencing the effects of
> the PG locking in recent tests.
>
> Also don't forget to make sure your CPUs are running at c-state C1 and max
> Freq. This can sometimes give up to a 4x reduction in latency.
>
> Also, if you look at the number of threads running on a OSD node, it will
> be
> in the 10's of 100's of threads, each OSD process itself has several
> threads. So don't think that 12 OSD's=12 core processor.
>
> I did some tests to measure cpu usage per IO, which you may find useful.
>
> http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/
>
> I can max out 12x7.2k disks on a E3 1240 CPU and its only running at about
> 15-20%.
>
> I haven't done any proper Bluestore tests, but from some rough testing the
> CPU usage wasn't too dissimilar from Filestore.
>
> Depending on if you are running hdd's or ssd's and how many per node. I
> would possibly look at the single socket E3's or E5's.
>
> Although saying that, the recent AMD and Intel announcements also have some
> potentially interesting single socket Ceph potentials in the mix.
>
> Hope that helps.
>
> Nick
>
> > -Original Message-
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Stijn De Weirdt
> > Sent: 12 August 2017 14:41
> > To: David Turner <drakonst...@gmail.com>; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements
> >
> > hi david,
> >
> > sure i understand that. but how bad does it get when you oversubscribe
> > OSDs? if context switching itself is dominant, then using HT should
> > allow to run double the amount of OSDs on same CPU (on OSD per HT
> > core); but if the issue is actual cpu cycles, HT won't help that much
> > either (1 OSD per HT core vs 2 OSD per phys core).
> >
> > i guess the reason for this is that OSD processes have lots of threads?
> >
> > maybe i can run some tests on a ceph test cluster myself ;)
> >
> > stijn
> >
> >
> > On 08/12/2017 03:13 PM, David Turner wrote:
> > > The reason for an entire core peer osd is that it's trying to avoid
> > > context switching your CPU to death. If you have a quad-core
> > > processor with HT, I wouldn't recommend more than 8 osds on the box.
> > > I probably would go with 7 myself to keep one core available for
> > > system operations. This recommendation has nothing to do with GHz.
> > > Higher GHz per core will likely improve your cluster latency. Of
> > > course if your use case says that you only need very minimal
> > > through-put... There is no need to hit or exceed the recommendation.
> > > The number of cores recommendation is not changing for bluestore. It
> > > might add a recommendation of how fast your processor should be...
> > > But making it based on how much GHz per TB is an invitation to context
> switch to death.
> > >
> > > On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt
> > > <stijn.dewei...@ugent.be>
> > > wrote:
> > >
> > >> hi all,
> > >>
> > >> thanks for all the feedback. it's clear we should stick to the
> > >> 1GB/TB for the memory.
> > >>
> > >> any (changes to) recommendation for the CPU? in particular, is it
> > >> still the rather vague "1 HT core per OSD" (or was it "1 1Ghz HT
> > >> core per O

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread Nick Fisk
I was under the impression the memory requirements for Bluestore would be
around 2-3GB per OSD regardless of capacity.
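
As a rough sanity check using the two figures quoted in this thread, for a
hypothetical node with 12 x 8TB OSDs (illustrative numbers only):

    existing 1GB/TB rule of thumb:  12 OSDs x 8TB x 1GB/TB  ~= 96GB
    bluestore per-OSD figure:       12 OSDs x 2-3GB         ~= 24-36GB

plus whatever headroom you want for the OS, page cache and recovery/peering
spikes (see David's point elsewhere in the thread about degraded clusters
using far more memory and CPU).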

CPU-wise, I would lean towards working out how much total GHz you require
and then getting whatever CPU you need to get there, but with a preference
for GHz over cores. Yes, there will be a slight overhead to having more
threads running on a lower number of cores, but I believe this is fairly
minimal compared to the speed boost the single-threaded portion of the
data path in each OSD gets from running on a faster core. Each PG takes a
lock for each operation, so any other requests for the same PG will queue
up and be processed sequentially; the faster you can get through this
stage the better. I'm pretty sure that if you graphed PG activity on an
average cluster, you would see a heavy skew towards certain PGs being hit
more often than others. I think Mark N has been seeing the effects of
the PG locking in recent tests.

Also, don't forget to make sure your CPUs are running at C-state C1 and max
frequency. This can sometimes give up to a 4x reduction in latency.
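
For reference, a sketch of the usual way to pin that down on an Intel box
(standard tools, but test on your own hardware rather than taking the exact
flags as gospel):

    # keep cores at max frequency
    cpupower frequency-set -g performance

    # limit deep C-states via the kernel command line (needs a reboot), e.g.
    #   intel_idle.max_cstate=1 processor.max_cstate=1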

Also, if you look at the number of threads running on an OSD node, it will be
in the tens to hundreds of threads; each OSD process itself has several
threads. So don't assume that 12 OSDs means a 12-core processor.

I did some tests to measure CPU usage per IO, which you may find useful:

http://www.sys-pro.co.uk/how-many-mhz-does-a-ceph-io-need/

I can max out 12x7.2k disks on an E3 1240 CPU and it's only running at about
15-20%.
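
To turn that kind of measurement into a sizing estimate, the arithmetic is
just cores ~= (target IOPS x MHz per IO) / MHz per core. An illustrative
shell sketch, with the per-IO cost left for you to fill in from your own
testing or the post above:

    target_iops=5000    # hypothetical workload
    mhz_per_io=2        # placeholder -- measure this yourself
    core_mhz=3500       # per-core clock of the CPU you're considering
    echo $(( target_iops * mhz_per_io / core_mhz ))   # integer result; round up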

I haven't done any proper Bluestore tests, but from some rough testing the
CPU usage wasn't too dissimilar from Filestore.

Depending on whether you are running HDDs or SSDs and how many per node, I
would possibly look at the single-socket E3s or E5s.

That said, the recent AMD and Intel announcements also put some potentially
interesting single-socket options for Ceph into the mix.

Hope that helps.

Nick

> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Stijn De Weirdt
> Sent: 12 August 2017 14:41
> To: David Turner <drakonst...@gmail.com>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] luminous/bluetsore osd memory requirements
> 
> hi david,
> 
> sure i understand that. but how bad does it get when you oversubscribe 
> OSDs? if context switching itself is dominant, then using HT should 
> allow to run double the amount of OSDs on same CPU (on OSD per HT 
> core); but if the issue is actual cpu cycles, HT won't help that much 
> either (1 OSD per HT core vs 2 OSD per phys core).
> 
> i guess the reason for this is that OSD processes have lots of threads?
> 
> maybe i can run some tests on a ceph test cluster myself ;)
> 
> stijn
> 
> 
> On 08/12/2017 03:13 PM, David Turner wrote:
> > The reason for an entire core peer osd is that it's trying to avoid 
> > context switching your CPU to death. If you have a quad-core 
> > processor with HT, I wouldn't recommend more than 8 osds on the box. 
> > I probably would go with 7 myself to keep one core available for 
> > system operations. This recommendation has nothing to do with GHz. 
> > Higher GHz per core will likely improve your cluster latency. Of 
> > course if your use case says that you only need very minimal 
> > through-put... There is no need to hit or exceed the recommendation. 
> > The number of cores recommendation is not changing for bluestore. It 
> > might add a recommendation of how fast your processor should be... 
> > But making it based on how much GHz per TB is an invitation to context
switch to death.
> >
> > On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt 
> > <stijn.dewei...@ugent.be>
> > wrote:
> >
> >> hi all,
> >>
> >> thanks for all the feedback. it's clear we should stick to the 
> >> 1GB/TB for the memory.
> >>
> >> any (changes to) recommendation for the CPU? in particular, is it 
> >> still the rather vague "1 HT core per OSD" (or was it "1 1Ghz HT 
> >> core per OSD"? it would be nice if we had some numbers like 
> >> required specint per TB and/or per Gbs. also any indication how 
> >> much more cpu EC uses (10%, 100%, ...)?
> >>
> >> i'm aware that this also depeneds on the use case, but i'll take 
> >> any pointers i can get. we will probably end up overprovisioning, 
> >> but it would be nice if we can avoid a whole cpu (32GB dimms are 
> >> cheap, so lots of ram with single socket is really possible these
days).
> >>
> >> stijn
> >>
> >> On 08/10/2017 05:30 PM, Gregory Farnum wrote:
> >>> This has been discussed a lot in the performance meetings so I've

Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread Stijn De Weirdt
hi david,

sure, i understand that. but how bad does it get when you oversubscribe
OSDs? if context switching itself is dominant, then using HT should
allow running double the number of OSDs on the same CPU (one OSD per HT
core); but if the issue is actual cpu cycles, HT won't help that much
either (1 OSD per HT core vs 2 OSDs per phys core).

i guess the reason for this is that OSD processes have lots of threads?

maybe i can run some tests on a ceph test cluster myself ;)

stijn


On 08/12/2017 03:13 PM, David Turner wrote:
> The reason for an entire core peer osd is that it's trying to avoid context
> switching your CPU to death. If you have a quad-core processor with HT, I
> wouldn't recommend more than 8 osds on the box. I probably would go with 7
> myself to keep one core available for system operations. This
> recommendation has nothing to do with GHz. Higher GHz per core will likely
> improve your cluster latency. Of course if your use case says that you only
> need very minimal through-put... There is no need to hit or exceed the
> recommendation. The number of cores recommendation is not changing for
> bluestore. It might add a recommendation of how fast your processor should
> be... But making it based on how much GHz per TB is an invitation to
> context switch to death.
> 
> On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt 
> wrote:
> 
>> hi all,
>>
>> thanks for all the feedback. it's clear we should stick to the 1GB/TB
>> for the memory.
>>
>> any (changes to) recommendation for the CPU? in particular, is it still
>> the rather vague "1 HT core per OSD" (or was it "1 1Ghz HT core per
>> OSD"? it would be nice if we had some numbers like required specint per
>> TB and/or per Gbs. also any indication how much more cpu EC uses (10%,
>> 100%, ...)?
>>
>> i'm aware that this also depeneds on the use case, but i'll take any
>> pointers i can get. we will probably end up overprovisioning, but it
>> would be nice if we can avoid a whole cpu (32GB dimms are cheap, so lots
>> of ram with single socket is really possible these days).
>>
>> stijn
>>
>> On 08/10/2017 05:30 PM, Gregory Farnum wrote:
>>> This has been discussed a lot in the performance meetings so I've added
>>> Mark to discuss. My naive recollection is that the per-terabyte
>>> recommendation will be more realistic  than it was in the past (an
>>> effective increase in memory needs), but also that it will be under much
>>> better control than previously.
>>>
>>> On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt >>
>>> wrote:
>>>
 hi all,

 we are planning to purchse new OSD hardware, and we are wondering if for
 upcoming luminous with bluestore OSDs, anything wrt the hardware
 recommendations from
 http://docs.ceph.com/docs/master/start/hardware-recommendations/
 will be different, esp the memory/cpu part. i understand from colleagues
 that the async messenger makes a big difference in memory usage (maybe
 also cpu load?); but we are also interested in the "1GB of RAM per TB"
 recommendation/requirement.

 many thanks,

 stijn


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread David Turner
The reason for an entire core per OSD is to avoid context-switching your
CPU to death. If you have a quad-core processor with HT, I wouldn't
recommend more than 8 OSDs on the box; I would probably go with 7 myself
to keep one core available for system operations. This recommendation has
nothing to do with GHz. Higher GHz per core will likely improve your
cluster latency. Of course, if your use case only needs very minimal
throughput, there is no need to hit or exceed the recommendation. The
number-of-cores recommendation is not changing for bluestore. It might
gain a recommendation for how fast your processor should be, but basing
it on GHz per TB is an invitation to context-switch to death.

On Sat, Aug 12, 2017, 8:40 AM Stijn De Weirdt 
wrote:

> hi all,
>
> thanks for all the feedback. it's clear we should stick to the 1GB/TB
> for the memory.
>
> any (changes to) recommendation for the CPU? in particular, is it still
> the rather vague "1 HT core per OSD" (or was it "1 1Ghz HT core per
> OSD"? it would be nice if we had some numbers like required specint per
> TB and/or per Gbs. also any indication how much more cpu EC uses (10%,
> 100%, ...)?
>
> i'm aware that this also depeneds on the use case, but i'll take any
> pointers i can get. we will probably end up overprovisioning, but it
> would be nice if we can avoid a whole cpu (32GB dimms are cheap, so lots
> of ram with single socket is really possible these days).
>
> stijn
>
> On 08/10/2017 05:30 PM, Gregory Farnum wrote:
> > This has been discussed a lot in the performance meetings so I've added
> > Mark to discuss. My naive recollection is that the per-terabyte
> > recommendation will be more realistic  than it was in the past (an
> > effective increase in memory needs), but also that it will be under much
> > better control than previously.
> >
> > On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt  >
> > wrote:
> >
> >> hi all,
> >>
> >> we are planning to purchse new OSD hardware, and we are wondering if for
> >> upcoming luminous with bluestore OSDs, anything wrt the hardware
> >> recommendations from
> >> http://docs.ceph.com/docs/master/start/hardware-recommendations/
> >> will be different, esp the memory/cpu part. i understand from colleagues
> >> that the async messenger makes a big difference in memory usage (maybe
> >> also cpu load?); but we are also interested in the "1GB of RAM per TB"
> >> recommendation/requirement.
> >>
> >> many thanks,
> >>
> >> stijn


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-12 Thread Stijn De Weirdt
hi all,

thanks for all the feedback. it's clear we should stick to the 1GB/TB
for the memory.

any (changes to) recommendation for the CPU? in particular, is it still
the rather vague "1 HT core per OSD" (or was it "1 1GHz HT core per
OSD"?)? it would be nice if we had some numbers like required specint per
TB and/or per Gb/s. also any indication of how much more cpu EC uses (10%,
100%, ...)?

i'm aware that this also depends on the use case, but i'll take any
pointers i can get. we will probably end up overprovisioning, but it
would be nice if we could avoid a whole cpu (32GB DIMMs are cheap, so lots
of ram with a single socket is really possible these days).

stijn

On 08/10/2017 05:30 PM, Gregory Farnum wrote:
> This has been discussed a lot in the performance meetings so I've added
> Mark to discuss. My naive recollection is that the per-terabyte
> recommendation will be more realistic  than it was in the past (an
> effective increase in memory needs), but also that it will be under much
> better control than previously.
> 
> On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt 
> wrote:
> 
>> hi all,
>>
>> we are planning to purchse new OSD hardware, and we are wondering if for
>> upcoming luminous with bluestore OSDs, anything wrt the hardware
>> recommendations from
>> http://docs.ceph.com/docs/master/start/hardware-recommendations/
>> will be different, esp the memory/cpu part. i understand from colleagues
>> that the async messenger makes a big difference in memory usage (maybe
>> also cpu load?); but we are also interested in the "1GB of RAM per TB"
>> recommendation/requirement.
>>
>> many thanks,
>>
>> stijn


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-10 Thread Gregory Farnum
This has been discussed a lot in the performance meetings, so I've added
Mark to discuss. My naive recollection is that the per-terabyte
recommendation will be more realistic than it was in the past (an
effective increase in memory needs), but also that it will be under much
better control than previously.

On Thu, Aug 10, 2017 at 1:35 AM Stijn De Weirdt 
wrote:

> hi all,
>
> we are planning to purchse new OSD hardware, and we are wondering if for
> upcoming luminous with bluestore OSDs, anything wrt the hardware
> recommendations from
> http://docs.ceph.com/docs/master/start/hardware-recommendations/
> will be different, esp the memory/cpu part. i understand from colleagues
> that the async messenger makes a big difference in memory usage (maybe
> also cpu load?); but we are also interested in the "1GB of RAM per TB"
> recommendation/requirement.
>
> many thanks,
>
> stijn


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-10 Thread Wido den Hollander

> On 10 August 2017 at 11:14, Marcus Haarmann
> <marcus.haarm...@midoco.de> wrote:
> 
> 
> Hi, 
> 
> we have done some testing with bluestore and found that the memory 
> consumption of the osd 
> processes is depending not on the real data amount stored but on the number 
> of stored 
> objects. 
> This means that e.g. a block device of 100 GB which spreads over 100 objects 
> has a different 
> memory usage than storing 1000 smaller objects (the bluestore blocksize 
> should be tuned for 
> that kind of setup). (100 objects of size 4k to 100k had a memory 
> consumption of ~4GB on the osd 
> on standard block size, while the amount of data was only ~15GB). 

Yes, the number of objects and PGs will determine how much memory an OSD will
use.

> So it depends on the usage, a cephfs stores each file as a single object, 
> while the rbd is configured 
> to allocate larger objects. 
> 

Not true in this case. Both CephFS and RBD stripe over 4MB RADOS objects. So a 
1024MB file in CephFS will result in 256 RADOS objects of 4MB in size.

This is configurable using directory layouts, but 4MB is the default.
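
For anyone who wants to confirm this on their own cluster, both of these are
stock commands (shown as a hint, not a recipe):

    # RBD: object size is reported via the image's "order" (22 -> 4MB objects)
    rbd info <pool>/<image>

    # CephFS: file and directory layouts are exposed as virtual xattrs
    getfattr -n ceph.file.layout /mnt/cephfs/some/file
    getfattr -n ceph.dir.layout  /mnt/cephfs/some/dir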

Wido

> Marcus Haarmann 
> 
> 
> Von: "Stijn De Weirdt" <stijn.dewei...@ugent.be> 
> An: "ceph-users" <ceph-users@lists.ceph.com> 
> Gesendet: Donnerstag, 10. August 2017 10:34:48 
> Betreff: [ceph-users] luminous/bluetsore osd memory requirements 
> 
> hi all, 
> 
> we are planning to purchse new OSD hardware, and we are wondering if for 
> upcoming luminous with bluestore OSDs, anything wrt the hardware 
> recommendations from 
> http://docs.ceph.com/docs/master/start/hardware-recommendations/ 
> will be different, esp the memory/cpu part. i understand from colleagues 
> that the async messenger makes a big difference in memory usage (maybe 
> also cpu load?); but we are also interested in the "1GB of RAM per TB" 
> recommendation/requirement. 
> 
> many thanks, 
> 
> stijn 


Re: [ceph-users] luminous/bluetsore osd memory requirements

2017-08-10 Thread Marcus Haarmann
Hi, 

we have done some testing with bluestore and found that the memory consumption
of the osd processes depends not on the real amount of data stored but on the
number of stored objects.
This means that e.g. a 100 GB block device which spreads over 100 objects has a
different memory usage than storing 1000 smaller objects (the bluestore block
size should be tuned for that kind of setup). (100 objects of size 4k to 100k
had a memory consumption of ~4GB on the osd at the standard block size, while
the amount of data was only ~15GB.)
So it depends on the usage: a cephfs stores each file as a single object, while
rbd is configured to allocate larger objects.
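
For anyone who wants to correlate object counts with OSD memory themselves,
something along these lines should work on a luminous-era build (the mempool
dump is an admin-socket command; check that it exists on your version first):

    # object counts per pool
    ceph df detail

    # per-OSD memory usage broken down by subsystem (bluestore cache, onodes, ...)
    ceph daemon osd.0 dump_mempools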

Marcus Haarmann 


Von: "Stijn De Weirdt" <stijn.dewei...@ugent.be> 
An: "ceph-users" <ceph-users@lists.ceph.com> 
Gesendet: Donnerstag, 10. August 2017 10:34:48 
Betreff: [ceph-users] luminous/bluetsore osd memory requirements 

hi all, 

we are planning to purchase new OSD hardware, and we are wondering if, for
the upcoming luminous with bluestore OSDs, anything w.r.t. the hardware
recommendations from
http://docs.ceph.com/docs/master/start/hardware-recommendations/
will be different, especially the memory/cpu part. i understand from colleagues
that the async messenger makes a big difference in memory usage (maybe
also cpu load?); but we are also interested in the "1GB of RAM per TB"
recommendation/requirement.

many thanks, 

stijn 


[ceph-users] luminous/bluetsore osd memory requirements

2017-08-10 Thread Stijn De Weirdt
hi all,

we are planning to purchase new OSD hardware, and we are wondering if, for
the upcoming luminous with bluestore OSDs, anything w.r.t. the hardware
recommendations from
http://docs.ceph.com/docs/master/start/hardware-recommendations/
will be different, especially the memory/cpu part. i understand from colleagues
that the async messenger makes a big difference in memory usage (maybe
also cpu load?); but we are also interested in the "1GB of RAM per TB"
recommendation/requirement.

many thanks,

stijn