Re: [ceph-users] Bluestore with so many small files

2019-04-23 Thread Frédéric Nass
Hi, 

You probably forgot to recreate the OSDs after changing 
bluestore_min_alloc_size. The value is only applied when an OSD is first 
created (at mkfs time), so existing OSDs keep their original allocation size. 

Regards, 
Frédéric. 

- On 22 Apr 19, at 5:41, 刘 俊 wrote: 

> Hi All,
> I still see this issue with the latest Ceph Luminous releases, 12.2.11 and 12.2.12.
> I had set bluestore_min_alloc_size = 4096 before the test.
> When I write 100,000 small objects of less than 64KB each through RGW, the RAW USED
> shown in "ceph df" looks incorrect.
> For example, I ran the test three times, cleaning up the RGW data pool each time;
> the object size was 4KB the first time, 32KB the second, and 64KB the third.
> The RAW USED shown in "ceph df" was the same every time: always about 18GB, i.e.
> 64KB * 100,000 * 3. (The replication factor is 3 here.)
> Any thoughts?
> Jamie

[ceph-users] Bluestore with so many small files

2019-04-21 Thread 刘 俊
Hi All,

I still see this issue with the latest Ceph Luminous releases, 12.2.11 and 12.2.12.

I had set bluestore_min_alloc_size = 4096 before the test.

When I write 100,000 small objects of less than 64KB each through RGW, the RAW USED 
shown in "ceph df" looks incorrect.

For example, I ran the test three times, cleaning up the RGW data pool each time; 
the object size was 4KB the first time, 32KB the second, and 64KB the third.

The RAW USED shown in "ceph df" was the same every time: always about 18GB, i.e. 
64KB * 100,000 * 3. (The replication factor is 3 here.)
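
For reference, a quick back-of-the-envelope check (a sketch in Python; the 
100,000 object count is an assumption inferred from the 18GB figure) shows 
that RAW USED comes out the same for all three object sizes whenever every 
object is rounded up to a 64KB allocation unit:

# Sanity check: 100,000 objects per run, 3x replication, 64KB alloc unit.
KB, GB = 1024, 1024**3
objects, replicas, min_alloc = 100_000, 3, 64 * KB

for obj_size in (4 * KB, 32 * KB, 64 * KB):
    allocated = max(obj_size, min_alloc)   # small objects padded to 64KB
    raw_used = objects * allocated * replicas
    print("%2dKB objects -> RAW USED %.1f GB" % (obj_size // KB, raw_used / GB))

# All three runs print ~18.3 GB, matching the 18GB from "ceph df". That the
# 4KB run is not ~1.1 GB suggests the OSDs still use the 64KB default, which
# fits the reply above: min_alloc_size only takes effect when an OSD is
# (re)created.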

Any thoughts?

Jamie



Re: [ceph-users] Bluestore with so many small files

2018-02-13 Thread Igor Fedotov

Hi Behnam,

On 2/12/2018 4:06 PM, Behnam Loghmani wrote:

Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of very small thumbnails on Ceph with radosgw.

The actual size of the files is about 32G, but they filled 70G of each OSD.

What's the reason for this high disk usage?
Most probably the major reason is BlueStore allocation granularity. E.g. 
an object of 1K bytes needs 64K of disk space if the default 
bluestore_min_alloc_size_hdd (=64K) is applied.
An additional inconsistency in space reporting can also appear because 
BlueStore adds the DB volume space when accounting for total store space, 
while free space is taken from the block device only. As a result, the 
reported "Used" space always contains that total DB space part (i.e. 
Used = Total(Block+DB) - Free(Block)). That correlates with other 
comments in this thread about RocksDB space usage.

There is a pending PR to fix that:
https://github.com/ceph/ceph/pull/19454/commits/144fb9663778f833782bdcb16acd707c3ed62a86
You can also search this mailing list for "Bluestore: inaccurate disk usage 
statistics problem" for previous discussion.
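
To make both effects concrete, here is a small illustrative sketch (Python; 
not actual BlueStore code) of the rounding and of the "Used" accounting 
described above:

# Illustrative only, not actual BlueStore code.
def allocated_size(object_bytes, min_alloc=64 * 1024):
    """Disk space an object consumes: its length rounded up to whole
    min_alloc units, so even a tiny object occupies one full unit."""
    units = (object_bytes + min_alloc - 1) // min_alloc  # ceiling division
    return max(units, 1) * min_alloc

print(allocated_size(1024))  # 65536: a 1K object occupies 64K on disk

def used_bytes(block_total, db_total, block_free):
    """Used = Total(Block+DB) - Free(Block): the DB volume is counted
    as used in full, which inflates the reported figure."""
    return (block_total + db_total) - block_free

GiB = 1024 ** 3
# An otherwise empty OSD with a 10G DB volume already reports 10G "Used":
print(used_bytes(100 * GiB, 10 * GiB, 100 * GiB) / GiB)  # 10.0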


Should I change "bluestore_min_alloc_size_hdd"? And if I change it and 
set it to a smaller size, does it impact performance?
Unfortunately I haven't benchmarked "small writes over HDD" cases much, 
hence I don't have an exact answer here. Indeed, this 'min_alloc_size' 
family of parameters might impact performance quite significantly.


What is the best practice for storing small files on BlueStore?

Best regards,
Behnam Loghmani




Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread Wido den Hollander



On 02/12/2018 03:16 PM, Behnam Loghmani wrote:
So you mean that RocksDB and the osdmap filled about 40G of disk for only 800k 
files?

I think that's not reasonable; it's too high.


Could you check the output of the OSDs using a 'perf dump' on their 
admin socket?


The 'bluestore' and 'bluefs' sections should tell you:

- db_used_bytes
- onodes

Using those values you can figure out how much data the DB is using and 
how many objects you have in the OSD.
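
For example, something along these lines (a sketch; it assumes the counters 
are exposed as db_used_bytes under 'bluefs' and bluestore_onodes under 
'bluestore', and counter names can differ between releases):

# Pull DB usage and onode counts from local OSDs over the admin socket.
import json
import subprocess

def perf_dump(osd_id):
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
    return json.loads(out)

for osd_id in (0, 1, 2):  # adjust to the OSD ids on this host
    stats = perf_dump(osd_id)
    db_used = stats.get("bluefs", {}).get("db_used_bytes", 0)
    onodes = stats.get("bluestore", {}).get("bluestore_onodes", 0)
    print("osd.%d: db_used = %.2f GiB, onodes = %d"
          % (osd_id, db_used / 2.0 ** 30, onodes))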


Wido



On Mon, Feb 12, 2018 at 5:06 PM, David Turner wrote:


Some of your overhead is the WAL and RocksDB that are on the OSDs.
The WAL is pretty static in size, but RocksDB grows with the number
of objects you have. You also have copies of the osdmap on each OSD.
There's just overhead that adds up. The biggest is going to be
RocksDB, given how many objects you have.


On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani wrote:

Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of very small thumbnails on Ceph with
radosgw.

The actual size of the files is about 32G, but they filled 70G of
each OSD.

What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change
it and set it to a smaller size, does it impact performance?

What is the best practice for storing small files on BlueStore?

Best regards,
Behnam Loghmani


Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread Behnam Loghmani
So you mean that RocksDB and the osdmap filled about 40G of disk for only 800k
files?
I think that's not reasonable; it's too high.
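
As a rough check, allocation granularity alone accounts for much of that 
gap (a sketch; it assumes ~800k objects per OSD, the 64KB 
bluestore_min_alloc_size_hdd default, and ~32G of actual data per replica):

# Rough estimate only, based on the numbers quoted in this thread.
KiB, GiB = 1024, 1024 ** 3
objects = 800_000
min_alloc = 64 * KiB
actual_data = 32 * GiB

allocated = objects * min_alloc    # each object occupies at least one unit
padding = allocated - actual_data  # space lost to allocation granularity
print("allocated: %.1f GiB" % (allocated / GiB))  # ~48.8 GiB
print("padding:   %.1f GiB" % (padding / GiB))    # ~16.8 GiB

# The remaining gap up to the reported ~70G per OSD is RocksDB metadata,
# the WAL and osdmap copies, which grow with the object count.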

On Mon, Feb 12, 2018 at 5:06 PM, David Turner wrote:

> Some of your overhead is the WAL and RocksDB that are on the OSDs. The WAL
> is pretty static in size, but RocksDB grows with the number of objects you
> have. You also have copies of the osdmap on each OSD. There's just overhead
> that adds up. The biggest is going to be RocksDB, given how many objects you
> have.
>
> On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani wrote:
>
>> Hi there,
>>
>> I am using ceph Luminous 12.2.2 with:
>>
>> 3 osds (each osd is 100G) - no WAL/DB separation.
>> 3 mons
>> 1 rgw
>> cluster size 3
>>
>> I stored lots of very small thumbnails on Ceph with radosgw.
>>
>> The actual size of the files is about 32G, but they filled 70G of each OSD.
>>
>> What's the reason for this high disk usage?
>> Should I change "bluestore_min_alloc_size_hdd"? And if I change it and
>> set it to a smaller size, does it impact performance?
>>
>> What is the best practice for storing small files on BlueStore?
>>
>> Best regards,
>> Behnam Loghmani


Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread David Turner
Some of your overhead is the WAL and RocksDB that are on the OSDs. The WAL
is pretty static in size, but RocksDB grows with the number of objects you
have. You also have copies of the osdmap on each OSD. There's just overhead
that adds up. The biggest is going to be RocksDB, given how many objects you
have.

On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani wrote:

> Hi there,
>
> I am using ceph Luminous 12.2.2 with:
>
> 3 osds (each osd is 100G) - no WAL/DB separation.
> 3 mons
> 1 rgw
> cluster size 3
>
> I stored lots of very small thumbnails on Ceph with radosgw.
>
> The actual size of the files is about 32G, but they filled 70G of each OSD.
>
> What's the reason for this high disk usage?
> Should I change "bluestore_min_alloc_size_hdd"? And if I change it and set
> it to a smaller size, does it impact performance?
>
> What is the best practice for storing small files on BlueStore?
>
> Best regards,
> Behnam Loghmani


[ceph-users] Bluestore with so many small files

2018-02-12 Thread Behnam Loghmani
Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of very small thumbnails on Ceph with radosgw.

The actual size of the files is about 32G, but they filled 70G of each OSD.

What's the reason for this high disk usage?
Should I change "bluestore_min_alloc_size_hdd"? And if I change it and set
it to a smaller size, does it impact performance?

What is the best practice for storing small files on BlueStore?

Best regards,
Behnam Loghmani