[ceph-users] bluestore min alloc size vs. wasted space

2018-02-20 Thread Flemming Frandsen
I have set up a small Ceph installation and added about 80k files of
various sizes; then I added 1M files of 1 byte each, totalling 1 MB, to
see what kind of overhead is incurred per object.


The overhead for adding 1M objects seems to be 12252M/1000000 =
0.012252M, or roughly 12 kB per file, which is a bit high for 1-byte
objects, but in line with a 4 kB min allocation size per replica plus
some per-object metadata.



My ceph.conf file contained this line from when I initially deployed the 
cluster:

bluestore min alloc size = 4096

How do I set the min alloc size if not in the ceph.conf file?

Is it possible to change bluestore min alloc size for an existing 
cluster? How?
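
For reference, a sketch of how this is usually handled (not verified on
this cluster; osd.0 is just an example daemon): as far as I can tell the
min alloc size is baked into each OSD at creation (mkfs) time, so it has
to be in the [osd] section of ceph.conf before the OSDs are deployed,
and an existing OSD has to be destroyed and re-created for a new value
to take effect. The admin socket shows what a running OSD actually uses:

    # ceph.conf, in place before the OSDs are created (values are examples):
    [osd]
    bluestore min alloc size hdd = 4096
    bluestore min alloc size ssd = 4096

    # Ask a running OSD what it actually ended up with:
    ceph daemon osd.0 config get bluestore_min_alloc_size
    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
    ceph daemon osd.0 config get bluestore_min_alloc_size_ssd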



Even at this level of overhead I'm nowhere near the 1129 kB per file
that was lost with the real data.



GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
    273G  253G   19906M    7.12       81059
POOLS:
    NAME                 ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS  DIRTY  READ   WRITE  RAW USED
    .rgw.root            1   N/A            N/A          1113    0      120G       4        4      108    4      2226
    default.rgw.control  2   N/A            N/A          0       0      120G       8        8      0      0      0
    default.rgw.meta     3   N/A            N/A          0       0      120G       0        0      0      0      0
    default.rgw.log      4   N/A            N/A          0       0      120G       207      207    54085  36014  0
    fs1_data             5   N/A            N/A          7890M   3.11   120G       80001    80001  0      715k   15781M
    fs1_metadata         6   N/A            N/A          40951k  0.02   120G       839      839    682    103k   81902k


Overhead per object: (19586M-15781M) / 81059 = 0.046M = 46 kB per object



Added 1M files of 1 byte each totalling 1 MB:


GLOBAL:
    SIZE  AVAIL  RAW USED  %RAW USED  OBJECTS
    273G  241G   32158M    11.50      1056k
POOLS:
    NAME                 ID  QUOTA OBJECTS  QUOTA BYTES  USED    %USED  MAX AVAIL  OBJECTS  DIRTY  READ   WRITE  RAW USED
    .rgw.root            1   N/A            N/A          1113    0      114G       4        4      108    4      2226
    default.rgw.control  2   N/A            N/A          0       0      114G       8        8      0      0      0
    default.rgw.meta     3   N/A            N/A          0       0      114G       0        0      0      0      0
    default.rgw.log      4   N/A            N/A          0       0      114G       207      207    56374  37540  0
    fs1_data             5   N/A            N/A          7891M   3.27   114G       1080001  1054k  287k   3645k  15783M
    fs1_metadata         6   N/A            N/A          29854k  0.01   114G       1837     1837   5739   118k   59708k


Delta:
   fs1_data: +2M raw space as expected
   fs1_metadata: -22M raw space, for reasons I can't explain
   RAW USED: +12252M

--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-19 Thread Flemming Frandsen

I didn't know about ceph df detail; that's quite useful, thanks.

I was thinking that the problem had to do with some sort of internal
fragmentation, because the filesystem in question does have millions
(2.9 M or thereabouts) of files. However, even if 4k is lost for each
file, that only amounts to about 23 GB of raw space lost, and I have
3276 GB of raw space unaccounted for.


I've researched the min alloc option a bit and, even though no
documentation seems to exist, I've found that the default is 64k for
hdd. But even if the lost space per file is 64k and that's mirrored, I
can only account for 371 GB, so that doesn't really help a great deal.
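
For reference, a quick back-of-the-envelope check of those two figures
(2.9M files, two replicas, results in 1000-based GB):

    echo "2.9 * 10^6 * 4  * 2 / 10^6" | bc -l   #  4k lost per file, mirrored ->  ~23 GB
    echo "2.9 * 10^6 * 64 * 2 / 10^6" | bc -l   # 64k lost per file, mirrored -> ~371 GB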


I have set up an experimental cluster with "bluestore min alloc size = 
4096" and so far I've been unable to make it lose space like the first 
cluster.



I'm very worried that ceph is unusable because of this issue.



On 19/02/18 19:38, Pavel Shub wrote:

Could you be running into block size (minimum allocation unit)
overhead? Default bluestore block size is 4k for hdd and 64k for ssd.
This is exacerbated if you have tons of small files. I tend to see
this when "ceph df detail" sum of raw used in pools is less than the
global raw bytes used.

On Mon, Feb 19, 2018 at 2:09 AM, Flemming Frandsen
<flemming.frand...@stibosystems.com> wrote:

Each OSD lives on a separate HDD in bluestore with the journals on 2GB
partitions on a shared SSD.


On 16/02/18 21:08, Gregory Farnum wrote:

What does the cluster deployment look like? Usually this happens when you’re
sharing disks with the OS, or have co-located file journals or something.
On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen
<flemming.frand...@stibosystems.com> wrote:

I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.

I have created a number of cephfs file systems, the only one I've
started working on is the one called jenkins specifically the one named
jenkins which lives in fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about 4.3, I
would have expected a ratio around 2 as the pool size has been set to 2.


Can anyone explain where half my space has been squandered?

> ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    8382G  2463G  5918G     70.61
POOLS:
    NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
    .rgw.root                 1   1113    0      258G       4
    default.rgw.control       2   0       0      258G       8
    default.rgw.meta          3   0       0      258G       0
    default.rgw.log           4   0       0      258G       207
    fs_docker-nexus_data      5   66120M  11.09  258G       22655
    fs_docker-nexus_metadata  6   39463k  0      258G       2376
    fs_meta_data              7   330     0      258G       4
    fs_meta_metadata          8   567k    0      258G       22
    fs_jenkins_data           9   1321G   71.84  258G       28576278
    fs_jenkins_metadata       10  52178k  0      258G       2285493
    fs_nexus_data             11  0       0      258G       0
    fs_nexus_metadata         12  4181    0      258G       21

--
   Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
   Please use rele...@stibo.com for all Release Management requests



--
  Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
  Please use rele...@stibo.com for all Release Management requests





--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-18 Thread Flemming Frandsen
Each OSD lives on a separate HDD in bluestore with the journals on 2GB 
partitions on a shared SSD.



On 16/02/18 21:08, Gregory Farnum wrote:
What does the cluster deployment look like? Usually this happens when 
you’re sharing disks with the OS, or have co-located file journals or 
something.
On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen 
<flemming.frand...@stibosystems.com 
<mailto:flemming.frand...@stibosystems.com>> wrote:


I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.

I have created a number of cephfs file systems, the only one I've
started working on is the one called jenkins specifically the one
named
jenkins which lives in fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about
4.3, I
would have expected a ratio around 2 as the pool size has been set
to 2.


Can anyone explain where half my space has been squandered?

> ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    8382G  2463G  5918G     70.61
POOLS:
    NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
    .rgw.root                 1   1113    0      258G       4
    default.rgw.control       2   0       0      258G       8
    default.rgw.meta          3   0       0      258G       0
    default.rgw.log           4   0       0      258G       207
    fs_docker-nexus_data      5   66120M  11.09  258G       22655
    fs_docker-nexus_metadata  6   39463k  0      258G       2376
    fs_meta_data              7   330     0      258G       4
    fs_meta_metadata          8   567k    0      258G       22
    fs_jenkins_data           9   1321G   71.84  258G       28576278
    fs_jenkins_metadata       10  52178k  0      258G       2285493
    fs_nexus_data             11  0       0      258G       0
    fs_nexus_metadata         12  4181    0      258G       21


--
  Regards Flemming Frandsen - Stibo Systems - DK - STEP Release
Manager
  Please use rele...@stibo.com <mailto:rele...@stibo.com> for all
Release Management requests




--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-16 Thread Flemming Frandsen
I'm trying out cephfs and I'm in the process of copying over some 
real-world data to see what happens.


I have created a number of cephfs file systems; the only one I've
started working on is the one named jenkins, which lives in
fs_jenkins_data and fs_jenkins_metadata.


According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about 4.3.
I would have expected a ratio around 2, as the pool size has been set
to 2.



Can anyone explain where half my space has been squandered?

> ceph df
GLOBAL:
    SIZE   AVAIL  RAW USED  %RAW USED
    8382G  2463G  5918G     70.61
POOLS:
    NAME                      ID  USED    %USED  MAX AVAIL  OBJECTS
    .rgw.root                 1   1113    0      258G       4
    default.rgw.control       2   0       0      258G       8
    default.rgw.meta          3   0       0      258G       0
    default.rgw.log           4   0       0      258G       207
    fs_docker-nexus_data      5   66120M  11.09  258G       22655
    fs_docker-nexus_metadata  6   39463k  0      258G       2376
    fs_meta_data              7   330     0      258G       4
    fs_meta_metadata          8   567k    0      258G       22
    fs_jenkins_data           9   1321G   71.84  258G       28576278
    fs_jenkins_metadata       10  52178k  0      258G       2285493
    fs_nexus_data             11  0       0      258G       0
    fs_nexus_metadata         12  4181    0      258G       21
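
For reference, a quick check of that ratio from the USED column above;
only the two largest pools contribute anything meaningful:

    echo "66120/1024 + 1321" | bc -l            # ~1386 GB of data across the pools
    echo "5918 / (66120/1024 + 1321)" | bc -l   # raw used / data ratio, ~4.27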

--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Changing osd crush chooseleaf type at runtime

2018-02-06 Thread Flemming Frandsen

Ah! Right, I guess my actual question was:

How do the settings osd crush chooseleaf type = 0 and = 1 alter the crushmap?


By experimentation I've figured out that:

"osd crush chooseleaf type = 0" turns into "step choose firstn 0 type 
osd" and


"osd crush chooseleaf type = 1" turns into "step chooseleaf firstn 0 
type host".



Changing the crushmap in this way worked perfectly for me; ceph -s
complained while the data was rebalancing, but eventually became happy
with the result.
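
For reference, a sketch of the decompile/edit/recompile round trip used
for that change (file names are arbitrary):

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # In crush.txt the replicated rule's placement step changes from
    #     step choose firstn 0 type osd          (chooseleaf type = 0)
    # to
    #     step chooseleaf firstn 0 type host     (chooseleaf type = 1)

    crushtool -c crush.txt -o crush-new.bin
    ceph osd setcrushmap -i crush-new.bin

    # Then watch the resulting rebalance settle:
    ceph -s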



On 02/02/18 17:07, Gregory Farnum wrote:
Once you've created a crush map you need to edit it directly (either 
by dumping it from the cluster, editing with the crush tool, and 
importing; or via the ceph cli commands), rather than by updating 
config settings. I believe doing so is explained in the ceph docs.


On Fri, Feb 2, 2018 at 4:47 AM Flemming Frandsen 
<flemming.frand...@stibosystems.com 
<mailto:flemming.frand...@stibosystems.com>> wrote:


Hi, I'm just starting to play around with Ceph, so please excuse my
complete lack of a clue if this question is covered somewhere, but I
have been unable to find an answer.


I have a single machine running Ceph which was set up with osd crush
chooseleaf type = 0 in /etc/ceph/ceph.conf, now I've added a new
machine
with some new OSDs, so I'd like to change to osd crush chooseleaf
type =
1 and have Ceph re-balance the replicas.

How do I do that?

Preferably I'd like to make the change without making the cluster
unavailable.


So far I've edited the config file and tried restarting daemons,
including rebooting the entire OS, but I still see PGs that live
only on
one host.

I've read the config documentation page but it doesn't mention what to
do to make that specific config change take effect:

http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/

I've barked up the crushmap tree a bit, but I did not see how "osd
crush
chooseleaf type" relates to that in any way.


--
  Regards Flemming Frandsen - Stibo Systems - DK - STEP Release
Manager
  Please use rele...@stibo.com <mailto:rele...@stibo.com> for all
Release Management requests




--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Changing osd crush chooseleaf type at runtime

2018-02-02 Thread Flemming Frandsen
Hi, I'm just starting to play around with Ceph, so please excuse my 
complete lack of a clue if this question is covered somewhere, but I 
have been unable to find an answer.



I have a single machine running Ceph which was set up with osd crush
chooseleaf type = 0 in /etc/ceph/ceph.conf. Now I've added a new machine
with some new OSDs, so I'd like to change to osd crush chooseleaf type =
1 and have Ceph re-balance the replicas.


How do I do that?

Preferably I'd like to make the change without making the cluster 
unavailable.



So far I've edited the config file and tried restarting daemons, 
including rebooting the entire OS, but I still see PGs that live only on 
one host.


I've read the config documentation page but it doesn't mention what to
do to make that specific config change take effect:


http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/

I've barked up the crushmap tree a bit, but I did not see how "osd crush 
chooseleaf type" relates to that in any way.



--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com