[ceph-users] bluestore min alloc size vs. wasted space
I have set up a little ceph installation and added about 80k files of various sizes, then I added 1M files of 1 byte each totalling 1 MB, to see what kind of overhead is incurred per object.

The overhead for adding 1M objects seems to be 12252M/1M = 0.012252M, or about 12 kB per file, which is a bit high, but roughly in line with a min allocation size of 4 kB once 2x replication and per-object metadata are accounted for.

My ceph.conf file contained this line from when I initially deployed the cluster:

    bluestore min alloc size = 4096

How do I set the min alloc size, if not in the ceph.conf file? Is it possible to change bluestore min alloc size for an existing cluster? How?

Even at this level of overhead I'm nowhere near the 1129 kB per file that was lost with the real data.

GLOBAL:
    SIZE     AVAIL    RAW USED    %RAW USED    OBJECTS
    273G     253G     19906M      7.12         81059
POOLS:
    NAME                  ID    QUOTA OBJECTS    QUOTA BYTES    USED      %USED    MAX AVAIL    OBJECTS    DIRTY    READ     WRITE    RAW USED
    .rgw.root             1     N/A              N/A            1113      0        120G         4          4        108      4        2226
    default.rgw.control   2     N/A              N/A            0         0        120G         8          8        0        0        0
    default.rgw.meta      3     N/A              N/A            0         0        120G         0          0        0        0        0
    default.rgw.log       4     N/A              N/A            0         0        120G         207        207      54085    36014    0
    fs1_data              5     N/A              N/A            7890M     3.11     120G         80001      80001    0        715k     15781M
    fs1_metadata          6     N/A              N/A            40951k    0.02     120G         839        839      682      103k     81902k

Overhead per object: (19586M-15781M) / 81059 = 0.046M = 46 kB per object

Added 1M files of 1 byte each totalling 1 MB:

GLOBAL:
    SIZE     AVAIL    RAW USED    %RAW USED    OBJECTS
    273G     241G     32158M      11.50        1056k
POOLS:
    NAME                  ID    QUOTA OBJECTS    QUOTA BYTES    USED      %USED    MAX AVAIL    OBJECTS    DIRTY    READ     WRITE    RAW USED
    .rgw.root             1     N/A              N/A            1113      0        114G         4          4        108      4        2226
    default.rgw.control   2     N/A              N/A            0         0        114G         8          8        0        0        0
    default.rgw.meta      3     N/A              N/A            0         0        114G         0          0        0        0        0
    default.rgw.log       4     N/A              N/A            0         0        114G         207        207      56374    37540    0
    fs1_data              5     N/A              N/A            7891M     3.27     114G         1080001    1054k    287k     3645k    15783M
    fs1_metadata          6     N/A              N/A            29854k    0.01     114G         1837       1837     5739     118k     59708k

Delta:
    fs1_data:     +2M raw space, as expected
    fs1_metadata: -22M raw space, because who the fuck knows?
    RAW USED:     +12252M

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
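For reference, one way to check which min alloc size a given OSD actually picked up is to ask it over its admin socket. This is a sketch; the OSD id, admin-socket access on the OSD host, and the hdd/ssd-specific option names are assumptions about the deployment:

    # Show the effective values for a running OSD.
    # bluestore_min_alloc_size = 0 means "use the hdd/ssd-specific default".
    ceph daemon osd.0 config get bluestore_min_alloc_size
    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
    ceph daemon osd.0 config get bluestore_min_alloc_size_ssd

Note that the value is baked into an OSD when it is created (at mkfs time), so changing it in ceph.conf only affects OSDs provisioned afterwards; existing OSDs would have to be destroyed and re-created to pick up a new value.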
Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs
I didn't know about ceph df detail, that's quite useful, thanks.

I was thinking that the problem had to do with some sort of internal fragmentation, because the filesystem in question does have millions (2.9M or thereabouts) of files. However, even if 4k is lost for each file, that only amounts to about 23 GB of raw space lost, and I have 3276 GB of raw space unaccounted for.

I've researched the min alloc option a bit, and even though no documentation seems to exist, I've found that the default is 64k for hdd. But even if the lost space per file is 64k, and that's mirrored, I can only account for 371 GB, so that doesn't really help a great deal.

I have set up an experimental cluster with "bluestore min alloc size = 4096" and so far I've been unable to make it lose space like the first cluster.

I'm very worried that ceph is unusable because of this issue.

On 19/02/18 19:38, Pavel Shub wrote:
> Could you be running into block size (minimum allocation unit) overhead?
> Default bluestore block size is 64k for hdd and 4k for ssd. This is
> exacerbated if you have tons of small files. I tend to see this when the
> "ceph df detail" sum of raw used in pools is less than the global raw
> bytes used.
>
> On Mon, Feb 19, 2018 at 2:09 AM, Flemming Frandsen
> <flemming.frand...@stibosystems.com> wrote:
>> Each OSD lives on a separate HDD in bluestore with the journals on 2GB
>> partitions on a shared SSD.
>>
>> [...]

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
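One way to quantify the gap Pavel describes is to compare the global raw usage against the sum of per-pool raw usage from the JSON output. A sketch, assuming jq is installed and using Luminous-era JSON field names, which may differ in other releases:

    # If global_raw is much larger than pool_raw_sum, space is being lost
    # below the pool layer, e.g. to BlueStore's allocation granularity.
    ceph df detail -f json | jq '
      .stats.total_used_bytes as $global
      | ([.pools[].stats.raw_bytes_used] | add) as $pools
      | {global_raw: $global, pool_raw_sum: $pools,
         unaccounted: ($global - $pools)}'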
Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs
Each OSD lives on a separate HDD in bluestore with the journals on 2GB partitions on a shared SSD.

On 16/02/18 21:08, Gregory Farnum wrote:
> What does the cluster deployment look like? Usually this happens when
> you're sharing disks with the OS, or have co-located file journals or
> something.
>
> On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen
> <flemming.frand...@stibosystems.com> wrote:
>> I'm trying out cephfs and I'm in the process of copying over some
>> real-world data to see what happens.
>>
>> [...]

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
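For a BlueStore OSD the "journal" partitions are really the RocksDB DB/WAL devices. A sketch of how one might confirm which devices an OSD is actually using, assuming default /var/lib/ceph paths and OSD id 0:

    # block is the main data device; block.db / block.wal, if present,
    # are symlinks to the SSD partitions.
    ls -l /var/lib/ceph/osd/ceph-0/block*

    # Read the BlueStore label to confirm each device's role and size.
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db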
[ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs
I'm trying out cephfs and I'm in the process of copying over some real-world data to see what happens.

I have created a number of cephfs file systems; the only one I've started working on is the one named jenkins, which lives in fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools, while the raw used space is 5918 GB, which gives a ratio of about 4.3. I would have expected a ratio around 2, as the pool size has been set to 2.

Can anyone explain where half my space has been squandered?

> ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED    %RAW USED
    8382G     2463G     5918G       70.61
POOLS:
    NAME                       ID    USED      %USED    MAX AVAIL    OBJECTS
    .rgw.root                  1     1113      0        258G         4
    default.rgw.control        2     0         0        258G         8
    default.rgw.meta           3     0         0        258G         0
    default.rgw.log            4     0         0        258G         207
    fs_docker-nexus_data       5     66120M    11.09    258G         22655
    fs_docker-nexus_metadata   6     39463k    0        258G         2376
    fs_meta_data               7     330       0        258G         4
    fs_meta_metadata           8     567k      0        258G         22
    fs_jenkins_data            9     1321G     71.84    258G         28576278
    fs_jenkins_metadata        10    52178k    0        258G         2285493
    fs_nexus_data              11    0         0        258G         0
    fs_nexus_metadata          12    4181      0        258G         21

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
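A quick way to compute the ratio in question straight from the cluster, as a sketch assuming jq is installed and Luminous-era JSON field names:

    # With 2x replication and no hidden overhead this ratio should be ~2;
    # in the cluster above it comes out around 4.3.
    ceph df -f json | jq '
      .stats.total_used_bytes as $raw
      | ([.pools[].stats.bytes_used] | add) as $logical
      | {raw_used: $raw, pool_data: $logical, ratio: ($raw / $logical)}'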
Re: [ceph-users] Changing osd crush chooseleaf type at runtime
Ah! Right, I guess my actual question was: how does "osd crush chooseleaf type" = 0 or 1 alter the crushmap?

By experimentation I've figured out that:

    "osd crush chooseleaf type = 0" turns into "step choose firstn 0 type osd"
    "osd crush chooseleaf type = 1" turns into "step chooseleaf firstn 0 type host"

Changing the crushmap in this way worked perfectly for me; ceph -s complained while doing the rebalancing, but eventually became happy with the result.

On 02/02/18 17:07, Gregory Farnum wrote:
> Once you've created a crush map you need to edit it directly (either by
> dumping it from the cluster, editing with the crush tool, and importing;
> or via the ceph cli commands), rather than by updating config settings.
> I believe doing so is explained in the ceph docs.
>
> On Fri, Feb 2, 2018 at 4:47 AM Flemming Frandsen
> <flemming.frand...@stibosystems.com> wrote:
>> Hi, I'm just starting to play around with Ceph, so please excuse my
>> complete lack of a clue if this question is covered somewhere, but I
>> have been unable to find an answer.
>>
>> [...]

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
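For reference, the dump/edit/import cycle Gregory describes, with the exact rule change reported in the message above, looks roughly like this (a sketch; the rule text and sed pattern depend on the actual decompiled map):

    # Fetch and decompile the current crush map.
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # In the replicated rule, replace the per-OSD step with a per-host
    # chooseleaf step (the change described above).
    sed -i 's/step choose firstn 0 type osd/step chooseleaf firstn 0 type host/' crushmap.txt

    # Recompile and inject the new map; a rebalance follows.
    crushtool -c crushmap.txt -o crushmap.new.bin
    ceph osd setcrushmap -i crushmap.new.bin
    ceph -s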
[ceph-users] Changing osd crush chooseleaf type at runtime
Hi, I'm just starting to play around with Ceph, so please excuse my complete lack of a clue if this question is covered somewhere, but I have been unable to find an answer.

I have a single machine running Ceph which was set up with

    osd crush chooseleaf type = 0

in /etc/ceph/ceph.conf. Now I've added a new machine with some new OSDs, so I'd like to change to

    osd crush chooseleaf type = 1

and have Ceph re-balance the replicas. How do I do that? Preferably I'd like to make the change without making the cluster unavailable.

So far I've edited the config file and tried restarting daemons, including rebooting the entire OS, but I still see PGs that live only on one host.

I've read the config documentation page, but it doesn't mention what to do to make that specific config change take effect: http://docs.ceph.com/docs/master/rados/configuration/ceph-conf/

I've barked up the crushmap tree a bit, but I did not see how "osd crush chooseleaf type" relates to that in any way.

-- 
Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager

Please use rele...@stibo.com for all Release Management requests

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
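One way to see whether a PG's replicas actually span hosts is to inspect its acting set and look up where those OSDs sit in the crush hierarchy. A sketch; the PG id 0.1f is a placeholder, and the pgs_brief column positions are assumed from a Luminous-era dump:

    # List PGs with their acting OSD sets (pgid in column 1, acting set
    # in column 5).
    ceph pg dump pgs_brief 2>/dev/null | awk '{print $1, $5}' | head

    # Map one PG, then find the crush location (host) of each acting OSD.
    ceph pg map 0.1f
    ceph osd find 0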