Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
> Do you by any chance have your OSDs placed at a local directory path
> rather than on a dedicated, otherwise unused physical disk?

No, I have 18 disks per server. Each OSD is mapped to a physical disk. Here is the output of one server:

ansible@zrh-srv-m-cph02:~$ df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/vg01-root     28G  4.5G   22G  18% /
none                     4.0K     0  4.0K   0% /sys/fs/cgroup
udev                      48G  4.0K   48G   1% /dev
tmpfs                    9.5G  1.3M  9.5G   1% /run
none                     5.0M     0  5.0M   0% /run/lock
none                      48G   20K   48G   1% /run/shm
none                     100M     0  100M   0% /run/user
/dev/mapper/vg01-tmp     4.5G  9.4M  4.3G   1% /tmp
/dev/mapper/vg01-varlog  9.1G  5.1G  3.6G  59% /var/log
/dev/sdf1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-3
/dev/sdg1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-4
/dev/sdl1                932G   13G  919G   2% /var/lib/ceph/osd/ceph-8
/dev/sdo1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-11
/dev/sde1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-2
/dev/sdd1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-1
/dev/sdt1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-15
/dev/sdq1                932G   12G  920G   2% /var/lib/ceph/osd/ceph-12
/dev/sdc1                932G   14G  918G   2% /var/lib/ceph/osd/ceph-0
/dev/sds1                932G   17G  916G   2% /var/lib/ceph/osd/ceph-14
/dev/sdu1                932G   14G  918G   2% /var/lib/ceph/osd/ceph-16
/dev/sdm1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-9
/dev/sdk1                932G   17G  915G   2% /var/lib/ceph/osd/ceph-7
/dev/sdn1                932G   14G  918G   2% /var/lib/ceph/osd/ceph-10
/dev/sdr1                932G   15G  917G   2% /var/lib/ceph/osd/ceph-13
/dev/sdv1                932G   14G  918G   2% /var/lib/ceph/osd/ceph-17
/dev/sdh1                932G   17G  916G   2% /var/lib/ceph/osd/ceph-5
/dev/sdj1                932G   14G  918G   2% /var/lib/ceph/osd/ceph-30
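For what it's worth, a quick way to compare what df reports against the cluster-wide RAW USED figure is to sum the OSD mounts on every host. A minimal sketch, assuming GNU df and the /var/lib/ceph/osd/ceph-* mount layout shown above:

# Sum the "Used" column across all OSD filesystems on this host
df -B1 --output=used,target | awk '/\/var\/lib\/ceph\/osd/ { sum += $1 }
    END { printf "%.1f GiB used by OSD mounts on this host\n", sum / 2^30 }'

Adding this up across all hosts should land close to the RAW USED that ceph df prints; a large gap would point at space consumed outside the OSD data directories.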
Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
Hi!

Do you by any chance have your OSDs placed at a local directory path rather than on a dedicated, otherwise unused physical disk? If I remember correctly from a similar setup I ran in the past, the ceph df command accounts for the entire disk and not just for the OSD data directory. I am not sure this still applies, since it was on an early Firefly release, but it is an easy thing to check.

I don't know if the above makes sense, but what I mean is: if, for instance, your OSDs are at something like /var/lib/ceph/osd.X (or wherever) and that path does not correspond to a mounted device (e.g. /dev/sdc1) but lives on the disk that provides the / or /var partition, then you should run df -h to see how much data is on that partition and compare it with the ceph df output. It should be (more or less) the same.

Best,
George
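A quick way to verify which device actually backs each OSD directory is findmnt. A minimal sketch, assuming the usual /var/lib/ceph/osd/ceph-* layout:

for d in /var/lib/ceph/osd/ceph-*; do
    # findmnt prints the filesystem source containing the path; if this
    # shows the root LV (e.g. /dev/mapper/vg01-root) rather than a
    # dedicated partition, the OSD is living on the root filesystem.
    echo -n "$d -> "
    findmnt -n -o SOURCE --target "$d"
done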
Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
2015-03-27 18:27 GMT+01:00 Gregory Farnum g...@gregs42.com:
> Ceph has per-pg and per-OSD metadata overhead. You currently have
> 26000 PGs, suitable for use on a cluster on the order of 260 OSDs.
> You have placed almost 7GB of data into it (21GB replicated) and have
> about 7GB of additional overhead. You might try putting a suitable
> amount of data into the cluster before worrying about the ratio of
> space used to data stored. :)
> -Greg

Hello Greg,

I have now put a suitable amount of data in, and it looks like my ratio is still 1 to 5. The folder /var/lib/ceph/osd/ceph-N/current/meta/ did not grow, so it looks like that is not the problem. Do you have any hint on how to troubleshoot this issue?

ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets size
size: 3
ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets min_size
min_size: 2
ansible@zrh-srv-m-cph02:~$ ceph -w
    cluster 4179fcec-b336-41a1-a7fd-4a19a75420ea
     health HEALTH_WARN
            pool .rgw.buckets has too few pgs
     monmap e4: 4 mons at {rml-srv-m-cph01=10.120.50.20:6789/0,rml-srv-m-cph02=10.120.50.21:6789/0,rml-srv-m-stk03=10.120.50.32:6789/0,zrh-srv-m-cph02=10.120.50.2:6789/0},
            election epoch 668, quorum 0,1,2,3 zrh-srv-m-cph02,rml-srv-m-cph01,rml-srv-m-cph02,rml-srv-m-stk03
     osdmap e2170: 54 osds: 54 up, 54 in
      pgmap v619041: 28684 pgs, 15 pools, 109 GB data, 7358 kobjects
            518 GB used, 49756 GB / 50275 GB avail
                28684 active+clean
ansible@zrh-srv-m-cph02:~$ ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED     %RAW USED
    50275G     49756G     518G         1.03
POOLS:
    NAME                   ID     USED      %USED     MAX AVAIL     OBJECTS
    rbd                    0      155       0         16461G        2
    gianfranco             7      156       0         16461G        2
    images                 8      257M      0         16461G        38
    .rgw.root              9      840       0         16461G        3
    .rgw.control           10     0         0         16461G        8
    .rgw                   11     21334     0         16461G        108
    .rgw.gc                12     0         0         16461G        32
    .users.uid             13     1575      0         16461G        6
    .users                 14     72        0         16461G        6
    .rgw.buckets.index     15     0         0         16461G        30
    .users.swift           17     36        0         16461G        3
    .rgw.buckets           18     108G      0.22      16461G        7534745
    .intent-log            19     0         0         16461G        0
    .rgw.buckets.extra     20     0         0         16461G        0
    volumes                21     512M      0         16461G        161
ansible@zrh-srv-m-cph02:~$
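To put numbers on the ratio above (straightforward arithmetic on the figures in this output): 109 GB of data at size=3 should account for roughly 109 × 3 = 327 GB, but RAW USED is 518 GB, a ratio of 518 / 109 ≈ 4.75. The unexplained difference is 518 − 327 = 191 GB, which spread across 54 OSDs works out to about 3.5 GB per OSD of overhead beyond plain replication.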
Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
> I will start now to push a lot of data into the cluster to see if the
> metadata grows a lot or stays constant. Is there a way to clean up
> old metadata?

I pushed a lot more data to the cluster, then let it sleep for the night. This morning I found these values:

6841 MB data
25814 MB used

That is a bit more than 1 to 3. It looks like the extra space is in these folders (for N from 1 to 36):

/var/lib/ceph/osd/ceph-N/current/meta/

These meta folders have a lot of data in them. I would really be happy to have pointers to understand what is in there and how to clean it up eventually. The problem is that googling for "ceph meta" or "ceph metadata" produces results for the Ceph MDS, which is completely unrelated :(

thanks
Saverio
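To see whether that overhead is a fixed cost or keeps growing with the data, it may help to snapshot the meta sizes around a write test. A minimal sketch, assuming the FileStore layout described in this thread:

# Record per-OSD meta sizes in bytes before the test
du -sb /var/lib/ceph/osd/ceph-*/current/meta | tee /tmp/meta_before.txt
# ... push data into the cluster, wait for it to settle ...
du -sb /var/lib/ceph/osd/ceph-*/current/meta | tee /tmp/meta_after.txt
# Any OSD whose meta directory grew will show up here
diff /tmp/meta_before.txt /tmp/meta_after.txt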
[ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
Thanks for the answer. Now the meaning of MB data and MB used is clear, and if all the pools have size=3 I expect a ratio of 1 to 3 between the two values.

I still can't understand why MB used is so big in my setup. All my pools are size=3, but the ratio of MB data to MB used is 1 to 5 instead of 1 to 3.

My first guess was that I had written a wrong crushmap that was making more than 3 copies (is it really possible to make such a mistake?). So I changed my crushmap and put in the default one, which just spreads data across hosts, but I see no change: the ratio is still 1 to 5.

I thought maybe my 3 monitors had different views of the pgmap, so I tried to restart the monitors, but this also did not help.

What useful information can I share here to troubleshoot this issue further?

ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

Thank you
Saverio

2015-03-25 14:55 GMT+01:00 Gregory Farnum g...@gregs42.com:
> On Wed, Mar 25, 2015 at 1:24 AM, Saverio Proto ziopr...@gmail.com wrote:
>> Hello there,
>>
>> I started to push data into my ceph cluster. There is something I
>> cannot understand in the output of ceph -w. When I run ceph -w I get
>> this kind of output:
>>
>> 2015-03-25 09:11:36.785909 mon.0 [INF] pgmap v278788: 26056 pgs:
>> 26056 active+clean; 2379 MB data, 19788 MB used, 33497 GB / 33516 GB avail
>>
>> 2379 MB is actually the data I pushed into the cluster; I can see it
>> also in the ceph df output, and the numbers are consistent. What I
>> don't understand is the 19788 MB used. All my pools have size 3, so
>> I expected something like 2379 * 3. Instead this number is very big.
>> I really need to understand how MB used grows, because I need to
>> know how many disks to buy.
>
> MB used is the summation of (the programmatic equivalent to) df
> across all your nodes, whereas MB data is calculated by the OSDs
> based on the data they've written down. Depending on your
> configuration, MB used can include things like the OSD journals, or
> even totally unrelated data if the disks are shared with other
> applications.
>
> MB used including the space used by the OSD journals is my first
> guess about what you're seeing here, in which case you'll notice that
> it won't grow any faster than MB data does once the journal is fully
> allocated.
> -Greg
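As a rough sanity check of the journal theory (illustrative arithmetic only, assuming about 36 OSDs, which matches the ~33.5 TB total of roughly 1 TB disks reported above): 2379 MB of data at size=3 accounts for 2379 × 3 ≈ 7137 MB, leaving 19788 − 7137 ≈ 12651 MB unexplained, i.e. about 350 MB per OSD. That is the right order of magnitude for partially allocated journal files plus filesystem overhead.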
Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
On Thu, Mar 26, 2015 at 2:56 AM, Saverio Proto ziopr...@gmail.com wrote:
> Thanks for the answer. Now the meaning of MB data and MB used is
> clear, and if all the pools have size=3 I expect a ratio of 1 to 3
> between the two values. I still can't understand why MB used is so
> big in my setup. All my pools are size=3, but the ratio of MB data to
> MB used is 1 to 5 instead of 1 to 3.
> [...]
> What useful information can I share here to troubleshoot this issue
> further?
>
> ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)

You just need to go look at one of your OSDs and see what data is stored on it. Did you configure things so that the journals are using a file on the same storage disk? If so, *that* is why the data used is large.

I promise that your 5:1 ratio won't persist as you write more than 2GB of data into the cluster.
-Greg
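Checking where the journals live is a one-liner. A minimal sketch, assuming the default FileStore layout:

# If each journal is a symlink to a partition on another device, it is
# not consuming space on the data disk; a plain file here means the
# journal shares the disk with the OSD data and counts toward "MB used".
ls -l /var/lib/ceph/osd/ceph-*/journal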
Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5
> You just need to go look at one of your OSDs and see what data is
> stored on it. Did you configure things so that the journals are using
> a file on the same storage disk? If so, *that* is why the data used
> is large.

I followed your suggestion, and this is the result of my troubleshooting.

Each OSD controls a disk that is mounted at a folder with the name /var/lib/ceph/osd/ceph-N, where N is the OSD number.

The journal is stored on another disk drive. I have three extra SSD drives per server, each partitioned into 6 partitions, and those partitions are the journal partitions. I checked that the setup is correct: each /var/lib/ceph/osd/ceph-N/journal points correctly to another drive.

With df -h I see the folders where my OSDs are mounted. The space consumption looks well distributed among all OSDs, as expected. The data is always in a folder called /var/lib/ceph/osd/ceph-N/current.

I checked with the tool ncdu where the data is stored inside the current folders. In each OSD there is a folder with a lot of data called /var/lib/ceph/osd/ceph-N/current/meta. If I sum the MB in each meta folder, it accounts for more or less the extra space consumed, leading to the 1 to 5 ratio.

The meta folder contains a lot of unreadable binary files, but judging by the file names it looks like this is where the versions of the osdmap are stored. It is really a lot of metadata.

I will start now to push a lot of data into the cluster to see if the metadata grows a lot or stays constant. Is there a way to clean up old metadata?

thanks
Saverio
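For anyone wanting to reproduce this measurement, a minimal sketch (assuming FileStore OSDs under the usual path, and that the map objects have "osdmap" in their file names, as they appear to here):

# Per-OSD size of the meta directory
du -sh /var/lib/ceph/osd/ceph-*/current/meta

# Rough count of osdmap epochs retained on one OSD (FileStore stores
# full and incremental maps as individual files)
find /var/lib/ceph/osd/ceph-0/current/meta -name '*osdmap*' | wc -l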