[ceph-users] Ceph mon quorum problems under load
621 mon.ariel1 mon.0 192.168.16.31:6789/0 33890 : cluster [INF] mon.ariel1 is new leader, mons ariel1,ariel2,ariel4 in quorum (ranks 0,1,2)
2018-07-06 02:50:03.642986 mon.ariel1 mon.0 192.168.16.31:6789/0 33895 : cluster [INF] overall HEALTH_OK
2018-07-06 02:50:46.757619 mon.ariel1 mon.0 192.168.16.31:6789/0 33899 : cluster [INF] mon.ariel1 calling monitor election
2018-07-06 02:50:46.920468 mon.ariel1 mon.0 192.168.16.31:6789/0 33900 : cluster [INF] mon.ariel1 is new leader, mons ariel1,ariel2,ariel4 in quorum (ranks 0,1,2)
2018-07-06 02:50:47.104222 mon.ariel1 mon.0 192.168.16.31:6789/0 33905 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum ariel2,ariel4)
2018-07-06 02:50:47.104240 mon.ariel1 mon.0 192.168.16.31:6789/0 33906 : cluster [INF] Cluster is now healthy
2018-07-06 02:50:47.256301 mon.ariel1 mon.0 192.168.16.31:6789/0 33907 : cluster [INF] overall HEALTH_OK

There seems to be some disturbance of the mon traffic. Since the mons communicate via a 10GBit interface, I would not assume a problem there; no errors are logged on the network interfaces or on the switches. Maybe the disks are too slow (the OSDs are on SATA), so we are thinking about putting the bluestore journal on an SSD. But would that help to stabilize the mons? Or would a setup with 5 machines (5 mons running) be the better choice?

So we are a little stuck on where to search for a solution. What debug output would help to see whether we have a disk or a network problem here?

Thanks for your input!

Marcus Haarmann
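One way to get exactly that kind of debug output (a sketch, assuming the daemon names from the log above; revert the levels afterwards, as these logs grow quickly):

  # raise mon, paxos and messenger debug levels on all mons
  ceph tell mon.* injectargs '--debug_mon 10 --debug_paxos 10 --debug_ms 1'

  # on a mon host: paxos commit/refresh latencies hint at a slow mon store disk
  ceph daemon mon.ariel1 perf dump | grep -A 3 -i paxos

With debug_ms at 1, delayed or reset mon-to-mon connections show up in the mon log, which helps separate a network problem from a disk problem.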
Re: [ceph-users] Help change civetweb front port error: Permission denied
Ceph runs as a non-root user, and non-root users are normally not permitted to listen on a port < 1024. This is not specific to Ceph. You could either keep the listener on an unprivileged port and redirect port 80 to it via iptables, or proxy the connection through an apache/nginx instance.

Marcus Haarmann

From: "谭林江" <tanlinji...@icloud.com>
To: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Monday, 18 September 2017 10:32:53
Subject: [ceph-users] Help change civetweb front port error: Permission denied

Hi,

I created a gateway node and changed its frontend port to rgw_frontends = "civetweb port=80". When I run it, it fails with this error:

2017-09-18 04:25:16.967378 7f2dd72e08c0 0 ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process radosgw, pid 3151
2017-09-18 04:25:17.025703 7f2dd72e08c0 0 framework: civetweb
2017-09-18 04:25:17.025712 7f2dd72e08c0 0 framework conf key: port, val: 80
2017-09-18 04:25:17.025716 7f2dd72e08c0 0 starting handler: civetweb
2017-09-18 04:25:17.025943 7f2dd72e08c0 0 civetweb: 0x55ac3b9bab20: set_ports_option: cannot bind to 80: 13 (Permission denied)
2017-09-18 04:25:17.032177 7f2db4ff9700 -1 failed to list objects pool_iterate returned r=-2
2017-09-18 04:25:17.032183 7f2db4ff9700 0 ERROR: lists_keys_next(): ret=-2
2017-09-18 04:25:17.032186 7f2db4ff9700 0 ERROR: sync_all_users() returned ret=-2
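As a concrete sketch of the iptables variant (port 7480 here is only an assumed example for the unprivileged port):

  # ceph.conf: keep civetweb on an unprivileged port
  rgw_frontends = "civetweb port=7480"

  # redirect incoming traffic on port 80 to it
  iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 7480

Alternatively, an nginx/apache reverse proxy in front of radosgw achieves the same and also gives you a place to terminate SSL.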
Re: [ceph-users] cephfs (Kraken 11.2.1), unable to write more files when one dir has more than 100000 files, mds_bal_fragment_size_max = 5000000
It's a feature ...

http://docs.ceph.com/docs/master/cephfs/dirfrags/
https://www.spinics.net/lists/ceph-users/msg31473.html

Marcus Haarmann

From: donglifec...@gmail.com
To: "zyan" <z...@redhat.com>
CC: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Friday, 8 September 2017 07:30:53
Subject: [ceph-users] cephfs (Kraken 11.2.1), unable to write more files when one dir has more than 100000 files, mds_bal_fragment_size_max = 5000000

ZhengYan,

I am testing cephfs (Kraken 11.2.1). I can't write more files once one directory contains more than 100000 files, even though I have already set "mds_bal_fragment_size_max = 5000000". Why is this the case? Is it a bug?

Thanks a lot.

donglifec...@gmail.com
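If I remember correctly, directory fragmentation is disabled by default on Kraken, so a directory stays a single fragment and the per-fragment limit effectively caps the whole directory. The linked docs describe enabling it; a minimal sketch, assuming your filesystem is named "cephfs":

  ceph fs set cephfs allow_dirfrags true

With fragmentation on, the MDS splits large directories into multiple fragments, each subject to mds_bal_fragment_size_max, instead of refusing new entries.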
Re: [ceph-users] luminous/bluestore osd memory requirements
Hi,

we have done some testing with bluestore and found that the memory consumption of the OSD processes depends not on the real amount of data stored but on the number of stored objects. This means that e.g. a block device of 100 GB which spreads over 100 objects has a different memory usage than the same data stored as 1000 smaller objects (the bluestore block size should be tuned for that kind of setup). (100 objects of size 4k to 100k had a memory consumption of ~4GB on the OSD at the standard block size, while the amount of data was only ~15GB.) So it depends on the usage: cephfs stores each file as a single object, while RBD is configured to allocate larger objects.

Marcus Haarmann

From: "Stijn De Weirdt" <stijn.dewei...@ugent.be>
To: "ceph-users" <ceph-users@lists.ceph.com>
Sent: Thursday, 10 August 2017 10:34:48
Subject: [ceph-users] luminous/bluestore osd memory requirements

hi all,

we are planning to purchase new OSD hardware, and we are wondering if, for upcoming luminous with bluestore OSDs, anything wrt the hardware recommendations from http://docs.ceph.com/docs/master/start/hardware-recommendations/ will be different, especially the memory/cpu part. i understand from colleagues that the async messenger makes a big difference in memory usage (maybe also cpu load?); but we are also interested in the "1GB of RAM per TB" recommendation/requirement.

many thanks,

stijn
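For the memory part, the main bluestore knob in Luminous is the per-OSD cache size. A minimal ceph.conf sketch (the values are only illustrative; as far as I know the Luminous defaults are on the order of 1 GB for HDD-backed and 3 GB for SSD-backed OSDs):

  [osd]
  # cap the bluestore cache per OSD (bytes); illustrative values, not a recommendation
  bluestore_cache_size_hdd = 1073741824   # 1 GiB
  bluestore_cache_size_ssd = 3221225472   # 3 GiB

On top of the cache there is per-object metadata overhead, which is why the object count matters as described above.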
Re: [ceph-users] CEPH bluestore space consumption with small objects
Hi,

I can check if this would change anything, but we are currently trying to find a different solution. The issue we ran into when using rados as a backend with bluestore OSDs was that every object seems to be cached in the OSD, and the memory consumption of the OSD kept increasing. This is not very useful for us, since the objects are accessed rarely and have to exist for a very long time. So we are now looking at RBD with a database or a filesystem on top, which would handle the huge number of small objects. This has the drawback that a filesystem or a database can become inconsistent more easily than a rados-only approach. Even cephfs was not the right approach, since the space consumption would be the same as with rados directly.

Thanks to everybody,

Marcus Haarmann

From: "Pavel Shub" <pa...@citymaps.com>
To: "Gregory Farnum" <gfar...@redhat.com>
CC: "Wido den Hollander" <w...@42on.com>, "ceph-users" <ceph-users@lists.ceph.com>, "Marcus Haarmann" <marcus.haarm...@midoco.de>
Sent: Tuesday, 8 August 2017 17:50:44
Subject: Re: [ceph-users] CEPH bluestore space consumption with small objects

Marcus,

You may want to look at the bluestore_min_alloc_size setting as well as the respective bluestore_min_alloc_size_ssd and bluestore_min_alloc_size_hdd. By default bluestore sets a 64k block size for ssds. I'm also using ceph for small objects and I've seen my OSD usage go down from 80% to 20% after setting the min alloc size to 4k.

Thanks,
Pavel

On Thu, Aug 3, 2017 at 3:59 PM, Gregory Farnum <gfar...@redhat.com> wrote:
> Don't forget that at those sizes the internal journals and rocksdb size
> tunings are likely to be a significant fixed cost.
>
> On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander <w...@42on.com> wrote:
>>
>> > On 2 August 2017 at 17:55, Marcus Haarmann
>> > <marcus.haarm...@midoco.de> wrote:
>> >
>> > Hi,
>> > we are doing some tests here with a Kraken setup using bluestore backend
>> > (on Ubuntu 64 bit).
>> > We are trying to store > 10 mio very small objects using RADOS.
>> > (no fs, no rbd, only osd and monitors)
>> >
>> > The setup was done with ceph-deploy, using the standard bluestore
>> > option, no separate devices for wal. The test cluster spreads over
>> > 3 virtual machines, each with 100GB storage for osd.
>> >
>> > We are now in the following situation (used pool is "test"):
>> > rados df
>> > POOL_NAME USED   OBJECTS CLONES COPIES  MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
>> > rbd       0      2       0      6       0                  0       0        49452  39618k 855    12358k
>> > test      17983M 595427  0      1786281 0                  0       0        29     77824  596426 17985M
>> >
>> > total_objects 595429
>> > total_used    141G
>> > total_avail   158G
>> > total_space   299G
>> >
>> > ceph osd df
>> > ID WEIGHT  REWEIGHT SIZE    USE    AVAIL  %USE  VAR  PGS
>> > 0  0.09760 1.0      102298M 50763M 51535M 49.62 1.00 72
>> > 1  0.09760 1.0      102298M 50799M 51499M 49.66 1.00 72
>> > 2  0.09760 1.0      102298M 50814M 51484M 49.67 1.00 72
>> >    TOTAL   299G    148G   150G   49.65
>> > MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02
>> >
>> > As you can see, there are about 18GB data stored in ~595000 objects now.
>> > The actual space consumption is about 150GB, which fills about half of
>> > the storage.
>>
>> Not really. Each OSD uses 50GB, but since you replicate 3 times (default)
>> it's storing 150GB spread out over 3 OSDs.
>>
>> So your data is 18GB, but consumes 50GB. That's still ~2.5x which is a
>> lot, but a lot less than 150GB.
>>
>> > Objects have been added with a test script using the rados command line
>> > (put).
>> >
>> > Obviously, the stored objects are counted byte by byte in the rados df
>> > command, but the real space allocation is about factor 8.
>>
>> As written above, it's ~2.5x, not 8x.
>>
>> > The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects.
>> >
>> > Is there any recommended way to configure bluestore with a block size
>> > better suited to those small objects? I cannot find any configuration
>> > option which would allow modification of the internal block handling
>> > of bluestore.
>> > Is luminous an option which allows more specific configuration?
>>
>> Could
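The settings Pavel mentions would look roughly like this in ceph.conf (a sketch; note that min_alloc_size is baked in at OSD creation time, so existing OSDs have to be redeployed for a change to take effect):

  [osd]
  # allocation granularity for newly created bluestore OSDs; smaller values
  # waste less space on small objects at the cost of more metadata
  bluestore_min_alloc_size_hdd = 4096
  bluestore_min_alloc_size_ssd = 4096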
[ceph-users] CEPH bluestore space consumption with small objects
Hi,

we are doing some tests here with a Kraken setup using the bluestore backend (on Ubuntu 64 bit). We are trying to store > 10 mio very small objects using RADOS (no fs, no rbd, only osd and monitors).

The setup was done with ceph-deploy, using the standard bluestore option, no separate devices for wal. The test cluster spreads over 3 virtual machines, each with 100GB storage for osd.

We are now in the following situation (used pool is "test"):

rados df
POOL_NAME USED   OBJECTS CLONES COPIES  MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD     WR_OPS WR
rbd       0      2       0      6       0                  0       0        49452  39618k 855    12358k
test      17983M 595427  0      1786281 0                  0       0        29     77824  596426 17985M

total_objects 595429
total_used    141G
total_avail   158G
total_space   299G

ceph osd df
ID WEIGHT  REWEIGHT SIZE    USE    AVAIL  %USE  VAR  PGS
0  0.09760 1.0      102298M 50763M 51535M 49.62 1.00 72
1  0.09760 1.0      102298M 50799M 51499M 49.66 1.00 72
2  0.09760 1.0      102298M 50814M 51484M 49.67 1.00 72
   TOTAL   299G    148G   150G   49.65
MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02

As you can see, there are about 18GB of data stored in ~595000 objects now. The actual space consumption is about 150GB, which fills about half of the storage.

Objects have been added with a test script using the rados command line (put). Obviously, the stored objects are counted byte by byte in the rados df command, but the real space allocation is about factor 8.

The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects.

Is there any recommended way to configure bluestore with a block size better suited to those small objects? I cannot find any configuration option which would allow modification of the internal block handling of bluestore. Is luminous an option which allows more specific configuration?

Thank you all in advance for support.

Marcus Haarmann
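A rough back-of-envelope, assuming the 64k bluestore_min_alloc_size mentioned in the replies above (every object occupies at least one allocation unit, and the 100kb objects take two):

  595427 objects x 64 KiB minimum  ~= 36 GiB allocated per replica
  vs. ~18 GiB of payload           ~= 2x allocation overhead
  plus rocksdb/WAL fixed costs     -> roughly the ~50 GB observed per OSD

which matches the ~2.5x per-replica figure from the discussion; the "factor 8" reading overlooks the 3x replication.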