[ceph-users] Glance client and RBD export checksum mismatch
Dear All,

Ceph version: 12.2.5-2.ge988fb6.el7

We are facing an issue with Glance backed by Ceph: when we try to create an instance or volume from an image, it throws a checksum error. When we use "rbd export" and run md5sum on the export, the value matches the checksum in the Glance database. When we use the following script, it produces the same erroneous checksum that Glance reports. We used the images below for testing.

1. Failing image (checksum mismatch): ffed4088-74e1-4f22-86cb-35e7e97c377c
2. Passing image (checksum identical): c048f0f9-973d-4285-9397-939251c80a84

Output from storage node:

1. Failing image: ffed4088-74e1-4f22-86cb-35e7e97c377c
checksum from glance database: 34da2198ec7941174349712c6d2096d8
[root@storage01moc ~]# python test_rbd_format.py ffed4088-74e1-4f22-86cb-35e7e97c377c admin
Image size: 681181184
checksum from ceph: b82d85ae5160a7b74f52be6b5871f596
Remarks: checksum is different

2. Passing image: c048f0f9-973d-4285-9397-939251c80a84
checksum from glance database: 4f977f748c9ac2989cff32732ef740ed
[root@storage01moc ~]# python test_rbd_format.py c048f0f9-973d-4285-9397-939251c80a84 admin
Image size: 1411121152
checksum from ceph: 4f977f748c9ac2989cff32732ef740ed
Remarks: checksum is identical

We are wondering whether this issue comes from the Ceph Python bindings or from Ceph itself. Please note that we do not have Ceph pool tiering configured. Please let us know whether anyone has faced a similar issue, and about any fixes for it.

test_rbd_format.py
===
import rados, sys, rbd

image_id = sys.argv[1]
try:
    rados_id = sys.argv[2]
except IndexError:
    rados_id = 'openstack'


class ImageIterator(object):
    """
    Reads data from an RBD image, one chunk at a time.
    """

    # NOTE: the original script passed chunk_size='8' (a string); in
    # Python 2, min(str, int) silently returns the int, so the whole
    # image was read in a single image.read() call. Use an integer.
    def __init__(self, conn, pool, name, snapshot, store, chunk_size=8 * 1024 * 1024):
        self.pool = pool
        self.conn = conn
        self.name = name
        self.snapshot = snapshot
        self.chunk_size = chunk_size
        self.store = store

    def __iter__(self):
        try:
            with self.conn.open_ioctx(self.pool) as ioctx:
                with rbd.Image(ioctx, self.name, snapshot=self.snapshot) as image:
                    size = image.stat()['size']
                    bytes_left = size
                    while bytes_left > 0:
                        length = min(self.chunk_size, bytes_left)
                        data = image.read(size - bytes_left, length)
                        bytes_left -= len(data)
                        yield data
        except rbd.ImageNotFound:
            raise RuntimeError('RBD image %s does not exist' % self.name)


conn = rados.Rados(conffile='/etc/ceph/ceph.conf', rados_id=rados_id)
conn.connect()

with conn.open_ioctx('images') as ioctx:
    with rbd.Image(ioctx, image_id, snapshot='snap') as image:
        print "Image size: %s " % image.stat()['size']

import hashlib
md5sum = hashlib.md5()
for chunk in ImageIterator(conn, 'images', image_id, 'snap', 'rbd'):
    md5sum.update(chunk)
print "checksum from ceph: " + md5sum.hexdigest()
===

Thank You !
--
Best Regards,
Brayan Perera
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
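As a sanity check on the chunked-read logic itself (independent of librbd): a digest fed fixed-size chunks must equal a digest of the whole buffer, for any integer chunk size. A minimal sketch in plain Python (no Ceph needed) that mimics the iterator above:

```python
import hashlib

def chunked_md5(data, chunk_size):
    # Feed the digest fixed-size slices, the way the ImageIterator
    # above is meant to walk the RBD image.
    md5 = hashlib.md5()
    offset = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        md5.update(chunk)
        offset += len(chunk)
    return md5.hexdigest()

# Arbitrary payload, deliberately not a multiple of any chunk size.
data = b'x' * 1000003
whole = hashlib.md5(data).hexdigest()
for chunk_size in (8, 4096, 8 * 1024 * 1024):
    assert chunked_md5(data, chunk_size) == whole
print("chunked digests match:", whole)
```

If a chunked checksum differs from the whole-buffer checksum, the chunking arithmetic (or the bytes actually returned per read) is the suspect, not the hash.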
Re: [ceph-users] how to trigger offline filestore merge
Hi again,

Thanks to a hint from another user I seem to have gotten past this. The trick was to restart the osds with a positive merge threshold (10), then cycle through rados bench several hundred times, e.g.

while true ; do rados bench -p default.rgw.buckets.index 10 write -b 4096 -t 128; sleep 5 ; done

After running that for a while, the PG filestore structure has merged down, and listing the pool and backfilling are back to normal.

Thanks!

Dan

On Tue, Apr 9, 2019 at 7:05 PM Dan van der Ster wrote:
>
> Hi all,
>
> We have a slight issue while trying to migrate a pool from filestore
> to bluestore.
>
> This pool used to have 20 million objects in filestore -- it now has
> 50,000. During its life, the filestore pgs were internally split
> several times, but never merged. Now the pg _head dirs have mostly
> empty directories.
> This creates some problems:
>
> 1. rados ls -p hangs a long time, eventually triggering slow
> requests while the filestore_op threads time out. (They time out while
> listing the collections).
> 2. backfilling from these PGs is impossible, similarly because
> listing the objects to backfill eventually leads to the osd flapping.
>
> So I want to merge the filestore pgs.
>
> I tried ceph-objectstore-tool --op apply-layout-settings, but it seems
> that this only splits, not merges?
>
> Does someone have a better idea?
>
> Thanks!
>
> Dan
Re: [ceph-users] NFS-Ganesha Mounts as a Read-Only Filesystem
Looks like you are trying to write to the pseudo-root; mount /cephfs instead of /.

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sat, Apr 6, 2019 at 1:07 PM wrote:
>
> Hi all,
>
> I have recently set up a Ceph cluster and, on request, am using CephFS (MDS
> version: ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
> (stable)) as a backend for NFS-Ganesha. I have successfully tested a direct
> mount with CephFS to read/write files, however I'm perplexed as to why NFS
> mounts as read-only despite setting the RW flags.
>
> [root@mon02 mnt]# touch cephfs/test.txt
> touch: cannot touch 'cephfs/test.txt': Read-only file system
>
> Configuration of Ganesha is below:
>
> NFS_CORE_PARAM
> {
>     Enable_NLM = false;
>     Enable_RQUOTA = false;
>     Protocols = 4;
> }
>
> NFSv4
> {
>     Delegations = true;
>     RecoveryBackend = rados_ng;
>     Minor_Versions = 1,2;
> }
>
> CACHEINODE {
>     Dir_Chunk = 0;
>     NParts = 1;
>     Cache_Size = 1;
> }
>
> EXPORT
> {
>     Export_ID = 15;
>     Path = "/";
>     Pseudo = "/cephfs/";
>     Access_Type = RW;
>     NFS_Protocols = "4";
>     Squash = No_Root_Squash;
>     Transport_Protocols = TCP;
>     SecType = "none";
>     Attr_Expiration_Time = 0;
>     Delegations = R;
>
>     FSAL {
>         Name = CEPH;
>         User_Id = "ganesha";
>         Filesystem = "cephfs";
>         Secret_Access_Key = "";
>     }
> }
>
> Provided mount parameters:
>
> mount -t nfs -o nfsvers=4.1,proto=tcp,rw,noatime,sync 172.16.32.15:/ /mnt/cephfs
>
> I have tried stripping much of the config and altering mount options, but so
> far completely unable to decipher the cause.
>
> Also, it seems I'm not the only one who has been caught by this:
>
> https://www.spinics.net/lists/ceph-devel/msg41201.html
>
> Thanks in advance,
>
> Thomas
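Following Paul's pointer: with NFSv4, the client mounts the export's Pseudo path ("/cephfs/" in the config above), not the server root "/". A corrected mount invocation, using the addresses from the original post (adjust to your setup; not testable without a live Ganesha server):

```
# Mount the pseudo path exported by Ganesha (Pseudo = "/cephfs/"),
# not the pseudo-root "/":
mount -t nfs -o nfsvers=4.1,proto=tcp,rw,noatime,sync 172.16.32.15:/cephfs /mnt/cephfs
```

Mounting "/" gives the read-only pseudo filesystem that Ganesha synthesizes above the exports, which is why the RW flags appear to be ignored.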
Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch
> On Apr 8, 2019, at 5:42 PM, Bryan Stillwell wrote: > > >> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote: >> >> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell >> wrote: >>> >>> There doesn't appear to be any correlation between the OSDs which would >>> point to a hardware issue, and since it's happening on two different >>> clusters I'm wondering if there's a race condition that has been fixed in a >>> later version? >>> >>> Also, what exactly is the omap digest? From what I can tell it appears to >>> be some kind of checksum for the omap data. Is that correct? >> >> Yeah; it's just a crc over the omap key-value data that's checked >> during deep scrub. Same as the data digest. >> >> I've not noticed any issues around this in Luminous but I probably >> wouldn't have, so will have to leave it up to others if there are >> fixes in since 12.2.8. > > Thanks for adding some clarity to that Greg! > > For some added information, this is what the logs reported earlier today: > > 2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster > [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest > 0x26a1241b != omap_digest 0x4c10ee76 from shard 504 > 2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster > [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest > 0x26a1241b != omap_digest 0x4c10ee76 from shard 504 > > I then tried deep scrubbing it again to see if the data was fine, but the > digest calculation was just having problems. 
It came back with the same
> problem with new digest values:
>
> 2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x93bac8f != omap_digest 0xab1b9c6f from shard 504
> 2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x93bac8f != omap_digest 0xab1b9c6f from shard 504
>
> Which makes sense, but doesn’t explain why the omap data is getting out of
> sync across multiple OSDs and clusters…
>
> I’ll see what I can figure out tomorrow, but if anyone else has some hints I
> would love to hear them.

I’ve dug into this more today and it appears that the omap data contains an extra entry on the OSDs with the mismatched omap digests. I then searched the RGW logs and found that a DELETE happened shortly after the OSD booted, but the omap data wasn’t updated on that OSD, so it became mismatched.

Here’s a timeline of the events which caused PG 7.9 to become inconsistent:

2019-04-04 14:37:34 - osd.492 marked itself down
2019-04-04 14:40:35 - osd.492 boot
2019-04-04 14:41:55 - DELETE call happened
2019-04-08 12:06:14 - omap_digest mismatch detected (pg 7.9 is active+clean+inconsistent, acting [492,546,523])

Here’s the timeline for PG 7.2b:

2019-04-03 13:54:17 - osd.488 marked itself down
2019-04-03 13:59:27 - osd.488 boot
2019-04-03 14:00:54 - DELETE call happened
2019-04-08 12:42:21 - omap_digest mismatch detected (pg 7.2b is active+clean+inconsistent, acting [488,511,541])

Bryan
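Greg describes the omap digest as a CRC over the omap key-value data checked during deep scrub, and Bryan's finding is a single extra omap entry on one replica. A toy illustration of why any divergence in the key-value set changes the digest — plain Python, using zlib.crc32 as a stand-in (Ceph itself uses crc32c, and this is not Ceph's exact folding scheme):

```python
import zlib

def omap_digest(omap):
    # Fold every key/value pair into one running CRC, in sorted key
    # order, roughly the way a scrub walks the omap. (Illustrative
    # stand-in for Ceph's crc32c-based digest, not the real algorithm.)
    crc = 0
    for key in sorted(omap):
        crc = zlib.crc32(key, crc)
        crc = zlib.crc32(omap[key], crc)
    return crc & 0xffffffff

replica_a = {b'.dir.entry1': b'meta1', b'.dir.entry2': b'meta2'}
replica_b = dict(replica_a)
replica_b[b'.dir.stale'] = b'meta3'   # extra entry left behind on one OSD

print("replica_a omap_digest: 0x%x" % omap_digest(replica_a))
print("replica_b omap_digest: 0x%x" % omap_digest(replica_b))
assert omap_digest(replica_a) != omap_digest(replica_b)
```

This matches the observed behavior: re-scrubbing keeps reporting a mismatch (with whatever digest values the current omap contents produce) until the stray entry is removed or the object is repaired.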
[ceph-users] showing active config settings
I noticed that when changing some settings, they appear to stay the same. For example, when trying to set this higher:

ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

it gives the usual warning that a restart may be required, but it still shows the old value:

# ceph --show-config | grep osd_recovery_max_active
osd_recovery_max_active = 3

Restarting the OSDs seems fairly intrusive for every configuration change.
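One likely explanation, worth verifying on your cluster: `ceph --show-config` reports the configuration as seen by the ceph CLI client itself (i.e. the defaults plus ceph.conf), not the value running inside each OSD. The injected value can be checked through the daemon's admin socket instead — a sketch, assuming the default admin-socket setup and a running cluster:

```
# Inject the new value (no restart; takes effect where supported):
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

# Query the *running* value from a specific OSD via its admin socket
# (run on the host where osd.0 lives), rather than the client-side
# view that `ceph --show-config` gives:
ceph daemon osd.0 config get osd_recovery_max_active
```

If the admin socket reports 4 while `--show-config` still reports 3, the injection worked and only the reporting was misleading.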
Re: [ceph-users] problems with pg down
Hi Fabio,

Did you resolve the issue? A bit late, I know, but did you try to restart OSD 14? If 102 and 121 are fine, I would also try to crush reweight 14 to 0.

Greetings
Mehmet

Am 10. März 2019 19:26:57 MEZ schrieb Fabio Abreu :
>Hi Darius,
>
>Thanks for your reply !
>
>This happened after a disaster with a SATA storage node; the osds 102
>and 121 are up.
>
>The information below is the osd.14 log; do you recommend marking it
>out of the cluster?
>
>2019-03-10 17:36:17.654134 7f1991163700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be7808800 sd=516 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720400).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654660 7f1992d7f700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be773f400 sd=536 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720700).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654720 7f1993a8c700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6801/102 pipe(0x555be7807400 sd=542 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720280).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.654813 7f199095b700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d8e000 sd=537 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671ff80).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.654847 7f1992476700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.95:6840/1537112 pipe(0x555be773e000 sd=533 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be671fc80).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655252 7f1993486700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.92:6832/1098862 pipe(0x555be779f400 sd=521 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6242d00).accept connect_seq 0 vs existing 0 state wait
>2019-03-10 17:36:17.655315 7f1993284700 0 -- 172.16.184.90:6800/589935 >> :/0 pipe(0x555be6d90800 sd=523 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720880).accept failed to getpeername (107) Transport endpoint is not connected
>2019-03-10 17:36:17.655814 7f1992173700 0 -- 172.16.184.90:6800/589935 >> 172.16.184.91:6833/316673 pipe(0x555be7740800 sd=527 :6800 s=0 pgs=0 cs=0 l=0 c=0x555be6720580).accept connect_seq 0 vs existing 0 state wait
>
>Regards,
>Fabio Abreu
>
>On Sun, Mar 10, 2019 at 3:20 PM Darius Kasparavičius wrote:
>
>> Hi,
>>
>> Check your osd.14 logs for information; it's currently stuck and not
>> providing io for replication. And what happened to OSDs 102 and 121?
>>
>> On Sun, Mar 10, 2019 at 7:44 PM Fabio Abreu wrote:
>> >
>> > Hi Everybody.
>> >
>> > I have a pg in down+peering state with blocked requests impacting
>> > my pg query, and I can't find the osd to apply the lost parameter.
>> >
>> > http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg/#placement-group-down-peering-failure
>> >
>> > Did someone have the same scenario with state down?
>> >
>> > Storage:
>> >
>> > 100 ops are blocked > 262.144 sec on osd.14
>> >
>> > root@monitor:~# ceph pg dump_stuck inactive
>> > ok
>> > pg_stat state up up_primary acting acting_primary
>> > 5.6e0 down+remapped+peering [102,121,14] 102 [14] 14
>> >
>> > root@monitor:~# ceph -s
>> >     cluster xxx
>> >      health HEALTH_ERR
>> >             1 pgs are stuck inactive for more than 300 seconds
>> >             223 pgs backfill_wait
>> >             14 pgs backfilling
>> >             215 pgs degraded
>> >             1 pgs down
>> >             1 pgs peering
>> >             1 pgs recovering
>> >             53 pgs recovery_wait
>> >             199 pgs stuck degraded
>> >             1 pgs stuck inactive
>> >             278 pgs stuck unclean
>> >             162 pgs stuck undersized
>> >             162 pgs undersized
>> >             100 requests are blocked > 32 sec
>> >             recovery 2767660/317878237 objects degraded (0.871%)
>> >             recovery 7484106/317878237 objects misplaced (2.354%)
>> >             recovery 29/105009626 unfoun
>> >
>> > --
>> > Regards,
>> > Fabio Abreu Reis
>> > http://fajlinux.com.br
>> > Tel : +55 21 98244-0161
>> > Skype : fabioabreureis
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>--
>Atenciosamente,
>Fabio Abreu Reis
>http://fajlinux.com.br
>Tel : +55 21 98244-0161
>Skype : fabioabreureis
Re: [ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy
On 4/9/19 12:43 PM, Francois Lafont wrote:

2. In my Docker container context, is it possible to put the logs above in the file "/var/log/syslog" of my host? In other words, is it possible to make sure this is logged to stdout of the "radosgw" daemon? In brief, is it possible to log "operations" to a regular file or, better for me, to stdout?

--
flaf
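For reference, the usual knobs for this combination (haproxy in front of radosgw) are the rgw ops log and the remote-address override — treat the fragment below as a sketch to check against the radosgw documentation for your release, and the section name as a placeholder for your actual client name:

```
[client.rgw.myhost]                             ; hypothetical section name
rgw remote addr param = http_x_forwarded_for    ; trust haproxy's X-Forwarded-For
rgw enable ops log = true                       ; enable per-request operation logging
rgw log http headers = http_x_forwarded_for     ; include the header in the ops log
```

With `rgw remote addr param` set, radosgw records the client IP from the X-Forwarded-For header (as set by haproxy) instead of the proxy's own address.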
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
Good point, thanks ! By making memory pressure (by playing with vm.min_free_kbytes), memory is freed by the kernel. So I think I essentially need to update monitoring rules, to avoid false positive. Thanks, I continue to read your resources. Le mardi 09 avril 2019 à 09:30 -0500, Mark Nelson a écrit : > My understanding is that basically the kernel is either unable or > uninterested (maybe due to lack of memory pressure?) in reclaiming > the > memory . It's possible you might have better behavior if you set > /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0) or > maybe disable transparent huge pages entirely. > > > Some background: > > https://github.com/gperftools/gperftools/issues/1073 > > https://blog.nelhage.com/post/transparent-hugepages/ > > https://www.kernel.org/doc/Documentation/vm/transhuge.txt > > > Mark > > > On 4/9/19 7:31 AM, Olivier Bonvalet wrote: > > Well, Dan seems to be right : > > > > _tune_cache_size > > target: 4294967296 > >heap: 6514409472 > >unmapped: 2267537408 > > mapped: 4246872064 > > old cache_size: 2845396873 > > new cache size: 2845397085 > > > > > > So we have 6GB in heap, but "only" 4GB mapped. > > > > But "ceph tell osd.* heap release" should had release that ? > > > > > > Thanks, > > > > Olivier > > > > > > Le lundi 08 avril 2019 à 16:09 -0500, Mark Nelson a écrit : > > > One of the difficulties with the osd_memory_target work is that > > > we > > > can't > > > tune based on the RSS memory usage of the process. Ultimately > > > it's up > > > to > > > the kernel to decide to reclaim memory and especially with > > > transparent > > > huge pages it's tough to judge what the kernel is going to do > > > even > > > if > > > memory has been unmapped by the process. Instead the autotuner > > > looks > > > at > > > how much memory has been mapped and tries to balance the caches > > > based > > > on > > > that. 
> > > > > > > > > In addition to Dan's advice, you might also want to enable debug > > > bluestore at level 5 and look for lines containing "target:" and > > > "cache_size:". These will tell you the current target, the > > > mapped > > > memory, unmapped memory, heap size, previous aggregate cache > > > size, > > > and > > > new aggregate cache size. The other line will give you a break > > > down > > > of > > > how much memory was assigned to each of the bluestore caches and > > > how > > > much each case is using. If there is a memory leak, the > > > autotuner > > > can > > > only do so much. At some point it will reduce the caches to fit > > > within > > > cache_min and leave it there. > > > > > > > > > Mark > > > > > > > > > On 4/8/19 5:18 AM, Dan van der Ster wrote: > > > > Which OS are you using? > > > > With CentOS we find that the heap is not always automatically > > > > released. (You can check the heap freelist with `ceph tell > > > > osd.0 > > > > heap > > > > stats`). > > > > As a workaround we run this hourly: > > > > > > > > ceph tell mon.* heap release > > > > ceph tell osd.* heap release > > > > ceph tell mds.* heap release > > > > > > > > -- Dan > > > > > > > > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet < > > > > ceph.l...@daevel.fr> wrote: > > > > > Hi, > > > > > > > > > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed > > > > > the > > > > > osd_memory_target : > > > > > > > > > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd > > > > > ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 > > > > > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 > > > > > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph4361 16.9 11.8 6718432 5783584 ? 
Ssl mars29 > > > > > 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 > > > > > 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 > > > > > 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 > > > > > 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 > > > > > 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > ceph6105 16.0 11.6 6576336 5694556 ? Ssl mars29 > > > > > 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 -- > > > > > setuser > > > > > ceph --setgroup ceph > > > > > > > > > > daevel-ob@ssdr712h:~$ free -m > > > >
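Mark's transparent-huge-pages suggestion from the thread, spelled out as commands (run as root; a sketch — whether it actually helps depends on the kernel and workload, so measure before and after):

```
# Make khugepaged refuse to collapse partially-populated ranges into
# huge pages (0 = only collapse fully-populated ranges):
echo 0 > /sys/kernel/mm/khugepaged/max_ptes_none

# Or disable transparent huge pages entirely:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
```

Either setting reduces the chance that memory unmapped by tcmalloc stays resident inside a huge page the kernel never splits and reclaims.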
[ceph-users] how to trigger offline filestore merge
Hi all, We have a slight issue while trying to migrate a pool from filestore to bluestore. This pool used to have 20 million objects in filestore -- it now has 50,000. During its life, the filestore pgs were internally split several times, but never merged. Now the pg _head dirs have mostly empty directories. This creates some problems: 1. rados ls -p hangs a long time, eventually triggering slow requests while the filestore_op threads time out. (They time out while listing the collections). 2. backfilling from these PGs is impossible, similarly because listing the objects to backfill eventually leads to the osd flapping. So I want to merge the filestore pgs. I tried ceph-objectstore-tool --op apply-layout-settings, but it seems that this only splits, not merges? Does someone have a better idea? Thanks! Dan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remove RBD mirror?
Can you pastebin the results from running the following on your backup site rbd-mirror daemon node? ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15 ceph --admin-socket /path/to/asok rbd mirror restart nova wait a minute to let some logs accumulate ... ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5 ... and collect the rbd-mirror log from /var/log/ceph/ (should have lots of "rbd::mirror"-like log entries. On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund wrote: > > > > Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman : >> >> Any chance your rbd-mirror daemon has the admin sockets available >> (defaults to /var/run/ceph/cephdr-clientasok)? If >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status". > > > { > "pool_replayers": [ > { > "pool": "glance", > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: > production client: client.productionbackup", > "instance_id": "869081", > "leader_instance_id": "869081", > "leader": true, > "instances": [], > "local_cluster_admin_socket": > "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok", > "remote_cluster_admin_socket": > "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok", > "sync_throttler": { > "max_parallel_syncs": 5, > "running_syncs": 0, > "waiting_syncs": 0 > }, > "image_replayers": [ > { > "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0", > "state": "Replaying" > }, > { > "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62", > "state": "Replaying" > }, > ---cut-- > { > "name": > "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05", > "state": "Replaying" > } > ] > }, > { > "pool": "nova", > "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: > production client: client.productionbackup", > "instance_id": "889074", > "leader_instance_id": "889074", > "leader": true, > "instances": [], > "local_cluster_admin_socket": > "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok", > 
"remote_cluster_admin_socket": > "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok", > "sync_throttler": { > "max_parallel_syncs": 5, > "running_syncs": 0, > "waiting_syncs": 0 > }, > "image_replayers": [] > } > ], > "image_deleter": { > "image_deleter_status": { > "delete_images_queue": [ > { > "local_pool_id": 3, > "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45" > } > ], > "failed_deletes_queue": [] > } >> >> >> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund wrote: >> > >> > >> > >> > Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman : >> >> >> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund >> >> wrote: >> >> > >> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund >> >> > >wrote: >> >> > >> >> >> > >> Hi, >> >> > >> We have configured one-way replication of pools between a production >> >> > >> cluster and a backup cluster. But unfortunately the rbd-mirror or >> >> > >> the backup cluster is unable to keep up with the production cluster >> >> > >> so the replication fails to reach replaying state. >> >> > > >> >> > >Hmm, it's odd that they don't at least reach the replaying state. Are >> >> > >they still performing the initial sync? >> >> > >> >> > There are three pools we try to mirror, (glance, cinder, and nova, no >> >> > points for guessing what the cluster is used for :) ), >> >> > the glance and cinder pools are smaller and sees limited write >> >> > activity, and the mirroring works, the nova pool which is the largest >> >> > and has 90% of the write activity never leaves the "unknown" state. >> >> > >> >> > # rbd mirror pool status cinder >> >> > health: OK >> >> > images: 892 total >> >> > 890 replaying >> >> > 2 stopped >> >> > # >> >> > # rbd mirror pool status nova >> >> > health: WARNING >> >> > images: 2479 total >> >> > 2479 unknown >> >> > # >> >> > The production clsuter has 5k writes/s on average and the backup >> >> > cluster has 1-2k writes/s on average. 
The production cluster is bigger >> >> > and has better specs. I thought that the backup cluster would be able >> >> > to keep up but it looks like I was wrong. >> >> >> >> The fact that they are in the unknown state just means that
Re: [ceph-users] Remove RBD mirror?
Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman : > Any chance your rbd-mirror daemon has the admin sockets available > (defaults to /var/run/ceph/cephdr-clientasok)? If > so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status". > { "pool_replayers": [ { "pool": "glance", "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup", "instance_id": "869081", "leader_instance_id": "869081", "leader": true, "instances": [], "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok", "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok", "sync_throttler": { "max_parallel_syncs": 5, "running_syncs": 0, "waiting_syncs": 0 }, "image_replayers": [ { "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0", "state": "Replaying" }, { "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62", "state": "Replaying" }, ---cut-- { "name": "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05", "state": "Replaying" } ] }, { "pool": "nova", "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: production client: client.productionbackup", "instance_id": "889074", "leader_instance_id": "889074", "leader": true, "instances": [], "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok", "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok", "sync_throttler": { "max_parallel_syncs": 5, "running_syncs": 0, "waiting_syncs": 0 }, "image_replayers": [] } ], "image_deleter": { "image_deleter_status": { "delete_images_queue": [ { "local_pool_id": 3, "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45" } ], "failed_deletes_queue": [] } > > On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund > wrote: > > > > > > > > Den tis 9 apr. 
2019 kl 17:14 skrev Jason Dillaman : > >> > >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund > wrote: > >> > > >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund > wrote: > >> > >> > >> > >> Hi, > >> > >> We have configured one-way replication of pools between a > production cluster and a backup cluster. But unfortunately the rbd-mirror > or the backup cluster is unable to keep up with the production cluster so > the replication fails to reach replaying state. > >> > > > >> > >Hmm, it's odd that they don't at least reach the replaying state. Are > >> > >they still performing the initial sync? > >> > > >> > There are three pools we try to mirror, (glance, cinder, and nova, no > points for guessing what the cluster is used for :) ), > >> > the glance and cinder pools are smaller and sees limited write > activity, and the mirroring works, the nova pool which is the largest and > has 90% of the write activity never leaves the "unknown" state. > >> > > >> > # rbd mirror pool status cinder > >> > health: OK > >> > images: 892 total > >> > 890 replaying > >> > 2 stopped > >> > # > >> > # rbd mirror pool status nova > >> > health: WARNING > >> > images: 2479 total > >> > 2479 unknown > >> > # > >> > The production clsuter has 5k writes/s on average and the backup > cluster has 1-2k writes/s on average. The production cluster is bigger and > has better specs. I thought that the backup cluster would be able to keep > up but it looks like I was wrong. > >> > >> The fact that they are in the unknown state just means that the remote > >> "rbd-mirror" daemon hasn't started any journal replayers against the > >> images. If it couldn't keep up, it would still report a status of > >> "up+replaying". What Ceph release are you running on your backup > >> cluster? > >> > > The backup cluster is running Luminous 12.2.11 (the production cluster > 12.2.10) > > > >> > >> > >> And the journals on the rbd volumes keep growing... 
> >> > >> > >> > >> Is it enough to simply disable the mirroring of the pool (rbd > mirror pool disable ) and that will remove the lagging reader from > the journals and shrink them, or is there anything else that has to be done? > >> > > > >> > >You can either disable the journaling feature on the image(s) since > >> > >there is no point to leave it on if you aren't using
Re: [ceph-users] Remove RBD mirror?
Any chance your rbd-mirror daemon has the admin sockets available (defaults to /var/run/ceph/cephdr-clientasok)? If so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status". On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund wrote: > > > > Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman : >> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund wrote: >> > >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund >> > >wrote: >> > >> >> > >> Hi, >> > >> We have configured one-way replication of pools between a production >> > >> cluster and a backup cluster. But unfortunately the rbd-mirror or the >> > >> backup cluster is unable to keep up with the production cluster so the >> > >> replication fails to reach replaying state. >> > > >> > >Hmm, it's odd that they don't at least reach the replaying state. Are >> > >they still performing the initial sync? >> > >> > There are three pools we try to mirror, (glance, cinder, and nova, no >> > points for guessing what the cluster is used for :) ), >> > the glance and cinder pools are smaller and sees limited write activity, >> > and the mirroring works, the nova pool which is the largest and has 90% of >> > the write activity never leaves the "unknown" state. >> > >> > # rbd mirror pool status cinder >> > health: OK >> > images: 892 total >> > 890 replaying >> > 2 stopped >> > # >> > # rbd mirror pool status nova >> > health: WARNING >> > images: 2479 total >> > 2479 unknown >> > # >> > The production clsuter has 5k writes/s on average and the backup cluster >> > has 1-2k writes/s on average. The production cluster is bigger and has >> > better specs. I thought that the backup cluster would be able to keep up >> > but it looks like I was wrong. >> >> The fact that they are in the unknown state just means that the remote >> "rbd-mirror" daemon hasn't started any journal replayers against the >> images. If it couldn't keep up, it would still report a status of >> "up+replaying". 
What Ceph release are you running on your backup >> cluster? >> > The backup cluster is running Luminous 12.2.11 (the production cluster > 12.2.10) > >> >> > >> And the journals on the rbd volumes keep growing... >> > >> >> > >> Is it enough to simply disable the mirroring of the pool (rbd mirror >> > >> pool disable ) and that will remove the lagging reader from the >> > >> journals and shrink them, or is there anything else that has to be done? >> > > >> > >You can either disable the journaling feature on the image(s) since >> > >there is no point to leave it on if you aren't using mirroring, or run >> > >"rbd mirror pool disable " to purge the journals. >> > >> > Thanks for the confirmation. >> > I will stop the mirror of the nova pool and try to figure out if there is >> > anything we can do to get the backup cluster to keep up. >> > >> > >> Best regards >> > >> /Magnus >> > >> ___ >> > >> ceph-users mailing list >> > >> ceph-users@lists.ceph.com >> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > >> > >-- >> > >Jason >> >> >> >> -- >> Jason -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remove RBD mirror?
Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman : > On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund > wrote: > > > > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund > wrote: > > >> > > >> Hi, > > >> We have configured one-way replication of pools between a production > cluster and a backup cluster. But unfortunately the rbd-mirror or the > backup cluster is unable to keep up with the production cluster so the > replication fails to reach replaying state. > > > > > >Hmm, it's odd that they don't at least reach the replaying state. Are > > >they still performing the initial sync? > > > > There are three pools we try to mirror, (glance, cinder, and nova, no > points for guessing what the cluster is used for :) ), > > the glance and cinder pools are smaller and sees limited write activity, > and the mirroring works, the nova pool which is the largest and has 90% of > the write activity never leaves the "unknown" state. > > > > # rbd mirror pool status cinder > > health: OK > > images: 892 total > > 890 replaying > > 2 stopped > > # > > # rbd mirror pool status nova > > health: WARNING > > images: 2479 total > > 2479 unknown > > # > > The production clsuter has 5k writes/s on average and the backup cluster > has 1-2k writes/s on average. The production cluster is bigger and has > better specs. I thought that the backup cluster would be able to keep up > but it looks like I was wrong. > > The fact that they are in the unknown state just means that the remote > "rbd-mirror" daemon hasn't started any journal replayers against the > images. If it couldn't keep up, it would still report a status of > "up+replaying". What Ceph release are you running on your backup > cluster? > > The backup cluster is running Luminous 12.2.11 (the production cluster 12.2.10) > > >> And the journals on the rbd volumes keep growing... 
> > >> > > >> Is it enough to simply disable the mirroring of the pool (rbd mirror > pool disable ) and that will remove the lagging reader from the > journals and shrink them, or is there anything else that has to be done? > > > > > >You can either disable the journaling feature on the image(s) since > > >there is no point to leave it on if you aren't using mirroring, or run > > >"rbd mirror pool disable " to purge the journals. > > > > Thanks for the confirmation. > > I will stop the mirror of the nova pool and try to figure out if there > is anything we can do to get the backup cluster to keep up. > > > > >> Best regards > > >> /Magnus > > >> ___ > > >> ceph-users mailing list > > >> ceph-users@lists.ceph.com > > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > > >-- > > >Jason > > > > -- > Jason > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remove RBD mirror?
On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund wrote: > > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote: > >> > >> Hi, > >> We have configured one-way replication of pools between a production > >> cluster and a backup cluster. But unfortunately the rbd-mirror or the > >> backup cluster is unable to keep up with the production cluster so the > >> replication fails to reach replaying state. > > > >Hmm, it's odd that they don't at least reach the replaying state. Are > >they still performing the initial sync? > > There are three pools we try to mirror, (glance, cinder, and nova, no points > for guessing what the cluster is used for :) ), > the glance and cinder pools are smaller and sees limited write activity, and > the mirroring works, the nova pool which is the largest and has 90% of the > write activity never leaves the "unknown" state. > > # rbd mirror pool status cinder > health: OK > images: 892 total > 890 replaying > 2 stopped > # > # rbd mirror pool status nova > health: WARNING > images: 2479 total > 2479 unknown > # > The production clsuter has 5k writes/s on average and the backup cluster has > 1-2k writes/s on average. The production cluster is bigger and has better > specs. I thought that the backup cluster would be able to keep up but it > looks like I was wrong. The fact that they are in the unknown state just means that the remote "rbd-mirror" daemon hasn't started any journal replayers against the images. If it couldn't keep up, it would still report a status of "up+replaying". What Ceph release are you running on your backup cluster? > >> And the journals on the rbd volumes keep growing... > >> > >> Is it enough to simply disable the mirroring of the pool (rbd mirror pool > >> disable ) and that will remove the lagging reader from the journals > >> and shrink them, or is there anything else that has to be done? 
> > > >You can either disable the journaling feature on the image(s) since > >there is no point to leave it on if you aren't using mirroring, or run > >"rbd mirror pool disable " to purge the journals. > > Thanks for the confirmation. > I will stop the mirror of the nova pool and try to figure out if there is > anything we can do to get the backup cluster to keep up. > > >> Best regards > >> /Magnus > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > >-- > >Jason -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remove RBD mirror?
>On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote: >> >> Hi, >> We have configured one-way replication of pools between a production cluster and a backup cluster. But unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster so the replication fails to reach replaying state. > >Hmm, it's odd that they don't at least reach the replaying state. Are >they still performing the initial sync? There are three pools we try to mirror (glance, cinder, and nova, no points for guessing what the cluster is used for :) ); the glance and cinder pools are smaller and see limited write activity, and the mirroring works, while the nova pool, which is the largest and has 90% of the write activity, never leaves the "unknown" state.

# rbd mirror pool status cinder
health: OK
images: 892 total
    890 replaying
    2 stopped
#
# rbd mirror pool status nova
health: WARNING
images: 2479 total
    2479 unknown
#

The production cluster has 5k writes/s on average and the backup cluster has 1-2k writes/s on average. The production cluster is bigger and has better specs. I thought that the backup cluster would be able to keep up but it looks like I was wrong. >> And the journals on the rbd volumes keep growing... >> >> Is it enough to simply disable the mirroring of the pool (rbd mirror pool disable ) and that will remove the lagging reader from the journals and shrink them, or is there anything else that has to be done? > >You can either disable the journaling feature on the image(s) since >there is no point to leave it on if you aren't using mirroring, or run >"rbd mirror pool disable " to purge the journals. Thanks for the confirmation. I will stop the mirror of the nova pool and try to figure out if there is anything we can do to get the backup cluster to keep up.
>> Best regards >> /Magnus >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >-- >Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed up replication
On Thu, Apr 4, 2019 at 6:27 AM huxia...@horebdata.cn wrote: > > thanks a lot, Jason. > > how much performance loss should I expect by enabling rbd mirroring? I really > need to minimize any performance impact while using this disaster recovery > feature. Will a dedicated journal on Intel Optane NVMe help? If so, how big > should the size be? The worst-case impact is effectively double the write latency and bandwidth (since the librbd client needs to journal the IO first before committing the actual changes to the image). I would definitely recommend using a separate fast pool for the journal to minimize the initial journal write latency hit. The librbd in-memory cache in writeback mode can also help absorb the additional latency, since the write IO can be (effectively) immediately ACKed if you have enough space in the cache. > cheers, > > Samuel > > > huxia...@horebdata.cn > > > From: Jason Dillaman > Date: 2019-04-03 23:03 > To: huxia...@horebdata.cn > CC: ceph-users > Subject: Re: [ceph-users] How to tune Ceph RBD mirroring parameters to speed > up replication > For better or worse, out of the box, librbd and rbd-mirror are > configured to conserve memory at the expense of performance to support > the potential case of thousands of images being mirrored and only a > single "rbd-mirror" daemon attempting to handle the load. > > You can optimize writes by adding "rbd_journal_max_payload_bytes = > 8388608" to the "[client]" section on the librbd client nodes. > Normally, writes larger than 16KiB are broken into multiple journal > entries to allow the remote "rbd-mirror" daemon to make forward > progress w/o using too much memory, so this will ensure large IOs only > require a single journal entry. > > You can also add "rbd_mirror_journal_max_fetch_bytes = 33554432" to > the "[client]" section on the "rbd-mirror" daemon nodes and restart > the daemon for the change to take effect.
Normally, the daemon tries > to nibble the per-image journal events to prevent excessive memory use > in the case where potentially thousands of images are being mirrored. > > On Wed, Apr 3, 2019 at 4:34 PM huxia...@horebdata.cn > wrote: > > > > Hello, folks, > > > > I am setting up two ceph clusters to test async replication via RBD > > mirroring. The two clusters are very close, just in two buildings about 20m > > away, and the networking is very good as well, 10Gb Fiber connection. In > > this case, how should i tune the relevant RBD mirroring parameters to > > accelerate the replication? > > > > thanks in advance, > > > > Samuel > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > -- > Jason > -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
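The effect of Jason's rbd_journal_max_payload_bytes advice can be illustrated with a little arithmetic. This is only a sketch of the splitting described above (a plain ceiling division; the real journaling code is more involved):

```python
import math

def journal_entries_for_write(write_bytes, max_payload_bytes):
    """Number of journal entries a single write is split into,
    assuming a simple ceiling split at the payload limit."""
    return max(1, math.ceil(write_bytes / max_payload_bytes))

DEFAULT = 16 * 1024        # 16 KiB default payload limit, per the thread
TUNED   = 8 * 1024 * 1024  # rbd_journal_max_payload_bytes = 8388608

write = 4 * 1024 * 1024    # a 4 MiB write
print(journal_entries_for_write(write, DEFAULT))  # 256 entries
print(journal_entries_for_write(write, TUNED))    # 1 entry
```

So a large sequential write that would otherwise generate hundreds of journal entries becomes a single entry after the tuning, at the cost of the rbd-mirror daemon needing more memory per fetched entry.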
Re: [ceph-users] Remove RBD mirror?
On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote: > > Hi, > We have configured one-way replication of pools between a production cluster > and a backup cluster. But unfortunately the rbd-mirror or the backup cluster > is unable to keep up with the production cluster so the replication fails to > reach replaying state. Hmm, it's odd that they don't at least reach the replaying state. Are they still performing the initial sync? > And the journals on the rbd volumes keep growing... > > Is it enought to simply disable the mirroring of the pool (rbd mirror pool > disable ) and that will remove the lagging reader from the journals and > shrink them, or is there any thing else that has to be done? You can either disable the journaling feature on the image(s) since there is no point to leave it on if you aren't using mirroring, or run "rbd mirror pool disable " to purge the journals. > Best regards > /Magnus > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Remove RBD mirror?
Hi, We have configured one-way replication of pools between a production cluster and a backup cluster. But unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach replaying state. And the journals on the rbd volumes keep growing... Is it enough to simply disable the mirroring of the pool (rbd mirror pool disable ), and will that remove the lagging reader from the journals and shrink them, or is there anything else that has to be done? Best regards /Magnus ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] BADAUTHORIZER in Nautilus
Update: I think we have a work-around, but no root cause yet. What is working is removing the 'v2' bits from the ceph.conf file across the cluster, and turning off all cephx authentication. Now everything seems to be talking correctly other than some odd metrics around the edges. Here's my current ceph.conf, running on all ceph hosts and clients: [global] fsid = 3f390b5e-2b1d-4a2f-ba00- mon_host = [v1:10.36.9.43:6789/0] [v1:10.36.9.44:6789/0] [v1: 10.36.9.45:6789/0] auth_client_required = none auth_cluster_required = none auth_service_required = none If we get better information as to what's going on, I'll post here for future reference On Thu, Apr 4, 2019 at 9:16 AM Sage Weil wrote: > On Thu, 4 Apr 2019, Shawn Edwards wrote: > > It was disabled in a fit of genetic debugging. I've now tried to revert > > all config settings related to auth and signing to defaults. > > > > I can't seem to change the auth_*_required settings. If I try to remove > > them, they stay set. If I try to change them, I get both the old and new > > settings: > > > > root@tyr-ceph-mon0:~# ceph config dump | grep -E '(auth|cephx)' > > globaladvanced auth_client_required cephx > > * > > globaladvanced auth_cluster_required cephx > > * > > globaladvanced auth_service_required cephx > > * > > root@tyr-ceph-mon0:~# ceph config rm global auth_service_required > > root@tyr-ceph-mon0:~# ceph config dump | grep -E '(auth|cephx)' > > globaladvanced auth_client_required cephx > > * > > globaladvanced auth_cluster_required cephx > > * > > globaladvanced auth_service_required cephx > > * > > root@tyr-ceph-mon0:~# ceph config set global auth_service_required none > > root@tyr-ceph-mon0:~# ceph config dump | grep -E '(auth|cephx)' > > globaladvanced auth_client_required cephx > > * > > globaladvanced auth_cluster_required cephx > > * > > globaladvanced auth_service_required none > >* > > globaladvanced auth_service_required cephx > > * > > > > I know these are set to RO, but according to your blog posts, 
this means > > they don't get updated until a daemon restart. Does this look correct to > > you? I'm assuming I need to restart all daemons on all hosts. Is this > > correct? > > Yeah, that is definitely not behaving properly. Can you try "ceph > config-key dump | grep config/" to look at how those keys are stored? You > should see something like > > "config/auth_cluster_required": "cephx", > "config/auth_service_required": "cephx", > "config/auth_service_ticket_ttl": "3600.00", > > but maybe those names are formed differently, maybe with ".../global/..." > in there? My guess is a subtle naming behavior change between mimic or > something. You can remove the keys via the config-key interface and then > restart the mons (or adjust any random config option) to make the > mons refresh. After that config dump should show the right thing. > > Maybe a disagreement/confusion about the actual value of > auth_service_ticket_ttl is the cause of this. You might try doing 'ceph > config show osd.0' and/or a mon to see what value for the auth options the > daemons are actually using and reporting... 
> > sage > > > > > > On Thu, Apr 4, 2019 at 5:54 AM Sage Weil wrote: > > > > > That log shows > > > > > > 2019-04-03 15:39:53.299 7f3733f18700 10 monclient: tick > > > 2019-04-03 15:39:53.299 7f3733f18700 10 cephx: validate_tickets want 53 > > > have 53 need 0 > > > 2019-04-03 15:39:53.299 7f3733f18700 20 cephx client: need_tickets: > > > want=53 have=53 need=0 > > > 2019-04-03 15:39:53.299 7f3733f18700 10 monclient: _check_auth_rotating > > > have uptodate secrets (they expire after 2019-04-03 15:39:23.301595) > > > 2019-04-03 15:39:53.299 7f3733f18700 10 auth: dump_rotating: > > > 2019-04-03 15:39:53.299 7f3733f18700 10 auth: id 41691 A4Q== expires > > > 2019-04-03 14:43:07.042860 > > > 2019-04-03 15:39:53.299 7f3733f18700 10 auth: id 41692 AD9Q== expires > > > 2019-04-03 15:43:09.895511 > > > 2019-04-03 15:39:53.299 7f3733f18700 10 auth: id 41693 ADQ== expires > > > 2019-04-03 16:43:09.895511 > > > > > > which is all fine. It is getting BADAUTHORIZER talking to another OSD, > > > but I'm guessing it's because that other OSD doesn't have the right > > > tickets. It's hard to tell what's wrong without having al the OSD logs > > >
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
My understanding is that basically the kernel is either unable or uninterested (maybe due to lack of memory pressure?) in reclaiming the memory . It's possible you might have better behavior if you set /sys/kernel/mm/khugepaged/max_ptes_none to a low value (maybe 0) or maybe disable transparent huge pages entirely. Some background: https://github.com/gperftools/gperftools/issues/1073 https://blog.nelhage.com/post/transparent-hugepages/ https://www.kernel.org/doc/Documentation/vm/transhuge.txt Mark On 4/9/19 7:31 AM, Olivier Bonvalet wrote: Well, Dan seems to be right : _tune_cache_size target: 4294967296 heap: 6514409472 unmapped: 2267537408 mapped: 4246872064 old cache_size: 2845396873 new cache size: 2845397085 So we have 6GB in heap, but "only" 4GB mapped. But "ceph tell osd.* heap release" should had release that ? Thanks, Olivier Le lundi 08 avril 2019 à 16:09 -0500, Mark Nelson a écrit : One of the difficulties with the osd_memory_target work is that we can't tune based on the RSS memory usage of the process. Ultimately it's up to the kernel to decide to reclaim memory and especially with transparent huge pages it's tough to judge what the kernel is going to do even if memory has been unmapped by the process. Instead the autotuner looks at how much memory has been mapped and tries to balance the caches based on that. In addition to Dan's advice, you might also want to enable debug bluestore at level 5 and look for lines containing "target:" and "cache_size:". These will tell you the current target, the mapped memory, unmapped memory, heap size, previous aggregate cache size, and new aggregate cache size. The other line will give you a break down of how much memory was assigned to each of the bluestore caches and how much each case is using. If there is a memory leak, the autotuner can only do so much. At some point it will reduce the caches to fit within cache_min and leave it there. Mark On 4/8/19 5:18 AM, Dan van der Ster wrote: Which OS are you using? 
With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`). As a workaround we run this hourly: ceph tell mon.* heap release ceph tell osd.* heap release ceph tell mds.* heap release -- Dan On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet < ceph.l...@daevel.fr> wrote: Hi, on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the osd_memory_target : daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph ceph4361 16.9 11.8 6718432 5783584 ? Ssl mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph ceph6105 16.0 11.6 6576336 5694556 ? 
Ssl mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph daevel-ob@ssdr712h:~$ free -m totalusedfree shared buff/ca che available Mem: 47771 452101643 17 9 17 43556 Swap: 0 0 0 # ceph daemon osd.147 config show | grep memory_target "osd_memory_target": "4294967296", And there is no recovery / backfilling, the cluster is fine : $ ceph status cluster: id: de035250-323d-4cf6-8c4b-cf0faf6296b1 health: HEALTH_OK services: mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel osd: 120 osds: 116 up, 116 in data: pools: 20 pools, 12736 pgs objects: 15.29M objects, 31.1TiB usage: 101TiB used, 75.3TiB / 177TiB avail pgs: 12732 active+clean 4 active+clean+scrubbing+deep io: client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr On an other host, in the same pool, I see also high memory usage : daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd ceph6287 6.6 10.6 6027388 5190032 ? Ssl mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131
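To make Mark's point concrete, here is a deliberately simplified sketch of an autotuner that balances a cache against *mapped* memory rather than RSS. This is not the actual BlueStore implementation: the step factor and cache_min value are invented for illustration, and only the target/mapped/cache figures are taken from the _tune_cache_size log line quoted in this thread:

```python
def tune_cache_size(target, mapped, old_cache, cache_min=128 * 2**20, step=0.1):
    """Move the aggregate cache size a fraction of the way toward closing
    the gap between osd_memory_target and currently *mapped* memory.
    Simplified illustration -- not the real BlueStore autotuner."""
    error = target - mapped          # positive: room to grow; negative: shrink
    new_cache = old_cache + step * error
    return int(max(new_cache, cache_min))

# With the numbers from the quoted log line (target 4 GiB, ~3.96 GiB mapped),
# the tuner barely moves the cache; with mapped well over target it shrinks it.
print(tune_cache_size(4294967296, 4246872064, 2845396873))
print(tune_cache_size(4294967296, 6 * 2**30, 2845396873))
```

The key design point Mark describes survives the simplification: the tuner reacts to mapped memory, so heap pages the allocator has freed but the kernel has not reclaimed (the unmapped/RSS gap in the log line) are invisible to it.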
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
Well, Dan seems to be right:

_tune_cache_size target: 4294967296 heap: 6514409472 unmapped: 2267537408 mapped: 4246872064
old cache_size: 2845396873 new cache size: 2845397085

So we have 6GB in heap, but "only" 4GB mapped. But "ceph tell osd.* heap release" should have released that? Thanks, Olivier

Le lundi 08 avril 2019 à 16:09 -0500, Mark Nelson a écrit :
> One of the difficulties with the osd_memory_target work is that we can't
> tune based on the RSS memory usage of the process. Ultimately it's up to
> the kernel to decide to reclaim memory, and especially with transparent
> huge pages it's tough to judge what the kernel is going to do even if
> memory has been unmapped by the process. Instead the autotuner looks at
> how much memory has been mapped and tries to balance the caches based on that.
>
> In addition to Dan's advice, you might also want to enable debug
> bluestore at level 5 and look for lines containing "target:" and
> "cache_size:". These will tell you the current target, the mapped
> memory, unmapped memory, heap size, previous aggregate cache size, and
> new aggregate cache size. The other line will give you a breakdown of
> how much memory was assigned to each of the bluestore caches and how
> much each cache is using. If there is a memory leak, the autotuner can
> only do so much. At some point it will reduce the caches to fit within
> cache_min and leave it there.
>
> Mark
>
> On 4/8/19 5:18 AM, Dan van der Ster wrote:
> > Which OS are you using?
> > With CentOS we find that the heap is not always automatically
> > released. (You can check the heap freelist with `ceph tell osd.0 heap stats`).
> > As a workaround we run this hourly: > > > > ceph tell mon.* heap release > > ceph tell osd.* heap release > > ceph tell mds.* heap release > > > > -- Dan > > > > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet < > > ceph.l...@daevel.fr> wrote: > > > Hi, > > > > > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the > > > osd_memory_target : > > > > > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd > > > ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 > > > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser > > > ceph --setgroup ceph > > > ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 > > > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser > > > ceph --setgroup ceph > > > ceph4361 16.9 11.8 6718432 5783584 ? Ssl mars29 > > > 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser > > > ceph --setgroup ceph > > > ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 > > > 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser > > > ceph --setgroup ceph > > > ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 > > > 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser > > > ceph --setgroup ceph > > > ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 > > > 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser > > > ceph --setgroup ceph > > > ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 > > > 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser > > > ceph --setgroup ceph > > > ceph6105 16.0 11.6 6576336 5694556 ? 
Ssl mars29 > > > 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser > > > ceph --setgroup ceph > > > > > > daevel-ob@ssdr712h:~$ free -m > > >totalusedfree shared buff/ca > > > che available > > > Mem: 47771 452101643 17 9 > > > 17 43556 > > > Swap: 0 0 0 > > > > > > # ceph daemon osd.147 config show | grep memory_target > > > "osd_memory_target": "4294967296", > > > > > > > > > And there is no recovery / backfilling, the cluster is fine : > > > > > > $ ceph status > > > cluster: > > > id: de035250-323d-4cf6-8c4b-cf0faf6296b1 > > > health: HEALTH_OK > > > > > > services: > > > mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel > > > mgr: tsyne(active), standbys: olkas, tolriq, lorunde, > > > amphel > > > osd: 120 osds: 116 up, 116 in > > > > > > data: > > > pools: 20 pools, 12736 pgs > > > objects: 15.29M objects, 31.1TiB > > > usage: 101TiB used, 75.3TiB / 177TiB avail > > > pgs: 12732 active+clean > > > 4 active+clean+scrubbing+deep > > > > > > io: > > > client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, > > > 1.29kop/s wr > > > > > > > > > On an other host, in the same pool, I see also high memory > > > usage : > > > > > > daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd > > > ceph6287 6.6 10.6 6027388 5190032 ? Ssl mars21 > > > 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser > >
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
Thanks for the advice, we are using Debian 9 (stretch), with a custom Linux kernel 4.14. But "heap release" didn't help. Le lundi 08 avril 2019 à 12:18 +0200, Dan van der Ster a écrit : > Which OS are you using? > With CentOS we find that the heap is not always automatically > released. (You can check the heap freelist with `ceph tell osd.0 heap > stats`). > As a workaround we run this hourly: > > ceph tell mon.* heap release > ceph tell osd.* heap release > ceph tell mds.* heap release > > -- Dan > > On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet > wrote: > > Hi, > > > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the > > osd_memory_target : > > > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd > > ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 > > 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph > > --setgroup ceph > > ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 > > 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph > > --setgroup ceph > > ceph4361 16.9 11.8 6718432 5783584 ? Ssl mars29 > > 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph > > --setgroup ceph > > ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 > > 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph > > --setgroup ceph > > ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 > > 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph > > --setgroup ceph > > ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 > > 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph > > --setgroup ceph > > ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 > > 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph > > --setgroup ceph > > ceph6105 16.0 11.6 6576336 5694556 ? 
Ssl mars29 > > 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph > > --setgroup ceph > > > > daevel-ob@ssdr712h:~$ free -m > > totalusedfree shared buff/cache > >available > > Mem: 47771 452101643 17 917 > >43556 > > Swap: 0 0 0 > > > > # ceph daemon osd.147 config show | grep memory_target > > "osd_memory_target": "4294967296", > > > > > > And there is no recovery / backfilling, the cluster is fine : > > > >$ ceph status > > cluster: > >id: de035250-323d-4cf6-8c4b-cf0faf6296b1 > >health: HEALTH_OK > > > > services: > >mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel > >mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel > >osd: 120 osds: 116 up, 116 in > > > > data: > >pools: 20 pools, 12736 pgs > >objects: 15.29M objects, 31.1TiB > >usage: 101TiB used, 75.3TiB / 177TiB avail > >pgs: 12732 active+clean > > 4 active+clean+scrubbing+deep > > > > io: > >client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, > > 1.29kop/s wr > > > > > >On an other host, in the same pool, I see also high memory usage > > : > > > >daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd > >ceph6287 6.6 10.6 6027388 5190032 ? Ssl mars21 > > 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph > > --setgroup ceph > >ceph6759 7.3 11.2 6299140 5484412 ? Ssl mars21 > > 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph > > --setgroup ceph > >ceph7114 7.0 11.7 6576168 5756236 ? Ssl mars21 > > 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph > > --setgroup ceph > >ceph7467 7.4 11.1 6244668 5430512 ? Ssl mars21 > > 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph > > --setgroup ceph > >ceph7821 7.7 11.1 6309456 5469376 ? Ssl mars21 > > 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph > > --setgroup ceph > >ceph8174 6.9 11.6 6545224 5705412 ? Ssl mars21 > > 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph > > --setgroup ceph > >ceph8746 6.6 11.1 6290004 5477204 ? 
Ssl mars21 > > 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph > > --setgroup ceph > >ceph9100 7.7 11.6 6552080 5713560 ? Ssl mars21 > > 1757:22 /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph > > --setgroup ceph > > > >But ! On a similar host, in a different pool, the problem is > > less visible : > > > >daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd > >ceph3617 2.8 9.9 5660308 4847444 ? Ssl mars29 > > 313:05 /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph > > --setgroup ceph > >ceph3958 2.3 9.8 5661936 4834320 ? Ssl mars29 > > 256:55 /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph > >
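For reference, the RSS column that ps reports is in KiB while osd_memory_target is in bytes, so the comparison in this thread takes a small conversion. A quick sketch using one of the OSDs from the listings above:

```python
def rss_over_target_gib(ps_rss_kib, osd_memory_target_bytes=4 * 2**30):
    """How far an OSD's resident set (ps RSS column, in KiB) sits above
    osd_memory_target (bytes). Negative means it is under the target."""
    rss_bytes = ps_rss_kib * 1024
    return (rss_bytes - osd_memory_target_bytes) / 2**30

# osd.143 from the ps listing: RSS 5893136 KiB against the 4 GiB default target
print("%.2f GiB over osd_memory_target" % rss_over_target_gib(5893136))
```

This is just unit arithmetic, but it shows the ~1.5 GiB overshoot per OSD that the thread is discussing is real resident memory, not a ps artifact.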
Re: [ceph-users] bluefs-bdev-expand experience
Igor, thank you, Round 2 is explained now. The main aka block aka slow device cannot be expanded in Luminous; this functionality will be available after the upgrade to Nautilus. Wal and db devices can be expanded in Luminous. Now I have recreated osd2 once again to get rid of the paradoxical ceph osd df output and tried to test db expansion, 40G -> 60G:

node2:/# ceph-volume lvm zap --destroy --osd-id 2
node2:/# ceph osd lost 2 --yes-i-really-mean-it
node2:/# ceph osd destroy 2 --yes-i-really-mean-it
node2:/# lvcreate -L1G -n osd2wal vg0
node2:/# lvcreate -L40G -n osd2db vg0
node2:/# lvcreate -L400G -n osd2 vg0
node2:/# ceph-volume lvm create --osd-id 2 --bluestore --data vg0/osd2 --block.db vg0/osd2db --block.wal vg0/osd2wal
node2:/# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL  %USE VAR  PGS
 0 hdd   0.22739  1.0    233GiB 9.49GiB 223GiB 4.08 1.24 128
 1 hdd   0.22739  1.0    233GiB 9.49GiB 223GiB 4.08 1.24 128
 3 hdd   0.22739  0          0B      0B     0B    0    0   0
 2 hdd   0.22739  1.0    400GiB 9.49GiB 391GiB 2.37 0.72 128
                  TOTAL  866GiB 28.5GiB 837GiB 3.29
MIN/MAX VAR: 0.72/1.24 STDDEV: 0.83
node2:/# lvextend -L+20G /dev/vg0/osd2db
  Size of logical volume vg0/osd2db changed from 40.00 GiB (10240 extents) to 60.00 GiB (15360 extents).
  Logical volume vg0/osd2db successfully resized.
node2:/# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
inferring bluefs devices from bluestore path
 slot 0 /var/lib/ceph/osd/ceph-2//block.wal
 slot 1 /var/lib/ceph/osd/ceph-2//block.db
 slot 2 /var/lib/ceph/osd/ceph-2//block
0 : size 0x4000 : own 0x[1000~3000]
1 : size 0xf : own 0x[2000~9e000]
2 : size 0x64 : own 0x[30~4]
Expanding...
1 : expanding from 0xa to 0xf
1 : size label updated to 64424509440
node2:/# ceph-bluestore-tool show-label --dev /dev/vg0/osd2db | grep size
    "size": 64424509440,

The label updated correctly, but the ceph osd df output has not changed.
I expected to see 391 + 20 = 411GiB in the AVAIL column, but it stays at 391:

node2:/# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL  %USE VAR  PGS
 0   hdd 0.22739  1.00000 233GiB 9.50GiB 223GiB 4.08 1.24 128
 1   hdd 0.22739  1.00000 233GiB 9.50GiB 223GiB 4.08 1.24 128
 3   hdd 0.22739        0     0B      0B     0B    0    0   0
 2   hdd 0.22739  1.00000 400GiB 9.49GiB 391GiB 2.37 0.72 128
                    TOTAL 866GiB 28.5GiB 837GiB 3.29
MIN/MAX VAR: 0.72/1.24  STDDEV: 0.83

I have restarted the monitors on all three nodes; 391GiB stays intact. OK, but I used bluefs-bdev-expand on a running OSD... probably not good, as it seems to work by opening bluefs directly... trying once again, offline this time:

node2:/# systemctl stop ceph-osd@2
node2:/# lvextend -L+20G /dev/vg0/osd2db
  Size of logical volume vg0/osd2db changed from 60.00 GiB (15360 extents) to 80.00 GiB (20480 extents).
  Logical volume vg0/osd2db successfully resized.
node2:/# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
inferring bluefs devices from bluestore path
 slot 0 /var/lib/ceph/osd/ceph-2//block.wal
 slot 1 /var/lib/ceph/osd/ceph-2//block.db
 slot 2 /var/lib/ceph/osd/ceph-2//block
0 : size 0x40000000 : own 0x[1000~3000]
1 : size 0x1400000000 : own 0x[2000~9e000]
2 : size 0x6400000000 : own 0x[30~4]
Expanding...
1 : expanding from 0xa00000000 to 0x1400000000
1 : size label updated to 85899345920
node2:/# systemctl start ceph-osd@2
node2:/# systemctl restart ceph-mon@pier42
node2:/# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL  %USE VAR  PGS
 0   hdd 0.22739  1.00000 233GiB 9.49GiB 223GiB 4.08 1.24 128
 1   hdd 0.22739  1.00000 233GiB 9.50GiB 223GiB 4.08 1.24 128
 3   hdd 0.22739        0     0B      0B     0B    0    0   0
 2   hdd 0.22739  1.00000 400GiB 9.50GiB 391GiB 2.37 0.72   0
                    TOTAL 866GiB 28.5GiB 837GiB 3.29
MIN/MAX VAR: 0.72/1.24  STDDEV: 0.83

Something is wrong. Or maybe I was wrong in expecting the db change to appear in the AVAIL column? From the Bluestore description I understood that db and slow should sum up, no?
Thanks for your help,

-- Yury

On Mon, Apr 08, 2019 at 10:17:24PM +0300, Igor Fedotov wrote:
> Hi Yuri,
>
> both issues from Round 2 relate to unsupported expansion for the main device.
> In fact it doesn't work and silently bypasses the operation in your case.
> Please try with a different device...
>
> Also I've just submitted a PR for mimic to indicate the bypass; will
> backport to Luminous once the mimic patch is approved.
> See https://github.com/ceph/ceph/pull/27447
>
> Thanks,
> Igor
>
> On 4/5/2019 4:07 PM, Yury Shevchuk wrote:
> > On Fri, Apr 05, 2019 at 02:42:53PM +0300, Igor Fedotov wrote:
> > > wrt Round 1 - an ability to expand the block (main) device has been added to
> > > Nautilus, see: https://github.com/ceph/ceph/pull/25308
> > Oh, that's good. But still a separate wal may be good for studying the
> > load on each volume (blktrace) or moving db/wal to another
Re: [ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy
Hi,

On 4/9/19 5:02 AM, Pavan Rallabhandi wrote:
> Refer "rgw log http headers" under http://docs.ceph.com/docs/nautilus/radosgw/config-ref/
> Or even better in the code https://github.com/ceph/ceph/pull/7639

Ok, thanks for your help Pavan. I have made progress, but I still have some problems.

With the help of this comment:
https://github.com/ceph/ceph/pull/7639#issuecomment-266893208
I have tried this config:

    rgw enable ops log = true
    rgw ops log socket path = /tmp/opslog
    rgw log http headers = http_x_forwarded_for

and I have logs in the socket /tmp/opslog like this:

    {"bucket":"test1","time":"2019-04-09 09:41:18.188350Z","time_local":"2019-04-09 11:41:18.188350","remote_addr":"10.111.222.51","user":"flaf","operation":"GET","uri":"GET /?prefix=toto/=%2F HTTP/1.1","http_status":"200","error_code":"","bytes_sent":832,"bytes_received":0,"object_size":0,"total_time":39,"user_agent":"DragonDisk 1.05 ( http://www.dragondisk.com )","referrer":"","http_x_headers":[{"HTTP_X_FORWARDED_FOR":"10.111.222.55"}]},

I can see the IP address of the client in the value of HTTP_X_FORWARDED_FOR, which is cool. But I don't understand why there is a specific socket to log that. I'm using radosgw in a Docker container (installed via ceph-ansible) and I already have the logs of the "radosgw" daemon in the "/var/log/syslog" file of my host (I'm using the Docker "syslog" log-driver).

1. Why is there a _separate_ log source for that? Indeed, in "/var/log/syslog" I already have some civetweb logs. For instance:

    2019-04-09 12:33:45.926 7f02e021c700  1 civetweb: 0x55876dc9c000: 10.111.222.51 - - [09/Apr/2019:12:33:45 +0200] "GET /?prefix=toto/=%2F HTTP/1.1" 200 1014 - DragonDisk 1.05 ( http://www.dragondisk.com )

2. In my Docker container context, is it possible to put the logs above in the file "/var/log/syslog" of my host? In other words, is it possible to make these records go to stdout of the "radosgw" daemon?
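Not an answer to question 2, but as a workaround one could run a small relay that reads the ops-log socket and re-prints each record on stdout, where a Docker log driver can pick it up. This is only a sketch under assumptions I have not verified against the radosgw source: that the ops log is served as a UNIX stream socket at /tmp/opslog, and that records arrive as comma-separated JSON objects like the sample above. The helper names (parse_records, forwarded_for, relay) are mine, not Ceph's.

```python
import json
import socket
import sys

def parse_records(text):
    """Pull complete JSON objects out of a comma-separated ops-log stream.

    Returns (records, leftover): leftover is a trailing partial record, if
    any, to be retried once more bytes arrive from the socket."""
    dec = json.JSONDecoder()
    records = []
    while True:
        text = text.lstrip(", \t\r\n")
        if not text:
            return records, ""
        try:
            obj, end = dec.raw_decode(text)
        except ValueError:
            return records, text  # incomplete record: wait for more data
        records.append(obj)
        text = text[end:]

def forwarded_for(record):
    """Extract HTTP_X_FORWARDED_FOR from one ops-log record, if present."""
    for hdr in record.get("http_x_headers", []):
        if "HTTP_X_FORWARDED_FOR" in hdr:
            return hdr["HTTP_X_FORWARDED_FOR"]
    return None

def relay(path="/tmp/opslog"):
    """Read the ops-log socket forever, printing one record per line."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    pending = ""
    while True:
        data = sock.recv(65536)
        if not data:
            break
        records, pending = parse_records(pending + data.decode("utf-8"))
        for rec in records:
            print(json.dumps(rec), flush=True)

if __name__ == "__main__":
    relay(sys.argv[1] if len(sys.argv) > 1 else "/tmp/opslog")
```

Running this as a sidecar with the same Docker logging configuration as the radosgw container would land the ops-log records next to the civetweb lines in /var/log/syslog.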
-- flaf ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com