Re: [ceph-users] problem w libvirt version 4.5 and 12.2.7
Konstantin,

Thanks for the reply. I've managed to unravel it partially. Somehow (I did not look into the srpm) starting from this version libvirt calculates the real allocation if the fast-diff feature is present on an image. Doing "rbd object-map rebuild" on every image helped (I do not know why it was needed - it is a new cluster with ceph version 12.2.7). Now the only problem is a 25T image on which "virsh vol-info" takes 13s ("rbd du" takes 1s), compared to a few minutes before, so the questions remain:
- why did it happen,
- how to monitor/foresee this,
- how to improve "virsh vol-info" if "rbd du" takes less time to execute?

On 03.01.2019 at 13:51, Konstantin Shalygin wrote:

After the update to CentOS 7.6, libvirt was updated from 3.9 to 4.5. Executing "virsh vol-list ceph --details" makes libvirtd use 300% CPU for 2 minutes to show the volumes on rbd. A quick peek at tcpdump shows it accessing rbd_data.* objects, which the previous version of libvirtd did not need. Ceph version is 12.2.7. Any help will be appreciated.

There is nothing special in libvirt 4.5, I upgraded hypervisors to this version and it still works flawlessly.

k
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
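For anyone else hitting this, a quick way to check whether an image's object map is flagged invalid, and to rebuild it across a whole pool, is something along these lines (a sketch only; "libvirt-pool" and "myimage" are placeholder names):

  # an invalid object map shows up under "flags" in the image info
  rbd info libvirt-pool/myimage
  # rebuild the object map for every image in the pool
  for img in $(rbd -p libvirt-pool ls); do
      rbd object-map rebuild libvirt-pool/"$img"
  done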
Re: [ceph-users] upgrade from jewel 10.2.10 to 10.2.11 broke anonymous swift
> > Does anybody have a suggestion of what I could try to troubleshoot this?

Upgrading to Luminous also "solves the issue". I'll look into that :)

// Johan
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Help Ceph Cluster Down
Hi Chris,

Indeed that's what happened. I didn't set the noout flag either and I zapped the disks on the new server every time. In my cluster, fre201 is the only new server. Current status after enabling the 3 OSDs on host fre201:

[root@fre201 ~]# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
 -1       70.92137 root default
 -2        5.45549     host fre101
  0   hdd  1.81850         osd.0       up  1.0     1.0
  1   hdd  1.81850         osd.1       up  1.0     1.0
  2   hdd  1.81850         osd.2       up  1.0     1.0
 -9        5.45549     host fre103
  3   hdd  1.81850         osd.3       up  1.0     1.0
  4   hdd  1.81850         osd.4       up  1.0     1.0
  5   hdd  1.81850         osd.5       up  1.0     1.0
 -3        5.45549     host fre105
  6   hdd  1.81850         osd.6       up  1.0     1.0
  7   hdd  1.81850         osd.7       up  1.0     1.0
  8   hdd  1.81850         osd.8       up  1.0     1.0
 -4        5.45549     host fre107
  9   hdd  1.81850         osd.9       up  1.0     1.0
 10   hdd  1.81850         osd.10      up  1.0     1.0
 11   hdd  1.81850         osd.11      up  1.0     1.0
 -5        5.45549     host fre109
 12   hdd  1.81850         osd.12      up  1.0     1.0
 13   hdd  1.81850         osd.13      up  1.0     1.0
 14   hdd  1.81850         osd.14      up  1.0     1.0
 -6        5.45549     host fre111
 15   hdd  1.81850         osd.15      up  1.0     1.0
 16   hdd  1.81850         osd.16      up  1.0     1.0
 17   hdd  1.81850         osd.17      up  0.7     1.0
 -7        5.45549     host fre113
 18   hdd  1.81850         osd.18      up  1.0     1.0
 19   hdd  1.81850         osd.19      up  1.0     1.0
 20   hdd  1.81850         osd.20      up  1.0     1.0
 -8        5.45549     host fre115
 21   hdd  1.81850         osd.21      up  1.0     1.0
 22   hdd  1.81850         osd.22      up  1.0     1.0
 23   hdd  1.81850         osd.23      up  1.0     1.0
-10        5.45549     host fre117
 24   hdd  1.81850         osd.24      up  1.0     1.0
 25   hdd  1.81850         osd.25      up  1.0     1.0
 26   hdd  1.81850         osd.26      up  1.0     1.0
-11        5.45549     host fre119
 27   hdd  1.81850         osd.27      up  1.0     1.0
 28   hdd  1.81850         osd.28      up  1.0     1.0
 29   hdd  1.81850         osd.29      up  1.0     1.0
-12        5.45549     host fre121
 30   hdd  1.81850         osd.30      up  1.0     1.0
 31   hdd  1.81850         osd.31      up  1.0     1.0
 32   hdd  1.81850         osd.32      up  1.0     1.0
-13        5.45549     host fre123
 33   hdd  1.81850         osd.33      up  1.0     1.0
 34   hdd  1.81850         osd.34      up  1.0     1.0
 35   hdd  1.81850         osd.35      up  1.0     1.0
-27        5.45549     host fre201
 36   hdd  1.81850         osd.36      up  1.0     1.0
 37   hdd  1.81850         osd.37      up  1.0     1.0
 38   hdd  1.81850         osd.38      up  1.0     1.0

[root@fre201 ~]# ceph -s
  cluster:
    id: adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            585791/12391450 objects misplaced (4.727%)
            2 scrub errors
            2374 PGs pending on creation
            Reduced data availability: 6578 pgs inactive, 2025 pgs down, 74 pgs peering, 1234 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 64969/12391450 objects degraded (0.524%), 616 pgs degraded, 20 pgs undersized
            96242 slow requests are blocked > 32 sec
            228 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2768 > max 200)
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 39 up, 39 in; 96 remapped pgs
    rgw: 1 daemon active
  data:
    pools: 18 pools, 54656 pgs
    objects: 6050k objects, 10942 GB
    usage: 21900 GB used, 50721 GB / 72622 GB avail
    pgs: 0.002% pgs unknown
         12.050% pgs not active
         64969/12391450 objects degraded (0.524%)
         585791/12391450 objects misplaced (4.727%)
         47489 active+clean
          3670 activating
          1098 stale+down
           923 down
           575 activating+degraded
           563 stale+active+clean
           105 stale+activating
            78 activating+remapped
            72 peering
            25 stale+activating+degraded
            23 stale+activating+remapped
             9 stale+active+undersized
             6 stale+activating+undersized+degraded+remapped
             5 stale+active+undersized+degraded
             4
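Not from this thread, but a workaround often suggested when PGs sit in "activating" because the per-OSD PG limit has been exceeded on Luminous is to temporarily raise the limits while the pool layout is fixed; the option names below should be verified against the running version, and the numbers are only illustrative:

  ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd=1000'
  ceph tell 'osd.*' injectargs '--mon_max_pg_per_osd=1000 --osd_max_pg_per_osd_hard_ratio=5'

The real fix remains reducing the PG count per OSD (here 2768 against a limit of 200), e.g. by adding OSDs or consolidating pools.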
[ceph-users] cephfs : rsync backup create cache pressure on clients, filling caps
Hi,

I'm currently doing cephfs backups through a dedicated client mounting the whole filesystem at the root. Other clients mount parts of the filesystem (kernel cephfs clients). I have around 22 million inodes; before the backup, around 5M caps are loaded by clients.

#ceph daemonperf mds.x.x
---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req rlat fwd |inos caps exi imi |stry recy recd|subm evts segs|ino  dn  |hcr hcs hsr |sess|actv rd wr rdwr|purg|
118   0   0  | 22M 5.3M  0   0  |  6    0    0 |  2  120k 130 |22M  22M |118  0   0  |167 |  0   2  0   0 |  0

When the backup is running, reading all the files, the caps increase to the max (and even a little bit more):

# ceph daemonperf mds.x.x
---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req rlat fwd |inos caps exi imi |stry recy recd|subm evts segs|ino  dn  |hcr hcs hsr |sess|actv rd wr rdwr|purg|
155   0   0  | 20M  22M  0   0  |  6    0    0 |  2  120k 129 |20M  20M |155  0   0  |167 |  0   0  0   0 |  0

Then the mds tries to recall caps from the other clients, and I'm getting:

2019-01-04 01:13:11.173768 cluster [WRN] Health check failed: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
2019-01-04 02:00:00.73 cluster [WRN] overall HEALTH_WARN 1 clients failing to respond to cache pressure
2019-01-04 03:00:00.69 cluster [WRN] overall HEALTH_WARN 1 clients failing to respond to cache pressure

Doing a simple "echo 2 | tee /proc/sys/vm/drop_caches" on the backup server frees the caps again:

# ceph daemonperf x
---mds --mds_cache--- ---mds_log -mds_mem- --mds_server-- mds_ -objecter-- purg
req rlat fwd |inos caps exi imi |stry recy recd|subm evts segs|ino  dn  |hcr hcs hsr |sess|actv rd wr rdwr|purg|
116   0   0  | 22M 4.8M  0   0  |  4    0    0 |  1  117k 131 |22M  22M |116  1   0  |167 |  0   2  0   0 |  0

Some questions here:

ceph side ---
- Is it possible to set up some kind of priority between clients, to force cap recall from a specific client?
- Is it possible to limit the number of caps for a client?

client side ---
I have tried to use vm.vfs_cache_pressure=4 to reclaim inode entries faster, but the server has 128GB of RAM.
- Is it possible to limit the number of inodes in cache on Linux?
- Is it possible to tune something on the ceph mount point?

Regards,

Alexandre
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
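To see which client holds the caps, and what MDS cache limit is driving the recall behaviour, the admin socket can be queried on the MDS host; "mds.x.x" below is just the daemon name from the post, and this is a sketch rather than a per-client cap limit:

  # per-session details; look at "num_caps" and "inst" for each client
  ceph daemon mds.x.x session ls
  # memory-based cache limit that Luminous uses when deciding to recall caps
  ceph daemon mds.x.x config get mds_cache_memory_limit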
Re: [ceph-users] Help Ceph Cluster Down
If you added OSDs and then deleted them repeatedly without waiting for replication to finish as the cluster attempted to rebalance across them, it's highly likely that you are permanently missing PGs (especially if the disks were zapped each time). If those 3 down OSDs can be revived there is a (small) chance that you can right the ship, but 1400 PGs/OSD is pretty extreme. I'm surprised the cluster even let you do that - this sounds like a data loss event.

Bring back the 3 OSDs and see what those 2 inconsistent pgs look like with "ceph pg query".

On January 3, 2019 21:59:38 Arun POONIA wrote:

Hi,

Recently I tried adding a new node (OSD) to the ceph cluster using the ceph-deploy tool. I was experimenting with the tool and ended up deleting the OSDs on the new server a couple of times. Now that ceph OSDs are running on the new server, cluster PGs seem to be inactive (10-15%) and they are not recovering or rebalancing. Not sure what to do. I tried shutting down the OSDs on the new server.

Status:

[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) No such file or directory
  cluster:
    id: adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            373907/12391198 objects misplaced (3.018%)
            2 scrub errors
            9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
            52486 slow requests are blocked > 32 sec
            9287 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2968 > max 200)
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 36 up, 36 in; 51 remapped pgs
    rgw: 1 daemon active
  data:
    pools: 18 pools, 54656 pgs
    objects: 6050k objects, 10941 GB
    usage: 21727 GB used, 45308 GB / 67035 GB avail
    pgs: 13.073% pgs not active
         178350/12391198 objects degraded (1.439%)
         373907/12391198 objects misplaced (3.018%)
         46177 active+clean
          5054 down
          1173 stale+down
          1084 stale+active+undersized
           547 activating
           201 stale+active+undersized+degraded
           158 stale+activating
            96 activating+degraded
            46 stale+active+clean
            42 activating+remapped
            34 stale+activating+degraded
            23 stale+activating+remapped
             6 stale+activating+undersized+degraded+remapped
             6 activating+undersized+degraded+remapped
             2 activating+degraded+remapped
             2 active+clean+inconsistent
             1 stale+activating+degraded+remapped
             1 stale+active+clean+remapped
             1 stale+remapped
             1 down+remapped
             1 remapped+peering
  io:
    client: 0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
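For reference, inspecting and (only once the cause is understood) repairing the two inconsistent PGs usually looks roughly like this; "1.2a3" is just a placeholder PG id, the real ones come from health detail:

  ceph health detail | grep inconsistent
  ceph pg 1.2a3 query
  rados list-inconsistent-obj 1.2a3 --format=json-pretty
  ceph pg repair 1.2a3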
[ceph-users] Help Ceph Cluster Down
Hi,

Recently I tried adding a new node (OSD) to the ceph cluster using the ceph-deploy tool. I was experimenting with the tool and ended up deleting the OSDs on the new server a couple of times. Now that ceph OSDs are running on the new server, cluster PGs seem to be inactive (10-15%) and they are not recovering or rebalancing. Not sure what to do. I tried shutting down the OSDs on the new server.

Status:

[root@fre105 ~]# ceph -s
2019-01-03 18:56:42.867081 7fa0bf573700 -1 asok(0x7fa0b80017a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph-guests/ceph-client.admin.4018644.140328258509136.asok': (2) No such file or directory
  cluster:
    id: adb9ad8e-f458-4124-bf58-7963a8d1391f
    health: HEALTH_ERR
            3 pools have many more objects per pg than average
            373907/12391198 objects misplaced (3.018%)
            2 scrub errors
            9677 PGs pending on creation
            Reduced data availability: 7145 pgs inactive, 6228 pgs down, 1 pg peering, 2717 pgs stale
            Possible data damage: 2 pgs inconsistent
            Degraded data redundancy: 178350/12391198 objects degraded (1.439%), 346 pgs degraded, 1297 pgs undersized
            52486 slow requests are blocked > 32 sec
            9287 stuck requests are blocked > 4096 sec
            too many PGs per OSD (2968 > max 200)
  services:
    mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03
    mgr: ceph-mon03(active), standbys: ceph-mon01, ceph-mon02
    osd: 39 osds: 36 up, 36 in; 51 remapped pgs
    rgw: 1 daemon active
  data:
    pools: 18 pools, 54656 pgs
    objects: 6050k objects, 10941 GB
    usage: 21727 GB used, 45308 GB / 67035 GB avail
    pgs: 13.073% pgs not active
         178350/12391198 objects degraded (1.439%)
         373907/12391198 objects misplaced (3.018%)
         46177 active+clean
          5054 down
          1173 stale+down
          1084 stale+active+undersized
           547 activating
           201 stale+active+undersized+degraded
           158 stale+activating
            96 activating+degraded
            46 stale+active+clean
            42 activating+remapped
            34 stale+activating+degraded
            23 stale+activating+remapped
             6 stale+activating+undersized+degraded+remapped
             6 activating+undersized+degraded+remapped
             2 activating+degraded+remapped
             2 active+clean+inconsistent
             1 stale+activating+degraded+remapped
             1 stale+active+clean+remapped
             1 stale+remapped
             1 down+remapped
             1 remapped+peering
  io:
    client: 0 B/s rd, 208 kB/s wr, 28 op/s rd, 28 op/s wr

Thanks
--
Arun Poonia
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Compacting omap data
Nautilus will make this easier. https://github.com/ceph/ceph/pull/18096

On Thu, Jan 3, 2019 at 5:22 AM Bryan Stillwell wrote:
>
> Recently on one of our bigger clusters (~1,900 OSDs) running Luminous (12.2.8), we had a problem where OSDs would frequently get restarted while deep-scrubbing.
>
> After digging into it I found that a number of the OSDs had very large omap directories (50 GiB+). I believe these were OSDs that had previously held PGs that were part of the .rgw.buckets.index pool, which I have recently moved to all SSDs; however, it seems like the data remained on the HDDs.
>
> I was able to reduce the data usage on most of the OSDs (from ~50 GiB to < 200 MiB!) by compacting the omap dbs offline by setting 'leveldb_compact_on_mount = true' in the [osd] section of ceph.conf, but that didn't work on the newer OSDs which use rocksdb. On those I had to do an online compaction using a command like:
>
> $ ceph tell osd.510 compact
>
> That worked, but today when I tried doing that on some of the SSD-based OSDs which are backing .rgw.buckets.index I started getting slow requests and the compaction ultimately failed with this error:
>
> $ ceph tell osd.1720 compact
> osd.1720: Error ENXIO: osd down
>
> When I tried it again it succeeded:
>
> $ ceph tell osd.1720 compact
> osd.1720: compacted omap in 420.999 seconds
>
> The data usage on that OSD dropped from 57.8 GiB to 43.4 GiB, which was nice, but I don't believe that'll get any smaller until I start splitting the PGs in the .rgw.buckets.index pool to better distribute that pool across the SSD-based OSDs.
>
> The first question I have is: what is the option to do an offline compaction of rocksdb, so I don't impact our customers while compacting the rest of the SSD-based OSDs?
>
> The next question is whether there's a way to configure Ceph to automatically compact the omap dbs in the background in a way that doesn't affect user experience?
>
> Finally, I was able to figure out that the omap directories were getting large because we're using filestore on this cluster, but how could someone determine this when using BlueStore?
>
> Thanks,
> Bryan
>
> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
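On the offline-compaction question: one approach (a sketch, not something tested here) is to stop the OSD and compact its key/value store with ceph-kvstore-tool, which in recent Luminous builds also has a bluestore-kv backend; omap/RocksDB usage on a BlueStore OSD can be read from the bluefs perf counters:

  systemctl stop ceph-osd@1720
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-1720 compact
  systemctl start ceph-osd@1720
  # rough view of RocksDB (and therefore omap) usage on a running BlueStore OSD
  ceph daemon osd.1720 perf dump | grep -o '"db_used_bytes":[0-9]*'

For a FileStore OSD the omap directory lives under current/omap and can be compacted the same way with the leveldb or rocksdb backend of ceph-kvstore-tool.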
Re: [ceph-users] CephFS client df command showing raw space after adding second pool to mds
On Fri, Jan 4, 2019 at 1:53 AM David C wrote:
>
> Hi All
>
> Luminous 12.2.12
> Single MDS
> Replicated pools
>
> A 'df' on a CephFS kernel client used to show me the usable space (i.e. the raw space with the replication overhead applied). This was when I just had a single cephfs data pool.
>
> After adding a second pool to the mds and using file layouts to map a directory to that pool, a df is now showing the raw space. It's not the end of the world but it was handy to see the usable space.
>
> I'm fairly sure the change was me adding the second pool although I'm not 99% sure.
>
> I'm seeing this behavior on the latest Centos 7.6 kernel and a 4.14 kernel, is this expected?
>

Yes, it's expected. See commit:

commit 06d74376c8af32f5b8d777a943aa4dc99165088b
Author: Douglas Fuller
Date: Wed Aug 16 10:19:27 2017 -0400

    ceph: more accurate statfs

    Improve accuracy of statfs reporting for Ceph filesystems comprising exactly one data pool. In this case, the Ceph monitor can now report the space usage for the single data pool instead of the global data for the entire Ceph cluster. Include support for this message in mon_client and leverage it in ceph/super.

> Thanks,
> David
> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
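As a side note, when several data pools are in play, per-directory usage is still available through the recursive statistics xattrs on a CephFS mount; the mount path below is only an example:

  getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir
  getfattr -n ceph.dir.rfiles /mnt/cephfs/some/dir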
Re: [ceph-users] Omap issues - metadata creating too many
If you can wait a few weeks until the next release of Luminous there will be tooling to do this safely. Abhishek Lekshmanan of SUSE contributed the PR. It adds some sub-commands to radosgw-admin:

radosgw-admin reshard stale-instances list
radosgw-admin reshard stale-instances rm

If you do it manually you should proceed with extreme caution, as you could do some damage that you might not be able to recover from.

Eric

On 1/3/19 11:31 AM, Bryan Stillwell wrote:
> Josef,
>
> I've noticed that when dynamic resharding is on it'll reshard some of our bucket indices daily (sometimes more). This causes a lot of wasted space in the .rgw.buckets.index pool, which might be what you are seeing.
>
> You can get a listing of all the bucket instances in your cluster with this command:
>
> radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort
>
> Give that a try and see if you see the same problem. It seems that once you remove the old bucket instances the omap dbs don't reduce in size until you compact them.
>
> Bryan
>
> From: Josef Zelenka
> Date: Thursday, January 3, 2019 at 3:49 AM
> To: "J. Eric Ivancich"
> Cc: "ceph-users@lists.ceph.com", Bryan Stillwell
> Subject: Re: [ceph-users] Omap issues - metadata creating too many
>
> Hi, I had the default - so it was on (according to the ceph kb). I turned it off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had the same issue (reported about it yesterday) - I tried his tips about compacting, but it doesn't do anything. However, I have to add to his last point: this happens even with bluestore. Is there anything we can do to clean up the omap manually?
>
> Josef
>
> On 18/12/2018 23:19, J. Eric Ivancich wrote:
> On 12/17/18 9:18 AM, Josef Zelenka wrote:
> Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three nodes have an additional SSD I added to have more space to rebalance the metadata). Currently the cluster is used mainly as radosgw storage, with 28tb of data in total and 2x replication for both the metadata and data pools (a cephfs instance is running alongside, but I don't think it's the perpetrator - this happened likely before we had it). All pools aside from the cephfs data pool and the radosgw data pool are located on the SSDs. Now, the interesting thing - at random times, the metadata OSDs fill up their entire capacity with OMAP data and go to r/o mode, and we currently have no other option than deleting and re-creating them. The fill-up comes at a random time, it doesn't seem to be triggered by anything and it isn't caused by some data influx. It seems like some kind of a bug to me to be honest, but I'm not certain - has anyone else seen this behavior with their radosgw? Thanks a lot
>
> Hi Josef,
>
> Do you have rgw_dynamic_resharding turned on? Try turning it off and see if the behavior continues.
>
> One theory is that dynamic resharding is triggered and possibly not completing. This could add a lot of data to omap for the incomplete bucket index shards. After a delay it tries resharding again, possibly failing again, and adding more data to the omap. This continues.
>
> If this is the ultimate issue we have some commits on the upstream luminous branch that are designed to address this set of issues.
>
> But we should first see if this is the cause.
>
> Eric

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Mimic 13.2.3?
It is the same for all distros, and it's not the first time this has happened either. I think it is a bit dangerous.

On 1/3/19 12:25 AM, Ashley Merrick wrote:

Have just run an apt update and have noticed there are some CEPH packages now available for update on my mimic cluster / ubuntu. Have yet to install these, but it looks like we have the next point release of CEPH Mimic, but not able to see any release notes or official comms yet?..

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] CephFS client df command showing raw space after adding second pool to mds
Hi All

Luminous 12.2.12
Single MDS
Replicated pools

A 'df' on a CephFS kernel client used to show me the usable space (i.e. the raw space with the replication overhead applied). This was when I just had a single cephfs data pool.

After adding a second pool to the mds and using file layouts to map a directory to that pool, a df is now showing the raw space. It's not the end of the world but it was handy to see the usable space.

I'm fairly sure the change was me adding the second pool although I'm not 99% sure.

I'm seeing this behavior on the latest Centos 7.6 kernel and a 4.14 kernel, is this expected?

Thanks,
David
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Omap issues - metadata creating too many
Josef,

I've noticed that when dynamic resharding is on it'll reshard some of our bucket indices daily (sometimes more). This causes a lot of wasted space in the .rgw.buckets.index pool, which might be what you are seeing.

You can get a listing of all the bucket instances in your cluster with this command:

radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort

Give that a try and see if you see the same problem. It seems that once you remove the old bucket instances the omap dbs don't reduce in size until you compact them.

Bryan

From: Josef Zelenka
Date: Thursday, January 3, 2019 at 3:49 AM
To: "J. Eric Ivancich"
Cc: "ceph-users@lists.ceph.com", Bryan Stillwell
Subject: Re: [ceph-users] Omap issues - metadata creating too many

Hi, I had the default - so it was on (according to the ceph kb). I turned it off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had the same issue (reported about it yesterday) - I tried his tips about compacting, but it doesn't do anything. However, I have to add to his last point: this happens even with bluestore. Is there anything we can do to clean up the omap manually?

Josef

On 18/12/2018 23:19, J. Eric Ivancich wrote:

On 12/17/18 9:18 AM, Josef Zelenka wrote:

Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three nodes have an additional SSD I added to have more space to rebalance the metadata). Currently the cluster is used mainly as radosgw storage, with 28tb of data in total and 2x replication for both the metadata and data pools (a cephfs instance is running alongside, but I don't think it's the perpetrator - this happened likely before we had it). All pools aside from the cephfs data pool and the radosgw data pool are located on the SSDs. Now, the interesting thing - at random times, the metadata OSDs fill up their entire capacity with OMAP data and go to r/o mode, and we currently have no other option than deleting and re-creating them. The fill-up comes at a random time, it doesn't seem to be triggered by anything and it isn't caused by some data influx. It seems like some kind of a bug to me to be honest, but I'm not certain - has anyone else seen this behavior with their radosgw? Thanks a lot

Hi Josef,

Do you have rgw_dynamic_resharding turned on? Try turning it off and see if the behavior continues.

One theory is that dynamic resharding is triggered and possibly not completing. This could add a lot of data to omap for the incomplete bucket index shards. After a delay it tries resharding again, possibly failing again, and adding more data to the omap. This continues.

If this is the ultimate issue we have some commits on the upstream luminous branch that are designed to address this set of issues.

But we should first see if this is the cause.

Eric
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
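A rough way to spot instances that no longer belong to a live bucket is to compare that instance listing against the bucket_id currently referenced by each bucket's metadata. This is only a sketch - the exact format of the bucket.instance entries varies between versions, so sanity-check the output (and heed the warning elsewhere in this thread about waiting for the supported stale-instances tooling) before deleting anything:

  # ids currently referenced by bucket entrypoint metadata
  for b in $(radosgw-admin bucket list | jq -r '.[]'); do
      radosgw-admin metadata get bucket:"$b" | jq -r '.data.bucket.bucket_id'
  done | sort > /tmp/in_use
  # every bucket instance that exists
  radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort > /tmp/all_instances
  # instances whose id is not referenced anywhere are likely reshard leftovers
  grep -v -F -f /tmp/in_use /tmp/all_instances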
Re: [ceph-users] Help with setting device-class rule on pool without causing data to move
Thanks, Sage! That did the trick.

Wido, seems like an interesting approach but I wasn't brave enough to attempt it!

Eric, I suppose this does the same thing that the crushtool reclassify feature does? Thank you both for your suggestions.

For posterity:
- I grabbed some 14.0.1 packages and extracted crushtool and libceph-common.so.1
- Ran 'crushtool -i cm --reclassify --reclassify-root default hdd -o cm_reclassified'
- Compared the maps with: crushtool -i cm --compare cm_reclassified

That suggested I would get an acceptable amount of data reshuffling, which I expected; I didn't use --set-subtree-class as I'd already added SSD drives to the cluster.

My ultimate goal was to migrate the cephfs_metadata pool onto SSD drives while leaving the cephfs_data pool on the HDD drives. The device classes feature made that really trivial: I just created an intermediary rule which would use both HDD and SSD hosts (I didn't have any mixed-device hosts), set the metadata pool to use the new rule, waited for recovery and then set the metadata pool to use an SSD-only rule. Not sure if that intermediary stage was strictly necessary; I was concerned about inactive PGs.

Thanks,
David

On Mon, Dec 31, 2018 at 6:06 PM Eric Goirand wrote:
> Hi David,
>
> CERN has provided a python script to swap the correct bucket IDs (default <-> hdd), you can find it here:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py
>
> The principle is the following:
> - extract the CRUSH map
> - run the script on it => it creates a new CRUSH file.
> - edit the CRUSH map and modify the rule associated with the pool(s) you want to associate with HDD OSDs only, like:
>   => step take default WITH step take default class hdd
>
> Then recompile and reinject the new CRUSH map and voilà!
>
> Your cluster should be using only the HDD OSDs without rebalancing (or a very small amount).
>
> In case you have forgotten something, just reapply the former CRUSH map and start again.
>
> Cheers and Happy new year 2019.
>
> Eric
>
> On Sun, Dec 30, 2018, 21:16 David C wrote:
>> Hi All
>>
>> I'm trying to set the existing pools in a Luminous cluster to use the hdd device-class but without moving data around. If I just create a new rule using the hdd class and set my pools to use that new rule it will cause a huge amount of data movement even though the pgs are all already on HDDs.
>>
>> There is a thread on ceph-large [1] which appears to have the solution but I can't get my head around what I need to do. I'm not too clear on which IDs I need to swap. Could someone give me some pointers on this please?
>>
>> [1] http://lists.ceph.com/pipermail/ceph-large-ceph.com/2018-April/000109.html
>>
>> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
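For completeness, the end-to-end sequence with a 14.x crushtool looks roughly like this; only inject the new map once the --compare output shows an acceptable amount of movement:

  ceph osd getcrushmap -o cm
  crushtool -i cm --reclassify --reclassify-root default hdd -o cm_reclassified
  crushtool -i cm --compare cm_reclassified
  ceph osd setcrushmap -i cm_reclassified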
[ceph-users] upgrade from jewel 10.2.10 to 10.2.11 broke anonymous swift
Hello,

This is with RDO CentOS 7, keystone and swift_account_in_url. The CEPH cluster runs luminous.

curl 'https://object.example.org/swift/v1/AUTH_12345qhexvalue/test20_segments'

This lists the contents of the public bucket (the Read ACL is .r:* according to "swift stat test20_segments") with 10.2.10, but with 10.2.11 it says "NoSuchBucket".

I've tried to look through the new running settings in "ceph --show-config" but nothing screams "fix anonymous swift". http://tracker.ceph.com/issues/22259 from the release notes seems related, but it says that it would fix anonymous access? I'm a bit confused.

Authenticated swift (downloading an object in a private bucket with, for example, horizon) and an s3cmd get of a private file both seem to work nicely in 10.2.10 and 10.2.11.

Does anybody have a suggestion of what I could try to troubleshoot this?

// Johan Guldmyr
Systems Specialist
CSC - IT Center for Science http://www.csc.fi
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
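For anyone comparing the two versions, the checks I'd run look roughly like this (container name and URL are the ones from the post; the swift client invocation needs the usual auth options, so treat this as a sketch):

  # confirm the container read ACL really is public
  swift stat test20_segments | grep 'Read ACL'
  # re-apply a public-read ACL if needed
  swift post -r '.r:*,.rlistings' test20_segments
  # anonymous listing through radosgw's swift endpoint
  curl -i 'https://object.example.org/swift/v1/AUTH_12345qhexvalue/test20_segments'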
Re: [ceph-users] problem w libvirt version 4.5 and 12.2.7
After the update to CentOS 7.6, libvirt was updated from 3.9 to 4.5. Executing "virsh vol-list ceph --details" makes libvirtd use 300% CPU for 2 minutes to show the volumes on rbd. A quick peek at tcpdump shows it accessing rbd_data.* objects, which the previous version of libvirtd did not need. Ceph version is 12.2.7. Any help will be appreciated.

There is nothing special in libvirt 4.5, I upgraded hypervisors to this version and it still works flawlessly.

k
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs kernel client instability
I wonder if anyone could offer any insight on the issue below, regarding the CentOS 7.6 kernel cephfs client connecting to a Luminous cluster. I have since tried a much newer 4.19.13 kernel, which did not show the same issue (but unfortunately, for various reasons unrelated to ceph, we can't go to such a new kernel). Am I reading it right that somehow the monitor thinks this kernel is old and needs to prepare special maps in some older format for it, and that takes too long so the kernel just gives up - or is there perhaps some other communication protocol error? It seems like one of these mon communication sessions only lasts half a second; then it reconnects to another mon, gets the same result, and so on. Any way around this?

Andras

On 12/26/18 7:55 PM, Andras Pataki wrote:

We've been using ceph-fuse with a pretty good stability record (against the Luminous 12.2.8 back end). Unfortunately ceph-fuse has extremely poor small-file performance (understandably), so we've been testing the kernel client. The latest RedHat kernel 3.10.0-957.1.3.el7.x86_64 seems to work pretty well, as long as the cluster is running in a completely clean state. However, it seems that as soon as there is something happening to the cluster, the kernel client crashes pretty badly. Today's example: I reweighted some OSDs to balance the disk usage a bit (set nobackfill, reweight the OSDs, check the new hypothetical space usage, then unset nobackfill). As soon as the reweighting procedure started, the kernel client went into an infinite loop trying unsuccessfully to connect to mons:

Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
... etc ...

seemingly never recovering. The cluster is healthy, all other clients are successfully doing I/O:

[root@cephmon00 ceph]# ceph -s
  cluster:
    id: d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
            noout flag(s) set
            1 backfillfull osd(s)
            4 pool(s) backfillfull
            239119058/12419244975 objects misplaced (1.925%)
  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
         flags noout
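One thing worth checking when an older kernel client keeps getting "io error" from the mons is which feature bits and compat level the cluster expects from clients; exact output varies by release, so take these as starting points rather than a diagnosis:

  # feature bits of everything currently connected, grouped by release
  ceph features
  # minimum client release the OSD map currently requires
  ceph osd dump | grep require_min_compat_client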
Re: [ceph-users] Omap issues - metadata creating too many
Hi, I had the default - so it was on (according to the ceph kb). I turned it off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had the same issue (reported about it yesterday) - I tried his tips about compacting, but it doesn't do anything. However, I have to add to his last point: this happens even with bluestore. Is there anything we can do to clean up the omap manually?

Josef

On 18/12/2018 23:19, J. Eric Ivancich wrote:

On 12/17/18 9:18 AM, Josef Zelenka wrote:

Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three nodes have an additional SSD I added to have more space to rebalance the metadata). Currently the cluster is used mainly as radosgw storage, with 28tb of data in total and 2x replication for both the metadata and data pools (a cephfs instance is running alongside, but I don't think it's the perpetrator - this happened likely before we had it). All pools aside from the cephfs data pool and the radosgw data pool are located on the SSDs. Now, the interesting thing - at random times, the metadata OSDs fill up their entire capacity with OMAP data and go to r/o mode, and we currently have no other option than deleting and re-creating them. The fill-up comes at a random time, it doesn't seem to be triggered by anything and it isn't caused by some data influx. It seems like some kind of a bug to me to be honest, but I'm not certain - has anyone else seen this behavior with their radosgw? Thanks a lot

Hi Josef,

Do you have rgw_dynamic_resharding turned on? Try turning it off and see if the behavior continues.

One theory is that dynamic resharding is triggered and possibly not completing. This could add a lot of data to omap for the incomplete bucket index shards. After a delay it tries resharding again, possibly failing again, and adding more data to the omap. This continues.

If this is the ultimate issue we have some commits on the upstream luminous branch that are designed to address this set of issues.

But we should first see if this is the cause.

Eric
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
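If you want to rule resharding in or out, the knob and the pending reshard queue can be checked as below; the ceph.conf section name is whatever your rgw instance uses, so it's only illustrative:

  # ceph.conf on the radosgw host
  [client.rgw.myhost]
  rgw_dynamic_resharding = false

  # pending/active reshard operations
  radosgw-admin reshard list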