Re: [ceph-users] Large OMAP Objects in default.rgw.log pool
On Tue, 21 May 2019 at 02:12, mr. non non wrote:
> Does anyone have this issue before? As research, many people have issues
> with rgw.index related to too small a number of index shards (too many
> objects per index). I also checked this thread
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033611.html
> but didn't find any clues, because the number of data objects is below
> 100k per index and the size of the objects in rgw.log is 0.

Hi,

I've had the same issue with a large omap object in the default.rgw.log pool for a long time, but after determining that it wasn't a big issue for us and that the number of omap keys wasn't growing, I haven't actively tried to find a solution. The cluster ran Jewel for over a year, and it was during this time the omap entries in default.rgw.log were created, but (of course) the warning only showed up after the upgrade to Luminous.

# ceph health detail
...
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'default.rgw.log'
...

# zgrep "Large" ceph-osd.324.log-20190516.gz
2019-05-15 23:14:48.821612 7fa17cf74700 0 log_channel(cluster) log [WRN] : Large omap object found. Object: 48:8b8fff66:::meta.log.128a376b-8807-4e19-9ddc-8220fd50d7c1.41:head Key count: 2844494 Size (bytes): 682777168

The reply from Pavan Rallabhandi (below) in the previous thread might be useful, but I'm not familiar enough with rgw to tell, or to figure out how that could help resolve the issue.

> That can happen if you have a lot of objects with Swift object expiry
> (TTL) enabled. You can 'listomapkeys' on these log pool objects and check
> for the objects that have registered for TTL as omap entries. I know this
> is the case with at least the Jewel version.
>
> Thanks,
> -Pavan.

Regards
/Magnus

> Thanks.
> --
> *From:* ceph-users on behalf of mr.
> non non
> *Sent:* Monday, May 20, 2019 7:32 PM
> *To:* EDH - Manuel Rios Fernandez; ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Large OMAP Objects in default.rgw.log pool
>
> Hi Manuel,
>
> I use version 12.2.8 with bluestore and also use manual index sharding
> (configured to 100). As I checked, no bucket reaches 100k
> objects_per_shard.
> Here are the health status and cluster log:
>
> # ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
>     1 large objects found in pool 'default.rgw.log'
>     Search the cluster log for 'Large omap object found' for more details.
>
> # cat ceph.log | tail -2
> 2019-05-19 17:49:36.306481 mon.MONNODE1 mon.0 10.118.191.231:6789/0 528758 : cluster [WRN] Health check failed: 1 large omap objects (LARGE_OMAP_OBJECTS)
> 2019-05-19 17:49:34.535543 osd.38 osd.38 MONNODE1_IP:6808/3514427 12 : cluster [WRN] Large omap object found. Object: 4:b172cd59:usage::usage.26:head Key count: 8720830 Size (bytes): 1647024346
>
> All object sizes are 0.
> $ for i in `rados ls -p default.rgw.log`; do rados stat -p default.rgw.log ${i}; done | more
> default.rgw.log/obj_delete_at_hint.78 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/meta.history mtime 2019-05-20 19:19:40.00, size 50
> default.rgw.log/obj_delete_at_hint.70 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.000104 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.26 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.28 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.40 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.15 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.69 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.95 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.03 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.47 mtime 2019-05-20 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.35 mtime 2019-05-20 19:31:45.00, size 0
>
> Please kindly advise how to remove the HEALTH_WARN message.
>
> Many thanks.
> Arnondh
>
> --
> *From:* EDH - Manuel Rios Fernandez
> *Sent:* Monday, May 20, 2019 5:41 PM
> *To:* 'mr. non non'; ceph-users@lists.ceph.com
> *Subject:* RE: [ceph-users] Large OMAP Objects in default.rgw.log pool
>
> Hi Arnondh,
>
> What's your ceph version?
>
> Regards
>
> *From:* ceph-users *On behalf of* mr. non non
> *Sent:* Monday, 20 May 2019 12:39
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Large OMAP Objects in default.rgw.log pool
>
> Hi,
>
> I found the same issue as above.
> Does anyone know how to fix it?
>
> Thanks.
> Arnondh
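[Editor's note: none of the commands below appear in the thread; they are a hedged sketch of how one might follow Pavan's 'listomapkeys' suggestion for the usage.26 object reported above. The date range and PG id are placeholders.]

```shell
# The large object (4:b172cd59:usage::usage.26:head) lives in the "usage"
# namespace of default.rgw.log, which points at the RGW usage log rather
# than bucket indexes. Count its omap keys to confirm:
rados -p default.rgw.log --namespace usage listomapkeys usage.26 | wc -l

# If the keys are usage-log entries, trimming the usage log should shrink
# the object (date range is a placeholder; pick your own retention window):
radosgw-admin usage trim --start-date=2019-01-01 --end-date=2019-05-01

# The warning clears after the object's PG is deep-scrubbed again
# (PG id below is a placeholder):
ceph pg deep-scrub 4.x
```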
Re: [ceph-users] Remove RBD mirror?
On Fri, 12 Apr 2019 at 16:37, Jason Dillaman wrote:
> On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund wrote:
> >
> > Hi Jason,
> >
> > Tried to follow the instructions; setting the debug level to 15 worked
> > OK, but the daemon appeared to silently ignore the restart command
> > (nothing indicating a restart was seen in the log).
> > So I set the log level to 15 in the config file and restarted the
> > rbd-mirror daemon. The output surprised me though; my previous
> > perception of the issue might be completely wrong...
> > Lots of "image_replayer::BootstrapRequest: failed to create local
> > image: (2) No such file or directory" and ":ImageReplayer: replay
> > encountered an error: (42) No message of desired type"
>
> What is the result from "rbd mirror pool status --verbose nova" against
> your DR cluster now? Are they in up+error now? The ENOENT errors are
> most likely related to a parent image that hasn't been mirrored. The
> ENOMSG error seems to indicate that there might be some corruption in a
> journal and it's missing expected records (like a production client
> crashed), but it should be able to recover from that.

# rbd mirror pool status --verbose nova
health: WARNING
images: 2479 total
    2479 unknown

002344ab-c324-4c01-97ff-de32868fa712_disk:
  global_id:   c02e0202-df8f-46ce-a4b6-1a50a9692804
  state:       down+unknown
  description: status not found
  last_update:

002a8fde-3a63-4e32-9c18-b0bf64393d0f_disk:
  global_id:   d412abc4-b37e-44a2-8aba-107f352dec60
  state:       down+unknown
  description: status not found
  last_update:

> > https://pastebin.com/1bTETNGs
> >
> > Best regards
> > /Magnus
> >
> > On Tue, 9 Apr 2019 at 18:35, Jason Dillaman wrote:
> >>
> >> Can you pastebin the results from running the following on your backup
> >> site rbd-mirror daemon node?
> >>
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
> >> ceph --admin-socket /path/to/asok rbd mirror restart nova
> >> wait a minute to let some logs accumulate ...
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
> >>
> >> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
> >> lots of "rbd::mirror"-like log entries).
> >>
> >> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund wrote:
> >> >
> >> > On Tue, 9 Apr 2019 at 17:48, Jason Dillaman <jdill...@redhat.com> wrote:
> >> >>
> >> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> >> (defaults to /var/run/ceph/cephdr-clientasok)? If
> >> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
> >> >
> >> > {
> >> >     "pool_replayers": [
> >> >         {
> >> >             "pool": "glance",
> >> >             "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup",
> >> >             "instance_id": "869081",
> >> >             "leader_instance_id": "869081",
> >> >             "leader": true,
> >> >             "instances": [],
> >> >             "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> >> >             "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
> >> >             "sync_throttler": {
> >> >                 "max_parallel_syncs": 5,
> >> >                 "running_syncs": 0,
> >> >                 "waiting_syncs": 0
> >> >             },
> >> >             "image_replayers": [
> >> >                 {
> >> >                     "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> >> >                     "state": "Replaying"
> >> >                 },
> >> >                 {
> >> >                     "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> >> >                     "state": "Replaying"
> >> >                 },
> >> > ---cut---
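[Editor's note: Jason's diagnosis above is that the ENOENT errors likely come from clone parents that were never mirrored. The thread does not show how to check that; the following is a hedged sketch, with image names taken from the status output as examples and `<parent-image>` a placeholder.]

```shell
# Check whether a failing nova image is a clone with a parent in another
# pool (e.g. a glance base image):
rbd info nova/002344ab-c324-4c01-97ff-de32868fa712_disk | grep parent

# For journal-based mirroring, a parent must itself be mirrorable:
# enable journaling on it and (if the pool is in image mode) enable
# mirroring explicitly:
rbd feature enable glance/<parent-image> journaling
rbd mirror image enable glance/<parent-image>
```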
Re: [ceph-users] Remove RBD mirror?
Hi Jason,

Tried to follow the instructions; setting the debug level to 15 worked OK,
but the daemon appeared to silently ignore the restart command (nothing
indicating a restart was seen in the log).
So I set the log level to 15 in the config file and restarted the
rbd-mirror daemon. The output surprised me though; my previous perception
of the issue might be completely wrong...
Lots of "image_replayer::BootstrapRequest: failed to create local image:
(2) No such file or directory" and ":ImageReplayer: replay encountered an
error: (42) No message of desired type"

https://pastebin.com/1bTETNGs

Best regards
/Magnus

On Tue, 9 Apr 2019 at 18:35, Jason Dillaman wrote:
> Can you pastebin the results from running the following on your backup
> site rbd-mirror daemon node?
>
> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
> ceph --admin-socket /path/to/asok rbd mirror restart nova
> wait a minute to let some logs accumulate ...
> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
>
> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
> lots of "rbd::mirror"-like log entries).
>
> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund wrote:
> >
> > On Tue, 9 Apr 2019 at 17:48, Jason Dillaman wrote:
> >>
> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> (defaults to /var/run/ceph/cephdr-clientasok)? If
> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
> > > > > > { > > "pool_replayers": [ > > { > > "pool": "glance", > > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: > production client: client.productionbackup", > > "instance_id": "869081", > > "leader_instance_id": "869081", > > "leader": true, > > "instances": [], > > "local_cluster_admin_socket": > "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok", > > "remote_cluster_admin_socket": > "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok", > > "sync_throttler": { > > "max_parallel_syncs": 5, > > "running_syncs": 0, > > "waiting_syncs": 0 > > }, > > "image_replayers": [ > > { > > "name": > "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0", > > "state": "Replaying" > > }, > > { > > "name": > "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62", > > "state": "Replaying" > > }, > > ---cut-- > > { > > "name": > "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05", > > "state": "Replaying" > > } > > ] > > }, > > { > > "pool": "nova", > > "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: > production client: client.productionbackup", > > "instance_id": "889074", > > "leader_instance_id": "889074", > > "leader": true, > > "instances": [], > > "local_cluster_admin_socket": > "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok", > > "remote_cluster_admin_socket": > "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok", > > "sync_throttler": { > > "max_parallel_syncs": 5, > > "running_syncs": 0, > > "waiting_syncs": 0 > > }, > > "image_replayers": [] > > } > > ], > > "image_deleter": { > > "image_deleter_status": { > > "delete_images_queue": [ > > { > > "local_pool_id": 3, > > "global_
Re: [ceph-users] Remove RBD mirror?
Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman : > Any chance your rbd-mirror daemon has the admin sockets available > (defaults to /var/run/ceph/cephdr-clientasok)? If > so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status". > { "pool_replayers": [ { "pool": "glance", "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster: production client: client.productionbackup", "instance_id": "869081", "leader_instance_id": "869081", "leader": true, "instances": [], "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok", "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok", "sync_throttler": { "max_parallel_syncs": 5, "running_syncs": 0, "waiting_syncs": 0 }, "image_replayers": [ { "name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0", "state": "Replaying" }, { "name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62", "state": "Replaying" }, ---cut-- { "name": "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05", "state": "Replaying" } ] }, { "pool": "nova", "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster: production client: client.productionbackup", "instance_id": "889074", "leader_instance_id": "889074", "leader": true, "instances": [], "local_cluster_admin_socket": "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok", "remote_cluster_admin_socket": "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok", "sync_throttler": { "max_parallel_syncs": 5, "running_syncs": 0, "waiting_syncs": 0 }, "image_replayers": [] } ], "image_deleter": { "image_deleter_status": { "delete_images_queue": [ { "local_pool_id": 3, "global_image_id": "ff531159-de6f-4324-a022-50c079dedd45" } ], "failed_deletes_queue": [] } > > On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund > wrote: > > > > > > > > Den tis 9 apr. 
2019 at 17:14, Jason Dillaman wrote:
> >>
> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund wrote:
> >> >
> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote:
> >> > >>
> >> > >> Hi,
> >> > >> We have configured one-way replication of pools between a production cluster and a backup cluster. But unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach the replaying state.
> >> > >
> >> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> >> > >they still performing the initial sync?
> >> >
> >> > There are three pools we try to mirror (glance, cinder, and nova, no points for guessing what the cluster is used for :) ).
> >> > The glance and cinder pools are smaller and see limited write activity, and their mirroring works; the nova pool, which is the largest and has 90% of the write activity, never leaves the "unknown" state.
> >> >
> >> > # rbd mirror pool status cinder
> >> > health: OK
> >> > images: 892 total
> >> >     890 replaying
> >> >     2 stopped
> >> > #
> >> > # rbd mirror pool status nova
> >> > health: WARNING
> >> > images: 2479 total
> >> >     2479 unknown
> >> > #
> >> > The production cluster has 5k writes/s on average and the backup cluster has 1-2k writes/s o
Re: [ceph-users] Remove RBD mirror?
On Tue, 9 Apr 2019 at 17:14, Jason Dillaman wrote:
> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund wrote:
> >
> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote:
> > >>
> > >> Hi,
> > >> We have configured one-way replication of pools between a production cluster and a backup cluster. But unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach the replaying state.
> > >
> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> > >they still performing the initial sync?
> >
> > There are three pools we try to mirror (glance, cinder, and nova, no points for guessing what the cluster is used for :) ).
> > The glance and cinder pools are smaller and see limited write activity, and their mirroring works; the nova pool, which is the largest and has 90% of the write activity, never leaves the "unknown" state.
> >
> > # rbd mirror pool status cinder
> > health: OK
> > images: 892 total
> >     890 replaying
> >     2 stopped
> > #
> > # rbd mirror pool status nova
> > health: WARNING
> > images: 2479 total
> >     2479 unknown
> > #
> > The production cluster has 5k writes/s on average and the backup cluster has 1-2k writes/s on average. The production cluster is bigger and has better specs. I thought that the backup cluster would be able to keep up, but it looks like I was wrong.
>
> The fact that they are in the unknown state just means that the remote
> "rbd-mirror" daemon hasn't started any journal replayers against the
> images. If it couldn't keep up, it would still report a status of
> "up+replaying". What Ceph release are you running on your backup
> cluster?

The backup cluster is running Luminous 12.2.11 (the production cluster
12.2.10).

> > >> And the journals on the rbd volumes keep growing...
> > >>
> > >> Is it enough to simply disable the mirroring of the pool (rbd mirror pool disable <pool>), and will that remove the lagging reader from the journals and shrink them, or is there anything else that has to be done?
> > >
> > >You can either disable the journaling feature on the image(s), since
> > >there is no point in leaving it on if you aren't using mirroring, or run
> > >"rbd mirror pool disable <pool>" to purge the journals.
> >
> > Thanks for the confirmation.
> > I will stop the mirror of the nova pool and try to figure out if there is anything we can do to get the backup cluster to keep up.
> >
> > >> Best regards
> > >> /Magnus
> > >
> > >--
> > >Jason
>
> --
> Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Remove RBD mirror?
>On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund wrote:
>>
>> Hi,
>> We have configured one-way replication of pools between a production cluster and a backup cluster. But unfortunately the rbd-mirror or the backup cluster is unable to keep up with the production cluster, so the replication fails to reach the replaying state.
>
>Hmm, it's odd that they don't at least reach the replaying state. Are
>they still performing the initial sync?

There are three pools we try to mirror (glance, cinder, and nova, no
points for guessing what the cluster is used for :) ).
The glance and cinder pools are smaller and see limited write activity,
and their mirroring works; the nova pool, which is the largest and has 90%
of the write activity, never leaves the "unknown" state.

# rbd mirror pool status cinder
health: OK
images: 892 total
    890 replaying
    2 stopped
#
# rbd mirror pool status nova
health: WARNING
images: 2479 total
    2479 unknown
#
The production cluster has 5k writes/s on average and the backup cluster
has 1-2k writes/s on average. The production cluster is bigger and has
better specs. I thought that the backup cluster would be able to keep up,
but it looks like I was wrong.

>> And the journals on the rbd volumes keep growing...
>>
>> Is it enough to simply disable the mirroring of the pool (rbd mirror pool disable <pool>), and will that remove the lagging reader from the journals and shrink them, or is there anything else that has to be done?
>
>You can either disable the journaling feature on the image(s), since
>there is no point in leaving it on if you aren't using mirroring, or run
>"rbd mirror pool disable <pool>" to purge the journals.

Thanks for the confirmation.
I will stop the mirror of the nova pool and try to figure out if there is
anything we can do to get the backup cluster to keep up.
>> Best regards
>> /Magnus
>
>--
>Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
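[Editor's note: the thread names the two options but shows no concrete commands; this is a hedged sketch of both, using the thread's nova pool as the example and `<image>` as a placeholder.]

```shell
# Option 1: disable mirroring for the whole pool, which purges the
# journals and removes the lagging journal reader:
rbd mirror pool disable nova

# Option 2: disable the journaling feature per image, if journaling is
# no longer wanted at all:
rbd feature disable nova/<image> journaling
```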
[ceph-users] Remove RBD mirror?
Hi,

We have configured one-way replication of pools between a production
cluster and a backup cluster. But unfortunately the rbd-mirror or the
backup cluster is unable to keep up with the production cluster, so the
replication fails to reach the replaying state. And the journals on the
rbd volumes keep growing...

Is it enough to simply disable the mirroring of the pool (rbd mirror pool
disable <pool>), and will that remove the lagging reader from the journals
and shrink them, or is there anything else that has to be done?

Best regards
/Magnus
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RBD-mirror high cpu usage?
Hi,

Answering my own question: the high load was related to the cpufreq
kernel module. Unloading the cpufreq module made the CPU load instantly
drop, and the mirroring started to work. Obviously there is a bug
somewhere, but for the moment I'm just happy it works.

/Magnus

On Thu, 15 Nov 2018 at 15:24, Magnus Grönlund wrote:
> Hi,
>
> I'm trying to set up one-way rbd-mirroring for a ceph cluster used by an
> openstack cloud, but the rbd-mirror is unable to "catch up" with the
> changes. However, it appears to me as if it's not due to the ceph
> clusters or the network, but due to the server running the rbd-mirror
> process running out of cpu.
>
> Is a high cpu load to be expected, or is it a symptom of something else?
> Or in other words, what can I check/do to get the mirroring working?
>
> # rbd mirror pool status nova
> health: WARNING
> images: 596 total
>     572 starting_replay
>     24 replaying
>
> top - 13:31:36 up 79 days, 5:31, 1 user, load average: 32.27, 26.82, 25.33
> Tasks: 360 total, 17 running, 182 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 8.9 us, 70.0 sy, 0.0 ni, 18.5 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st
> KiB Mem : 13205185+total, 12862490+free, 579508 used, 2847444 buff/cache
> KiB Swap: 0 total, 0 free, 0 used.
12948856+avail Mem > PID USER PR NIVIRTRESSHR S %CPU %MEM > TIME+ COMMAND > 2336553 ceph 20 0 17.1g 178160 20344 S 417.2 0.1 21:50.61 > rbd-mirror > 2312698 root 20 0 0 0 0 I 70.2 0.0 70:11.51 > kworker/12:2 > 2312851 root 20 0 0 0 0 R 69.2 0.0 62:29.69 > kworker/24:1 > 2324627 root 20 0 0 0 0 I 68.4 0.0 40:36.77 > kworker/14:1 > 2235817 root 20 0 0 0 0 I 68.0 0.0 469:14.08 > kworker/8:0 > 2241720 root 20 0 0 0 0 R 67.3 0.0 437:46.51 > kworker/9:1 > 2306648 root 20 0 0 0 0 R 66.9 0.0 109:27.44 > kworker/25:0 > 2324625 root 20 0 0 0 0 R 66.9 0.0 40:37.53 > kworker/13:1 > 2336318 root 20 0 0 0 0 R 66.7 0.0 14:51.96 > kworker/27:3 > 2324643 root 20 0 0 0 0 I 66.5 0.0 36:21.46 > kworker/15:2 > 2294989 root 20 0 0 0 0 I 66.3 0.0 134:09.89 > kworker/11:1 > 2324626 root 20 0 0 0 0 I 66.3 0.0 39:44.14 > kworker/28:2 > 2324019 root 20 0 0 0 0 I 65.3 0.0 44:51.80 > kworker/26:1 > 2235814 root 20 0 0 0 0 R 65.1 0.0 459:14.70 > kworker/29:2 > 2294174 root 20 0 0 0 0 I 64.5 0.0 220:58.50 > kworker/30:1 > 2324355 root 20 0 0 0 0 R 63.3 0.0 45:04.29 > kworker/10:1 > 2263800 root 20 0 0 0 0 R 62.9 0.0 353:38.48 > kworker/31:1 > 2270765 root 20 0 0 0 0 R 60.2 0.0 294:46.34 > kworker/0:0 > 2294798 root 20 0 0 0 0 R 59.8 0.0 148:48.23 > kworker/1:2 > 2307128 root 20 0 0 0 0 R 59.8 0.0 86:15.45 > kworker/6:2 > 2307129 root 20 0 0 0 0 I 59.6 0.0 85:29.66 > kworker/5:0 > 2294826 root 20 0 0 0 0 R 58.2 0.0 138:53.56 > kworker/7:3 > 2294575 root 20 0 0 0 0 I 57.8 0.0 155:03.74 > kworker/2:3 > 2294310 root 20 0 0 0 0 I 57.2 0.0 176:10.92 > kworker/4:2 > 2295000 root 20 0 0 0 0 I 57.2 0.0 132:47.28 > kworker/3:2 > 2307060 root 20 0 0 0 0 I 56.6 0.0 87:46.59 > kworker/23:2 > 2294931 root 20 0 0 0 0 I 56.4 0.0 133:31.47 > kworker/17:2 > 2318659 root 20 0 0 0 0 I 56.2 0.0 55:01.78 > kworker/16:2 > 2336304 root 20 0 0 0 0 I 56.0 0.0 11:45.92 > kworker/21:2 > 2306947 root 20 0 0 0 0 R 55.6 0.0 90:45.31 > kworker/22:2 > 2270628 root 20 0 0 0 0 I 53.8 0.0 273:43.31 > kworker/19:3 > 2294797 root 20 
0 0 0 0 R 52.3 0.0 141:13.67 > kworker/18:0 > 2330537 root 20 0 0 0 0 R 52.3 0.0 25:33.25 > kworker/20:2 > > The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6 > nodes with 60 OSDs (but roughly the same amount of storage), the rbd-mirror > runs on a separate server with 2* E5-2650v2 cpus and 128GB memory. > > Best regards > /Magnus > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
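[Editor's note: the poster's fix was to unload the cpufreq module; the commands below are not from the thread but a hedged sketch of how one might first inspect the cpufreq state and try a governor change instead. `cpupower` ships in the distro's kernel-tools/linux-tools package and may not be installed.]

```shell
# See which frequency-scaling modules and governor are active:
lsmod | grep -i -e cpufreq -e pstate
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Switching to the performance governor may avoid the pathological
# kworker load without unloading any modules:
cpupower frequency-set -g performance
```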
[ceph-users] RBD-mirror high cpu usage?
Hi, I’m trying to setup one-way rbd-mirroring for a ceph-cluster used by an openstack cloud, but the rbd-mirror is unable to “catch up” with the changes. However it appears to me as if it's not due to the ceph-clusters or the network but due to the server running the rbd-mirror process running out of cpu? Is a high cpu load to be expected or is it a symptom of something else? Or in other words, what can I check/do to get the mirroring working? # rbd mirror pool status nova health: WARNING images: 596 total 572 starting_replay 24 replaying top - 13:31:36 up 79 days, 5:31, 1 user, load average: 32.27, 26.82, 25.33 Tasks: 360 total, 17 running, 182 sleeping, 0 stopped, 0 zombie %Cpu(s): 8.9 us, 70.0 sy, 0.0 ni, 18.5 id, 0.0 wa, 0.0 hi, 2.7 si, 0.0 st KiB Mem : 13205185+total, 12862490+free, 579508 used, 2847444 buff/cache KiB Swap:0 total,0 free,0 used. 12948856+avail Mem PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 2336553 ceph 20 0 17.1g 178160 20344 S 417.2 0.1 21:50.61 rbd-mirror 2312698 root 20 0 0 0 0 I 70.2 0.0 70:11.51 kworker/12:2 2312851 root 20 0 0 0 0 R 69.2 0.0 62:29.69 kworker/24:1 2324627 root 20 0 0 0 0 I 68.4 0.0 40:36.77 kworker/14:1 2235817 root 20 0 0 0 0 I 68.0 0.0 469:14.08 kworker/8:0 2241720 root 20 0 0 0 0 R 67.3 0.0 437:46.51 kworker/9:1 2306648 root 20 0 0 0 0 R 66.9 0.0 109:27.44 kworker/25:0 2324625 root 20 0 0 0 0 R 66.9 0.0 40:37.53 kworker/13:1 2336318 root 20 0 0 0 0 R 66.7 0.0 14:51.96 kworker/27:3 2324643 root 20 0 0 0 0 I 66.5 0.0 36:21.46 kworker/15:2 2294989 root 20 0 0 0 0 I 66.3 0.0 134:09.89 kworker/11:1 2324626 root 20 0 0 0 0 I 66.3 0.0 39:44.14 kworker/28:2 2324019 root 20 0 0 0 0 I 65.3 0.0 44:51.80 kworker/26:1 2235814 root 20 0 0 0 0 R 65.1 0.0 459:14.70 kworker/29:2 2294174 root 20 0 0 0 0 I 64.5 0.0 220:58.50 kworker/30:1 2324355 root 20 0 0 0 0 R 63.3 0.0 45:04.29 kworker/10:1 2263800 root 20 0 0 0 0 R 62.9 0.0 353:38.48 kworker/31:1 2270765 root 20 0 0 0 0 R 60.2 0.0 294:46.34 kworker/0:0 2294798 root 20 0 0 0 0 
R 59.8 0.0 148:48.23 kworker/1:2 2307128 root 20 0 0 0 0 R 59.8 0.0 86:15.45 kworker/6:2 2307129 root 20 0 0 0 0 I 59.6 0.0 85:29.66 kworker/5:0 2294826 root 20 0 0 0 0 R 58.2 0.0 138:53.56 kworker/7:3 2294575 root 20 0 0 0 0 I 57.8 0.0 155:03.74 kworker/2:3 2294310 root 20 0 0 0 0 I 57.2 0.0 176:10.92 kworker/4:2 2295000 root 20 0 0 0 0 I 57.2 0.0 132:47.28 kworker/3:2 2307060 root 20 0 0 0 0 I 56.6 0.0 87:46.59 kworker/23:2 2294931 root 20 0 0 0 0 I 56.4 0.0 133:31.47 kworker/17:2 2318659 root 20 0 0 0 0 I 56.2 0.0 55:01.78 kworker/16:2 2336304 root 20 0 0 0 0 I 56.0 0.0 11:45.92 kworker/21:2 2306947 root 20 0 0 0 0 R 55.6 0.0 90:45.31 kworker/22:2 2270628 root 20 0 0 0 0 I 53.8 0.0 273:43.31 kworker/19:3 2294797 root 20 0 0 0 0 R 52.3 0.0 141:13.67 kworker/18:0 2330537 root 20 0 0 0 0 R 52.3 0.0 25:33.25 kworker/20:2 The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6 nodes with 60 OSDs (but roughly the same amount of storage), the rbd-mirror runs on a separate server with 2* E5-2650v2 cpus and 128GB memory. Best regards /Magnus ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [Ceph-deploy] Cluster Name
Hi Jocelyn,

I'm in the process of setting up rbd-mirroring myself and stumbled on the
same problem. But I think that the "trick" here is to _not_ colocate the
rbd-mirror daemon with any other part of the cluster(s); it should be run
on a separate host. That way you can change the CLUSTER_NAME variable in
/etc/sysconfig/ceph without affecting any of the mons, osds etc.

Best regards
/Magnus

2018-08-09 7:41 GMT+02:00 Thode Jocelyn:
> Hi Erik,
>
> The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph
> file to determine which configuration file to use (from CLUSTER_NAME).
> So you need to set this to the name you chose for rbd-mirror to work.
> However, setting this CLUSTER_NAME variable in /etc/sysconfig/ceph makes
> it so that the mon, osd etc. services will also use this variable.
> Because of this they cannot start anymore, as all their paths are set
> with "ceph" as the cluster name.
>
> However, there might be something that I missed which would make this
> point moot.
>
> Best Regards
> Jocelyn Thode
>
> -----Original Message-----
> From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
> Sent: Wednesday, 8 August 2018 16:39
> To: Thode Jocelyn
> Cc: Glen Baars; Vasu Kulkarni; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
>
> I'm not using this feature, so maybe I'm missing something, but from the
> way I understand cluster naming to work...
>
> I still don't understand why this is blocking for you. Unless you are
> attempting to mirror between two clusters running on the same hosts (why
> would you do this?), systemd doesn't come into play. The --cluster flag
> on the rbd command will simply set the name of a configuration file with
> the FSID and settings of the appropriate cluster. Cluster name is just a
> way of telling ceph commands and systemd units where to find the configs.
>
> So, what you end up with is something like:
>
> /etc/ceph/ceph.conf (your local cluster configuration) on both clusters
> /etc/ceph/local.conf (config of the source cluster; just a copy of
> ceph.conf of the source cluster)
> /etc/ceph/remote.conf (config of the destination peer cluster; just a
> copy of ceph.conf of the remote cluster)
>
> Run all your rbd mirror commands against the local and remote names.
> However, when starting things like mons, osds, mds, etc. you need no
> cluster name, as they can use ceph.conf (cluster name of ceph).
>
> Am I making sense, or have I completely missed something?
>
> -Erik
>
> On Wed, Aug 8, 2018 at 8:34 AM, Thode Jocelyn wrote:
> > Hi,
> >
> > We are still blocked by this problem on our end. Glen, did you or
> > someone else figure out something for this?
> >
> > Regards
> > Jocelyn Thode
> >
> > From: Glen Baars [mailto:g...@onsitecomputers.com.au]
> > Sent: Thursday, 2 August 2018 05:43
> > To: Erik McCormick
> > Cc: Thode Jocelyn; Vasu Kulkarni; ceph-users@lists.ceph.com
> > Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > Hello Erik,
> >
> > We are going to use RBD-mirror to replicate the clusters. This seems
> > to need separate cluster names.
> >
> > Kind regards,
> > Glen Baars
> >
> > From: Erik McCormick
> > Sent: Thursday, 2 August 2018 9:39 AM
> > To: Glen Baars
> > Cc: Thode Jocelyn; Vasu Kulkarni; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > Don't set a cluster name. It's no longer supported. It really only
> > matters if you're running two or more independent clusters on the same
> > boxes, and that's generally inadvisable anyway.
> >
> > Cheers,
> > Erik
> >
> > On Wed, Aug 1, 2018, 9:17 PM Glen Baars wrote:
> >
> > Hello Ceph Users,
> >
> > Does anyone know how to set the cluster name when deploying with
> > Ceph-deploy?
I have 3 clusters to configure and need to
> > correctly set the name.
> >
> > Kind regards,
> > Glen Baars
> >
> > -----Original Message-----
> > From: ceph-users On Behalf Of Glen Baars
> > Sent: Monday, 23 July 2018 5:59 PM
> > To: Thode Jocelyn; Vasu Kulkarni
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > How very timely, I am facing the exact same issue.
> >
> > Kind regards,
> > Glen Baars
> >
> > -----Original Message-----
> > From: ceph-users On Behalf Of Thode Jocelyn
> > Sent: Monday, 23 July 2018 1:42 PM
> > To: Vasu Kulkarni
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > Hi,
> >
> > Yes, my rbd-mirror is colocated with my mon/osd. It only affects nodes
> > where they are collocated, as they all use the "/etc/sysconfig/ceph"
> > configuration file.
> >
> > Best
> > Jocelyn Thode
> >
> > -----Original Message-----
> > From: Vasu Kulkarni [mailto:vakul...@redhat.com]
> > Sent: Friday, 20 July 2018 17:25
> > To: Thode Jocelyn
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy]
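[Editor's note: the thread describes Erik's approach in prose only; this is a hedged sketch of it. The file names local.conf/remote.conf are conventions from the thread, the host name prod-mon and peer client name are placeholders.]

```shell
# Keep every daemon under the default "ceph" cluster name and give rbd
# per-cluster config copies instead:
cp /etc/ceph/ceph.conf /etc/ceph/local.conf              # this (backup) cluster
scp prod-mon:/etc/ceph/ceph.conf /etc/ceph/remote.conf   # production cluster

# rbd resolves "--cluster NAME" to /etc/ceph/NAME.conf, so mirror
# commands can address either cluster without renaming anything:
rbd --cluster local mirror pool status nova
rbd --cluster local mirror pool peer add nova client.production@remote
```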
Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.
Hi David, and thanks! That was indeed the magic trick: no more peering, stale or down PGs. Upgraded the ceph packages on the hosts, restarted the OSDs and then ran "ceph osd require-osd-release luminous" /Magnus 2018-07-12 12:05 GMT+02:00 David Majchrzak : > Hi/Hej Magnus, > > We had a similar issue going from latest Hammer to Jewel (so it might not be > applicable for you), with PGs stuck peering / data misplaced, right after > updating all mons to the latest Jewel at that time, 10.2.10. > > Finally setting the require_jewel_osds flag put everything back in place (we > were going to do this after restarting all OSDs, following the > docs/changelogs). > > What does your ceph health detail look like? > > Did you perform any other commands after starting your mon upgrade? Any > commands that might change the crush map might cause issues AFAIK (correct > me if I'm wrong, but I think we ran into this once) if your mons and osds > are on different versions. > > // david > > On Jul 12 2018, at 11:45 am, Magnus Grönlund wrote: > > > Hi list, > > Things went from bad to worse; I tried to upgrade some OSDs to Luminous to > see if that could help, but that didn’t appear to make any difference. > But for each restarted OSD there were a few PGs that the OSD seemed to > “forget”, and the number of undersized PGs grew until some PGs had been > “forgotten” by all 3 acting OSDs and became stale, even though all OSDs > (and their disks) were available. > Then the OSDs grew so big that the servers ran out of memory (48GB per > server with 10 2TB disks per server) and started killing the OSDs… > All OSDs were then shut down to try and preserve at least some data on the > disks, but maybe it is too late? > > /Magnus > > 2018-07-11 21:10 GMT+02:00 Magnus Grönlund : > > Hi Paul, > > No, all OSDs are still on Jewel; the issue started before I had even started > to upgrade the first OSD, and they don't appear to be flapping.
> ceph -w shows a lot of slow request etc, but nothing unexpected as far as > I can tell considering the state the cluster is in. > > 2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included > below; oldest blocked for > 25402.278824 secs > 2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds > old, received at 2018-07-11 20:08:08.439214: osd_op(client.73540057.0:8289463 > 2.e57b3e32 (undecoded) ack+ondisk+retry+write+known_if_redirected > e160294) currently waiting for peered > 2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds > old, received at 2018-07-11 20:08:09.348446: osd_op(client.671628641.0:998704 > 2.42f88232 (undecoded) ack+ondisk+retry+write+known_if_redirected > e160475) currently waiting for peered > 2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included > below; oldest blocked for > 25403.279204 secs > 2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds > old, received at 2018-07-11 20:08:10.353060: osd_op(client.231731103.0:1007729 > 3.e0ff5786 (undecoded) ondisk+write+known_if_redirected e137428) > currently waiting for peered > 2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds > old, received at 2018-07-11 20:08:10.362819: osd_op(client.207458703.0:2000292 > 3.a8143b86 (undecoded) ondisk+write+known_if_redirected e137428) > currently waiting for peered > 2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering, > 1142 peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551 > active+clean, 2 activating+undersized+degraded+remapped, 15 > active+remapped+backfilling, 178 unknown, 1 active+remapped, 3 > activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait, > 6 active+recovery_wait+degraded+remapped, 3 > undersized+degraded+remapped+backfill_wait+peered, > 5 active+undersized+degraded+remapped+backfilling, 295 > active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded, > 21 
activating+undersized+degraded, 559 active+undersized+degraded, 4 > remapped, 17 undersized+degraded+peered, 1 > active+recovery_wait+undersized+degraded+remapped; > 13439 GB data, 42395 GB used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s > wr, 5 op/s; 534753/10756032 objects degraded (4.972%); 779027/10756032 > objects misplaced (7.243%); 256 MB/s, 65 objects/s recovering > > > > There are a lot of things in the OSD log files that I'm unfamiliar with, > but so far I haven't found anything that has given me a clue on how to fix > the issue. > BTW restarting an OSD doesn't seem to help; on the contrary, that sometimes > results in PGs being stuck undersized! > I have attached an osd log from when a restarted OSD started up. > > Best regards > /Magnus > > > 2018-07-
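As a side note (not part of the original thread): the per-state counts in a pgmap line like the one quoted above can be tallied mechanically to confirm they add up to the reported PG total and to count how many PGs are in a non-active state. A small Python sketch:

```python
# The pgmap status line from the mon log quoted above, reflowed into one string.
pgmap = (
    "pgmap 4104 pgs: 5 down+peering, 1142 peering, 210 remapped+peering, "
    "5 active+recovery_wait+degraded, 1551 active+clean, "
    "2 activating+undersized+degraded+remapped, 15 active+remapped+backfilling, "
    "178 unknown, 1 active+remapped, 3 activating+remapped, "
    "78 active+undersized+degraded+remapped+backfill_wait, "
    "6 active+recovery_wait+degraded+remapped, "
    "3 undersized+degraded+remapped+backfill_wait+peered, "
    "5 active+undersized+degraded+remapped+backfilling, "
    "295 active+remapped+backfill_wait, "
    "3 active+recovery_wait+undersized+degraded, "
    "21 activating+undersized+degraded, 559 active+undersized+degraded, "
    "4 remapped, 17 undersized+degraded+peered, "
    "1 active+recovery_wait+undersized+degraded+remapped"
)

# "pgmap <N> pgs:" carries the cluster-wide PG total.
total = int(pgmap.split()[1])

# Everything after "pgs: " is a comma-separated "<count> <state+state+...>" list.
states = {}
for entry in pgmap.split("pgs: ", 1)[1].split(", "):
    count, state = entry.split()
    states[state] = int(count)

assert sum(states.values()) == total  # the per-state counts add up

# PGs whose compound state lacks "active" (note: "activating" is not "active")
# are the inactive ones this thread is worried about.
inactive = sum(n for state, n in states.items() if "active" not in state.split("+"))
print(f"{total} PGs total, {inactive} not active")  # → 4104 PGs total, 1585 not active
```

The inactive figure differs from the "1483 pgs stuck inactive" in the health output only because the two snapshots were taken at different moments.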
Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.
Hi Kevin, Unfortunately restarting OSDs doesn't appear to help; instead it seems to make things worse, with PGs getting stuck degraded. Best regards /Magnus 2018-07-11 20:46 GMT+02:00 Kevin Olbrich :
> Sounds a little bit like the problem I had on OSDs:
>
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html> *Kevin Olbrich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026681.html> *Burkhard Linke*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026682.html> *Kevin Olbrich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026683.html> *Kevin Olbrich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026685.html> *Kevin Olbrich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026689.html> *Kevin Olbrich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026692.html> *Paul Emmerich*
> - [ceph-users] Blocked requests activating+remapped after extending pg(p)_num <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026695.html> *Kevin Olbrich*
>
> I ended up restarting the OSDs which were stuck in that state and they immediately fixed themselves. It should also work to just "out" the problem OSDs and immediately bring them back up again to fix it.
> > - Kevin > > 2018-07-11 20:30 GMT+02:00 Magnus Grönlund : > >> Hi, >> >> Started to upgrade a ceph cluster from Jewel (10.2.10) to Luminous >> (12.2.6) >> >> After upgrading and restarting the mons everything looked OK: the mons >> had quorum, all OSDs were up and in, and all the PGs were active+clean. >> But before I had time to start upgrading the OSDs it became obvious that >> something had gone terribly wrong. >> All of a sudden 1600 out of 4100 PGs were inactive and 40% of the data >> was misplaced! >> >> The mons appear OK and all OSDs are still up and in, but a few hours >> later there were still 1483 PGs stuck inactive, essentially all of them in >> peering! >> Investigating one of the stuck PGs, it appears to be looping between >> “inactive”, “remapped+peering” and “peering”, and the epoch number is rising >> fast; see the attached pg query outputs. >> >> We really can’t afford to lose the cluster or the data, so any help or >> suggestions on how to debug or fix this issue would be very much >> appreciated!
>> >> >> health: HEALTH_ERR >> 1483 pgs are stuck inactive for more than 60 seconds >> 542 pgs backfill_wait >> 14 pgs backfilling >> 11 pgs degraded >> 1402 pgs peering >> 3 pgs recovery_wait >> 11 pgs stuck degraded >> 1483 pgs stuck inactive >> 2042 pgs stuck unclean >> 7 pgs stuck undersized >> 7 pgs undersized >> 111 requests are blocked > 32 sec >> 10586 requests are blocked > 4096 sec >> recovery 9472/11120724 objects degraded (0.085%) >> recovery 1181567/11120724 objects misplaced (10.625%) >> noout flag(s) set >> mon.eselde02u32 low disk space >> >> services: >> mon: 3 daemons, quorum eselde02u32,eselde02u33,eselde02u34 >> mgr: eselde02u32(active), standbys: eselde02u33, eselde02u34 >> osd: 111 osds: 111 up, 111 in; 800 remapped pgs >> flags noout >> >> data: >> pools: 18 pools, 4104 pgs >> objects: 3620k objects, 13875 GB >> usage: 42254 GB used, 160 TB / 201 TB avail >> pgs: 1.876% pgs unknown >> 34.259% pgs not active >> 9472/11120724 objects degraded (0.085%) >>
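As a quick cross-check (an illustrative note, not part of the thread): the degraded and misplaced percentages in the health output above follow directly from the raw object counts, so the figures are internally consistent. A Python sketch:

```python
# Raw counts quoted in the health output above.
objects_total = 11120724   # total object instances across all replicas
degraded      = 9472       # "recovery 9472/11120724 objects degraded (0.085%)"
misplaced     = 1181567    # "recovery 1181567/11120724 objects misplaced (10.625%)"

def pct(n: int) -> float:
    """Share of all object instances, rounded to 3 decimals as ceph prints it."""
    return round(100.0 * n / objects_total, 3)

print(pct(degraded), pct(misplaced))  # → 0.085 10.625
```

Both rounded values match the percentages ceph reported, which confirms the counts and percentages come from the same snapshot.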