Re: [ceph-users] Large OMAP Objects in default.rgw.log pool

2019-05-21 Thread Magnus Grönlund
On Tue, 21 May 2019 at 02:12, mr. non non wrote:

> Has anyone had this issue before? From my research, many people have issues
> with rgw.index related to too small a number of index shards (too
> many objects per index).
> I also checked this thread
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-March/033611.html but
> didn't find any clues, because the number of data objects is below 100k per
> index and the size of the objects in rgw.log is 0.
>
Hi,

I've had the same issue with a large omap object in the default.rgw.log
pool for a long time, but after determining that it wasn't a big issue for
us and that the number of omap keys wasn't growing, I haven't actively tried
to find a solution.
The cluster was running Jewel for over a year, and it was during this time
that the omap entries in default.rgw.log were created, but (of course) the
warning only showed up after the upgrade to Luminous.
#ceph health detail
…
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool 'default.rgw.log'
…

#zgrep "Large" ceph-osd.324.log-20190516.gz
2019-05-15 23:14:48.821612 7fa17cf74700  0 log_channel(cluster) log [WRN] :
Large omap object found. Object:
48:8b8fff66:::meta.log.128a376b-8807-4e19-9ddc-8220fd50d7c1.41:head Key
count: 2844494 Size (bytes): 682777168
#

The reply from Pavan Rallabhandi (below) in the previous thread might be
useful, but I'm not familiar enough with rgw to tell, or to figure out how
that could help resolve the issue.
>That can happen if you have a lot of objects with swift object expiry (TTL)
enabled. You can 'listomapkeys' on these log pool objects and check for the
objects that have registered for TTL as omap entries. I know this is the
case with at least the Jewel version.
>
>Thanks,
>-Pavan.
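
For reference, a minimal way to peek at what those omap keys actually are,
using the object name from the OSD log line above (listomapkeys is a standard
rados subcommand, so this is just a sketch of the check Pavan suggests):

# rados -p default.rgw.log listomapkeys \
    meta.log.128a376b-8807-4e19-9ddc-8220fd50d7c1.41 | head
# rados -p default.rgw.log listomapkeys \
    meta.log.128a376b-8807-4e19-9ddc-8220fd50d7c1.41 | wc -l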

Regards
/Magnus



> Thanks.
> --
> *From:* ceph-users  on behalf of mr.
> non non 
> *Sent:* Monday, May 20, 2019 7:32 PM
> *To:* EDH - Manuel Rios Fernandez; ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Large OMAP Objects in default.rgw.log pool
>
> Hi Manuel,
>
> I use version 12.2.8 with bluestore and also use manually index sharding
> (configured to 100).  As I checked, no buckets reach 100k of
> objects_per_shard.
> here are health status and cluster log
>
> # ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
> 1 large objects found in pool 'default.rgw.log'
> Search the cluster log for 'Large omap object found' for more details.
>
> # cat ceph.log | tail -2
> 2019-05-19 17:49:36.306481 mon.MONNODE1 mon.0 10.118.191.231:6789/0
> 528758 : cluster [WRN] Health check failed: 1 large omap objects
> (LARGE_OMAP_OBJECTS)
> 2019-05-19 17:49:34.535543 osd.38 osd.38 MONNODE1_IP:6808/3514427 12 :
> cluster [WRN] Large omap object found. Object:
> 4:b172cd59:usage::usage.26:head Key count: 8720830 Size (bytes): 1647024346
>
> All objects size are 0.
> $  for i in `rados ls -p default.rgw.log`; do rados stat -p
> default.rgw.log ${i};done  | more
> default.rgw.log/obj_delete_at_hint.78 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/meta.history mtime 2019-05-20 19:19:40.00, size 50
> default.rgw.log/obj_delete_at_hint.70 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.000104 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.26 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.28 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.40 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.15 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.69 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.95 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.03 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.47 mtime 2019-05-20
> 19:31:45.00, size 0
> default.rgw.log/obj_delete_at_hint.35 mtime 2019-05-20
> 19:31:45.00, size 0
>
>
> Please kindly advise how to remove health_warn message.
>
> Many thanks.
> Arnondh
>
> --
> *From:* EDH - Manuel Rios Fernandez 
> *Sent:* Monday, May 20, 2019 5:41 PM
> *To:* 'mr. non non'; ceph-users@lists.ceph.com
> *Subject:* RE: [ceph-users] Large OMAP Objects in default.rgw.log pool
>
>
> Hi Arnondh,
>
>
>
> Whats your ceph version?
>
>
>
> Regards
>
>
>
>
>
> *From:* ceph-users  *On behalf of *mr.
> non non
> *Sent:* Monday, 20 May 2019 12:39
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Large OMAP Objects in default.rgw.log pool
>
>
>
> Hi,
>
>
>
> I found the same issue like above.
>
> Does anyone know how to fix it?
>
>
>
> Thanks.
>
> Arnondh

Re: [ceph-users] Remove RBD mirror?

2019-04-12 Thread Magnus Grönlund
On Fri, 12 Apr 2019 at 16:37, Jason Dillaman wrote:

> On Fri, Apr 12, 2019 at 9:52 AM Magnus Grönlund 
> wrote:
> >
> > Hi Jason,
> >
> > Tried to follow the instructions and setting the debug level to 15
> worked OK, but the daemon appeared to silently ignore the restart command
> (nothing indicating a restart seen in the log).
> > So I set the log level to 15 in the config file and restarted the rbd
> mirror daemon. The output surprised me though, my previous perception of
> the issue might be completely wrong...
> > Lots of "image_replayer::BootstrapRequest: failed to create local
> image: (2) No such file or directory" and ":ImageReplayer:   replay
> encountered an error: (42) No message of desired type"
>
> What is the result from "rbd mirror pool status --verbose nova"
> against your DR cluster now? Are they in up+error now? The ENOENT
> errors most likely related to a parent image that hasn't been
> mirrored. The ENOMSG error seems to indicate that there might be some
> corruption in a journal and it's missing expected records (like a
> production client crashed), but it should be able to recover from
> that
>

# rbd mirror pool status --verbose nova
health: WARNING
images: 2479 total
2479 unknown

002344ab-c324-4c01-97ff-de32868fa712_disk:
  global_id:   c02e0202-df8f-46ce-a4b6-1a50a9692804
  state:   down+unknown
  description: status not found
  last_update:

002a8fde-3a63-4e32-9c18-b0bf64393d0f_disk:
  global_id:   d412abc4-b37e-44a2-8aba-107f352dec60
  state:   down+unknown
  description: status not found
  last_update:
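
(For completeness, a couple of standard commands for checking why every image
reports down+unknown; the pool and image names are taken from the output
above, and the last command assumes Luminous, where rbd-mirror daemons
register in the mgr service map:)

# rbd mirror pool info nova                 # confirm the mirror mode and that the peer is listed
# rbd mirror image status nova/002344ab-c324-4c01-97ff-de32868fa712_disk
# ceph service dump                         # should list the running rbd-mirror daemon(s)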





> > https://pastebin.com/1bTETNGs
> >
> > Best regards
> > /Magnus
> >
> > Den tis 9 apr. 2019 kl 18:35 skrev Jason Dillaman :
> >>
> >> Can you pastebin the results from running the following on your backup
> >> site rbd-mirror daemon node?
> >>
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
> >> ceph --admin-socket /path/to/asok rbd mirror restart nova
> >>  wait a minute to let some logs accumulate ...
> >> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
> >>
> >> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
> >> lots of "rbd::mirror"-like log entries.
> >>
> >>
> >> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund 
> wrote:
> >> >
> >> >
> >> >
> >> > Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman <
> jdill...@redhat.com>:
> >> >>
> >> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> >> (defaults to /var/run/ceph/cephdr-clientasok)?
> If
> >> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror
> status".
> >> >
> >> >
> >> > {
> >> > "pool_replayers": [
> >> > {
> >> > "pool": "glance",
> >> > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00
> cluster: production client: client.productionbackup",
> >> > "instance_id": "869081",
> >> > "leader_instance_id": "869081",
> >> > "leader": true,
> >> > "instances": [],
> >> > "local_cluster_admin_socket":
> "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> >> > "remote_cluster_admin_socket":
> "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
> >> > "sync_throttler": {
> >> > "max_parallel_syncs": 5,
> >> > "running_syncs": 0,
> >> > "waiting_syncs": 0
> >> > },
> >> > "image_replayers": [
> >> > {
> >> > "name":
> "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> >> > "state": "Replaying"
> >> > },
> >> > {
> >> > "name":
> "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> >> > "state": "Replaying"
> >> > },
> >> > ---cut---

Re: [ceph-users] Remove RBD mirror?

2019-04-12 Thread Magnus Grönlund
Hi Jason,

I tried to follow the instructions, and setting the debug level to 15 worked
OK, but the daemon appeared to silently ignore the restart command (nothing
indicating a restart was seen in the log).
So I set the log level to 15 in the config file and restarted the rbd-mirror
daemon instead. The output surprised me though; my previous perception of
the issue might be completely wrong...
Lots of "image_replayer::BootstrapRequest: failed to create local
image: (2) No such file or directory" and ":ImageReplayer:   replay
encountered an error: (42) No message of desired type"

https://pastebin.com/1bTETNGs
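
(For reference, the config-file approach boils down to something like the
following on the rbd-mirror host; the section name and instance name are
assumptions, a plain [client] section should also work:)

# /etc/ceph/<cluster>.conf on the backup site:
[client]
    debug rbd_mirror = 15

# then restart the daemon:
# systemctl restart ceph-rbd-mirror@<instance>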

Best regards
/Magnus

On Tue, 9 Apr 2019 at 18:35, Jason Dillaman wrote:

> Can you pastebin the results from running the following on your backup
> site rbd-mirror daemon node?
>
> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 15
> ceph --admin-socket /path/to/asok rbd mirror restart nova
>  wait a minute to let some logs accumulate ...
> ceph --admin-socket /path/to/asok config set debug_rbd_mirror 0/5
>
> ... and collect the rbd-mirror log from /var/log/ceph/ (should have
> lots of "rbd::mirror"-like log entries.
>
>
> On Tue, Apr 9, 2019 at 12:23 PM Magnus Grönlund 
> wrote:
> >
> >
> >
> > Den tis 9 apr. 2019 kl 17:48 skrev Jason Dillaman :
> >>
> >> Any chance your rbd-mirror daemon has the admin sockets available
> >> (defaults to /var/run/ceph/cephdr-clientasok)? If
> >> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
> >
> >
> > {
> > "pool_replayers": [
> > {
> > "pool": "glance",
> > "peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster:
> production client: client.productionbackup",
> > "instance_id": "869081",
> > "leader_instance_id": "869081",
> > "leader": true,
> > "instances": [],
> > "local_cluster_admin_socket":
> "/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
> > "remote_cluster_admin_socket":
> "/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
> > "sync_throttler": {
> > "max_parallel_syncs": 5,
> > "running_syncs": 0,
> > "waiting_syncs": 0
> > },
> > "image_replayers": [
> > {
> > "name":
> "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
> > "state": "Replaying"
> > },
> > {
> > "name":
> "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
> > "state": "Replaying"
> > },
> > ---cut--
> > {
> > "name":
> "cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
> > "state": "Replaying"
> > }
> > ]
> > },
> >  {
> > "pool": "nova",
> > "peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster:
> production client: client.productionbackup",
> > "instance_id": "889074",
> > "leader_instance_id": "889074",
> > "leader": true,
> > "instances": [],
> > "local_cluster_admin_socket":
> "/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
> > "remote_cluster_admin_socket":
> "/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
> >     "sync_throttler": {
> > "max_parallel_syncs": 5,
> > "running_syncs": 0,
> > "waiting_syncs": 0
> > },
> > "image_replayers": []
> > }
> > ],
> > "image_deleter": {
> > "image_deleter_status": {
> > "delete_images_queue": [
> > {
> > "local_pool_id": 3,
> > "global_

Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Magnus Grönlund
On Tue, 9 Apr 2019 at 17:48, Jason Dillaman wrote:

> Any chance your rbd-mirror daemon has the admin sockets available
> (defaults to /var/run/ceph/cephdr-clientasok)? If
> so, you can run "ceph --admin-daemon /path/to/asok rbd mirror status".
>

{
"pool_replayers": [
{
"pool": "glance",
"peer": "uuid: df30fb21-d1de-4c3a-9c00-10eaa4b30e00 cluster:
production client: client.productionbackup",
"instance_id": "869081",
"leader_instance_id": "869081",
"leader": true,
"instances": [],
"local_cluster_admin_socket":
"/var/run/ceph/client.backup.1936211.backup.94225674131712.asok",
"remote_cluster_admin_socket":
"/var/run/ceph/client.productionbackup.1936211.production.9422567521.asok",
"sync_throttler": {
"max_parallel_syncs": 5,
"running_syncs": 0,
"waiting_syncs": 0
},
"image_replayers": [
{
"name": "glance/ea5e4ad2-090a-4665-b142-5c7a095963e0",
"state": "Replaying"
},
{
"name": "glance/d7095183-45ef-40b5-80ef-f7c9d3bb1e62",
"state": "Replaying"
},
---cut--
{
"name":
"cinder/volume-bcb41f46-3716-4ee2-aa19-6fbc241fbf05",
"state": "Replaying"
}
]
},
 {
"pool": "nova",
"peer": "uuid: 1fc7fefc-9bcb-4f36-a259-66c3d8086702 cluster:
production client: client.productionbackup",
"instance_id": "889074",
"leader_instance_id": "889074",
"leader": true,
"instances": [],
"local_cluster_admin_socket":
"/var/run/ceph/client.backup.1936211.backup.94225678548048.asok",
"remote_cluster_admin_socket":
"/var/run/ceph/client.productionbackup.1936211.production.94225679621728.asok",
"sync_throttler": {
"max_parallel_syncs": 5,
"running_syncs": 0,
"waiting_syncs": 0
},
"image_replayers": []
}
],
"image_deleter": {
"image_deleter_status": {
"delete_images_queue": [
{
"local_pool_id": 3,
"global_image_id":
"ff531159-de6f-4324-a022-50c079dedd45"
}
],
"failed_deletes_queue": []
}

>
> On Tue, Apr 9, 2019 at 11:26 AM Magnus Grönlund 
> wrote:
> >
> >
> >
> > Den tis 9 apr. 2019 kl 17:14 skrev Jason Dillaman :
> >>
> >> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund 
> wrote:
> >> >
> >> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund 
> wrote:
> >> > >>
> >> > >> Hi,
> >> > >> We have configured one-way replication of pools between a
> production cluster and a backup cluster. But unfortunately the rbd-mirror
> or the backup cluster is unable to keep up with the production cluster so
> the replication fails to reach replaying state.
> >> > >
> >> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> >> > >they still performing the initial sync?
> >> >
> >> > There are three pools we try to mirror, (glance, cinder, and nova, no
> points for guessing what the cluster is used for :) ),
> >> > the glance and cinder pools are smaller and sees limited write
> activity, and the mirroring works, the nova pool which is the largest and
> has 90% of the write activity never leaves the "unknown" state.
> >> >
> >> > # rbd mirror pool status cinder
> >> > health: OK
> >> > images: 892 total
> >> > 890 replaying
> >> > 2 stopped
> >> > #
> >> > # rbd mirror pool status nova
> >> > health: WARNING
> >> > images: 2479 total
> >> > 2479 unknown
> >> > #
> >> > The production clsuter has 5k writes/s on average and the backup
> cluster has 1-2k writes/s o

Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Magnus Grönlund
On Tue, 9 Apr 2019 at 17:14, Jason Dillaman wrote:

> On Tue, Apr 9, 2019 at 11:08 AM Magnus Grönlund 
> wrote:
> >
> > >On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund 
> wrote:
> > >>
> > >> Hi,
> > >> We have configured one-way replication of pools between a production
> cluster and a backup cluster. But unfortunately the rbd-mirror or the
> backup cluster is unable to keep up with the production cluster so the
> replication fails to reach replaying state.
> > >
> > >Hmm, it's odd that they don't at least reach the replaying state. Are
> > >they still performing the initial sync?
> >
> > There are three pools we try to mirror, (glance, cinder, and nova, no
> points for guessing what the cluster is used for :) ),
> > the glance and cinder pools are smaller and sees limited write activity,
> and the mirroring works, the nova pool which is the largest and has 90% of
> the write activity never leaves the "unknown" state.
> >
> > # rbd mirror pool status cinder
> > health: OK
> > images: 892 total
> > 890 replaying
> > 2 stopped
> > #
> > # rbd mirror pool status nova
> > health: WARNING
> > images: 2479 total
> > 2479 unknown
> > #
> > The production clsuter has 5k writes/s on average and the backup cluster
> has 1-2k writes/s on average. The production cluster is bigger and has
> better specs. I thought that the backup cluster would be able to keep up
> but it looks like I was wrong.
>
> The fact that they are in the unknown state just means that the remote
> "rbd-mirror" daemon hasn't started any journal replayers against the
> images. If it couldn't keep up, it would still report a status of
> "up+replaying". What Ceph release are you running on your backup
> cluster?
>
The backup cluster is running Luminous 12.2.11 (the production cluster is
running 12.2.10).


> > >> And the journals on the rbd volumes keep growing...
> > >>
> > >> Is it enough to simply disable the mirroring of the pool  (rbd mirror
> pool disable ) and that will remove the lagging reader from the
> journals and shrink them, or is there anything else that has to be done?
> > >
> > >You can either disable the journaling feature on the image(s) since
> > >there is no point to leave it on if you aren't using mirroring, or run
> > >"rbd mirror pool disable " to purge the journals.
> >
> > Thanks for the confirmation.
> > I will stop the mirror of the nova pool and try to figure out if there
> is anything we can do to get the backup cluster to keep up.
> >
> > >> Best regards
> > >> /Magnus
> > >> ___
> > >> ceph-users mailing list
> > >> ceph-users@lists.ceph.com
> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> > >--
> > >Jason
>
>
>
> --
> Jason
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove RBD mirror?

2019-04-09 Thread Magnus Grönlund
>On Tue, Apr 9, 2019 at 10:40 AM Magnus Grönlund  wrote:
>>
>> Hi,
>> We have configured one-way replication of pools between a production
cluster and a backup cluster. But unfortunately the rbd-mirror or the
backup cluster is unable to keep up with the production cluster so the
replication fails to reach replaying state.
>
>Hmm, it's odd that they don't at least reach the replaying state. Are
>they still performing the initial sync?

There are three pools we try to mirror (glance, cinder, and nova; no
points for guessing what the cluster is used for :) ).
The glance and cinder pools are smaller and see limited write activity, and
for them the mirroring works; the nova pool, which is the largest and has 90%
of the write activity, never leaves the "unknown" state.

# rbd mirror pool status cinder
health: OK
images: 892 total
890 replaying
2 stopped
#
# rbd mirror pool status nova
health: WARNING
images: 2479 total
2479 unknown
#
The production cluster has 5k writes/s on average and the backup cluster
has 1-2k writes/s on average. The production cluster is bigger and has
better specs. I thought that the backup cluster would be able to keep up
but it looks like I was wrong.

>> And the journals on the rbd volumes keep growing...
>>
>> Is it enough to simply disable the mirroring of the pool  (rbd mirror
pool disable ) and that will remove the lagging reader from the
journals and shrink them, or is there anything else that has to be done?
>
>You can either disable the journaling feature on the image(s) since
>there is no point to leave it on if you aren't using mirroring, or run
>"rbd mirror pool disable " to purge the journals.

Thanks for the confirmation.
I will stop the mirror of the nova pool and try to figure out if there is
anything we can do to get the backup cluster to keep up.
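
(For the record, the two options described above map to commands roughly like
these; the image name is only a placeholder:)

# option 1: drop the journaling feature on individual images
# rbd feature disable nova/<image>_disk journaling
# option 2: disable mirroring for the whole pool, which purges the journals
# rbd mirror pool disable nova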

>> Best regards
>> /Magnus
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>--
>Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Remove RBD mirror?

2019-04-09 Thread Magnus Grönlund
Hi,
We have configured one-way replication of pools between a production
cluster and a backup cluster. But unfortunately the rbd-mirror or the
backup cluster is unable to keep up with the production cluster so the
replication fails to reach replaying state.
And the journals on the rbd volumes keep growing...

Is it enough to simply disable the mirroring of the pool (rbd mirror pool
disable ), and will that remove the lagging reader from the journals
and shrink them, or is there anything else that has to be done?
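
(To see how far behind the mirror is on a given image, and how much its
journal has grown, something like the following should work; the image name
is a placeholder and the exact output differs between releases:)

# rbd journal info --pool nova --image <image>     # journal id, object size, minimum/active sets
# rbd journal status --pool nova --image <image>   # registered clients and their commit positions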

Best regards
/Magnus
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD-mirror high cpu usage?

2018-11-21 Thread Magnus Grönlund
Hi,

Answering my own question: the high load was related to the cpufreq kernel
module. I unloaded the cpufreq module and the CPU load instantly dropped, and
the mirroring started to work.
Obviously there is a bug somewhere, but for the moment I’m just happy it
works.
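
(For anyone hitting the same thing, roughly the kind of check involved; the
module name varies between machines, pcc_cpufreq below is only an example:)

# lsmod | egrep 'cpufreq|pstate'   # see which frequency-scaling driver/governor modules are loaded
# modprobe -r pcc_cpufreq          # unload the suspect module
# cpupower frequency-info          # if installed, shows the driver/governor now in effect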

/Magnus

On Thu, 15 Nov 2018 at 15:24, Magnus Grönlund wrote:

> Hi,
>
> I’m trying to setup one-way rbd-mirroring for a ceph-cluster used by an
> openstack cloud, but the rbd-mirror is unable to “catch up” with the
> changes. However it appears to me as if it's not due to the ceph-clusters
> or the network but due to the server running the rbd-mirror process running
> out of cpu?
>
> Is a high cpu load to be expected or is it a symptom of something else?
> Or in other words, what can I check/do to get the mirroring working? 
>
> # rbd mirror pool status nova
> health: WARNING
> images: 596 total
> 572 starting_replay
> 24 replaying
>
> top - 13:31:36 up 79 days,  5:31,  1 user,  load average: 32.27, 26.82,
> 25.33
> Tasks: 360 total,  17 running, 182 sleeping,   0 stopped,   0 zombie
> %Cpu(s):  8.9 us, 70.0 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  2.7 si,
> 0.0 st
> KiB Mem : 13205185+total, 12862490+free,   579508 used,  2847444 buff/cache
> KiB Swap:0 total,0 free,0 used. 12948856+avail Mem
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM
>  TIME+ COMMAND
> 2336553 ceph  20   0   17.1g 178160  20344 S 417.2  0.1  21:50.61
> rbd-mirror
> 2312698 root  20   0   0  0  0 I  70.2  0.0  70:11.51
> kworker/12:2
> 2312851 root  20   0   0  0  0 R  69.2  0.0  62:29.69
> kworker/24:1
> 2324627 root  20   0   0  0  0 I  68.4  0.0  40:36.77
> kworker/14:1
> 2235817 root  20   0   0  0  0 I  68.0  0.0 469:14.08
> kworker/8:0
> 2241720 root  20   0   0  0  0 R  67.3  0.0 437:46.51
> kworker/9:1
> 2306648 root  20   0   0  0  0 R  66.9  0.0 109:27.44
> kworker/25:0
> 2324625 root  20   0   0  0  0 R  66.9  0.0  40:37.53
> kworker/13:1
> 2336318 root  20   0   0  0  0 R  66.7  0.0  14:51.96
> kworker/27:3
> 2324643 root  20   0   0  0  0 I  66.5  0.0  36:21.46
> kworker/15:2
> 2294989 root  20   0   0  0  0 I  66.3  0.0 134:09.89
> kworker/11:1
> 2324626 root  20   0   0  0  0 I  66.3  0.0  39:44.14
> kworker/28:2
> 2324019 root  20   0   0  0  0 I  65.3  0.0  44:51.80
> kworker/26:1
> 2235814 root  20   0   0  0  0 R  65.1  0.0 459:14.70
> kworker/29:2
> 2294174 root  20   0   0  0  0 I  64.5  0.0 220:58.50
> kworker/30:1
> 2324355 root  20   0   0  0  0 R  63.3  0.0  45:04.29
> kworker/10:1
> 2263800 root  20   0   0  0  0 R  62.9  0.0 353:38.48
> kworker/31:1
> 2270765 root  20   0   0  0  0 R  60.2  0.0 294:46.34
> kworker/0:0
> 2294798 root  20   0   0  0  0 R  59.8  0.0 148:48.23
> kworker/1:2
> 2307128 root  20   0   0  0  0 R  59.8  0.0  86:15.45
> kworker/6:2
> 2307129 root  20   0   0  0  0 I  59.6  0.0  85:29.66
> kworker/5:0
> 2294826 root  20   0   0  0  0 R  58.2  0.0 138:53.56
> kworker/7:3
> 2294575 root  20   0   0  0  0 I  57.8  0.0 155:03.74
> kworker/2:3
> 2294310 root  20   0   0  0  0 I  57.2  0.0 176:10.92
> kworker/4:2
> 2295000 root  20   0   0  0  0 I  57.2  0.0 132:47.28
> kworker/3:2
> 2307060 root  20   0   0  0  0 I  56.6  0.0  87:46.59
> kworker/23:2
> 2294931 root  20   0   0  0  0 I  56.4  0.0 133:31.47
> kworker/17:2
> 2318659 root  20   0   0  0  0 I  56.2  0.0  55:01.78
> kworker/16:2
> 2336304 root  20   0   0  0  0 I  56.0  0.0  11:45.92
> kworker/21:2
> 2306947 root  20   0   0  0  0 R  55.6  0.0  90:45.31
> kworker/22:2
> 2270628 root  20   0   0  0  0 I  53.8  0.0 273:43.31
> kworker/19:3
> 2294797 root  20   0   0  0  0 R  52.3  0.0 141:13.67
> kworker/18:0
> 2330537 root  20   0   0  0  0 R  52.3  0.0  25:33.25
> kworker/20:2
>
> The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6
> nodes with 60 OSDs (but roughly the same amount of storage), the rbd-mirror
> runs on a separate server with 2* E5-2650v2 cpus and 128GB memory.
>
> Best regards
> /Magnus
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD-mirror high cpu usage?

2018-11-15 Thread Magnus Grönlund
Hi,

I’m trying to set up one-way rbd-mirroring for a Ceph cluster used by an
OpenStack cloud, but the rbd-mirror is unable to “catch up” with the
changes. However, it appears to me that this is not due to the Ceph clusters
or the network, but rather that the server running the rbd-mirror process is
running out of CPU.

Is a high CPU load to be expected, or is it a symptom of something else?
Or in other words, what can I check/do to get the mirroring working?

# rbd mirror pool status nova
health: WARNING
images: 596 total
572 starting_replay
24 replaying

top - 13:31:36 up 79 days,  5:31,  1 user,  load average: 32.27, 26.82,
25.33
Tasks: 360 total,  17 running, 182 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.9 us, 70.0 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  2.7 si,
0.0 st
KiB Mem : 13205185+total, 12862490+free,   579508 used,  2847444 buff/cache
KiB Swap:0 total,0 free,0 used. 12948856+avail Mem
PID USER  PR  NIVIRTRESSHR S  %CPU %MEM
 TIME+ COMMAND
2336553 ceph  20   0   17.1g 178160  20344 S 417.2  0.1  21:50.61
rbd-mirror
2312698 root  20   0   0  0  0 I  70.2  0.0  70:11.51
kworker/12:2
2312851 root  20   0   0  0  0 R  69.2  0.0  62:29.69
kworker/24:1
2324627 root  20   0   0  0  0 I  68.4  0.0  40:36.77
kworker/14:1
2235817 root  20   0   0  0  0 I  68.0  0.0 469:14.08
kworker/8:0
2241720 root  20   0   0  0  0 R  67.3  0.0 437:46.51
kworker/9:1
2306648 root  20   0   0  0  0 R  66.9  0.0 109:27.44
kworker/25:0
2324625 root  20   0   0  0  0 R  66.9  0.0  40:37.53
kworker/13:1
2336318 root  20   0   0  0  0 R  66.7  0.0  14:51.96
kworker/27:3
2324643 root  20   0   0  0  0 I  66.5  0.0  36:21.46
kworker/15:2
2294989 root  20   0   0  0  0 I  66.3  0.0 134:09.89
kworker/11:1
2324626 root  20   0   0  0  0 I  66.3  0.0  39:44.14
kworker/28:2
2324019 root  20   0   0  0  0 I  65.3  0.0  44:51.80
kworker/26:1
2235814 root  20   0   0  0  0 R  65.1  0.0 459:14.70
kworker/29:2
2294174 root  20   0   0  0  0 I  64.5  0.0 220:58.50
kworker/30:1
2324355 root  20   0   0  0  0 R  63.3  0.0  45:04.29
kworker/10:1
2263800 root  20   0   0  0  0 R  62.9  0.0 353:38.48
kworker/31:1
2270765 root  20   0   0  0  0 R  60.2  0.0 294:46.34
kworker/0:0
2294798 root  20   0   0  0  0 R  59.8  0.0 148:48.23
kworker/1:2
2307128 root  20   0   0  0  0 R  59.8  0.0  86:15.45
kworker/6:2
2307129 root  20   0   0  0  0 I  59.6  0.0  85:29.66
kworker/5:0
2294826 root  20   0   0  0  0 R  58.2  0.0 138:53.56
kworker/7:3
2294575 root  20   0   0  0  0 I  57.8  0.0 155:03.74
kworker/2:3
2294310 root  20   0   0  0  0 I  57.2  0.0 176:10.92
kworker/4:2
2295000 root  20   0   0  0  0 I  57.2  0.0 132:47.28
kworker/3:2
2307060 root  20   0   0  0  0 I  56.6  0.0  87:46.59
kworker/23:2
2294931 root  20   0   0  0  0 I  56.4  0.0 133:31.47
kworker/17:2
2318659 root  20   0   0  0  0 I  56.2  0.0  55:01.78
kworker/16:2
2336304 root  20   0   0  0  0 I  56.0  0.0  11:45.92
kworker/21:2
2306947 root  20   0   0  0  0 R  55.6  0.0  90:45.31
kworker/22:2
2270628 root  20   0   0  0  0 I  53.8  0.0 273:43.31
kworker/19:3
2294797 root  20   0   0  0  0 R  52.3  0.0 141:13.67
kworker/18:0
2330537 root  20   0   0  0  0 R  52.3  0.0  25:33.25
kworker/20:2

The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6
nodes with 60 OSDs (but roughly the same amount of storage), the rbd-mirror
runs on a separate server with 2* E5-2650v2 cpus and 128GB memory.

Best regards
/Magnus
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Ceph-deploy] Cluster Name

2018-08-09 Thread Magnus Grönlund
Hi Jocelyn,

I'm in the process of setting up rbd-mirroring myself and stumbled on the
same problem. But I think the "trick" here is to _not_ colocate the
rbd-mirror daemon with any other part of the cluster(s); it should run
on a separate host. That way you can change the CLUSTER_NAME variable
in /etc/sysconfig/ceph
without affecting any of the mons, OSDs, etc.
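
(A minimal sketch of that layout on the dedicated rbd-mirror host; the
"local"/"remote" names are only examples, borrowed from Erik's description
quoted further down, and <pool> is a placeholder:)

# /etc/ceph/local.conf   = copy of the backup cluster's ceph.conf (plus its keyring)
# /etc/ceph/remote.conf  = copy of the production cluster's ceph.conf (plus its keyring)

# rbd --cluster local mirror pool enable <pool> image
# rbd --cluster local mirror pool peer add <pool> client.production@remote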

Best regards
/Magnus

2018-08-09 7:41 GMT+02:00 Thode Jocelyn :

> Hi Erik,
>
> The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph file
> to determine which configuration file to use (from CLUSTER_NAME). So you
> need to set this to the name you chose for rbd-mirror to work. However
> setting this CLUSTER_NAME variable in /etc/sysconfig/ceph makes it so that
> the mon, osd etc services will also use this variable. Because of this they
> cannot start anymore as all their path are set with "ceph" as cluster name.
>
> However there might be something that I missed which would make this point
> moot
>
> Best Regards
> Jocelyn Thode
>
> -Original Message-
> From: Erik McCormick [mailto:emccorm...@cirrusseven.com]
> Sent: Wednesday, 8 August 2018 16:39
> To: Thode Jocelyn 
> Cc: Glen Baars ; Vasu Kulkarni <
> vakul...@redhat.com>; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
>
> I'm not using this feature, so maybe I'm missing something, but from the
> way I understand cluster naming to work...
>
> I still don't understand why this is blocking for you. Unless you are
> attempting to mirror between two clusters running on the same hosts (why
> would you do this?) then systemd doesn't come into play. The --cluster flag
> on the rbd command will simply set the name of a configuration file with
> the FSID and settings of the appropriate cluster. Cluster name is just a
> way of telling ceph commands and systemd units where to find the configs.
>
> So, what you end up with is something like:
>
> /etc/ceph/ceph.conf (your local cluster configuration) on both clusters
> /etc/ceph/local.conf (config of the source cluster. Just a copy of
> ceph.conf of the source clsuter) /etc/ceph/remote.conf (config of
> destination peer cluster. Just a copy of ceph.conf of the remote cluster).
>
> Run all your rbd mirror commands against local and remote names.
> However when starting things like mons, osds, mds, etc. you need no
> cluster name as it can use ceph.conf (cluster name of ceph).
>
> Am I making sense, or have I completely missed something?
>
> -Erik
>
> On Wed, Aug 8, 2018 at 8:34 AM, Thode Jocelyn 
> wrote:
> > Hi,
> >
> >
> >
> > We are still blocked by this problem on our end. Glen did you  or
> > someone else figure out something for this ?
> >
> >
> >
> > Regards
> >
> > Jocelyn Thode
> >
> >
> >
> > From: Glen Baars [mailto:g...@onsitecomputers.com.au]
> > Sent: Thursday, 2 August 2018 05:43
> > To: Erik McCormick 
> > Cc: Thode Jocelyn ; Vasu Kulkarni
> > ; ceph-users@lists.ceph.com
> > Subject: RE: [ceph-users] [Ceph-deploy] Cluster Name
> >
> >
> >
> > Hello Erik,
> >
> >
> >
> > We are going to use RBD-mirror to replicate the clusters. This seems
> > to need separate cluster names.
> >
> > Kind regards,
> >
> > Glen Baars
> >
> >
> >
> > From: Erik McCormick 
> > Sent: Thursday, 2 August 2018 9:39 AM
> > To: Glen Baars 
> > Cc: Thode Jocelyn ; Vasu Kulkarni
> > ; ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> >
> >
> > Don't set a cluster name. It's no longer supported. It really only
> > matters if you're running two or more independent clusters on the same
> > boxes. That's generally inadvisable anyway.
> >
> >
> >
> > Cheers,
> >
> > Erik
> >
> >
> >
> > On Wed, Aug 1, 2018, 9:17 PM Glen Baars 
> wrote:
> >
> > Hello Ceph Users,
> >
> > Does anyone know how to set the Cluster Name when deploying with
> > Ceph-deploy? I have 3 clusters to configure and need to correctly set
> > the name.
> >
> > Kind regards,
> > Glen Baars
> >
> > -Original Message-
> > From: ceph-users  On Behalf Of Glen
> > Baars
> > Sent: Monday, 23 July 2018 5:59 PM
> > To: Thode Jocelyn ; Vasu Kulkarni
> > 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > How very timely, I am facing the exact same issue.
> >
> > Kind regards,
> > Glen Baars
> >
> > -Original Message-
> > From: ceph-users  On Behalf Of
> > Thode Jocelyn
> > Sent: Monday, 23 July 2018 1:42 PM
> > To: Vasu Kulkarni 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] Cluster Name
> >
> > Hi,
> >
> > Yes my rbd-mirror is coloctaed with my mon/osd. It only affects nodes
> > where they are collocated as they all use the "/etc/sysconfig/ceph"
> > configuration file.
> >
> > Best
> > Jocelyn Thode
> >
> > -Original Message-
> > From: Vasu Kulkarni [mailto:vakul...@redhat.com]
> > Sent: Friday, 20 July 2018 17:25
> > To: Thode Jocelyn 
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] [Ceph-deploy] 

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread Magnus Grönlund
Hej David and thanks!

That was indeed the magic trick: no more peering, stale or down PGs.

I upgraded the ceph packages on the hosts, restarted the OSDs, and then ran
"ceph osd require-osd-release luminous".
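
(Spelled out, the sequence is roughly the following; the first and last lines
are additional checks worth knowing about rather than something stated above:)

# ceph versions                                     # confirm every daemon now reports 12.2.x
# ceph osd require-osd-release luminous
# ceph osd set-require-min-compat-client luminous   # optional; only if no pre-Luminous clients remain (needed e.g. for upmap)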

/Magnus

2018-07-12 12:05 GMT+02:00 David Majchrzak :

> Hi/Hej Magnus,
>
> We had a similar issue going from latest hammer to jewel (so might not be
> applicable for you), with PGs stuck peering / data misplaced, right after
> updating all mons to latest jewel at that time 10.2.10.
>
> Finally setting the require_jewel_osds put everything back in place ( we
> were going to do this after restarting all OSDs, following the
> docs/changelogs ).
>
> What does your ceph health detail look like?
>
> Did you perform any other commands after starting your mon upgrade? Any
> commands that might change the crush-map might cause issues AFAIK (correct
> me if im wrong, but i think we ran into this once) if your mons and osds
> are different versions.
>
> // david
>
> On jul 12 2018, at 11:45 am, Magnus Grönlund  wrote:
>
>
> Hi list,
>
> Things went from bad to worse, tried to upgrade some OSDs to Luminous to
> see if that could help but that didn’t appear to make any difference.
> But for each restarted OSD there was a few PGs that the OSD seemed to
> “forget” and the number of undersized PGs grew until some PGs had been
> “forgotten” by all 3 acting OSDs and became stale, even though all OSDs
> (and their disks) where available.
> Then the OSDs grew so big that the servers ran out of memory (48GB per
> server with 10 2TB-disks per server) and started killing the OSDs…
> All OSDs where then shutdown to try and preserve some data on the disks at
> least, but maybe it is too late?
>
> /Magnus
>
> 2018-07-11 21:10 GMT+02:00 Magnus Grönlund :
>
> Hi Paul,
>
> No all OSDs are still jewel , the issue started before I had even started
> to upgrade the first OSD and they don't appear to be flapping.
> ceph -w shows a lot of slow request etc, but nothing unexpected as far as
> I can tell considering the state the cluster is in.
>
> 2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included
> below; oldest blocked for > 25402.278824 secs
> 2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds
> old, received at 2018-07-11 20:08:08.439214: osd_op(client.73540057.0:8289463
> 2.e57b3e32 (undecoded) ack+ondisk+retry+write+known_if_redirected
> e160294) currently waiting for peered
> 2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds
> old, received at 2018-07-11 20:08:09.348446: osd_op(client.671628641.0:998704
> 2.42f88232 (undecoded) ack+ondisk+retry+write+known_if_redirected
> e160475) currently waiting for peered
> 2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included
> below; oldest blocked for > 25403.279204 secs
> 2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds
> old, received at 2018-07-11 20:08:10.353060: osd_op(client.231731103.0:1007729
> 3.e0ff5786 (undecoded) ondisk+write+known_if_redirected e137428)
> currently waiting for peered
> 2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds
> old, received at 2018-07-11 20:08:10.362819: osd_op(client.207458703.0:2000292
> 3.a8143b86 (undecoded) ondisk+write+known_if_redirected e137428)
> currently waiting for peered
> 2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering,
> 1142 peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551
> active+clean, 2 activating+undersized+degraded+remapped, 15
> active+remapped+backfilling, 178 unknown, 1 active+remapped, 3
> activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait,
> 6 active+recovery_wait+degraded+remapped, 3 
> undersized+degraded+remapped+backfill_wait+peered,
> 5 active+undersized+degraded+remapped+backfilling, 295
> active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded,
> 21 activating+undersized+degraded, 559 active+undersized+degraded, 4
> remapped, 17 undersized+degraded+peered, 1 
> active+recovery_wait+undersized+degraded+remapped;
> 13439 GB data, 42395 GB used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s
> wr, 5 op/s; 534753/10756032 objects degraded (4.972%); 779027/10756032
> objects misplaced (7.243%); 256 MB/s, 65 objects/s recovering
>
>
>
> There are a lot of things in the OSD-log files that I'm unfamiliar with
> but so far I haven't found anything that has given me a clue on how to fix
> the issue.
> BTW restarting a OSD doesn't seem to help, on the contrary, that sometimes
> results in PGs beeing stuck undersized!
> I have attaced a osd-log from when a OSD i restarted started up.
>
> Best regards
> /Magnus
>
>
> 2018-07-

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-12 Thread Magnus Grönlund
Hi list,

Things went from bad to worse. I tried to upgrade some OSDs to Luminous to
see if that could help, but it didn’t appear to make any difference.
For each restarted OSD there were a few PGs that the OSD seemed to
“forget”, and the number of undersized PGs grew until some PGs had been
“forgotten” by all 3 acting OSDs and became stale, even though all OSDs
(and their disks) were available.
Then the OSDs grew so big that the servers ran out of memory (48 GB per
server with 10 2 TB disks per server) and started killing the OSDs…
All OSDs were then shut down to try to preserve at least some data on the
disks, but maybe it is too late?
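
(If it helps anyone in the same spot: one generic way to stop a cluster from
thrashing itself while investigating is to set the recovery-related flags
before bringing the OSDs back up; this is a sketch, not what was actually run
here:)

# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
# ceph osd set norebalance
# ceph osd set pause        # optional: also stops client I/O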

/Magnus

2018-07-11 21:10 GMT+02:00 Magnus Grönlund :

> Hi Paul,
>
> No all OSDs are still jewel , the issue started before I had even started
> to upgrade the first OSD and they don't appear to be flapping.
> ceph -w shows a lot of slow request etc, but nothing unexpected as far as
> I can tell considering the state the cluster is in.
>
> 2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included
> below; oldest blocked for > 25402.278824 secs
> 2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds
> old, received at 2018-07-11 20:08:08.439214: osd_op(client.73540057.0:8289463
> 2.e57b3e32 (undecoded) ack+ondisk+retry+write+known_if_redirected
> e160294) currently waiting for peered
> 2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds
> old, received at 2018-07-11 20:08:09.348446: osd_op(client.671628641.0:998704
> 2.42f88232 (undecoded) ack+ondisk+retry+write+known_if_redirected
> e160475) currently waiting for peered
> 2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included
> below; oldest blocked for > 25403.279204 secs
> 2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds
> old, received at 2018-07-11 20:08:10.353060: osd_op(client.231731103.0:1007729
> 3.e0ff5786 (undecoded) ondisk+write+known_if_redirected e137428)
> currently waiting for peered
> 2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds
> old, received at 2018-07-11 20:08:10.362819: osd_op(client.207458703.0:2000292
> 3.a8143b86 (undecoded) ondisk+write+known_if_redirected e137428)
> currently waiting for peered
> 2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering,
> 1142 peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551
> active+clean, 2 activating+undersized+degraded+remapped, 15
> active+remapped+backfilling, 178 unknown, 1 active+remapped, 3
> activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait,
> 6 active+recovery_wait+degraded+remapped, 3 
> undersized+degraded+remapped+backfill_wait+peered,
> 5 active+undersized+degraded+remapped+backfilling, 295
> active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded,
> 21 activating+undersized+degraded, 559 active+undersized+degraded, 4
> remapped, 17 undersized+degraded+peered, 1 
> active+recovery_wait+undersized+degraded+remapped;
> 13439 GB data, 42395 GB used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s
> wr, 5 op/s; 534753/10756032 objects degraded (4.972%); 779027/10756032
> objects misplaced (7.243%); 256 MB/s, 65 objects/s recovering
>
>
>
> There are a lot of things in the OSD-log files that I'm unfamiliar with
> but so far I haven't found anything that has given me a clue on how to fix
> the issue.
> BTW restarting a OSD doesn't seem to help, on the contrary, that sometimes
> results in PGs beeing stuck undersized!
> I have attaced a osd-log from when a OSD i restarted started up.
>
> Best regards
> /Magnus
>
>
> 2018-07-11 20:39 GMT+02:00 Paul Emmerich :
>
>> Did you finish the upgrade of the OSDs? Are OSDs flapping? (ceph -w) Is
>> there anything weird in the OSDs' log files?
>>
>>
>> Paul
>>
>> 2018-07-11 20:30 GMT+02:00 Magnus Grönlund :
>>
>>> Hi,
>>>
>>> Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous
>>> (12.2.6)
>>>
>>> After upgrading and restarting the mons everything looked OK, the mons
>>> had quorum, all OSDs where up and in and all the PGs where active+clean.
>>> But before I had time to start upgrading the OSDs it became obvious that
>>> something had gone terribly wrong.
>>> All of a sudden 1600 out of 4100 PGs where inactive and 40% of the data
>>> was misplaced!
>>>
>>> The mons appears OK and all OSDs are still up and in, but a few hours
>>> later there was still 1483 pgs stuck inactive, essentially all of them in
>>> peering!
>>> Investigating one of the stuck PGs it appears to be looping between
>>> “inactive”, “remapped+peering” and “peering” an

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Magnus Grönlund
Hi Kevin,

Unfortunately restarting OSDs doesn't appear to help; instead it seems to
make things worse, with PGs getting stuck degraded.

Best regards
/Magnus

2018-07-11 20:46 GMT+02:00 Kevin Olbrich :

> Sounds a little bit like the problem I had on OSDs:
>
> [ceph-users] Blocked requests activating+remapped after extending pg(p)_num
>
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026680.html>
>   *Kevin Olbrich*
>
>- [ceph-users] Blocked requests activating+remapped
>afterextendingpg(p)_num
><http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026681.html>
>  *Burkhard Linke*
>   - [ceph-users] Blocked requests activating+remapped
>   afterextendingpg(p)_num
>   
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026682.html>
> *Kevin Olbrich*
>  - [ceph-users] Blocked requests activating+remapped
>  afterextendingpg(p)_num
>  
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026683.html>
>*Kevin Olbrich*
>  - [ceph-users] Blocked requests activating+remapped
>  afterextendingpg(p)_num
>  
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026685.html>
>*Kevin Olbrich*
>  - [ceph-users] Blocked requests activating+remapped
>  afterextendingpg(p)_num
>  
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026689.html>
>*Kevin Olbrich*
>  - [ceph-users] Blocked requests activating+remapped
>  afterextendingpg(p)_num
>  
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026692.html>
>*Paul Emmerich*
>  - [ceph-users] Blocked requests activating+remapped
>  afterextendingpg(p)_num
>  
> <http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/026695.html>
>*Kevin Olbrich*
>
> I ended up restarting the OSDs which were stuck in that state and they
> immediately fixed themselfs.
> It should also work to just "out" the problem-OSDs and immeditly up them
> again to fix it.
>
> - Kevin
>
> 2018-07-11 20:30 GMT+02:00 Magnus Grönlund :
>
>> Hi,
>>
>> Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous
>> (12.2.6)
>>
>> After upgrading and restarting the mons everything looked OK, the mons
>> had quorum, all OSDs where up and in and all the PGs where active+clean.
>> But before I had time to start upgrading the OSDs it became obvious that
>> something had gone terribly wrong.
>> All of a sudden 1600 out of 4100 PGs where inactive and 40% of the data
>> was misplaced!
>>
>> The mons appears OK and all OSDs are still up and in, but a few hours
>> later there was still 1483 pgs stuck inactive, essentially all of them in
>> peering!
>> Investigating one of the stuck PGs it appears to be looping between
>> “inactive”, “remapped+peering” and “peering” and the epoch number is rising
>> fast, see the attached pg query outputs.
>>
>> We really can’t afford to loose the cluster or the data so any help or
>> suggestions on how to debug or fix this issue would be very, very
>> appreciated!
>>
>>
>> health: HEALTH_ERR
>> 1483 pgs are stuck inactive for more than 60 seconds
>> 542 pgs backfill_wait
>> 14 pgs backfilling
>> 11 pgs degraded
>> 1402 pgs peering
>> 3 pgs recovery_wait
>> 11 pgs stuck degraded
>> 1483 pgs stuck inactive
>> 2042 pgs stuck unclean
>> 7 pgs stuck undersized
>> 7 pgs undersized
>> 111 requests are blocked > 32 sec
>> 10586 requests are blocked > 4096 sec
>> recovery 9472/11120724 objects degraded (0.085%)
>> recovery 1181567/11120724 objects misplaced (10.625%)
>> noout flag(s) set
>> mon.eselde02u32 low disk space
>>
>>   services:
>> mon: 3 daemons, quorum eselde02u32,eselde02u33,eselde02u34
>> mgr: eselde02u32(active), standbys: eselde02u33, eselde02u34
>> osd: 111 osds: 111 up, 111 in; 800 remapped pgs
>>  flags noout
>>
>>   data:
>> pools:   18 pools, 4104 pgs
>> objects: 3620k objects, 13875 GB
>> usage:   42254 GB used, 160 TB / 201 TB avail
>> pgs: 1.876% pgs unknown
>>  34.259% pgs not active
>>  9472/11120724 objects degraded (0.085%)
>>

Re: [ceph-users] PGs stuck peering (looping?) after upgrade to Luminous.

2018-07-11 Thread Magnus Grönlund
Hi Paul,

No, all OSDs are still on Jewel; the issue started before I had even started
to upgrade the first OSD, and they don't appear to be flapping.
ceph -w shows a lot of slow requests etc., but nothing unexpected as far as I
can tell, considering the state the cluster is in.

2018-07-11 20:40:09.396642 osd.37 [WRN] 100 slow requests, 2 included
below; oldest blocked for > 25402.278824 secs
2018-07-11 20:40:09.396652 osd.37 [WRN] slow request 1920.957326 seconds
old, received at 2018-07-11 20:08:08.439214:
osd_op(client.73540057.0:8289463 2.e57b3e32 (undecoded)
ack+ondisk+retry+write+known_if_redirected e160294) currently waiting for
peered
2018-07-11 20:40:09.396660 osd.37 [WRN] slow request 1920.048094 seconds
old, received at 2018-07-11 20:08:09.348446:
osd_op(client.671628641.0:998704 2.42f88232 (undecoded)
ack+ondisk+retry+write+known_if_redirected e160475) currently waiting for
peered
2018-07-11 20:40:10.397008 osd.37 [WRN] 100 slow requests, 2 included
below; oldest blocked for > 25403.279204 secs
2018-07-11 20:40:10.397017 osd.37 [WRN] slow request 1920.043860 seconds
old, received at 2018-07-11 20:08:10.353060:
osd_op(client.231731103.0:1007729 3.e0ff5786 (undecoded)
ondisk+write+known_if_redirected e137428) currently waiting for peered
2018-07-11 20:40:10.397023 osd.37 [WRN] slow request 1920.034101 seconds
old, received at 2018-07-11 20:08:10.362819:
osd_op(client.207458703.0:2000292 3.a8143b86 (undecoded)
ondisk+write+known_if_redirected e137428) currently waiting for peered
2018-07-11 20:40:10.790573 mon.0 [INF] pgmap 4104 pgs: 5 down+peering, 1142
peering, 210 remapped+peering, 5 active+recovery_wait+degraded, 1551
active+clean, 2 activating+undersized+degraded+remapped, 15
active+remapped+backfilling, 178 unknown, 1 active+remapped, 3
activating+remapped, 78 active+undersized+degraded+remapped+backfill_wait,
6 active+recovery_wait+degraded+remapped, 3
undersized+degraded+remapped+backfill_wait+peered, 5
active+undersized+degraded+remapped+backfilling, 295
active+remapped+backfill_wait, 3 active+recovery_wait+undersized+degraded,
21 activating+undersized+degraded, 559 active+undersized+degraded, 4
remapped, 17 undersized+degraded+peered, 1
active+recovery_wait+undersized+degraded+remapped; 13439 GB data, 42395 GB
used, 160 TB / 201 TB avail; 4069 B/s rd, 746 kB/s wr, 5 op/s;
534753/10756032 objects degraded (4.972%); 779027/10756032 objects
misplaced (7.243%); 256 MB/s, 65 objects/s recovering



There are a lot of things in the OSD log files that I'm unfamiliar with, but
so far I haven't found anything that has given me a clue on how to fix the
issue.
BTW, restarting an OSD doesn't seem to help; on the contrary, it sometimes
results in PGs being stuck undersized!
I have attached an OSD log from when a restarted OSD started up.
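
(For reference, a few standard commands for digging into individual stuck
PGs; the PG id is just an example:)

# ceph pg dump_stuck inactive | head
# ceph pg 2.e57 query | less                # check the "recovery_state" section for what peering is waiting on
# ceph osd dump | grep -E 'flags|require'   # sanity-check cluster flags / require_* settings after a partial upgrade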

Best regards
/Magnus


2018-07-11 20:39 GMT+02:00 Paul Emmerich :

> Did you finish the upgrade of the OSDs? Are OSDs flapping? (ceph -w) Is
> there anything weird in the OSDs' log files?
>
>
> Paul
>
> 2018-07-11 20:30 GMT+02:00 Magnus Grönlund :
>
>> Hi,
>>
>> Started to upgrade a ceph-cluster from Jewel (10.2.10) to Luminous
>> (12.2.6)
>>
>> After upgrading and restarting the mons everything looked OK, the mons
>> had quorum, all OSDs where up and in and all the PGs where active+clean.
>> But before I had time to start upgrading the OSDs it became obvious that
>> something had gone terribly wrong.
>> All of a sudden 1600 out of 4100 PGs where inactive and 40% of the data
>> was misplaced!
>>
>> The mons appears OK and all OSDs are still up and in, but a few hours
>> later there was still 1483 pgs stuck inactive, essentially all of them in
>> peering!
>> Investigating one of the stuck PGs it appears to be looping between
>> “inactive”, “remapped+peering” and “peering” and the epoch number is rising
>> fast, see the attached pg query outputs.
>>
>> We really can’t afford to loose the cluster or the data so any help or
>> suggestions on how to debug or fix this issue would be very, very
>> appreciated!
>>
>>
>> health: HEALTH_ERR
>> 1483 pgs are stuck inactive for more than 60 seconds
>> 542 pgs backfill_wait
>> 14 pgs backfilling
>> 11 pgs degraded
>> 1402 pgs peering
>> 3 pgs recovery_wait
>> 11 pgs stuck degraded
>> 1483 pgs stuck inactive
>> 2042 pgs stuck unclean
>> 7 pgs stuck undersized
>> 7 pgs undersized
>> 111 requests are blocked > 32 sec
>> 10586 requests are blocked > 4096 sec
>> recovery 9472/11120724 objects degraded (0.085%)
>> recovery 1181567/11120724 objects misplaced (10.625%)
>>