Hi all,
after increasing the mon_max_pg_per_osd setting, Ceph starts rebalancing as usual.
However, the slow request warnings are still there, even after setting
primary-affinity to 0 beforehand.
On the other hand, if I destroy the OSD, Ceph will start rebalancing unless
the noout flag is set, am I right?
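(For reference, a minimal sketch of toggling that flag; both commands are
standard Ceph CLI:)

  # stop the cluster from marking down OSDs out, i.e. no rebalancing
  ceph osd set noout
  # ... replace the disk ...
  # re-enable automatic out-marking afterwards
  ceph osd unset noout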
You can prevent creation of the PGs on the old filestore OSDs (which
seems to be the culprit here) during replacement by replacing the
disks the hard way:
* ceph osd destroy osd.X
* re-create with bluestore under the same id (ceph-volume ... --osd-id X)
it will then just backfill onto the same OSD.
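A minimal sketch of those two steps (device path /dev/sdb and id 12 are
placeholders; ceph-volume syntax as shipped with Luminous):

  # mark the OSD as destroyed but keep its id and CRUSH position
  ceph osd destroy 12 --yes-i-really-mean-it
  # re-create a bluestore OSD on the new disk under the same id
  ceph-volume lvm create --bluestore --data /dev/sdb --osd-id 12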
Hi,
to reduce impact on clients during migration I would set the OSD's
primary-affinity to 0 beforehand. This should prevent the slow
requests, at least this setting has helped us a lot with problematic
OSDs.
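For example (osd.12 is a placeholder; the affinity is restored once the
migration is done):

  # stop placing this OSD as primary so clients read from other replicas
  ceph osd primary-affinity osd.12 0
  # ... migrate the OSD ...
  # restore the default afterwards
  ceph osd primary-affinity osd.12 1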
Regards
Eugen
Quoting Jaime Ibar:
Hi all,
we recently upgraded from
Hello,
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update:
249 PGs pending on creation (PENDING_CREATING_PGS)
This error might indicate that you are hitting a PG limit per OSD.
Here is some information on it:
https://ceph.com/community/new-luminous-pg-overdose-protection/ . You
could try increasing mon_max_pg_per_osd.
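A sketch of checking and raising that limit on Luminous (the value 300 is
only an example; the mon id usually matches the short hostname, and the
change should also go into ceph.conf to survive restarts):

  # show the current limit on a monitor node
  ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd
  # raise it at runtime on all monitors
  ceph tell mon.* injectargs '--mon_max_pg_per_osd=300'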
Hi all,
we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7, and now we're
trying to migrate the OSDs to Bluestore following this document [0].
However, when I mark the OSD as out,
I'm getting warnings similar to these ones:
2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check
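(For context, the mark-out step from the migration guide boils down to
something like this; osd.12 is a placeholder:)

  # drain the OSD; its data backfills to the rest of the cluster
  ceph osd out osd.12
  # watch recovery progress
  ceph -s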
Hi,
We have a five node cluster that has been running for a long
time (over a year). A few weeks ago we upgraded to 0.87 (giant) and
things continued to work well.
Last week a drive failed on one of the nodes. We replaced the
drive and things were working well again.
Hi Jeff,
it would probably be wise to first check what these slow requests are:
1) ceph health detail - This will tell you which OSDs are experiencing the
slow requests
2) ceph daemon osd.{id} dump_ops_in_flight - To be issued on one of the above
OSDs; it will tell you what these ops are waiting for.
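For example (osd.3 is a placeholder; the second command must be run on the
host where that OSD's admin socket lives):

  # 1) see which OSDs are reporting slow/blocked requests
  ceph health detail
  # 2) on the host running osd.3, dump its in-flight operations
  ceph daemon osd.3 dump_ops_in_flight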
Thanks. I should have mentioned that the errors are pretty well
distributed across the cluster:
ceph1: /var/log/ceph/ceph-osd.0.log 71
ceph1: /var/log/ceph/ceph-osd.1.log 112
ceph1: /var/log/ceph/ceph-osd.2.log 38
ceph2: /var/log/ceph/ceph-osd.3.log 88
ceph2:
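(Counts like the above can be gathered with a grep over the OSD logs,
e.g. something like the loop below; the host list is a placeholder and
'slow request' is the string these warnings carry in the log:)

  # count slow request warnings per OSD log on each node
  for h in ceph1 ceph2 ceph3 ceph4 ceph5; do
    ssh "$h" 'grep -c "slow request" /var/log/ceph/ceph-osd.*.log' | sed "s|^|$h: |"
  done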