Re: [ceph-users] clust recovery stuck

2019-10-23 Thread Eugen Block
Hi, if the OSDs are not too full, the cause is probably the CRUSH weight of those hosts and OSDs. CRUSH tries to distribute the data evenly across all three hosts because they have the same weight (9.31400). But since two OSDs are missing, the distribution cannot complete. If you can't replace the

Re: [ceph-users] clust recovery stuck

2019-10-22 Thread Andras Pataki
Hi Philipp, given 256 PGs triple replicated onto 4 OSDs, you might be encountering the "PG overdose protection" of OSDs. Take a look at 'ceph osd df' and check the number of PGs mapped to each OSD (last column or near the last). The default limit is 200, so if any OSD exceeds that,
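
As a rough illustration of the arithmetic behind this suggestion, the sketch below computes the average number of PG replicas per OSD from the figures quoted in the message (256 PGs, 3 replicas, 4 OSDs, limit of 200). The function and variable names are hypothetical, and the code does not query a cluster; the real per-OSD counts come from 'ceph osd df' and can be uneven.

```python
def average_pgs_per_osd(pg_count: int, replica_count: int, osd_count: int) -> float:
    """Average number of PG replicas each OSD holds if placement were perfectly even."""
    return pg_count * replica_count / osd_count


# Figures taken from the message above; PG_LIMIT_PER_OSD is the default limit it quotes.
PG_COUNT = 256
REPLICA_COUNT = 3
OSD_COUNT = 4
PG_LIMIT_PER_OSD = 200

average = average_pgs_per_osd(PG_COUNT, REPLICA_COUNT, OSD_COUNT)
headroom = PG_LIMIT_PER_OSD - average

print(f"average PGs per OSD: {average}")
print(f"headroom below the limit: {headroom}")
```

Because this is only an average, a small headroom means that an uneven placement could plausibly push individual OSDs over the limit, which is why checking the per-OSD column is the next step.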

Re: [ceph-users] clust recovery stuck

2019-10-22 Thread Philipp Schwaha
hi,

On 2019-10-22 08:05, Eugen Block wrote:
> Hi,
>
> can you share `ceph osd tree`? What crush rules are in use in your
> cluster? I assume that the two failed OSDs prevent the remapping because
> the rules can't be applied.
>

ceph osd tree gives:
ID WEIGHT TYPE NAME UP/DOWN

Re: [ceph-users] clust recovery stuck

2019-10-22 Thread Eugen Block
Hi, can you share `ceph osd tree`? What crush rules are in use in your cluster? I assume that the two failed OSDs prevent the remapping because the rules can't be applied. Regards, Eugen Quoting Philipp Schwaha: hi, I have a problem with a cluster being stuck in recovery after osd

[ceph-users] clust recovery stuck

2019-10-21 Thread Philipp Schwaha
hi, I have a problem with a cluster being stuck in recovery after an OSD failure. At first recovery was doing quite well, but now it just sits there without any progress. It currently looks like this: health HEALTH_ERR 36 pgs are stuck inactive for more than 300 seconds