Re: [ceph-users] Cluster Down from reweight-by-utilization

2017-11-06 Thread Kevin Hrpcek
An update for the list archive and for anyone who runs into similar issues in the future. My cluster took about 18 hours after resetting noup for all of the OSDs to get to the current epoch. In the end there were 5 that took a few hours longer than the others. Other small issues came up during the process...
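For anyone hitting the same situation, a minimal sketch of checking map-epoch catch-up while noup is set (osd.12 is a placeholder id; the admin socket query must run on the host where that OSD lives):

    # current cluster osdmap epoch
    ceph osd dump | head -1

    # newest/oldest map epochs this OSD has processed; compare
    # "newest_map" to the cluster epoch above
    ceph daemon osd.12 status

    # once the lagging OSDs have caught up, allow them to be marked up again
    ceph osd unset noup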

Re: [ceph-users] Cluster Down from reweight-by-utilization

2017-11-04 Thread Sage Weil
On Sat, 4 Nov 2017, Kevin Hrpcek wrote:
> Hey Sage,
>
> Thanks for getting back to me this late on a weekend.
>
> Do you know why the OSDs were going down? Are there any crash dumps in the
> osd logs, or is the OOM killer getting them?
>
> That's a part I can't nail down yet. OSDs didn't crash, ...
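The usual checks behind Sage's question, as a sketch (the log path and osd id are placeholders assuming a default install):

    # kernel OOM killer evidence on the OSD host
    dmesg -T | grep -iE 'out of memory|killed process'

    # crash backtraces or failed asserts in an OSD log
    grep -iE 'signal|abort|assert' /var/log/ceph/ceph-osd.12.log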

Re: [ceph-users] Cluster Down from reweight-by-utilization

2017-11-04 Thread Kevin Hrpcek
Hey Sage,

Thanks for getting back to me this late on a weekend.

> Do you know why the OSDs were going down? Are there any crash dumps in the
> osd logs, or is the OOM killer getting them?

That's a part I can't nail down yet. OSDs didn't crash; after the reweight-by-utilization, OSDs on some of our e...
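A sketch of how one might confirm that "down" OSDs are alive but being marked down by the cluster rather than actually dying (osd.12 is again a placeholder id):

    # which OSDs the cluster currently considers down
    ceph osd tree | grep -w down

    # is the daemon process actually still running on its host?
    systemctl status ceph-osd@12

    # an OSD marked down by its peers while still alive logs this
    grep 'wrongly marked me down' /var/log/ceph/ceph-osd.12.log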

Re: [ceph-users] Cluster Down from reweight-by-utilization

2017-11-04 Thread Sage Weil
Hi Kevin,

On Sat, 4 Nov 2017, Kevin Hrpcek wrote:
> Hello,
>
> I've run into an issue and would appreciate any assistance anyone can
> provide as I haven't been able to solve this problem yet and am running
> out of ideas. I ran a reweight-by-utilization on my cluster using
> conservative values...

[ceph-users] Cluster Down from reweight-by-utilization

2017-11-04 Thread Kevin Hrpcek
Hello,

I've run into an issue and would appreciate any assistance anyone can provide as I haven't been able to solve this problem yet and am running out of ideas. I ran a reweight-by-utilization on my cluster using conservative values so that it wouldn't cause a large rebalancing. The reweight...
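The thread doesn't show the exact values used; for reference, a conservative reweight-by-utilization run is usually previewed with the dry-run variant first. The numbers below are illustrative (overload threshold percent, max weight change, max OSDs touched):

    # dry run: report what would change without applying anything
    ceph osd test-reweight-by-utilization 120 0.05 10

    # apply with the same limits
    ceph osd reweight-by-utilization 120 0.05 10

    # per-OSD utilization, to sanity-check before and after
    ceph osd df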