Re: [ceph-users] HEALTH_WARN, 3 daemons have recently crashed

2020-01-10 Thread Simon Oosthoek
On 10/01/2020 10:41, Ashley Merrick wrote:
> Once you have fixed the issue you'll need to mark/archive the crash
> entries, as described here: https://docs.ceph.com/docs/master/mgr/crash/

Hi Ashley,

Thanks, I didn't know about this before.

It turned out there were quite a few old crash entries (since I had never
archived any), and of the three most recent ones, two looked like this:

"assert_msg": "/build/ceph-14.2.5/src/common/ceph_time.h: In function
'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)'
thread 7fbda425a700 time 2020-01-02 17:37:56.885082\n
/build/ceph-14.2.5/src/common/ceph_time.h: 485: FAILED
ceph_assert(z >= signedspan::zero())\n",

And the other one was too big to paste here ;-)

I did a `ceph crash archive-all` and now ceph is OK again :-)
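
For the archives, this is roughly the crash-module workflow I ended up using
(the crash ID below is just a placeholder; real IDs look like a timestamp
followed by a UUID):

  ceph crash ls                  # list all crash entries, new and archived
  ceph crash info <crash-id>     # print the full report, incl. the assert_msg above
  ceph crash archive <crash-id>  # acknowledge a single entry
  ceph crash archive-all         # acknowledge everything in one go

Once all recent entries are archived, the "daemons have recently crashed"
warning goes away.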

Cheers

/Simon

> 
> 
>  On Fri, 10 Jan 2020 17:37:47 +0800 Simon Oosthoek wrote:
> 
> Hi,
> 
> Last week I upgraded our Ceph cluster to 14.2.5 (from 14.2.4), and either
> during the procedure or shortly afterwards some OSDs crashed. I
> re-initialised them and thought that would be enough to fix everything.
> 
> Looking a bit further, I do see a lot of lines like this (which are
> worrying, I suppose):
> 
> ceph.log:2020-01-10 10:06:41.049879 mon.cephmon3 (mon.0) 234423 :
> cluster [DBG] osd.97 reported immediately failed by osd.67
> 
> The OSDs that show up in these failure reports are:
> 
> osd.109
> osd.133
> osd.139
> osd.111
> osd.38
> osd.65
> osd.38
> osd.65
> osd.97
> 
> Now everything seems to be OK, but the WARN status remains. Is this a
> "feature" of 14.2.5 or am I missing something?
> 
> Below is the output of `ceph -s`:
> 
> Cheers
> 
> /Simon
> 
> 10:13 [root@cephmon1 ~]# ceph -s
>   cluster:
>     id:     b489547c-ba50-4745-a914-23eb78e0e5dc
>     health: HEALTH_WARN
>             3 daemons have recently crashed
> 
>   services:
>     mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 27h)
>     mgr: cephmon3(active, since 27h), standbys: cephmon1, cephmon2
>     mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
>     osd: 168 osds: 168 up (since 6m), 168 in (since 3d); 11 remapped pgs
> 
>   data:
>     pools:   10 pools, 5216 pgs
>     objects: 167.61M objects, 134 TiB
>     usage:   245 TiB used, 1.5 PiB / 1.8 PiB avail
>     pgs:     1018213/1354096231 objects misplaced (0.075%)
>              5203 active+clean
>              10   active+remapped+backfill_wait
>              2    active+clean+scrubbing+deep
>              1    active+remapped+backfilling
> 
>   io:
>     client:   149 MiB/s wr, 0 op/s rd, 55 op/s wr
>     recovery: 0 B/s, 30 objects/s
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HEALTH_WARN, 3 daemons have recently crashed

2020-01-10 Thread Ashley Merrick
Once you have fixed the issue you'll need to mark/archive the crash entries, as
described here: https://docs.ceph.com/docs/master/mgr/crash/
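
Something along these lines should clear the warning (typed from memory, and the
crash ID is just an example placeholder):

  ceph crash ls-new        # list only the crashes that have not been archived yet
  ceph crash archive <id>  # archive one specific entry
  ceph crash archive-all   # or simply archive all of them in one go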



 On Fri, 10 Jan 2020 17:37:47 +0800 Simon Oosthoek wrote:


Hi, 
 
Last week I upgraded our Ceph cluster to 14.2.5 (from 14.2.4), and either
during the procedure or shortly afterwards some OSDs crashed. I
re-initialised them and thought that would be enough to fix everything.
 
Looking a bit further, I do see a lot of lines like this (which are
worrying, I suppose):

ceph.log:2020-01-10 10:06:41.049879 mon.cephmon3 (mon.0) 234423 :
cluster [DBG] osd.97 reported immediately failed by osd.67

The OSDs that show up in these failure reports are:

osd.109 
osd.133 
osd.139 
osd.111 
osd.38 
osd.65 
osd.38 
osd.65 
osd.97 
 
Now everything seems to be OK, but the WARN status remains. Is this a 
"feature" of 14.2.5 or am I missing something? 
 
Below is the output of `ceph -s`:
 
Cheers 
 
/Simon 
 
10:13 [root@cephmon1 ~]# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            3 daemons have recently crashed

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 27h)
    mgr: cephmon3(active, since 27h), standbys: cephmon1, cephmon2
    mds: cephfs:1 {0=cephmds1=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 6m), 168 in (since 3d); 11 remapped pgs

  data:
    pools:   10 pools, 5216 pgs
    objects: 167.61M objects, 134 TiB
    usage:   245 TiB used, 1.5 PiB / 1.8 PiB avail
    pgs:     1018213/1354096231 objects misplaced (0.075%)
             5203 active+clean
             10   active+remapped+backfill_wait
             2    active+clean+scrubbing+deep
             1    active+remapped+backfilling

  io:
    client:   149 MiB/s wr, 0 op/s rd, 55 op/s wr
    recovery: 0 B/s, 30 objects/s
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com