Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-20 Thread Gregory Farnum
… However right now the cluster itself is pretty much toast due to the number > of OSDs now with this assert. > > > ,Ashley > > > From: Gregory Farnum <gfar...@redhat.com> > Sent: 19 November 2017 09:25:39 > To: Ashley Merrick > Cc: David …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-20 Thread Ashley Merrick
…Ashley From: Ashley Merrick Sent: 20 November 2017 08:56:15 To: Gregory Farnum Cc: David Turner; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous Hello, So I tried as suggested marking one OSD that continuously failed as lost and adding a new OSD to take its place …
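
For anyone following along, the "mark as lost and replace" step described above generally maps onto commands like these (a rough sketch only; osd.12 is a placeholder id, and the removal/re-provisioning details depend on how the OSDs were deployed):

    # Only once the OSD's data is accepted as unrecoverable:
    ceph osd lost 12 --yes-i-really-mean-it
    # Remove the dead OSD from the CRUSH map, auth database and OSD map...
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # ...then provision a replacement OSD (ceph-disk/ceph-volume on Luminous).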

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-20 Thread Ashley Merrick
…19 November 2017 09:25:39 To: Ashley Merrick Cc: David Turner; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous I only see two asserts (in my local checkout) in that function; one is the metadata assert(info.history.same_interval_since != 0); and the other is a sanity …
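
Which of the two asserts an OSD actually died on is visible in its log; something like the following should pull out the failed assert plus its backtrace (the log path and OSD id are placeholders):

    # Show the failed assert and the surrounding backtrace from a crashed OSD's log
    grep -B 5 -A 30 'FAILED assert' /var/log/ceph/ceph-osd.12.log
    # Narrow it down to this particular assert
    grep -A 30 'same_interval_since' /var/log/ceph/ceph-osd.12.log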

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-19 Thread Gregory Farnum
…AM Ashley Merrick <ash...@amerrick.co.uk> > wrote: > > Hello, > > > > Any further suggestions or workarounds from anyone? > > > > Cluster is hard down now with around 2% of PGs offline; on occasion I am able > to get an OSD to start for a bit, but then it will …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…gmail.com>; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous The osd shouldn't be able to peer while it's down. I think this is good information to update your ticket with, as it is possibly a different code path than anticipated. Did your cluster see the osd as up? …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…>; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous The osd shouldn't be able to peer while it's down. I think this is good information to update your ticket with, as it is possibly a different code path than anticipated. Did your cluster see the osd as up? On Sat, …
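
Whether the monitors actually saw the OSD as up, and whether noup was in effect at the time, can be checked with something like this (osd.12 is a placeholder):

    ceph osd tree | grep 'osd\.12'   # up/down as the monitors see it
    ceph osd dump | grep flags       # confirms whether noup/nodown are currently set
    ceph daemon osd.12 status        # run on the OSD host: the daemon's own view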

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread David Turner
…Sat, Nov 18, 2017, 7:08 AM Ashley Merrick <ash...@amerrick.co.uk> > wrote: > > Hello, > > > > Any further suggestions or workarounds from anyone? > > > > Cluster is hard down now with around 2% of PGs offline; on occasion I am able > to get an OSD to start …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…From: Sean Redmond [mailto:sean.redmo...@gmail.com] Sent: 18 November 2017 22:40 To: Ashley Merrick <ash...@amerrick.co.uk> Cc: David Turner <drakonst...@gmail.com>; ceph-users <ceph-us...@ceph.com> Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous Hi, Is it possible to add new …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Sean Redmond
…the executable, or `objdump -rdS <executable>` is needed > to interpret this. > > > > I guess even with noup the OSD/PG still has to peer with the other PGs, > which is the stage that causes the failure; most OSDs seem to stay up for > about 30 seconds, and every time it's a different PG listed on the failure. …
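
Turning that backtrace into symbols follows the hint in the crash message; a rough sketch (the debug package name, binary path and example address are assumptions that vary by distro and build):

    # Install debug symbols first (Debian/Ubuntu-style package name shown)
    apt-get install ceph-osd-dbg
    # Disassemble the OSD binary so frame addresses in the log can be matched up
    objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump
    # Or resolve a single frame address from the backtrace
    addr2line -e /usr/bin/ceph-osd -f -C 0x7f2a4c3b1234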

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…a different PG listed on the failure. ,Ashley From: David Turner [mailto:drakonst...@gmail.com] Sent: 18 November 2017 22:19 To: Ashley Merrick <ash...@amerrick.co.uk> Cc: Eric Nelson <ericnel...@gmail.com>; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…<ash...@amerrick.co.uk> Cc: Eric Nelson <ericnel...@gmail.com>; ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous Does letting the cluster run with noup for a while until all down disks are idle, and then letting them come in, help at all? I don't know …
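
The noup experiment being suggested there is roughly the following (a minimal sketch; the wait is however long it takes the restarted OSDs to settle):

    ceph osd set noup      # monitors stop marking booting OSDs up
    # ...restart the failed OSDs and wait until their disks/CPU go idle...
    ceph osd unset noup    # then let them be marked up and begin peering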

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread David Turner
…16 November 2017 17:27 > *To:* Eric Nelson <ericnel...@gmail.com> > > *Cc:* ceph-us...@ceph.com > *Subject:* Re: [ceph-users] OSD Random Failures - Latest Luminous > > > > Hello, > > > > Good to hear it's not just me; however I have a cluster basically offline …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-18 Thread Ashley Merrick
…55700 thread_name:tp_peering" ,Ashley From: Ashley Merrick Sent: 16 November 2017 17:27 To: Eric Nelson <ericnel...@gmail.com> Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous Hello, Good to hear it's not just me; however I have a cluster basically offline …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-16 Thread Ashley Merrick
…<https://aka.ms/ghei36> From: Ashley Merrick Sent: Thursday, November 16, 2017 9:31:22 PM To: Ashley Merrick; Eric Nelson Cc: ceph-us...@ceph.com Subject: RE: [ceph-users] OSD Random Failures - Latest Luminous Have created a ticket http://tracker.ceph.com/issues …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-16 Thread Ashley Merrick
…Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous Hello, Good to hear it's not just me; however I have a cluster basically offline due to too many OSDs dropping for this issue. Anybody have any suggestions? ,Ashley From: Eric Nelson <ericnel...@gmail.com> …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-16 Thread Ashley Merrick
…Merrick Cc: ceph-us...@ceph.com Subject: Re: [ceph-users] OSD Random Failures - Latest Luminous I've been seeing these as well on our SSD cache tier that's been ravaged by disk failures as of late. Same tp_peering assert as above, even running the luminous branch from git. Let me know if you have a bug filed I can +1, or have found a workaround. …

Re: [ceph-users] OSD Random Failures - Latest Luminous

2017-11-15 Thread Eric Nelson
I've been seeing these as well on our SSD cache tier that's been ravaged by disk failures as of late. Same tp_peering assert as above, even running the luminous branch from git. Let me know if you have a bug filed I can +1, or have found a workaround. E On Wed, Nov 15, 2017 at 10:25 AM, Ashley …