Re: [ceph-users] Multiple OSD crashing a lot

2016-08-13 Thread Hein-Pieter van Braam
> e);
> assert(obc);
> --ctx->delta_stats.num_objects;
> --ctx->delta_stats.num_objects_hit_set_archive;
> if (obc) {
>   ctx->delta_stats.num_bytes -= obc->obs.oi.size;
>   ctx->delta_stats.num_bytes_hit_set_archive -= obc->obs.oi.size;
> }

Re: [ceph-users] Multiple OSD crashing a lot

2016-08-13 Thread Hein-Pieter van Braam
Hi Blade, I appear to be stuck in the same situation you were in. Do you still happen to have a patch implementing the workaround you described? Thanks, - HP

Re: [ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
> es/9732
>
> Cheers
> Goncalo
>
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Goncalo Borges [goncalo.bor...@sydney.edu.au]
> Sent: 13 August 2016 22:23
> To: Hein-Pieter van Braam; ceph-users
> Subject: Re: [ceph-users] Cascading failure on a placement

Re: [ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Hein-Pieter van Braam [h...@tmm.cx]
> Sent: 13 August 2016 21:48
> To: ceph-users
> Subject: [ceph-users] Cascading failure on a placement group
>
> Hello all,
>
> M

[ceph-users] Cascading failure on a placement group

2016-08-13 Thread Hein-Pieter van Braam
Hello all,

My cluster started to lose OSDs without any warning; whenever an OSD becomes the primary for a particular PG it crashes with the following stack trace:

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: /usr/bin/ceph-osd() [0xada722]
 2: (()+0xf100) [0x7fc28bca5100]
 3:
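[Editor's note: a minimal sketch of how one might stabilise a cluster while debugging a crash loop like this; the OSD id (32) and debug levels are placeholder assumptions, not taken from the thread.]

    # Keep CRUSH from rebalancing data away from the crashing OSDs while debugging
    ceph osd set noout

    # Raise logging for an affected daemon; since the process keeps crashing,
    # setting it in ceph.conf and restarting is more reliable than injectargs.
    # /etc/ceph/ceph.conf
    [osd.32]
        debug osd = 20
        debug ms = 1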

Re: [ceph-users] PG stuck remapped+incomplete

2016-07-18 Thread Hein-Pieter van Braam
I've attached the latest version of the pg query for this pg. Thanks a lot. - HP

On Sat, 2016-07-16 at 19:56 +0200, Hein-Pieter van Braam wrote:
> Hi all,
>
> I had a crash of some OSDs today; every primary OSD of a particular PG just started to crash. I have recorded the informatio

[ceph-users] PG stuck remapped+incomplete

2016-07-16 Thread Hein-Pieter van Braam
Hi all, I had a crash of some OSDs today; every primary OSD of a particular PG just started to crash. I have recorded the information for a bug report. I had reweighted the affected OSDs to 0 and put the processes in a restart loop, and eventually all but one placement group ended up recovering. I
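[Editor's note: a rough sketch of the drain-and-restart approach described above; the OSD ids and service names are placeholders, and sysvinit clusters would use /etc/init.d/ceph instead of systemctl.]

    # Stop placing new data on the affected OSDs
    ceph osd reweight 12 0
    ceph osd reweight 17 0

    # Keep restarting the crashing daemons so recovery can make progress between crashes
    while true; do
        systemctl restart ceph-osd@12 ceph-osd@17
        sleep 60
    done

    # Watch which placement groups are still stuck
    ceph pg dump_stuck inactive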

Re: [ceph-users] Cache pool with replicated pool don't work properly.

2016-06-13 Thread Hein-Pieter van Braam
Hi, I don't really have a solution, but I can confirm I had the same problem trying to deploy my new Jewel cluster. I reinstalled the cluster with Hammer and everything is working as I expect it to (that is, writes hit the backing pool asynchronously). Although, unlike you, I noticed the same pro
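[Editor's note: for context, a writeback cache tier of the kind discussed here is typically attached roughly as follows; the pool names and size limit are example values, not taken from the thread.]

    ceph osd tier add rbd cache                 # 'rbd' is the backing pool, 'cache' the cache pool
    ceph osd tier cache-mode cache writeback
    ceph osd tier set-overlay rbd cache
    ceph osd pool set cache hit_set_type bloom
    ceph osd pool set cache target_max_bytes 100000000000   # flush/evict threshold, example value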

Re: [ceph-users] PG stuck incomplete after power failure.

2016-05-17 Thread Hein-Pieter van Braam
> primary osd for that pg with osd_find_best_info_ignore_history_les set to true (don't leave it set long term).
> -Sam
>
> On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam wrote:
> >
> > Hello,
> >
> > Today we had a power failure in a ra
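[Editor's note: a sketch of how Sam's suggestion might be applied; osd.32 is assumed here to be the acting primary of the affected PG, so adjust to your own acting set.]

    # /etc/ceph/ceph.conf on the node hosting the acting primary
    [osd.32]
        osd find best info ignore history les = true

    # Restart that OSD so the option takes effect during peering, then
    # remove the setting and restart again once the PG is active+clean
    systemctl restart ceph-osd@32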

[ceph-users] PG stuck incomplete after power failure.

2016-05-17 Thread Hein-Pieter van Braam
ave it like this for the time being. Help would be very much appreciated! Thank you, - Hein-Pieter van Braam

# ceph pg 54.3e9 query
{
    "state": "incomplete",
    "snap_trimq": "[]",
    "epoch": 90440,
    "up": [
        32,
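[Editor's note: as a follow-up, a few commands commonly used to capture the state of a PG like this for a bug report; the pg id matches the query above and the output file name is arbitrary.]

    ceph health detail | grep incomplete
    ceph pg dump_stuck inactive
    ceph pg 54.3e9 query > pg-54.3e9-query.json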