Re: [ceph-users] PG stuck peering after host reboot

2017-02-24 Thread Wido den Hollander
stale state. You should then be able to force re-create it. This worked for me with a replicated pool, never tried this with EC. Afterwards you can re-create these OSDs again. Wido > > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf o

Re: [ceph-users] PG stuck peering after host reboot

2017-02-23 Thread george.vasilakakos
m: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk] Sent: 22 February 2017 14:35 To: w...@42on.com; ceph-users@lists.ceph.com Subject: Re: [ceph-users] PG stuck peering after host reboot So what I see there is this for osd.307:

Re: [ceph-users] PG stuck peering after host reboot

2017-02-22 Thread george.vasilakakos
PGs are reporting being undersized and having ITEM_NONE in their acting sets as well. > > From: Wido den Hollander [w...@42on.com] > Sent: 22 February 2017 12:18 > To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com > Subject: RE: [ceph

Re: [ceph-users] PG stuck peering after host reboot

2017-02-22 Thread Wido den Hollander
_ > From: Wido den Hollander [w...@42on.com] > Sent: 22 February 2017 12:18 > To: Vasilakakos, George (STFC,RAL,SC); ceph-users@lists.ceph.com > Subject: RE: [ceph-users] PG stuck peering after host reboot > > > On 21 February 2017 at 15:35, george.vasilaka..

Re: [ceph-users] PG stuck peering after host reboot

2017-02-22 Thread george.vasilakakos
Subject: RE: [ceph-users] PG stuck peering after host reboot > On 21 February 2017 at 15:35, george.vasilaka...@stfc.ac.uk wrote: > > > I have noticed something odd with the ceph-objectstore-tool command: > > It always reports PG X not found even on healthy OSDs/PGs. The 'list' o

Re: [ceph-users] PG stuck peering after host reboot

2017-02-22 Thread Wido den Hollander
ers@lists.ceph.com; bhubb...@redhat.com > Subject: Re: [ceph-users] PG stuck peering after host reboot > > > Can you for the sake of redundancy post your sequence of commands you > > executed and their output? > > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307 > [root@cep

Re: [ceph-users] PG stuck peering after host reboot

2017-02-21 Thread george.vasilakakos
of george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk] Sent: 21 February 2017 10:17 To: w...@42on.com; ceph-users@lists.ceph.com; bhubb...@redhat.com Subject: Re: [ceph-users] PG stuck peering after host reboot > Can you for the sake of redundancy post your sequence of commands you > ex

Re: [ceph-users] PG stuck peering after host reboot

2017-02-21 Thread george.vasilakakos
> Can you for the sake of redundancy post your sequence of commands you > executed and their output? [root@ceph-sn852 ~]# systemctl stop ceph-osd@307 [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323 PG '1.323' not found [root@ceph-sn852

Re: [ceph-users] PG stuck peering after host reboot

2017-02-21 Thread Wido den Hollander
> On 20 February 2017 at 17:52, george.vasilaka...@stfc.ac.uk wrote: > > > Hi Wido, > > Just to make sure I have everything straight, > > > If the PG still doesn't recover do the same on osd.307 as I think that > > 'ceph pg X query' still hangs? > > > The info from ceph-objectstore-tool

Re: [ceph-users] PG stuck peering after host reboot

2017-02-20 Thread george.vasilakakos
Hi Wido, Just to make sure I have everything straight, > If the PG still doesn't recover do the same on osd.307 as I think that 'ceph > pg X query' still hangs? > The info from ceph-objectstore-tool might shed some more light on this PG. You mean run the objectstore command on 307, not remove

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread george.vasilakakos
>3. ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-595 --op info >> >--pgid 1.323 >> > >> >What does osd.595 think about that PG? >> > >> >You could even try 'rm-past-intervals' with the object-store tool, but that >> >might be a bit

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread Wido den Hollander
>Wido > > > >> > >> Best regards, > >> > >> George > >> > >> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > >> george.vasilaka...@stfc.ac.uk [george.vasilaka...@stf

Re: [ceph-users] PG stuck peering after host reboot

2017-02-17 Thread george.vasilakakos
tervals' with the object-store tool, but that >might be a bit dangerous. Wouldn't do that immediately. > >Wido > >> >> Best regards, >> >> George >> >> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of >> george.va

Re: [ceph-users] PG stuck peering after host reboot

2017-02-16 Thread Wido den Hollander
___ > From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of > george.vasilaka...@stfc.ac.uk [george.vasilaka...@stfc.ac.uk] > Sent: 14 February 2017 10:27 > To: bhubb...@redhat.com; ceph-users@lists.ceph.com > Subject: Re: [ceph-users] PG stuck peering after

Re: [ceph-users] PG stuck peering after host reboot

2017-02-16 Thread george.vasilakakos
; ceph-users@lists.ceph.com Subject: Re: [ceph-users] PG stuck peering after host reboot Hi Brad, I'll be doing so later in the day. Thanks, George From: Brad Hubbard [bhubb...@redhat.com] Sent: 13 February 2017 22:03 To: Vasilakakos, George (STFC,RAL,SC

Re: [ceph-users] PG stuck peering after host reboot

2017-02-14 Thread george.vasilakakos
Hi Brad, I'll be doing so later in the day. Thanks, George From: Brad Hubbard [bhubb...@redhat.com] Sent: 13 February 2017 22:03 To: Vasilakakos, George (STFC,RAL,SC); Ceph Users Subject: Re: [ceph-users] PG stuck peering after host reboot I'd suggest

Re: [ceph-users] PG stuck peering after host reboot

2017-02-09 Thread george.vasilakakos
[george.vasilaka...@stfc.ac.uk] Sent: 08 February 2017 18:32 To: gfar...@redhat.com Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] PG stuck peering after host reboot Hey Greg, Thanks for your quick responses. I have to leave the office now but I'll look deeper into it tomorrow to try and understand

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread george.vasilakakos
(STFC,RAL,SC) Cc: Ceph Users Subject: Re: [ceph-users] PG stuck peering after host reboot On Wed, Feb 8, 2017 at 10:25 AM, <george.vasilaka...@stfc.ac.uk> wrote: > Hi Greg, > >> Yes, "bad crc" indicates that the checksums on an incoming message did >> not match w

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread Gregory Farnum
On Wed, Feb 8, 2017 at 10:25 AM, <george.vasilaka...@stfc.ac.uk> wrote: > Hi Greg, > >> Yes, "bad crc" indicates that the checksums on an incoming message did >> not match what was provided — ie, the message got corrupted. You >> shouldn't try and fix that by playing around with the peering

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread george.vasilakakos
Hi Greg, > Yes, "bad crc" indicates that the checksums on an incoming message did > not match what was provided — ie, the message got corrupted. You > shouldn't try and fix that by playing around with the peering settings > as it's not a peering bug. > Unless there's a bug in the messaging layer

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread Gregory Farnum
On Wed, Feb 8, 2017 at 8:17 AM, wrote: > Hi Ceph folks, > > I have a cluster running Jewel 10.2.5 using a mix of EC and replicated pools. > > After rebooting a host last night, one PG refuses to complete peering > > pg 1.323 is stuck inactive for 73352.498493, current

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread george.vasilakakos
: [ceph-users] PG stuck peering after host reboot Hello, I have run into this case before; I applied the parameter (osd_find_best_info_ignore_history_les) to all the OSDs that reported blocked queries. -- Regards, CEO FEELB | Corentin BONNETON cont...@feelb.io Le

Re: [ceph-users] PG stuck peering after host reboot

2017-02-08 Thread Corentin Bonneton
Hello, I have run into this case before; I applied the parameter (osd_find_best_info_ignore_history_les) to all the OSDs that reported blocked queries. -- Regards, CEO FEELB | Corentin BONNETON cont...@feelb.io > On 8 Feb 2017 at 17:17, george.vasilaka...@stfc.ac.uk wrote: > > Hi Ceph

[ceph-users] PG stuck peering after host reboot

2017-02-08 Thread george.vasilakakos
Hi Ceph folks, I have a cluster running Jewel 10.2.5 using a mix of EC and replicated pools. After rebooting a host last night, one PG refuses to complete peering: pg 1.323 is stuck inactive for 73352.498493, current state peering, last acting [595,1391,240,127,937,362,267,320,7,634,716]
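
For anyone triaging many such lines, the "stuck" health line quoted in the original post can be pulled apart with a short script. This is only a rough sketch, not part of Ceph: the names parse_stuck_line, StuckPG and missing_shards are made up for illustration, and the regular expression assumes the line has the same shape as the one reported in this thread. The ITEM_NONE constant is CRUSH_ITEM_NONE (0x7fffffff), the placeholder that appears in an acting set when a slot has no OSD mapped, as mentioned elsewhere in the thread.

```python
import re
from typing import List, NamedTuple, Optional

# CRUSH_ITEM_NONE: placeholder shown in an acting set for an unmapped slot.
ITEM_NONE = 0x7FFFFFFF  # 2147483647


class StuckPG(NamedTuple):
    """Hypothetical container for one parsed 'stuck' health line."""
    pgid: str
    stuck_as: str
    seconds: float
    state: str
    acting: List[int]


# Assumed line shape, taken from the message quoted in this thread.
_LINE = re.compile(
    r"pg (?P<pgid>\d+\.[0-9a-f]+) is stuck (?P<stuck>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)


def parse_stuck_line(line: str) -> Optional[StuckPG]:
    """Return the parsed fields, or None if the line does not match."""
    m = _LINE.search(line)
    if m is None:
        return None
    acting = [int(x) for x in m.group("acting").split(",") if x]
    return StuckPG(
        pgid=m.group("pgid"),
        stuck_as=m.group("stuck"),
        seconds=float(m.group("secs")),
        state=m.group("state"),
        acting=acting,
    )


def missing_shards(acting: List[int]) -> List[int]:
    """Positions in the acting set that hold ITEM_NONE instead of an OSD id."""
    return [i for i, osd in enumerate(acting) if osd == ITEM_NONE]


if __name__ == "__main__":
    line = (
        "pg 1.323 is stuck inactive for 73352.498493, current state peering, "
        "last acting [595,1391,240,127,937,362,267,320,7,634,716]"
    )
    pg = parse_stuck_line(line)
    print(pg.pgid, pg.state, len(pg.acting), missing_shards(pg.acting))
```

The position of an entry in the acting set matters for EC pools, since each slot corresponds to one shard, which is why missing_shards returns indexes and not OSD ids.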