On Sat, May 25, 2019 at 7:45 PM Paul Emmerich wrote:

On Fri, May 24, 2019 at 5:22 PM Kevin Flöh wrote:
> ok this just gives me:
>
> error getting xattr ec31/10004dfce92./parent: (2) No such file or
> directory
>
Try to run it on the replicated main data pool which contains an empty
object for each file, not sure where the xattr is stored
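For anyone trying this later, the lookup Paul describes would go roughly
like this; the pool name cephfs_data and the .00000000 first-stripe suffix
are assumptions on my side, not taken from the thread:

    # fetch the backtrace xattr from the first stripe object in the
    # replicated CephFS data pool (pool name is only an example)
    rados -p cephfs_data getxattr 10004dfce92.00000000 parent > parent.bin

    # decode the backtrace to see which file/path the object belongs to
    ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json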
I'd say that if you can't find that object in Rados, then your assumption
may be good. I haven't run into this problem before. Try doing a Rados get
for that object and see if you get anything. I've done a Rados list
grepping for the hex inode, but it took almost two days on our cluster that
had
ok this just gives me:
error getting xattr ec31/10004dfce92./parent: (2) No such file
or directory
Does this mean that the lost object isn't even a file that appears in
the ceph directory? Maybe a leftover of a file that has not been deleted
properly? It wouldn't be an issue to mark
You need to use the first stripe of the object as that is the only one with
the metadata.
Try "rados -p ec31 getxattr 10004dfce92. parent" instead.
Robert LeBlanc
Sent from a mobile device, please excuse any typos.
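As a rough sketch of the listing/getting Robert mentions (pool and object
names come from the thread, but the .00000000 first-stripe suffix is an
assumption):

    # listing the whole pool and grepping for the hex inode works, but can
    # take a very long time on a large pool
    rados -p ec31 ls | grep '^10004dfce92\.'

    # quicker: stat/get the first stripe object directly
    rados -p ec31 stat 10004dfce92.00000000
    rados -p ec31 get 10004dfce92.00000000 /tmp/10004dfce92.0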
On Fri, May 24, 2019, 4:42 AM Kevin Flöh wrote:
Hi,
we already tried "rados -p ec31 getxattr 10004dfce92.003d parent"
but this is just hanging forever if we are looking for unfound objects.
It works fine for all other objects.
We also tried scanning the ceph directory with find -inum 1099593404050
(decimal of 10004dfce92) and found
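For reference, the inode conversion and the filesystem-side search look
roughly like this (the mount point is a placeholder):

    # hex object prefix -> decimal inode number
    printf '%d\n' 0x10004dfce92      # 1099593404050

    # decimal inode number -> hex object prefix
    printf '%x\n' 1099593404050      # 10004dfce92

    # search the mounted CephFS for the file owning that inode
    find /mnt/cephfs -inum 1099593404050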
Hi,
On 5/24/19 9:48 AM, Kevin Flöh wrote:
We got the object ids of the missing objects with `ceph pg 1.24c list_missing`:

{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 1,
The PGs will stay active+recovery_wait+degraded until you solve the unfound
objects issue.
You can follow this doc to look at which objects are unfound[1] and if no
other recourse mark them lost
[1] http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects
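If it really comes to that, the commands from the linked doc are roughly
as below (the PG id is the one from earlier in the thread; as far as I
know revert is not supported on EC pools, so delete may be the only
choice there):

    # show the unfound objects of the affected PG
    ceph pg 1.24c list_missing

    # last resort: roll the objects back to a previous version, or forget
    # them entirely
    ceph pg 1.24c mark_unfound_lost revert
    ceph pg 1.24c mark_unfound_lost delete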
thank you for this idea, it has improved the situation. Nevertheless,
there are still 2 PGs in recovery_wait. ceph -s gives me:
  cluster:
    id:     23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_WARN
            3/125481112 objects unfound (0.000%)
            Degraded data
I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.
Open a window with `watch ceph -s`, then in another window slowly do
ceph osd down 1
# then wait a minute or so for that osd.1 to re-peer fully.
ceph osd down 11
...
Continue that for each
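A small sketch of that loop; the OSD ids and the sleep are just examples:

    # mark each affected OSD down one at a time; it comes back by itself
    # and re-peers its PGs. Keep `watch ceph -s` open in another window.
    for osd in 1 11 21 32; do
        ceph osd down $osd
        sleep 60    # give the OSD time to re-peer before the next one
    done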
This is the current status of ceph:

  cluster:
    id:     23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
            9/125481144 objects unfound (0.000%)
            Degraded data redundancy: 9/497011417 objects degraded
            (0.000%), 7 pgs degraded
            9 stuck requests are
Subject: Re: [ceph-users] Major ceph disaster
What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.
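One way to check that, as a sketch:

    # full cluster state
    ceph -s

    # which PGs are recovering/backfilling right now, and on which OSDs
    ceph pg dump pgs_brief | grep -E 'recovering|backfilling'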
On Thu, May 23, 2019 at 10:53 AM Kevin Flöh wrote:
Hi,
we have set the PGs to recover and now they are stuck in
active+recovery_wait+degraded and instructing them to deep-scrub does
not change anything. Hence, the rados report is empty. Is there a way to
stop the recovery wait to start the deep-scrub and get the output? I
guess the
On Wed, May 22, 2019 at 4:31 AM Kevin Flöh wrote:
> Hi,
>
> thank you, it worked. The PGs are not incomplete anymore. Still we have
> another problem, there are 7 PGs inconsistent and a ceph pg repair is
> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
> repair" and nothing
It's been suggested here in the past to disable deep scrubbing temporarily
before running the repair because it does not execute immediately but gets
queued up behind deep scrubs.
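Roughly, that would be:

    # keep new (deep-)scrubs from being scheduled
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # issue the repair for one of the inconsistent PGs and watch ceph -s
    ceph pg repair 1.5dd

    # re-enable scrubbing once the repair has gone through
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub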
Hi,
thank you, it worked. The PGs are not incomplete anymore. Still we have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the PGs
to
On 5/21/19 4:48 PM, Kevin Flöh wrote:
> Hi,
>
> we gave up on the incomplete pgs since we do not have enough complete
> shards to restore them. What is the procedure to get rid of these pgs?
>
You need to start with marking the OSDs as 'lost' and then you can
force_create_pg to get the PGs
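The commands involved would be something like this; the OSD ids are the
ones mentioned later in the thread, the PG id is a placeholder, and the
exact force-create-pg spelling depends on the release:

    # declare the dead OSDs lost
    ceph osd lost 4 --yes-i-really-mean-it
    ceph osd lost 23 --yes-i-really-mean-it

    # then recreate each incomplete PG as an empty PG
    ceph osd force-create-pg <pgid> --yes-i-really-mean-it
    # (older releases: ceph pg force_create_pg <pgid>)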
Hi,
we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?
regards,
Kevin
On 20.05.19 9:22 AM, Kevin Flöh wrote:
Hi Frederic,
we do not have access to the original OSDs. We exported the remaining
shards of the two pgs but we are only left with two shards (of
reasonable size) per pg. The rest of the shards displayed by ceph pg
query are empty. I guess marking the OSD as complete doesn't make sense
then.
On 14/05/2019 at 10:04, Kevin Flöh wrote:
On 13.05.19 11:21 PM, Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those
incomplete PGs?
It would be useful to double confirm that: check with `ceph pg
query` and `ceph pg dump`.
(If so, this is why the ignore
We tried to export the shards from the OSDs but there are only two
shards left for each of the pgs, so we decided to give up these pgs.
Will the files of these pgs be deleted from the mds or do we have to
delete them manually? Is this the correct command to mark the pgs as lost:
ceph pg
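For the export step mentioned above, the usual tool is ceph-objectstore-tool
run against a stopped OSD; a rough sketch, with paths, OSD ids and the EC
shard id all being examples:

    # with the OSD daemon stopped, export one PG shard to a file
    systemctl stop ceph-osd@12
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.24cs1 --op export --file /root/1.24cs1.export

    # an exported shard can later be imported into another (stopped) OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --op import --file /root/1.24cs1.export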
ceph osd pool get ec31 min_size
min_size: 3
On 15.05.19 9:09 AM, Konstantin Shalygin wrote:
ceph osd pool get ec31 min_size
On 5/15/19 1:49 PM, Kevin Flöh wrote:
since we have 3+1 ec I didn't try before. But when I run the command
you suggested I get the following error:
ceph osd pool set ec31 min_size 2
Error EINVAL: pool min_size must be between 3 and 4
What is your current min size? `ceph osd pool get ec31 min_size`
The hdds of OSDs 4 and 23 are completely lost, we cannot access them in
any way. Is it possible to use the shards which are maybe stored on
working OSDs as shown in the all_participants list?
On 14.05.19 5:24 PM, Dan van der Ster wrote:
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote:
Hi,
since we have 3+1 ec I didn't try before. But when I run the command you
suggested I get the following error:
ceph osd pool set ec31 min_size 2
Error EINVAL: pool min_size must be between 3 and 4
On 14.05.19 6:18 PM, Konstantin Shalygin wrote:
peering does not seem to be blocked anymore. But still there is no
recovery going on. Is there anything else we can try?
Try to reduce min_size for problem pool as 'health detail' suggested:
`ceph osd pool set ec31 min_size 2`.
k
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote:
ok, so now we see at least a difference in the recovery state:

    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2019-05-14 14:15:15.650517",
            "comment": "not enough complete instances of this PG"
        },
        {
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh wrote:
On 14.05.19 10:08 AM, Dan van der Ster wrote:
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote:
On 13.05.19 10:51 PM, Lionel Bouton wrote:
On 13/05/2019 at 16:20, Kevin Flöh wrote:
Dear ceph experts,
[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote:
>
> On 13.05.19 10:51 PM, Lionel Bouton wrote:
> > On 13/05/2019 at 16:20, Kevin Flöh wrote:
> >> Dear ceph experts,
> >>
> >> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> >> Here is what happened: One osd
On 13.05.19 11:21 PM, Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
It would be useful to double confirm that: check with `ceph pg
query` and `ceph pg dump`.
(If so, this is why the ignore_history_les thing isn't helping; you
don't have
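As a sketch, that double-check could look like:

    # list stuck/incomplete PGs together with their up and acting OSD sets
    ceph pg dump_stuck inactive
    ceph pg dump pgs_brief | grep incomplete

    # and inspect the peering details of one of them
    ceph pg <pgid> query | less    # look at "recovery_state" and "peer_info"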
On 13.05.19 10:51 PM, Lionel Bouton wrote:
On 13/05/2019 at 16:20, Kevin Flöh wrote:
Dear ceph experts,
[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost
Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
It would be useful to double confirm that: check with `ceph pg
query` and `ceph pg dump`.
(If so, this is why the ignore_history_les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)
If
On 13/05/2019 at 16:20, Kevin Flöh wrote:
> Dear ceph experts,
>
> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> Here is what happened: One osd daemon could not be started and
> therefore we decided to mark the osd as lost and set it up from
> scratch. Ceph started
Dear ceph experts,
we have several (maybe related) problems with our ceph cluster, let me
first show you the current ceph status:
  cluster:
    id:     23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
            1 MDSs report slow metadata IOs
            1 MDSs report slow