Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo

Hi Dan,

you're right, I was following the mimic instructions (which indeed 
worked on my mimic testbed), but luminous is different and I missed the 
additional step.


Works now, thanks!


    Alessandro


On 13/07/18 17:51, Dan van der Ster wrote:

On Fri, Jul 13, 2018 at 4:07 PM Alessandro De Salvo
 wrote:

However, I cannot reduce the number of mdses anymore. I used to do
that with, e.g.:

ceph fs set cephfs max_mds 1

Trying this with 12.2.6 has apparently no effect, I am left with 2
active mdses. Is this another bug?

Are you following this procedure?
http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks
i.e. you need to deactivate after decreasing max_mds.

(Mimic does this automatically, OTOH).

-- dan
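For reference, a minimal sketch of the luminous procedure from those docs, assuming a filesystem named cephfs with two active ranks (the filesystem name and rank numbers are only illustrative):

ceph fs set cephfs max_mds 1      # lower the target number of active ranks
ceph mds deactivate cephfs:1      # explicitly stop rank 1; it flushes its state and drops to standby
ceph fs status cephfs             # verify that only rank 0 is left active

On mimic the deactivate step happens automatically once max_mds is lowered, which is why the same single command behaves differently there.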




Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo

Thanks all,

100..inode, mds_snaptable and 1..inode were not 
corrupted, so I left them as they were. I have re-injected all the bad 
objects for all mdses (2 per filesystem) and all filesystems I had 
(2), and after setting the mdses as repaired my filesystems are back!


However, I cannot reduce the number of mdses anymore. I used to do 
that with, e.g.:



ceph fs set cephfs max_mds 1


Trying this with 12.2.6 has apparently no effect, I am left with 2 
active mdses. Is this another bug?


Thanks,


    Alessandro



On 13/07/18 15:54, Yan, Zheng wrote:

On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo
 wrote:

Some progress, and more pain...

I was able to recover the 200. using the ceph-objectstore-tool for one 
of the OSDs (all identical copies) but trying to re-inject it just with rados 
put was giving no error while the get was still giving the same I/O error. So 
the solution was to rm the object and then put it again; that worked.

However, after restarting one of the MDSes and setting it to repaired, I've hit 
another, similar problem:


2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : 
error reading table object 'mds0_inotable' -5 ((5) Input/output error)


Can I safely try to do the same as for object 200.? Should I check 
something before trying it? Again, checking the copies of the object, they have 
identical md5sums on all the replicas.


Yes, it should be safe. You also need to do the same for several other
objects. The full object list is:

200.
mds0_inotable
100..inode
mds_snaptable
1..inode

The first three objects are per-mds-rank.  If you have enabled
multi-active mds, you also need to update objects of other ranks. For
mds.1, object names are 201., mds1_inotable and
101..inode.
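A quick way to see which of these objects are actually affected is to test-read each of them with rados before touching anything; a minimal sketch (the pool name is the one used in this thread, and the object list should be extended with the exact rank-0/rank-1 journal-header and *.inode names, which the archive truncates here):

POOL=cephfs_metadata
for obj in mds0_inotable mds_snaptable mds1_inotable; do   # add the remaining per-rank objects
    if rados -p "$POOL" get "$obj" "/tmp/$obj" >/dev/null 2>&1; then
        echo "readable:  $obj"
    else
        echo "I/O error: $obj   (candidate for re-injection)"
    fi
done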




Thanks,


 Alessandro


On 12/07/18 16:46, Alessandro De Salvo wrote:

Unfortunately yes, all the OSDs were restarted a few times, but no change.

Thanks,


 Alessandro


On 12/07/18 15:55, Paul Emmerich wrote:

This might seem like a stupid suggestion, but: have you tried to restart the 
OSDs?

I've also encountered some random CRC errors that only showed up when trying to 
read an object,
but not on scrubbing, that magically disappeared after restarting the OSD.

However, in my case it was clearly related to 
https://tracker.ceph.com/issues/22464 which doesn't
seem to be the issue here.

Paul

2018-07-12 13:53 GMT+02:00 Alessandro De Salvo:


On 12/07/18 11:20, Alessandro De Salvo wrote:



On 12/07/18 10:58, Dan van der Ster wrote:

On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  wrote:

On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
 wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may
help.

No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
no disk problems anywhere. No relevant errors in syslogs, the hosts are
just fine. I cannot exclude an error on the RAID controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one on a different one, so I
would tend to exclude they both had (silent) errors at the same time.


That's fairly distressing. At this point I'd probably try extracting the object 
using ceph-objectstore-tool and seeing if it decodes properly as an mds 
journal. If it does, you might risk just putting it back in place to overwrite 
the crc.


Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do, a deep scrub, but I'm not 
sure it could repair in case all replicas are bad, as it seems to be the case.


I finally managed (with the help of Dan), to perform the deep-scrub on pg 
10.14, but the deep scrub did not detect anything wrong. Also trying to repair 
10.14 has no effect.
Still, trying to access the object I get in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR] : 
10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head

Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds like a bug.

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo

Some progress, and more pain...

I was able to recover the 200. using the ceph-objectstore-tool 
for one of the OSDs (all identical copies) but trying to re-inject it 
just with rados put was giving no error while the get was still giving 
the same I/O error. So the solution was to rm the object and then put it 
again; that worked.
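Written out, the sequence that worked here was roughly the following (OSD id, PG id and paths are the ones from this thread; the journal-header object name is abbreviated as in the archive, so treat it as a placeholder):

OBJ='200.<journal header object>'
# 1. export a copy of the object from one (stopped) OSD holding pg 10.14
systemctl stop ceph-osd@23
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
    --pgid 10.14 "$OBJ" get-bytes /tmp/200.header
systemctl start ceph-osd@23

# 2. all replicas had identical md5sums, so any copy can be used
md5sum /tmp/200.header

# 3. a plain "rados put" over the bad object was not enough; it had to be
#    removed first and then written back
rados -p cephfs_metadata rm  "$OBJ"
rados -p cephfs_metadata put "$OBJ" /tmp/200.header

# 4. verify the read works again before marking the rank repaired
rados -p cephfs_metadata get "$OBJ" - > /dev/null
ceph mds repaired cephfs:0     # filesystem name and rank are illustrative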


However, after restarting one of the MDSes and setting it to repaired, 
I've hit another, similar problem:



2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log 
[ERR] : error reading table object 'mds0_inotable' -5 ((5) Input/output 
error)



Can I safely try to do the same as for object 200.? Should I 
check something before trying it? Again, checking the copies of the 
object, they have identical md5sums on all the replicas.


Thanks,


    Alessandro


On 12/07/18 16:46, Alessandro De Salvo wrote:


Unfortunately yes, all the OSDs were restarted a few times, but no change.

Thanks,


    Alessandro


On 12/07/18 15:55, Paul Emmerich wrote:
This might seem like a stupid suggestion, but: have you tried to 
restart the OSDs?


I've also encountered some random CRC errors that only showed up when 
trying to read an object,
but not on scrubbing, that magically disappeared after restarting the 
OSD.


However, in my case it was clearly related to 
https://tracker.ceph.com/issues/22464 which doesn't

seem to be the issue here.

Paul

2018-07-12 13:53 GMT+02:00 Alessandro De Salvo 
<alessandro.desa...@roma1.infn.it>:



On 12/07/18 11:20, Alessandro De Salvo wrote:



On 12/07/18 10:58, Dan van der Ster wrote:

On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum
<gfar...@redhat.com> wrote:

On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
<alessandro.desa...@roma1.infn.it> wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object
'200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23)
acting ([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in
fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but
I'm not sure if it may
help.

No SMART errors (the fileservers are SANs, in
RAID6 + LVM volumes), and
no disk problems anywhere. No relevant errors in
syslogs, the hosts are
just fine. I cannot exclude an error on the RAID
controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one
on a different one, so I
would tend to exclude they both had (silent)
errors at the same time.


That's fairly distressing. At this point I'd probably
try extracting the object using ceph-objectstore-tool
and seeing if it decodes properly as an mds journal.
If it does, you might risk just putting it back in
place to overwrite the crc.

Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do, a deep
scrub, but I'm not sure it could repair in case all replicas
are bad, as it seems to be the case.


I finally managed (with the help of Dan), to perform the
deep-scrub on pg 10.14, but the deep scrub did not detect
anything wrong. Also trying to repair 10.14 has no effect.
Still, trying to access the object I get in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster)
log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
0x9ef2b41b on 10:292cf221:::200.:head

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo

Unfortunately yes, all the OSDs were restarted a few times, but no change.

Thanks,


    Alessandro


On 12/07/18 15:55, Paul Emmerich wrote:
This might seem like a stupid suggestion, but: have you tried to 
restart the OSDs?


I've also encountered some random CRC errors that only showed up when 
trying to read an object,
but not on scrubbing, that magically disappeared after restarting the 
OSD.


However, in my case it was clearly related to 
https://tracker.ceph.com/issues/22464 which doesn't

seem to be the issue here.

Paul

2018-07-12 13:53 GMT+02:00 Alessandro De Salvo 
<alessandro.desa...@roma1.infn.it>:



On 12/07/18 11:20, Alessandro De Salvo wrote:



On 12/07/18 10:58, Dan van der Ster wrote:

On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum
<gfar...@redhat.com> wrote:

On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
<alessandro.desa...@roma1.infn.it> wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object
'200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23) acting
([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in fact
I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1
log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 !=
expected 0x9ef2b41b on
10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but
I'm not sure if it may
help.

No SMART errors (the fileservers are SANs, in
RAID6 + LVM volumes), and
no disk problems anywhere. No relevant errors in
syslogs, the hosts are
just fine. I cannot exclude an error on the RAID
controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one on
a different one, so I
would tend to exclude they both had (silent)
errors at the same time.


That's fairly distressing. At this point I'd probably
try extracting the object using ceph-objectstore-tool
and seeing if it decodes properly as an mds journal.
If it does, you might risk just putting it back in
place to overwrite the crc.

Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do, a deep
scrub, but I'm not sure it could repair in case all replicas
are bad, as it seems to be the case.


I finally managed (with the help of Dan), to perform the
deep-scrub on pg 10.14, but the deep scrub did not detect anything
wrong. Also trying to repair 10.14 has no effect.
Still, trying to access the object I get in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster)
log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
0x9ef2b41b on 10:292cf221:::200.:head

Was deep-scrub supposed to detect the wrong crc? If yes, then it
sounds like a bug.
Can I force the repair someway?
Thanks,

   Alessandro



Alessandro, did you already try a deep-scrub on pg 10.14?


I'm waiting for the cluster to do that, I've sent it earlier
this morning.

  I expect
it'll show an inconsistent object. Though, I'm unsure if
repair will
correct the crc given that in this case *all* replicas
have a bad crc.


Exactly, this is what I wonder too.
Cheers,

    Alessandro


--Dan

However, I'm also quite curious how it ended up that
way, with a checksum mismatch but identical data (and
identical checksums!) across the three replicas.

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo


On 12/07/18 11:20, Alessandro De Salvo wrote:



On 12/07/18 10:58, Dan van der Ster wrote:
On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  
wrote:
On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
 wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 
0x9ef2b41b on

10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 
0x9ef2b41b on

10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 
0x9ef2b41b on

10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but I'm not sure if 
it may

help.

No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), 
and
no disk problems anywhere. No relevant errors in syslogs, the hosts 
are

just fine. I cannot exclude an error on the RAID controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one on a different one, 
so I

would tend to exclude they both had (silent) errors at the same time.


That's fairly distressing. At this point I'd probably try extracting 
the object using ceph-objectstore-tool and seeing if it decodes 
properly as an mds journal. If it does, you might risk just putting 
it back in place to overwrite the crc.



Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do, a deep scrub, but 
I'm not sure it could repair in case all replicas are bad, as it seems 
to be the case.


I finally managed (with the help of Dan), to perform the deep-scrub on 
pg 10.14, but the deep scrub did not detect anything wrong. Also trying 
to repair 10.14 has no effect.

Still, trying to access the object I get in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log 
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head


Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds 
like a bug.

Can I force the repair someway?
Thanks,

   Alessandro
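For completeness, the commands involved here look roughly like this (PG id as above). A scrub compares the replicas with each other and with the recorded object info, so it is not obvious it will flag a case where all three copies agree; checking what the scrub actually recorded is still useful:

ceph pg deep-scrub 10.14
ceph pg repair 10.14
rados list-inconsistent-obj 10.14 --format=json-pretty   # what the last scrub flagged, if anything
ceph health detail                                       # scrub errors / inconsistent PGs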




Alessandro, did you already try a deep-scrub on pg 10.14?


I'm waiting for the cluster to do that, I've sent it earlier this 
morning.



  I expect
it'll show an inconsistent object. Though, I'm unsure if repair will
correct the crc given that in this case *all* replicas have a bad crc.


Exactly, this is what I wonder too.
Cheers,

    Alessandro



--Dan

However, I'm also quite curious how it ended up that way, with a 
checksum mismatch but identical data (and identical checksums!) 
across the three replicas. Have you previously done some kind of 
scrub repair on the metadata pool? Did the PG perhaps get backfilled 
due to cluster changes?

-Greg



Thanks,


  Alessandro



On 11/07/18 18:56, John Spray wrote:

On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
 wrote:

Hi John,

in fact I get an I/O error by hand too:


rados get -p cephfs_metadata 200. 200.
error getting cephfs_metadata/200.: (5) Input/output error

Next step would be to go look for corresponding errors on your OSD
logs, system logs, and possibly also check things like the SMART
counters on your hard drives for possible root causes.

John




Can this be recovered someway?

Thanks,


   Alessandro


On 11/07/18 18:33, John Spray wrote:

On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
 wrote:

Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have 
been

marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 
MDSes each.


I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired
: I get something like this in the MDS logs:


2018-07-11 13:20:41.597970 7ff7e010e700  0 
mds.1.journaler.mdlog(ro)

error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) 
log

[ERR] : Error recovering journal 0x201: (5) Input/output error

An EIO reading the journal header is pretty scary. The MDS itself
probably can't tell you much more about this: you need to dig down
into the RADOS layer.  Try reading the 200. object (that
happens to be the rank 0 journal header, every CephFS filesystem
should have one) using the `rados` command line tool.

John



Any attempt of running the journal 

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo



On 12/07/18 10:58, Dan van der Ster wrote:

On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  wrote:

On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
 wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may
help.

No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
no disk problems anywhere. No relevant errors in syslogs, the hosts are
just fine. I cannot exclude an error on the RAID controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one on a different one, so I
would tend to exclude they both had (silent) errors at the same time.


That's fairly distressing. At this point I'd probably try extracting the object 
using ceph-objectstore-tool and seeing if it decodes properly as an mds 
journal. If it does, you might risk just putting it back in place to overwrite 
the crc.


Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do, a deep scrub, but 
I'm not sure it could repair in case all replicas are bad, as it seems 
to be the case.




Alessandro, did you already try a deep-scrub on pg 10.14?


I'm waiting for the cluster to do that, I've sent it earlier this morning.


  I expect
it'll show an inconsistent object. Though, I'm unsure if repair will
correct the crc given that in this case *all* replicas have a bad crc.


Exactly, this is what I wonder too.
Cheers,

    Alessandro



--Dan


However, I'm also quite curious how it ended up that way, with a checksum 
mismatch but identical data (and identical checksums!) across the three 
replicas. Have you previously done some kind of scrub repair on the metadata 
pool? Did the PG perhaps get backfilled due to cluster changes?
-Greg



Thanks,


  Alessandro



On 11/07/18 18:56, John Spray wrote:

On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
 wrote:

Hi John,

in fact I get an I/O error by hand too:


rados get -p cephfs_metadata 200. 200.
error getting cephfs_metadata/200.: (5) Input/output error

Next step would be to go look for corresponding errors on your OSD
logs, system logs, and possibly also check things like the SMART
counters on your hard drives for possible root causes.

John




Can this be recovered someway?

Thanks,


   Alessandro


On 11/07/18 18:33, John Spray wrote:

On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
 wrote:

Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 MDSes each.

I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired
: I get something like this in the MDS logs:


2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
[ERR] : Error recovering journal 0x201: (5) Input/output error

An EIO reading the journal header is pretty scary.  The MDS itself
probably can't tell you much more about this: you need to dig down
into the RADOS layer.  Try reading the 200. object (that
happens to be the rank 0 journal header, every CephFS filesystem
should have one) using the `rados` command line tool.

John




Any attempt of running the journal export results in errors, like this one:


cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
Header 200. is unreadable

2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
readable, attempt object-by-object dump with `rados`


Same happens for recover_dentries

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
200. is unreadable
Errors:
0

Is there something I could try to do to have the cluster back?

I was able to dump the contents of the metadata pool with rados export 
-p cephfs_metadata and I'm currently trying the alternate-metadata-pool 
recovery procedure described in the disaster-recovery docs.

Re: [ceph-users] MDS damaged

2018-07-12 Thread Alessandro De Salvo


> On 11 Jul 2018, at 23:25, Gregory Farnum wrote:
> 
>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
>>  wrote:
>> OK, I found where the object is:
>> 
>> 
>> ceph osd map cephfs_metadata 200.
>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg 
>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
>> 
>> 
>> So, looking at the osds 23, 35 and 18 logs in fact I see:
>> 
>> 
>> osd.23:
>> 
>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log 
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
>> 10:292cf221:::200.:head
>> 
>> 
>> osd.35:
>> 
>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log 
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
>> 10:292cf221:::200.:head
>> 
>> 
>> osd.18:
>> 
>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log 
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
>> 10:292cf221:::200.:head
>> 
>> 
>> So, basically the same error everywhere.
>> 
>> I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may 
>> help.
>> 
>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and 
>> no disk problems anywhere. No relevant errors in syslogs, the hosts are 
>> just fine. I cannot exclude an error on the RAID controllers, but 2 of 
>> the OSDs with 10.14 are on a SAN system and one on a different one, so I 
>> would tend to exclude they both had (silent) errors at the same time.
> 
> That's fairly distressing. At this point I'd probably try extracting the 
> object using ceph-objectstore-tool and seeing if it decodes properly as an 
> mds journal. If it does, you might risk just putting it back in place to 
> overwrite the crc.
> 

Ok, I guess I know how to extract the object from a given OSD, but I'm not sure 
how to check if it decodes as an mds journal; is there a procedure for this? 
However, if exporting all the copies from all the OSDs gives the same 
object md5sum, I believe I can try to directly overwrite the object, as it 
cannot get worse than this, correct?
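Once a candidate copy has been put back, the CephFS tooling itself can serve as the decode check; a sketch, assuming rank 0 of the filesystem named cephfs as in the earlier commands:

cephfs-journal-tool --rank=cephfs:0 header get        # dumps the decoded journal header as JSON
cephfs-journal-tool --rank=cephfs:0 journal inspect   # scans the journal for corruption
cephfs-journal-tool --rank=cephfs:0 event get summary # counts the events it can parse

If these come back clean, the header decoded as an MDS journal; if not, nothing has been lost compared to the current state.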
Also I’d need a confirmation of the procedure to follow in this case, when 
possibly all copies of an object are wrong, I would try the following:

- set the noout
- bring down all the osd where the object is present
- replace the object in all stores
- bring the osds up again
- unset the noout

Correct?
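A sketch of that sequence, filled in with the OSDs and PG from this thread (the object name stays a placeholder, and the exported known-good copy is assumed to be in /tmp/200.header):

OBJ='200.<journal header object>'
ceph osd set noout                                    # avoid rebalancing while the OSDs are down
systemctl stop ceph-osd@23 ceph-osd@35 ceph-osd@18
for id in 23 35 18; do
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$id \
        --pgid 10.14 "$OBJ" set-bytes /tmp/200.header # overwrite the on-disk copy
done
systemctl start ceph-osd@23 ceph-osd@35 ceph-osd@18
ceph osd unset noout

As the follow-ups in this thread show, the simpler client-side rados rm + put of the object turned out to be enough, so the per-OSD rewrite was not needed in the end.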


> However, I'm also quite curious how it ended up that way, with a checksum 
> mismatch but identical data (and identical checksums!) across the three 
> replicas. Have you previously done some kind of scrub repair on the metadata 
> pool?

No, at least not on this pg. I only remember a repair, but it was on a 
different pool.

> Did the PG perhaps get backfilled due to cluster changes?

That might be the case, as we have to reboot the osds sometimes when they 
crash. Also, yesterday we rebooted all of them, but this always happens in 
sequence, one by one, not all at the same time.
Thanks for the help,

   Alessandro

> -Greg
>  
>> 
>> Thanks,
>> 
>> 
>>  Alessandro
>> 
>> 
>> 
>> On 11/07/18 18:56, John Spray wrote:
>> > On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
>> >  wrote:
>> >> Hi John,
>> >>
>> >> in fact I get an I/O error by hand too:
>> >>
>> >>
>> >> rados get -p cephfs_metadata 200. 200.
>> >> error getting cephfs_metadata/200.0000: (5) Input/output error
>> > Next step would be to go look for corresponding errors on your OSD
>> > logs, system logs, and possibly also check things like the SMART
>> > counters on your hard drives for possible root causes.
>> >
>> > John
>> >
>> >
>> >
>> >>
>> >> Can this be recovered someway?
>> >>
>> >> Thanks,
>> >>
>> >>
>> >>   Alessandro
>> >>
>> >>
>> >> On 11/07/18 18:33, John Spray wrote:
>> >>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
>> >>>  wrote:
>> >>>> Hi,
>> >>>>
>> >>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
>> >>>> marked as damaged. Trying to restart the instances only results in
>> >>>> standby MDSes. We currently have 2 filesystems active and 2 MDSes each.

Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg 
10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)



So, looking at the osds 23, 35 and 18 logs in fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log 
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head



osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log 
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head



osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log 
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head



So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may 
help.


No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and 
no disk problems anywhere. No relevant errors in syslogs, the hosts are 
just fine. I cannot exclude an error on the RAID controllers, but 2 of 
the OSDs with 10.14 are on a SAN system and one on a different one, so I 
would tend to exclude they both had (silent) errors at the same time.


Thanks,


    Alessandro



On 11/07/18 18:56, John Spray wrote:

On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
 wrote:

Hi John,

in fact I get an I/O error by hand too:


rados get -p cephfs_metadata 200. 200.
error getting cephfs_metadata/200.: (5) Input/output error

Next step would be to go look for corresponding errors on your OSD
logs, system logs, and possibly also check things like the SMART
counters on your hard drives for possible root causes.

John





Can this be recovered someway?

Thanks,


  Alessandro


On 11/07/18 18:33, John Spray wrote:

On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
 wrote:

Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 MDSes each.

I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired
: I get something like this in the MDS logs:


2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
[ERR] : Error recovering journal 0x201: (5) Input/output error

An EIO reading the journal header is pretty scary.  The MDS itself
probably can't tell you much more about this: you need to dig down
into the RADOS layer.  Try reading the 200. object (that
happens to be the rank 0 journal header, every CephFS filesystem
should have one) using the `rados` command line tool.

John




Any attempt of running the journal export results in errors, like this one:


cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
Header 200. is unreadable

2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
readable, attempt object-by-object dump with `rados`


Same happens for recover_dentries

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
200. is unreadable
Errors:
0

Is there something I could try to do to have the cluster back?

I was able to dump the contents of the metadata pool with rados export
-p cephfs_metadata  and I'm currently trying the procedure
described in
http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
but I'm not sure if it will work as it's apparently doing nothing at the
moment (maybe it's just very slow).

Any help is appreciated, thanks!


   Alessandro





Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo

Hi John,

in fact I get an I/O error by hand too:


rados get -p cephfs_metadata 200. 200.
error getting cephfs_metadata/200.: (5) Input/output error


Can this be recovered someway?

Thanks,


    Alessandro


On 11/07/18 18:33, John Spray wrote:

On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
 wrote:

Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 MDSes each.

I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired
: I get something like this in the MDS logs:


2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
[ERR] : Error recovering journal 0x201: (5) Input/output error

An EIO reading the journal header is pretty scary.  The MDS itself
probably can't tell you much more about this: you need to dig down
into the RADOS layer.  Try reading the 200. object (that
happens to be the rank 0 journal header, every CephFS filesystem
should have one) using the `rados` command line tool.

John





Any attempt of running the journal export results in errors, like this one:


cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1
Header 200. is unreadable

2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not
readable, attempt object-by-object dump with `rados`


Same happens for recover_dentries

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
200. is unreadable
Errors:
0

Is there something I could try to do to have the cluster back?

I was able to dump the contents of the metadata pool with rados export
-p cephfs_metadata  and I'm currently trying the procedure
described in
http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
but I'm not sure if it will work as it's apparently doing nothing at the
moment (maybe it's just very slow).

Any help is appreciated, thanks!


  Alessandro





Re: [ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo

Hi Gregory,

thanks for the reply. I have the dump of the metadata pool, but I'm not 
sure what to check there. Is that what you mean?


The cluster was operational until today at noon, when a full restart of 
the daemons was issued, like many other times in the past. I was trying 
to issue the repaired command to get a real error in the logs, but 
apparently that was not the case.


Thanks,


    Alessandro


On 11/07/18 18:22, Gregory Farnum wrote:
Have you checked the actual journal objects as the "journal export" 
suggested? Did you identify any actual source of the damage before 
issuing the "repaired" command?

What is the history of the filesystems on this cluster?

On Wed, Jul 11, 2018 at 8:10 AM Alessandro De Salvo 
<alessandro.desa...@roma1.infn.it> wrote:


Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 MDSes
each.

I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired
: I get something like this in the MDS logs:


2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
[ERR] : Error recovering journal 0x201: (5) Input/output error


Any attempt of running the journal export results in errors, like
this one:


cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571
7f94354fff00 -1
Header 200. is unreadable

2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal
not
readable, attempt object-by-object dump with `rados`


Same happens for recover_dentries

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
200. is unreadable
Errors:
0

Is there something I could try to do to have the cluster back?

I was able to dump the contents of the metadata pool with rados
export
-p cephfs_metadata  and I'm currently trying the procedure
described in

http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery

but I'm not sure if it will work as it's apparently doing nothing
at the
moment (maybe it's just very slow).

Any help is appreciated, thanks!


 Alessandro






[ceph-users] MDS damaged

2018-07-11 Thread Alessandro De Salvo

Hi,

after the upgrade to luminous 12.2.6 today, all our MDSes have been 
marked as damaged. Trying to restart the instances only results in
standby MDSes. We currently have 2 filesystems active and 2 MDSes each.


I found the following error messages in the mon:


mds.0 :6800/2412911269 down:damaged
mds.1 :6800/830539001 down:damaged
mds.0 :6800/4080298733 down:damaged


Whenever I try to force the repaired state with ceph mds repaired 
: I get something like this in the MDS logs:



2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro) 
error getting journal off disk
2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log 
[ERR] : Error recovering journal 0x201: (5) Input/output error



Any attempt of running the journal export results in errors, like this one:


cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1 
Header 200. is unreadable


2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not 
readable, attempt object-by-object dump with `rados`



Same happens for recover_dentries

cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 
200. is unreadable

Errors:
0

Is there something I could try to do to have the cluster back?

I was able to dump the contents of the metadata pool with rados export 
-p cephfs_metadata  and I'm currently trying the procedure 
described in 
http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery 
but I'm not sure if it will work as it's apparently doing nothing at the 
moment (maybe it's just very slow).


Any help is appreciated, thanks!


    Alessandro



Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-14 Thread Alessandro De Salvo

Hi,


On 14/06/18 06:13, Yan, Zheng wrote:

On Wed, Jun 13, 2018 at 9:35 PM Alessandro De Salvo
 wrote:

Hi,


On 13/06/18 14:40, Yan, Zheng wrote:

On Wed, Jun 13, 2018 at 7:06 PM Alessandro De Salvo
 wrote:

Hi,

I'm trying to migrate a cephfs data pool to a different one in order to
reconfigure with new pool parameters. I've found some hints but no
specific documentation to migrate pools.

I'm currently trying with rados export + import, but I get errors like
these:

Write #-9223372036854775808::::11e1007.:head#
omap_set_header failed: (95) Operation not supported

The command I'm using is the following:

rados export -p cephfs_data | rados import -p cephfs_data_new -

So, I have a few questions:


1) would it work to swap the cephfs data pools by renaming them while
the fs cluster is down?

2) how can I copy the old data pool into a new one without errors like
the ones above?


This won't work as you expected.  Some cephfs metadata records the ID of the data pool.

This is what I was suspecting too, hence the question, so thanks for confirming it.
Basically, once a cephfs filesystem is created, the pool and structure
are immutable. This is not good, though.


3) plain copy from a fs to another one would also work, but I didn't
find a way to tell the ceph fuse clients how to mount different
filesystems in the same cluster, any documentation on it?


ceph-fuse /mnt/ceph --client_mds_namespace=cephfs_name

In the meantime I also found the same option for fuse and tried it. It
works with fuse, but it seems it's not possible to export multiple
filesystems via nfs-ganesha.


put the client_mds_namespace option in the client section of ceph.conf (on the
machine that runs ganesha)


Yes, that would work but then I need a (set of) exporter(s) for every 
cephfs filesystem. That sounds reasonable though, as it's the same 
situation as for the mds services.

Thanks for the hint,

    Alessandro
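Concretely, this looks something like the following (filesystem names and mount points are illustrative; client_mds_namespace is the luminous-era option name used above):

# per-mount selection on the command line:
ceph-fuse /mnt/cephfs  --client_mds_namespace=cephfs
ceph-fuse /mnt/cephfs2 --client_mds_namespace=cephfs2

# or pinned in ceph.conf on the host that runs the exporter (e.g. nfs-ganesha),
# so every client instance started there talks to the right filesystem:
#   [client]
#   client_mds_namespace = cephfs2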





Anyone tried it?



4) even if I found a way to mount via fuse different filesystems
belonging to the same cluster, is this feature stable enough or is it
still super-experimental?


very stable

Very good!

Thanks,


  Alessandro


Thanks,


   Alessandro






Re: [ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-13 Thread Alessandro De Salvo

Hi,


On 13/06/18 14:40, Yan, Zheng wrote:

On Wed, Jun 13, 2018 at 7:06 PM Alessandro De Salvo
 wrote:

Hi,

I'm trying to migrate a cephfs data pool to a different one in order to
reconfigure with new pool parameters. I've found some hints but no
specific documentation to migrate pools.

I'm currently trying with rados export + import, but I get errors like
these:

Write #-9223372036854775808::::11e1007.:head#
omap_set_header failed: (95) Operation not supported

The command I'm using is the following:

   rados export -p cephfs_data | rados import -p cephfs_data_new -

So, I have a few questions:


1) would it work to swap the cephfs data pools by renaming them while
the fs cluster is down?

2) how can I copy the old data pool into a new one without errors like
the ones above?


This won't work as you expected.  Some cephfs metadata records the ID of the data pool.


This is what I was suspecting too, hence the question, so thanks for confirming it.
Basically, once a cephfs filesystem is created, the pool and structure 
are immutable. This is not good, though.





3) plain copy from a fs to another one would also work, but I didn't
find a way to tell the ceph fuse clients how to mount different
filesystems in the same cluster, any documentation on it?


ceph-fuse /mnt/ceph --client_mds_namespace=cephfs_name


In the meantime I also found the same option for fuse and tried it. It 
works with fuse, but it seems it's not possible to export multiple 
filesystems via nfs-ganesha.


Anyone tried it?





4) even if I found a way to mount via fuse different filesystems
belonging to the same cluster, is this feature stable enough or is it
still super-experimental?


very stable


Very good!

Thanks,


    Alessandro




Thanks,


  Alessandro






[ceph-users] Migrating cephfs data pools and/or mounting multiple filesystems belonging to the same cluster

2018-06-13 Thread Alessandro De Salvo

Hi,

I'm trying to migrate a cephfs data pool to a different one in order to 
reconfigure with new pool parameters. I've found some hints but no 
specific documentation to migrate pools.


I'm currently trying with rados export + import, but I get errors like 
these:


Write #-9223372036854775808::::11e1007.:head#
omap_set_header failed: (95) Operation not supported

The command I'm using is the following:

 rados export -p cephfs_data | rados import -p cephfs_data_new -

So, I have a few questions:


1) would it work to swap the cephfs data pools by renaming them while 
the fs cluster is down?


2) how can I copy the old data pool into a new one without errors like 
the ones above?


3) plain copy from a fs to another one would also work, but I didn't 
find a way to tell the ceph fuse clients how to mount different 
filesystems in the same cluster, any documentation on it?


4) even if I found a way to mount via fuse different filesystems 
belonging to the same cluster, is this feature stable enough or is it 
still super-experimental?



Thanks,


    Alessandro
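Given the replies above (the data pool of an existing filesystem cannot simply be swapped or renamed away), a file-level copy between two filesystems is one of the few workable routes; a rough sketch, assuming a second filesystem cephfs_new has already been created with the desired pool settings:

ceph-fuse /mnt/cephfs_old --client_mds_namespace=cephfs
ceph-fuse /mnt/cephfs_new --client_mds_namespace=cephfs_new
# copy at the file level; rsync can be re-run incrementally until the final cut-over
rsync -aHAX --numeric-ids /mnt/cephfs_old/ /mnt/cephfs_new/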




Re: [ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly

2018-01-31 Thread Alessandro De Salvo

Hi Greg,

many thanks. This is a new cluster created initially with luminous 
12.2.0. I'm not sure the instructions for jewel really apply to my case 
too, and all the machines have ntp enabled, but I'll have a look; many 
thanks for the link. All machines are set to CET, although I'm running 
on docker containers which use UTC internally, but they are all 
consistent.


At the moment, after setting 5 of the osds out the cluster resumed, and 
now I'm recreating those osds to be on the safe side.


Thanks,


    Alessandro


On 31/01/18 19:26, Gregory Farnum wrote:
On Tue, Jan 30, 2018 at 5:49 AM Alessandro De Salvo 
<alessandro.desa...@roma1.infn.it> wrote:


Hi,

we have several times a day different OSDs running Luminous 12.2.2 and
Bluestore crashing with errors like this:


starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2
/var/lib/ceph/osd/ceph-2/journal
2018-01-30 13:45:28.440883 7f1e193cbd00 -1 osd.2 107082
log_to_monitors
{default=true}

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc:
In function 'void
PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned
int)'
thread 7f1dfd734700 time 2018-01-30 13:45:29.498133

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc:
12819: FAILED assert(obc)
  ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba)
luminous (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x556c6df51550]
  2:
(PrimaryLogPG::hit_set_trim(std::unique_ptr<PrimaryLogPG::OpContext,
std::default_delete >&, unsigned int)+0x3b6)
[0x556c6db5e106]
  3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
  4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389)
[0x556c6db78d39]
  5: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
  6: (OSD::dequeue_op(boost::intrusive_ptr,
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9)
[0x556c6d9c0899]
  7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr
const&)+0x57) [0x556c6dc38897]
  8: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
  9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
[0x556c6df57069]
  10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0x556c6df59000]
  11: (()+0x7e25) [0x7f1e16c17e25]
  12: (clone()+0x6d) [0x7f1e15d0b34d]
  NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
2018-01-30 13:45:29.505317 7f1dfd734700 -1

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc:
In function 'void
PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned
int)'
thread 7f1dfd734700 time 2018-01-30 13:45:29.498133

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc:
12819: FAILED assert(obc)

  ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba)
luminous (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x556c6df51550]
  2:
(PrimaryLogPG::hit_set_trim(std::unique_ptr<PrimaryLogPG::OpContext,
std::default_delete >&, unsigned int)+0x3b6)
[0x556c6db5e106]
  3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
  4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389)
[0x556c6db78d39]
  5: (PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
  6: (OSD::dequeue_op(boost::intrusive_ptr,
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9)
[0x556c6d9c0899]
  7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr
const&)+0x57) [0x556c6dc38897]
  8: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
  9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839)
[0x556c6df57069]
  10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10)
[0x556c6df59000]
  11: (()+0x7e25) [0x7f1e16c17e25]
  12: (clone()+0x6d) [0x7f1e15d0b34d]
  NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.


Is it a known issue? How can we fix that?

[ceph-users] Luminous 12.2.2 OSDs with Bluestore crashing randomly

2018-01-30 Thread Alessandro De Salvo

Hi,

we have several times a day different OSDs running Luminous 12.2.2 and 
Bluestore crashing with errors like this:



starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 
/var/lib/ceph/osd/ceph-2/journal
2018-01-30 13:45:28.440883 7f1e193cbd00 -1 osd.2 107082 log_to_monitors 
{default=true}
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 
In function 'void 
PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned int)' 
thread 7f1dfd734700 time 2018-01-30 13:45:29.498133
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 
12819: FAILED assert(obc)
 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) 
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x556c6df51550]
 2: 
(PrimaryLogPG::hit_set_trim(std::unique_ptr&, unsigned int)+0x3b6) 
[0x556c6db5e106]

 3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
 4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389) 
[0x556c6db78d39]
 5: (PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
 6: (OSD::dequeue_op(boost::intrusive_ptr, 
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9) 
[0x556c6d9c0899]
 7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
const&)+0x57) [0x556c6dc38897]
 8: (OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) 
[0x556c6df57069]

 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556c6df59000]
 11: (()+0x7e25) [0x7f1e16c17e25]
 12: (clone()+0x6d) [0x7f1e15d0b34d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.
2018-01-30 13:45:29.505317 7f1dfd734700 -1 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 
In function 'void 
PrimaryLogPG::hit_set_trim(PrimaryLogPG::OpContextUPtr&, unsigned int)' 
thread 7f1dfd734700 time 2018-01-30 13:45:29.498133
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/osd/PrimaryLogPG.cc: 
12819: FAILED assert(obc)


 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) 
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x556c6df51550]
 2: 
(PrimaryLogPG::hit_set_trim(std::unique_ptr&, unsigned int)+0x3b6) 
[0x556c6db5e106]

 3: (PrimaryLogPG::hit_set_persist()+0xb67) [0x556c6db61fb7]
 4: (PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2389) 
[0x556c6db78d39]
 5: (PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0xeba) [0x556c6db368aa]
 6: (OSD::dequeue_op(boost::intrusive_ptr, 
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f9) 
[0x556c6d9c0899]
 7: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
const&)+0x57) [0x556c6dc38897]
 8: (OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0xfce) [0x556c6d9ee43e]
 9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) 
[0x556c6df57069]

 10: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x556c6df59000]
 11: (()+0x7e25) [0x7f1e16c17e25]
 12: (clone()+0x6d) [0x7f1e15d0b34d]
 NOTE: a copy of the executable, or `objdump -rdS ` is 
needed to interpret this.



Is it a known issue? How can we fix that?

Thanks,


    Alessandro
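hit_set_trim is part of the cache-tiering hit-set machinery, so besides the backtrace it is worth capturing the hit-set configuration of the cache pool that the crashing PG belongs to; a sketch (the pool name is only a placeholder):

for opt in hit_set_type hit_set_period hit_set_count min_read_recency_for_promote; do
    ceph osd pool get <cache-pool> $opt
done
ceph osd pool ls detail          # also shows the tiering relationships and hit_set parameters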



Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-11 Thread Alessandro De Salvo
Hi,
It took quite some time to recover the pgs, and indeed the problem with the
mds instances was due to the activating pgs. Once they were cleared, the
fs went back to its original state.
I had to restart some OSDs a few times, though, in order to get all the
pgs activated. I didn't hit the limit on the max pgs, but I'm close
to it, so I have set it to 300 just to be safe (AFAIK that was the limit
in prior releases of ceph; not sure why it was lowered to 200 now).
Thanks,

Alessandro
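For reference, a sketch of how that limit can be raised (300 is the value mentioned above; as Burkhard notes below, the option has to be visible to the mon, mgr and osd daemons, so putting it in the [global] section and restarting, or injecting it everywhere, is the simplest route):

# ceph.conf:
#   [global]
#   mon_max_pg_per_osd = 300
# or at runtime on a 12.2.x cluster:
ceph tell mon.\* injectargs '--mon_max_pg_per_osd=300'
ceph tell osd.\* injectargs '--mon_max_pg_per_osd=300'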

On Tue, 2018-01-09 at 09:01 +0100, Burkhard Linke wrote:
> Hi,
> 
> 
> On 01/08/2018 05:40 PM, Alessandro De Salvo wrote:
> > Thanks Lincoln,
> >
> > indeed, as I said the cluster is recovering, so there are pending ops:
> >
> >
> > pgs: 21.034% pgs not active
> >  1692310/24980804 objects degraded (6.774%)
> >  5612149/24980804 objects misplaced (22.466%)
> >  458 active+clean
> >  329 active+remapped+backfill_wait
> >  159 activating+remapped
> >  100 active+undersized+degraded+remapped+backfill_wait
> >  58  activating+undersized+degraded+remapped
> >  27  activating
> >  22  active+undersized+degraded+remapped+backfilling
> >  6   active+remapped+backfilling
> >  1   active+recovery_wait+degraded
> >
> >
> > If it's just a matter of waiting for the system to complete the recovery 
> > it's fine, I'll deal with that, but I was wondering if there is a 
> > more subtle problem here.
> >
> > OK, I'll wait for the recovery to complete and see what happens, thanks.
> 
> The blocked MDS might be caused by the 'activating' PGs. Do you have a 
> warning about too many PGs per OSD? If that is the case, 
> activating/creating/peering/whatever on the affected OSDs is blocked, 
> which leads to blocked requests etc.
> 
> You can resolve this by increasing the number of allowed PGs per OSD 
> ('mon_max_pg_per_osd'). AFAIK it needs to be set for mon, mgr and osd 
> instances. There was also been some discussion about this setting on the 
> mailing list in the last weeks.
> 
> Regards,
> Burkhard




Re: [ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-08 Thread Alessandro De Salvo

Thanks Lincoln,

indeed, as I said the cluster is recovering, so there are pending ops:


    pgs: 21.034% pgs not active
 1692310/24980804 objects degraded (6.774%)
 5612149/24980804 objects misplaced (22.466%)
 458 active+clean
 329 active+remapped+backfill_wait
 159 activating+remapped
 100 active+undersized+degraded+remapped+backfill_wait
 58  activating+undersized+degraded+remapped
 27  activating
 22  active+undersized+degraded+remapped+backfilling
 6   active+remapped+backfilling
 1   active+recovery_wait+degraded


If it's just a matter of waiting for the system to complete the recovery 
it's fine, I'll deal with that, but I was wondering if there is a 
more subtle problem here.


OK, I'll wait for the recovery to complete and see what happens, thanks.

Cheers,


    Alessandro


On 08/01/18 17:36, Lincoln Bryant wrote:

Hi Alessandro,

What is the state of your PGs? Inactive PGs have blocked CephFS
recovery on our cluster before. I'd try to clear any blocked ops and
see if the MDSes recover.

--Lincoln

On Mon, 2018-01-08 at 17:21 +0100, Alessandro De Salvo wrote:

Hi,

I'm running on ceph luminous 12.2.2 and my cephfs suddenly degraded.

I have 2 active mds instances and 1 standby. All the active
instances
are now in replay state and show the same error in the logs:


 mds1 

2018-01-08 16:04:15.765637 7fc2e92451c0  0 ceph version 12.2.2
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),
process
(unknown), pid 164
starting mds.mds1 at -
2018-01-08 16:04:15.785849 7fc2e92451c0  0 pidfile_write: ignore
empty
--pid-file
2018-01-08 16:04:20.168178 7fc2e1ee1700  1 mds.mds1 handle_mds_map
standby
2018-01-08 16:04:20.278424 7fc2e1ee1700  1 mds.1.20635 handle_mds_map
i
am now mds.1.20635
2018-01-08 16:04:20.278432 7fc2e1ee1700  1 mds.1.20635
handle_mds_map
state change up:boot --> up:replay
2018-01-08 16:04:20.278443 7fc2e1ee1700  1 mds.1.20635 replay_start
2018-01-08 16:04:20.278449 7fc2e1ee1700  1 mds.1.20635  recovery set
is 0
2018-01-08 16:04:20.278458 7fc2e1ee1700  1 mds.1.20635  waiting for
osdmap 21467 (which blacklists prior instance)


 mds2 

2018-01-08 16:04:16.870459 7fd8456201c0  0 ceph version 12.2.2
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable),
process
(unknown), pid 295
starting mds.mds2 at -
2018-01-08 16:04:16.881616 7fd8456201c0  0 pidfile_write: ignore
empty
--pid-file
2018-01-08 16:04:21.274543 7fd83e2bc700  1 mds.mds2 handle_mds_map
standby
2018-01-08 16:04:21.314438 7fd83e2bc700  1 mds.0.20637 handle_mds_map
i
am now mds.0.20637
2018-01-08 16:04:21.314459 7fd83e2bc700  1 mds.0.20637
handle_mds_map
state change up:boot --> up:replay
2018-01-08 16:04:21.314479 7fd83e2bc700  1 mds.0.20637 replay_start
2018-01-08 16:04:21.314492 7fd83e2bc700  1 mds.0.20637  recovery set
is 1
2018-01-08 16:04:21.314517 7fd83e2bc700  1 mds.0.20637  waiting for
osdmap 21467 (which blacklists prior instance)
2018-01-08 16:04:21.393307 7fd837aaf700  0 mds.0.cache creating
system
inode with ino:0x100
2018-01-08 16:04:21.397246 7fd837aaf700  0 mds.0.cache creating
system
inode with ino:0x1

The cluster is recovering as we are changing some of the osds, and
there
are a few slow/stuck requests, but I'm not sure if this is the cause,
as
there is apparently no data loss (until now).

How can I force the MDSes to quit the replay state?

Thanks for any help,


  Alessandro






[ceph-users] cephfs degraded on ceph luminous 12.2.2

2018-01-08 Thread Alessandro De Salvo

Hi,

I'm running on ceph luminous 12.2.2 and my cephfs suddenly degraded.

I have 2 active mds instances and 1 standby. All the active instances 
are now in replay state and show the same error in the logs:



 mds1 

2018-01-08 16:04:15.765637 7fc2e92451c0  0 ceph version 12.2.2 
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
(unknown), pid 164

starting mds.mds1 at -
2018-01-08 16:04:15.785849 7fc2e92451c0  0 pidfile_write: ignore empty 
--pid-file

2018-01-08 16:04:20.168178 7fc2e1ee1700  1 mds.mds1 handle_mds_map standby
2018-01-08 16:04:20.278424 7fc2e1ee1700  1 mds.1.20635 handle_mds_map i 
am now mds.1.20635
2018-01-08 16:04:20.278432 7fc2e1ee1700  1 mds.1.20635 handle_mds_map 
state change up:boot --> up:replay

2018-01-08 16:04:20.278443 7fc2e1ee1700  1 mds.1.20635 replay_start
2018-01-08 16:04:20.278449 7fc2e1ee1700  1 mds.1.20635  recovery set is 0
2018-01-08 16:04:20.278458 7fc2e1ee1700  1 mds.1.20635  waiting for 
osdmap 21467 (which blacklists prior instance)



 mds2 

2018-01-08 16:04:16.870459 7fd8456201c0  0 ceph version 12.2.2 
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
(unknown), pid 295

starting mds.mds2 at -
2018-01-08 16:04:16.881616 7fd8456201c0  0 pidfile_write: ignore empty 
--pid-file

2018-01-08 16:04:21.274543 7fd83e2bc700  1 mds.mds2 handle_mds_map standby
2018-01-08 16:04:21.314438 7fd83e2bc700  1 mds.0.20637 handle_mds_map i 
am now mds.0.20637
2018-01-08 16:04:21.314459 7fd83e2bc700  1 mds.0.20637 handle_mds_map 
state change up:boot --> up:replay

2018-01-08 16:04:21.314479 7fd83e2bc700  1 mds.0.20637 replay_start
2018-01-08 16:04:21.314492 7fd83e2bc700  1 mds.0.20637  recovery set is 1
2018-01-08 16:04:21.314517 7fd83e2bc700  1 mds.0.20637  waiting for 
osdmap 21467 (which blacklists prior instance)
2018-01-08 16:04:21.393307 7fd837aaf700  0 mds.0.cache creating system 
inode with ino:0x100
2018-01-08 16:04:21.397246 7fd837aaf700  0 mds.0.cache creating system 
inode with ino:0x1


The cluster is recovering as we are changing some of the osds, and there 
are a few slow/stuck requests, but I'm not sure if this is the cause, as 
there is apparently no data loss (until now).


How can I force the MDSes to quit the replay state?

Thanks for any help,


    Alessandro
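As the replies above point out, the replay state here was held up by inactive PGs and slow requests rather than by the MDS itself; the usual checks are along these lines (a sketch; the MDS name is the one from the log above):

ceph health detail            # slow/blocked requests and stuck PGs
ceph pg dump_stuck inactive   # PGs stuck activating/peering will hold back MDS recovery
ceph fs status                # per-rank MDS states (replay, resolve, active, ...)
ceph daemon mds.mds1 status   # admin-socket view, run on the host where mds1 lives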




[ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3

2017-08-21 Thread Alessandro De Salvo

Hi,

when trying to use df on a ceph-fuse mounted cephfs filesystem with ceph 
luminous >= 12.1.3 I'm having hangs with the following kind of messages 
in the logs:



2017-08-22 02:20:51.094704 7f80addb7700  0 client.174216 ms_handle_reset 
on 192.168.0.10:6789/0



The logs only show this type of message and nothing more useful. 
The only possible way to resume operations is to kill ceph-fuse and 
remount. Only df is hanging though, while file operations like 
copy/rm/ls are working as expected.


This behavior is only shown for ceph >= 12.1.3, while for example 
ceph-fuse on 12.1.2 works.


Anyone has seen the same problems? Any help is highly appreciated.

Thanks,


 Alessandro
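To get more than the ms_handle_reset line out of the client, the client-side debug levels can be raised for one test mount; a sketch (levels and paths are illustrative):

# in ceph.conf on the client host, then remount:
#   [client]
#   debug client = 20
#   debug ms = 1
#   log file = /var/log/ceph/ceph-client.$name.$pid.log
# or passed directly to a single ceph-fuse instance:
ceph-fuse /mnt/cephfs --debug_client=20 --debug_ms=1
# the running client's admin socket can also show its view of the MDS sessions:
ceph daemon /var/run/ceph/ceph-client.admin.*.asok mds_sessions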
