[ceph-users] Mimic 13.2.1 released date?

2018-07-13 Thread Frank Yu
Hi there,

Any plan for the release of 13.2.1?


-- 
Regards
Frank Yu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] IMPORTANT: broken luminous 12.2.6 release in repo, do not upgrade

2018-07-13 Thread Sage Weil
Hi everyone,

tl;dr:  Please avoid the 12.2.6 packages that are currently present on 
download.ceph.com.  We will have a 12.2.7 published ASAP (probably 
Monday).

If you do not use bluestore or erasure-coded pools, none of the issues 
affect you.


Details:

We built 12.2.6 and pushed it to the repos Wednesday, but as that was 
happening realized there was a potentially dangerous regression in 
12.2.5[1] that an upgrade might exacerbate.  While we sorted that issue 
out, several people noticed the updated version in the repo and 
upgraded.  That turned up two other regressions[2][3].  We have fixes for 
those, but are working on an additional fix to make the damage from [3] 
be transparently repaired.


More details:

-- [1] http://tracker.ceph.com/issues/24597 --

This is actually a regression in 12.2.5 that affects erasure-coded pools.  
If there are (1) normal erasure code writes, and simultaneously (2) erasure 
code writes that result in rados returning an error (for example, a delete 
of a non-existent object, which commonly happens when rgw is doing garbage 
collection), and (3) OSDs that are somewhat heavily loaded and then 
restart, then the bug might incorrectly roll forward the in-progress EC 
operations.  When the PG re-peers, this results in an OSD crash like

src/os/filestore/FileStore.cc: 5524: FAILED assert(0 == "ERROR: source must 
exist")

It seems to affect filestore and busy clusters with this specific 
workload.  The OSDs recover once restarted.  However, it is also unclear 
whether it damages the objects in question.  For this reason, please avoid 
unnecessary OSD restarts if you are running 12.2.5 or 12.2.6.  When we 
release 12.2.7, we will have an upgrade procedure in the release notes 
that quiesces RADOS IO to minimize the probability that this bug will 
affect you.

If you do not have erasure-coded pools, this bug does not affect you.
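If you are not sure whether you have any erasure-coded pools, a quick check 
(just a convenience, not part of the official advisory) is something like:

    ceph osd pool ls detail | grep erasure

Any pool listed there uses an erasure code profile and is potentially affected.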


-- [2] https://tracker.ceph.com/issues/24903 --

ceph-volume has had a bug for a while that leaves the 
/var/lib/ceph/osd/*/block.db or block.wal symlinks for bluestore OSDs 
owned by root:root.  This didn't matter because bluestore was ignoring 
these symlinks and using an internally stored value instead.

Both of these were fixed/changed in 12.2.6.  However, after upgrading and 
restarting, the root-owned symlink is still present in the /var/lib/ceph/osd/*/ 
tmpfs and the OSD won't restart.  Rerunning ceph-volume will fix it, as will 
manually running chown -h ceph:ceph /var/lib/ceph/osd/*/block*, or a reboot.  
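For example, a one-off fix on an affected host could look roughly like this 
(the OSD id is a placeholder; adjust paths and ids to your environment):

    # fix the symlink ownership, then restart the OSD
    chown -h ceph:ceph /var/lib/ceph/osd/*/block*
    systemctl restart ceph-osd@<id>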

12.2.7 has a packaging fix that fixes this up on upgrade so there is no 
disruption.

If you do not run bluestore, this bug does not affect you.
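If you are unsure whether a given OSD uses bluestore, its metadata will tell 
you, e.g. for osd.0:

    ceph osd metadata 0 | grep osd_objectstore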


-- [3] https://tracker.ceph.com/issues/23871 --

We modified the OSD recently to avoid storing full-object CRCs when 
bluestore is in use because those CRCs are redundant.  There was a bug in 
this code that was later fixed in master.  This code was backported to 
luminous, but the follow-on fix was missed.  The result is that a sequence 
of

- running 12.2.5
- deep-scrub (updates stored whole-object crc)
- upgrade to 12.2.6
- writefull to an existing object (on 12.2.6) fails to clear the whole-object crc
- read of full object -> crc mismatch

which leads to an (incorrect) EIO error.  We have fixed the original 
problem by backporting the missing fix.  However, users who mistakenly 
installed 12.2.6 may have many objects with a mismatched whole-object crc.

We are currently working on a fix to ignore the whole-object CRC if the 
same conditions are met that make us skip them entirely (i.e., running 
bluestore), and to clear/repair them on scrub.  Once this is done, we'll 
push out 12.2.7.

If you do not run bluestore, this bug does not affect you.

We don't have an easy workaround for this one at the moment, 
unfortunately.


Exciting week!  Thanks everyone,
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd prepare issue device-mapper mapping

2018-07-13 Thread Jacob DeGlopper
Also, looking at your ceph-disk list output, the LVM is probably your 
root filesystem and cannot be wiped.  If you'd like, send the output 
of the 'mount' and 'lvs' commands and we should be able to tell.
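For example, something like this should show whether that LVM volume is the 
root filesystem (exact options are a matter of taste):

    mount | grep ' / '
    lvs -o lv_name,vg_name,devices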


    -- jacob


On 07/13/2018 03:42 PM, Jacob DeGlopper wrote:
You have LVM data on /dev/sdb already; you will need to remove that 
before you can use ceph-disk on that device.


Use the LVM commands 'lvs','vgs', and 'pvs' to list the logical 
volumes, volume groups, and physical volumes defined.  Once you're 
sure you don't need the data, lvremove, vgremove, and pvremove them, 
then zero the disk using 'dd if=/dev/zero of=/dev/sdb bs=1M 
count=10'.  Note that this command wipes the disk - you must be sure 
that you're wiping the right disk.


    -- jacob


On 07/13/2018 03:26 PM, Satish Patel wrote:

I am installing ceph in my lab box using ceph-ansible, i have two HDD
for OSD and i am getting following error on one of OSD not sure what
is the issue.



[root@ceph-osd-01 ~]# ceph-disk prepare --cluster ceph --bluestore 
/dev/sdb

ceph-disk: Error: Device /dev/sdb1 is in use by a device-mapper
mapping (dm-crypt?): dm-0


[root@ceph-osd-01 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/sda :
  /dev/sda1 other, xfs, mounted on /boot
  /dev/sda2 swap, swap
/dev/sdb :
  /dev/sdb1 other, LVM2_member
/dev/sdc :
  /dev/sdc1 ceph data, active, cluster ceph, osd.3, block /dev/sdc2
  /dev/sdc2 ceph block, for /dev/sdc1
/dev/sr0 other, unknown

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd prepare issue device-mapper mapping

2018-07-13 Thread Jacob DeGlopper
You have LVM data on /dev/sdb already; you will need to remove that 
before you can use ceph-disk on that device.


Use the LVM commands 'lvs','vgs', and 'pvs' to list the logical volumes, 
volume groups, and physical volumes defined.  Once you're sure you don't 
need the data, lvremove, vgremove, and pvremove them, then zero the disk 
using 'dd if=/dev/zero of=/dev/sdb bs=1M count=10'.  Note that this 
command wipes the disk - you must be sure that you're wiping the right disk.
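A rough sketch of that sequence (the volume group and logical volume names 
are placeholders -- take them from the lvs/vgs/pvs output, and double-check 
the device before running dd):

    lvs
    vgs
    pvs
    lvremove <vg>/<lv>
    vgremove <vg>
    pvremove /dev/sdb1
    dd if=/dev/zero of=/dev/sdb bs=1M count=10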


    -- jacob


On 07/13/2018 03:26 PM, Satish Patel wrote:

I am installing ceph in my lab box using ceph-ansible, i have two HDD
for OSD and i am getting following error on one of OSD not sure what
is the issue.



[root@ceph-osd-01 ~]# ceph-disk prepare --cluster ceph --bluestore /dev/sdb
ceph-disk: Error: Device /dev/sdb1 is in use by a device-mapper
mapping (dm-crypt?): dm-0


[root@ceph-osd-01 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/sda :
  /dev/sda1 other, xfs, mounted on /boot
  /dev/sda2 swap, swap
/dev/sdb :
  /dev/sdb1 other, LVM2_member
/dev/sdc :
  /dev/sdc1 ceph data, active, cluster ceph, osd.3, block /dev/sdc2
  /dev/sdc2 ceph block, for /dev/sdc1
/dev/sr0 other, unknown

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osd prepare issue device-mapper mapping

2018-07-13 Thread Satish Patel
I am installing ceph on my lab box using ceph-ansible. I have two HDDs
for OSDs, and I am getting the following error on one of the OSDs; I am
not sure what the issue is.



[root@ceph-osd-01 ~]# ceph-disk prepare --cluster ceph --bluestore /dev/sdb
ceph-disk: Error: Device /dev/sdb1 is in use by a device-mapper
mapping (dm-crypt?): dm-0


[root@ceph-osd-01 ~]# ceph-disk list
/dev/dm-0 other, xfs, mounted on /
/dev/sda :
 /dev/sda1 other, xfs, mounted on /boot
 /dev/sda2 swap, swap
/dev/sdb :
 /dev/sdb1 other, LVM2_member
/dev/sdc :
 /dev/sdc1 ceph data, active, cluster ceph, osd.3, block /dev/sdc2
 /dev/sdc2 ceph block, for /dev/sdc1
/dev/sr0 other, unknown
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Approaches for migrating to a much newer cluster

2018-07-13 Thread Brady Deetz
Just a thought: have you considered rbd replication?

On Fri, Jul 13, 2018 at 9:30 AM r...@cleansafecloud.com <
r...@cleansafecloud.com> wrote:

>
> Hello folks,
>
> We have an old active Ceph cluster on Firefly (v0.80.9) which we use for
> OpenStack and have multiple live clients. We have been put in a position
> whereby we need to move to a brand new cluster under a new OpenStack
> deployment. The new cluster is on Luminous (v.12.2.5). Now we obviously do
> not want to migrate huge images across in one go if we can avoid it, so our
> current plan is to transfer base images well in advance of the migration,
> and use the rbd export-diff feature to apply incremental updates from that
> point forwards. I wanted to reach out to you experts to see if we are going
> down the right path here, what issues we might encounter, or if there might
> be any better options. Or does this sound like the right approach?
>
> Many thanks,
> Rob
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo

Hi Dan,

you're right, I was following the mimic instructions (which indeed 
worked on my mimic testbed), but luminous is different and I missed the 
additional step.


Works now, thanks!


    Alessandro


On 13/07/18 17:51, Dan van der Ster wrote:

On Fri, Jul 13, 2018 at 4:07 PM Alessandro De Salvo
 wrote:

However, I cannot reduce the number of mdses anymore; I used to do
that with e.g.:

ceph fs set cephfs max_mds 1

Trying this with 12.2.6 apparently has no effect; I am left with 2
active mdses. Is this another bug?

Are you following this procedure?
http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks
i.e. you need to deactivate after decreasing max_mds.

(Mimic does this automatically, OTOH).

-- dan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2018-07-13 Thread Dan van der Ster
On Fri, Jul 13, 2018 at 4:07 PM Alessandro De Salvo
 wrote:
> However, I cannot reduce the number of mdses anymore, I was used to do
> that with e.g.:
>
> ceph fs set cephfs max_mds 1
>
> Trying this with 12.2.6 has apparently no effect, I am left with 2
> active mdses. Is this another bug?

Are you following this procedure?
http://docs.ceph.com/docs/luminous/cephfs/multimds/#decreasing-the-number-of-ranks
i.e. you need to deactivate after decreasing max_mds.

(Mimic does this automatically, OTOH).
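On luminous the sequence would be roughly the following (assuming the
filesystem is called cephfs and rank 1 is the one being removed):

    ceph fs set cephfs max_mds 1
    ceph mds deactivate cephfs:1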

-- dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Approaches for migrating to a much newer cluster

2018-07-13 Thread r...@cleansafecloud.com

Hello folks,

We have an old active Ceph cluster on Firefly (v0.80.9) which we use for 
OpenStack and have multiple live clients. We have been put in a position 
whereby we need to move to a brand new cluster under a new OpenStack 
deployment. The new cluster is on Luminous (v.12.2.5). Now we obviously do not 
want to migrate huge images across in one go if we can avoid it, so our current 
plan is to transfer base images well in advance of the migration, and use the 
rbd export-diff feature to apply incremental updates from that point forwards. 
I wanted to reach out to you experts to see if we are going down the right path 
here, what issues we might encounter, or if there might be any better options. 
Or does this sound like the right approach?
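In case it helps, the incremental part of that workflow typically looks 
something like the sketch below (pool, image and snapshot names are 
placeholders, and "ssh newcluster rbd ..." just stands for however you run 
rbd against the new cluster):

    # initial copy: create a base snapshot and ship it across
    rbd snap create volumes/image1@base
    rbd export volumes/image1@base - | ssh newcluster rbd import - volumes/image1
    ssh newcluster rbd snap create volumes/image1@base

    # later: ship only the changes since the base snapshot
    rbd snap create volumes/image1@sync1
    rbd export-diff --from-snap base volumes/image1@sync1 - \
        | ssh newcluster rbd import-diff - volumes/image1

Note that import-diff needs the --from-snap snapshot to already exist on the 
destination image, hence creating @base on the new cluster right after the 
initial import.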

Many thanks,
Rob
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2018-07-13 Thread Alessandro De Salvo

Thanks all,

100..inode, mds_snaptable and 1..inode were not 
corrupted, so I left them as they were. I have re-injected all the bad 
objects, for all mdses (2 per filesystem) and all filesystems I had 
(2), and after setting the mdses as repaired my filesystems are back!


However, I cannot reduce the number of mdses anymore; I used to do 
that with e.g.:



ceph fs set cephfs max_mds 1


Trying this with 12.2.6 apparently has no effect; I am left with 2 
active mdses. Is this another bug?


Thanks,


    Alessandro



On 13/07/18 15:54, Yan, Zheng wrote:

On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo
 wrote:

Some progress, and more pain...

I was able to recover the 200. using the ceph-objectstore-tool for one 
of the OSDs (all identical copies) but trying to re-inject it just with rados 
put was giving no error while the get was still giving the same I/O error. So 
the solution was to rm the object and then put it again, and that worked.

However, after restarting one of the MDSes and setting it to repaired, I've hit 
another, similar problem:


2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : 
error reading table object 'mds0_inotable' -5 ((5) Input/output error)


Can I safely try to do the same as for object 200.? Should I check 
something before trying it? Again, checking the copies of the object, they have 
identical md5sums on all the replicas.


Yes, it should be safe. You also need to do the same for several other
objects. The full object list is:

200.
mds0_inotable
100..inode
mds_snaptable
1..inode

The first three objects are per-mds-rank.  If you have enabled
multi-active mds, you also need to update objects of other ranks. For
mds.1, object names are 201., mds1_inotable and
101..inode.




Thanks,


 Alessandro


On 12/07/18 16:46, Alessandro De Salvo wrote:

Unfortunately yes, all the OSDs were restarted a few times, but no change.

Thanks,


 Alessandro


On 12/07/18 15:55, Paul Emmerich wrote:

This might seem like a stupid suggestion, but: have you tried to restart the 
OSDs?

I've also encountered some random CRC errors that only showed up when trying to 
read an object,
but not on scrubbing, that magically disappeared after restarting the OSD.

However, in my case it was clearly related to 
https://tracker.ceph.com/issues/22464 which doesn't
seem to be the issue here.

Paul

2018-07-12 13:53 GMT+02:00 Alessandro De Salvo 
:


On 12/07/18 11:20, Alessandro De Salvo wrote:



On 12/07/18 10:58, Dan van der Ster wrote:

On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  wrote:

On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
 wrote:

OK, I found where the object is:


ceph osd map cephfs_metadata 200.
osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)


So, looking at the osds 23, 35 and 18 logs in fact I see:


osd.23:

2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.35:

2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


osd.18:

2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
[ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
10:292cf221:::200.:head


So, basically the same error everywhere.

I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may
help.

No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
no disk problems anywhere. No relevant errors in syslogs, the hosts are
just fine. I cannot exclude an error on the RAID controllers, but 2 of
the OSDs with 10.14 are on a SAN system and one on a different one, so I
would tend to exclude they both had (silent) errors at the same time.


That's fairly distressing. At this point I'd probably try extracting the object 
using ceph-objectstore-tool and seeing if it decodes properly as an mds 
journal. If it does, you might risk just putting it back in place to overwrite 
the crc.


Wouldn't it be easier to scrub repair the PG to fix the crc?


this is what I already instructed the cluster to do (a deep scrub), but I'm not 
sure it can repair the object if all replicas are bad, as seems to be the case.


I finally managed (with the help of Dan) to perform the deep-scrub on pg 
10.14, but the deep scrub did not detect anything wrong. Also, trying to repair 
10.14 has no effect.
Still, trying to access the object I get in the OSDs:

2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR] : 
10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
10:292cf221:::200.:head

Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds like a bug.

Re: [ceph-users] MDS damaged

2018-07-13 Thread Yan, Zheng
On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo
 wrote:
>
> Some progress, and more pain...
>
> I was able to recover the 200. using the ceph-objectstore-tool for 
> one of the OSDs (all identical copies) but trying to re-inject it just with 
> rados put was giving no error while the get was still giving the same I/O 
> error. So the solution was to rm the object and the put it again, that worked.
>
> However, after restarting one of the MDSes and seeting it to repaired, I've 
> hit another, similar problem:
>
>
> 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : 
> error reading table object 'mds0_inotable' -5 ((5) Input/output error)
>
>
> Can I safely try to do the same as for object 200.? Should I check 
> something before trying it? Again, checking the copies of the object, they 
> have identical md5sums on all the replicas.
>

Yes, it should be safe. You also need to do the same for several other
objects. The full object list is:

200.
mds0_inotable
100..inode
mds_snaptable
1..inode

The first three objects are per-mds-rank.  If you have enabled
multi-active mds, you also need to update objects of other ranks. For
mds.1, object names are 201., mds1_inotable and
101..inode.
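For reference, a sketch of the per-object procedure Alessandro described 
(the OSD path, pgid and object name below are examples only; the OSD must be 
stopped while ceph-objectstore-tool runs, and keep the exported file around):

    # extract a good copy of the object from one of the OSDs holding the PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
        --pgid 10.14 mds0_inotable get-bytes /tmp/mds0_inotable

    # remove and re-inject the object into the metadata pool
    rados -p cephfs_metadata rm mds0_inotable
    rados -p cephfs_metadata put mds0_inotable /tmp/mds0_inotable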



> Thanks,
>
>
> Alessandro
>
>
> Il 12/07/18 16:46, Alessandro De Salvo ha scritto:
>
> Unfortunately yes, all the OSDs were restarted a few times, but no change.
>
> Thanks,
>
>
> Alessandro
>
>
> Il 12/07/18 15:55, Paul Emmerich ha scritto:
>
> This might seem like a stupid suggestion, but: have you tried to restart the 
> OSDs?
>
> I've also encountered some random CRC errors that only showed up when trying 
> to read an object,
> but not on scrubbing, that magically disappeared after restarting the OSD.
>
> However, in my case it was clearly related to 
> https://tracker.ceph.com/issues/22464 which doesn't
> seem to be the issue here.
>
> Paul
>
> 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo 
> :
>>
>>
>> Il 12/07/18 11:20, Alessandro De Salvo ha scritto:
>>
>>>
>>>
>>> Il 12/07/18 10:58, Dan van der Ster ha scritto:

 On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  wrote:
>
> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo 
>  wrote:
>>
>> OK, I found where the object is:
>>
>>
>> ceph osd map cephfs_metadata 200.
>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
>>
>>
>> So, looking at the osds 23, 35 and 18 logs in fact I see:
>>
>>
>> osd.23:
>>
>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.:head
>>
>>
>> osd.35:
>>
>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.:head
>>
>>
>> osd.18:
>>
>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
>> 10:292cf221:::200.:head
>>
>>
>> So, basically the same error everywhere.
>>
>> I'm trying to issue a repair of the pg 10.14, but I'm not sure if it may
>> help.
>>
>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
>> no disk problems anywhere. No relevant errors in syslogs, the hosts are
>> just fine. I cannot exclude an error on the RAID controllers, but 2 of
>> the OSDs with 10.14 are on a SAN system and one on a different one, so I
>> would tend to exclude they both had (silent) errors at the same time.
>
>
> That's fairly distressing. At this point I'd probably try extracting the 
> object using ceph-objectstore-tool and seeing if it decodes properly as 
> an mds journal. If it does, you might risk just putting it back in place 
> to overwrite the crc.
>
 Wouldn't it be easier to scrub repair the PG to fix the crc?
>>>
>>>
>>> this is what I already instructed the cluster to do, a deep scrub, but I'm 
>>> not sure it could repair in case all replicas are bad, as it seems to be 
>>> the case.
>>
>>
>> I finally managed (with the help of Dan), to perform the deep-scrub on pg 
>> 10.14, but the deep scrub did not detect anything wrong. Also trying to 
>> repair 10.14 has no effect.
>> Still, trying to access the object I get in the OSDs:
>>
>> 2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR] : 
>> 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 
>> 10:292cf221:::200.:head
>>
>> Was deep-scrub supposed to detect the wrong crc? If yes, them it sounds like 
>> a bug.
>> Can I force the repair someway?
>> Thanks,
>>
>>Alessandro
>>
>>>


Re: [ceph-users] MDS damaged

2018-07-13 Thread Adam Tygart
Bluestore.

On Fri, Jul 13, 2018, 05:56 Dan van der Ster  wrote:

> Hi Adam,
>
> Are your osds bluestore or filestore?
>
> -- dan
>
>
> On Fri, Jul 13, 2018 at 7:38 AM Adam Tygart  wrote:
> >
> > I've hit this today with an upgrade to 12.2.6 on my backup cluster.
> > Unfortunately there were issues with the logs (in that the files
> > weren't writable) until after the issue struck.
> >
> > 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log
> > [ERR] : 5.255 full-object read crc 0x4e97b4e != expected 0x6cfe829d on
> > 5:aa448500:::500.:head
> >
> > It is a backup cluster and I can keep it around or blow away the data
> > (in this instance) as needed for testing purposes.
> >
> > --
> > Adam
> >
> > On Thu, Jul 12, 2018 at 10:39 AM, Alessandro De Salvo
> >  wrote:
> > > Some progress, and more pain...
> > >
> > > I was able to recover the 200. using the ceph-objectstore-tool
> for
> > > one of the OSDs (all identical copies) but trying to re-inject it just
> with
> > > rados put was giving no error while the get was still giving the same
> I/O
> > > error. So the solution was to rm the object and the put it again, that
> > > worked.
> > >
> > > However, after restarting one of the MDSes and seeting it to repaired,
> I've
> > > hit another, similar problem:
> > >
> > >
> > > 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log
> [ERR] :
> > > error reading table object 'mds0_inotable' -5 ((5) Input/output error)
> > >
> > >
> > > Can I safely try to do the same as for object 200.? Should I
> check
> > > something before trying it? Again, checking the copies of the object,
> they
> > > have identical md5sums on all the replicas.
> > >
> > > Thanks,
> > >
> > >
> > > Alessandro
> > >
> > >
> > > Il 12/07/18 16:46, Alessandro De Salvo ha scritto:
> > >
> > > Unfortunately yes, all the OSDs were restarted a few times, but no
> change.
> > >
> > > Thanks,
> > >
> > >
> > > Alessandro
> > >
> > >
> > > Il 12/07/18 15:55, Paul Emmerich ha scritto:
> > >
> > > This might seem like a stupid suggestion, but: have you tried to
> restart the
> > > OSDs?
> > >
> > > I've also encountered some random CRC errors that only showed up when
> trying
> > > to read an object,
> > > but not on scrubbing, that magically disappeared after restarting the
> OSD.
> > >
> > > However, in my case it was clearly related to
> > > https://tracker.ceph.com/issues/22464 which doesn't
> > > seem to be the issue here.
> > >
> > > Paul
> > >
> > > 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> > > :
> > >>
> > >>
> > >> Il 12/07/18 11:20, Alessandro De Salvo ha scritto:
> > >>
> > >>>
> > >>>
> > >>> Il 12/07/18 10:58, Dan van der Ster ha scritto:
> > 
> >  On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum  >
> >  wrote:
> > >
> > > On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> > >  wrote:
> > >>
> > >> OK, I found where the object is:
> > >>
> > >>
> > >> ceph osd map cephfs_metadata 200.
> > >> osdmap e632418 pool 'cephfs_metadata' (10) object '200.'
> -> pg
> > >> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18],
> p23)
> > >>
> > >>
> > >> So, looking at the osds 23, 35 and 18 logs in fact I see:
> > >>
> > >>
> > >> osd.23:
> > >>
> > >> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster)
> log
> > >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
> 0x9ef2b41b
> > >> on
> > >> 10:292cf221:::200.:head
> > >>
> > >>
> > >> osd.35:
> > >>
> > >> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster)
> log
> > >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
> 0x9ef2b41b
> > >> on
> > >> 10:292cf221:::200.:head
> > >>
> > >>
> > >> osd.18:
> > >>
> > >> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster)
> log
> > >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected
> 0x9ef2b41b
> > >> on
> > >> 10:292cf221:::200.:head
> > >>
> > >>
> > >> So, basically the same error everywhere.
> > >>
> > >> I'm trying to issue a repair of the pg 10.14, but I'm not sure if
> it
> > >> may
> > >> help.
> > >>
> > >> No SMART errors (the fileservers are SANs, in RAID6 + LVM
> volumes),
> > >> and
> > >> no disk problems anywhere. No relevant errors in syslogs, the
> hosts
> > >> are
> > >> just fine. I cannot exclude an error on the RAID controllers, but
> 2 of
> > >> the OSDs with 10.14 are on a SAN system and one on a different
> one, so
> > >> I
> > >> would tend to exclude they both had (silent) errors at the same
> time.
> > >
> > >
> > > That's fairly distressing. At this point I'd probably try
> extracting
> > > the object using ceph-objectstore-tool and seeing if it decodes
> properly as
> > > an mds journal. If it does, you 

[ceph-users] Ceph balancer module algorithm learning

2018-07-13 Thread Hunter zhao
Hi, all:

I am now looking at the mgr balancer module. How do the two algorithms in 
it calculate their scores? I have only just started using Ceph and my 
code-reading ability is poor. Can anyone help explain how the data-balance 
score is calculated, especially `def calc_eval()` and `def calc_stats()`?


Best
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrading to 12.2.6 damages cephfs (crc errors)

2018-07-13 Thread Dan van der Ster
The problem seems similar to https://tracker.ceph.com/issues/23871
which was fixed in mimic but not luminous:

fe5038c7f9 osd/PrimaryLogPG: clear data digest on WRITEFULL if skip_data_digest

.. dan
On Fri, Jul 13, 2018 at 12:45 PM Dan van der Ster  wrote:
>
> Hi,
>
> Following the reports on ceph-users about damaged cephfs after
> updating to 12.2.6 I spun up a 1 node cluster to try the upgrade.
> I started with two OSDs on 12.2.5, wrote some data.
> Then I restarted the OSDs one by one while continuing to write to the
> cephfs mountpoint.
> Then I restarted the (single) MDS, and it is indeed damaged with a crc error:
>
> 2018-07-13 12:38:55.261379 osd.1 osd.1 137.138.62.86:6805/35320 2 :
> cluster [ERR] 2.15 full-object read crc 0xed77af7c != expected
> 0x1a1d319d on 2:aa448500:::500.:head
> 2018-07-13 12:38:55.285994 osd.0 osd.0 137.138.62.86:6801/34755 2 :
> cluster [ERR] 2.13 full-object read crc 0xa73a97ef != expected
> 0x3e6fdb4a on 2:c91d4a1d:::mds0_inotable:head
>
> I think it goes without saying that nobody should upgrade a cephfs to
> 12.2.6 until this is understood.
>
> -- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds daemon damaged

2018-07-13 Thread Dan van der Ster
Hi Kevin,

Are your OSDs bluestore or filestore?

-- dan

On Thu, Jul 12, 2018 at 11:30 PM Kevin  wrote:
>
> Sorry for the long posting but trying to cover everything
>
> I woke up to find my cephfs filesystem down. This was in the logs
>
> 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc
> 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.:head
>
> I had one standby MDS, but as far as I can tell it did not fail over.
> This was in the logs
>
> (insufficient standby MDS daemons available)
>
> Currently my ceph looks like this
>cluster:
>  id: ..
>  health: HEALTH_ERR
>  1 filesystem is degraded
>  1 mds daemon damaged
>
>services:
>  mon: 6 daemons, quorum ds26,ds27,ds2b,ds2a,ds28,ds29
>  mgr: ids27(active)
>  mds: test-cephfs-1-0/1/1 up , 3 up:standby, 1 damaged
>  osd: 5 osds: 5 up, 5 in
>
>data:
>  pools:   3 pools, 202 pgs
>  objects: 1013k objects, 4018 GB
>  usage:   12085 GB used, 6544 GB / 18630 GB avail
>  pgs: 201 active+clean
>   1   active+clean+scrubbing+deep
>
>io:
>  client:   0 B/s rd, 0 op/s rd, 0 op/s wr
>
> I started trying to get the damaged MDS back online
>
> Based on this page
> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts
>
> # cephfs-journal-tool journal export backup.bin
> 2018-07-12 13:35:15.675964 7f3e1389bf00 -1 Header 200. is
> unreadable
> 2018-07-12 13:35:15.675977 7f3e1389bf00 -1 journal_export: Journal not
> readable, attempt object-by-object dump with `rados`
> Error ((5) Input/output error)
>
> # cephfs-journal-tool event recover_dentries summary
> Events by type:
> 2018-07-12 13:36:03.000590 7fc398a18f00 -1 Header 200. is
> unreadableErrors: 0
>
> cephfs-journal-tool journal reset - (I think this command might have
> worked)
>
> Next up, tried to reset the filesystem
>
> ceph fs reset test-cephfs-1 --yes-i-really-mean-it
>
> Each time same errors
>
> 2018-07-12 11:56:35.760449 mon.ds26 [INF] Health check cleared:
> MDS_DAMAGE (was: 1 mds daemon damaged)
> 2018-07-12 11:56:35.856737 mon.ds26 [INF] Standby daemon mds.ds27
> assigned to filesystem test-cephfs-1 as rank 0
> 2018-07-12 11:56:35.947801 mds.ds27 [ERR] Error recovering journal
> 0x200: (5) Input/output error
> 2018-07-12 11:56:36.900807 mon.ds26 [ERR] Health check failed: 1 mds
> daemon damaged (MDS_DAMAGE)
> 2018-07-12 11:56:35.945544 osd.0 [ERR] 2.4 full-object read crc
> 0x6fc2f65a != expected 0x1c08241c on 2:292cf221:::200.:head
> 2018-07-12 12:00:00.000142 mon.ds26 [ERR] overall HEALTH_ERR 1
> filesystem is degraded; 1 mds daemon damaged
>
> Tried to 'fail' mds.ds27
> # ceph mds fail ds27
> # failed mds gid 1929168
>
> Command worked, but each time I run the reset command the same errors
> above appear
>
> Online searches say the object read error has to be removed. But there's
> no object listed. This web page is the closest to the issue
> http://tracker.ceph.com/issues/20863
>
> Recommends fixing error by hand. Tried running deep scrub on pg 2.4, it
> completes but still have the same issue above
>
> Final option is to attempt removing mds.ds27. If mds.ds29 was a standby
> and has data it should become live. If it was not
> I assume we will lose the filesystem at this point
>
> Why didn't the standby MDS failover?
>
> Just looking for any way to recover the cephfs, thanks!
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] MDS damaged

2018-07-13 Thread Dan van der Ster
Hi Adam,

Are your osds bluestore or filestore?

-- dan


On Fri, Jul 13, 2018 at 7:38 AM Adam Tygart  wrote:
>
> I've hit this today with an upgrade to 12.2.6 on my backup cluster.
> Unfortunately there were issues with the logs (in that the files
> weren't writable) until after the issue struck.
>
> 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log
> [ERR] : 5.255 full-object read crc 0x4e97b4e != expected 0x6cfe829d on
> 5:aa448500:::500.:head
>
> It is a backup cluster and I can keep it around or blow away the data
> (in this instance) as needed for testing purposes.
>
> --
> Adam
>
> On Thu, Jul 12, 2018 at 10:39 AM, Alessandro De Salvo
>  wrote:
> > Some progress, and more pain...
> >
> > I was able to recover the 200. using the ceph-objectstore-tool for
> > one of the OSDs (all identical copies) but trying to re-inject it just with
> > rados put was giving no error while the get was still giving the same I/O
> > error. So the solution was to rm the object and the put it again, that
> > worked.
> >
> > However, after restarting one of the MDSes and seeting it to repaired, I've
> > hit another, similar problem:
> >
> >
> > 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] :
> > error reading table object 'mds0_inotable' -5 ((5) Input/output error)
> >
> >
> > Can I safely try to do the same as for object 200.? Should I check
> > something before trying it? Again, checking the copies of the object, they
> > have identical md5sums on all the replicas.
> >
> > Thanks,
> >
> >
> > Alessandro
> >
> >
> > Il 12/07/18 16:46, Alessandro De Salvo ha scritto:
> >
> > Unfortunately yes, all the OSDs were restarted a few times, but no change.
> >
> > Thanks,
> >
> >
> > Alessandro
> >
> >
> > Il 12/07/18 15:55, Paul Emmerich ha scritto:
> >
> > This might seem like a stupid suggestion, but: have you tried to restart the
> > OSDs?
> >
> > I've also encountered some random CRC errors that only showed up when trying
> > to read an object,
> > but not on scrubbing, that magically disappeared after restarting the OSD.
> >
> > However, in my case it was clearly related to
> > https://tracker.ceph.com/issues/22464 which doesn't
> > seem to be the issue here.
> >
> > Paul
> >
> > 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> > :
> >>
> >>
> >> Il 12/07/18 11:20, Alessandro De Salvo ha scritto:
> >>
> >>>
> >>>
> >>> Il 12/07/18 10:58, Dan van der Ster ha scritto:
> 
>  On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum 
>  wrote:
> >
> > On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> >  wrote:
> >>
> >> OK, I found where the object is:
> >>
> >>
> >> ceph osd map cephfs_metadata 200.
> >> osdmap e632418 pool 'cephfs_metadata' (10) object '200.' -> pg
> >> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
> >>
> >>
> >> So, looking at the osds 23, 35 and 18 logs in fact I see:
> >>
> >>
> >> osd.23:
> >>
> >> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
> >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> >> on
> >> 10:292cf221:::200.:head
> >>
> >>
> >> osd.35:
> >>
> >> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
> >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> >> on
> >> 10:292cf221:::200.:head
> >>
> >>
> >> osd.18:
> >>
> >> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
> >> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> >> on
> >> 10:292cf221:::200.:head
> >>
> >>
> >> So, basically the same error everywhere.
> >>
> >> I'm trying to issue a repair of the pg 10.14, but I'm not sure if it
> >> may
> >> help.
> >>
> >> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes),
> >> and
> >> no disk problems anywhere. No relevant errors in syslogs, the hosts
> >> are
> >> just fine. I cannot exclude an error on the RAID controllers, but 2 of
> >> the OSDs with 10.14 are on a SAN system and one on a different one, so
> >> I
> >> would tend to exclude they both had (silent) errors at the same time.
> >
> >
> > That's fairly distressing. At this point I'd probably try extracting
> > the object using ceph-objectstore-tool and seeing if it decodes 
> > properly as
> > an mds journal. If it does, you might risk just putting it back in 
> > place to
> > overwrite the crc.
> >
>  Wouldn't it be easier to scrub repair the PG to fix the crc?
> >>>
> >>>
> >>> this is what I already instructed the cluster to do, a deep scrub, but
> >>> I'm not sure it could repair in case all replicas are bad, as it seems to 
> >>> be
> >>> the case.
> >>
> >>
> >> I finally managed (with the help of 

Re: [ceph-users] Bluestore and number of devices

2018-07-13 Thread Kevin Olbrich
You can keep the same layout as before. Most people put the DB and WAL combined
in one partition (similar to the journal on filestore).
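For example, a bluestore OSD with the DB (and thus the WAL) on a separate 
partition can be created roughly like this with ceph-volume (device names 
are placeholders):

    ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

One fast device can still hold the DB partitions of several data devices, 
much like journals on filestore.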

Kevin

2018-07-13 12:37 GMT+02:00 Robert Stanford :

>
>  I'm using filestore now, with 4 data devices per journal device.
>
>  I'm confused by this: "BlueStore manages either one, two, or (in certain
> cases) three storage devices."
> (http://docs.ceph.com/docs/luminous/rados/configuration/
> bluestore-config-ref/)
>
>  When I convert my journals to bluestore, will they still be four data
> devices (osds) per journal, or will they each require a dedicated journal
> drive now?
>
>  Regards
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] upgrading to 12.2.6 damages cephfs (crc errors)

2018-07-13 Thread Dan van der Ster
Hi,

Following the reports on ceph-users about damaged cephfs after
updating to 12.2.6 I spun up a 1 node cluster to try the upgrade.
I started with two OSDs on 12.2.5, wrote some data.
Then I restarted the OSDs one by one while continuing to write to the
cephfs mountpoint.
Then I restarted the (single) MDS, and it is indeed damaged with a crc error:

2018-07-13 12:38:55.261379 osd.1 osd.1 137.138.62.86:6805/35320 2 :
cluster [ERR] 2.15 full-object read crc 0xed77af7c != expected
0x1a1d319d on 2:aa448500:::500.:head
2018-07-13 12:38:55.285994 osd.0 osd.0 137.138.62.86:6801/34755 2 :
cluster [ERR] 2.13 full-object read crc 0xa73a97ef != expected
0x3e6fdb4a on 2:c91d4a1d:::mds0_inotable:head

I think it goes without saying that nobody should upgrade a cephfs to
12.2.6 until this is understood.

-- Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Bluestore and number of devices

2018-07-13 Thread Robert Stanford
 I'm using filestore now, with 4 data devices per journal device.

 I'm confused by this: "BlueStore manages either one, two, or (in certain
cases) three storage devices."
(
http://docs.ceph.com/docs/luminous/rados/configuration/bluestore-config-ref/
)

 When I convert my journals to bluestore, will they still be four data
devices (osds) per journal, or will they each require a dedicated journal
drive now?

 Regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD tuning no longer required?

2018-07-13 Thread Robert Stanford
 This is what leads me to believe it's other settings being referred to as
well:
https://ceph.com/community/new-luminous-rados-improvements/

*"There are dozens of documents floating around with long lists of Ceph
configurables that have been tuned for optimal performance on specific
hardware or for specific workloads.  In most cases these ceph.conf
fragments tend to induce funny looks on developers’ faces because the
settings being adjusted seem counter-intuitive, unrelated to the
performance of the system, and/or outright dangerous.  Our goal is to make
Ceph work as well as we can out of the box without requiring any tuning at
all, so we are always striving to choose sane defaults.  And generally, we
discourage tuning by users. "*

To me it's not just bluestore settings / ssd vs. hdd they're talking about
("dozens of documents floating around"... "our goal... without requiring any
tuning at all").  Am I off base?

 Regards

On Thu, Jul 12, 2018 at 9:12 PM, Konstantin Shalygin  wrote:

>   I saw this in the Luminous release notes:
>>
>>   "Each OSD now adjusts its default configuration based on whether the
>> backing device is an HDD or SSD. Manual tuning generally not required"
>>
>>   Which tuning in particular?  The ones in my configuration are
>> osd_op_threads, osd_disk_threads, osd_recovery_max_active,
>> osd_op_thread_suicide_timeout, and osd_crush_chooseleaf_type, among
>> others.  Can I rip these out when I upgrade to
>> Luminous?
>>
>
> This mean that some "bluestore_*" settings tuned for nvme/hdd separately.
>
> Also with Luminous we have:
>
> osd_op_num_shards_(ssd|hdd)
>
> osd_op_num_threads_per_shard_(ssd|hdd)
>
> osd_recovery_sleep_(ssd|hdd)
>
>
>
>
> k
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [Ceph Admin & Monitoring] Inkscope is back

2018-07-13 Thread ghislain.chevalier
Hi,

Inkscope, a ceph admin and monitoring GUI, is still alive.
It can be now installed with an ansible playbook.
https://github.com/inkscope/inkscope-ansible

Best regards
- - - - - - - - - - - - - - - - - 
Ghislain Chevalier 
ORANGE/IMT/OLS/DIESE/LCP/DDSD
Software-Defined Storage Architect
 +33299124432
 +33788624370


_


This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increase queue_depth in KVM

2018-07-13 Thread Damian Dabrowski
Konstantin, thanks for the explanation. But unfortunately, upgrading qemu is
nearly impossible in my case.

So is there something else I can do, or do I have to accept that write IOPS
will be 8x lower inside KVM than outside KVM? :|

On Fri, 13 Jul 2018 at 04:22, Konstantin Shalygin  wrote:

> > I've seen some people using 'num_queues' but I don't have this parameter
> > in my schemas(libvirt version = 1.3.1, qemu version = 2.5.0
>
>
> num-queues is available from qemu 2.7 [1]
>
>
> [1] https://wiki.qemu.org/ChangeLog/2.7
>
>
>
>
> k
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds daemon damaged

2018-07-13 Thread Oliver Freyermuth
Hi Kevin,

Am 13.07.2018 um 04:21 schrieb Kevin:
> That thread looks exactly like what I'm experiencing. Not sure why my 
> repeated googles didn't find it!

maybe the thread was still too "fresh" for Google's indexing. 

> 
> I'm running 12.2.6 and CentOS 7
> 
> And yes, I recently upgraded from jewel to luminous following the 
> instructions of changing the repo and then updating. Everything has been 
> working fine up until this point
> 
Given that previous thread, I feel a bit at a loss as to what to try now, 
since that thread ended with no resolution that I could see.

I hope the thread is still continuing, given that another affected person just 
commented on it. 
We also planned to upgrade our production cluster to 12.2.6 (also on CentOS 7) 
over the weekend, since we have been affected for months by two ceph-fuse bugs 
causing inconsistent directory contents, which are fixed in 12.2.6. 
But given this situation, we'd rather live with that a bit longer and hold off 
on the update... 

> 
> Thanks for pointing that out though, it seems like almost the exact same 
> situation
> 
> On 2018-07-12 18:23, Oliver Freyermuth wrote:
>> Hi,
>>
>> all this sounds an awful lot like:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-July/027992.html
>> In htat case, things started with an update to 12.2.6. Which version
>> are you running?
>>
>> Cheers,
>> Oliver
>>
>> Am 12.07.2018 um 23:30 schrieb Kevin:
>>> Sorry for the long posting but trying to cover everything
>>>
>>> I woke up to find my cephfs filesystem down. This was in the logs
>>>
>>> 2018-07-11 05:54:10.398171 osd.1 [ERR] 2.4 full-object read crc 0x6fc2f65a 
>>> != expected 0x1c08241c on 2:292cf221:::200.:head
>>>
>>> I had one standby MDS, but as far as I can tell it did not fail over. This 
>>> was in the logs
>>>
>>> (insufficient standby MDS daemons available)
>>>
>>> Currently my ceph looks like this
>>>   cluster:
>>>     id: ..
>>>     health: HEALTH_ERR
>>>     1 filesystem is degraded
>>>     1 mds daemon damaged
>>>
>>>   services:
>>>     mon: 6 daemons, quorum ds26,ds27,ds2b,ds2a,ds28,ds29
>>>     mgr: ids27(active)
>>>     mds: test-cephfs-1-0/1/1 up , 3 up:standby, 1 damaged
>>>     osd: 5 osds: 5 up, 5 in
>>>
>>>   data:
>>>     pools:   3 pools, 202 pgs
>>>     objects: 1013k objects, 4018 GB
>>>     usage:   12085 GB used, 6544 GB / 18630 GB avail
>>>     pgs: 201 active+clean
>>>  1   active+clean+scrubbing+deep
>>>
>>>   io:
>>>     client:   0 B/s rd, 0 op/s rd, 0 op/s wr
>>>
>>> I started trying to get the damaged MDS back online
>>>
>>> Based on this page 
>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#disaster-recovery-experts
>>>
>>> # cephfs-journal-tool journal export backup.bin
>>> 2018-07-12 13:35:15.675964 7f3e1389bf00 -1 Header 200. is unreadable
>>> 2018-07-12 13:35:15.675977 7f3e1389bf00 -1 journal_export: Journal not 
>>> readable, attempt object-by-object dump with `rados`
>>> Error ((5) Input/output error)
>>>
>>> # cephfs-journal-tool event recover_dentries summary
>>> Events by type:
>>> 2018-07-12 13:36:03.000590 7fc398a18f00 -1 Header 200. is 
>>> unreadableErrors: 0
>>>
>>> cephfs-journal-tool journal reset - (I think this command might have worked)
>>>
>>> Next up, tried to reset the filesystem
>>>
>>> ceph fs reset test-cephfs-1 --yes-i-really-mean-it
>>>
>>> Each time same errors
>>>
>>> 2018-07-12 11:56:35.760449 mon.ds26 [INF] Health check cleared: MDS_DAMAGE 
>>> (was: 1 mds daemon damaged)
>>> 2018-07-12 11:56:35.856737 mon.ds26 [INF] Standby daemon mds.ds27 assigned 
>>> to filesystem test-cephfs-1 as rank 0
>>> 2018-07-12 11:56:35.947801 mds.ds27 [ERR] Error recovering journal 0x200: 
>>> (5) Input/output error
>>> 2018-07-12 11:56:36.900807 mon.ds26 [ERR] Health check failed: 1 mds daemon 
>>> damaged (MDS_DAMAGE)
>>> 2018-07-12 11:56:35.945544 osd.0 [ERR] 2.4 full-object read crc 0x6fc2f65a 
>>> != expected 0x1c08241c on 2:292cf221:::200.:head
>>> 2018-07-12 12:00:00.000142 mon.ds26 [ERR] overall HEALTH_ERR 1 filesystem 
>>> is degraded; 1 mds daemon damaged
>>>
>>> Tried to 'fail' mds.ds27
>>> # ceph mds fail ds27
>>> # failed mds gid 1929168
>>>
>>> Command worked, but each time I run the reset command the same errors above 
>>> appear
>>>
>>> Online searches say the object read error has to be removed. But there's no 
>>> object listed. This web page is the closest to the issue
>>> http://tracker.ceph.com/issues/20863
>>>
>>> Recommends fixing error by hand. Tried running deep scrub on pg 2.4, it 
>>> completes but still have the same issue above
>>>
>>> Final option is to attempt removing mds.ds27. If mds.ds29 was a standby and 
>>> has data it should become live. If it was not
>>> I assume we will lose the filesystem at this point
>>>
>>> Why didn't the standby MDS failover?
>>>
>>> Just looking for any way to recover the cephfs, thanks!
>>>
>>>