Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
On 10/10/2017 02:10 PM, John Spray wrote:
> Yes.

worked, rank 6 is back and cephfs up again. thank you very much.

> Do a final ls to make sure you got all of them -- it is
> dangerous to leave any fragments behind.

will do.

> BTW opened http://tracker.ceph.com/issues/21749 for the underlying bug.

thanks; I've saved all the logs, so I'm happy to provide anything you need.

Regards,
Daniel


Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread John Spray
On Tue, Oct 10, 2017 at 12:30 PM, Daniel Baumann  wrote:
> Hi John,
>
> thank you very much for your help.
>
> On 10/10/2017 12:57 PM, John Spray wrote:
>>  A) Do a "rados -p <metadata pool> ls | grep "^506\." or similar, to
>> get a list of the objects
>
> done, gives me these:
>
>   506.
>   506.0017
>   506.001b
>   506.0019
>   506.001a
>   506.001c
>   506.0018
>   506.0016
>   506.001e
>   506.001f
>   506.001d
>
>>  B) Write a short bash loop to do a "rados -p <metadata pool> get" on
>> each of those objects into a file.
>
> done, saved each one using the object name as the filename, resulting
> in these 11 files:
>
>    90 Oct 10 13:17 506.
>  4.0M Oct 10 13:17 506.0016
>  4.0M Oct 10 13:17 506.0017
>  4.0M Oct 10 13:17 506.0018
>  4.0M Oct 10 13:17 506.0019
>  4.0M Oct 10 13:17 506.001a
>  4.0M Oct 10 13:17 506.001b
>  4.0M Oct 10 13:17 506.001c
>  4.0M Oct 10 13:17 506.001d
>  4.0M Oct 10 13:17 506.001e
>  4.0M Oct 10 13:17 506.001f
>
>>  C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20",
>> mark the rank repaired, start the MDS again, and then gather the
>> resulting log (it should end in the same "Error -22 recovering
>> write_pos", but have much much more detail about what came before).
>
> I've attached the entire log from right before issuing "repaired" until
> after the mds drops to standby again.
>
>> Because you've hit a serious bug, it's really important to gather all
>> this and share it, so that we can try to fix it and prevent it
>> happening again to you or others.
>
> absolutely, sure. If you need anything more, I'm happy to share.
>
>> You have two options, depending on how much downtime you can tolerate:
>>  - carefully remove all the metadata objects that start with 506. --
>
> given the outage (and people need access to their data), I'd go with
> this. Just to be safe: that would go like this?
>
>   rados -p <metadata pool> rm 506.
>   rados -p <metadata pool> rm 506.0016

Yes.  Do a final ls to make sure you got all of them -- it is
dangerous to leave any fragments behind.
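
A minimal sketch of that final check (assuming the metadata pool is
called "cephfs_metadata"; adjust the pool name to your cluster):

  # verify that no purge queue objects for rank 6 are left behind
  POOL=cephfs_metadata          # assumed name of the metadata pool
  if rados -p "$POOL" ls | grep '^506\.'; then
      echo "WARNING: 506.* objects still present, keep removing" >&2
  else
      echo "OK: no 506.* objects remain"
  fi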

BTW opened http://tracker.ceph.com/issues/21749 for the underlying bug.

John


>   [...]
>
> Regards,
> Daniel
>


Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
Hi John,

thank you very much for your help.

On 10/10/2017 12:57 PM, John Spray wrote:
>  A) Do a "rados -p <metadata pool> ls | grep "^506\." or similar, to
> get a list of the objects

done, gives me these:

  506.
  506.0017
  506.001b
  506.0019
  506.001a
  506.001c
  506.0018
  506.0016
  506.001e
  506.001f
  506.001d

>  B) Write a short bash loop to do a "rados -p <metadata pool> get" on
> each of those objects into a file.

done, saved each one using the object name as the filename, resulting
in these 11 files:

   90 Oct 10 13:17 506.
 4.0M Oct 10 13:17 506.0016
 4.0M Oct 10 13:17 506.0017
 4.0M Oct 10 13:17 506.0018
 4.0M Oct 10 13:17 506.0019
 4.0M Oct 10 13:17 506.001a
 4.0M Oct 10 13:17 506.001b
 4.0M Oct 10 13:17 506.001c
 4.0M Oct 10 13:17 506.001d
 4.0M Oct 10 13:17 506.001e
 4.0M Oct 10 13:17 506.001f

>  C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20",
> mark the rank repaired, start the MDS again, and then gather the
> resulting log (it should end in the same "Error -22 recovering
> write_pos", but have much much more detail about what came before).

I've attached the entire log from right before issuing "repaired" until
after the mds drops to standby again.

> Because you've hit a serious bug, it's really important to gather all
> this and share it, so that we can try to fix it and prevent it
> happening again to you or others.

absolutely, sure. If you need anything more, I'm happy to share.

> You have two options, depending on how much downtime you can tolerate:
>  - carefully remove all the metadata objects that start with 506. --

given the outage (and people need access to their data), I'd go with
this. Just to be safe: that would go like this?

  rados -p <metadata pool> rm 506.
  rados -p <metadata pool> rm 506.0016
  [...]
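
As a loop, a rough sketch of the same removal (assuming the metadata
pool is called "cephfs_metadata", and only after the copies from step B
are stored somewhere safe):

  POOL=cephfs_metadata          # assumed name of the metadata pool
  # remove every purge queue object belonging to rank 6, one by one
  for obj in $(rados -p "$POOL" ls | grep '^506\.'); do
      rados -p "$POOL" rm "$obj"
  done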

Regards,
Daniel
2017-10-10 13:21:55.413752 7f3f3011a700  5 mds.mds9 handle_mds_map epoch 96224 from mon.0
2017-10-10 13:21:55.413836 7f3f3011a700 10 mds.mds9  my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}
2017-10-10 13:21:55.413847 7f3f3011a700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:21:55.413852 7f3f3011a700 10 mds.mds9 map says I am 147.87.226.189:6800/1634944095 mds.6.96224 state up:replay
2017-10-10 13:21:55.414088 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.414141 7f3f3011a700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-10 13:21:55.414410 7f3f3011a700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-10 13:21:55.414415 7f3f3011a700 10 mds.6.0 create_logger
2017-10-10 13:21:55.414635 7f3f3011a700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-10 13:21:55.414644 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.414648 7f3f3011a700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-10 13:21:55.414660 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.414830 7f3f3011a700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-10 13:21:55.414839 7f3f3011a700  1 mds.6.96224 handle_mds_map i am now mds.6.96224
2017-10-10 13:21:55.414843 7f3f3011a700  1 mds.6.96224 handle_mds_map state change up:boot --> up:replay
2017-10-10 13:21:55.414855 7f3f3011a700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-10 13:21:55.414859 7f3f3011a700  1 mds.6.96224 replay_start
2017-10-10 13:21:55.414873 7f3f3011a700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414883 7f3f3011a700  1 mds.6.96224  recovery set is 0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414893 7f3f3011a700  1 mds.6.96224  waiting for osdmap 18607 (which blacklists prior instance)
2017-10-10 13:21:55.414901 7f3f3011a700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 13:21:55.416011 7f3f3011a700  7 mds.6.server operator(): full = 0 epoch = 18608
2017-10-10 13:21:55.416024 7f3f3011a700  4 mds.6.96224 handle_osd_map epoch 18608, 0 new blacklist entries
2017-10-10 13:21:55.416027 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.416076 7f3f2a10e700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-10 13:21:55.416095 7f3f2a10e700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-10 13:21:55.416101 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening inotable
2017-10-10 13:21:55.416120 7f3f2a10e700 10 mds.6.inotable: load
2017-10-10 13:21:55.416301 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening sessionmap
2017-10-10 13:21:55.416310

Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread John Spray
On Tue, Oct 10, 2017 at 10:28 AM, Daniel Baumann  wrote:
> Hi all,
>
> unfortunately I'm still struggling to bring cephfs back up after one of
> the MDS ranks has been marked "damaged" (see messages from Monday).
>
> 1. When I mark the rank as "repaired", this is what I get in the monitor
>log (leaving unrelated leveldb compacting chatter aside):
>
> 2017-10-10 10:51:23.177865 7f3290710700  0 log_channel(audit) log [INF] : from='client.? 147.87.226.72:0/1658479115' entity='client.admin' cmd='[{"prefix": "mds repaired", "rank": "6"}]': finished
> 2017-10-10 10:51:23.177993 7f3290710700  0 log_channel(cluster) log [DBG] : fsmap cephfs-9/9/9 up  {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,6=mds9=up:replay,7=mds7=up:resolve,8=mds8=up:resolve}
> [...]
>
> 2017-10-10 10:51:23.492040 7f328ab1c700  1 mon.mon1@0(leader).mds e96186 mds mds.? 147.87.226.189:6800/524543767 can't write to fsmap compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
> [...]
>
> 2017-10-10 10:51:24.291827 7f328d321700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 mds daemon damaged (MDS_DAMAGE)
>
> 2. ...and this is what I get on the mds:
>
> 2017-10-10 11:21:26.537204 7fcb01702700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
> 2017-10-10 11:21:26.537223 7fcb01702700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos

This is probably the root cause: somehow the PurgeQueue (one of the
on-disk metadata structures) has become corrupt.

The purge queue objects for rank 6 will all have names starting "506."
in the metadata pool.

This is probably the result of a bug of some kind, so to give us a
chance of working out what went wrong let's gather some evidence
first:
 A) Do a "rados -p <metadata pool> ls | grep "^506\." or similar, to
get a list of the objects
 B) Write a short bash loop to do a "rados -p <metadata pool> get" on
each of those objects into a file.
 C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20",
mark the rank repaired, start the MDS again, and then gather the
resulting log (it should end in the same "Error -22 recovering
write_pos", but have much much more detail about what came before).
(A rough sketch of steps A and B follows below.)
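
A minimal sketch of steps A and B (assuming the metadata pool is called
"cephfs_metadata" and backups go to /root/pq-backup-506 -- both just
placeholders, adjust to your cluster):

  POOL=cephfs_metadata              # assumed name of the metadata pool
  OUTDIR=/root/pq-backup-506        # assumed backup directory
  mkdir -p "$OUTDIR"

  # A) list rank 6's purge queue objects
  rados -p "$POOL" ls | grep '^506\.' > "$OUTDIR/objects.txt"

  # B) save each object into a file named after the object
  while read -r obj; do
      rados -p "$POOL" get "$obj" "$OUTDIR/$obj"
  done < "$OUTDIR/objects.txt"

  # C) before marking the rank repaired, raise the MDS log levels,
  #    e.g. in ceph.conf under [mds]:
  #      debug mds = 20
  #      debug journaler = 20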

Because you've hit a serious bug, it's really important to gather all
this and share it, so that we can try to fix it and prevent it
happening again to you or others.

Once you've put all that evidence somewhere safe, you can start
intervening to repair it.  The good news is that this is the best part
of your metadata to damage, because all it does is record the list of
deleted files to purge.

You have two options, depending on how much downtime you can tolerate:
 - carefully remove all the metadata objects that start with 506. --
this will cause that MDS rank to completely forget about purging
anything in its queue.  This will leave some orphan data objects in
the data pool that will never get cleaned up (without doing some more
offline repair).
 - inspect the detailed logs from step C of the evidence gathering, to
work out exactly how far the journal loading got before hitting
something corrupt.  Then with some finer-grained editing of the
on-disk objects, we can persuade it to skip over the part that was
damaged.

John

> (see attachment for the full mds log during the "repair" action)
>
>
> I'm really stuck here and would greatly appreciate any help. How can I
> see what is actually going wrong? Running ceph-mon/ceph-mds with higher
> debug levels just logs "damaged" as quoted above, but doesn't say what
> is wrong or why it's failing.
>
> would going back to a single MDS with "ceph fs reset" allow me to access
> the data again?
>
>
> Regards,
> Daniel
>


[ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
Hi all,

unfortunately I'm still struggling to bring cephfs back up after one of
the MDS ranks has been marked "damaged" (see messages from Monday).

1. When I mark the rank as "repaired", this is what I get in the monitor
   log (leaving unrelated leveldb compacting chatter aside):

2017-10-10 10:51:23.177865 7f3290710700  0 log_channel(audit) log [INF] : from='client.? 147.87.226.72:0/1658479115' entity='client.admin' cmd='[{"prefix": "mds repaired", "rank": "6"}]': finished
2017-10-10 10:51:23.177993 7f3290710700  0 log_channel(cluster) log [DBG] : fsmap cephfs-9/9/9 up  {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,6=mds9=up:replay,7=mds7=up:resolve,8=mds8=up:resolve}
[...]

2017-10-10 10:51:23.492040 7f328ab1c700  1 mon.mon1@0(leader).mds e96186 mds mds.? 147.87.226.189:6800/524543767 can't write to fsmap compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
[...]

2017-10-10 10:51:24.291827 7f328d321700 -1 log_channel(cluster) log [ERR] : Health check failed: 1 mds daemon damaged (MDS_DAMAGE)

2. ...and this is what I get on the mds:

2017-10-10 11:21:26.537204 7fcb01702700 -1 mds.6.journaler.pq(ro) _decode error from assimilate_prefetch
2017-10-10 11:21:26.537223 7fcb01702700 -1 mds.6.purge_queue _recover: Error -22 recovering write_pos

(see attachment for the full mds log during the "repair" action)


I'm really stuck here and would greatly appreciate any help. How can I
see what is actually going wrong? Running ceph-mon/ceph-mds with higher
debug levels just logs "damaged" as quoted above, but doesn't say what
is wrong or why it's failing.

would going back to a single MDS with "ceph fs reset" allow me to access
the data again?

Regards,
Daniel
2017-10-10 11:21:26.419394 7fcb0670c700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 11:21:26.419399 7fcb0670c700 10 mds.mds9 map says I am 147.87.226.189:6800/1182896077 mds.6.96195 state up:replay
2017-10-10 11:21:26.419623 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.419679 7fcb0670c700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-10 11:21:26.419916 7fcb0670c700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-10 11:21:26.419920 7fcb0670c700 10 mds.6.0 create_logger
2017-10-10 11:21:26.420138 7fcb0670c700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-10 11:21:26.420146 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.420150 7fcb0670c700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-10 11:21:26.420159 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 11:21:26.420338 7fcb0670c700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-10 11:21:26.420347 7fcb0670c700  1 mds.6.96195 handle_mds_map i am now mds.6.96195
2017-10-10 11:21:26.420351 7fcb0670c700  1 mds.6.96195 handle_mds_map state change up:boot --> up:replay
2017-10-10 11:21:26.420366 7fcb0670c700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-10 11:21:26.420370 7fcb0670c700  1 mds.6.96195 replay_start
2017-10-10 11:21:26.420375 7fcb0670c700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-10 11:21:26.420380 7fcb0670c700  1 mds.6.96195  recovery set is 0,1,2,3,4,5,7,8
2017-10-10 11:21:26.420395 7fcb0670c700  1 mds.6.96195  waiting for osdmap 18593 (which blacklists prior instance)
2017-10-10 11:21:26.420401 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.421206 7fcb0670c700  7 mds.6.server operator(): full = 0 epoch = 18593
2017-10-10 11:21:26.421217 7fcb0670c700  4 mds.6.96195 handle_osd_map epoch 18593, 0 new blacklist entries
2017-10-10 11:21:26.421220 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 11:21:26.421253 7fcb00700700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-10 11:21:26.421263 7fcb00700700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-10 11:21:26.421267 7fcb00700700  2 mds.6.96195 boot_start 0: opening inotable
2017-10-10 11:21:26.421285 7fcb00700700 10 mds.6.inotable: load
2017-10-10 11:21:26.421441 7fcb00700700  2 mds.6.96195 boot_start 0: opening sessionmap
2017-10-10 11:21:26.421449 7fcb00700700 10 mds.6.sessionmap load
2017-10-10 11:21:26.421551 7fcb00700700  2 mds.6.96195 boot_start 0: opening mds log
2017-10-10 11:21:26.421558 7fcb00700700  5 mds.6.log open discovering log bounds
2017-10-10 11:21:26.421720 7fcaff6fe700 10 mds.6.log _submit_thread start
2017-10-10 11:21:26.423002 7fcb00700700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE