Re: [ceph-users] To backport or not to backport

2019-07-04 Thread Daniel Baumann
Hi,

On 7/4/19 3:00 PM, Stefan Kooman wrote:
> - Only backport fixes that do not introduce new functionality, but address
>   (impaired) functionality already present in the release.

ack, and also my full agreement with and support for everything else you
wrote, thanks.

reading in the changelogs about backported features (in particular the
release that BlueStore was backported to) left me quite scared of
upgrading our cluster.

Regards,
Daniel


Re: [ceph-users] Debian Buster builds

2019-06-18 Thread Daniel Baumann
On 6/18/19 3:39 PM, Paul Emmerich wrote:
> we maintain (unofficial) Nautilus builds for Buster here:
> https://mirror.croit.io/debian-nautilus/

the repository doesn't contain the source packages. just out of
curiosity, to see what you might have changed apart from simply
(re)building the packages... are they available somewhere?

Regards,
Daniel


Re: [ceph-users] Debian Buster builds

2019-06-18 Thread Daniel Baumann
On 6/18/19 3:11 PM, Tobias Gall wrote:
> I would like to switch to debian buster and test the upgrade from
> luminous but there are currently no ceph releases/builds for buster.

shameless plug:

we're re-building ceph packages in the repository we maintain for our
university (and a few other users; hence the neutral project name).

if you feel comfortable adding a third-party repo, you can use:

# backports on top of buster for packages that are not in debian
deb https://cdn.deb.progress-linux.org/packages
engywuck-backports-extras main contrib non-free

(trust path to the archive signing keys can be established via the
progress-linux package in debian)
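
for completeness, a rough sketch of how that could look on a buster host
(whether pulling ceph from the backports suite needs apt pinning or "-t"
depends on the setup, so treat this as an assumption rather than a recipe):

  # install the trust path first (the progress-linux package is in debian)
  apt-get install progress-linux

  # add the repository
  echo "deb https://cdn.deb.progress-linux.org/packages engywuck-backports-extras main contrib non-free" \
    > /etc/apt/sources.list.d/progress-linux.list

  # update and install ceph
  apt-get update
  apt-get install ceph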

Regards,
Daniel


Re: [ceph-users] Changing the release cadence

2019-06-17 Thread Daniel Baumann
Hi,

I didn't bother to create a twitter account just to be able to
participate in the poll... so please count me in for October.

Regards,
Daniel


Re: [ceph-users] Changing the release cadence

2019-06-06 Thread Daniel Baumann
On 6/6/19 9:26 AM, Xiaoxi Chen wrote:
> I will vote for November for several reasons:

[...]

as an academic institution we're aligned to an August-to-July cycle (the
school year) instead of January-to-December (the calendar year), so all
your reasons (thanks!) are valid for us too... just shifted by 6 months,
hence Q1 would be ideal for us.

however, given that academic institutions are in the minority, I'm now
convinced that November is the better choice for everyone.

Regards,
Daniel


Re: [ceph-users] Changing the release cadence

2019-06-05 Thread Daniel Baumann
On 6/5/19 5:57 PM, Sage Weil wrote:
> So far the balance of opinion seems to favor a shift to a 12 month 
> cycle [...] it seems pretty likely we'll make that shift.

thanks, much appreciated (from a cluster operations point of view).

> Thoughts?

GNOME and a few others are doing April and October releases, which seems
balanced and good timing for most people; personally I prefer spring
rather than autumn for upgrades, hence I would suggest April.

Regards,
Daniel


Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Daniel Baumann
On 01/04/2019 07:32 PM, Peter Woodman wrote:
> not to mention that the current released version of mimic (.2) has a
> bug that is potentially catastrophic to cephfs, known about for
> months, yet it's not in the release notes. would have upgraded and
> destroyed data had i not caught a thread on this list.

indeed.

we're a big cephfs user here for HPC. every time I get asked about it by
my peers, I sadly have to tell them that they should not use it for
production, that it's not stable and has serious stability bugs (even
though it was declared "stable" upstream some time ago).

(e.g. doing an rsync on, from or to a cephfs, just like someone wrote a
couple of days ago on the list, reliably kills it, every time - we
reproduce it with every kernel release and every ceph release since
february 2015 on several independent clusters. even more catastrophic is
that a single inconsistent file stops the whole cephfs, which then cannot
be restored unless the affected cephfs is unmounted on all(!) machines
that have it mounted, etc.

we can use cephfs only in our sort-of-stable setup with 12.2.5 because
we have mostly non-malicious users who usually behave nicely. but it's
too brittle in the end and there's apparently no silver lining ahead.
because of that, during the scaling up of our cephfs cluster from 300 TB
to 1.2 PB this spring, we'll be moving away from cephfs entirely and
switch to mounting RBDs and exporting them with samba instead.

we have good experiences with RBDs on other clusters. using RBDs that
way is quite painful when you know that cephfs exists, it's slower, and
not really HA anymore, but it's overall more reliable than cephfs.)
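
for the curious, the RBD-based setup is roughly along these lines (a
minimal sketch with example pool/image names and sizes; the samba share
definition on top of the mounted filesystem is omitted):

  # create and map an RBD image (pool/image names are examples; depending
  # on the kernel, some image features may need to be disabled for krbd)
  rbd create --size 10T rbd/shares
  rbd map rbd/shares            # returns e.g. /dev/rbd0

  # put a regular filesystem on it and mount it on the samba gateway
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /srv/shares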

as much as I like ceph, I unfortunately can't say the same for cephfs :(

Regards,
Daniel


Re: [ceph-users] Mimic 13.2.3?

2019-01-04 Thread Daniel Baumann
On 01/04/2019 05:07 PM, Matthew Vernon wrote:
> how is it still the case that packages are being pushed onto the official 
> ceph.com repos that people
> shouldn't install?

We're still on 12.2.5 because of this. Basically every 12.2.x release
after that came with notes on the mailing list like "don't use, wait for ...".

I don't dare upgrade to 13.2.

For the 10.2.x and 11.2.x cycles, we upgraded our production cluster
within a matter of days after an update was released. Since the second
half of the 12.2.x releases, this no longer seems possible.

Ceph is great and all, but this decline in release quality seriously
harms the image and perception of Ceph as a stable software platform in
enterprise environments and makes people do the wrong things (letting
systems rot update-wise for the sake of stability).

Regards,
Daniel


Re: [ceph-users] Problem with CephFS

2018-11-21 Thread Daniel Baumann
Hi,

On 11/21/2018 07:04 PM, Rodrigo Embeita wrote:
>             Reduced data availability: 7 pgs inactive, 7 pgs down

this is your first problem: unless all data is available again, cephfs
will not come back.

after that, I would take care of redundancy next and get the one missing
monitor back online.

once that is done, get the mds working again and your cephfs should be
back in service.
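
a minimal sketch of the commands I would start with to find out why those
PGs are down (no cluster-specific values assumed):

  ceph health detail              # lists the affected PG ids
  ceph pg dump_stuck inactive     # shows stuck/inactive PGs and their OSDs
  ceph pg <pgid> query            # explains why a specific PG is down
  ceph osd tree                   # shows which OSDs are down/out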

if you encounter problems with any of the steps, send all the necessary
commands and outputs to the list and I (or others) can try to help.

Regards,
Daniel


Re: [ceph-users] Mimic and Debian 9

2018-10-17 Thread Daniel Baumann
Hi,

On 10/17/2018 04:04 PM, John Spray wrote:
> If there isn't anything
> too hacky involved in the build perhaps your packages could simply be
> the official ones?

being a Debian Developer, I can upload the backports that I maintain/use
at work to e.g. people.debian.org/~daniel or similar. Given time
constraints, I can't do it right now, but I can by the end of the month.

Regards,
Daniel


Re: [ceph-users] ls operation is too slow in cephfs

2018-07-17 Thread Daniel Baumann
On 07/17/2018 11:43 AM, Marc Roos wrote:
> I had similar thing with doing the ls. Increasing the cache limit helped 
> with our test cluster

same here; additionally, we also had to use more than one MDS to get good
performance (currently 3 active MDS plus 2 standby per FS).
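
for reference, a sketch of those two knobs as they look on luminous (the
filesystem name "cephfs" and the values are just examples; pre-luminous
releases use the inode-count based mds_cache_size instead):

  # raise the MDS cache limit (luminous uses a memory-based limit)
  ceph tell mds.* injectargs '--mds_cache_memory_limit=17179869184'   # 16 GiB

  # allow multiple active MDS ranks and raise the count
  ceph fs set cephfs allow_multimds true
  ceph fs set cephfs max_mds 3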

Regards,
Daniel


Re: [ceph-users] fuse vs kernel client

2018-07-09 Thread Daniel Baumann
On 07/09/2018 10:18 AM, Manuel Sopena Ballesteros wrote:
> FUSE is supposed to run slower.

in our tests with ceph 11.2.x and 12.2.x clusters, cephfs-fuse is always
around 10 times slower than kernel cephfs.
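
for anyone who wants to compare on their own cluster, the two mount
variants look roughly like this (monitor address, credentials and paths
are placeholders):

  # kernel client
  mount -t ceph mon1:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret

  # FUSE client
  ceph-fuse -m mon1:6789 /mnt/cephfs-fuse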

Regards,
Daniel


Re: [ceph-users] samba gateway experiences with cephfs ?

2018-05-24 Thread Daniel Baumann
Hi,

On 05/24/2018 02:53 PM, David Disseldorp wrote:
>> [ceph_test]
>> path = /ceph-kernel
>> guest ok = no
>> delete readonly = yes
>> oplocks = yes
>> posix locking = no

jftr, we use the following to disable all locking (on samba 4.8.2):

  oplocks = False
  level2 oplocks = False
  kernel oplocks = no

Regards,
Daniel


Re: [ceph-users] Ceph replication factor of 2

2018-05-24 Thread Daniel Baumann
Hi,

I couldn't agree more, but just to re-emphasize what others have already said:

  the point of replica 3 is not to have extra safety for
  (human|software|server) failures, but to have enough data around to
  allow rebalancing the cluster when disks fail.

after a certain number of disks in a cluster, you're going to get disk
failures all the time. if you don't pay extra attention (and waste lots
and lots of time/money) to carefully arrange/choose disks from different
vendor production lines/dates, simultaneous disk failures can happen
within minutes.


example from our past:

on our (at that time small) cluster of 72 disks spread over 6 storage
nodes, half of them were Seagate Enterprise Capacity disks, the other
half Western Digital Red Pro. for each disk manufacturer, we bought only
half of the disks from the same production batch. so we had:

  * 18 disks wd, production batch A
  * 18 disks wd, production batch B
  * 18 disks seagate, production batch C
  * 18 disks seagate, production batch D

one day, 6 disks failed simultaneously, spread over two storage nodes.
had we had replica 2, we couldn't have recovered and would have lost
data. instead, because of replica 3, we didn't lose any data and ceph
automatically rebalanced everything before further disks failed.


so: if the data stored on the cluster is valuable (because it costs a
lot of time and effort to 're-collect' it, or you can't accept the time
it takes to restore from backup, or, worse, to re-create it from
scratch), you have to assume that whatever manufacturer/production batch
of HDDs you're using, they *can* all fail at the same time because you
could have hit a faulty batch.

the only way out here is replica >=3.

(of course, the whole MTBF discussion and why RAID doesn't scale apply here as well)
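
for completeness, a sketch of how the replica count is set per pool (the
pool name is a placeholder; min_size 2 is a common companion setting,
adjust it to your own failure-domain reasoning):

  ceph osd pool set <pool> size 3       # keep three replicas
  ceph osd pool set <pool> min_size 2   # still serve I/O with two of them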

Regards,
Daniel


Re: [ceph-users] samba gateway experiences with cephfs ?

2018-05-21 Thread Daniel Baumann
Hi

On 05/21/2018 05:38 PM, Jake Grimmett wrote:
> Unfortunately we have a large number (~200) of Windows and Macs clients
> which need CIFS/SMB  access to cephfs.

we do too, which is why we're (partially) exporting cephfs over samba as
well; it has been in production for 1.5 years now.

for us, cephfs-over-samba is also significantly slower than cephfs
directly, but it's not really an issue here (basically, if people use a
windows client here, they're already on the slow track anyway).

we had to do two things to get it working reliably though:

a) disable all locking in samba (otherwise "opportunistic locking" from
windows clients killed all MDS within hours (kraken at that time))

b) only allow writes to a specific space on cephfs, reserved for samba
(with luminous; otherwise we'd have problems with data consistency on
cephfs when people write the same files from linux->cephfs and
samba->cephfs concurrently). my hunch is that samba caches writes and
doesn't write them back appropriately.

> Finally, is the vfs_ceph module for Samba useful? It doesn't seem to be
> widely available pre-complied for for RHEL derivatives. Can anyone
> comment on their experiences using vfs_ceph, or point me to a Centos 7.x
> repo that has it?

we use debian, with a backported kernel and backported samba, which has
vfs_ceph pre-compiled. however, we couldn't make vfs_ceph work at all -
the snapshot patterns just don't seem to match/align (and nothing we
tried seemed to work).

Regards,
Daniel


Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-18 Thread Daniel Baumann
On 05/19/2018 01:13 AM, Webert de Souza Lima wrote:
> New question: will it make any difference in the balancing if instead of
> having the MAIL directory in the root of cephfs and the domains'
> subtrees inside it, I discard the parent dir and put all the subtrees
> right in cephfs root?

the balancing between the MDSs is influenced by which directories are
accessed; the currently accessed directory trees are divided between the
MDSs (also check the dirfrag option in the docs). assuming you have the
same access pattern, the "fragmentation" between the MDSs happens at
these "target directories", so it doesn't matter whether these
directories are further up or down in the same filesystem tree.
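
as an aside: if the automatic balancing ever does something unwanted, a
subtree can also be pinned manually to a rank. a minimal sketch on a
kernel-mounted cephfs, with a hypothetical mount path:

  # pin /mnt/cephfs/MAIL (and everything below it) to MDS rank 1
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/MAIL

  # -1 removes the pin again (back to automatic balancing)
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/MAIL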

in the multi-MDS scenario where the MDS serving rank 0 fails, the
effects at the moment of failure are the same for any cephfs client
accessing a directory/file (as described in an earlier mail), regardless
of which level the directory/file is at within the filesystem.

Regards,
Daniel


Re: [ceph-users] (yet another) multi active mds advise needed

2018-05-18 Thread Daniel Baumann
On 05/18/2018 11:19 PM, Patrick Donnelly wrote:
> So, you would want to have a standby-replay
> daemon for each rank or just have normal standbys. It will likely
> depend on the size of your MDS (cache size) and available hardware.

jftr, having 3 active mds and 3 standby-replay resulted in a longer
downtime for us in May 2017 due to http://tracker.ceph.com/issues/21749

(http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/thread.html#21390
- thanks again for the help back then, still much appreciated)

we're not using standby-replay MDSs anymore but only "normal" standbys,
and haven't had any problems since (we were running kraken then, upgraded
to luminous last fall).

Regards,
Daniel


Re: [ceph-users] Multi-MDS Failover

2018-04-30 Thread Daniel Baumann
On 04/27/2018 07:11 PM, Patrick Donnelly wrote:
> The answer is that there may be partial availability from
> the up:active ranks which may hand out capabilities for the subtrees
> they manage or no availability if that's not possible because it
> cannot obtain the necessary locks.

additionally: if rank 0 is lost, the whole FS stands still (no new
client can mount the fs; no existing client can change a directory, etc.).

my guess is that the root of a cephfs (/, which is always served by rank
0) is needed in order to do traversals/lookups of any directories at the
top level (which can then be served by ranks 1-n).


last year, we had quite some trouble with an unstable cephfs (MDS
reliably and reproducibly crashing when hitting them with rsync over
multi-TB directories with files all being <<1 MB) and had lots of
situations where ranks (most of the time including 0) were down.

fortunately we could always get the fs back by unmounting it on all
clients and restarting all mds. the last of these instabilities seems to
have gone away with 12.2.3/12.2.4 (we're now running 12.2.5).

Regards,
Daniel


Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-26 Thread Daniel Baumann
ceph is a cluster - so reboots aren't an issue (we do set noout during a
planned serial reboot of all machines of the cluster).
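
a minimal sketch of how such a planned serial reboot can look (hostnames
are placeholders; a manual health check between nodes keeps the sketch
simple):

  ceph osd set noout                 # keep OSDs from being marked out

  for host in node1 node2 node3; do
      ssh "$host" reboot
      # wait until "ceph -s" shows all OSDs up and PGs active+clean
      # before moving on to the next node
      read -r -p "continue with the next node? "
  done

  ceph osd unset noout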

personally i don't think the hassle of live patching is worth it. it's a
very gross hack that only works well in very specific niche cases. ceph
(as every proper cluster) is imho not such a use case.

Regards,
Daniel


Re: [ceph-users] Ubuntu 17.10 or Debian 9.3 + Luminous = random OS hang ?

2018-01-19 Thread Daniel Baumann
Hi,

On 01/19/18 14:46, Youzhong Yang wrote:
> Just wondering if anyone has seen the same issue, or it's just me.

we're using debian with our own backported kernels and ceph; it works
rock solid.

what you're describing sounds more like a hardware issue to me. if you
don't fully "trust"/have confidence in your hardware (and your logs
don't reveal anything), I'd recommend running some burn-in tests
(memtest, cpuburn, etc.) for 24 hours per machine to rule out
cpu/ram/etc. issues.
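
a sketch of such a burn-in run, assuming stress-ng and memtester are
available on the machine (tool choice, sizes and durations are only
suggestions):

  # exercise all CPUs and some memory for 24 hours
  stress-ng --cpu 0 --vm 2 --vm-bytes 75% --timeout 24h --metrics-brief

  # additionally test a chunk of RAM explicitly (size/iterations are examples)
  memtester 4G 3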

Regards,
Daniel


Re: [ceph-users] CephFS log jam prevention

2017-12-05 Thread Daniel Baumann
Hi,

On 12/05/17 17:58, Dan Jakubiec wrote:
> Is this is configuration problem or a bug?

we had massive problems with both kraken (feb-sept 2017) and luminous
(12.2.0), seeing the same behaviour as you. ceph.conf contained defaults
only, except that we had to crank up mds_cache_size and
mds_bal_fragment_size_max.

using dirfrag and multi-mds did not change anything. even with luminous
(12.2.0), basically a single rsync over a large directory tree could kill
cephfs for all clients within seconds, and even a waiting period of >8
hours did not help.

since the cluster was semi-production, we couldn't take the downtime, so
we resorted to unmounting cephfs everywhere, flushing the journal, and
re-mounting it.

interestingly, with 12.2.1 on kernel 4.13 this doesn't occur anymore
(the 'mds lagging behind' still happens, but it recovers quickly within
minutes, and the rsync does not need to be aborted).

I'm not sure whether 12.2.1 fixed it itself, or whether it was your
config changes happening at the same time:

mds_session_autoclose = 10
mds_reconnect_timeout = 10

mds_blacklist_interval = 10
mds_session_blacklist_on_timeout = false
mds_session_blacklist_on_evict = false

Regards,
Daniel


Re: [ceph-users] ceph-disk is now deprecated

2017-11-30 Thread Daniel Baumann
On 11/30/17 14:04, Fabian Grünbichler wrote:
> point is - you should not purposefully attempt to annoy users and/or
> downstreams by changing behaviour in the middle of an LTS release cycle,

exactly. upgrading the patch level (x.y.z to x.y.z+1) should imho never
introduce a behaviour change, regardless of whether it's "just" adding
new warnings or not.

this is a stable update we're talking about, even more so since it's an
LTS release. you never know how people use stuff (e.g. by parsing stupid
things), so such a behaviour change will break things for *some* people
(granted, most likely a really low number).

my expectation of a stable release is that it stays, literally, stable.
that's the whole point of having it in the first place. otherwise we
would all be running git snapshots and updating randomly to newer ones.

adding deprecation messages in mimic makes sense, and getting rid of
it/not providing support for it in mimic+1 is reasonable.

Regards,
Daniel


Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-29 Thread Daniel Baumann
On 11/29/17 00:06, Nigel Williams wrote:
> Are there opinions on how stable multiple filesystems per single Ceph
> cluster are in practice?

we've been using a single cephfs in production since february, and
switched to three cephfs in september - without any problems so far
(running 12.2.1).

the workload is a backend for smb, hpc number crunching, and running
generic linux containers on it.

Regards,
Daniel


Re: [ceph-users] CephFS - Mounting a second Ceph file system

2017-11-28 Thread Daniel Baumann
On 11/28/17 15:09, Geoffrey Rhodes wrote:
> I'd like to run more than one Ceph file system in the same cluster.
> Can anybody point me in the right direction to explain how to mount the
> second file system?

if you use the kernel client, you can use the mds_namespace option, i.e.:

  mount -t ceph $monitor_address:/ -o mds_namespace=$fsname \
  /mnt/$your_mountpoint
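
so with two filesystems named e.g. "cephfs_a" and "cephfs_b" (hypothetical
names), mounting both on the same client would look like:

  mount -t ceph $monitor_address:/ -o mds_namespace=cephfs_a /mnt/cephfs_a
  mount -t ceph $monitor_address:/ -o mds_namespace=cephfs_b /mnt/cephfs_b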

Regards,
Daniel


Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
On 10/10/2017 02:10 PM, John Spray wrote:
> Yes.

that worked, rank 6 is back and cephfs is up again. thank you very much.

> Do a final ls to make sure you got all of them -- it is
> dangerous to leave any fragments behind.

will do.

> BTW opened http://tracker.ceph.com/issues/21749 for the underlying bug.

thanks; I've saved all the logs, so I'm happy to provide anything you need.

Regards,
Daniel


Re: [ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
Hi John,

thank you very much for your help.

On 10/10/2017 12:57 PM, John Spray wrote:
>  A) Do a "rados -p  ls | grep "^506\." or similar, to
> get a list of the objects

done, gives me these:

  506.
  506.0017
  506.001b
  506.0019
  506.001a
  506.001c
  506.0018
  506.0016
  506.001e
  506.001f
  506.001d

>  B) Write a short bash loop to do a "rados -p  get" on
> each of those objects into a file.

done, saved each of them with the object name as the filename, resulting
in these 11 files:

   90 Oct 10 13:17 506.
 4.0M Oct 10 13:17 506.0016
 4.0M Oct 10 13:17 506.0017
 4.0M Oct 10 13:17 506.0018
 4.0M Oct 10 13:17 506.0019
 4.0M Oct 10 13:17 506.001a
 4.0M Oct 10 13:17 506.001b
 4.0M Oct 10 13:17 506.001c
 4.0M Oct 10 13:17 506.001d
 4.0M Oct 10 13:17 506.001e
 4.0M Oct 10 13:17 506.001f
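
(for reference, a minimal sketch of how the fetch loop from B) can look,
with the metadata pool name left as a placeholder as in the quoted
command:)

  for obj in $(rados -p <metadata-pool> ls | grep '^506\.'); do
      rados -p <metadata-pool> get "$obj" "$obj"
  done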

>  C) Stop the MDS, set "debug mds = 20" and "debug journaler = 20",
> mark the rank repaired, start the MDS again, and then gather the
> resulting log (it should end in the same "Error -22 recovering
> write_pos", but have much much more detail about what came before).

I've attached the entire log from right before issuing "repaired" until
after the mds drops back to standby.

> Because you've hit a serious bug, it's really important to gather all
> this and share it, so that we can try to fix it and prevent it
> happening again to you or others.

absolutely, sure. If you need anything more, I'm happy to share.

> You have two options, depending on how much downtime you can tolerate:
>  - carefully remove all the metadata objects that start with 506. --

given the outage (and people need access to their data), I'd go with
this. Just to be safe: would that go like this?

  rados -p  rm 506.
  rados -p  rm 506.0016
  [...]
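
(or, as a loop over the same listing - again with the pool name as a
placeholder, and only once the objects have been backed up as above:)

  for obj in $(rados -p <metadata-pool> ls | grep '^506\.'); do
      rados -p <metadata-pool> rm "$obj"
  done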

Regards,
Daniel
2017-10-10 13:21:55.413752 7f3f3011a700  5 mds.mds9 handle_mds_map epoch 96224 
from mon.0
2017-10-10 13:21:55.413836 7f3f3011a700 10 mds.mds9  my compat 
compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file 
layout v2}
2017-10-10 13:21:55.413847 7f3f3011a700 10 mds.mds9  mdsmap compat 
compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 13:21:55.413852 7f3f3011a700 10 mds.mds9 map says I am 
147.87.226.189:6800/1634944095 mds.6.96224 state up:replay
2017-10-10 13:21:55.414088 7f3f3011a700  4 mds.6.purge_queue operator():  data 
pool 7 not found in OSDMap
2017-10-10 13:21:55.414141 7f3f3011a700 10 mds.mds9 handle_mds_map: 
initializing MDS rank 6
2017-10-10 13:21:55.414410 7f3f3011a700 10 mds.6.0 update_log_config 
log_to_monitors {default=true}
2017-10-10 13:21:55.414415 7f3f3011a700 10 mds.6.0 create_logger
2017-10-10 13:21:55.414635 7f3f3011a700  7 mds.6.server operator(): full = 0 
epoch = 0
2017-10-10 13:21:55.414644 7f3f3011a700  4 mds.6.purge_queue operator():  data 
pool 7 not found in OSDMap
2017-10-10 13:21:55.414648 7f3f3011a700  4 mds.6.0 handle_osd_map epoch 0, 0 
new blacklist entries
2017-10-10 13:21:55.414660 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.414830 7f3f3011a700 10 mds.mds9 handle_mds_map: handling 
map as rank 6
2017-10-10 13:21:55.414839 7f3f3011a700  1 mds.6.96224 handle_mds_map i am now 
mds.6.96224
2017-10-10 13:21:55.414843 7f3f3011a700  1 mds.6.96224 handle_mds_map state 
change up:boot --> up:replay
2017-10-10 13:21:55.414855 7f3f3011a700 10 mds.beacon.mds9 set_want_state: 
up:standby -> up:replay
2017-10-10 13:21:55.414859 7f3f3011a700  1 mds.6.96224 replay_start
2017-10-10 13:21:55.414873 7f3f3011a700  7 mds.6.cache set_recovery_set 
0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414883 7f3f3011a700  1 mds.6.96224  recovery set is 
0,1,2,3,4,5,7,8
2017-10-10 13:21:55.414893 7f3f3011a700  1 mds.6.96224  waiting for osdmap 
18607 (which blacklists prior instance)
2017-10-10 13:21:55.414901 7f3f3011a700  4 mds.6.purge_queue operator():  data 
pool 7 not found in OSDMap
2017-10-10 13:21:55.416011 7f3f3011a700  7 mds.6.server operator(): full = 0 
epoch = 18608
2017-10-10 13:21:55.416024 7f3f3011a700  4 mds.6.96224 handle_osd_map epoch 
18608, 0 new blacklist entries
2017-10-10 13:21:55.416027 7f3f3011a700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 13:21:55.416076 7f3f2a10e700 10 MDSIOContextBase::complete: 
12C_IO_Wrapper
2017-10-10 13:21:55.416095 7f3f2a10e700 10 MDSInternalContextBase::complete: 
15C_MDS_BootStart
2017-10-10 13:21:55.416101 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening 
inotable
2017-10-10 13:21:55.416120 7f3f2a10e700 10 mds.6.inotable: load
2017-10-10 13:21:55.416301 7f3f2a10e700  2 mds.6.96224 boot_start 0: opening 
sessionmap
2017-10-10 13:21:55.416310 

[ceph-users] how to debug (in order to repair) damaged MDS (rank)?

2017-10-10 Thread Daniel Baumann
Hi all,

unfortunately I'm still struggling to bring cephfs back up after one of
the MDS ranks has been marked "damaged" (see my messages from monday).

1. When I mark the rank as "repaired", this is what I get in the monitor
   log (leaving unrelated leveldb compacting chatter aside):

2017-10-10 10:51:23.177865 7f3290710700  0 log_channel(audit) log [INF]
: from='client.? 147.87.226.72:0/1658479115' entity='client.admin' cmd
='[{"prefix": "mds repaired", "rank": "6"}]': finished
2017-10-10 10:51:23.177993 7f3290710700  0 log_channel(cluster) log
[DBG] : fsmap cephfs-9/9/9 up  {0=mds1=up:resolve,1=mds2=up:resolve,2=mds3
=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,6=mds9=up:replay,7=mds7=up:resolve,8=mds8=up:resolve}
[...]

2017-10-10 10:51:23.492040 7f328ab1c700  1 mon.mon1@0(leader).mds e96186
 mds mds.? 147.87.226.189:6800/524543767 can't write to fsmap compat=
{},rocompat={},incompat={1=base v0.20,2=client writeable
ranges,3=default file layouts on dirs,4=dir inode in separate
object,5=mds uses versi
oned encoding,6=dirfrag is stored in omap,8=file layout v2}
[...]

2017-10-10 10:51:24.291827 7f328d321700 -1 log_channel(cluster) log
[ERR] : Health check failed: 1 mds daemon damaged (MDS_DAMAGE)

2. ...and this is what I get on the mds:

2017-10-10 11:21:26.537204 7fcb01702700 -1 mds.6.journaler.pq(ro)
_decode error from assimilate_prefetch
2017-10-10 11:21:26.537223 7fcb01702700 -1 mds.6.purge_queue _recover:
Error -22 recovering write_pos

(see attachment for the full mds log during the "repair" action)


I'm really stuck here and would greatly appreciate any help. How can I
see what is actually going on / what the problem is? Running
ceph-mon/ceph-mds with higher debug levels just logs "damaged" as quoted
above, but doesn't tell me what is wrong or why it's failing.

would going back to single MDS with "ceph fs reset" allow me to access
the data again?

Regards,
Daniel
2017-10-10 11:21:26.419394 7fcb0670c700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=de
fault file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
2017-10-10 11:21:26.419399 7fcb0670c700 10 mds.mds9 map says I am 147.87.226.189:6800/1182896077 mds.6.96195 state up:replay
2017-10-10 11:21:26.419623 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.419679 7fcb0670c700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-10 11:21:26.419916 7fcb0670c700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-10 11:21:26.419920 7fcb0670c700 10 mds.6.0 create_logger
2017-10-10 11:21:26.420138 7fcb0670c700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-10 11:21:26.420146 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.420150 7fcb0670c700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-10 11:21:26.420159 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 11:21:26.420338 7fcb0670c700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-10 11:21:26.420347 7fcb0670c700  1 mds.6.96195 handle_mds_map i am now mds.6.96195
2017-10-10 11:21:26.420351 7fcb0670c700  1 mds.6.96195 handle_mds_map state change up:boot --> up:replay
2017-10-10 11:21:26.420366 7fcb0670c700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-10 11:21:26.420370 7fcb0670c700  1 mds.6.96195 replay_start
2017-10-10 11:21:26.420375 7fcb0670c700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-10 11:21:26.420380 7fcb0670c700  1 mds.6.96195  recovery set is 0,1,2,3,4,5,7,8
2017-10-10 11:21:26.420395 7fcb0670c700  1 mds.6.96195  waiting for osdmap 18593 (which blacklists prior instance)
2017-10-10 11:21:26.420401 7fcb0670c700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-10 11:21:26.421206 7fcb0670c700  7 mds.6.server operator(): full = 0 epoch = 18593
2017-10-10 11:21:26.421217 7fcb0670c700  4 mds.6.96195 handle_osd_map epoch 18593, 0 new blacklist entries
2017-10-10 11:21:26.421220 7fcb0670c700 10 mds.6.server apply_blacklist: killed 0
2017-10-10 11:21:26.421253 7fcb00700700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-10 11:21:26.421263 7fcb00700700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-10 11:21:26.421267 7fcb00700700  2 mds.6.96195 boot_start 0: opening inotable
2017-10-10 11:21:26.421285 7fcb00700700 10 mds.6.inotable: load
2017-10-10 11:21:26.421441 7fcb00700700  2 mds.6.96195 boot_start 0: opening sessionmap
2017-10-10 11:21:26.421449 7fcb00700700 10 mds.6.sessionmap load
2017-10-10 11:21:26.421551 7fcb00700700  2 mds.6.96195 boot_start 0: opening mds log
2017-10-10 11:21:26.421558 7fcb00700700  5 mds.6.log open discovering log bounds
2017-10-10 11:21:26.421720 7fcaff6fe700 10 mds.6.log _submit_thread start
2017-10-10 11:21:26.423002 7fcb00700700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE

Re: [ceph-users] cephfs: how to repair damaged mds rank?

2017-10-09 Thread Daniel Baumann
Hi John,

On 10/09/2017 10:47 AM, John Spray wrote:
> When a rank is "damaged", that means the MDS rank is blocked from
> starting because Ceph thinks the on-disk metadata is damaged -- no
> amount of restarting things will help.

thanks.

> The place to start with the investigation is to find the source of the
> damage.  Look in your monitor log for "marking rank 6 damaged"

I found this in the mon log:

  2017-10-09 03:24:28.207424 7f3290710700  0 log_channel(cluster) log
 [DBG] : mds.6 147.87.226.187:6800/1120166215 down:damaged

so at the time it was marked damaged, rank 6 was running on mds7.

> and then look in your MDS logs at that timestamp (find the MDS that held
> rank 6 at the time).

looking at mds7 log for that timespan, I think I understand that:

  * at "early" 03:24, mds7 was serving rank 5 and crashed, restarted
automatically twice, and then picked up rank 6 at 03:24:21.

  * at 03:24:21, mds7 got rank 6 and got into 'standby'-mode(?):

2017-10-09 03:24:21.598446 7f70ca01c240  0 set uid:gid to 64045:64045
(ceph:ceph)
2017-10-09 03:24:21.598469 7f70ca01c240  0 ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 1337
2017-10-09 03:24:21.601958 7f70ca01c240  0 pidfile_write: ignore empty
--pid-file
2017-10-09 03:24:26.108545 7f70c2580700  1 mds.mds7 handle_mds_map standby
2017-10-09 03:24:26.115469 7f70c2580700  1 mds.6.95474 handle_mds_map i
am now mds.6.95474
2017-10-09 03:24:26.115479 7f70c2580700  1 mds.6.95474 handle_mds_map
state change up:boot --> up:replay
2017-10-09 03:24:26.115493 7f70c2580700  1 mds.6.95474 replay_start
2017-10-09 03:24:26.115502 7f70c2580700  1 mds.6.95474  recovery set is
0,1,2,3,4,5,7,8
2017-10-09 03:24:26.115511 7f70c2580700  1 mds.6.95474  waiting for
osdmap 18284 (which blacklists prior instance)
2017-10-09 03:24:26.536629 7f70bc574700  0 mds.6.cache creating system
inode with ino:0x106
2017-10-09 03:24:26.537009 7f70bc574700  0 mds.6.cache creating system
inode with ino:0x1
2017-10-09 03:24:27.233759 7f70bd576700 -1 mds.6.journaler.pq(ro)
_decode error from assimilate_prefetch
2017-10-09 03:24:27.233780 7f70bd576700 -1 mds.6.purge_queue _recover:
Error -22 recovering write_pos
2017-10-09 03:24:27.238820 7f70bd576700  1 mds.mds7 respawn
2017-10-09 03:24:27.238828 7f70bd576700  1 mds.mds7  e: '/usr/bin/ceph-mds'
2017-10-09 03:24:27.238831 7f70bd576700  1 mds.mds7  0: '/usr/bin/ceph-mds'
2017-10-09 03:24:27.238833 7f70bd576700  1 mds.mds7  1: '-f'
2017-10-09 03:24:27.238835 7f70bd576700  1 mds.mds7  2: '--cluster'
2017-10-09 03:24:27.238836 7f70bd576700  1 mds.mds7  3: 'ceph'
2017-10-09 03:24:27.238838 7f70bd576700  1 mds.mds7  4: '--id'
2017-10-09 03:24:27.238839 7f70bd576700  1 mds.mds7  5: 'mds7'
2017-10-09 03:24:27.239567 7f70bd576700  1 mds.mds7  6: '--setuser'
2017-10-09 03:24:27.239579 7f70bd576700  1 mds.mds7  7: 'ceph'
2017-10-09 03:24:27.239580 7f70bd576700  1 mds.mds7  8: '--setgroup'
2017-10-09 03:24:27.239581 7f70bd576700  1 mds.mds7  9: 'ceph'
2017-10-09 03:24:27.239612 7f70bd576700  1 mds.mds7 respawning with exe
/usr/bin/ceph-mds
2017-10-09 03:24:27.239614 7f70bd576700  1 mds.mds7  exe_path /proc/self/exe
2017-10-09 03:24:27.268448 7f9c7eafa240  0 ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 1337
2017-10-09 03:24:27.271987 7f9c7eafa240  0 pidfile_write: ignore empty
--pid-file
2017-10-09 03:24:31.325891 7f9c7789c700  1 mds.mds7 handle_mds_map standby
2017-10-09 03:24:31.332376 7f9c7789c700  1 mds.1.0 handle_mds_map i am
now mds.28178286.0 replaying mds.1.0
2017-10-09 03:24:31.332388 7f9c7789c700  1 mds.1.0 handle_mds_map state
change up:boot --> up:standby-replay
2017-10-09 03:24:31.332401 7f9c7789c700  1 mds.1.0 replay_start
2017-10-09 03:24:31.332410 7f9c7789c700  1 mds.1.0  recovery set is
0,2,3,4,5,6,7,8
2017-10-09 03:24:31.332425 7f9c7789c700  1 mds.1.0  waiting for osdmap
18285 (which blacklists prior instance)
2017-10-09 03:24:31.351850 7f9c7108f700  0 mds.1.cache creating system
inode with ino:0x101
2017-10-09 03:24:31.352204 7f9c7108f700  0 mds.1.cache creating system
inode with ino:0x1
2017-10-09 03:24:32.144505 7f9c7008d700  0 mds.1.cache creating system
inode with ino:0x100
2017-10-09 03:24:32.144671 7f9c7008d700  1 mds.1.0 replay_done (as standby)
2017-10-09 03:24:33.150117 7f9c71890700  1 mds.1.0 replay_done (as standby)

after that, the last line repeats unchanged every second for about two
hours.

where can I go from here? is there anything further I can do?

also, just in case it matters: it seems that at the time of the crash a
large 'rm -rf' (i.e. over a lot of small files) was running (all clients
use kernel 4.13.4 to mount the cephfs, not fuse).

Regards,
Daniel


Re: [ceph-users] cephfs: how to repair damaged mds rank?

2017-10-09 Thread Daniel Baumann
On 10/09/2017 09:17 AM, Daniel Baumann wrote:
> The relevant portion from the ceph-mds log (when starting mds9 which
> should then take up rank 6; I'm happy to provide any logs):

I've turned up the logging (see attachment)... could it be that we hit
this bug?

http://tracker.ceph.com/issues/17670

Regards,
Daniel
2017-10-09 10:07:14.677308 7f7972bd6700 10 mds.beacon.mds9 handle_mds_beacon up:standby seq 6 rtt 0.000642
2017-10-09 10:07:15.547453 7f7972bd6700  5 mds.mds9 handle_mds_map epoch 96022 from mon.0
2017-10-09 10:07:15.547526 7f7972bd6700 10 mds.mds9  my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om
ap,7=mds uses inline data,8=file layout v2}
2017-10-09 10:07:15.547546 7f7972bd6700 10 mds.mds9  mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in om
ap,8=file layout v2}
2017-10-09 10:07:15.547555 7f7972bd6700 10 mds.mds9 map says I am 147.87.226.189:6800/6621615 mds.6.96022 state up:replay
2017-10-09 10:07:15.547825 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.547882 7f7972bd6700 10 mds.mds9 handle_mds_map: initializing MDS rank 6
2017-10-09 10:07:15.548165 7f7972bd6700 10 mds.6.0 update_log_config log_to_monitors {default=true}
2017-10-09 10:07:15.548171 7f7972bd6700 10 mds.6.0 create_logger
2017-10-09 10:07:15.548410 7f7972bd6700  7 mds.6.server operator(): full = 0 epoch = 0
2017-10-09 10:07:15.548423 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.548427 7f7972bd6700  4 mds.6.0 handle_osd_map epoch 0, 0 new blacklist entries
2017-10-09 10:07:15.548439 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0
2017-10-09 10:07:15.548634 7f7972bd6700 10 mds.mds9 handle_mds_map: handling map as rank 6
2017-10-09 10:07:15.548647 7f7972bd6700  1 mds.6.96022 handle_mds_map i am now mds.6.96022
2017-10-09 10:07:15.548650 7f7972bd6700  1 mds.6.96022 handle_mds_map state change up:boot --> up:replay
2017-10-09 10:07:15.548668 7f7972bd6700 10 mds.beacon.mds9 set_want_state: up:standby -> up:replay
2017-10-09 10:07:15.548687 7f7972bd6700  1 mds.6.96022 replay_start
2017-10-09 10:07:15.548699 7f7972bd6700  7 mds.6.cache set_recovery_set 0,1,2,3,4,5,7,8
2017-10-09 10:07:15.548706 7f7972bd6700  1 mds.6.96022  recovery set is 0,1,2,3,4,5,7,8
2017-10-09 10:07:15.548720 7f7972bd6700  1 mds.6.96022  waiting for osdmap 18484 (which blacklists prior instance)
2017-10-09 10:07:15.548737 7f7972bd6700  4 mds.6.purge_queue operator():  data pool 7 not found in OSDMap
2017-10-09 10:07:15.549521 7f7972bd6700  7 mds.6.server operator(): full = 0 epoch = 18492
2017-10-09 10:07:15.549534 7f7972bd6700  4 mds.6.96022 handle_osd_map epoch 18492, 0 new blacklist entries
2017-10-09 10:07:15.549537 7f7972bd6700 10 mds.6.server apply_blacklist: killed 0
2017-10-09 10:07:15.549582 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_Wrapper
2017-10-09 10:07:15.549679 7f796cbca700 10 MDSInternalContextBase::complete: 15C_MDS_BootStart
2017-10-09 10:07:15.549685 7f796cbca700  2 mds.6.96022 boot_start 0: opening inotable
2017-10-09 10:07:15.549695 7f796cbca700 10 mds.6.inotable: load
2017-10-09 10:07:15.549880 7f796cbca700  2 mds.6.96022 boot_start 0: opening sessionmap
2017-10-09 10:07:15.549888 7f796cbca700 10 mds.6.sessionmap load
2017-10-09 10:07:15.549977 7f796cbca700  2 mds.6.96022 boot_start 0: opening mds log
2017-10-09 10:07:15.549984 7f796cbca700  5 mds.6.log open discovering log bounds
2017-10-09 10:07:15.550113 7f796c3c9700  4 mds.6.journalpointer Reading journal pointer '406.'
2017-10-09 10:07:15.550132 7f796bbc8700 10 mds.6.log _submit_thread start
2017-10-09 10:07:15.551165 7f796cbca700 10 MDSIOContextBase::complete: 12C_IO_MT_Load
2017-10-09 10:07:15.551178 7f796cbca700 10 mds.6.inotable: load_2 got 34 bytes
2017-10-09 10:07:15.551184 7f796cbca700 10 mds.6.inotable: load_2 loaded v1
2017-10-09 10:07:15.565382 7f796cbca700 10 MDSIOContextBase::complete: N12_GLOBAL__N_112C_IO_SM_LoadE
2017-10-09 10:07:15.565397 7f796cbca700 10 mds.6.sessionmap _load_finish loaded version 0
2017-10-09 10:07:15.565401 7f796cbca700 10 mds.6.sessionmap _load_finish: omap load complete
2017-10-09 10:07:15.565403 7f796cbca700 10 mds.6.sessionmap _load_finish: v 0, 0 sessions
2017-10-09 10:07:15.565408 7f796cbca700 10 mds.6.sessionmap dump
2017-10-09 10:07:15.583721 7f796c3c9700  1 mds.6.journaler.mdlog(ro) recover start
2017-10-09 10:07:15.583732 7f796c3c9700  1 mds.6.journaler.mdlog(ro) read_head
2017-10-09 10:07:15.583854 7f796c3c9700  4 mds.6.log Waiting for journal 0x206 to recover...
2017-10-09 10:07:15.796523 7f796cbca700  1 mds.6.journaler.mdlog(ro) _finish_read_head loghead(trim 25992101888, expire 25992101888, write

[ceph-users] cephfs: how to repair damaged mds rank?

2017-10-09 Thread Daniel Baumann
Hi all,

we have a Ceph Cluster (12.2.1) with 9 MDS ranks in multi-mds mode.

"out of the blue", rank 6 is marked as damaged (and all other MDS are in
state up:resolve) and I can't bring the FS up again.

'ceph -s' says:
[...]
1 filesystem is degraded
1 mds daemon damaged

mds: cephfs-8/9/9 up
{0=mds1=up:resolve,1=mds2=up:resolve,2=mds3=up:resolve,3=mds4=up:resolve,4=mds5=up:resolve,5=mds6=up:resolve,7=mds7=
up:resolve,8=mds8=up:resolve}, 1 up:standby, 1 damaged
[...]

'ceph fs get cephfs' says:
[...]
max_mds 9
in  0,1,2,3,4,5,6,7,8
up
{0=28309098,1=28309128,2=28309149,3=28309188,4=28309209,5=28317918,7=28311732,8=28312272}
failed
damaged 6
stopped
[...]
28309098:   147.87.226.60:6800/2627352929 'mds1' mds.0.95936
up:resolve seq 3
28309128:   147.87.226.61:6800/416822271 'mds2' mds.1.95939
up:resolve seq 3
28309149:   147.87.226.62:6800/1969015920 'mds3' mds.2.95942
up:resolve seq 3
28309188:   147.87.226.184:6800/4074580566 'mds4' mds.3.95945
up:resolve seq 3
28309209:   147.87.226.185:6800/805082194 'mds5' mds.4.95948
up:resolve seq 3
28317918:   147.87.226.186:6800/1913199036 'mds6' mds.5.95984
up:resolve seq 3
28311732:   147.87.226.187:6800/4117561729 'mds7' mds.7.95957
up:resolve seq 3
28312272:   147.87.226.188:6800/2936268159 'mds8' mds.8.95960
up:resolve seq 3


I think I've tried almost everything already, without success :(, including:

  * stopping all MDS and bringing them up one after another
    (works nicely for the first ones up to rank 5, then the next one
    just grabs rank 7 and no MDS after that wants to take rank 6)

  * stopping all MDS, flushing the MDS journal, manually marking rank 6
    as repaired, and starting all MDS again

  * trying to switch back to only one MDS (stopping all MDS, setting
    max_mds=1, disallowing multi-mds, disallowing dirfrag, removing
    "mds_bal_frag=true" from ceph.conf, then starting the first mds),
    which didn't work: the one single MDS stayed in up:resolve forever

  * during all of the above, all CephFS clients were unmounted,
    so there was no access/stale access to the FS

  * I did find a few things in the mailing list archive, but there seems
    to be nothing conclusive on how to get it back online ("formatting"
    the FS is not an option). I didn't dare try 'ceph mds rmfailed 6'
    for fear of data loss.


How can I get it back online?

The relevant portion from the ceph-mds log (when starting mds9 which
should then take up rank 6; I'm happy to provide any logs):

---snip---
2017-10-09 08:55:56.418237 7f1ec6ef3240  0 ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 421
2017-10-09 08:55:56.421672 7f1ec6ef3240  0 pidfile_write: ignore empty
--pid-file
2017-10-09 08:56:00.990530 7f1ebf457700  1 mds.mds9 handle_mds_map standby
2017-10-09 08:56:00.997044 7f1ebf457700  1 mds.6.95988 handle_mds_map i
am now mds.6.95988
2017-10-09 08:56:00.997053 7f1ebf457700  1 mds.6.95988 handle_mds_map
state change up:boot --> up:replay
2017-10-09 08:56:00.997068 7f1ebf457700  1 mds.6.95988 replay_start
2017-10-09 08:56:00.997076 7f1ebf457700  1 mds.6.95988  recovery set is
0,1,2,3,4,5,7,8
2017-10-09 08:56:01.003203 7f1eb8c4a700  0 mds.6.cache creating system
inode with ino:0x106
2017-10-09 08:56:01.003592 7f1eb8c4a700  0 mds.6.cache creating system
inode with ino:0x1
2017-10-09 08:56:01.016403 7f1eba44d700 -1 mds.6.journaler.pq(ro)
_decode error from assimilate_prefetch
2017-10-09 08:56:01.016425 7f1eba44d700 -1 mds.6.purge_queue _recover:
Error -22 recovering write_pos
2017-10-09 08:56:01.019746 7f1eba44d700  1 mds.mds9 respawn
2017-10-09 08:56:01.019762 7f1eba44d700  1 mds.mds9  e: '/usr/bin/ceph-mds'
2017-10-09 08:56:01.019765 7f1eba44d700  1 mds.mds9  0: '/usr/bin/ceph-mds'
2017-10-09 08:56:01.019767 7f1eba44d700  1 mds.mds9  1: '-f'
2017-10-09 08:56:01.019769 7f1eba44d700  1 mds.mds9  2: '--cluster'
2017-10-09 08:56:01.019771 7f1eba44d700  1 mds.mds9  3: 'ceph'
2017-10-09 08:56:01.019772 7f1eba44d700  1 mds.mds9  4: '--id'
2017-10-09 08:56:01.019773 7f1eba44d700  1 mds.mds9  5: 'mds9'
2017-10-09 08:56:01.019774 7f1eba44d700  1 mds.mds9  6: '--setuser'
2017-10-09 08:56:01.019775 7f1eba44d700  1 mds.mds9  7: 'ceph'
2017-10-09 08:56:01.019776 7f1eba44d700  1 mds.mds9  8: '--setgroup'
2017-10-09 08:56:01.019778 7f1eba44d700  1 mds.mds9  9: 'ceph'
2017-10-09 08:56:01.019811 7f1eba44d700  1 mds.mds9 respawning with exe
/usr/bin/ceph-mds
2017-10-09 08:56:01.019814 7f1eba44d700  1 mds.mds9  exe_path /proc/self/exe
2017-10-09 08:56:01.046396 7f5ed6090240  0 ceph version 12.2.1
(3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable), process
(unknown), pid 421
2017-10-09 08:56:01.049516 7f5ed6090240  0 pidfile_write: ignore empty
--pid-file
2017-10-09 08:56:05.162732 7f5ecee32700  1 mds.mds9 handle_mds_map standby
[...]
---snap---

Regards,
Daniel