Re: [ceph-users] Multi-MDS Failover

2018-05-19 Thread Blair Bethwaite
On 19 May 2018 at 09:20, Scottix  wrote:
> It would be nice to have an option to block all IO when the filesystem
> hits a degraded state, until it recovers. Since you are unaware of the
> other MDS ranks' state, it seems like that would be tough to do.

I agree this would be a nice knob to have from the perspective of
having consistent (and easy to diagnose) client behaviour when such a
situation occurs. However, I don't think it is possible: if a client
is working in a directory served via the rank-0 MDS (whilst rank-1 has
just gone down), it isn't going to know rank-1 is down until the MONs
do. So to get the "all stop" you are talking about, the client would
then have to undo already-committed IO(!); the only other option would
be "pinging" all ranks on every metadata change, and that sounds
horrible.

Maybe this is a case where you'd be better off putting NFS in front of
your CephFS?
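
For illustration only, a minimal nfs-ganesha export along these lines
(assuming a ganesha build with the Ceph FSAL; the export id, paths and
options below are just placeholders) would let clients go through NFS
instead of mounting CephFS directly:

  # /etc/ganesha/ganesha.conf -- illustrative sketch only
  EXPORT {
      Export_Id = 1;            # any unique id
      Path = "/";               # path within CephFS to export
      Pseudo = "/cephfs";       # NFSv4 pseudo path the clients mount
      Access_Type = RW;
      Squash = No_Root_Squash;
      FSAL {
          Name = CEPH;          # serve the export via libcephfs
      }
  }

Whether that actually gives the "all stop" behaviour you want depends on
how the NFS server itself reacts when its libcephfs session hangs, so it
would need testing.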

-- 
Cheers,
~Blairo


Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Scottix
So we have been testing this quite a bit. Having the failure domain be
partially available is OK for us, but odd, since we don't know what will be
down. With a single MDS, by comparison, we know everything will be blocked.

It would be nice to have an option to block all IO when the filesystem hits
a degraded state, until it recovers. Since you are unaware of the other MDS
ranks' state, it seems like that would be tough to do.

I'll leave this as a possible feature request for the future.

On Fri, May 18, 2018 at 3:15 PM Gregory Farnum  wrote:

> On Fri, May 18, 2018 at 11:56 AM Webert de Souza Lima <
> webert.b...@gmail.com> wrote:
>
>> Hello,
>>
>>
>> On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann 
>> wrote:
>>
>>> additionally: if rank 0 is lost, the whole FS stands still (no new
>>> client can mount the fs; no existing client can change a directory,
>>> etc.).
>>>
>>> my guess is that the root of a cephfs (/; which is always served by rank
>>> 0) is needed in order to do traversals/lookups of any directories on the
>>> top-level (which then can be served by ranks 1-n).
>>>
>>
>> Could someone confirm if this is actually how it works? Thanks.
>>
>
> Yes, although I'd expect that clients can keep doing work in directories
> they've already got opened (or in descendants of those). Perhaps I'm
> missing something about that, though...
> -Greg
>
>
>>
>> Regards,
>>
>> Webert Lima
>> DevOps Engineer at MAV Tecnologia
>> *Belo Horizonte - Brasil*
>> *IRC NICK - WebertRLZ*
>>
>>


Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Gregory Farnum
On Fri, May 18, 2018 at 11:56 AM Webert de Souza Lima 
wrote:

> Hello,
>
>
> On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann 
> wrote:
>
>> additionally: if rank 0 is lost, the whole FS stands still (no new
>> client can mount the fs; no existing client can change a directory, etc.).
>>
>> my guess is that the root of a cephfs (/; which is always served by rank
>> 0) is needed in order to do traversals/lookups of any directories on the
>> top-level (which then can be served by ranks 1-n).
>>
>
> Could someone confirm if this is actually how it works? Thanks.
>

Yes, although I'd expect that clients can keep doing work in directories
they've already got opened (or in descendants of those). Perhaps I'm
missing something about that, though...
-Greg


>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
>
>


Re: [ceph-users] Multi-MDS Failover

2018-05-18 Thread Webert de Souza Lima
Hello,


On Mon, Apr 30, 2018 at 7:16 AM Daniel Baumann 
wrote:

> additionally: if rank 0 is lost, the whole FS stands still (no new
> client can mount the fs; no existing client can change a directory, etc.).
>
> my guess is that the root of a cephfs (/; which is always served by rank
> 0) is needed in order to do traversals/lookups of any directories on the
> top-level (which then can be served by ranks 1-n).
>

Could someone confirm if this is actually how it works? Thanks.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


>


Re: [ceph-users] Multi-MDS Failover

2018-04-30 Thread Daniel Baumann
On 04/27/2018 07:11 PM, Patrick Donnelly wrote:
> The answer is that there may be partial availability from
> the up:active ranks which may hand out capabilities for the subtrees
> they manage or no availability if that's not possible because it
> cannot obtain the necessary locks.

additionally: if rank 0 is lost, the whole FS stands still (no new
client can mount the fs; no existing client can change a directory, etc.).

my guess is that the root of a cephfs (/; which is always served by rank
0) is needed in order to do traversals/lookups of any directories on the
top-level (which then can be served by ranks 1-n).
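
If you want to make that distribution explicit rather than guessed at,
top-level directories can be pinned to a specific rank via the
ceph.dir.pin extended attribute; a quick sketch (the mount point and
directory name are just examples):

  # pin /mnt/cephfs/projects (and everything below it) to rank 1,
  # so only the initial lookup through / depends on rank 0
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projects

  # check the pin; setting -v -1 reverts to the default balancer
  getfattr -n ceph.dir.pin /mnt/cephfs/projects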


Last year we had quite some trouble with an unstable CephFS (MDSes reliably
and reproducibly crashing when hitting them with rsync over multi-TB
directories with files all being <<1MB) and had lots of situations where
ranks (most of the time including 0) were down.

Fortunately we could always get the fs back by unmounting it on all
clients and restarting all MDSes. The last of these instabilities seems to
have gone away with 12.2.3/12.2.4 (we're now running 12.2.5).

Regards,
Daniel


Re: [ceph-users] Multi-MDS Failover

2018-04-27 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 7:04 PM, Scottix  wrote:
> Ok let me try to explain this better, we are doing this back and forth and
> its not going anywhere. I'll just be as genuine as I can and explain the
> issue.
>
> What we are testing is a critical failure scenario and actually more of a
> real world scenario. Basically just what happens when it is 1AM and the shit
> hits the fan, half of your servers are down and 1 of the 3 MDS boxes are
> still alive.
> There is one very important fact that happens with CephFS and when the
> single Active MDS server fails. It is guaranteed 100% all IO is blocked. No
> split-brain, no corrupted data, 100% guaranteed ever since we started using
> CephFS
>
>
> Now with multi_mds, I understand this changes the logic and I understand how
> difficult and how hard this problem is, trust me I would not be able to
> tackle this. Basically I need to answer the question; what happens when 1 of
> 2 multi_mds fails with no standbys ready to come save them?
> What I have tested is not the same of a single active MDS; this absolutely
> changes the logic of what happens and how we troubleshoot. The CephFS is
> still alive and it does allow operations and does allow resources to go
> through. How, why and what is affected are very relevant questions if this
> is what the failure looks like since it is not 100% blocking.

Okay, so now I understand what your real question is: what is the state
of CephFS when one or more ranks have failed but no standbys exist to
take over? The answer is that there may be partial availability from
the up:active ranks, which may hand out capabilities for the subtrees
they manage, or no availability if that's not possible because they
cannot obtain the necessary locks. No metadata is lost. No
inconsistency is created between clients. Full availability will be
restored when the lost ranks come back online.
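
For what it's worth, you can see which subtrees a given up:active rank is
authoritative for (and hence roughly what stays available) from its admin
socket; a sketch, using one of the daemon names from the status output
earlier in the thread:

  # on the MDS host: which subtrees does this rank manage?
  ceph daemon mds.CephDeploy100 get subtrees

  # rank/state overview from any admin node
  ceph fs status
  ceph mds stat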

-- 
Patrick Donnelly


Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
Ok, let me try to explain this better; we are doing this back and forth and
it's not going anywhere. I'll just be as genuine as I can and explain the
issue.

What we are testing is a critical failure scenario, and actually more of a
real-world scenario: basically, what happens when it is 1 AM and the shit
hits the fan, half of your servers are down, and 1 of the 3 MDS boxes is
still alive.
There is one very important fact about CephFS when the single active MDS
server fails: it is guaranteed that 100% of all IO is blocked. No
split-brain, no corrupted data; 100% guaranteed, ever since we started
using CephFS.

Now with multi_mds, I understand this changes the logic, and I understand
how difficult this problem is; trust me, I would not be able to tackle it.
Basically I need to answer the question: what happens when 1 of 2 multi_mds
ranks fails with no standbys ready to come save them?
What I have tested is not the same as a single active MDS; this absolutely
changes the logic of what happens and how we troubleshoot. The CephFS is
still alive, it does allow operations, and it does allow resources to go
through. How, why, and what is affected are very relevant questions if this
is what the failure looks like, since it is not 100% blocking.

This is the problem: I have programs writing a massive amount of data and I
don't want it corrupted or lost. I need to know what happens and I need to
have guarantees.
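
One simple way to make the behaviour visible during such a test is a probe
loop on a client writing timestamped, fsync'd files into the mount (sketch
below; the path is just an example), so the log shows exactly when IO
blocks, errors out, or keeps flowing:

  # hypothetical probe; /mnt/cephfs/probe is only an example path
  mkdir -p /mnt/cephfs/probe
  while true; do
      ts=$(date +%H:%M:%S)
      if dd if=/dev/zero of=/mnt/cephfs/probe/$ts bs=4k count=1 conv=fsync 2>/dev/null
      then echo "$ts write ok"
      else echo "$ts write FAILED"
      fi
      sleep 1
  done
  # if IO blocks rather than fails, the loop simply stops printing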

Best


On Thu, Apr 26, 2018 at 5:03 PM Patrick Donnelly 
wrote:

> On Thu, Apr 26, 2018 at 4:40 PM, Scottix  wrote:
> >> Of course -- the mons can't tell the difference!
> > That is really unfortunate, it would be nice to know if the filesystem
> has
> > been degraded and to what degree.
>
> If a rank is laggy/crashed, the file system as a whole is generally
> unavailable. The span between partial outage and full is small and not
> worth quantifying.
>
> >> You must have standbys for high availability. This is the docs.
> > Ok but what if you have your standby go down and a master go down. This
> > could happen in the real world and is a valid error scenario.
> >Also there is
> > a period between when the standby becomes active what happens in-between
> > that time?
>
> The standby MDS goes through a series of states where it recovers the
> lost state and connections with clients. Finally, it goes active.
>
> >> It depends(tm) on how the metadata is distributed and what locks are
> > held by each MDS.
> > Your saying depending on which mds had a lock on a resource it will block
> > that particular POSIX operation? Can you clarify a little bit?
> >
> >> Standbys are not optional in any production cluster.
> > Of course in production I would hope people have standbys but in theory
> > there is no enforcement in Ceph for this other than a warning. So when
> you
> > say not optional that is not exactly true it will still run.
>
> It's self-defeating to expect CephFS to enforce having standbys --
> presumably by throwing an error or becoming unavailable -- when the
> standbys exist to make the system available.
>
> There's nothing to enforce. A warning is sufficient for the operator
> that (a) they didn't configure any standbys or (b) MDS daemon
> processes/boxes are going away and not coming back as standbys (i.e.
> the pool of MDS daemons is decreasing with each failover)
>
> --
> Patrick Donnelly
>


Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 4:40 PM, Scottix  wrote:
>> Of course -- the mons can't tell the difference!
> That is really unfortunate, it would be nice to know if the filesystem has
> been degraded and to what degree.

If a rank is laggy/crashed, the file system as a whole is generally
unavailable. The span between partial outage and full is small and not
worth quantifying.

>> You must have standbys for high availability. This is the docs.
> Ok but what if you have your standby go down and a master go down. This
> could happen in the real world and is a valid error scenario.
>Also there is
> a period between when the standby becomes active what happens in-between
> that time?

The standby MDS goes through a series of states where it recovers the
lost state and connections with clients. Finally, it goes active.
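
Those states (typically up:replay, up:reconnect, up:rejoin, then
up:active) are visible while the takeover is happening; a rough sketch of
how to watch it:

  # watch the replacement MDS walk through the recovery states
  watch -n 1 'ceph mds stat; echo; ceph fs status'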

>> It depends(tm) on how the metadata is distributed and what locks are
> held by each MDS.
> Your saying depending on which mds had a lock on a resource it will block
> that particular POSIX operation? Can you clarify a little bit?
>
>> Standbys are not optional in any production cluster.
> Of course in production I would hope people have standbys but in theory
> there is no enforcement in Ceph for this other than a warning. So when you
> say not optional that is not exactly true it will still run.

It's self-defeating to expect CephFS to enforce having standbys --
presumably by throwing an error or becoming unavailable -- when the
standbys exist to make the system available.

There's nothing to enforce. A warning is sufficient to tell the operator
that (a) they didn't configure any standbys, or (b) MDS daemon
processes/boxes are going away and not coming back as standbys (i.e. the
pool of MDS daemons is decreasing with each failover).
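
(On recent Luminous releases that warning can also be tied to a wanted
standby count per filesystem; a small sketch, using the filesystem name
from the status output earlier in the thread:)

  # ask for at least one standby; the cluster then raises a health
  # warning whenever no standby is available
  ceph fs set cephfs standby_count_wanted 1
  ceph health detail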

-- 
Patrick Donnelly


Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
> Of course -- the mons can't tell the difference!
That is really unfortunate; it would be nice to know whether the filesystem
has been degraded, and to what degree.

> You must have standbys for high availability. This is the docs.
OK, but what if you have your standby go down and a master go down? This
could happen in the real world and is a valid error scenario. Also, there
is a period before the standby becomes active; what happens during that
time?

> It depends(tm) on how the metadata is distributed and what locks are
> held by each MDS.
You're saying that, depending on which MDS holds a lock on a resource, it
will block that particular POSIX operation? Can you clarify a little bit?

> Standbys are not optional in any production cluster.
Of course, in production I would hope people have standbys, but in theory
there is no enforcement in Ceph for this other than a warning. So when you
say "not optional", that is not exactly true; it will still run.

On Thu, Apr 26, 2018 at 3:37 PM Patrick Donnelly 
wrote:

> On Thu, Apr 26, 2018 at 3:16 PM, Scottix  wrote:
> > Updated to 12.2.5
> >
> > We are starting to test multi_mds cephfs and we are going through some
> > failure scenarios in our test cluster.
> >
> > We are simulating a power failure to one machine and we are getting mixed
> > results of what happens to the file system.
> >
> > This is the status of the mds once we simulate the power loss considering
> > there are no more standbys.
> >
> > mds: cephfs-2/2/2 up
> > {0=CephDeploy100=up:active,1=TigoMDS100=up:active(laggy or crashed)}
> >
> > 1. It is a little unclear if it is laggy or really is down, using this
> line
> > alone.
>
> Of course -- the mons can't tell the difference!
>
> > 2. The first time we lost total access to ceph folder and just blocked
> i/o
>
> You must have standbys for high availability. This is the docs.
>
> > 3. One time we were still able to access ceph folder and everything
> seems to
> > be running.
>
> It depends(tm) on how the metadata is distributed and what locks are
> held by each MDS.
>
> Standbys are not optional in any production cluster.
>
> --
> Patrick Donnelly
>


Re: [ceph-users] Multi-MDS Failover

2018-04-26 Thread Patrick Donnelly
On Thu, Apr 26, 2018 at 3:16 PM, Scottix  wrote:
> Updated to 12.2.5
>
> We are starting to test multi_mds cephfs and we are going through some
> failure scenarios in our test cluster.
>
> We are simulating a power failure to one machine and we are getting mixed
> results of what happens to the file system.
>
> This is the status of the mds once we simulate the power loss considering
> there are no more standbys.
>
> mds: cephfs-2/2/2 up
> {0=CephDeploy100=up:active,1=TigoMDS100=up:active(laggy or crashed)}
>
> 1. It is a little unclear if it is laggy or really is down, using this line
> alone.

Of course -- the mons can't tell the difference!

> 2. The first time we lost total access to ceph folder and just blocked i/o

You must have standbys for high availability. This is in the docs.

> 3. One time we were still able to access ceph folder and everything seems to
> be running.

It depends(tm) on how the metadata is distributed and what locks are
held by each MDS.

Standbys are not optional in any production cluster.

-- 
Patrick Donnelly


[ceph-users] Multi-MDS Failover

2018-04-26 Thread Scottix
Updated to 12.2.5

We are starting to test multi_mds cephfs and we are going through some
failure scenarios in our test cluster.

We are simulating a power failure to one machine and we are getting mixed
results of what happens to the file system.

This is the status of the MDS daemons once we simulate the power loss,
given that there are no more standbys.

mds: cephfs-2/2/2 up
{0=CephDeploy100=up:active,1=TigoMDS100=up:active(laggy or crashed)}

1. It is a little unclear, using this line alone, whether it is laggy or
really down.
2. The first time, we lost total access to the ceph folder and I/O just
blocked.
3. One time we were still able to access the ceph folder and everything
seemed to be running.
4. One time we had a script creating a bunch of files, simulated the crash,
then listed the directory and it showed 0 files, when we expected lots of
files.

We could go into the details of each of those, but really I am trying to
understand Ceph's logic in dealing with a crashed MDS in a multi-MDS setup:
is it marked degraded, or what exactly is going on?

It just seems a little unclear what is going to happen.
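
(For context, a two-active setup on 12.2.x is typically created and
inspected roughly along these lines; the commands below are a sketch,
using the filesystem name from the status line above:)

  # allow and create two active MDS ranks on Luminous
  ceph fs set cephfs allow_multimds true
  ceph fs set cephfs max_mds 2

  # after simulating the power loss, see how the mons view the ranks
  ceph fs status
  ceph health detail
  ceph mds stat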

Good news: once it comes back online, everything is as it should be.

Thanks