[ceph-users] NAS solution for CephFS

2019-02-11 Thread Marvin Zhang
Hi,
As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
configure active/passive NFS-Ganesha to use CephFS. My question is
whether we can use active/active nfs-ganesha for CephFS.
In my view, state consistency is the only thing we need to think about.
1. Lock support for Active/Active. Even though each nfs-ganesha server
maintains its own lock state, the real lock/unlock operations call
ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle the locks
safely.
2. Delegation support for Active/Active. Similar to question 1,
ceph_ll_delegation will handle it safely.
3. Nfs-ganesha cache support for Active/Active. As
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
describes, we can configure the cache size as 1 (see the sketch after
this list).
4. Ceph-FSAL cache support for Active/Active. Like any other CephFS
client, there are no cache consistency issues.
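
For reference, the cache-related settings in that sample look roughly
like the block below (a sketch from memory of the ganesha 2.7-era
sample config, so double-check the file itself). The idea is to shrink
ganesha's own cache to almost nothing so that the libcephfs client
cache underneath, which is cluster-coherent, stays authoritative:

CACHEINODE
{
        # keep ganesha's cache minimal; libcephfs does the real caching
        Dir_Chunk = 0;
        NParts = 1;
        Cache_Size = 1;
}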

Thanks,
Marvin


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-13 Thread Marvin Zhang
On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
>
> > Hi,
> > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > configure active/passive NFS-Ganesha to use CephFS. My question is
> > whether we can use active/active nfs-ganesha for CephFS.
>
> (Apologies if you get two copies of this. I sent an earlier one from the
> wrong account and it got stuck in moderation)
>
> You can, with the new rados-cluster recovery backend that went into
> ganesha v2.7. See here for a bit more detail:
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> ...also have a look at the ceph.conf file in the ganesha sources.
>
> > In my view, state consistency is the only thing we need to think
> > about.
> > 1. Lock support for Active/Active. Even though each nfs-ganesha
> > server maintains its own lock state, the real lock/unlock
> > operations call ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster
> > will handle the locks safely.
> > 2. Delegation support for Active/Active. Similar to question 1,
> > ceph_ll_delegation will handle it safely.
> > 3. Nfs-ganesha cache support for Active/Active. As
> > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > describes, we can configure the cache size as 1.
> > 4. Ceph-FSAL cache support for Active/Active. Like any other CephFS
> > client, there are no cache consistency issues.
>
> The basic idea with the new recovery backend is to have the different
> NFS ganesha heads coordinate their recovery grace periods to prevent
> stateful conflicts.
>
> The one thing missing at this point is delegations in an active/active
> configuration, but that's mainly because of the synchronous nature of
> libcephfs. We have a potential fix for that problem but it requires work
> in libcephfs that is not yet done.
[marvin] So we should disable delegations on active/active and set the
conf like this. Is that right?
NFSv4
{
        Delegations = false;
}
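
For reference, here is a minimal sketch of how the pieces might fit
together. The pool, namespace and nodeid values are placeholders, and
the option names are as I understand the v2.7 rados_cluster backend,
so double-check them against the sample ceph.conf and the blog post
above:

NFSv4
{
        # rados_cluster coordinates grace periods across the heads
        RecoveryBackend = rados_cluster;
        # reboot recovery in this mode needs NFSv4.1+
        Minor_Versions = 1,2;
        # no delegations in active/active for now (see above)
        Delegations = false;
}

RADOS_KV
{
        # where the shared grace/recovery state lives in RADOS
        pool = "nfs-ganesha";
        namespace = "grace";
        # must be unique for each ganesha head
        nodeid = "ganesha1";
}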
>
> Cheers,
> --
> Jeff Layton 
>


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Hi Jeff,
Another question is about client caching when delegations are disabled.
I set a breakpoint on nfs4_op_read, the OP_READ processing function in
nfs-ganesha. Then I read a file, and the breakpoint was hit only on the
first read; later reads of the same file did not trigger OP_READ, so
the data must be served from the client-side cache. Is that right?
I also checked the NFS client code in the Linux kernel. Only when
cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again,
like this:
/* fs/nfs/inode.c: the page cache is only invalidated, forcing new
 * READs on the wire, when NFS_INO_INVALID_DATA is set */
if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
        ret = nfs_invalidate_mapping(inode, mapping);
}
Think about this scenario: client1 connects to ganesha1 and client2
connects to ganesha2. I read /1.txt on client1, and client1 caches the
data. Then I modify the file on client2. At that point, how does
client1 learn that the file was modified, and how does
NFS_INO_INVALID_DATA get set in cache_validity?
Thanks,
Marvin

On Thu, Feb 14, 2019 at 7:27 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 10:35 +0800, Marvin Zhang wrote:
> > On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
> > > > Hi,
> > > > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > > > configure active/passive NFS-Ganesha to use CephFS. My question is
> > > > whether we can use active/active nfs-ganesha for CephFS.
> > >
> > > (Apologies if you get two copies of this. I sent an earlier one from the
> > > wrong account and it got stuck in moderation)
> > >
> > > You can, with the new rados-cluster recovery backend that went into
> > > ganesha v2.7. See here for a bit more detail:
> > >
> > > https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
> > >
> > > ...also have a look at the ceph.conf file in the ganesha sources.
> > >
> > > > In my view, state consistency is the only thing we need to
> > > > think about.
> > > > 1. Lock support for Active/Active. Even though each nfs-ganesha
> > > > server maintains its own lock state, the real lock/unlock
> > > > operations call ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster
> > > > will handle the locks safely.
> > > > 2. Delegation support for Active/Active. Similar to question 1,
> > > > ceph_ll_delegation will handle it safely.
> > > > 3. Nfs-ganesha cache support for Active/Active. As
> > > > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > > > describes, we can configure the cache size as 1.
> > > > 4. Ceph-FSAL cache support for Active/Active. Like any other
> > > > CephFS client, there are no cache consistency issues.
> > >
> > > The basic idea with the new recovery backend is to have the different
> > > NFS ganesha heads coordinate their recovery grace periods to prevent
> > > stateful conflicts.
> > >
> > > The one thing missing at this point is delegations in an active/active
> > > configuration, but that's mainly because of the synchronous nature of
> > > libcephfs. We have a potential fix for that problem but it requires work
> > > in libcephfs that is not yet done.
> > [marvin] So we should disable delegations on active/active and set
> > the conf like this. Is that right?
> > NFSv4
> > {
> >         Delegations = false;
> > }
>
> Yes.
> --
> Jeff Layton 
>


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Here is a copy from https://tools.ietf.org/html/rfc7530#page-40.
Will the client query the 'change' attribute every time before reading,
to know whether the data has changed?

  +-----------------+----+------------+-----+-------------------+
  | Name            | ID | Data Type  | Acc | Defined in        |
  +-----------------+----+------------+-----+-------------------+
  | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
  | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
  | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
  | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
  | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
  | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
  | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
  | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
  | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
  | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
  | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
  | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
  | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
  +-----------------+----+------------+-----+-------------------+

On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > Hi Jeff,
> > Another question is about client caching when delegations are
> > disabled.
> > I set a breakpoint on nfs4_op_read, the OP_READ processing function
> > in nfs-ganesha. Then I read a file, and the breakpoint was hit only
> > on the first read; later reads of the same file did not trigger
> > OP_READ, so the data must be served from the client-side cache. Is
> > that right?
>
> Yes. In the absence of a delegation, the client will periodically query
> for the inode attributes, and will serve reads from the cache if it
> looks like the file hasn't changed.
>
> > I also checked the NFS client code in the Linux kernel. Only when
> > cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ
> > again, like this:
> > /* fs/nfs/inode.c: the page cache is only invalidated, forcing new
> >  * READs on the wire, when NFS_INO_INVALID_DATA is set */
> > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> >         ret = nfs_invalidate_mapping(inode, mapping);
> > }
> > Think about this scenario: client1 connects to ganesha1 and client2
> > connects to ganesha2. I read /1.txt on client1, and client1 caches
> > the data. Then I modify the file on client2. At that point, how does
> > client1 learn that the file was modified, and how does
> > NFS_INO_INVALID_DATA get set in cache_validity?
>
>
> Once you modify the file on client2, ganesha2 will request the necessary
> caps from the ceph MDS, and ganesha1 will have its caps revoked. It'll
> then make the change.
>
> When client1 reads again it will issue a GETATTR against the file [1].
> ganesha1 will then request caps to do the getattr, which will end up
> revoking ganesha2's caps. client1 will then see the change in attributes
> (the change attribute and mtime, most likely) and will invalidate the
> mapping, causing it to reissue a READ on the wire.
>
> [1]: There may be a window of time after you change the file on client2
> where client1 doesn't see it. That's due to the fact that inode
> attributes on the client are only revalidated after a timeout. You may
> want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> make sure you understand how the NFS client validates its caches.
>
> Cheers,
> --
> Jeff Layton 
>


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Thanks Jeff.
If I set Attr_Expiration_Time to zero in the conf, does it mean the
timeout is zero? If so, every client will see changes immediately, but
will that hurt performance badly?
It seems that the GlusterFS FSAL uses UPCALL to invalidate the cache.
How about the CephFS FSAL?

On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > Here is a copy from https://tools.ietf.org/html/rfc7530#page-40.
> > Will the client query the 'change' attribute every time before
> > reading, to know whether the data has changed?
> >
> >   +-----------------+----+------------+-----+-------------------+
> >   | Name            | ID | Data Type  | Acc | Defined in        |
> >   +-----------------+----+------------+-----+-------------------+
> >   | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
> >   | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> >   | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
> >   | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
> >   | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
> >   | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
> >   | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
> >   | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
> >   | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
> >   | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> >   | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
> >   | filehandle     | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
> >   +-----------------+----+------------+-----+-------------------+
> >
>
> Not every time -- only when the cache needs revalidation.
>
> In the absence of a delegation, that happens on a timeout (see the
> acregmin/acregmax settings in nfs(5)), though things like opens and file
> locking events also affect when the client revalidates.
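>
> For example (an illustration only; the values below are the usual
> Linux defaults, but see nfs(5) for your client), the revalidation
> window is tunable at mount time:
>
>     # attributes are revalidated between 3 and 60 seconds after
>     # they were last fetched
>     mount -t nfs4 -o acregmin=3,acregmax=60 ganesha1:/export /mnt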
>
> When the v4 client does revalidate the cache, it relies heavily on the
> NFSv4 change attribute. CephFS's change attribute is cluster-coherent
> too, so if the client does revalidate it should see changes made on
> other servers.
>
> > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
> > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > Hi Jeff,
> > > > Another question is about client caching when delegations are
> > > > disabled.
> > > > I set a breakpoint on nfs4_op_read, the OP_READ processing
> > > > function in nfs-ganesha. Then I read a file, and the breakpoint
> > > > was hit only on the first read; later reads of the same file did
> > > > not trigger OP_READ, so the data must be served from the
> > > > client-side cache. Is that right?
> > >
> > > Yes. In the absence of a delegation, the client will periodically query
> > > for the inode attributes, and will serve reads from the cache if it
> > > looks like the file hasn't changed.
> > >
> > > > I also checked the NFS client code in the Linux kernel. Only
> > > > when cache_validity has NFS_INO_INVALID_DATA set will it send
> > > > OP_READ again, like this:
> > > > /* fs/nfs/inode.c: the page cache is only invalidated, forcing
> > > >  * new READs on the wire, when NFS_INO_INVALID_DATA is set */
> > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > >         ret = nfs_invalidate_mapping(inode, mapping);
> > > > }
> > > > Think about this scenario: client1 connects to ganesha1 and
> > > > client2 connects to ganesha2. I read /1.txt on client1, and
> > > > client1 caches the data. Then I modify the file on client2. At
> > > > that point, how does client1 learn that the file was modified,
> > > > and how does NFS_INO_INVALID_DATA get set in cache_validity?
> > >
> > > Once you modify the file on client2, ganesha2 will request the
> > > necessary caps from the ceph MDS, and ganesha1 will have its caps
> > > revoked. It'll then make the change.
> > >
> > > When client1 reads again it will issue a GETATTR against the file
> > > [1]. ganesha1 will then request caps to do the getattr, which will
> > > end up revoking ganesha2's caps. client1 will then see the change
> > > in attributes (the change attribute and mtime, most likely) and
> > > will invalidate the mapping, causing it to reissue a READ on the
> > > wire.
> > >
> > > [1]: There may be a window of time after you change the file on client2
> > > where client1 doesn't see it. That's due to the fact that inode
> > > attributes on the client are only revalidated after a timeout. You may
> > > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > > make sure you understand how the NFS client validates its caches.
> > >
> > > Cheers,
> > > --
> > > Jeff Layton 
> > >
>
> --
> Jeff Layton 
>