Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Jeff Layton
On Mon, 2019-02-18 at 17:02 +0100, Paul Emmerich wrote:
> > > I've benchmarked a ~15% performance difference in IOPS between cache
> > > expiration time of 0 and 10 when running fio on a single file from a
> > > single client.
> > > 
> > > 
> > 
> > NFS iops? I'd guess more READ ops in particular? Is that with a
> > FSAL_CEPH backend?
> 
> Yes. But take that with a grain of salt, that was just a quick
> and dirty test of a very specific scenario that may or may not be
> relevant.
> 
> 

Sure.

If the NFS iops go up when you remove a layer of caching, then that
suggests that you had a situation where the cache likely should have
been invalidated, but wasn't. Basically, you may be sacrificing cache
coherency for performance.

The bigger question I have is whether the ganesha mdcache provides any
performance gain when the attributes are already cached in the libcephfs
layer.

If we did want to start using the mdcache, then we'd almost certainly
want to invalidate that cache when libcephfs gives up caps. I just don't
see how the extra layer of caching provides much value in that
situation.


> > 
> > > > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  
> > > > > wrote:
> > > > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > > > Will the client query the 'change' attribute every time before reading,
> > > > > > > to know whether the data has been changed?
> > > > > > > 
> > > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > > >   | Name            | ID | Data Type  | Acc | Defined in        |
> > > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > > >   | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
> > > > > > >   | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > > > >   | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > > > >   | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > > > >   | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
> > > > > > >   | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
> > > > > > >   | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
> > > > > > >   | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
> > > > > > >   | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
> > > > > > >   | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > > > >   | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > > > >   | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
> > > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > > > 
> > > > > > 
> > > > > > Not every time -- only when the cache needs revalidation.
> > > > > > 
> > > > > > In the absence of a delegation, that happens on a timeout (see the
> > > > > > acregmin/acregmax settings in nfs(5)), though things like opens and 
> > > > > > file
> > > > > > locking events also affect when the client revalidates.
> > > > > > 
> > > > > > When the v4 client does revalidate the cache, it relies heavily on 
> > > > > > NFSv4
> > > > > > change attribute. Cephfs's change attribute is cluster-coherent 
> > > > > > too, so
> > > > > > if the client does revalidate it should see changes made on other
> > > > > > servers.
> > > > > > 
> > > > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton 
> > > > > > >  wrote:
> > > > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > > > Hi Jeff,
> > > > > > > > > Another question is about Client Caching when disabling 
> > > > > > > > > delegation.
> > > > > > > > > I set breakpoint on nfs4_op_read, which is OP_READ process 
> > > > > > > > > function in
> > > > > > > > > nfs-ganesha. Then I read a file, I found that it will hit 
> > > > > > > > > only once on
> > > > > > > > > the first time, which means latter reading operation on this 
> > > > > > > > > file will
> > > > > > > > > not trigger OP_READ. It will read the data from client side 
> > > > > > > > > cache. Is
> > > > > > > > > it right?
> > > > > > > > 
> > > > > > > > Yes. In the absence of a delegation, the client will 
> > > > > > > > periodically query
> > > > > > > > for the inode attributes, and will serve reads from the cache 
> > > > > > > > if it
> > > > > > > > looks like the file hasn't changed.
> > > > > > > > 
> > > > > > > > > I also checked the nfs client code in linux kernel. Only
> > > > > > > > > cache_validity is 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Paul Emmerich
> >
> > I've benchmarked a ~15% performance difference in IOPS between cache
> > expiration time of 0 and 10 when running fio on a single file from a
> > single client.
> >
> >
>
> NFS iops? I'd guess more READ ops in particular? Is that with a
> FSAL_CEPH backend?

Yes. But take that with a grain of salt, that was just a quick
and dirty test of a very specific scenario that may or may not be
relevant.


Paul

>
>
> >
> > >
> > > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  
> > > > wrote:
> > > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > > Will the client query the 'change' attribute every time before reading,
> > > > > > to know whether the data has been changed?
> > > > > >
> > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > >   | Name            | ID | Data Type  | Acc | Defined in        |
> > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > >   | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
> > > > > >   | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > > >   | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > > >   | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > > >   | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
> > > > > >   | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
> > > > > >   | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
> > > > > >   | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
> > > > > >   | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
> > > > > >   | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > > >   | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > > >   | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
> > > > > >   +-----------------+----+------------+-----+-------------------+
> > > > > >
> > > > >
> > > > > Not every time -- only when the cache needs revalidation.
> > > > >
> > > > > In the absence of a delegation, that happens on a timeout (see the
> > > > > acregmin/acregmax settings in nfs(5)), though things like opens and 
> > > > > file
> > > > > locking events also affect when the client revalidates.
> > > > >
> > > > > When the v4 client does revalidate the cache, it relies heavily on 
> > > > > NFSv4
> > > > > change attribute. Cephfs's change attribute is cluster-coherent too, 
> > > > > so
> > > > > if the client does revalidate it should see changes made on other
> > > > > servers.
> > > > >
> > > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton 
> > > > > >  wrote:
> > > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > > Hi Jeff,
> > > > > > > > Another question is about Client Caching when disabling 
> > > > > > > > delegation.
> > > > > > > > I set breakpoint on nfs4_op_read, which is OP_READ process 
> > > > > > > > function in
> > > > > > > > nfs-ganesha. Then I read a file, I found that it will hit only 
> > > > > > > > once on
> > > > > > > > the first time, which means latter reading operation on this 
> > > > > > > > file will
> > > > > > > > not trigger OP_READ. It will read the data from client side 
> > > > > > > > cache. Is
> > > > > > > > it right?
> > > > > > >
> > > > > > > Yes. In the absence of a delegation, the client will periodically 
> > > > > > > query
> > > > > > > for the inode attributes, and will serve reads from the cache if 
> > > > > > > it
> > > > > > > looks like the file hasn't changed.
> > > > > > >
> > > > > > > > I also checked the nfs client code in linux kernel. Only
> > > > > > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ 
> > > > > > > > again,
> > > > > > > > like this:
> > > > > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > > > > > }
> > > > > > > > This about this senario, client1 connect ganesha1 and client2 
> > > > > > > > connect
> > > > > > > > ganesha2. I read /1.txt on client1 and client1 will cache the 
> > > > > > > > data.
> > > > > > > > Then I modify this file on client2. At that time, how client1 
> > > > > > > > know the
> > > > > > > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > > > > > > cache_validity?
> > > > > > >
> > > > > > > Once you modify the code on client2, ganesha2 will request the 
> > > > > > > necessary
> > > > > > > caps from the ceph MDS, and client1 will have 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Jeff Layton
On Mon, 2019-02-18 at 16:40 +0100, Paul Emmerich wrote:
> > A call into libcephfs from ganesha to retrieve cached attributes is
> > mostly just in-memory copies within the same process, so any performance
> > overhead there is pretty minimal. If we need to go to the network to get
> > the attributes, then that was a case where the cache should have been
> > invalidated anyway, and we avoid having to check the validity of the
> > cache.
> 
> I've benchmarked a ~15% performance difference in IOPS between cache
> expiration time of 0 and 10 when running fio on a single file from a
> single client.
> 
> 

NFS iops? I'd guess more READ ops in particular? Is that with a
FSAL_CEPH backend?


> 
> > 
> > > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  
> > > wrote:
> > > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > > Will Client query 'change' attribute every time before reading to know
> > > > > if the data has been changed?
> > > > > 
> > > > >   +-+++-+---+
> > > > >   | Name| ID | Data Type  | Acc | Defined in|
> > > > >   +-+++-+---+
> > > > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > > > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > > > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > > > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > > > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > > > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > > > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > > > >   +-+++-+---+
> > > > > 
> > > > 
> > > > Not every time -- only when the cache needs revalidation.
> > > > 
> > > > In the absence of a delegation, that happens on a timeout (see the
> > > > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > > > locking events also affect when the client revalidates.
> > > > 
> > > > When the v4 client does revalidate the cache, it relies heavily on NFSv4
> > > > change attribute. Cephfs's change attribute is cluster-coherent too, so
> > > > if the client does revalidate it should see changes made on other
> > > > servers.
> > > > 
> > > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  
> > > > > wrote:
> > > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > > Hi Jeff,
> > > > > > > Another question is about Client Caching when disabling 
> > > > > > > delegation.
> > > > > > > I set breakpoint on nfs4_op_read, which is OP_READ process 
> > > > > > > function in
> > > > > > > nfs-ganesha. Then I read a file, I found that it will hit only 
> > > > > > > once on
> > > > > > > the first time, which means latter reading operation on this file 
> > > > > > > will
> > > > > > > not trigger OP_READ. It will read the data from client side 
> > > > > > > cache. Is
> > > > > > > it right?
> > > > > > 
> > > > > > Yes. In the absence of a delegation, the client will periodically 
> > > > > > query
> > > > > > for the inode attributes, and will serve reads from the cache if it
> > > > > > looks like the file hasn't changed.
> > > > > > 
> > > > > > > I also checked the nfs client code in linux kernel. Only
> > > > > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ 
> > > > > > > again,
> > > > > > > like this:
> > > > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > > > > }
> > > > > > > This about this senario, client1 connect ganesha1 and client2 
> > > > > > > connect
> > > > > > > ganesha2. I read /1.txt on client1 and client1 will cache the 
> > > > > > > data.
> > > > > > > Then I modify this file on client2. At that time, how client1 
> > > > > > > know the
> > > > > > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > > > > > cache_validity?
> > > > > > 
> > > > > > Once you modify the code on client2, ganesha2 will request the 
> > > > > > necessary
> > > > > > caps from the ceph MDS, and client1 will have its caps revoked. 
> > > > > > It'll
> > > > > > then make the change.
> > > > > > 
> > > > > > When client1 reads again it will issue a GETATTR against the 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-18 Thread Paul Emmerich
>
> A call into libcephfs from ganesha to retrieve cached attributes is
> mostly just in-memory copies within the same process, so any performance
> overhead there is pretty minimal. If we need to go to the network to get
> the attributes, then that was a case where the cache should have been
> invalidated anyway, and we avoid having to check the validity of the
> cache.

I've benchmarked a ~15% performance difference in IOPS between cache
expiration time of 0 and 10 when running fio on a single file from a
single client.


Paul

>
>
> > On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  wrote:
> > > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > > Will Client query 'change' attribute every time before reading to know
> > > > if the data has been changed?
> > > >
> > > >   +-+++-+---+
> > > >   | Name| ID | Data Type  | Acc | Defined in|
> > > >   +-+++-+---+
> > > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > > >   +-+++-+---+
> > > >
> > >
> > > Not every time -- only when the cache needs revalidation.
> > >
> > > In the absence of a delegation, that happens on a timeout (see the
> > > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > > locking events also affect when the client revalidates.
> > >
> > > When the v4 client does revalidate the cache, it relies heavily on NFSv4
> > > change attribute. Cephfs's change attribute is cluster-coherent too, so
> > > if the client does revalidate it should see changes made on other
> > > servers.
> > >
> > > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  
> > > > wrote:
> > > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > > Hi Jeff,
> > > > > > Another question is about Client Caching when disabling delegation.
> > > > > > I set breakpoint on nfs4_op_read, which is OP_READ process function 
> > > > > > in
> > > > > > nfs-ganesha. Then I read a file, I found that it will hit only once 
> > > > > > on
> > > > > > the first time, which means latter reading operation on this file 
> > > > > > will
> > > > > > not trigger OP_READ. It will read the data from client side cache. 
> > > > > > Is
> > > > > > it right?
> > > > >
> > > > > Yes. In the absence of a delegation, the client will periodically 
> > > > > query
> > > > > for the inode attributes, and will serve reads from the cache if it
> > > > > looks like the file hasn't changed.
> > > > >
> > > > > > I also checked the nfs client code in linux kernel. Only
> > > > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > > > > like this:
> > > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > > > }
> > > > > > This about this senario, client1 connect ganesha1 and client2 
> > > > > > connect
> > > > > > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > > > > Then I modify this file on client2. At that time, how client1 know 
> > > > > > the
> > > > > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > > > > cache_validity?
> > > > >
> > > > > Once you modify the code on client2, ganesha2 will request the 
> > > > > necessary
> > > > > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > > > > then make the change.
> > > > >
> > > > > When client1 reads again it will issue a GETATTR against the file [1].
> > > > > ganesha1 will then request caps to do the getattr, which will end up
> > > > > revoking ganesha2's caps. client1 will then see the change in 
> > > > > attributes
> > > > > (the change attribute and mtime, most likely) and will invalidate the
> > > > > mapping, causing it do reissue a READ on the wire.
> > > > >
> > > > > [1]: There may be a window of time after you 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-15 Thread Jeff Layton
On Fri, 2019-02-15 at 15:34 +0800, Marvin Zhang wrote:
> Thanks Jeff.
> If I set Attr_Expiration_Time to zero in the conf, does it mean the timeout
> is zero? If so, every client will see changes immediately. Will that
> hurt performance badly?
> It seems that the GlusterFS FSAL uses an UPCALL to invalidate the cache. How
> about the CephFS FSAL?
> 

We mostly suggest ganesha's attribute cache be disabled when exporting
FSAL_CEPH. libcephfs caches attributes too, and it knows the status of
those attributes better than ganesha can.

A call into libcephfs from ganesha to retrieve cached attributes is
mostly just in-memory copies within the same process, so any performance
overhead there is pretty minimal. If we need to go to the network to get
the attributes, then that was a case where the cache should have been
invalidated anyway, and we avoid having to check the validity of the
cache.
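
For reference, a minimal FSAL_CEPH export along those lines might look like the
sketch below. The values are hypothetical (Export_ID, Path and Pseudo are made
up), and the metadata-cache block is named MDCACHE in ganesha 2.7+ but
CACHEINODE in older releases -- check config_samples/ceph.conf in the ganesha
sources rather than taking this as authoritative:

EXPORT
{
    Export_ID = 100;              # example value
    Path = "/";                   # CephFS path to export (example)
    Pseudo = "/cephfs";           # NFSv4 pseudo path (example)
    Access_Type = RW;

    # Per the discussion in this thread: 0 disables ganesha's attribute
    # caching for this export and defers to libcephfs instead.
    Attr_Expiration_Time = 0;

    FSAL {
        Name = CEPH;
    }
}

# Keep ganesha's own metadata/dirent cache as small as possible.
MDCACHE
{
    Dir_Chunk = 0;
}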


> On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  wrote:
> > On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > > Will Client query 'change' attribute every time before reading to know
> > > if the data has been changed?
> > > 
> > >   +-+++-+---+
> > >   | Name| ID | Data Type  | Acc | Defined in|
> > >   +-+++-+---+
> > >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> > >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> > >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> > >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> > >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> > >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> > >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> > >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> > >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> > >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> > >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> > >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> > >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> > >   +-+++-+---+
> > > 
> > 
> > Not every time -- only when the cache needs revalidation.
> > 
> > In the absence of a delegation, that happens on a timeout (see the
> > acregmin/acregmax settings in nfs(5)), though things like opens and file
> > locking events also affect when the client revalidates.
> > 
> > When the v4 client does revalidate the cache, it relies heavily on NFSv4
> > change attribute. Cephfs's change attribute is cluster-coherent too, so
> > if the client does revalidate it should see changes made on other
> > servers.
> > 
> > > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  
> > > wrote:
> > > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > > Hi Jeff,
> > > > > Another question is about Client Caching when disabling delegation.
> > > > > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > > > > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > > > > the first time, which means latter reading operation on this file will
> > > > > not trigger OP_READ. It will read the data from client side cache. Is
> > > > > it right?
> > > > 
> > > > Yes. In the absence of a delegation, the client will periodically query
> > > > for the inode attributes, and will serve reads from the cache if it
> > > > looks like the file hasn't changed.
> > > > 
> > > > > I also checked the nfs client code in linux kernel. Only
> > > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > > > like this:
> > > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > > }
> > > > > This about this senario, client1 connect ganesha1 and client2 connect
> > > > > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > > > Then I modify this file on client2. At that time, how client1 know the
> > > > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > > > cache_validity?
> > > > 
> > > > Once you modify the code on client2, ganesha2 will request the necessary
> > > > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > > > then make the change.
> > > > 
> > > > When client1 reads again it will issue a GETATTR against the file [1].
> > > > ganesha1 will then request caps to do the getattr, which will end up
> > > > revoking ganesha2's caps. client1 will then see the change in attributes
> > > > (the change attribute and mtime, most likely) and will invalidate the
> > > > mapping, 

Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Thanks Jeff.
If I set Attr_Expiration_Time to zero in the conf, does it mean the timeout
is zero? If so, every client will see changes immediately. Will that
hurt performance badly?
It seems that the GlusterFS FSAL uses an UPCALL to invalidate the cache. How
about the CephFS FSAL?

On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> > Will Client query 'change' attribute every time before reading to know
> > if the data has been changed?
> >
> >   +-+++-+---+
> >   | Name| ID | Data Type  | Acc | Defined in|
> >   +-+++-+---+
> >   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
> >   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
> >   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
> >   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
> >   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
> >   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
> >   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
> >   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
> >   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
> >   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
> >   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
> >   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
> >   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
> >   +-+++-+---+
> >
>
> Not every time -- only when the cache needs revalidation.
>
> In the absence of a delegation, that happens on a timeout (see the
> acregmin/acregmax settings in nfs(5)), though things like opens and file
> locking events also affect when the client revalidates.
>
> When the v4 client does revalidate the cache, it relies heavily on NFSv4
> change attribute. Cephfs's change attribute is cluster-coherent too, so
> if the client does revalidate it should see changes made on other
> servers.
>
> > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
> > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > Hi Jeff,
> > > > Another question is about Client Caching when disabling delegation.
> > > > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > > > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > > > the first time, which means latter reading operation on this file will
> > > > not trigger OP_READ. It will read the data from client side cache. Is
> > > > it right?
> > >
> > > Yes. In the absence of a delegation, the client will periodically query
> > > for the inode attributes, and will serve reads from the cache if it
> > > looks like the file hasn't changed.
> > >
> > > > I also checked the nfs client code in linux kernel. Only
> > > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > > like this:
> > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > > ret = nfs_invalidate_mapping(inode, mapping);
> > > > }
> > > > This about this senario, client1 connect ganesha1 and client2 connect
> > > > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > > Then I modify this file on client2. At that time, how client1 know the
> > > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > > cache_validity?
> > >
> > > Once you modify the code on client2, ganesha2 will request the necessary
> > > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > > then make the change.
> > >
> > > When client1 reads again it will issue a GETATTR against the file [1].
> > > ganesha1 will then request caps to do the getattr, which will end up
> > > revoking ganesha2's caps. client1 will then see the change in attributes
> > > (the change attribute and mtime, most likely) and will invalidate the
> > > mapping, causing it do reissue a READ on the wire.
> > >
> > > [1]: There may be a window of time after you change the file on client2
> > > where client1 doesn't see it. That's due to the fact that inode
> > > attributes on the client are only revalidated after a timeout. You may
> > > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > > make sure you understand how the NFS client validates its caches.
> > >
> > > Cheers,
> > > --
> > > Jeff Layton 
> > >
>
> --
> Jeff Layton 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
> Will Client query 'change' attribute every time before reading to know
> if the data has been changed?
> 
>   +-+++-+---+
>   | Name| ID | Data Type  | Acc | Defined in|
>   +-+++-+---+
>   | supported_attrs | 0  | bitmap4| R   | Section 5.8.1.1   |
>   | type| 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
>   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
>   | change  | 3  | changeid4  | R   | Section 5.8.1.4   |
>   | size| 4  | uint64_t   | R W | Section 5.8.1.5   |
>   | link_support| 5  | bool   | R   | Section 5.8.1.6   |
>   | symlink_support | 6  | bool   | R   | Section 5.8.1.7   |
>   | named_attr  | 7  | bool   | R   | Section 5.8.1.8   |
>   | fsid| 8  | fsid4  | R   | Section 5.8.1.9   |
>   | unique_handles  | 9  | bool   | R   | Section 5.8.1.10  |
>   | lease_time  | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
>   | rdattr_error| 11 | nfsstat4   | R   | Section 5.8.1.12  |
>   | filehandle  | 19 | nfs_fh4| R   | Section 5.8.1.13  |
>   +-+++-+---+
> 

Not every time -- only when the cache needs revalidation.

In the absence of a delegation, that happens on a timeout (see the
acregmin/acregmax settings in nfs(5)), though things like opens and file
locking events also affect when the client revalidates.

When the v4 client does revalidate the cache, it relies heavily on the NFSv4
change attribute. CephFS's change attribute is cluster-coherent too, so
if the client does revalidate it should see changes made on other
servers.
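
To make that concrete: on the client side the revalidation window is governed
by the attribute-cache mount options documented in nfs(5). A hypothetical
example, with an invented server name and mount point:

mount -t nfs4 -o acregmin=3,acregmax=60 ganesha1:/cephfs /mnt/cephfs
mount -t nfs4 -o noac ganesha1:/cephfs /mnt/cephfs

The first keeps the usual 3-60 second attribute cache window; the second
disables attribute caching entirely, which gives tighter coherence at the cost
of a much more GETATTR-heavy workload.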

> On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
> > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > Hi Jeff,
> > > Another question is about Client Caching when disabling delegation.
> > > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > > the first time, which means latter reading operation on this file will
> > > not trigger OP_READ. It will read the data from client side cache. Is
> > > it right?
> > 
> > Yes. In the absence of a delegation, the client will periodically query
> > for the inode attributes, and will serve reads from the cache if it
> > looks like the file hasn't changed.
> > 
> > > I also checked the nfs client code in linux kernel. Only
> > > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > > like this:
> > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > ret = nfs_invalidate_mapping(inode, mapping);
> > > }
> > > This about this senario, client1 connect ganesha1 and client2 connect
> > > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > > Then I modify this file on client2. At that time, how client1 know the
> > > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > > cache_validity?
> > 
> > Once you modify the code on client2, ganesha2 will request the necessary
> > caps from the ceph MDS, and client1 will have its caps revoked. It'll
> > then make the change.
> > 
> > When client1 reads again it will issue a GETATTR against the file [1].
> > ganesha1 will then request caps to do the getattr, which will end up
> > revoking ganesha2's caps. client1 will then see the change in attributes
> > (the change attribute and mtime, most likely) and will invalidate the
> > mapping, causing it do reissue a READ on the wire.
> > 
> > [1]: There may be a window of time after you change the file on client2
> > where client1 doesn't see it. That's due to the fact that inode
> > attributes on the client are only revalidated after a timeout. You may
> > want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> > make sure you understand how the NFS client validates its caches.
> > 
> > Cheers,
> > --
> > Jeff Layton 
> > 

-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Here is the copy from https://tools.ietf.org/html/rfc7530#page-40
Will the client query the 'change' attribute every time before reading, to know
whether the data has been changed?

   +-----------------+----+------------+-----+-------------------+
   | Name            | ID | Data Type  | Acc | Defined in        |
   +-----------------+----+------------+-----+-------------------+
   | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1   |
   | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2   |
   | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3   |
   | change          | 3  | changeid4  | R   | Section 5.8.1.4   |
   | size            | 4  | uint64_t   | R W | Section 5.8.1.5   |
   | link_support    | 5  | bool       | R   | Section 5.8.1.6   |
   | symlink_support | 6  | bool       | R   | Section 5.8.1.7   |
   | named_attr      | 7  | bool       | R   | Section 5.8.1.8   |
   | fsid            | 8  | fsid4      | R   | Section 5.8.1.9   |
   | unique_handles  | 9  | bool       | R   | Section 5.8.1.10  |
   | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11  |
   | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12  |
   | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13  |
   +-----------------+----+------------+-----+-------------------+

On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > Hi Jeff,
> > Another question is about Client Caching when disabling delegation.
> > I set breakpoint on nfs4_op_read, which is OP_READ process function in
> > nfs-ganesha. Then I read a file, I found that it will hit only once on
> > the first time, which means latter reading operation on this file will
> > not trigger OP_READ. It will read the data from client side cache. Is
> > it right?
>
> Yes. In the absence of a delegation, the client will periodically query
> for the inode attributes, and will serve reads from the cache if it
> looks like the file hasn't changed.
>
> > I also checked the nfs client code in linux kernel. Only
> > cache_validity is NFS_INO_INVALID_DATA, it will send OP_READ again,
> > like this:
> > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > ret = nfs_invalidate_mapping(inode, mapping);
> > }
> > This about this senario, client1 connect ganesha1 and client2 connect
> > ganesha2. I read /1.txt on client1 and client1 will cache the data.
> > Then I modify this file on client2. At that time, how client1 know the
> > file is modifed and how it will add NFS_INO_INVALID_DATA into
> > cache_validity?
>
>
> Once you modify the file on client2, ganesha2 will request the necessary
> caps from the ceph MDS, and client1 will have its caps revoked. It'll
> then make the change.
>
> When client1 reads again it will issue a GETATTR against the file [1].
> ganesha1 will then request caps to do the getattr, which will end up
> revoking ganesha2's caps. client1 will then see the change in attributes
> (the change attribute and mtime, most likely) and will invalidate the
> mapping, causing it to reissue a READ on the wire.
>
> [1]: There may be a window of time after you change the file on client2
> where client1 doesn't see it. That's due to the fact that inode
> attributes on the client are only revalidated after a timeout. You may
> want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> make sure you understand how the NFS client validates its caches.
>
> Cheers,
> --
> Jeff Layton 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> Hi Jeff,
> Another question is about client caching when delegations are disabled.
> I set a breakpoint on nfs4_op_read, the OP_READ processing function in
> nfs-ganesha. Then I read a file and found that the breakpoint is hit only on
> the first read; later reads of this file do not trigger OP_READ and are
> served from the client-side cache. Is that right?

Yes. In the absence of a delegation, the client will periodically query
for the inode attributes, and will serve reads from the cache if it
looks like the file hasn't changed.

> I also checked the NFS client code in the Linux kernel. Only when
> cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again,
> like this:
> if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
>         ret = nfs_invalidate_mapping(inode, mapping);
> }
> Think about this scenario: client1 connects to ganesha1 and client2 connects
> to ganesha2. I read /1.txt on client1 and client1 caches the data.
> Then I modify the file on client2. At that point, how does client1 know the
> file has been modified, and how does it set NFS_INO_INVALID_DATA in
> cache_validity?


Once you modify the file on client2, ganesha2 will request the necessary
caps from the ceph MDS, and client1 will have its caps revoked. It'll
then make the change.

When client1 reads again it will issue a GETATTR against the file [1].
ganesha1 will then request caps to do the getattr, which will end up
revoking ganesha2's caps. client1 will then see the change in attributes
(the change attribute and mtime, most likely) and will invalidate the
mapping, causing it to reissue a READ on the wire.

[1]: There may be a window of time after you change the file on client2
where client1 doesn't see it. That's due to the fact that inode
attributes on the client are only revalidated after a timeout. You may
want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
make sure you understand how the NFS client validates its caches.

Cheers,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Marvin Zhang
Hi Jeff,
Another question is about client caching when delegations are disabled.
I set a breakpoint on nfs4_op_read, the OP_READ processing function in
nfs-ganesha. Then I read a file and found that the breakpoint is hit only on
the first read; later reads of this file do not trigger OP_READ and are
served from the client-side cache. Is that right?
I also checked the NFS client code in the Linux kernel. Only when
cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again,
like this:
if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
        ret = nfs_invalidate_mapping(inode, mapping);
}
Think about this scenario: client1 connects to ganesha1 and client2 connects
to ganesha2. I read /1.txt on client1 and client1 caches the data.
Then I modify the file on client2. At that point, how does client1 know the
file has been modified, and how does it set NFS_INO_INVALID_DATA in
cache_validity?
Thanks,
Marvin

On Thu, Feb 14, 2019 at 7:27 PM Jeff Layton  wrote:
>
> On Thu, 2019-02-14 at 10:35 +0800, Marvin Zhang wrote:
> > On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
> > > > Hi,
> > > > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > > > config active/passive NFS-Ganesha to use CephFs. My question is if we
> > > > can use active/active nfs-ganesha for CephFS.
> > >
> > > (Apologies if you get two copies of this. I sent an earlier one from the
> > > wrong account and it got stuck in moderation)
> > >
> > > You can, with the new rados-cluster recovery backend that went into
> > > ganesha v2.7. See here for a bit more detail:
> > >
> > > https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
> > >
> > > ...also have a look at the ceph.conf file in the ganesha sources.
> > >
> > > > In my thought, only state consistance should we think about.
> > > > 1. Lock support for Active/Active. Even each nfs-ganesha sever mantain
> > > > the lock state, the real lock/unlock will call
> > > > ceph_ll_getlk/ceph_ll_setlk. So Ceph cluster will handle the lock
> > > > safely.
> > > > 2. Delegation support Active/Active. It's similar question 1,
> > > > ceph_ll_delegation will handle it safely.
> > > > 3. Nfs-ganesha cache support Active/Active. As
> > > > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > > > describes, we can config cache size as size 1.
> > > > 4. Ceph-FSAL cache support Active/Active. Like other CephFs client,
> > > > there is no issues for cache consistance.
> > >
> > > The basic idea with the new recovery backend is to have the different
> > > NFS ganesha heads coordinate their recovery grace periods to prevent
> > > stateful conflicts.
> > >
> > > The one thing missing at this point is delegations in an active/active
> > > configuration, but that's mainly because of the synchronous nature of
> > > libcephfs. We have a potential fix for that problem but it requires work
> > > in libcephfs that is not yet done.
> > [marvin] So we should disable delegation on active/active and set the
> > conf like this. Is it right?
> > NFSv4
> > {
> > Delegations = false;
> > }
>
> Yes.
> --
> Jeff Layton 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-14 Thread Jeff Layton
On Thu, 2019-02-14 at 10:35 +0800, Marvin Zhang wrote:
> On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
> > > Hi,
> > > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > > config active/passive NFS-Ganesha to use CephFs. My question is if we
> > > can use active/active nfs-ganesha for CephFS.
> > 
> > (Apologies if you get two copies of this. I sent an earlier one from the
> > wrong account and it got stuck in moderation)
> > 
> > You can, with the new rados-cluster recovery backend that went into
> > ganesha v2.7. See here for a bit more detail:
> > 
> > https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
> > 
> > ...also have a look at the ceph.conf file in the ganesha sources.
> > 
> > > In my thought, only state consistance should we think about.
> > > 1. Lock support for Active/Active. Even each nfs-ganesha sever mantain
> > > the lock state, the real lock/unlock will call
> > > ceph_ll_getlk/ceph_ll_setlk. So Ceph cluster will handle the lock
> > > safely.
> > > 2. Delegation support Active/Active. It's similar question 1,
> > > ceph_ll_delegation will handle it safely.
> > > 3. Nfs-ganesha cache support Active/Active. As
> > > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > > describes, we can config cache size as size 1.
> > > 4. Ceph-FSAL cache support Active/Active. Like other CephFs client,
> > > there is no issues for cache consistance.
> > 
> > The basic idea with the new recovery backend is to have the different
> > NFS ganesha heads coordinate their recovery grace periods to prevent
> > stateful conflicts.
> > 
> > The one thing missing at this point is delegations in an active/active
> > configuration, but that's mainly because of the synchronous nature of
> > libcephfs. We have a potential fix for that problem but it requires work
> > in libcephfs that is not yet done.
> [marvin] So we should disable delegation on active/active and set the
> conf like this. Is it right?
> NFSv4
> {
> Delegations = false;
> }

Yes.
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-13 Thread Marvin Zhang
On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton  wrote:
>
> > Hi,
> > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > config active/passive NFS-Ganesha to use CephFs. My question is if we
> > can use active/active nfs-ganesha for CephFS.
>
> (Apologies if you get two copies of this. I sent an earlier one from the
> wrong account and it got stuck in moderation)
>
> You can, with the new rados-cluster recovery backend that went into
> ganesha v2.7. See here for a bit more detail:
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> ...also have a look at the ceph.conf file in the ganesha sources.
>
> > In my thought, only state consistance should we think about.
> > 1. Lock support for Active/Active. Even each nfs-ganesha sever mantain
> > the lock state, the real lock/unlock will call
> > ceph_ll_getlk/ceph_ll_setlk. So Ceph cluster will handle the lock
> > safely.
> > 2. Delegation support Active/Active. It's similar question 1,
> > ceph_ll_delegation will handle it safely.
> > 3. Nfs-ganesha cache support Active/Active. As
> > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > describes, we can config cache size as size 1.
> > 4. Ceph-FSAL cache support Active/Active. Like other CephFs client,
> > there is no issues for cache consistance.
>
> The basic idea with the new recovery backend is to have the different
> NFS ganesha heads coordinate their recovery grace periods to prevent
> stateful conflicts.
>
> The one thing missing at this point is delegations in an active/active
> configuration, but that's mainly because of the synchronous nature of
> libcephfs. We have a potential fix for that problem but it requires work
> in libcephfs that is not yet done.
[marvin] So we should disable delegations in an active/active setup and set the
conf like this. Is that right?
NFSv4
{
    Delegations = false;
}
>
> Cheers,
> --
> Jeff Layton 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: NAS solution for CephFS

2019-02-13 Thread Jeff Layton
> Hi,
> As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> config active/passive NFS-Ganesha to use CephFS. My question is whether we
> can use active/active nfs-ganesha for CephFS.

(Apologies if you get two copies of this. I sent an earlier one from the
wrong account and it got stuck in moderation)

You can, with the new rados-cluster recovery backend that went into
ganesha v2.7. See here for a bit more detail:

https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/

...also have a look at the ceph.conf file in the ganesha sources.

> In my view, state consistency is the only thing we need to think about.
> 1. Lock support for Active/Active. Even though each nfs-ganesha server
> maintains its own lock state, the real lock/unlock calls go through
> ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle locking
> safely.
> 2. Delegation support for Active/Active. Similar to point 1,
> ceph_ll_delegation will handle it safely.
> 3. nfs-ganesha cache support for Active/Active. As
> https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> describes, we can configure the cache size as 1.
> 4. Ceph-FSAL cache support for Active/Active. Like any other CephFS client,
> there are no issues with cache consistency.

The basic idea with the new recovery backend is to have the different
NFS ganesha heads coordinate their recovery grace periods to prevent
stateful conflicts.

The one thing missing at this point is delegations in an active/active
configuration, but that's mainly because of the synchronous nature of
libcephfs. We have a potential fix for that problem but it requires work
in libcephfs that is not yet done.
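
As a rough sketch, the active/active pieces from that blog post boil down to
config along these lines (the pool, userid and nodeid values here are invented;
see the post and the ganesha sample ceph.conf for the authoritative settings):

RADOS_KV
{
    ceph_conf = "/etc/ceph/ceph.conf";  # Ceph cluster holding the recovery records
    userid = "admin";                   # cephx user (example)
    pool = "nfs-ganesha";               # RADOS pool for the recovery DB (example)
    nodeid = "ganesha-a";               # must be unique per ganesha head (example)
}

NFSv4
{
    RecoveryBackend = rados_cluster;    # coordinate grace periods across heads
    Delegations = false;                # not yet safe in active/active (see above)
    Minor_Versions = 1, 2;
}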

Cheers,
-- 
Jeff Layton 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com