[ceph-users] NAS solution for CephFS
Hi,

As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to configure
active/passive NFS-Ganesha to use CephFS. My question is whether we can use
active/active nfs-ganesha for CephFS. In my view, state consistency is the
only thing we need to think about:

1. Lock support for Active/Active. Even though each nfs-ganesha server
maintains its own lock state, the real lock/unlock calls go through
ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle the locks
safely.
2. Delegation support for Active/Active. Similar to question 1,
ceph_ll_delegation will handle it safely.
3. Nfs-ganesha cache support for Active/Active. As
https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
describes, we can configure the cache size as 1.
4. Ceph-FSAL cache support for Active/Active. Like any other CephFS client,
there are no issues with cache consistency.

Thanks,
Marvin
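For point 3, the knobs in the referenced sample config that effectively turn
off ganesha's own content caching look roughly like the sketch below. This
mirrors the spirit of src/config_samples/ceph.conf; exact block and option
names vary between ganesha releases, so treat it as an illustration rather
than a drop-in config.

CACHEINODE {
    # Let libcephfs (the Ceph FSAL) do the caching instead of ganesha
    Dir_Chunk = 0;
    NParts = 1;
    Cache_Size = 1;
}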
Re: [ceph-users] Fwd: NAS solution for CephFS
On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton wrote:
> > Hi,
> > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > configure active/passive NFS-Ganesha to use CephFS. My question is
> > whether we can use active/active nfs-ganesha for CephFS.
>
> (Apologies if you get two copies of this. I sent an earlier one from the
> wrong account and it got stuck in moderation.)
>
> You can, with the new rados-cluster recovery backend that went into
> ganesha v2.7. See here for a bit more detail:
>
> https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
>
> ...also have a look at the ceph.conf file in the ganesha sources.
>
> > In my view, state consistency is the only thing we need to think about:
> > 1. Lock support for Active/Active. Even though each nfs-ganesha server
> > maintains its own lock state, the real lock/unlock calls go through
> > ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle the locks
> > safely.
> > 2. Delegation support for Active/Active. Similar to question 1,
> > ceph_ll_delegation will handle it safely.
> > 3. Nfs-ganesha cache support for Active/Active. As
> > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > describes, we can configure the cache size as 1.
> > 4. Ceph-FSAL cache support for Active/Active. Like any other CephFS
> > client, there are no issues with cache consistency.
>
> The basic idea with the new recovery backend is to have the different
> NFS ganesha heads coordinate their recovery grace periods to prevent
> stateful conflicts.
>
> The one thing missing at this point is delegations in an active/active
> configuration, but that's mainly because of the synchronous nature of
> libcephfs. We have a potential fix for that problem but it requires work
> in libcephfs that is not yet done.

[marvin] So we should disable delegations on active/active and set the conf
like this. Is that right?

NFSv4
{
    Delegations = false;
}

> Cheers,
> --
> Jeff Layton
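Putting Jeff's pointers together, a single ganesha head in an active/active
cluster might be configured roughly as sketched below. This is an
assumption-laden sketch based on the blog post and the sample ceph.conf, not
a verified configuration: block and option names should be checked against
the ganesha version in use, the pool and namespace are placeholders, and
nodeid must be unique per head.

NFSv4 {
    # Clustered recovery backend introduced in ganesha v2.7
    RecoveryBackend = rados_cluster;
    Minor_Versions = 1, 2;
    # Delegations are not yet safe in active/active, per the discussion above
    Delegations = false;
}

RADOS_KV {
    # RADOS pool/namespace holding the shared recovery state (placeholders)
    pool = "cephfs_metadata";
    namespace = "ganesha";
    # Must be unique for each ganesha head
    nodeid = "ganesha-a";
}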
Re: [ceph-users] Fwd: NAS solution for CephFS
Hi Jeff,

Another question is about client caching when delegations are disabled.
I set a breakpoint on nfs4_op_read, which is the OP_READ processing function
in nfs-ganesha. Then I read a file, and I found that the breakpoint is hit
only the first time, which means later read operations on this file do not
trigger OP_READ; the data is read from the client-side cache instead. Is
that right?

I also checked the nfs client code in the linux kernel. Only when
cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ again, like
this:

if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
    ret = nfs_invalidate_mapping(inode, mapping);
}

Think about this scenario: client1 connects to ganesha1 and client2 connects
to ganesha2. I read /1.txt on client1 and client1 caches the data. Then I
modify this file on client2. At that point, how does client1 know the file
is modified, and how does NFS_INO_INVALID_DATA get set in cache_validity?

Thanks,
Marvin

On Thu, Feb 14, 2019 at 7:27 PM Jeff Layton wrote:
>
> On Thu, 2019-02-14 at 10:35 +0800, Marvin Zhang wrote:
> > On Thu, Feb 14, 2019 at 8:09 AM Jeff Layton wrote:
> > > > Hi,
> > > > As http://docs.ceph.com/docs/master/cephfs/nfs/ says, it's OK to
> > > > configure active/passive NFS-Ganesha to use CephFS. My question is
> > > > whether we can use active/active nfs-ganesha for CephFS.
> > >
> > > (Apologies if you get two copies of this. I sent an earlier one from
> > > the wrong account and it got stuck in moderation.)
> > >
> > > You can, with the new rados-cluster recovery backend that went into
> > > ganesha v2.7. See here for a bit more detail:
> > >
> > > https://jtlayton.wordpress.com/2018/12/10/deploying-an-active-active-nfs-cluster-over-cephfs/
> > >
> > > ...also have a look at the ceph.conf file in the ganesha sources.
> > >
> > > > In my view, state consistency is the only thing we need to think
> > > > about:
> > > > 1. Lock support for Active/Active. Even though each nfs-ganesha
> > > > server maintains its own lock state, the real lock/unlock calls go
> > > > through ceph_ll_getlk/ceph_ll_setlk, so the Ceph cluster will handle
> > > > the locks safely.
> > > > 2. Delegation support for Active/Active. Similar to question 1,
> > > > ceph_ll_delegation will handle it safely.
> > > > 3. Nfs-ganesha cache support for Active/Active. As
> > > > https://github.com/nfs-ganesha/nfs-ganesha/blob/next/src/config_samples/ceph.conf
> > > > describes, we can configure the cache size as 1.
> > > > 4. Ceph-FSAL cache support for Active/Active. Like any other CephFS
> > > > client, there are no issues with cache consistency.
> > >
> > > The basic idea with the new recovery backend is to have the different
> > > NFS ganesha heads coordinate their recovery grace periods to prevent
> > > stateful conflicts.
> > >
> > > The one thing missing at this point is delegations in an active/active
> > > configuration, but that's mainly because of the synchronous nature of
> > > libcephfs. We have a potential fix for that problem but it requires
> > > work in libcephfs that is not yet done.
> > [marvin] So we should disable delegations on active/active and set the
> > conf like this. Is that right?
> > NFSv4
> > {
> >     Delegations = false;
> > }
>
> Yes.
> --
> Jeff Layton
Re: [ceph-users] Fwd: NAS solution for CephFS
Here is a copy from https://tools.ietf.org/html/rfc7530#page-40.
Will the client query the 'change' attribute every time before reading, to
know whether the data has been changed?

+-----------------+----+------------+-----+------------------+
| Name            | ID | Data Type  | Acc | Defined in       |
+-----------------+----+------------+-----+------------------+
| supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1  |
| type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2  |
| fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3  |
| change          | 3  | changeid4  | R   | Section 5.8.1.4  |
| size            | 4  | uint64_t   | R W | Section 5.8.1.5  |
| link_support    | 5  | bool       | R   | Section 5.8.1.6  |
| symlink_support | 6  | bool       | R   | Section 5.8.1.7  |
| named_attr      | 7  | bool       | R   | Section 5.8.1.8  |
| fsid            | 8  | fsid4      | R   | Section 5.8.1.9  |
| unique_handles  | 9  | bool       | R   | Section 5.8.1.10 |
| lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11 |
| rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12 |
| filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13 |
+-----------------+----+------------+-----+------------------+

On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton wrote:
>
> On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > Hi Jeff,
> > Another question is about client caching when delegations are disabled.
> > I set a breakpoint on nfs4_op_read, which is the OP_READ processing
> > function in nfs-ganesha. Then I read a file, and I found that the
> > breakpoint is hit only the first time, which means later read
> > operations on this file do not trigger OP_READ; the data is read from
> > the client-side cache instead. Is that right?
>
> Yes. In the absence of a delegation, the client will periodically query
> for the inode attributes, and will serve reads from the cache if it
> looks like the file hasn't changed.
>
> > I also checked the nfs client code in the linux kernel. Only when
> > cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ
> > again, like this:
> > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> >     ret = nfs_invalidate_mapping(inode, mapping);
> > }
> > Think about this scenario: client1 connects to ganesha1 and client2
> > connects to ganesha2. I read /1.txt on client1 and client1 caches the
> > data. Then I modify this file on client2. At that point, how does
> > client1 know the file is modified, and how does NFS_INO_INVALID_DATA
> > get set in cache_validity?
>
> Once you modify the file on client2, ganesha2 will request the necessary
> caps from the ceph MDS, and client1 will have its caps revoked. It'll
> then make the change.
>
> When client1 reads again it will issue a GETATTR against the file [1].
> ganesha1 will then request caps to do the getattr, which will end up
> revoking ganesha2's caps. client1 will then see the change in attributes
> (the change attribute and mtime, most likely) and will invalidate the
> mapping, causing it to reissue a READ on the wire.
>
> [1]: There may be a window of time after you change the file on client2
> where client1 doesn't see it. That's due to the fact that inode
> attributes on the client are only revalidated after a timeout. You may
> want to read over the DATA AND METADATA COHERENCE section of nfs(5) to
> make sure you understand how the NFS client validates its caches.
>
> Cheers,
> --
> Jeff Layton
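The revalidation timeout Jeff mentions is governed by the standard nfs(5)
attribute-cache mount options on the client. As an illustration only (the
server name and export path are placeholders; the timeout values shown are
the documented nfs(5) defaults), an /etc/fstab entry might look like:

# device           mountpoint  type  options                                          dump pass
ganesha1:/cephfs   /mnt/nfs    nfs4  acregmin=3,acregmax=60,acdirmin=30,acdirmax=60   0    0

Shrinking these timeouts (or using actimeo=0 / noac) narrows the window in
which client1 can serve stale attributes, at the cost of more GETATTR
traffic on the wire.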
Re: [ceph-users] Fwd: NAS solution for CephFS
Thanks Jeff.
If I set Attr_Expiration_Time to zero in the conf, does that mean the
timeout is zero? If so, every client will see changes immediately. Will
that hurt performance badly?
It seems that the GlusterFS FSAL uses UPCALL to invalidate the cache. How
about the CephFS FSAL?

On Thu, Feb 14, 2019 at 9:04 PM Jeff Layton wrote:
>
> On Thu, 2019-02-14 at 20:57 +0800, Marvin Zhang wrote:
> > Here is a copy from https://tools.ietf.org/html/rfc7530#page-40.
> > Will the client query the 'change' attribute every time before reading,
> > to know whether the data has been changed?
> >
> > +-----------------+----+------------+-----+------------------+
> > | Name            | ID | Data Type  | Acc | Defined in       |
> > +-----------------+----+------------+-----+------------------+
> > | supported_attrs | 0  | bitmap4    | R   | Section 5.8.1.1  |
> > | type            | 1  | nfs_ftype4 | R   | Section 5.8.1.2  |
> > | fh_expire_type  | 2  | uint32_t   | R   | Section 5.8.1.3  |
> > | change          | 3  | changeid4  | R   | Section 5.8.1.4  |
> > | size            | 4  | uint64_t   | R W | Section 5.8.1.5  |
> > | link_support    | 5  | bool       | R   | Section 5.8.1.6  |
> > | symlink_support | 6  | bool       | R   | Section 5.8.1.7  |
> > | named_attr      | 7  | bool       | R   | Section 5.8.1.8  |
> > | fsid            | 8  | fsid4      | R   | Section 5.8.1.9  |
> > | unique_handles  | 9  | bool       | R   | Section 5.8.1.10 |
> > | lease_time      | 10 | nfs_lease4 | R   | Section 5.8.1.11 |
> > | rdattr_error    | 11 | nfsstat4   | R   | Section 5.8.1.12 |
> > | filehandle      | 19 | nfs_fh4    | R   | Section 5.8.1.13 |
> > +-----------------+----+------------+-----+------------------+
>
> Not every time -- only when the cache needs revalidation.
>
> In the absence of a delegation, that happens on a timeout (see the
> acregmin/acregmax settings in nfs(5)), though things like opens and file
> locking events also affect when the client revalidates.
>
> When the v4 client does revalidate the cache, it relies heavily on the
> NFSv4 change attribute. Cephfs's change attribute is cluster-coherent
> too, so if the client does revalidate it should see changes made on
> other servers.
>
> > On Thu, Feb 14, 2019 at 8:29 PM Jeff Layton wrote:
> > > On Thu, 2019-02-14 at 19:49 +0800, Marvin Zhang wrote:
> > > > Hi Jeff,
> > > > Another question is about client caching when delegations are
> > > > disabled.
> > > > I set a breakpoint on nfs4_op_read, which is the OP_READ processing
> > > > function in nfs-ganesha. Then I read a file, and I found that the
> > > > breakpoint is hit only the first time, which means later read
> > > > operations on this file do not trigger OP_READ; the data is read
> > > > from the client-side cache instead. Is that right?
> > >
> > > Yes. In the absence of a delegation, the client will periodically
> > > query for the inode attributes, and will serve reads from the cache
> > > if it looks like the file hasn't changed.
> > >
> > > > I also checked the nfs client code in the linux kernel. Only when
> > > > cache_validity has NFS_INO_INVALID_DATA set will it send OP_READ
> > > > again, like this:
> > > > if (nfsi->cache_validity & NFS_INO_INVALID_DATA) {
> > > >     ret = nfs_invalidate_mapping(inode, mapping);
> > > > }
> > > > Think about this scenario: client1 connects to ganesha1 and client2
> > > > connects to ganesha2. I read /1.txt on client1 and client1 caches
> > > > the data. Then I modify this file on client2. At that point, how
> > > > does client1 know the file is modified, and how does
> > > > NFS_INO_INVALID_DATA get set in cache_validity?
> > >
> > > Once you modify the file on client2, ganesha2 will request the
> > > necessary caps from the ceph MDS, and client1 will have its caps
> > > revoked. It'll then make the change.
> > >
> > > When client1 reads again it will issue a GETATTR against the file [1].
> > > ganesha1 will then request caps to do the getattr, which will end up
> > > revoking ganesha2's caps. client1 will then see the change in
> > > attributes (the change attribute and mtime, most likely) and will
> > > invalidate the mapping, causing it to reissue a READ on the wire.
> > >
> > > [1]: There may be a window of time after you change the file on
> > > client2 where client1 doesn't see it. That's due to the fact that
> > > inode attributes on the client are only revalidated after a timeout.
> > > You may want to read over the DATA AND METADATA COHERENCE section of
> > > nfs(5) to make sure you understand how the NFS client validates its
> > > caches.
> > >
> > > Cheers,
> > > --
> > > Jeff Layton
>
> --
> Jeff Layton
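For reference, the setting Marvin asks about is a ganesha-side attribute
cache expiry. A minimal sketch of what "zero" would look like follows; the
block it belongs in (CACHEINODE here) and the exact semantics of a zero
value should be confirmed against the documentation for the ganesha release
in use.

CACHEINODE {
    # Marvin's proposal: expire cached attributes immediately so changes
    # made through another head become visible as soon as possible
    Attr_Expiration_Time = 0;
}

The trade-off is the same one discussed above: tighter coherence in exchange
for more attribute traffic to the FSAL and the Ceph MDS.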