Hello team.
We ran into some trouble with the Gluster NFS server on Gluster 9.3. We updated
to the supported Gluster 9.6 release and reproduced the problem there.
We understand the Gluster team recommends NFS-Ganesha for NFS, but in our
specific environment and use case Ganesha isn't fast enough. No disrespect
intended; we never got the chance to work with the Ganesha team on it.
We also tried to avoid Ganesha and Gluster NFS altogether by exporting FUSE
mounts through the kernel NFS server. That was faster, but failover didn't
work: we could make the mount point highly available but not open files, so
when the IP failover happened the mount point would still function, but an
open file (a squashfs image in our case) would not fail over.
So we embarked on a mission to figure out what was going on with the NFS
server. I am not an expert in network code or distributed filesystems, so
someone with a careful eye will need to check these changes. What I generally
found is that the Gluster NFS server requires the layers of gluster to report
'errno' back up the stack so it can check for EINVAL (which is how it
determines is_eof). In some instances, errno was not being passed back up the
chain of callbacks, or was being reset to 0. The result was NFS traces showing
multiple READs for a 1-byte file and the NFS client reporting an "I/O" error.
Files above roughly 170M seemed to work OK, likely because the behaviour of
the gluster layers changes at certain file sizes, but we did not track that
part down.
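
To make the failure mode concrete, here is a minimal, self-contained sketch.
This is not actual gluster code: all names are invented, and the errno value
is only illustrative (following the EINVAL observation above).

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Bottom layer: a read that hits end-of-file reports the EOF hint via
 * op_errno alongside the byte count in op_ret. */
static void
bottom_read(int32_t *op_ret, int32_t *op_errno, int at_eof)
{
    *op_ret = 1;                     /* 1 byte read */
    *op_errno = at_eof ? EINVAL : 0; /* EOF hint carried in op_errno */
}

/* Middle layer: rebuilds the reply (aggregation/caching) but forgets to
 * carry op_errno upward -- the kind of bug the attached patch addresses. */
static void
middle_read(int32_t *op_ret, int32_t *op_errno, int at_eof)
{
    int32_t child_ret = 0, child_errno = 0;

    bottom_read(&child_ret, &child_errno, at_eof);
    *op_ret = child_ret;
    *op_errno = 0; /* BUG: should be child_errno */
}

/* Top layer (gNFS-like): decides is_eof from op_errno. With the hint lost,
 * is_eof stays 0 and the NFS client keeps issuing READs for a 1-byte file. */
int
main(void)
{
    int32_t op_ret = 0, op_errno = 0;

    middle_read(&op_ret, &op_errno, 1 /* at EOF */);
    printf("read %d byte(s), is_eof=%d\n", op_ret, op_errno != 0);
    return 0;
}

The two hunks in the attached patch are the shard and io-cache instances of
that missing "carry op_errno upward" step.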
We found that, in one case, disabling the NFS performance io-cache fixed the
problem for a non-sharded volume, but the problem persisted on a sharded
volume. Testing also showed that our environment suffers a significant
performance hit when the NFS performance io-cache is disabled, so that wasn't
an option for us anyway.
We were curious why the FUSE client wasn't impacted, but our quick look found
that FUSE doesn't really use or need errno in the same way Gluster NFS does.
So, the attached patch fixed the issue. Accessing small files in either case
above now works properly. We also ran md5sum against large files over both the
NFS and FUSE mounts, and everything looked fine.
In our environment, the NFS-exported directories tend to contain squashfs files
representing read-only root filesystems for compute nodes, and those worked
fine over NFS after the change as well.
If you do not wish to include this patch because Gluster NFS is deprecated, I
would still greatly appreciate it if someone could validate my work, as our
solution will need Gluster NFS enabled for the time being. I am concerned I
could have missed a nuance and caused a hard-to-detect problem.
Thank you all!
patch.txt attached.
diff -Narup glusterfs-9.6.sgi-ORIG/xlators/features/shard/src/shard.c glusterfs-9.6.sgi/xlators/features/shard/src/shard.c
--- glusterfs-9.6.sgi-ORIG/xlators/features/shard/src/shard.c	2022-08-09 05:31:26.738079305 -0500
+++ glusterfs-9.6.sgi/xlators/features/shard/src/shard.c	2024-03-13 12:31:56.110756841 -0500
@@ -4852,8 +4852,11 @@ shard_readv_do_cbk(call_frame_t *frame,
         goto out;
     }
 
-    if (local->op_ret >= 0)
+    if (local->op_ret >= 0) {
         local->op_ret += op_ret;
+        /* gnfs requires op_errno to determine is_eof */
+        local->op_errno = op_errno;
+    }
 
     shard_inode_ctx_get(anon_fd->inode, this, &ctx);
     block_num = ctx->block_num;
diff -Narup glusterfs-9.6.sgi-ORIG/xlators/performance/io-cache/src/page.c glusterfs-9.6.sgi/xlators/performance/io-cache/src/page.c
--- glusterfs-9.6.sgi-ORIG/xlators/performance/io-cache/src/page.c	2022-08-09 05:31:26.825079586 -0500
+++ glusterfs-9.6.sgi/xlators/performance/io-cache/src/page.c	2024-03-13 12:32:01.978748913 -0500
@@ -790,6 +790,8 @@ ioc_frame_unwind(call_frame_t *frame)
     GF_ASSERT(frame);
 
     local = frame->local;
+    /* gnfs requires op_errno to determine is_eof */
+    op_errno = local->op_errno;
     if (local == NULL) {
         gf_smsg(frame->this->name, GF_LOG_WARNING, ENOMEM,
                 IO_CACHE_MSG_LOCAL_NULL, NULL);