Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi,

I would actually prefer that we solve the issue with the unbundled libntirpc. That seems like the better long-term solution, and one we have to tackle anyway, as we have a hard end date for using the bundled libntirpc in Fedora and EPEL.

In the meantime I've reverted the EPEL 7 build (and perhaps the Fedora Rawhide build too) to use the bundled lib. Packages will land in Fedora and EPEL after a nominal testing period, or you can get them sooner by enabling the Updates-Testing repo in /etc/yum.repos.d/{fedora-updates-testing,epel-testing}.repo.

Regards,

--
Kaleb

On 06/16/2015 07:57 PM, Malahal Naineni wrote:
> Thank you Niels for your time chasing the issue. It is important to have
> working files, as people try things and move on if they don't work. Not
> everyone is as persistent as Alessandro!
>
> Regards, Malahal.
>
> Niels de Vos [nde...@redhat.com] wrote:
> > [...]
> > It seems that un-bundling libntirpc causes this problem. Bisecting the
> > NFS-Ganesha package builds for the epel7 branch shows this. Also,
> > re-bundling libntirpc makes nfs-ganesha-2.2.0 work again.
> >
> > I do not know yet where the actual problem is; maybe the libntirpc
> > package in Fedora/EPEL does not work correctly, or some linker changes
> > are needed for NFS-Ganesha to build against the non-bundled libntirpc.
> > I've only tested on CentOS-7 for now. Maybe the Fedora 22 libntirpc
> > and nfs-ganesha packages have the same problem?
> >
> > In case someone cares, this is my patch against the current epel7
> > nfs-ganesha.spec: http://paste.fedoraproject.org/232837/44926521/
> >
> > Cheers, Niels

___
Nfs-ganesha-devel mailing list
nfs-ganesha-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
On Wed, Jun 17, 2015 at 08:27:57AM -0400, Kaleb S. KEITHLEY wrote:
> Hi,
> I would actually prefer that we solve the issue with the unbundled
> libntirpc. That seems like the better long-term solution, and one we
> have to tackle anyway, as we have a hard end date for using the bundled
> libntirpc in Fedora and EPEL.

Of course, that is the plan. I still need to test on Fedora, where libntirpc was unbundled. Unfortunately there are very few other users of libntirpc, so testing with a non-ganesha application is not as simple as I hoped. I'll continue to investigate this in small steps. When I have more ideas on what the problem could be, I'll inform everyone :)

> In the meantime I've reverted the EPEL 7 build (and perhaps the Fedora
> Rawhide build too) to use the bundled lib. Packages will land in Fedora
> and EPEL after a nominal testing period, or you can get them sooner by
> enabling the Updates-Testing repo in
> /etc/yum.repos.d/{fedora-updates-testing,epel-testing}.repo.

Thanks, that surely works for the immediate problem.

Niels

> Regards,
> --
> Kaleb
>
> [rest of the quoted thread trimmed]
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
On Mon, Jun 15, 2015 at 06:50:21PM -0500, Malahal Naineni wrote:
> Kaleb Keithley [kkeit...@redhat.com] wrote:
> > But note that nfs-ganesha in EPEL[67] is built with a) glusterfs-api-3.6.x
> > from Red Hat's downstream glusterfs, and b) the bundled static version
> > of ntirpc, not the shared lib in the stand-alone package above. If
> > you're trying to use these packages with glusterfs-3.7.x, then I guess
> > I'm not too surprised if something isn't working. Look for nfs-ganesha
> > packages built against glusterfs-3.7.x in the CentOS Storage SIG, or
> > watch for the same on download.gluster.org. They're not there yet, but
> > they will be eventually.
>
> If I understood the issue, he wasn't even using the gluster FSAL (he
> wasn't using any exports at all). His issue is probably unrelated to any
> gluster API changes.

It seems that un-bundling libntirpc causes this problem. Bisecting the NFS-Ganesha package builds for the epel7 branch shows this. Also, re-bundling libntirpc makes nfs-ganesha-2.2.0 work again.

I do not know yet where the actual problem is; maybe the libntirpc package in Fedora/EPEL does not work correctly, or some linker changes are needed for NFS-Ganesha to build against the non-bundled libntirpc. I've only tested on CentOS-7 for now. Maybe the Fedora 22 libntirpc and nfs-ganesha packages have the same problem?

In case someone cares, this is my patch against the current epel7 nfs-ganesha.spec: http://paste.fedoraproject.org/232837/44926521/

Cheers,
Niels
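The bisection Niels describes is just a binary search over an ordered build history. As a rough sketch (in Python, purely illustrative; the `is_working` predicate is hypothetical and stands in for "install this build and re-run showmount"):

```python
# Binary-search an ordered list of builds for the first one that fails,
# assuming the oldest build works and the newest is broken -- the same
# idea as bisecting the epel7 nfs-ganesha builds described above.
def first_bad_build(builds, is_working):
    lo, hi = 0, len(builds) - 1   # builds[lo] assumed good, builds[hi] bad
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if is_working(builds[mid]):
            lo = mid              # regression landed after mid
        else:
            hi = mid              # regression is at or before mid
    return builds[hi]
```

With n builds this needs roughly log2(n) install-and-test cycles instead of n, which is why bisecting pins a packaging regression (such as the libntirpc unbundling) quickly.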
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Thank you Niels for your time chasing the issue. It is important to have working files, as people try things and move on if they don't work. Not everyone is as persistent as Alessandro!

Regards,
Malahal.

Niels de Vos [nde...@redhat.com] wrote:
> [rest of the quoted thread trimmed]
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi,
any news on this? Did you have the chance to look into it? I'd also be curious to know whether anyone has tried nfs-ganesha on CentOS 7.1 and whether it really worked, as I also tried on a standalone, clean machine and I see the very same behavior, even without gluster.

Thanks,

Alessandro

On Fri, 2015-06-12 at 14:34 +0200, Alessandro De Salvo wrote:
> Hi,
> looking at the code, and having recompiled with some more debug added, I
> might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c,
> function nfs_rpc_dequeue_req, the threads enter the
> while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop and never exit from
> there. I do not know whether that is normal or not, as I should read
> the code more closely.
>
> Cheers,
> Alessandro
>
> On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote:
> > Hi Malahal,
> >
> > On 12 Jun 2015, at 01:23, Malahal Naineni mala...@us.ibm.com wrote:
> > > The logs indicate that ganesha was started successfully without any
> > > exports. gstack output seemed normal as well -- threads were
> > > waiting to serve requests.
> >
> > Yes, no exports, as that was the default config before enabling
> > Ganesha on any gluster volume.
> >
> > > Assuming that you are running showmount -e on the same system,
> > > there shouldn't be any firewall coming into the picture.
> >
> > Yes, that was the case in my last attempt, from the same machine. I
> > also tried from another machine, but the result was the same. The
> > firewall (firewalld, as it's a CentOS 7.1) is disabled anyway.
> >
> > > If you are running showmount from some other system, make sure
> > > there is no firewall dropping the packets. I think you need a
> > > tcpdump trace to figure out the problem. My wireshark trace showed
> > > two requests from the client to complete the showmount -e command:
> > > 1. The client sent a GETPORT call to port 111 (rpcbind) to get the
> > >    port number of MOUNT.
> > > 2. Then it sent an EXPORT call to the mountd port (the port it got
> > >    in response to #1).
> >
> > Yes, I did that already, and indeed it showed the two requests, so
> > the portmapper works fine, but it hangs on the second request. Also,
> > "rpcinfo -t localhost portmapper" returns successfully, while
> > "rpcinfo -t localhost nfs" hangs. The output of rpcinfo -p is the
> > following:
> >
> >    program vers proto   port  service
> >     100000    4   tcp    111  portmapper
> >     100000    3   tcp    111  portmapper
> >     100000    2   tcp    111  portmapper
> >     100000    4   udp    111  portmapper
> >     100000    3   udp    111  portmapper
> >     100000    2   udp    111  portmapper
> >     100024    1   udp  56082  status
> >     100024    1   tcp  41858  status
> >     100003    3   udp   2049  nfs
> >     100003    3   tcp   2049  nfs
> >     100003    4   udp   2049  nfs
> >     100003    4   tcp   2049  nfs
> >     100005    1   udp  45611  mountd
> >     100005    1   tcp  55915  mountd
> >     100005    3   udp  45611  mountd
> >     100005    3   tcp  55915  mountd
> >     100021    4   udp  48775  nlockmgr
> >     100021    4   tcp  51621  nlockmgr
> >     100011    1   udp   4501  rquotad
> >     100011    1   tcp   4501  rquotad
> >     100011    2   udp   4501  rquotad
> >     100011    2   tcp   4501  rquotad
> >
> > > What does "rpcinfo -p server-ip" show? Do you have selinux enabled?
> > > I am not sure if that is playing any role here...
> >
> > Nope, it's disabled:
> >
> > # uname -a
> > Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC
> > 2015 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Thanks for the help,
> > Alessandro
> >
> > > Regards, Malahal.
> > >
> > > Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote:
> > > > Hi,
> > > > this was an extract from the old logs, before Soumya's suggestion
> > > > of changing the rquota port in the conf file. The new logs are
> > > > attached (ganesha-20150611.log.gz), as well as the gstack of the
> > > > ganesha process while I was executing the hanging showmount
> > > > (ganesha-20150611.gstack.gz).
> > > >
> > > > Thanks,
> > > > Alessandro
> > > >
> > > > On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
> > > > > Soumya Koduri [skod...@redhat.com] wrote:
> > > > > > CC'ing ganesha-devel to get more input. In case ipv6 is
> > > > > > enabled, only v6 interfaces are used by NFS-Ganesha.
> > > > >
> > > > > I am not a network expert, but I have seen IPv4 traffic over an
> > > > > IPv6 interface while fixing a few things before. This may be
> > > > > normal. The commit (git show 'd7e8f255') that was added in v2.2
> > > > > has more details.
> > > > >
> > > > > > # netstat -ltaupn | grep 2049
> > > > > > tcp6  4  0 :::2049         :::*             LISTEN      32080/ganesha.nfsd
> > > > > > tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285    CLOSE_WAIT  -
> > > > > > tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555  CLOSE_WAIT  -
> > > > > > udp6  0  0 :::2049         :::*                         32080/ganesha.nfsd
> > > > > >
> > > > > > I have enabled the full debug already, but I see nothing
> > > > > > special. Before exporting any volume the log shows no error,
> > > > > > even when I do a showmount (the log is attached,
> > > > > > ganesha.log.gz). If I do the same after exporting a volume
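Malahal's trace above boils showmount -e down to two RPCs: a GETPORT to rpcbind, then an EXPORT to mountd. For reference, this is roughly what the first of those looks like on the wire; a minimal sketch assuming the standard ONC RPC and portmapper constants from RFC 5531 and RFC 1833 (illustrative Python, not ganesha or libntirpc code):

```python
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_VERS = 100005, 3    # the service showmount asks about
IPPROTO_TCP = 6

def getport_call(xid):
    """Serialize a portmapper GETPORT call asking for MOUNT's TCP port."""
    header = struct.pack(">IIIIII",
                         xid,        # transaction id, echoed in the reply
                         0,          # msg_type = CALL
                         2,          # ONC RPC version
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    auth = struct.pack(">IIII", 0, 0, 0, 0)   # AUTH_NONE cred + verifier
    args = struct.pack(">IIII", MOUNT_PROG, MOUNT_VERS, IPPROTO_TCP, 0)
    return header + auth + args
```

The reply carries the mountd port number (55915 in the rpcinfo listing above); showmount then opens a connection there and sends the EXPORT call, which is exactly the request that hangs here.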
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
We do run ganesha on RHEL7.0 (same as CentOS7.0), and I don't think 7.1 would be much different. We run the GPFS FSAL only (no VFS_FSAL).

Regards,
Malahal.

Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote:
> Hi,
> any news on this? Did you have the chance to look into it? I'd also be
> curious to know whether anyone has tried nfs-ganesha on CentOS 7.1 and
> whether it really worked, as I also tried on a standalone, clean machine
> and I see the very same behavior, even without gluster.
>
> [rest of the quoted thread trimmed]
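Alessandro's observation earlier in the thread -- threads stuck in nfs_rpc_dequeue_req's while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop -- is the classic shape of a condition wait whose wakeup never arrives. A hedged sketch of that pattern in Python (ganesha's real code is C with its own wait-queue types; the names here just mirror the thread):

```python
import threading

Wqe_LFlag_SyncDone = 0x1

class WaitQueueEntry:
    """Toy stand-in for a ganesha wait-queue entry: a flag word + condvar."""
    def __init__(self):
        self.flags = 0
        self.cond = threading.Condition()

def dequeue_wait(wqe, timeout=None):
    # Loop until another thread sets SyncDone; if nobody ever signals it,
    # this waits forever (or until `timeout`), matching the observed hang.
    with wqe.cond:
        while not (wqe.flags & Wqe_LFlag_SyncDone):
            if not wqe.cond.wait(timeout=timeout):
                return False          # timed out; SyncDone never arrived
        return True

def signal_done(wqe):
    with wqe.cond:
        wqe.flags |= Wqe_LFlag_SyncDone
        wqe.cond.notify_all()
```

If the signalling side never runs (or the flag update and notify are missed), the waiter blocks indefinitely, which is consistent with showmount hanging on the EXPORT request while gstack shows threads parked in waits.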
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
I am not familiar with the tirpc code, but how are you building the ganesha rpms? Did you do a "git submodule update" to get the latest tirpc code when you built those rpms? Can somebody familiar with tirpc chime in? The one I use is below:

# git submodule status
b1a82463c4029315fac085a9d0d6bef766847ed7 src/libntirpc (v1.2.0-2-gb1a8246)

The way I build ganesha 2.2 rpms is:

1. git clone repo, git checkout V2.2-stable
2. git submodule update --init
3. cmake ./src -DDEBUG_SYMS=ON -DUSE_DBUS=ON -DUSE_ADMIN_TOOLS=ON -DUSE_GUI_ADMIN_TOOLS=OFF
4. make dist
5. rpmbuild --with utils -ta nfs-ganesha-2.2*

Regards,
Malahal.

PS: there were some efforts to make tirpc an rpm by itself. Not sure where that stands.

Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote:
> OK, thanks. So, any hint on what I could check now? I have tried even
> without any VFS, so with just the nfs-ganesha rpm installed and an empty
> ganesha.conf, but still the same problem. The same configuration with
> ganesha 2.1.0 was working, on the same server. Any idea? I have sent you
> the logs, but please tell me if you need more.
>
> [rest of the quoted thread trimmed]
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
OK, thanks. So, any hint on what I could check now? I have tried even without any VFS, so with just the nfs-ganesha rpm installed and an empty ganesha.conf, but still the same problem. The same configuration with ganesha 2.1.0 was working, on the same server. Any idea? I have sent you the logs, but please tell me if you need more.

Thanks,
Alessandro

On 15 Jun 2015, at 18:47, Malahal Naineni mala...@us.ibm.com wrote:
> We do run ganesha on RHEL7.0 (same as CentOS7.0), and I don't think 7.1
> would be much different. We run the GPFS FSAL only (no VFS_FSAL).
>
> Regards, Malahal.
>
> [rest of the quoted thread trimmed]
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi Malahal,
I have downloaded and used the tarball created by git for the stable 2.2.0 branch, so it should have been consistent. Also, I have used the spec file from EPEL to build the rpms. I'm going to try your procedure as well now, to see if anything changes.

Thanks,
Alessandro

On 15 Jun 2015, at 20:10, Malahal Naineni mala...@us.ibm.com wrote:
> I am not familiar with the tirpc code, but how are you building the
> ganesha rpms? Did you do a "git submodule update" to get the latest
> tirpc code when you built those rpms? Can somebody familiar with tirpc
> chime in? The one I use is below:
>
> # git submodule status
> b1a82463c4029315fac085a9d0d6bef766847ed7 src/libntirpc (v1.2.0-2-gb1a8246)
>
> The way I build ganesha 2.2 rpms is:
>
> 1. git clone repo, git checkout V2.2-stable
> 2. git submodule update --init
> 3. cmake ./src -DDEBUG_SYMS=ON -DUSE_DBUS=ON -DUSE_ADMIN_TOOLS=ON -DUSE_GUI_ADMIN_TOOLS=OFF
> 4. make dist
> 5. rpmbuild --with utils -ta nfs-ganesha-2.2*
>
> Regards, Malahal.
>
> PS: there were some efforts to make tirpc an rpm by itself. Not sure
> where that stands.
>
> [rest of the quoted thread trimmed]
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote:
> OK, I think we are now closer to the end of the story. Recompiling with
> your instructions, and slightly changing the release name to match the
> convention in EPEL, the new RPMs produce something working! So it
> essentially means that:
>
> 1) the RPMs in EPEL are broken; they should really be fixed;
> 2) the RPMs produced by exporting the tarball from git, even when
>    selecting the correct branch, together with the spec file from EPEL,
>    are broken as well;

What does this mean? Just that the spec file in EPEL is broken.

> 3) following your procedure produces working packages, but with revision
>    "0.3" instead of the required "3" (not a real problem, easy to fix).

Yeah, it produces 2.2.0-0.3 instead of the 2.2.0-3 that we wanted. A patch to fix this is welcome!

What you have just tested is the latest V2.2-stable, which is 2.2.0-3. The EPEL code is probably from the V2.2.0 tag. So either EPEL has a broken spec file or V2.2.0 is broken. Can someone from Red Hat figure this out and fix the EPEL repo, please?

Regards,
Malahal.
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: Hi Malahal,

--- nfs-ganesha.orig/src/nfs-ganesha.spec-in.cmake 2015-06-16 00:11:31.477442950 +0200
+++ nfs-ganesha/src/nfs-ganesha.spec-in.cmake 2015-06-15 22:11:57.068726917 +0200
@@ -72,13 +72,13 @@
 @BCOND_GUI_UTILS@ gui_utils
 %global use_gui_utils %{on_off_switch gui_utils}
-%global dev_version %{lua: extraver = string.gsub('@GANESHA_EXTRA_VERSION@', '%-', '.'); print(extraver) }
+%global dev_version %{lua: extraver = string.gsub('@GANESHA_EXTRA_VERSION@', '%-', ''); print(extraver) }

I think the idea here is to replace any hyphens with dots in the EXTRA_VERSION. The original statement should be fine provided we don't start the EXTRA_VERSION with a hyphen. Your change fixes the usual hyphen we put in the extra version. I think we should remove that extra hyphen.

 Name: nfs-ganesha
 Version: @GANESHA_BASE_VERSION@
-Release: 0%{dev_version}%{?dist}
+Release: %{dev_version}%{?dist}

This looks to be the key change that is needed. Looks like a hyphen is inserted between BASE_VERSION and EXTRA_VERSION in the rpm names by someone. So I made the above Release change and set GANESHA_EXTRA_VERSION to just 2 instead of the current -2. The rpms generated were fine, but the tar file produced by make dist was nfs-ganesha-2.2.08-0.1.1-Source.tar.gz. A bit ugly... Can anyone please review Alessandro's changes and my comments?

So either EPEL has a broken spec file or V2.2.0 is broken.

I tend to say it’s 2.2.0-2 which is broken, but it’s just my opinion. At any rate, 2.2.0-3 is working, and this is indeed good news.

There isn't any difference between V2.2.0-2 and V2.2.0-3 other than some GLUSTER FSAL changes and a lone GPFS FSAL config change. I am pretty sure this is some process issue rather than a code issue in V2.2.0-2. Regards, Malahal.
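To make the effect of the disputed macro concrete: the spec's %dev_version runs Lua's string.gsub with the pattern '%-' (Lua's escaped literal hyphen). The Python equivalents below just illustrate what each variant yields for an EXTRA_VERSION of "-2"; the Release values in the comments follow from the spec lines quoted above.

```python
import re

extra = "-2"  # GANESHA_EXTRA_VERSION as shipped, with its leading hyphen

# Upstream macro: hyphen -> dot, then prefixed with "0" in the Release tag.
upstream = re.sub("-", ".", extra)
# Alessandro's patch: hyphen stripped, Release used without the "0" prefix.
patched = re.sub("-", "", extra)

print(upstream)  # .2  -> Release: 0.2%{?dist}, giving the 2.2.0-0.N form
print(patched)   # 2   -> Release: 2%{?dist}, giving the expected 2.2.0-N form
```

This is why builds from the unmodified spec came out as 2.2.0-0.3 instead of 2.2.0-3, and why dropping the leading hyphen from EXTRA_VERSION (as Malahal suggests) would make the original macro harmless.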
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Kaleb Keithley [kkeit...@redhat.com] wrote: But note that nfs-ganesha in EPEL[67] is built with a) glusterfs-api-3.6.x from Red Hat's downstream glusterfs, and b) the bundled static version of ntirpc, not the shared lib in the stand-alone package above. If you're trying to use these packages with glusterfs-3.7.x then I guess I'm not too surprised if something isn't working. Look for nfs-ganesha packages built against glusterfs-3.7.x in the CentOS Storage SIG or watch for the same on download.gluster.org. They're not there yet, but they will be eventually.

If I understood the issue, he wasn't even using the gluster FSAL (he wasn't using any exports at all). His issue is probably unrelated to any gluster API changes. Regards, Malahal.
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi Malahal,

On 15 Jun 2015, at 23:30, Malahal Naineni mala...@us.ibm.com wrote: Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: OK, I think we are now closer to the end of the story. Recompiling with your instructions, and slightly changing the release name to match the convention in epel, the new RPMS produce something working! So it essentially means that: 1) the RPMS in epel are broken and should really be fixed; 2) the RPMS produced by exporting the tarball from git, even when selecting the correct branch, together with the spec file from epel, are broken as well;

What does this mean? Just that the spec file in epel is broken.

No, not really. I think the epel version 2.2.0-2 has problems by itself, so the epel packages are broken. Probably creating the zip file from git produces some odd effect, so it’s broken as well. I do not think it’s the spec file itself.

3) following your procedure produces working packages, but with revision “0.3” instead of the required “3” (not a real problem, easy to fix).

Yeah, it produces 2.2.0-0.3 instead of the 2.2.0-3 that we wanted. A patch is welcome to fix this!

This is what I do, in case it could be of any help:

--- nfs-ganesha.orig/src/nfs-ganesha.spec-in.cmake 2015-06-16 00:11:31.477442950 +0200
+++ nfs-ganesha/src/nfs-ganesha.spec-in.cmake 2015-06-15 22:11:57.068726917 +0200
@@ -72,13 +72,13 @@
 @BCOND_GUI_UTILS@ gui_utils
 %global use_gui_utils %{on_off_switch gui_utils}
-%global dev_version %{lua: extraver = string.gsub('@GANESHA_EXTRA_VERSION@', '%-', '.'); print(extraver) }
+%global dev_version %{lua: extraver = string.gsub('@GANESHA_EXTRA_VERSION@', '%-', ''); print(extraver) }
 %define sourcename @CPACK_SOURCE_PACKAGE_FILE_NAME@
 Name: nfs-ganesha
 Version: @GANESHA_BASE_VERSION@
-Release: 0%{dev_version}%{?dist}
+Release: %{dev_version}%{?dist}
 Summary: NFS-Ganesha is a NFS Server running in user space
 Group: Applications/System
 License: LGPLv3+

With this patch the version is produced with the correct numbering.
What you have just tested is the latest V2.2-stable, which is 2.2.0-3. The epel code is probably from V2.2.0 code.

Should be 2.2.0-2, judging from the RPMs, but I’m not totally sure what’s inside.

So either EPEL has a broken spec file or V2.2.0 is broken.

I tend to say it’s 2.2.0-2 which is broken, but it’s just my opinion. At any rate, 2.2.0-3 is working, and this is indeed good news.

Can someone from redhat figure this out and fix the epel repo please.

Yes please! Thanks, Alessandro

Regards, Malahal.
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
OK, I think we are now closer to the end of the story. Recompiling with your instructions, and slightly changing the release name to match the convention in epel, the new RPMS produce something working! So it essentially means that: 1) the RPMS in epel are broken and should really be fixed; 2) the RPMS produced by exporting the tarball from git, even when selecting the correct branch, together with the spec file from epel, are broken as well; 3) following your procedure produces working packages, but with revision “0.3” instead of the required “3” (not a real problem, easy to fix).

I have created a repo in my institute that can be used from outside too, in case someone is interested. It may be accessed by yum in this way:

[nfs-ganesha-infn]
name=NFS-Ganesha-INFN
baseurl=http://classis01.roma1.infn.it/RPMS/nfs-ganesha/el$releasever/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=0

I have also updated the puppet module to use this repo as failover, for the moment. I will publish the module soon on puppetforge, but you can already use it from here: https://github.com/desalvo/puppet-ganesha The module still works only for gluster, but it’s very easy to add more. Thanks for all the help, Alessandro

On 15 Jun 2015, at 21:40, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote: Hi Malahal, I have downloaded and used the tarball created by git for the stable 2.2.0 branch, so it should have been consistent. Also, I have used the spec file from Epel to build the rpms. I’m going to try your procedure as well now, to see if anything changes. Thanks, Alessandro

On 15 Jun 2015, at 20:10, Malahal Naineni mala...@us.ibm.com wrote: I am not familiar with tirpc code, but how are you building ganesha rpms? Did you do git submodule update to get the latest tirpc code when you built those rpms? Can somebody familiar with tirpc chime in?
The one I use is below:

# git submodule status
 b1a82463c4029315fac085a9d0d6bef766847ed7 src/libntirpc (v1.2.0-2-gb1a8246)

The way I build ganesha 2.2 rpms is:

#1. git clone repo; git checkout V2.2-stable
#2. git submodule update --init
#3. cmake ./src -DDEBUG_SYMS=ON -DUSE_DBUS=ON -DUSE_ADMIN_TOOLS=ON -DUSE_GUI_ADMIN_TOOLS=OFF
#4. make dist
#5. rpmbuild --with utils -ta nfs-ganesha-2.2*

Regards, Malahal. PS: there were some efforts to make tirpc an rpm by itself. Not sure where that stands.

Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: OK, thanks. So, any hint on what I could check now? I have tried even without any VFS, so with just the nfs-ganesha rpm installed and with an empty ganesha.conf, but still the same problem. The same configuration with ganesha 2.1.0 was working, on the same server. Any idea? I have sent you the logs, but please tell me if you need more. Thanks, Alessandro

On 15 Jun 2015, at 18:47, Malahal Naineni mala...@us.ibm.com wrote: We do run ganesha on RHEL7.0 (same as CentOS7.0), and I don't think 7.1 would be much different. We run the GPFS FSAL only (no VFS_FSAL). Regards, Malahal.

Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: Hi, any news on this? Did you have the chance to look into it? I'd also be curious to know if anyone has tried nfs-ganesha on CentOS 7.1 and whether it was really working, as I also tried on a standalone, clean machine and I see the very same behavior, even without gluster. Thanks, Alessandro

On Fri, 2015-06-12 at 14:34 +0200, Alessandro De Salvo wrote: Hi, looking at the code and having recompiled with some more debugging added, I might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c, function nfs_rpc_dequeue_req, the threads enter the while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop and never exit from there. I do not know whether that is normal, as I should read the code more carefully.
Cheers, Alessandro On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote: Hi Malahal, Il giorno 12/giu/2015, alle ore 01:23, Malahal Naineni mala...@us.ibm.com ha scritto: The logs indicate that ganesha was started successfully without any exports. gstack output seemed normal as well -- threads were waiting to serve requests. Yes, no exports as it was the default config before enabling Ganesha on any gluster volume. Assuming that you are running showmount -e on the same system, there shouldn't be any firewall coming into the picture. Yes it was the case in my last attempt, from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's a CentOS 7.1) is disabled anyways. If you are running showmount from some other system, make sure there is no firewall dropping the packets. I think you need tcpdump trace to figure out the problem. My wireshark trace showed two requests from the client to complete the showmount -e command: 1. Client sent GETPORT call to
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
- Original Message - From: Malahal Naineni mala...@us.ibm.com ... PS: there were some efforts to make ntirpc an rpm by itself. Not sure where that stands.

google[1] will tell you that libntirpc is in fact a stand-alone package in Fedora and EPEL, and as you can see at [2] it's even available for EPEL7.

But note that nfs-ganesha in EPEL[67] is built with a) glusterfs-api-3.6.x from Red Hat's downstream glusterfs, and b) the bundled static version of ntirpc, not the shared lib in the stand-alone package above. If you're trying to use these packages with glusterfs-3.7.x then I guess I'm not too surprised if something isn't working. Look for nfs-ganesha packages built against glusterfs-3.7.x in the CentOS Storage SIG or watch for the same on download.gluster.org. They're not there yet, but they will be eventually.

Another thing to note is that the Fedora and EPEL builds of nfs-ganesha do not use the nfs-ganesha.spec.cmake.in spec file from the nfs-ganesha source tree. It is based on it, but it's not the same, for a number of reasons. I'll look at the EPEL nfs-ganesha when I have time. I do have a $dayjob, which takes priority over wrangling the community bits in Fedora and EPEL. Your patience and understanding are appreciated.

[1] https://www.google.com/?gws_rd=ssl#q=fedora+koji+libntirpc
[2] http://koji.fedoraproject.org/koji/packageinfo?packageID=20199

-- Kaleb
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi, exactly, I confirm this: I was not even reaching the point of using the gluster FSAL, so it should be unrelated, I guess. Cheers, Alessandro

On 16 Jun 2015, at 01:50, Malahal Naineni mala...@us.ibm.com wrote: Kaleb Keithley [kkeit...@redhat.com] wrote: But note that nfs-ganesha in EPEL[67] is built with a) glusterfs-api-3.6.x from Red Hat's downstream glusterfs, and b) the bundled static version of ntirpc, not the shared lib in the stand-alone package above. If you're trying to use these packages with glusterfs-3.7.x then I guess I'm not too surprised if something isn't working. Look for nfs-ganesha packages built against glusterfs-3.7.x in the CentOS Storage SIG or watch for the same on download.gluster.org. They're not there yet, but they will be eventually.

If I understood the issue, he wasn't even using the gluster FSAL (he wasn't using any exports at all). His issue is probably unrelated to any gluster API changes. Regards, Malahal.
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi Malahal,

On 12 Jun 2015, at 01:23, Malahal Naineni mala...@us.ibm.com wrote: The logs indicate that ganesha was started successfully without any exports. gstack output seemed normal as well -- threads were waiting to serve requests.

Yes, no exports, as it was the default config before enabling Ganesha on any gluster volume.

Assuming that you are running showmount -e on the same system, there shouldn't be any firewall coming into the picture.

Yes, that was the case in my last attempt, from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's CentOS 7.1) is disabled anyway.

If you are running showmount from some other system, make sure there is no firewall dropping the packets. I think you need a tcpdump trace to figure out the problem. My wireshark trace showed two requests from the client to complete the showmount -e command: 1. The client sent a GETPORT call to port 111 (rpcbind) to get the port number of MOUNT. 2. Then it sent an EXPORT call to the mountd port (the port it got in response to #1).

Yes, I did that already, and indeed it showed the two requests, so the portmapper works fine, but it hangs on the second request. Also, rpcinfo -t localhost portmapper returns successfully, while rpcinfo -t localhost nfs hangs. The output of rpcinfo -p is the following:

   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  56082  status
    100024    1   tcp  41858  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  45611  mountd
    100005    1   tcp  55915  mountd
    100005    3   udp  45611  mountd
    100005    3   tcp  55915  mountd
    100021    4   udp  48775  nlockmgr
    100021    4   tcp  51621  nlockmgr
    100011    1   udp   4501  rquotad
    100011    1   tcp   4501  rquotad
    100011    2   udp   4501  rquotad
    100011    2   tcp   4501  rquotad

What does rpcinfo -p server-ip show? Do you have selinux enabled? I am not sure if that is playing any role here...
Nope, it's disabled: # uname -a Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Thanks for the help, Alessandro Regards, Malahal. Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: Hi, this was an extract from the old logs, before Soumya's suggestion of changing the rquota port in the conf file. The new logs are attached (ganesha-20150611.log.gz) as well as the gstack of the ganesha process while I was executing the hanging showmount (ganesha-20150611.gstack.gz). Thanks, Alessandro On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote: Soumya Koduri [skod...@redhat.com] wrote: CCin ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha. I am not a network expert but I have seen IPv4 traffic over IPv6 interface while fixing few things before. This may be normal. commit - git show 'd7e8f255' , which got added in v2.2 has more details. # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::* LISTEN 9054/ruby tcp6 0 0 :::22 :::* LISTEN 1248/sshd udp6 0 0 :::111 :::* 7433/rpcbind udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 ::1:123 :::* 31238/ntpd udp6 0
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi, looking at the code and having recompiled with some more debugging added, I might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c, function nfs_rpc_dequeue_req, the threads enter the while (!(wqe->flags & Wqe_LFlag_SyncDone)) loop and never exit from there. I do not know whether that is normal, as I should read the code more carefully. Cheers, Alessandro

On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote: Hi Malahal,

On 12 Jun 2015, at 01:23, Malahal Naineni mala...@us.ibm.com wrote: The logs indicate that ganesha was started successfully without any exports. gstack output seemed normal as well -- threads were waiting to serve requests.

Yes, no exports, as it was the default config before enabling Ganesha on any gluster volume.

Assuming that you are running showmount -e on the same system, there shouldn't be any firewall coming into the picture.

Yes, that was the case in my last attempt, from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as it's CentOS 7.1) is disabled anyway.

If you are running showmount from some other system, make sure there is no firewall dropping the packets. I think you need a tcpdump trace to figure out the problem. My wireshark trace showed two requests from the client to complete the showmount -e command: 1. The client sent a GETPORT call to port 111 (rpcbind) to get the port number of MOUNT. 2. Then it sent an EXPORT call to the mountd port (the port it got in response to #1).

Yes, I did that already, and indeed it showed the two requests, so the portmapper works fine, but it hangs on the second request. Also, rpcinfo -t localhost portmapper returns successfully, while rpcinfo -t localhost nfs hangs.
The output of rpcinfo -p is the following:

   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  56082  status
    100024    1   tcp  41858  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  45611  mountd
    100005    1   tcp  55915  mountd
    100005    3   udp  45611  mountd
    100005    3   tcp  55915  mountd
    100021    4   udp  48775  nlockmgr
    100021    4   tcp  51621  nlockmgr
    100011    1   udp   4501  rquotad
    100011    1   tcp   4501  rquotad
    100011    2   udp   4501  rquotad
    100011    2   tcp   4501  rquotad

What does rpcinfo -p server-ip show? Do you have selinux enabled? I am not sure if that is playing any role here...

Nope, it's disabled:

# uname -a
Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Thanks for the help, Alessandro

Regards, Malahal.

Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: Hi, this was an extract from the old logs, before Soumya's suggestion of changing the rquota port in the conf file. The new logs are attached (ganesha-20150611.log.gz) as well as the gstack of the ganesha process while I was executing the hanging showmount (ganesha-20150611.gstack.gz). Thanks, Alessandro

On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote: Soumya Koduri [skod...@redhat.com] wrote: CCing ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha.

I am not a network expert, but I have seen IPv4 traffic over an IPv6 interface while fixing a few things before. This may be normal.

commit - git show 'd7e8f255', which got added in v2.2, has more details.

# netstat -ltaupn | grep 2049
tcp6  4  0 :::2049         :::*             LISTEN      32080/ganesha.nfsd
tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285    CLOSE_WAIT  -
tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555  CLOSE_WAIT  -
udp6  0  0 :::2049         :::*                         32080/ganesha.nfsd

I have enabled the full debug already, but I see nothing special.
Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::* LISTEN 9054/ruby tcp6 0 0 :::22 :::* LISTEN 1248/sshd udp6 0 0 :::111
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
The logs indicate that ganesha was started successfully without any exports. gstack output seemed normal as well -- threads were waiting to serve requests. Assuming that you are running showmount -e on the same system, there shouldn't be any firewall coming into the picture. If you are running showmount from some other system, make sure there is no firewall dropping the packets. I think you need tcpdump trace to figure out the problem. My wireshark trace showed two requests from the client to complete the showmount -e command: 1. Client sent GETPORT call to port 111 (rpcbind) to get the port number of MOUNT. 2. Then it sent EXPORT call to mountd port (port it got in response to #1). What does rpcinfo -p server-ip show? Do you have selinux enabled? I am not sure if that is playing any role here... Regards, Malahal. Alessandro De Salvo [alessandro.desa...@roma1.infn.it] wrote: Hi, this was an extract from the old logs, before Soumya's suggestion of changing the rquota port in the conf file. The new logs are attached (ganesha-20150611.log.gz) as well as the gstack of the ganesha process while I was executing the hanging showmount (ganesha-20150611.gstack.gz). Thanks, Alessandro On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote: Soumya Koduri [skod...@redhat.com] wrote: CCin ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha. I am not a network expert but I have seen IPv4 traffic over IPv6 interface while fixing few things before. This may be normal. commit - git show 'd7e8f255' , which got added in v2.2 has more details. # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd I have enabled the full debug already, but I see nothing special. 
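To make the first of Malahal's two requests concrete, here is a sketch (an illustration, not any tool's actual code) of the ONC RPC GETPORT call body that showmount sends to rpcbind on port 111 to ask for MOUNT's port. The constants are the standard protocol values from RFC 1833/RFC 5531; no network I/O is done, we only build the message.

```python
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
MOUNT_PROG, MOUNT_VERS = 100005, 3
IPPROTO_TCP = 6

def getport_call(xid):
    """Build the RPC CALL asking portmapper for MOUNT's TCP port."""
    hdr = struct.pack(">IIIIII", xid,
                      0,   # msg_type = CALL
                      2,   # RPC protocol version 2
                      PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    auth = struct.pack(">IIII", 0, 0, 0, 0)  # AUTH_NULL cred + verifier
    args = struct.pack(">IIII", MOUNT_PROG, MOUNT_VERS, IPPROTO_TCP, 0)
    return hdr + auth + args

pkt = getport_call(0x1234)
print(len(pkt))   # 56 bytes: 24 header + 16 auth + 16 args
```

The reply to this call carries the mountd port (55915 in the rpcinfo output elsewhere in this thread); the second request, the EXPORT call to that port, is where Alessandro's server hangs.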
Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::* LISTEN 9054/ruby tcp6 0 0 :::22 :::* LISTEN 1248/sshd udp6 0 0 :::111 :::* 7433/rpcbind udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 ::1:123 :::* 31238/ntpd udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd udp6 0 0 :::123 :::* 31238/ntpd udp6 0 0 :::824 :::* 7433/rpcbind The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded The above messages indicate that someone tried to restart ganesha. But ganesha failed to come up because RQUOTA port (default is 875) is already in use by an old ganesha instance or some other program holding it. The new instance of ganesha will die, but if you are using systemd, it will try to restart automatically. We have disabled systemd auto restart in our environment as it was causing issues for debugging. What version of ganesha is this? Regards, Malahal. ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Hi, this was an extract from the old logs, before Soumya's suggestion of changing the rquota port in the conf file. The new logs are attached (ganesha-20150611.log.gz) as well as the gstack of the ganesha process while I was executing the hanging showmount (ganesha-20150611.gstack.gz). Thanks, Alessandro On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote: Soumya Koduri [skod...@redhat.com] wrote: CCin ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha. I am not a network expert but I have seen IPv4 traffic over IPv6 interface while fixing few things before. This may be normal. commit - git show 'd7e8f255' , which got added in v2.2 has more details. # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). 
If I do the same after exporting a volume, nfs-ganesha does not even start, complaining about not being able to bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

tcp6  0  0 :::111                   :::*  LISTEN  7433/rpcbind
tcp6  0  0 :::2224                  :::*  LISTEN  9054/ruby
tcp6  0  0 :::22                    :::*  LISTEN  1248/sshd
udp6  0  0 :::111                   :::*          7433/rpcbind
udp6  0  0 fe80::8c2:27ff:fef2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 ::1:123                  :::*          31238/ntpd
udp6  0  0 fe80::5484:7aff:fef:123  :::*          31238/ntpd
udp6  0  0 :::123                   :::*          31238/ntpd
udp6  0  0 :::824                   :::*          7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

The above messages indicate that someone tried to restart ganesha. But ganesha failed to come up because the RQUOTA port (default is 875) is already in use by an old ganesha instance or some other program holding it. The new instance of ganesha will die, but if you are using systemd, it will try to restart automatically. We have disabled the systemd auto restart in our environment as it was causing issues for debugging. What version of ganesha is this? Regards, Malahal.
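The "error 98 (Address already in use)" in the log above is errno EADDRINUSE, exactly what Malahal describes: a second ganesha instance trying to bind the rquota port while the first still holds it. A minimal, self-contained reproduction of that failure mode (any free port, not Ganesha itself):

```python
import errno
import socket

errno_seen = None

# First "instance": grab an arbitrary free port and hold it.
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.1", 0))          # kernel picks a free port
port = a.getsockname()[1]
a.listen(1)

# Second "instance": binding the same address/port fails with EADDRINUSE,
# just like the restarted ganesha.nfsd in the log.
b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    b.bind(("127.0.0.1", port))
except OSError as e:
    errno_seen = e.errno
finally:
    b.close()
    a.close()

print(errno_seen == errno.EADDRINUSE)   # True
```

On Linux errno.EADDRINUSE is 98, matching the logged error number; the systemd auto-restart loop then keeps reproducing this until the old process releases the port.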
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Soumya Koduri [skod...@redhat.com] wrote: CCing ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha.

I am not a network expert, but I have seen IPv4 traffic over an IPv6 interface while fixing a few things before. This may be normal.

IPv6 can encapsulate IPv4 traffic. In my testing I use IPv4 addresses, but they are encapsulated in IPv6 (and thus forced me to get Ganesha's support for that to actually work...).

commit - git show 'd7e8f255', which got added in v2.2, has more details.

# netstat -ltaupn | grep 2049
tcp6  4  0 :::2049         :::*             LISTEN      32080/ganesha.nfsd
tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285    CLOSE_WAIT  -
tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555  CLOSE_WAIT  -
udp6  0  0 :::2049         :::*                         32080/ganesha.nfsd

I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining about not being able to bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

tcp6  0  0 :::111                   :::*  LISTEN  7433/rpcbind
tcp6  0  0 :::2224                  :::*  LISTEN  9054/ruby
tcp6  0  0 :::22                    :::*  LISTEN  1248/sshd
udp6  0  0 :::111                   :::*          7433/rpcbind
udp6  0  0 fe80::8c2:27ff:fef2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*          31238/ntpd
udp6  0  0 ::1:123                  :::*          31238/ntpd
udp6  0  0 fe80::5484:7aff:fef:123  :::*          31238/ntpd
udp6  0  0 :::123                   :::*          31238/ntpd
udp6  0  0 :::824                   :::*          7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

The above messages indicate that someone tried to restart ganesha. But ganesha failed to come up because the RQUOTA port (default is 875) is already in use by an old ganesha instance or some other program holding it. The new instance of ganesha will die, but if you are using systemd, it will try to restart automatically. We have disabled the systemd auto restart in our environment as it was causing issues for debugging. What version of ganesha is this? Regards, Malahal.

--
___ Nfs-ganesha-devel mailing list nfs-ganesha-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
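Niels's point about IPv4 traffic appearing on Ganesha's tcp6 sockets can be demonstrated with a small sketch (again an illustration, not Ganesha's code): when IPV6_V6ONLY is disabled on an AF_INET6 socket, one listener serves both families, and IPv4 peers show up as IPv4-mapped addresses of the form ::ffff:a.b.c.d — which is why everything in the netstat output above is a tcp6/udp6 entry even for IPv4 clients.

```python
import socket

# Dual-stack socket: with IPV6_V6ONLY off, this single AF_INET6 socket
# would also accept plain IPv4 connections (reported as ::ffff:... peers).
s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
v6only = s.getsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY)
s.close()

print(v6only)                 # 0: IPv4 is accepted alongside IPv6
print("::ffff:127.0.0.1")     # how an IPv4 loopback peer would be reported
```

This matches the d7e8f255 commit referenced above: with that support in place, a v6-only listening socket is sufficient for IPv4 clients too, so the tcp6-only netstat output is expected rather than a symptom.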
Re: [Gluster-users] [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size
Soumya Koduri [skod...@redhat.com] wrote: CCin ganesha-devel to get more inputs. In case of ipv6 enabled, only v6 interfaces are used by NFS-Ganesha. I am not a network expert but I have seen IPv4 traffic over IPv6 interface while fixing few things before. This may be normal. commit - git show 'd7e8f255' , which got added in v2.2 has more details. # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::* LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::* LISTEN 9054/ruby tcp6 0 0 :::22 :::* LISTEN 1248/sshd udp6 0 0 :::111 :::* 7433/rpcbind udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 ::1:123 :::* 31238/ntpd udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd udp6 0 0 :::123 :::* 31238/ntpd udp6 0 0 :::824 :::* 7433/rpcbind The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. 
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

The above messages indicate that someone tried to restart ganesha. But ganesha failed to come up because the RQUOTA port (default is 875) is already in use by an old ganesha instance or some other program holding it. The new instance of ganesha will die, but if you are using systemd, it will try to restart automatically. We have disabled the systemd auto restart in our environment as it was causing issues for debugging. What version of ganesha is this? Regards, Malahal.