Re: [Gluster-users] Questions on ganesha HA and shared storage size
Soumya, do you have any other idea of what to check on my side?

Many thanks,

Alessandro

On 10 Jun 2015, at 21:07, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:

Hi,
by looking at the connections I also see a strange problem:

# netstat -ltaupn | grep 2049
tcp6  4  0 :::2049         :::*              LISTEN      32080/ganesha.nfsd
tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285     CLOSE_WAIT  -
tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555   CLOSE_WAIT  -
udp6  0  0 :::2049         :::*                          32080/ganesha.nfsd

Why is tcp6 used with an IPv4 address? On another machine where ganesha 2.1.0 is running I see tcp is used, not tcp6. Could it be that the RPCs are always trying to use IPv6? That would be wrong.

Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:

Hi,
I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

tcp6  0  0 :::111                   :::*   LISTEN   7433/rpcbind
tcp6  0  0 :::2224                  :::*   LISTEN   9054/ruby
tcp6  0  0 :::22                    :::*   LISTEN   1248/sshd
udp6  0  0 :::111                   :::*            7433/rpcbind
udp6  0  0 fe80::8c2:27ff:fef2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 ::1:123                  :::*            31238/ntpd
udp6  0  0 fe80::5484:7aff:fef:123  :::*            31238/ntpd
udp6  0  0 :::123                   :::*            31238/ntpd
udp6  0  0 :::824                   :::*            7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes there can be a delay or an issue with Gluster-NFS un-registering those services, and when NFS-Ganesha tries to register on the same port it throws this error. Please try registering Rquota on a different port using the config option below in /etc/ganesha/ganesha.conf

NFS_Core_Param {
    # Use a non-privileged port for RQuota
    Rquota_Port = 4501;
}

and clean up the '/var/cache/rpcbind/' directory before the setup.

Thanks,
Soumya

Thanks,

Alessandro

On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com wrote:

On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:

Another update: the fact that I was unable to use vol set ganesha.enable was due to another bug in the ganesha scripts.
In short, they are all using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; second, there is a bug in that directive: it works if I add

CONFFILE=/etc/ganesha/ganesha.conf

to /etc/sysconfig/ganesha, but it fails if the same value is quoted:

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report. Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-(

Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'.

Thanks,
Soumya

Cheers,

Alessandro

On Tue, 2015-06-09 at 14:25 +0200,
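To make the quoting problem discussed above concrete, here is a small sketch (not the shipped script; the file and paths are the ones already mentioned in this thread) showing why the grep/cut parsing breaks on a quoted value and how the eval-plus-default variant behaves:

#!/bin/bash
# /etc/sysconfig/ganesha may contain either of these forms:
#   CONFFILE=/etc/ganesha/ganesha.conf
#   CONFFILE="/etc/ganesha/ganesha.conf"

# Original parsing: keeps any surrounding quotes as literal characters,
# so the resulting path does not exist when the value is quoted.
CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)
echo "parsed: $CONF"

# More tolerant variant: let the shell evaluate the assignment (quotes are
# handled normally) and fall back to a default when no CONFFILE line exists.
eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
echo "parsed: $CONF"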
Re: [Gluster-users] Questions on ganesha HA and shared storage size
CCing ganesha-devel to get more input.

When IPv6 is enabled, only v6 interfaces are used by NFS-Ganesha. Commit d7e8f255 (git show 'd7e8f255'), which was added in v2.2, has more details.

# netstat -ltaupn | grep 2049
tcp6  4  0 :::2049         :::*              LISTEN      32080/ganesha.nfsd
tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285     CLOSE_WAIT  -
tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555   CLOSE_WAIT  -
udp6  0  0 :::2049         :::*                          32080/ganesha.nfsd

Looks like (even from the logs and the netstat output) there was a shutdown request even before the server had come out of the grace period:

10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-6] nfs_rpc_dequeue_req :DISP :F_DBG :dequeue_req try qpair REQ_Q_LOW_LATENCY 0x7fdf8dc67b00:0x7fdf8dc67b68
10/06/2015 01:58:53 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
..
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[work-12] nfs_rpc_consume_req :DISP :F_DBG :try splice, qpair REQ_Q_LOW_LATENCY consumer qsize=0 producer qsize=0
..
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[Admin] do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service
...
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now NOT IN GRACE
10/06/2015 02:00:00 : epoch 55777da1 : node2 : ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of poll loop

When you observe the hang, please take the 'gstack <ganesha_pid>' output and post it in the mail.

Thanks,
Soumya

On 06/11/2015 12:37 AM, Alessandro De Salvo wrote:

Hi,
by looking at the connections I also see a strange problem:

# netstat -ltaupn | grep 2049
tcp6  4  0 :::2049         :::*              LISTEN      32080/ganesha.nfsd
tcp6  1  0 x.x.x.2:2049    x.x.x.2:33285     CLOSE_WAIT  -
tcp6  1  0 127.0.0.1:2049  127.0.0.1:39555   CLOSE_WAIT  -
udp6  0  0 :::2049         :::*                          32080/ganesha.nfsd

Why is tcp6 used with an IPv4 address? On another machine where ganesha 2.1.0 is running I see tcp is used, not tcp6. Could it be that the RPCs are always trying to use IPv6? That would be wrong.

Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:

Hi,
I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz).
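For what it's worth, the tcp6 entries with IPv4 peers are normal for a dual-stack listener: on Linux, a socket bound to :: with IPV6_V6ONLY disabled also accepts IPv4 clients (they appear as IPv4-mapped addresses), and netstat reports all of it under tcp6. Two quick, generic checks (plain Linux commands, not specific to this setup):

# 0 = IPv6 sockets also accept IPv4 (dual-stack), 1 = v6-only
sysctl net.ipv6.bindv6only

# with numeric output, IPv4 clients of a dual-stack listener show up as ::ffff:a.b.c.d
netstat -tn | grep 2049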
If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

tcp6  0  0 :::111                   :::*   LISTEN   7433/rpcbind
tcp6  0  0 :::2224                  :::*   LISTEN   9054/ruby
tcp6  0  0 :::22                    :::*   LISTEN   1248/sshd
udp6  0  0 :::111                   :::*            7433/rpcbind
udp6  0  0 fe80::8c2:27ff:fef2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 ::1:123                  :::*            31238/ntpd
udp6  0  0 fe80::5484:7aff:fef:123  :::*            31238/ntpd
udp6  0  0 :::123                   :::*            31238/ntpd
udp6  0  0 :::824                   :::*            7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first disables
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/10/2015 05:49 AM, Alessandro De Salvo wrote: Hi, I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::*LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::*LISTEN 9054/ruby tcp6 0 0 :::22 :::*LISTEN 1248/sshd udp6 0 0 :::111 :::* 7433/rpcbind udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 ::1:123 :::* 31238/ntpd udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd udp6 0 0 :::123 :::* 31238/ntpd udp6 0 0 :::824 :::* 7433/rpcbind The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded We have seen such issues with RPCBIND few times. NFS-Ganesha setup first disables Gluster-NFS and then brings up NFS-Ganesha service. Sometimes, there could be delay or issue with Gluster-NFS un-registering those services and when NFS-Ganesha tries to register to the same port, it throws this error. Please try registering Rquota to any random port using below config option in /etc/ganesha/ganesha.conf NFS_Core_Param { #Use a non-privileged port for RQuota Rquota_Port = 4501; } and cleanup '/var/cache/rpcbind/' directory before the setup. Thanks, Soumya Thanks, Alessandro Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri skod...@redhat.com ha scritto: On 06/09/2015 09:47 PM, Alessandro De Salvo wrote: Another update: the fact that I was unable to use vol set ganesha.enable was due to another bug in the ganesha scripts. In short, they are all using the following line to get the location of the conf file: CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =) First of all by default in /etc/sysconfig/ganesha there is no line CONFFILE, second there is a bug in that directive, as it works if I add in /etc/sysconfig/ganesha CONFFILE=/etc/ganesha/ganesha.conf but it fails if the same is quoted CONFFILE=/etc/ganesha/ganesha.conf It would be much better to use the following, which has a default as well: eval $(grep -F CONFFILE= /etc/sysconfig/ganesha) CONF=${CONFFILE:/etc/ganesha/ganesha.conf} I'll update the bug report. Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-( Thanks. Could you try changing log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'. Thanks, Soumya Cheers, Alessandro On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote: OK, I can confirm that the ganesha.nsfd process is actually not answering to the calls. 
Here is what I see:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad
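Note that rquotad is registered on the default privileged port 875 in the listing above; the RQUOTA "Address already in use" error discussed elsewhere in this thread is about that same port. A possible way to apply Soumya's Rquota_Port workaround end to end (a sketch only; it assumes the stock nfs-ganesha systemd unit and the paths already mentioned in this thread):

# stop ganesha and clear stale rpcbind registrations
systemctl stop nfs-ganesha
rm -f /var/cache/rpcbind/*

# add to /etc/ganesha/ganesha.conf:
#   NFS_Core_Param {
#       # Use a non-privileged port for RQuota
#       Rquota_Port = 4501;
#   }

systemctl start nfs-ganesha
rpcinfo -p | grep rquota    # should now report 4501 instead of 875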
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi Soumya,
OK, that trick worked, but now I'm back to the same situation of the hanging showmount -e. Did you check the logs I sent yesterday? Now I'm essentially back to the situation of the first log (ganesha.log.gz) in all cases.

Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:

On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:

Hi,
I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

tcp6  0  0 :::111                   :::*   LISTEN   7433/rpcbind
tcp6  0  0 :::2224                  :::*   LISTEN   9054/ruby
tcp6  0  0 :::22                    :::*   LISTEN   1248/sshd
udp6  0  0 :::111                   :::*            7433/rpcbind
udp6  0  0 fe80::8c2:27ff:fef2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
udp6  0  0 ::1:123                  :::*            31238/ntpd
udp6  0  0 fe80::5484:7aff:fef:123  :::*            31238/ntpd
udp6  0  0 :::123                   :::*            31238/ntpd
udp6  0  0 :::824                   :::*            7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes there can be a delay or an issue with Gluster-NFS un-registering those services, and when NFS-Ganesha tries to register on the same port it throws this error. Please try registering Rquota on a different port using the config option below in /etc/ganesha/ganesha.conf

NFS_Core_Param {
    # Use a non-privileged port for RQuota
    Rquota_Port = 4501;
}

and clean up the '/var/cache/rpcbind/' directory before the setup.

Thanks,
Soumya

Thanks,

Alessandro

On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com wrote:

On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:

Another update: the fact that I was unable to use vol set ganesha.enable was due to another bug in the ganesha scripts. In short, they are all using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; second, there is a bug in that directive: it works if I add

CONFFILE=/etc/ganesha/ganesha.conf

to /etc/sysconfig/ganesha, but it fails if the same value is quoted:

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report. Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-(

Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'.
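For completeness, the log-level change suggested above goes into the OPTIONS line of /etc/sysconfig/ganesha that is discussed elsewhere in this thread; a hedged example (the other flags are the ones already used in this setup):

# /etc/sysconfig/ganesha
OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_FULL_DEBUG -p /var/run/ganesha.nfsd.pid"

followed by a restart of the nfs-ganesha service so the new level takes effect.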
Thanks,
Soumya

Cheers,

Alessandro

On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:

OK, I can confirm that the ganesha.nfsd process is actually not answering the calls. Here is what I see:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, by looking at the connections I also see a strange problem: # netstat -ltaupn | grep 2049 tcp6 4 0 :::2049 :::* LISTEN 32080/ganesha.nfsd tcp6 1 0 x.x.x.2:2049 x.x.x.2:33285 CLOSE_WAIT - tcp6 1 0 127.0.0.1:2049 127.0.0.1:39555 CLOSE_WAIT - udp6 0 0 :::2049 :::* 32080/ganesha.nfsd Why tcp6 is used with an IPv4 address? In another machine where ganesha 2.1.0 is running I see tcp is used, not tcp6. Could it be that the RPC are always trying to use IPv6? That would be wrong. Thanks, Alessandro On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote: On 06/10/2015 05:49 AM, Alessandro De Salvo wrote: Hi, I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume nfs-ganesha does not even start, complaining for not being able to bind the IPv6 ruota socket, but in fact there is nothing listening on IPv6, so it should not happen: tcp6 0 0 :::111 :::*LISTEN 7433/rpcbind tcp6 0 0 :::2224 :::*LISTEN 9054/ruby tcp6 0 0 :::22 :::*LISTEN 1248/sshd udp6 0 0 :::111 :::* 7433/rpcbind udp6 0 0 fe80::8c2:27ff:fef2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 fe80::230:48ff:fed2:123 :::* 31238/ntpd udp6 0 0 ::1:123 :::* 31238/ntpd udp6 0 0 fe80::5484:7aff:fef:123 :::* 31238/ntpd udp6 0 0 :::123 :::* 31238/ntpd udp6 0 0 :::824 :::* 7433/rpcbind The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following: 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use) 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue. 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded We have seen such issues with RPCBIND few times. NFS-Ganesha setup first disables Gluster-NFS and then brings up NFS-Ganesha service. Sometimes, there could be delay or issue with Gluster-NFS un-registering those services and when NFS-Ganesha tries to register to the same port, it throws this error. Please try registering Rquota to any random port using below config option in /etc/ganesha/ganesha.conf NFS_Core_Param { #Use a non-privileged port for RQuota Rquota_Port = 4501; } and cleanup '/var/cache/rpcbind/' directory before the setup. Thanks, Soumya Thanks, Alessandro Il giorno 09/giu/2015, alle ore 18:37, Soumya Koduri skod...@redhat.com ha scritto: On 06/09/2015 09:47 PM, Alessandro De Salvo wrote: Another update: the fact that I was unable to use vol set ganesha.enable was due to another bug in the ganesha scripts. In short, they are all using the following line to get the location of the conf file: CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =) First of all by default in /etc/sysconfig/ganesha there is no line CONFFILE, second there is a bug in that directive, as it works if I add in /etc/sysconfig/ganesha CONFFILE=/etc/ganesha/ganesha.conf but it fails if the same is quoted CONFFILE=/etc/ganesha/ganesha.conf It would be much better to use the following, which has a default as well: eval $(grep -F CONFFILE= /etc/sysconfig/ganesha) CONF=${CONFFILE:/etc/ganesha/ganesha.conf} I'll update the bug report. Having said this... 
the last issue to tackle is the real problem with the ganesha.nfsd :-( Thanks. Could you try changing log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'. Thanks, Soumya Cheers, Alessandro On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote: OK, I can confirm that the ganesha.nsfd process is actually not answering to the calls. Here it is what I see: # rpcinfo -p
Re: [Gluster-users] Questions on ganesha HA and shared storage size
A better solution to the pid file problem is to add "-p /var/run/ganesha.nfsd.pid" to the OPTIONS in /etc/sysconfig/ganesha, so that it becomes:

# cat /etc/sysconfig/ganesha
OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p /var/run/ganesha.nfsd.pid"

This definitely solves the cluster troubles. However, the main problem with the RPCs timing out is still there.

Cheers,

Alessandro

On 9 Jun 2015, at 11:18, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:

Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0 creates /var/run/ganesha.pid; this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid

So far so good, the VIPs are up and pingable, but there is still the problem of the hanging showmount (i.e. hanging RPC). Also, I see a lot of errors like this in /var/log/messages:

Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2) 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 09/06/2015 11:16:22 : epoch
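Since the pid file mismatch described above is just a disagreement between two file names, a quick sanity check on each node (paths exactly as discussed in this thread) is:

# what ganesha.nfsd 2.2.0 actually writes
ls -l /var/run/ganesha.pid
# what the ganesha_mon resource agent looks for
ls -l /var/run/ganesha.nfsd.pid
# workarounds mentioned above: either pass -p /var/run/ganesha.nfsd.pid in OPTIONS,
# or create the compatibility symlink
ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid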
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/09/2015 02:48 PM, Alessandro De Salvo wrote: Hi, OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is creating /var/run/ganesha.pid, this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works: ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid Thanks. Please update this as well in the bug. So far so good, the VIPs are up and pingable, but still there is the problem of the hanging showmount (i.e. hanging RPC). Still, I see a lot of errors like this in /var/log/messages: Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ] While ganesha.log shows the server is not in grace: 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org http://buildhw-09.phx2.fedoraproject.org 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!! 
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2) 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :- 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/09/2015 02:06 PM, Alessandro De Salvo wrote:

Hi Soumya,

On 9 Jun 2015, at 08:06, Soumya Koduri skod...@redhat.com wrote:

On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:

OK, I found at least one of the bugs. The /usr/libexec/ganesha/ganesha.sh has the following lines:

if [ -e /etc/os-release ]; then
    RHEL6_PCS_CNAME_OPTION=""
fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the following, to make it work:

if [ -e /etc/os-release ]; then
    eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
    [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
fi

Oh.. thanks for the fix. Could you please file a bug for the same (and possibly submit your fix as well)? We shall have it corrected.

Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601

Thanks!

Apart from that, the VIP_<node> entries I was using were wrong, and I should have converted all the "-" to underscores; maybe this could be mentioned in the documentation when you have it ready. Now the cluster starts, but apparently the VIPs do not:

Sure. Thanks again for pointing it out. We shall make a note of it.

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):  Stopped
 atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):   Started atlas-node2
 atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node1
 atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):   Started atlas-node2

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

Here corosync and pacemaker show the 'disabled' state. Can you check the status of their services? They should be running prior to cluster creation. We need to include that step in the document as well.

Ah, OK, you're right, I have added it to my puppet modules (we install and configure ganesha via puppet, I'll put the module on puppetforge soon, in case anyone is interested).

Sure. This sounds great. Please let us know once it's added.

But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf. Still, showmount hangs and times out. Any help?

Thanks,

Hmm, that's strange. Sometimes, when no proper cleanup was done before re-creating the cluster, we have seen such issues.

https://bugzilla.redhat.com/show_bug.cgi?id=1227709
http://review.gluster.org/#/c/11093/

Can you please unexport all the volumes and tear down the cluster using 'gluster vol set <volname> ganesha.enable off'

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.
# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

and the 'gluster ganesha disable' command.

I'm assuming you wanted to write nfs-ganesha instead?

I am sorry, you are right.
I was referring to 'nfs-ganesha'.

# gluster nfs-ganesha disable
ganesha enable : success

A side note (not really important): it's strange that when I do a disable the message is "ganesha enable" :-)

Yeah, this doesn't seem correct. Please update the bug(s) with all the discrepancies you have found.

Thanks,
Soumya

Verify if the following files have been deleted on all the nodes:

'/etc/cluster/cluster.conf'
  This file is not present at all; I think it's not needed in CentOS 7.
'/etc/ganesha/ganesha.conf'
  It's still there, but empty, and I guess it should be OK, right?
'/etc/ganesha/exports/*'
  No more files there.
'/var/lib/pacemaker/cib'
  It's empty.

Verify if the ganesha service is stopped on all the nodes.
  Nope, it's still running, I will stop it.

Start/restart the services - corosync, pcs.
  On the node where I issued the nfs-ganesha disable there is no /etc/corosync/corosync.conf any more, so corosync won't start. The other node instead still has the file, which is strange.

And re-try the HA cluster creation: 'gluster ganesha enable'

This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun 9 10:13:43 2015
Last change: Tue Jun 9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured

Online: [ atlas-node1
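On the 'active/disabled' daemon status shown earlier: one way to make sure the cluster services are running (and stay enabled across reboots) before attempting the HA setup, assuming the standard CentOS 7 unit names, is:

systemctl enable pcsd corosync pacemaker
systemctl start pcsd
systemctl status pcsd corosync pacemaker

Note that corosync will only start once a valid /etc/corosync/corosync.conf is in place (pcs generates it during cluster setup), which matches the behaviour described above.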
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is creating /var/run/ganesha.pid, this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works: ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid So far so good, the VIPs are up and pingable, but still there is the problem of the hanging showmount (i.e. hanging RPC). Still, I see a lot of errors like this in /var/log/messages: Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ] While ganesha.log shows the server is not in grace: 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!! 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2) 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :- 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main]
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, Il giorno 09/giu/2015, alle ore 11:46, Soumya Koduri skod...@redhat.com ha scritto: On 06/09/2015 02:48 PM, Alessandro De Salvo wrote: Hi, OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is creating /var/run/ganesha.pid, this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works: ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid Thanks. Please update this as well in the bug. Done :-) So far so good, the VIPs are up and pingable, but still there is the problem of the hanging showmount (i.e. hanging RPC). Still, I see a lot of errors like this in /var/log/messages: Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ] While ganesha.log shows the server is not in grace: 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org http://buildhw-09.phx2.fedoraproject.org 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!! 
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory (/var/run/ganesha) already exists 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential failed (2:2) 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting delayed executor. 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P dispatcher started 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General fridge was started successfully 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main]
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi Soumya, Il giorno 09/giu/2015, alle ore 08:06, Soumya Koduri skod...@redhat.com ha scritto: On 06/09/2015 01:31 AM, Alessandro De Salvo wrote: OK, I found at least one of the bugs. The /usr/libexec/ganesha/ganesha.sh has the following lines: if [ -e /etc/os-release ]; then RHEL6_PCS_CNAME_OPTION= fi This is OK for RHEL 7, but does not work for = 7. I have changed it to the following, to make it working: if [ -e /etc/os-release ]; then eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release) [ $REDHAT_SUPPORT_PRODUCT == Fedora ] RHEL6_PCS_CNAME_OPTION= fi Oh..Thanks for the fix. Could you please file a bug for the same (and probably submit your fix as well). We shall have it corrected. Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601 Apart from that, the VIP_node I was using were wrong, and I should have converted all the “-“ to underscores, maybe this could be mentioned in the documentation when you will have it ready. Now, the cluster starts, but the VIPs apparently not: Sure. Thanks again for pointing it out. We shall make a note of it. Online: [ atlas-node1 atlas-node2 ] Full list of resources: Clone Set: nfs-mon-clone [nfs-mon] Started: [ atlas-node1 atlas-node2 ] Clone Set: nfs-grace-clone [nfs-grace] Started: [ atlas-node1 atlas-node2 ] atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr):Stopped atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr):Stopped atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 PCSD Status: atlas-node1: Online atlas-node2: Online Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled Here corosync and pacemaker shows 'disabled' state. Can you check the status of their services. They should be running prior to cluster creation. We need to include that step in document as well. Ah, OK, you’re right, I have added it to my puppet modules (we install and configure ganesha via puppet, I’ll put the module on puppetforge soon, in case anyone is interested). But the issue that is puzzling me more is the following: # showmount -e localhost rpc mount export: RPC: Timed out And when I try to enable the ganesha exports on a volume I get this error: # gluster volume set atlas-home-01 ganesha.enable on volume set: failed: Failed to create NFS-Ganesha export config file. But I see the file created in /etc/ganesha/exports/*.conf Still, showmount hangs and times out. Any help? Thanks, Hmm that's strange. Sometimes, in case if there was no proper cleanup done while trying to re-create the cluster, we have seen such issues. https://bugzilla.redhat.com/show_bug.cgi?id=1227709 http://review.gluster.org/#/c/11093/ Can you please unexport all the volumes, teardown the cluster using 'gluster vol set volname ganesha.enable off’ OK: # gluster vol set atlas-home-01 ganesha.enable off volume set: failed: ganesha.enable is already 'off'. # gluster vol set atlas-data-01 ganesha.enable off volume set: failed: ganesha.enable is already 'off'. 'gluster ganesha disable' command. I’m assuming you wanted to write nfs-ganesha instead? 
# gluster nfs-ganesha disable ganesha enable : success A side note (not really important): it’s strange that when I do a disable the message is “ganesha enable” :-) Verify if the following files have been deleted on all the nodes- '/etc/cluster/cluster.conf’ this file is not present at all, I think it’s not needed in CentOS 7 '/etc/ganesha/ganesha.conf’, it’s still there, but empty, and I guess it should be OK, right? '/etc/ganesha/exports/*’ no more files there '/var/lib/pacemaker/cib’ it’s empty Verify if the ganesha service is stopped on all the nodes. nope, it’s still running, I will stop it. start/restart the services - corosync, pcs. In the node where I issued the nfs-ganesha disable there is no more any /etc/corosync/corosync.conf so corosync won’t start. The other node instead still has the file, it’s strange. And re-try the HA cluster creation 'gluster ganesha enable’ This time (repeated twice) it did not work at all: # pcs status Cluster name: ATLAS_GANESHA_01 Last updated: Tue Jun 9 10:13:43 2015 Last change: Tue Jun 9 10:13:22 2015 Stack: corosync Current DC: atlas-node1 (1) - partition with quorum Version: 1.1.12-a14efad 2 Nodes configured 6 Resources configured Online: [ atlas-node1 atlas-node2 ] Full list of resources: Clone Set: nfs-mon-clone [nfs-mon] Started: [ atlas-node1 atlas-node2 ] Clone Set: nfs-grace-clone [nfs-grace] Started: [ atlas-node1 atlas-node2 ] atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started
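Pulling the cleanup steps above together, a possible sequence to run before retrying (a sketch only; the volume names are the ones used in this thread, and 'gluster nfs-ganesha disable' is the spelling that actually worked here):

# unexport the volumes and tear down the HA cluster
gluster volume set atlas-home-01 ganesha.enable off
gluster volume set atlas-data-01 ganesha.enable off
gluster nfs-ganesha disable

# on every node, check that the leftovers are really gone
ls -l /etc/cluster/cluster.conf /var/lib/pacemaker/cib /etc/ganesha/exports/ 2>/dev/null
cat /etc/ganesha/ganesha.conf    # should be empty at this point

# make sure ganesha is stopped, then restart the cluster services
systemctl stop nfs-ganesha
systemctl restart pcsd

# and retry the cluster creation
gluster nfs-ganesha enable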
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/09/2015 01:31 AM, Alessandro De Salvo wrote: OK, I found at least one of the bugs. The /usr/libexec/ganesha/ganesha.sh has the following lines: if [ -e /etc/os-release ]; then RHEL6_PCS_CNAME_OPTION= fi This is OK for RHEL 7, but does not work for = 7. I have changed it to the following, to make it working: if [ -e /etc/os-release ]; then eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release) [ $REDHAT_SUPPORT_PRODUCT == Fedora ] RHEL6_PCS_CNAME_OPTION= fi Oh..Thanks for the fix. Could you please file a bug for the same (and probably submit your fix as well). We shall have it corrected. Apart from that, the VIP_node I was using were wrong, and I should have converted all the “-“ to underscores, maybe this could be mentioned in the documentation when you will have it ready. Now, the cluster starts, but the VIPs apparently not: Sure. Thanks again for pointing it out. We shall make a note of it. Online: [ atlas-node1 atlas-node2 ] Full list of resources: Clone Set: nfs-mon-clone [nfs-mon] Started: [ atlas-node1 atlas-node2 ] Clone Set: nfs-grace-clone [nfs-grace] Started: [ atlas-node1 atlas-node2 ] atlas-node1-cluster_ip-1 (ocf::heartbeat:IPaddr):Stopped atlas-node1-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 atlas-node2-cluster_ip-1 (ocf::heartbeat:IPaddr):Stopped atlas-node2-trigger_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 PCSD Status: atlas-node1: Online atlas-node2: Online Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled Here corosync and pacemaker shows 'disabled' state. Can you check the status of their services. They should be running prior to cluster creation. We need to include that step in document as well. But the issue that is puzzling me more is the following: # showmount -e localhost rpc mount export: RPC: Timed out And when I try to enable the ganesha exports on a volume I get this error: # gluster volume set atlas-home-01 ganesha.enable on volume set: failed: Failed to create NFS-Ganesha export config file. But I see the file created in /etc/ganesha/exports/*.conf Still, showmount hangs and times out. Any help? Thanks, Hmm that's strange. Sometimes, in case if there was no proper cleanup done while trying to re-create the cluster, we have seen such issues. https://bugzilla.redhat.com/show_bug.cgi?id=1227709 http://review.gluster.org/#/c/11093/ Can you please unexport all the volumes, teardown the cluster using 'gluster vol set volname ganesha.enable off' 'gluster ganesha disable' command. Verify if the following files have been deleted on all the nodes- '/etc/cluster/cluster.conf' '/etc/ganesha/ganesha.conf', '/etc/ganesha/exports/*' '/var/lib/pacemaker/cib' Verify if the ganesha service is stopped on all the nodes. start/restart the services - corosync, pcs. 
And re-try the HA cluster creation: 'gluster ganesha enable'

Thanks,
Soumya

Alessandro

On 8 Jun 2015, at 20:00, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:

Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines running CentOS 7.1, GlusterFS 3.7.1 and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using a glusterfs native mount (on CentOS 7.1 there is a link by default, /var/run -> ../run);
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker, pcs, resource-agents and corosync on all cluster machines;
6) set the same password for the 'hacluster' user on all machines;
7) ran pcs cluster auth <hostname> -u hacluster -p <password> on all the nodes (on both nodes I issued the commands for both nodes);
8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6;
9) enabled pcsd and started it on all nodes;
10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine:

=== atlas-node1
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

=== atlas-node2
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the
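Given the dash-to-underscore issue mentioned elsewhere in this thread (the VIP_ variable names must not contain '-'), a corrected fragment of the same ganesha-ha.conf would presumably look like this (a sketch built from the values above; x.x.x.1/x.x.x.2 are the placeholder VIPs used in this thread):

HA_NAME=ATLAS_GANESHA_01
HA_VOL_SERVER="atlas-node1"
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# '-' in the hostnames has to become '_' in the VIP_ variable names
VIP_atlas_node1="x.x.x.1"
VIP_atlas_node2="x.x.x.2"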
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Another update: the fact that I was unable to use vol set ganesha.enable was due to another bug in the ganesha scripts. In short, they are all using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; second, there is a bug in that directive: it works if I add

CONFFILE=/etc/ganesha/ganesha.conf

to /etc/sysconfig/ganesha, but it fails if the same value is quoted:

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report. Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-(

Cheers,

Alessandro

On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:

OK, I can confirm that the ganesha.nfsd process is actually not answering the calls. Here is what I see:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad

# netstat -lpn | grep ganesha
tcp6  14  0 :::2049    :::*   LISTEN   11937/ganesha.nfsd
tcp6   0  0 :::41798   :::*   LISTEN   11937/ganesha.nfsd
tcp6   0  0 :::875     :::*   LISTEN   11937/ganesha.nfsd
tcp6  10  0 :::56301   :::*   LISTEN   11937/ganesha.nfsd
tcp6   0  0 :::564     :::*   LISTEN   11937/ganesha.nfsd
udp6   0  0 :::2049    :::*            11937/ganesha.nfsd
udp6   0  0 :::46203   :::*            11937/ganesha.nfsd
udp6   0  0 :::58127   :::*            11937/ganesha.nfsd
udp6   0  0 :::875     :::*            11937/ganesha.nfsd

I'm attaching the strace of a showmount from one node to the other. This machinery was working with nfs-ganesha 2.1.0, so it must be something introduced with 2.2.0.

Cheers,

Alessandro

On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:

On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:

Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0 creates /var/run/ganesha.pid; this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid

Thanks. Please update this as well in the bug.

So far so good, the VIPs are up and pingable, but there is still the problem of the hanging showmount (i.e. hanging RPC). Also, I see a lot of errors like this in /var/log/messages:

Jun 9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist.
] While ganesha.log shows the server is not in grace: 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org http://buildhw-09.phx2.fedoraproject.org 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized. 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!! 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config
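A minimal sketch of the lookup-with-default idea described above, assuming bash and the stock /etc/sysconfig/ganesha layout (the helper itself is hypothetical, not the shipped script):

  #!/bin/bash
  # Derive the ganesha.nfsd config path, tolerating a missing, quoted or
  # unquoted CONFFILE entry in /etc/sysconfig/ganesha.
  SYSCONFIG=/etc/sysconfig/ganesha
  CONFFILE=""
  if [ -r "$SYSCONFIG" ]; then
      # Evaluate only the CONFFILE assignment, so shell quoting is honoured.
      eval "$(grep -E '^CONFFILE=' "$SYSCONFIG")"
  fi
  # Fall back to the packaged default when nothing (or an empty value) was set.
  CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
  echo "Using NFS-Ganesha config file: $CONF"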
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
Another update: the fact that I was unable to use "vol set ganesha.enable" was due to another bug in the ganesha scripts. Having said this... the last issue to tackle is the real problem with the ganesha.nfsd :-(

Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'.

Thanks, Soumya
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:

  tcp6   0  0 :::111                   :::*   LISTEN   7433/rpcbind
  tcp6   0  0 :::2224                  :::*   LISTEN   9054/ruby
  tcp6   0  0 :::22                    :::*   LISTEN   1248/sshd
  udp6   0  0 :::111                   :::*            7433/rpcbind
  udp6   0  0 fe80::8c2:27ff:fef2:123  :::*            31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123  :::*            31238/ntpd
  udp6   0  0 ::1:123                  :::*            31238/ntpd
  udp6   0  0 fe80::5484:7aff:fef:123  :::*            31238/ntpd
  udp6   0  0 :::123                   :::*            31238/ntpd
  udp6   0  0 :::824                   :::*            7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:

  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
  10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded

Thanks, Alessandro

On 09 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com wrote:
Thanks. Could you try changing the log level to NIV_FULL_DEBUG in '/etc/sysconfig/ganesha' and check if anything gets logged in '/var/log/ganesha.log' or '/ganesha.log'. Thanks, Soumya
Re: [Gluster-users] Questions on ganesha HA and shared storage size
OK, I can confirm that the ganesha.nfsd process is actually not answering the calls. Here is what I see:

  # rpcinfo -p
     program vers proto   port  service
      100000    4   tcp    111  portmapper
      100000    3   tcp    111  portmapper
      100000    2   tcp    111  portmapper
      100000    4   udp    111  portmapper
      100000    3   udp    111  portmapper
      100000    2   udp    111  portmapper
      100024    1   udp  41594  status
      100024    1   tcp  53631  status
      100003    3   udp   2049  nfs
      100003    3   tcp   2049  nfs
      100003    4   udp   2049  nfs
      100003    4   tcp   2049  nfs
      100005    1   udp  58127  mountd
      100005    1   tcp  56301  mountd
      100005    3   udp  58127  mountd
      100005    3   tcp  56301  mountd
      100021    4   udp  46203  nlockmgr
      100021    4   tcp  41798  nlockmgr
      100011    1   udp    875  rquotad
      100011    1   tcp    875  rquotad
      100011    2   udp    875  rquotad
      100011    2   tcp    875  rquotad

  # netstat -lpn | grep ganesha
  tcp6  14  0 :::2049    :::*   LISTEN   11937/ganesha.nfsd
  tcp6   0  0 :::41798   :::*   LISTEN   11937/ganesha.nfsd
  tcp6   0  0 :::875     :::*   LISTEN   11937/ganesha.nfsd
  tcp6  10  0 :::56301   :::*   LISTEN   11937/ganesha.nfsd
  tcp6   0  0 :::564     :::*   LISTEN   11937/ganesha.nfsd
  udp6   0  0 :::2049    :::*            11937/ganesha.nfsd
  udp6   0  0 :::46203   :::*            11937/ganesha.nfsd
  udp6   0  0 :::58127   :::*            11937/ganesha.nfsd
  udp6   0  0 :::875     :::*            11937/ganesha.nfsd

I'm attaching the strace of a showmount from one node to the other. This machinery was working with nfs-ganesha 2.1.0, so it must be something introduced with 2.2.0. Cheers, Alessandro

On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
Hi, OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v2.2.0 creates /var/run/ganesha.pid; this needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case. For the moment I have created a symlink in this way and it works:

  ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid

Thanks. Please update this as well in the bug.

So far so good, the VIPs are up and pingable, but there is still the problem of the hanging showmount (i.e. hanging RPC). Also, I see a lot of errors like this in /var/log/messages:

  Jun  9 11:15:20 atlas-node1 lrmd[31221]: notice: operation_finished: nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully parsed
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries found in configuration file !!!
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration file
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed for proper quota management in FSAL
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
  09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
  09/06/2015 11:16:21 : epoch
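A short sketch of how the unresponsive RPC services can be probed from a client, in the spirit of the checks above (the target hostname and the trace file path are placeholders):

  #!/bin/bash
  # Probe whether the NFS and MOUNT services registered in rpcbind actually
  # answer RPC calls, and capture a trace of a hanging showmount for the report.
  NODE=${1:-localhost}   # pass the ganesha node to test

  rpcinfo -t "$NODE" nfs 3       # RPC NULL call to the NFSv3 service over TCP
  rpcinfo -t "$NODE" mountd 3    # RPC NULL call to the MOUNT service over TCP

  # Record the client-side system calls of a showmount that times out.
  strace -f -tt -o /tmp/showmount.trace showmount -e "$NODE"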
Re: [Gluster-users] Questions on ganesha HA and shared storage size
OK, I found at least one of the bugs. /usr/libexec/ganesha/ganesha.sh has the following lines:

  if [ -e /etc/os-release ]; then
      RHEL6_PCS_CNAME_OPTION=""
  fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the following to make it work:

  if [ -e /etc/os-release ]; then
      eval $(grep -F "REDHAT_SUPPORT_PRODUCT=" /etc/os-release)
      [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
  fi

Apart from that, the VIP_<node> variable names I was using were wrong: I should have converted all the "-" in the hostnames to underscores. Maybe this could be mentioned in the documentation when you have it ready. Now the cluster starts, but the VIPs apparently do not:

  Online: [ atlas-node1 atlas-node2 ]

  Full list of resources:

   Clone Set: nfs-mon-clone [nfs-mon]
       Started: [ atlas-node1 atlas-node2 ]
   Clone Set: nfs-grace-clone [nfs-grace]
       Started: [ atlas-node1 atlas-node2 ]
   atlas-node1-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
   atlas-node1-trigger_ip-1   (ocf::heartbeat:Dummy):    Started atlas-node1
   atlas-node2-cluster_ip-1   (ocf::heartbeat:IPaddr):   Stopped
   atlas-node2-trigger_ip-1   (ocf::heartbeat:Dummy):    Started atlas-node2
   atlas-node1-dead_ip-1      (ocf::heartbeat:Dummy):    Started atlas-node1
   atlas-node2-dead_ip-1      (ocf::heartbeat:Dummy):    Started atlas-node2

  PCSD Status:
    atlas-node1: Online
    atlas-node2: Online

  Daemon Status:
    corosync: active/disabled
    pacemaker: active/disabled
    pcsd: active/enabled

But the issue that is puzzling me more is the following:

  # showmount -e localhost
  rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

  # gluster volume set atlas-home-01 ganesha.enable on
  volume set: failed: Failed to create NFS-Ganesha export config file.

But I do see the file created in /etc/ganesha/exports/*.conf. Still, showmount hangs and times out. Any help? Thanks, Alessandro
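Putting those two observations together, a hedged sketch of what the ganesha-ha.conf posted earlier in this thread would look like with plain ASCII quotes and with the dashes in the VIP variable names converted to underscores (the hostnames and x.x.x.* addresses are the placeholders already used above):

  # /etc/ganesha/ganesha-ha.conf on atlas-node1 (sketch, sourced as shell by the HA scripts)
  # Name of the HA cluster created.
  HA_NAME="ATLAS_GANESHA_01"
  # The server from which you intend to mount the shared volume.
  HA_VOL_SERVER="atlas-node1"
  # The subset of nodes of the Gluster Trusted Pool that forms the ganesha HA cluster.
  HA_CLUSTER_NODES="atlas-node1,atlas-node2"
  # Virtual IPs of each of the nodes specified above (underscores, not dashes, in the names).
  VIP_atlas_node1="x.x.x.1"
  VIP_atlas_node2="x.x.x.2"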
Re: [Gluster-users] Questions on ganesha HA and shared storage size
On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
Sorry, just another question:
- in my installation of gluster 3.7.1 the command "gluster features.ganesha enable" does not work:

  # gluster features.ganesha enable
  unrecognized word: features.ganesha (position 0)

Which version has full support for it?

Sorry. This option has recently been changed. It is now

  $ gluster nfs-ganesha enable

- in the documentation the ccs and cman packages are required, but they seem not to be available anymore on CentOS 7 and similar; I guess they are not really required anymore, as pcs should do the full job. Thanks, Alessandro

Looks like so, from http://clusterlabs.org/quickstart-redhat.html. Let us know if it doesn't work. Thanks, Soumya
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, indeed, it does not work :-) OK, this is what I did, with 2 machines running CentOS 7.1, GlusterFS 3.7.1 and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but this was already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and mounted it on '/run/gluster/shared_storage' on all the cluster nodes using the glusterfs native mount (on CentOS 7.1 /var/run is by default a symlink to ../run);
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker, pcs, resource-agents and corosync on all cluster machines;
6) set the same password for the 'hacluster' user on all machines;
7) ran "pcs cluster auth <hostname> -u hacluster -p <pass>" on all the nodes (on both nodes I issued the command for both nodes);
8) IPv6 is configured by default on all nodes, although the infrastructure is not ready for IPv6;
9) enabled and started pcsd on all nodes;
10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per machine:

=== atlas-node1
  # Name of the HA cluster created.
  HA_NAME="ATLAS_GANESHA_01"
  # The server from which you intend to mount
  # the shared volume.
  HA_VOL_SERVER="atlas-node1"
  # The subset of nodes of the Gluster Trusted Pool
  # that forms the ganesha HA cluster. IP/Hostname
  # is specified.
  HA_CLUSTER_NODES="atlas-node1,atlas-node2"
  # Virtual IPs of each of the nodes specified above.
  VIP_atlas-node1="x.x.x.1"
  VIP_atlas-node2="x.x.x.2"

=== atlas-node2
  # Name of the HA cluster created.
  HA_NAME="ATLAS_GANESHA_01"
  # The server from which you intend to mount
  # the shared volume.
  HA_VOL_SERVER="atlas-node2"
  # The subset of nodes of the Gluster Trusted Pool
  # that forms the ganesha HA cluster. IP/Hostname
  # is specified.
  HA_CLUSTER_NODES="atlas-node1,atlas-node2"
  # Virtual IPs of each of the nodes specified above.
  VIP_atlas-node1="x.x.x.1"
  VIP_atlas-node2="x.x.x.2"

11) issued "gluster nfs-ganesha enable", but it fails with a cryptic message:

  # gluster nfs-ganesha enable
  Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
  nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check the log file for details

Looking at the logs I found nothing really special but this:

  == /var/log/glusterfs/etc-glusterfs-glusterd.vol.log ==
  [2015-06-08 17:57:15.672844] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
  [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
  [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 0-management: ganesha host found Hostname is atlas-node2
  [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 0-management: ganesha host found Hostname is atlas-node2
  [2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 0-management: Initial NFS-Ganesha set up failed
  [2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 0-management: Commit of operation 'Volume (null)' failed on localhost : Failed to set up HA config for NFS-Ganesha. Please check the log file for details

  == /var/log/glusterfs/cmd_history.log ==
  [2015-06-08 17:57:16.643615] : nfs-ganesha enable : FAILED : Failed to set up HA config for NFS-Ganesha. Please check the log file for details

  == /var/log/glusterfs/cli.log ==
  [2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1

Also, pcs seems to be fine for the auth part, although it obviously tells me the cluster is not running:

  I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: /usr/sbin/corosync-cmapctl totem.cluster_name
  I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster token-nodes
  :::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth HTTP/1.1 200 68 0.1919
  :::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth HTTP/1.1 200 68 0.1920
  atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] GET /remote/check_auth HTTP/1.1 200 68 - - /remote/check_auth

What am I doing wrong? Thanks, Alessandro

On 08 Jun 2015, at 19:30, Soumya Koduri skod...@redhat.com wrote:
Sorry. This option has recently been changed. It is now $ gluster nfs-ganesha enable
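A hedged sketch of steps 3) to 9) above as shell commands on one node; the brick paths, the replica count and the password are illustrative assumptions, not values taken from this thread:

  #!/bin/bash
  # 3) shared volume for the HA state, mounted on every cluster node
  gluster volume create gluster_shared_storage replica 2 \
      atlas-node1:/bricks/shared atlas-node2:/bricks/shared
  gluster volume start gluster_shared_storage
  mkdir -p /run/gluster/shared_storage
  mount -t glusterfs localhost:/gluster_shared_storage /run/gluster/shared_storage

  # 4) empty ganesha.conf, as in the report above
  touch /etc/ganesha/ganesha.conf

  # 5) and 6) cluster stack plus a common password for the hacluster user
  yum -y install pacemaker pcs resource-agents corosync
  echo 'hacluster:MyHAPassword' | chpasswd

  # 9) pcsd must be enabled and running before the auth step
  systemctl enable pcsd
  systemctl start pcsd

  # 7) authenticate pcs against both nodes (run on every node)
  pcs cluster auth atlas-node1 atlas-node2 -u hacluster -p 'MyHAPassword'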
[Gluster-users] Questions on ganesha HA and shared storage size
Hi, I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM. However, there is no advice on the appropriate size of the shared volume. How is it really used, and what would be a reasonable size for it? Also, are the slides from the video available somewhere, as well as documentation on all this? I did not manage to find them. Thanks, Alessandro
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Great, many thanks Soumya! Cheers, Alessandro

On 08 Jun 2015, at 13:53, Soumya Koduri skod...@redhat.com wrote:
Hi, please find the slides of the demo video at [1].
[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Hi, please find the slides of the demo video at [1].

We recommend a distributed replicated volume as the shared volume, for better data availability. The size of the volume depends on the workload you may have. Since it is used to maintain the state of NLM/NFSv4 clients, you may take as a minimum size the aggregate, over the NFS servers, of

  (typical size of the '/var/lib/nfs' directory + ~4 KB * number of clients connected to each of the NFS servers at any point)

We shall document this feature in the gluster docs soon as well. Thanks, Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
Hi, I have seen the demo video on ganesha HA, https://www.youtube.com/watch?v=Z4mvTQC-efM. However, there is no advice on the appropriate size of the shared volume.
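As a hedged, back-of-the-envelope illustration of that estimate (the client count and the /var/lib/nfs size below are made-up numbers, not figures from this thread):

  #!/bin/bash
  # Rough sizing of the shared state volume, following the formula above.
  VAR_LIB_NFS_KB=2048        # assumed typical size of /var/lib/nfs on one server, in KB
  CLIENTS_PER_SERVER=500     # assumed NFS clients connected to each server at any point
  SERVERS=2                  # nodes in the ganesha HA cluster

  PER_SERVER_KB=$((VAR_LIB_NFS_KB + 4 * CLIENTS_PER_SERVER))
  TOTAL_KB=$((PER_SERVER_KB * SERVERS))
  echo "Minimum shared volume size: ~${TOTAL_KB} KB (~$((TOTAL_KB / 1024)) MB)"

Even with generous client counts the result stays small, which is why the advice above focuses on replication rather than capacity.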
Re: [Gluster-users] Questions on ganesha HA and shared storage size
Sorry, just another question:
- in my installation of gluster 3.7.1 the command "gluster features.ganesha enable" does not work:

  # gluster features.ganesha enable
  unrecognized word: features.ganesha (position 0)

Which version has full support for it?
- in the documentation the ccs and cman packages are required, but they seem not to be available anymore on CentOS 7 and similar; I guess they are not really required anymore, as pcs should do the full job.

Thanks, Alessandro

On 08 Jun 2015, at 15:09, Alessandro De Salvo alessandro.desa...@roma1.infn.it wrote:
Great, many thanks Soumya! Cheers, Alessandro
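For reference, a hedged sketch collecting the commands this thread converges on; 'atlas-home-01' is just the example volume name used elsewhere in the thread:

  #!/bin/bash
  # The old 'gluster features.ganesha enable' syntax was renamed:
  gluster nfs-ganesha enable

  # Export a specific volume through NFS-Ganesha:
  gluster volume set atlas-home-01 ganesha.enable on

  # Check what the server exports (this is the call that hangs in the broken setups above):
  showmount -e localhost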