Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-11 Thread Alessandro De Salvo
Soumya, do you have any other idea of what to check on my side?
Many thanks,

Alessandro

 On 10 Jun 2015, at 21:07, Alessandro De Salvo 
 alessandro.desa...@roma1.infn.it wrote:
 
 Hi,
 by looking at the connections I also see a strange problem:
 
 # netstat -ltaupn | grep 2049
 tcp6   4  0 :::2049 :::*
 LISTEN  32080/ganesha.nfsd  
 tcp6   1  0 x.x.x.2:2049  x.x.x.2:33285 CLOSE_WAIT
 -   
 tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555
 CLOSE_WAIT  -   
 udp6   0  0 :::2049 :::*
 32080/ganesha.nfsd  
 
 
 Why is tcp6 used with an IPv4 address?
 On another machine where ganesha 2.1.0 is running I see tcp used, not
 tcp6.
 Could it be that the RPCs are always trying to use IPv6? That would be
 wrong.
 Thanks,
 
   Alessandro
 
 On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
 
 On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
 Hi,
 I have enabled the full debug already, but I see nothing special. Before 
 exporting any volume the log shows no error, even when I do a showmount 
 (the log is attached, ganesha.log.gz). If I do the same after exporting a 
 volume nfs-ganesha does not even start, complaining that it cannot 
 bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, 
 so it should not happen:
 
 tcp6   0  0 :::111  :::*LISTEN  
 7433/rpcbind
 tcp6   0  0 :::2224 :::*LISTEN  
 9054/ruby
 tcp6   0  0 :::22   :::*LISTEN  
 1248/sshd
 udp6   0  0 :::111  :::*
 7433/rpcbind
 udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
 31238/ntpd
 udp6   0  0 fe80::230:48ff:fed2:123 :::*
 31238/ntpd
 udp6   0  0 fe80::230:48ff:fed2:123 :::*
 31238/ntpd
 udp6   0  0 fe80::230:48ff:fed2:123 :::*
 31238/ntpd
 udp6   0  0 ::1:123 :::*
 31238/ntpd
 udp6   0  0 fe80::5484:7aff:fef:123 :::*
 31238/ntpd
 udp6   0  0 :::123  :::*
 31238/ntpd
 udp6   0  0 :::824  :::*
 7433/rpcbind
 
 The error, as shown in the attached ganesha-after-export.log.gz logfile, is 
 the following:
 
 
 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
 Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 
 (Address already in use)
 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
 Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
 glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
 
 
 We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first 
 disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes 
 there can be a delay or an issue with Gluster-NFS un-registering those 
 services, and when NFS-Ganesha tries to register on the same port, it 
 throws this error. Please try registering Rquota to a random port 
 using the config option below in /etc/ganesha/ganesha.conf:
 
 NFS_Core_Param {
 #Use a non-privileged port for RQuota
 Rquota_Port = 4501;
 }
 
 and cleanup '/var/cache/rpcbind/' directory before the setup.
 
 Thanks,
 Soumya
 
 
 Thanks,
 
 Alessandro
 
 
 
 
 On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com 
 wrote:
 
 
 
 On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
 Another update: the fact that I was unable to use vol set ganesha.enable
 was due to another bug in the ganesha scripts. In short, they are all
 using the following line to get the location of the conf file:
 
 CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)
 
 First of all by default in /etc/sysconfig/ganesha there is no line
 CONFFILE, second there is a bug in that directive, as it works if I add
 in /etc/sysconfig/ganesha
 
 CONFFILE=/etc/ganesha/ganesha.conf
 
 but it fails if the same is quoted
 
 CONFFILE="/etc/ganesha/ganesha.conf"
 
 It would be much better to use the following, which has a default as
 well:
 
 eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
 CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
 
 I'll update the bug report.
 Having said this... the last issue to tackle is the real problem with
 the ganesha.nfsd :-(
 
 Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
 '/etc/sysconfig/ganesha' and check if anything gets logged in 
 '/var/log/ganesha.log' or '/ganesha.log'.
 
 Thanks,
 Soumya
 
 Cheers,
 
   Alessandro
 
 
 On Tue, 2015-06-09 at 14:25 +0200, 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-11 Thread Soumya Koduri

CCing ganesha-devel to get more inputs.

When IPv6 is enabled, NFS-Ganesha uses only v6 interfaces.

Commit 'd7e8f255' (git show d7e8f255), which was added in v2.2, has more details.
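
(Side note on the tcp6 sockets: on Linux an AF_INET6 listener normally also accepts IPv4 clients through v4-mapped addresses, unless net.ipv6.bindv6only is set to 1. A quick check, as a sketch:)

sysctl net.ipv6.bindv6only     # 0 means the tcp6 sockets above also serve plain IPv4 clients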

 # netstat -ltaupn | grep 2049
 tcp6   4  0 :::2049 :::*
 LISTEN  32080/ganesha.nfsd
 tcp6   1  0 x.x.x.2:2049  x.x.x.2:33285 CLOSE_WAIT
 -
 tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555
 CLOSE_WAIT  -
 udp6   0  0 :::2049 :::*
 32080/ganesha.nfsd


Looks like (from the logs and the netstat output) there was a 
shutdown request even before the server had come out of the grace period.


10/06/2015 01:58:53 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[work-6] nfs_rpc_dequeue_req :DISP :F_DBG :dequeue_req 
try qpair REQ_Q_LOW_LATENCY 0x7fdf8dc67b00:0x7fdf8dc67b68
10/06/2015 01:58:53 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN 
GRACE

..
10/06/2015 01:58:55 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of 
poll loop
10/06/2015 01:58:55 : epoch 55777da1 : node2 : ganesha.nfsd-20696[main] 
nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
10/06/2015 01:58:55 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[work-12] nfs_rpc_consume_req :DISP :F_DBG :try 
splice, qpair REQ_Q_LOW_LATENCY consumer qsize=0 producer qsize=0

..
10/06/2015 01:59:52 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of 
poll loop
10/06/2015 01:59:52 : epoch 55777da1 : node2 : ganesha.nfsd-20696[Admin] 
do_shutdown :MAIN :EVENT :NFS EXIT: stopping NFS service

...
10/06/2015 02:00:00 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now 
NOT IN GRACE
10/06/2015 02:00:00 : epoch 55777da1 : node2 : 
ganesha.nfsd-20696[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of 
poll loop


When you observe the hang, please take 'gstack <ganesha_pid>' output and 
post it in the mail.
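
(For example, a minimal capture sketch, assuming gstack from the gdb package is installed; the output path is just illustrative:)

pid=$(pidof ganesha.nfsd)
gstack "$pid" > /tmp/ganesha-gstack.$(date +%Y%m%d-%H%M%S).txt   # thread stacks at the moment of the hang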


Thanks,
Soumya

On 06/11/2015 12:37 AM, Alessandro De Salvo wrote:

Hi,
by looking at the connections I also see a strange problem:

# netstat -ltaupn | grep 2049
tcp6   4  0 :::2049 :::*
LISTEN  32080/ganesha.nfsd
tcp6   1  0 x.x.x.2:2049  x.x.x.2:33285 CLOSE_WAIT
-
tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555
CLOSE_WAIT  -
udp6   0  0 :::2049 :::*
32080/ganesha.nfsd


Why is tcp6 used with an IPv4 address?
On another machine where ganesha 2.1.0 is running I see tcp used, not
tcp6.
Could it be that the RPCs are always trying to use IPv6? That would be
wrong.
Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:


On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:

Hi,
I have enabled the full debug already, but I see nothing special. Before 
exporting any volume the log shows no error, even when I do a showmount (the 
log is attached, ganesha.log.gz). If I do the same after exporting a volume 
nfs-ganesha does not even start, complaining that it cannot bind the 
IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should 
not happen:

tcp6   0  0 :::111  :::*LISTEN  
7433/rpcbind
tcp6   0  0 :::2224 :::*LISTEN  
9054/ruby
tcp6   0  0 :::22   :::*LISTEN  
1248/sshd
udp6   0  0 :::111  :::*
7433/rpcbind
udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 ::1:123 :::*
31238/ntpd
udp6   0  0 fe80::5484:7aff:fef:123 :::*
31238/ntpd
udp6   0  0 :::123  :::*
31238/ntpd
udp6   0  0 :::824  :::*
7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the 
following:


10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address 
already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded



We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first
disables 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-10 Thread Soumya Koduri



On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:

Hi,
I have enabled the full debug already, but I see nothing special. Before 
exporting any volume the log shows no error, even when I do a showmount (the 
log is attached, ganesha.log.gz). If I do the same after exporting a volume 
nfs-ganesha does not even start, complaining that it cannot bind the 
IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should 
not happen:

tcp6   0  0 :::111  :::*LISTEN  
7433/rpcbind
tcp6   0  0 :::2224 :::*LISTEN  
9054/ruby
tcp6   0  0 :::22   :::*LISTEN  
1248/sshd
udp6   0  0 :::111  :::*
7433/rpcbind
udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd
udp6   0  0 ::1:123 :::*
31238/ntpd
udp6   0  0 fe80::5484:7aff:fef:123 :::*
31238/ntpd
udp6   0  0 :::123  :::*
31238/ntpd
udp6   0  0 :::824  :::*
7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the 
following:


10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address 
already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded



We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first 
disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes 
there can be a delay or an issue with Gluster-NFS un-registering those 
services, and when NFS-Ganesha tries to register on the same port, it 
throws this error. Please try registering Rquota to a random port 
using the config option below in /etc/ganesha/ganesha.conf:


NFS_Core_Param {
#Use a non-privileged port for RQuota
Rquota_Port = 4501;
}

and cleanup '/var/cache/rpcbind/' directory before the setup.
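
(A rough sketch of that cleanup, assuming a systemd-based node and the usual service names rpcbind/nfs-ganesha; adjust to your distribution:)

systemctl stop nfs-ganesha                 # nothing should try to re-register while cleaning up
systemctl stop rpcbind rpcbind.socket
rm -f /var/cache/rpcbind/*                 # drop stale registrations cached on disk
systemctl start rpcbind.socket rpcbind
rpcinfo -p                                 # confirm no leftover mountd/nlockmgr/rquotad entries remain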

Thanks,
Soumya



Thanks,

Alessandro





On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com 
wrote:



On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:

Another update: the fact that I was unable to use vol set ganesha.enable
was due to another bug in the ganesha scripts. In short, they are all
using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all by default in /etc/sysconfig/ganesha there is no line
CONFFILE, second there is a bug in that directive, as it works if I add
in /etc/sysconfig/ganesha

CONFFILE=/etc/ganesha/ganesha.conf

but it fails if the same is quoted

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as
well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report.
Having said this... the last issue to tackle is the real problem with
the ganesha.nfsd :-(


Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
'/etc/sysconfig/ganesha' and check if anything gets logged in 
'/var/log/ganesha.log' or '/ganesha.log'.

Thanks,
Soumya


Cheers,

Alessandro


On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:

OK, I can confirm that the ganesha.nfsd process is actually not
answering the calls. Here is what I see:

# rpcinfo -p
program vers proto   port  service
 100000    4   tcp    111  portmapper
 100000    3   tcp    111  portmapper
 100000    2   tcp    111  portmapper
 100000    4   udp    111  portmapper
 100000    3   udp    111  portmapper
 100000    2   udp    111  portmapper
 100024    1   udp  41594  status
 100024    1   tcp  53631  status
 100003    3   udp   2049  nfs
 100003    3   tcp   2049  nfs
 100003    4   udp   2049  nfs
 100003    4   tcp   2049  nfs
 100005    1   udp  58127  mountd
 100005    1   tcp  56301  mountd
 100005    3   udp  58127  mountd
 100005    3   tcp  56301  mountd
 100021    4   udp  46203  nlockmgr
 100021    4   tcp  41798  nlockmgr
 100011    1   udp    875  rquotad
 100011    1   tcp    875  rquotad
 100011    2   udp    875  rquotad
 100011    2   tcp    875  rquotad

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-10 Thread Alessandro De Salvo
Hi Soumya,
OK, that trick worked, but now I'm back to the same situation of the
hanging showmount -e. Did you check the logs I sent yesterday? Now I'm
essentially back to the situation of the first log (ganesha.log.gz) in all
cases.
Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
 
 On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
  Hi,
  I have enabled the full debug already, but I see nothing special. Before 
  exporting any volume the log shows no error, even when I do a showmount 
  (the log is attached, ganesha.log.gz). If I do the same after exporting a 
  volume nfs-ganesha does not even start, complaining that it cannot 
  bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, 
  so it should not happen:
 
  tcp6   0  0 :::111  :::*LISTEN  
  7433/rpcbind
  tcp6   0  0 :::2224 :::*LISTEN  
  9054/ruby
  tcp6   0  0 :::22   :::*LISTEN  
  1248/sshd
  udp6   0  0 :::111  :::*
  7433/rpcbind
  udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 ::1:123 :::*
  31238/ntpd
  udp6   0  0 fe80::5484:7aff:fef:123 :::*
  31238/ntpd
  udp6   0  0 :::123  :::*
  31238/ntpd
  udp6   0  0 :::824  :::*
  7433/rpcbind
 
  The error, as shown in the attached ganesha-after-export.log.gz logfile, is 
  the following:
 
 
  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 
  (Address already in use)
  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
  10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
 
 
 We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first 
 disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes 
 there can be a delay or an issue with Gluster-NFS un-registering those 
 services, and when NFS-Ganesha tries to register on the same port, it 
 throws this error. Please try registering Rquota to a random port 
 using the config option below in /etc/ganesha/ganesha.conf:
 
 NFS_Core_Param {
  #Use a non-privileged port for RQuota
  Rquota_Port = 4501;
 }
 
 and cleanup '/var/cache/rpcbind/' directory before the setup.
 
 Thanks,
 Soumya
 
 
  Thanks,
 
  Alessandro
 
 
 
 
  On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com 
  wrote:
 
 
 
  On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
  Another update: the fact that I was unable to use vol set ganesha.enable
  was due to another bug in the ganesha scripts. In short, they are all
  using the following line to get the location of the conf file:
 
  CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)
 
  First of all by default in /etc/sysconfig/ganesha there is no line
  CONFFILE, second there is a bug in that directive, as it works if I add
  in /etc/sysconfig/ganesha
 
  CONFFILE=/etc/ganesha/ganesha.conf
 
  but it fails if the same is quoted
 
  CONFFILE="/etc/ganesha/ganesha.conf"
 
  It would be much better to use the following, which has a default as
  well:
 
  eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
  CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
 
  I'll update the bug report.
  Having said this... the last issue to tackle is the real problem with
  the ganesha.nfsd :-(
 
  Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
  '/etc/sysconfig/ganesha' and check if anything gets logged in 
  '/var/log/ganesha.log' or '/ganesha.log'.
 
  Thanks,
  Soumya
 
  Cheers,
 
Alessandro
 
 
  On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
  OK, I can confirm that the ganesha.nfsd process is actually not
  answering the calls. Here is what I see:
 
  # rpcinfo -p
  program vers proto   port  service
   100000    4   tcp    111  portmapper
   100000    3   tcp    111  portmapper
   100000    2   tcp    111  portmapper
   100000    4   udp    111  portmapper
   100000    3   udp    111  portmapper
   100000    2   udp    111  portmapper
   100024    1   udp  41594  status
   100024    1   tcp  53631  status
   

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-10 Thread Alessandro De Salvo
Hi,
by looking at the connections I also see a strange problem:

# netstat -ltaupn | grep 2049
tcp6   4  0 :::2049 :::*
LISTEN  32080/ganesha.nfsd  
tcp6   1  0 x.x.x.2:2049  x.x.x.2:33285 CLOSE_WAIT
-   
tcp6   1  0 127.0.0.1:2049  127.0.0.1:39555
CLOSE_WAIT  -   
udp6   0  0 :::2049 :::*
32080/ganesha.nfsd  


Why is tcp6 used with an IPv4 address?
On another machine where ganesha 2.1.0 is running I see tcp used, not
tcp6.
Could it be that the RPCs are always trying to use IPv6? That would be
wrong.
Thanks,

Alessandro

On Wed, 2015-06-10 at 15:28 +0530, Soumya Koduri wrote:
 
 On 06/10/2015 05:49 AM, Alessandro De Salvo wrote:
  Hi,
  I have enabled the full debug already, but I see nothing special. Before 
  exporting any volume the log shows no error, even when I do a showmount 
  (the log is attached, ganesha.log.gz). If I do the same after exporting a 
  volume nfs-ganesha does not even start, complaining that it cannot 
  bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, 
  so it should not happen:
 
  tcp6   0  0 :::111  :::*LISTEN  
  7433/rpcbind
  tcp6   0  0 :::2224 :::*LISTEN  
  9054/ruby
  tcp6   0  0 :::22   :::*LISTEN  
  1248/sshd
  udp6   0  0 :::111  :::*
  7433/rpcbind
  udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 fe80::230:48ff:fed2:123 :::*
  31238/ntpd
  udp6   0  0 ::1:123 :::*
  31238/ntpd
  udp6   0  0 fe80::5484:7aff:fef:123 :::*
  31238/ntpd
  udp6   0  0 :::123  :::*
  31238/ntpd
  udp6   0  0 :::824  :::*
  7433/rpcbind
 
  The error, as shown in the attached ganesha-after-export.log.gz logfile, is 
  the following:
 
 
  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 
  (Address already in use)
  10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
  10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
  glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
 
 
 We have seen such issues with RPCBIND a few times. The NFS-Ganesha setup first 
 disables Gluster-NFS and then brings up the NFS-Ganesha service. Sometimes 
 there can be a delay or an issue with Gluster-NFS un-registering those 
 services, and when NFS-Ganesha tries to register on the same port, it 
 throws this error. Please try registering Rquota to a random port 
 using the config option below in /etc/ganesha/ganesha.conf:
 
 NFS_Core_Param {
  #Use a non-privileged port for RQuota
  Rquota_Port = 4501;
 }
 
 and cleanup '/var/cache/rpcbind/' directory before the setup.
 
 Thanks,
 Soumya
 
 
  Thanks,
 
  Alessandro
 
 
 
 
  On 9 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com 
  wrote:
 
 
 
  On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
  Another update: the fact that I was unable to use vol set ganesha.enable
  was due to another bug in the ganesha scripts. In short, they are all
  using the following line to get the location of the conf file:
 
  CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)
 
  First of all by default in /etc/sysconfig/ganesha there is no line
  CONFFILE, second there is a bug in that directive, as it works if I add
  in /etc/sysconfig/ganesha
 
  CONFFILE=/etc/ganesha/ganesha.conf
 
  but it fails if the same is quoted
 
  CONFFILE="/etc/ganesha/ganesha.conf"
 
  It would be much better to use the following, which has a default as
  well:
 
  eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
  CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
 
  I'll update the bug report.
  Having said this... the last issue to tackle is the real problem with
  the ganesha.nfsd :-(
 
  Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
  '/etc/sysconfig/ganesha' and check if anything gets logged in 
  '/var/log/ganesha.log' or '/ganesha.log'.
 
  Thanks,
  Soumya
 
  Cheers,
 
Alessandro
 
 
  On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
  OK, I can confirm that the ganesha.nfsd process is actually not
  answering the calls. Here is what I see:
 
  # rpcinfo -p
  

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
A better solution to the pid file problem is to add "-p 
/var/run/ganesha.nfsd.pid" to the OPTIONS in /etc/sysconfig/ganesha, so that it 
becomes:

# cat /etc/sysconfig/ganesha 
OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT -p 
/var/run/ganesha.nfsd.pid"

This definitely solves the cluster troubles. However, the main problem with 
the RPC timing out is still there.
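
(A quick manual check that the pid file the ganesha_mon agent looks for now exists and points at a live process, as a sketch:)

cat /var/run/ganesha.nfsd.pid
kill -0 "$(cat /var/run/ganesha.nfsd.pid)" && echo "ganesha.nfsd is alive"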
Cheers,

Alessandro

 On 9 Jun 2015, at 11:18, Alessandro De Salvo 
 alessandro.desa...@roma1.infn.it wrote:
 
 Hi,
 OK, the problem with the VIPs not starting is due to the ganesha_mon 
 heartbeat script looking for a pid file called /var/run/ganesha.nfsd.pid, 
 while by default ganesha.nfsd v.2.2.0 is creating /var/run/ganesha.pid, this 
 needs to be corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, 
 in my case.
 For the moment I have created a symlink in this way and it works:
 
 ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
 
 So far so good, the VIPs are up and pingable, but still there is the problem 
 of the hanging showmount (i.e. hanging RPC).
 Still, I see a lot of errors like this in /var/log/messages:
 
 Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished: 
 nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]
 
 While ganesha.log shows the server is not in grace:
 
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] 
 main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 
 /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on 
 buildhw-09.phx2.fedoraproject.org http://buildhw-09.phx2.fedoraproject.org/
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully 
 parsed
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 main :NFS STARTUP :WARN :No export entries found in configuration file !!!
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration 
 file
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed 
 for proper quota management in FSAL
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = 
 cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory 
 (/var/run/ganesha) already exists
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_rpc_cb_init_ccache :NFS STARTUP :WARN 
 :gssd_refresh_krb5_machine_credential failed (2:2)
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started 
 successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : 
 ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P 
 dispatcher started
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
 nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
 09/06/2015 11:16:22 : epoch 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Soumya Koduri



On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:

Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon
heartbeat script looking for a pid file called
/var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
creating /var/run/ganesha.pid, this needs to be corrected. The file is
in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
For the moment I have created a symlink in this way and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid


Thanks. Please update this as well in the bug.


So far so good, the VIPs are up and pingable, but still there is the
problem of the hanging showmount (i.e. hanging RPC).
Still, I see a lot of errors like this in /var/log/messages:

Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
http://buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
:Configuration file successfully parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
:Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
((null):0): Empty configuration file
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
:CAP_SYS_RESOURCE was successfully removed for proper quota management
in FSAL
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
capabilities are: =
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
credentials for principal nfs
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
thread initialized
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
IN GRACE, duration 60
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
:Callback creds directory (/var/run/ganesha) already exists
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
:gssd_refresh_krb5_machine_credential failed (2:2)
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
delayed executor.
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
dispatcher thread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
dispatcher started
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
:gsh_dbusthread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
GRACE
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
fridge was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT
:-
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_start :NFS STARTUP :EVENT :   

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Soumya Koduri



On 06/09/2015 02:06 PM, Alessandro De Salvo wrote:

Hi Soumya,


On 9 Jun 2015, at 08:06, Soumya Koduri skod...@redhat.com 
wrote:



On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:

OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:

 if [ -e /etc/os-release ]; then
 RHEL6_PCS_CNAME_OPTION=
 fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the 
following to make it work:

  if [ -e /etc/os-release ]; then
      eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
      [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
  fi
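
(For what it's worth, a variant that sources /etc/os-release instead of eval'ing a grep, sketched under the same assumption the fix above makes, i.e. that only Fedora needs the option cleared:)

if [ -e /etc/os-release ]; then
    . /etc/os-release                          # defines ID, VERSION_ID, NAME, ...
    [ "$ID" = "fedora" ] && RHEL6_PCS_CNAME_OPTION=""
fi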


Oh..Thanks for the fix. Could you please file a bug for the same (and probably 
submit your fix as well). We shall have it corrected.


Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601


Thanks!






Apart from that, the VIP_node I was using were wrong, and I should have 
converted all the “-“ to underscores, maybe this could be mentioned in the 
documentation when you will have it ready.
Now, the cluster starts, but the VIPs apparently not:


Sure. Thanks again for pointing it out. We shall make a note of it.


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

  Clone Set: nfs-mon-clone [nfs-mon]
  Started: [ atlas-node1 atlas-node2 ]
  Clone Set: nfs-grace-clone [nfs-grace]
  Started: [ atlas-node1 atlas-node2 ]
  atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
  atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
   atlas-node1: Online
   atlas-node2: Online

Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled



Here corosync and pacemaker show a 'disabled' state. Can you check the status of 
their services? They should be running prior to cluster creation. We need to 
include that step in the document as well.


Ah, OK, you’re right, I have added it to my puppet modules (we install and 
configure ganesha via puppet, I’ll put the module on puppetforge soon, in case 
anyone is interested).


Sure. This sounds great. Please let us know once its added.





But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,


Hmm, that's strange. Sometimes, if proper cleanup was not done 
while re-creating the cluster, we have seen such issues.

https://bugzilla.redhat.com/show_bug.cgi?id=1227709

http://review.gluster.org/#/c/11093/

Can you please unexport all the volumes, tear down the cluster using
'gluster vol set <volname> ganesha.enable off'


OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.



'gluster ganesha disable' command.


I’m assuming you wanted to write nfs-ganesha instead?

Am sorry. you are right. I was referring to 'nfs-ganesha'


# gluster nfs-ganesha disable
ganesha enable : success


A side note (not really important): it’s strange that when I do a disable the 
message is “ganesha enable” :-)

yeah. This doesn't seem correct. Please update the bug(s) with all the 
discrepancies you have found.


Thanks,
Soumya


Verify if the following files have been deleted on all the nodes-
'/etc/cluster/cluster.conf’


this file is not present at all, I think it’s not needed in CentOS 7


'/etc/ganesha/ganesha.conf’,


it’s still there, but empty, and I guess it should be OK, right?


'/etc/ganesha/exports/*’


no more files there


'/var/lib/pacemaker/cib’


it’s empty



Verify if the ganesha service is stopped on all the nodes.


nope, it’s still running, I will stop it.



start/restart the services - corosync, pcs.


On the node where I issued the nfs-ganesha disable there is no longer any 
/etc/corosync/corosync.conf, so corosync won't start. The other node instead 
still has the file, which is strange.



And re-try the HA cluster creation
'gluster ganesha enable’


This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:13:43 2015
Last change: Tue Jun  9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ atlas-node1 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon heartbeat 
script looking for a pid file called /var/run/ganesha.nfsd.pid, while by 
default ganesha.nfsd v.2.2.0 is creating /var/run/ganesha.pid, this needs to be 
corrected. The file is in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
For the moment I have created a symlink in this way and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid

So far so good, the VIPs are up and pingable, but still there is the problem of 
the hanging showmount (i.e. hanging RPC).
Still, I see a lot of errors like this in /var/log/messages:

Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished: 
nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29964[main] 
main :MAIN :EVENT :ganesha.nfsd Starting: Ganesha Version 
/builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at May 18 2015 14:17:18 on 
buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_set_param_from_conf :NFS STARTUP :EVENT :Configuration file successfully 
parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
init_server_pkgs :NFS STARTUP :EVENT :Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
init_server_pkgs :NFS STARTUP :EVENT :ID Mapper successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
main :NFS STARTUP :WARN :No export entries found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
config_errs_to_log :CONFIG :WARN :Config File ((null):0): Empty configuration 
file
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
lower_my_caps :NFS STARTUP :EVENT :CAP_SYS_RESOURCE was successfully removed 
for proper quota management in FSAL
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
lower_my_caps :NFS STARTUP :EVENT :currenty set capabilities are: = 
cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Init_svc :DISP :CRIT :Cannot acquire credentials for principal nfs
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Init_admin_thread :NFS CB :EVENT :Admin thread initialized
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT :Callback creds directory 
(/var/run/ganesha) already exists
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_rpc_cb_init_ccache :NFS STARTUP :WARN :gssd_refresh_krb5_machine_credential 
failed (2:2)
09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :Starting delayed executor.
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :9P/TCP dispatcher thread was started 
successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : 
ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P 
dispatcher started
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :gsh_dbusthread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :admin thread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :reaper thread was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[reaper] 
nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_Start_threads :THREAD :EVENT :General fridge was started successfully
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_start :NFS STARTUP :EVENT :-
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 
nfs_start :NFS STARTUP :EVENT : NFS SERVER INITIALIZED
09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 : ganesha.nfsd-29965[main] 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
Hi,

 On 9 Jun 2015, at 11:46, Soumya Koduri skod...@redhat.com 
 wrote:
 
 
 
 On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
 Hi,
 OK, the problem with the VIPs not starting is due to the ganesha_mon
 heartbeat script looking for a pid file called
 /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
 creating /var/run/ganesha.pid, this needs to be corrected. The file is
 in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
 For the moment I have created a symlink in this way and it works:
 
 ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
 
 Thanks. Please update this as well in the bug.

Done :-)

 
 So far so good, the VIPs are up and pingable, but still there is the
 problem of the hanging showmount (i.e. hanging RPC).
 Still, I see a lot of errors like this in /var/log/messages:
 
 Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
 nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]
 
 While ganesha.log shows the server is not in grace:
 
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
 Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
 May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
 http://buildhw-09.phx2.fedoraproject.org
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
 :Configuration file successfully parsed
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
 :Initializing ID Mapper.
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
 successfully initialized.
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
 found in configuration file !!!
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
 ((null):0): Empty configuration file
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
 :CAP_SYS_RESOURCE was successfully removed for proper quota management
 in FSAL
 09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
 capabilities are: =
 cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
 credentials for principal nfs
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Init_admin_thread :NFS CB :EVENT :Admin
 thread initialized
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs4_start_grace :STATE :EVENT :NFS Server Now
 IN GRACE, duration 60
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :EVENT
 :Callback creds directory (/var/run/ganesha) already exists
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_rpc_cb_init_ccache :NFS STARTUP :WARN
 :gssd_refresh_krb5_machine_credential failed (2:2)
 09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :Starting
 delayed executor.
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :9P/TCP
 dispatcher thread was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[_9p_disp] _9p_dispatcher_thread :9P DISP :EVENT :9P
 dispatcher started
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT
 :gsh_dbusthread was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :admin thread
 was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :reaper thread
 was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[reaper] nfs_in_grace :STATE :EVENT :NFS Server Now IN
 GRACE
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] nfs_Start_threads :THREAD :EVENT :General
 fridge was started successfully
 09/06/2015 11:16:22 : epoch 5576aee4 : atlas-node1 :
 ganesha.nfsd-29965[main] 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
Hi Soumya,

 On 9 Jun 2015, at 08:06, Soumya Koduri skod...@redhat.com 
 wrote:
 
 
 
 On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:
 OK, I found at least one of the bugs.
 The /usr/libexec/ganesha/ganesha.sh has the following lines:
 
 if [ -e /etc/os-release ]; then
 RHEL6_PCS_CNAME_OPTION=
 fi
 
  This is OK for RHEL < 7, but does not work for >= 7. I have changed it to 
  the following to make it work:
  
  if [ -e /etc/os-release ]; then
      eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
      [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
  fi
 
 Oh..Thanks for the fix. Could you please file a bug for the same (and 
 probably submit your fix as well). We shall have it corrected.

Just did it, https://bugzilla.redhat.com/show_bug.cgi?id=1229601

 
 Apart from that, the VIP_node I was using were wrong, and I should have 
 converted all the “-“ to underscores, maybe this could be mentioned in the 
 documentation when you will have it ready.
 Now, the cluster starts, but the VIPs apparently not:
 
 Sure. Thanks again for pointing it out. We shall make a note of it.
 
 Online: [ atlas-node1 atlas-node2 ]
 
 Full list of resources:
 
  Clone Set: nfs-mon-clone [nfs-mon]
  Started: [ atlas-node1 atlas-node2 ]
  Clone Set: nfs-grace-clone [nfs-grace]
  Started: [ atlas-node1 atlas-node2 ]
  atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
  atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2
 
 PCSD Status:
   atlas-node1: Online
   atlas-node2: Online
 
 Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled
 
 
  Here corosync and pacemaker show a 'disabled' state. Can you check the status 
  of their services? They should be running prior to cluster creation. We need 
  to include that step in the document as well.

Ah, OK, you’re right, I have added it to my puppet modules (we install and 
configure ganesha via puppet, I’ll put the module on puppetforge soon, in case 
anyone is interested).

 
 But the issue that is puzzling me more is the following:
 
 # showmount -e localhost
 rpc mount export: RPC: Timed out
 
 And when I try to enable the ganesha exports on a volume I get this error:
 
 # gluster volume set atlas-home-01 ganesha.enable on
 volume set: failed: Failed to create NFS-Ganesha export config file.
 
 But I see the file created in /etc/ganesha/exports/*.conf
 Still, showmount hangs and times out.
 Any help?
 Thanks,
 
  Hmm, that's strange. Sometimes, if proper cleanup was not done 
  while re-creating the cluster, we have seen such issues.
 
 https://bugzilla.redhat.com/show_bug.cgi?id=1227709
 
 http://review.gluster.org/#/c/11093/
 
  Can you please unexport all the volumes, tear down the cluster using
  'gluster vol set <volname> ganesha.enable off'

OK:

# gluster vol set atlas-home-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.

# gluster vol set atlas-data-01 ganesha.enable off
volume set: failed: ganesha.enable is already 'off'.


 'gluster ganesha disable' command.

I’m assuming you wanted to write nfs-ganesha instead?

# gluster nfs-ganesha disable
ganesha enable : success


A side note (not really important): it’s strange that when I do a disable the 
message is “ganesha enable” :-)

 
 Verify if the following files have been deleted on all the nodes-
 '/etc/cluster/cluster.conf’

this file is not present at all, I think it’s not needed in CentOS 7

 '/etc/ganesha/ganesha.conf’,

it’s still there, but empty, and I guess it should be OK, right?

 '/etc/ganesha/exports/*’

no more files there

 '/var/lib/pacemaker/cib’

it’s empty

 
 Verify if the ganesha service is stopped on all the nodes.

nope, it’s still running, I will stop it.

 
 start/restart the services - corosync, pcs.

On the node where I issued the nfs-ganesha disable there is no longer any 
/etc/corosync/corosync.conf, so corosync won't start. The other node instead 
still has the file, which is strange.

 
 And re-try the HA cluster creation
 'gluster ganesha enable’

This time (repeated twice) it did not work at all:

# pcs status
Cluster name: ATLAS_GANESHA_01
Last updated: Tue Jun  9 10:13:43 2015
Last change: Tue Jun  9 10:13:22 2015
Stack: corosync
Current DC: atlas-node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
6 Resources configured


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
 Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
 Started: [ atlas-node1 atlas-node2 ]
 atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Soumya Koduri



On 06/09/2015 01:31 AM, Alessandro De Salvo wrote:

OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:

 if [ -e /etc/os-release ]; then
 RHEL6_PCS_CNAME_OPTION=
 fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the 
following to make it work:

 if [ -e /etc/os-release ]; then
     eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
     [ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
 fi

Oh..Thanks for the fix. Could you please file a bug for the same (and 
probably submit your fix as well). We shall have it corrected.



Apart from that, the VIP_node I was using were wrong, and I should have 
converted all the “-“ to underscores, maybe this could be mentioned in the 
documentation when you will have it ready.
Now, the cluster starts, but the VIPs apparently not:


Sure. Thanks again for pointing it out. We shall make a note of it.


Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

  Clone Set: nfs-mon-clone [nfs-mon]
  Started: [ atlas-node1 atlas-node2 ]
  Clone Set: nfs-grace-clone [nfs-grace]
  Started: [ atlas-node1 atlas-node2 ]
  atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped
  atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2
  atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1
  atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2

PCSD Status:
   atlas-node1: Online
   atlas-node2: Online

Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled


Here corosync and pacemaker show a 'disabled' state. Can you check the 
status of their services? They should be running prior to cluster 
creation. We need to include that step in the document as well.



But the issue that is puzzling me more is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,

Hmm, that's strange. Sometimes, if proper cleanup was not done 
while re-creating the cluster, we have seen such issues (a consolidated 
sketch of the cleanup follows the steps below).


https://bugzilla.redhat.com/show_bug.cgi?id=1227709

http://review.gluster.org/#/c/11093/

Can you please unexport all the volumes, tear down the cluster using
'gluster vol set <volname> ganesha.enable off'
'gluster ganesha disable' command.

Verify if the following files have been deleted on all the nodes-
'/etc/cluster/cluster.conf'
'/etc/ganesha/ganesha.conf',
'/etc/ganesha/exports/*'
'/var/lib/pacemaker/cib'

Verify if the ganesha service is stopped on all the nodes.

start/restart the services - corosync, pcs.

And re-try the HA cluster creation
'gluster ganesha enable'
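
(A consolidated sketch of the steps above; the gluster commands run once from any node, the rest on every node, and the service names nfs-ganesha, corosync and pcsd are assumed:)

gluster vol set <volname> ganesha.enable off     # repeat for each exported volume
gluster nfs-ganesha disable
# on every node, make sure the leftovers are really gone:
ls /etc/cluster/cluster.conf /etc/ganesha/ganesha.conf /etc/ganesha/exports/ /var/lib/pacemaker/cib/ 2>/dev/null
systemctl stop nfs-ganesha
systemctl restart corosync pcsd
# then re-try the HA cluster creation with 'gluster nfs-ganesha enable'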


Thanks,
Soumya


Alessandro


On 8 Jun 2015, at 20:00, Alessandro De Salvo 
alessandro.desa...@roma1.infn.it wrote:

Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 
and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but this was 
already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and mounted it on 
'/run/gluster/shared_storage' on all the cluster nodes using glusterfs native 
mount (on CentOS 7.1 there is a symlink by default, /var/run -> ../run)
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster machines;
6) set the ‘hacluster’ user the same password on all machines;
7) pcs cluster auth hostname -u hacluster -p pass on all the nodes (on both 
nodes I issued the commands for both nodes)
8) IPv6 is configured by default on all nodes, although the infrastructure is 
not ready for IPv6
9) enabled pcsd and started it on all nodes
10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per 
machine:


=== atlas-node1
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

=== atlas-node2
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER=“atlas-node2
# The subset of nodes of the 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
Another update: the fact that I was unable to use vol set ganesha.enable
was due to another bug in the ganesha scripts. In short, they are all
using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; 
second, there is a bug in that directive: it works if I add the following in 
/etc/sysconfig/ganesha

CONFFILE=/etc/ganesha/ganesha.conf

but it fails if the same is quoted

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as
well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
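
As a quick sanity check of that fallback (a sketch; run it on a node where 
/etc/sysconfig/ganesha has no CONFFILE line):

   unset CONFFILE
   eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
   echo "${CONFFILE:-/etc/ganesha/ganesha.conf}"   # prints the default when the line is absent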

I'll update the bug report.
Having said this... the last issue to tackle is the real problem with
the ganesha.nfsd :-(
Cheers,

Alessandro


On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
 OK, I can confirm that the ganesha.nfsd process is actually not
 answering the calls. Here is what I see:
 
 # rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad
 
 # netstat -lpn | grep ganesha
 tcp6  14  0 :::2049 :::*
 LISTEN  11937/ganesha.nfsd  
 tcp6   0  0 :::41798:::*
 LISTEN  11937/ganesha.nfsd  
 tcp6   0  0 :::875  :::*
 LISTEN  11937/ganesha.nfsd  
 tcp6  10  0 :::56301:::*
 LISTEN  11937/ganesha.nfsd  
 tcp6   0  0 :::564  :::*
 LISTEN  11937/ganesha.nfsd  
 udp6   0  0 :::2049 :::*
 11937/ganesha.nfsd  
 udp6   0  0 :::46203:::*
 11937/ganesha.nfsd  
 udp6   0  0 :::58127:::*
 11937/ganesha.nfsd  
 udp6   0  0 :::875  :::*
 11937/ganesha.nfsd
 
 I'm attaching the strace of a showmount from a node to the other.
 This machinery was working with nfs-ganesha 2.1.0, so it must be
 something introduced with 2.2.0.
 Cheers,
 
   Alessandro
 
 
 
 On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
  
  On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
   Hi,
   OK, the problem with the VIPs not starting is due to the ganesha_mon
   heartbeat script looking for a pid file called
   /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
   creating /var/run/ganesha.pid, this needs to be corrected. The file is
   in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
   For the moment I have created a symlink in this way and it works:
  
   ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
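
Since /var/run is a tmpfs on CentOS 7, that symlink will not survive a reboot; one way 
to make the workaround persistent (an untested sketch, assuming systemd-tmpfiles is in 
use, until the resource agent is fixed to look at /var/run/ganesha.pid) is a drop-in like:

   # /etc/tmpfiles.d/ganesha-pidfile.conf -- recreate the compatibility symlink at boot
   L /var/run/ganesha.nfsd.pid - - - - /var/run/ganesha.pid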
  
  Thanks. Please update this as well in the bug.
  
   So far so good, the VIPs are up and pingable, but still there is the
   problem of the hanging showmount (i.e. hanging RPC).
   Still, I see a lot of errors like this in /var/log/messages:
  
   Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
   nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]
  
   While ganesha.log shows the server is not in grace:
  
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
   Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
   May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
   http://buildhw-09.phx2.fedoraproject.org
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
   :Configuration file successfully parsed
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
   :Initializing ID Mapper.
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
   successfully initialized.
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
   found in configuration file !!!
   09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
   ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Soumya Koduri



On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:

Another update: the fact that I was unable to use vol set ganesha.enable
was due to another bug in the ganesha scripts. In short, they are all
using the following line to get the location of the conf file:

CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)

First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; 
second, there is a bug in that directive: it works if I add the following in 
/etc/sysconfig/ganesha

CONFFILE=/etc/ganesha/ganesha.conf

but it fails if the same is quoted

CONFFILE="/etc/ganesha/ganesha.conf"

It would be much better to use the following, which has a default as
well:

eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}

I'll update the bug report.
Having said this... the last issue to tackle is the real problem with
the ganesha.nfsd :-(


Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
'/etc/sysconfig/ganesha' and check if anything gets logged in 
'/var/log/ganesha.log' or '/ganesha.log'.
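
For reference, in the packaged /etc/sysconfig/ganesha this is usually done through the 
options passed to ganesha.nfsd (a sketch; the exact variable name and defaults may 
differ between builds):

   # /etc/sysconfig/ganesha
   OPTIONS="-L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_FULL_DEBUG"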


Thanks,
Soumya


Cheers,

Alessandro


On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:

OK, I can confirm that the ganesha.nfsd process is actually not
answering the calls. Here is what I see:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad

# netstat -lpn | grep ganesha
tcp6  14  0 :::2049 :::*
LISTEN  11937/ganesha.nfsd
tcp6   0  0 :::41798:::*
LISTEN  11937/ganesha.nfsd
tcp6   0  0 :::875  :::*
LISTEN  11937/ganesha.nfsd
tcp6  10  0 :::56301:::*
LISTEN  11937/ganesha.nfsd
tcp6   0  0 :::564  :::*
LISTEN  11937/ganesha.nfsd
udp6   0  0 :::2049 :::*
11937/ganesha.nfsd
udp6   0  0 :::46203:::*
11937/ganesha.nfsd
udp6   0  0 :::58127:::*
11937/ganesha.nfsd
udp6   0  0 :::875  :::*
11937/ganesha.nfsd

I'm attaching the strace of a showmount from a node to the other.
This machinery was working with nfs-ganesha 2.1.0, so it must be
something introduced with 2.2.0.
Cheers,

Alessandro



On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:


On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:

Hi,
OK, the problem with the VIPs not starting is due to the ganesha_mon
heartbeat script looking for a pid file called
/var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
creating /var/run/ganesha.pid, this needs to be corrected. The file is
in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
For the moment I have created a symlink in this way and it works:

ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid


Thanks. Please update this as well in the bug.


So far so good, the VIPs are up and pingable, but still there is the
problem of the hanging showmount (i.e. hanging RPC).
Still, I see a lot of errors like this in /var/log/messages:

Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]

While ganesha.log shows the server is not in grace:

09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
http://buildhw-09.phx2.fedoraproject.org
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
:Configuration file successfully parsed
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
:Initializing ID Mapper.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
successfully initialized.
09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
found in configuration file !!!
09/06/2015 11:16:20 : epoch 5576aee4 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
Hi,
I have enabled the full debug already, but I see nothing special. Before 
exporting any volume the log shows no error, even when I do a showmount (the 
log is attached, ganesha.log.gz). If I do the same after exporting a volume 
nfs-ganesha does not even start, complaining about not being able to bind the 
IPv6 rquota socket, but in fact there is nothing listening on that port over IPv6, so it should 
not happen:

tcp6   0  0 :::111  :::*LISTEN  
7433/rpcbind
tcp6   0  0 :::2224 :::*LISTEN  
9054/ruby   
tcp6   0  0 :::22   :::*LISTEN  
1248/sshd   
udp6   0  0 :::111  :::*
7433/rpcbind
udp6   0  0 fe80::8c2:27ff:fef2:123 :::*
31238/ntpd  
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd  
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd  
udp6   0  0 fe80::230:48ff:fed2:123 :::*
31238/ntpd  
udp6   0  0 ::1:123 :::*
31238/ntpd  
udp6   0  0 fe80::5484:7aff:fef:123 :::*
31238/ntpd  
udp6   0  0 :::123  :::*
31238/ntpd  
udp6   0  0 :::824  :::*
7433/rpcbind

The error, as shown in the attached ganesha-after-export.log.gz logfile, is the 
following:


10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address 
already in use)
10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] 
glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded 
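
A quick way to see what is already sitting on the rquota port before ganesha comes up 
(a sketch; 875 is the conventional rquotad port, and the registration also shows up in 
the portmapper):

   ss -tulnp | grep ':875 '
   rpcinfo -p | grep rquota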


Thanks,

Alessandro



ganesha.log.gz
Description: GNU Zip compressed data


ganesha-after-export.log.gz
Description: GNU Zip compressed data

 On 09 Jun 2015, at 18:37, Soumya Koduri skod...@redhat.com wrote:
 
 
 
 On 06/09/2015 09:47 PM, Alessandro De Salvo wrote:
 Another update: the fact that I was unable to use vol set ganesha.enable
 was due to another bug in the ganesha scripts. In short, they are all
 using the following line to get the location of the conf file:
 
 CONF=$(cat /etc/sysconfig/ganesha | grep CONFFILE | cut -f 2 -d =)
 
 First of all, by default there is no CONFFILE line in /etc/sysconfig/ganesha; 
 second, there is a bug in that directive: it works if I add the following in 
 /etc/sysconfig/ganesha
 
 CONFFILE=/etc/ganesha/ganesha.conf
 
 but it fails if the same is quoted
 
 CONFFILE="/etc/ganesha/ganesha.conf"
 
 It would be much better to use the following, which has a default as
 well:
 
 eval $(grep -F CONFFILE= /etc/sysconfig/ganesha)
 CONF=${CONFFILE:-/etc/ganesha/ganesha.conf}
 
 I'll update the bug report.
 Having said this... the last issue to tackle is the real problem with
 the ganesha.nfsd :-(
 
 Thanks. Could you try changing log level to NIV_FULL_DEBUG in 
 '/etc/sysconfig/ganesha' and check if anything gets logged in 
 '/var/log/ganesha.log' or '/ganesha.log'.
 
 Thanks,
 Soumya
 
 Cheers,
 
  Alessandro
 
 
 On Tue, 2015-06-09 at 14:25 +0200, Alessandro De Salvo wrote:
 OK, I can confirm that the ganesha.nfsd process is actually not
 answering the calls. Here is what I see:
 
 # rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad
 
 # netstat -lpn | grep ganesha
 tcp6  14  0 :::2049 :::*
 LISTEN  11937/ganesha.nfsd
 tcp6   0  0 :::41798:::*
 LISTEN  11937/ganesha.nfsd
 tcp6   0  0 :::875  :::*
 LISTEN  11937/ganesha.nfsd
 tcp6  10  0 :::56301:::*
 LISTEN  11937/ganesha.nfsd
 tcp6 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-09 Thread Alessandro De Salvo
OK, I can confirm that the ganesha.nfsd process is actually not
answering the calls. Here is what I see:

# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  41594  status
    100024    1   tcp  53631  status
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  58127  mountd
    100005    1   tcp  56301  mountd
    100005    3   udp  58127  mountd
    100005    3   tcp  56301  mountd
    100021    4   udp  46203  nlockmgr
    100021    4   tcp  41798  nlockmgr
    100011    1   udp    875  rquotad
    100011    1   tcp    875  rquotad
    100011    2   udp    875  rquotad
    100011    2   tcp    875  rquotad

# netstat -lpn | grep ganesha
tcp6  14  0 :::2049 :::*
LISTEN  11937/ganesha.nfsd  
tcp6   0  0 :::41798:::*
LISTEN  11937/ganesha.nfsd  
tcp6   0  0 :::875  :::*
LISTEN  11937/ganesha.nfsd  
tcp6  10  0 :::56301:::*
LISTEN  11937/ganesha.nfsd  
tcp6   0  0 :::564  :::*
LISTEN  11937/ganesha.nfsd  
udp6   0  0 :::2049 :::*
11937/ganesha.nfsd  
udp6   0  0 :::46203:::*
11937/ganesha.nfsd  
udp6   0  0 :::58127:::*
11937/ganesha.nfsd  
udp6   0  0 :::875  :::*
11937/ganesha.nfsd

I'm attaching the strace of a showmount from a node to the other.
This machinery was working with nfs-ganesha 2.1.0, so it must be
something introduced with 2.2.0.
Cheers,

Alessandro



On Tue, 2015-06-09 at 15:16 +0530, Soumya Koduri wrote:
 
 On 06/09/2015 02:48 PM, Alessandro De Salvo wrote:
  Hi,
  OK, the problem with the VIPs not starting is due to the ganesha_mon
  heartbeat script looking for a pid file called
  /var/run/ganesha.nfsd.pid, while by default ganesha.nfsd v.2.2.0 is
  creating /var/run/ganesha.pid, this needs to be corrected. The file is
  in glusterfs-ganesha-3.7.1-1.el7.x86_64, in my case.
  For the moment I have created a symlink in this way and it works:
 
  ln -s /var/run/ganesha.pid /var/run/ganesha.nfsd.pid
 
 Thanks. Please update this as well in the bug.
 
  So far so good, the VIPs are up and pingable, but still there is the
  problem of the hanging showmount (i.e. hanging RPC).
  Still, I see a lot of errors like this in /var/log/messages:
 
  Jun  9 11:15:20 atlas-node1 lrmd[31221]:   notice: operation_finished:
  nfs-mon_monitor_1:29292:stderr [ Error: Resource does not exist. ]
 
  While ganesha.log shows the server is not in grace:
 
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29964[main] main :MAIN :EVENT :ganesha.nfsd Starting:
  Ganesha Version /builddir/build/BUILD/nfs-ganesha-2.2.0/src, built at
  May 18 2015 14:17:18 on buildhw-09.phx2.fedoraproject.org
  http://buildhw-09.phx2.fedoraproject.org
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] nfs_set_param_from_conf :NFS STARTUP :EVENT
  :Configuration file successfully parsed
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT
  :Initializing ID Mapper.
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] init_server_pkgs :NFS STARTUP :EVENT :ID Mapper
  successfully initialized.
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] main :NFS STARTUP :WARN :No export entries
  found in configuration file !!!
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] config_errs_to_log :CONFIG :WARN :Config File
  ((null):0): Empty configuration file
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT
  :CAP_SYS_RESOURCE was successfully removed for proper quota management
  in FSAL
  09/06/2015 11:16:20 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] lower_my_caps :NFS STARTUP :EVENT :currenty set
  capabilities are: =
  cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap+ep
  09/06/2015 11:16:21 : epoch 5576aee4 : atlas-node1 :
  ganesha.nfsd-29965[main] nfs_Init_svc :DISP :CRIT :Cannot acquire
  credentials for principal nfs
  09/06/2015 11:16:21 : epoch 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Alessandro De Salvo
OK, I found at least one of the bugs.
The /usr/libexec/ganesha/ganesha.sh has the following lines:

if [ -e /etc/os-release ]; then
RHEL6_PCS_CNAME_OPTION=""
fi

This is OK for RHEL < 7, but does not work for >= 7. I have changed it to the 
following, to make it work:

if [ -e /etc/os-release ]; then
eval $(grep -F REDHAT_SUPPORT_PRODUCT= /etc/os-release)
[ "$REDHAT_SUPPORT_PRODUCT" == "Fedora" ] && RHEL6_PCS_CNAME_OPTION=""
fi
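
A quick way to see what that guard evaluates to on a given node (a sketch; on CentOS 7 
REDHAT_SUPPORT_PRODUCT is typically "centos", on Fedora it is "Fedora"):

   source /etc/os-release
   echo "REDHAT_SUPPORT_PRODUCT=${REDHAT_SUPPORT_PRODUCT:-unset}"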

Apart from that, the VIP_<node> entries I was using were wrong: I should have 
converted all the "-" characters to underscores. Maybe this could be mentioned in the 
documentation when you have it ready.
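In other words (a sketch using the hostnames from this thread, as I understand the 
naming rule), for hosts called atlas-node1/atlas-node2 the VIP entries have to use 
underscores in the variable names:

   VIP_atlas_node1="x.x.x.1"
   VIP_atlas_node2="x.x.x.2"
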
Now, the cluster starts, but the VIPs apparently do not:

Online: [ atlas-node1 atlas-node2 ]

Full list of resources:

 Clone Set: nfs-mon-clone [nfs-mon]
 Started: [ atlas-node1 atlas-node2 ]
 Clone Set: nfs-grace-clone [nfs-grace]
 Started: [ atlas-node1 atlas-node2 ]
 atlas-node1-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped 
 atlas-node1-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node1 
 atlas-node2-cluster_ip-1  (ocf::heartbeat:IPaddr):Stopped 
 atlas-node2-trigger_ip-1  (ocf::heartbeat:Dummy): Started atlas-node2 
 atlas-node1-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node1 
 atlas-node2-dead_ip-1 (ocf::heartbeat:Dummy): Started atlas-node2 

PCSD Status:
  atlas-node1: Online
  atlas-node2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


But the issue that puzzles me most is the following:

# showmount -e localhost
rpc mount export: RPC: Timed out

And when I try to enable the ganesha exports on a volume I get this error:

# gluster volume set atlas-home-01 ganesha.enable on
volume set: failed: Failed to create NFS-Ganesha export config file.

But I see the file created in /etc/ganesha/exports/*.conf
Still, showmount hangs and times out.
Any help?
Thanks,

Alessandro

 On 08 Jun 2015, at 20:00, Alessandro De Salvo 
 alessandro.desa...@roma1.infn.it wrote:
 
 Hi,
 indeed, it does not work :-)
 OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 
 and nfs-ganesha 2.2.0:
 
 1) ensured that the machines are able to resolve their IPs (but this was 
 already true since they were in the DNS);
 2) disabled NetworkManager and enabled network on both machines;
 3) created a gluster shared volume 'gluster_shared_storage' and mounted it on 
 '/run/gluster/shared_storage' on all the cluster nodes using glusterfs native 
 mount (on CentOS 7.1 there is a link by default /var/run -> ../run)
 4) created an empty /etc/ganesha/ganesha.conf;
 5) installed pacemaker pcs resource-agents corosync on all cluster machines;
 6) set the same password for the 'hacluster' user on all machines;
 7) ran "pcs cluster auth <hostname> -u hacluster -p <password>" on all the nodes (on 
 both nodes I issued the command for both nodes)
 8) IPv6 is configured by default on all nodes, although the infrastructure is 
 not ready for IPv6
 9) enabled pcsd and started it on all nodes
 10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one 
 per machine:
 
 
 === atlas-node1
 # Name of the HA cluster created.
 HA_NAME=ATLAS_GANESHA_01
 # The server from which you intend to mount
 # the shared volume.
 HA_VOL_SERVER="atlas-node1"
 # The subset of nodes of the Gluster Trusted Pool
 # that forms the ganesha HA cluster. IP/Hostname
 # is specified.
 HA_CLUSTER_NODES="atlas-node1,atlas-node2"
 # Virtual IPs of each of the nodes specified above.
 VIP_atlas-node1="x.x.x.1"
 VIP_atlas-node2="x.x.x.2"
 
 === atlas-node2
 # Name of the HA cluster created.
 HA_NAME=ATLAS_GANESHA_01
 # The server from which you intend to mount
 # the shared volume.
 HA_VOL_SERVER="atlas-node2"
 # The subset of nodes of the Gluster Trusted Pool
 # that forms the ganesha HA cluster. IP/Hostname
 # is specified.
 HA_CLUSTER_NODES="atlas-node1,atlas-node2"
 # Virtual IPs of each of the nodes specified above.
 VIP_atlas-node1="x.x.x.1"
 VIP_atlas-node2="x.x.x.2"
 
 11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:
 
 # gluster nfs-ganesha enable
 Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted 
 pool. Do you still want to continue? (y/n) y
 nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check 
 the log file for details
 
 Looking at the logs I found nothing really special but this:
 
 == /var/log/glusterfs/etc-glusterfs-glusterd.vol.log ==
 [2015-06-08 17:57:15.672844] I [MSGID: 106132] 
 [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
 [2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 
 0-management: ganesha host found Hostname is atlas-node2
 [2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 
 0-management: ganesha host found Hostname is atlas-node2
 [2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 
 

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Soumya Koduri




On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:

Sorry, just another question:

- in my installation of gluster 3.7.1 the command gluster features.ganesha 
enable does not work:

# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)

Which version has full support for it?


Sorry. This option has recently been changed. It is now

$ gluster nfs-ganesha enable




- in the documentation the ccs and cman packages are required, but they seem 
not to be available anymore on CentOS 7 and similar; I guess they are not 
really required anymore, as pcs should do the full job

Thanks,

Alessandro


That seems to be the case, judging from http://clusterlabs.org/quickstart-redhat.html. Let us 
know if it doesn't work.
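
For completeness, the pcs-only stack referenced there installs on CentOS 7 roughly as 
follows (a sketch, consistent with step 5 of the setup described earlier in this thread):

   yum install -y corosync pacemaker pcs resource-agents
   systemctl enable pcsd
   systemctl start pcsd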


Thanks,
Soumya




On 08 Jun 2015, at 15:09, Alessandro De Salvo 
alessandro.desa...@roma1.infn.it wrote:

Great, many thanks Soumya!
Cheers,

Alessandro


On 08 Jun 2015, at 13:53, Soumya Koduri skod...@redhat.com wrote:

Hi,

Please find the slides of the demo video at [1]

We recommend having a distributed replicated volume as the shared volume, for better 
data availability.

The size of the volume depends on your workload. Since it is used to maintain the state 
of NLM/NFSv4 clients, you can take as a minimum size the aggregate, over all NFS servers, of
(typical_size_of_'/var/lib/nfs'_directory + 
~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
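
As a purely illustrative calculation (the figures are assumptions, not numbers from this 
thread): with 2 NFS servers, roughly 10 MB under /var/lib/nfs on each, and up to 500 
clients per server, the minimum works out to about 2 x (10 MB + 4 KB x 500) = 
2 x 12 MB = 24 MB, so even a shared volume of a few hundred MB leaves ample headroom.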

We shall document this feature soon in the gluster docs as well.

Thanks,
Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:

Hi,
I have seen the demo video on ganesha HA, 
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the shared volume. How is 
it really used, and what should be a reasonable size for it?
Also, are the slides from the video available somewhere, as well as a 
documentation on all this? I did not manage to find them.
Thanks,

Alessandro



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users






___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Alessandro De Salvo
Hi,
indeed, it does not work :-)
OK, this is what I did, with 2 machines, running CentOS 7.1, Glusterfs 3.7.1 
and nfs-ganesha 2.2.0:

1) ensured that the machines are able to resolve their IPs (but this was 
already true since they were in the DNS);
2) disabled NetworkManager and enabled network on both machines;
3) created a gluster shared volume 'gluster_shared_storage' and mounted it on 
'/run/gluster/shared_storage' on all the cluster nodes using glusterfs native 
mount (on CentOS 7.1 there is a link by default /var/run -> ../run)
4) created an empty /etc/ganesha/ganesha.conf;
5) installed pacemaker pcs resource-agents corosync on all cluster machines;
6) set the same password for the 'hacluster' user on all machines;
7) ran "pcs cluster auth <hostname> -u hacluster -p <password>" on all the nodes (on both 
nodes I issued the command for both nodes)
8) IPv6 is configured by default on all nodes, although the infrastructure is 
not ready for IPv6
9) enabled pcsd and started it on all nodes
10) populated /etc/ganesha/ganesha-ha.conf with the following contents, one per 
machine:


=== atlas-node1
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node1"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

=== atlas-node2
# Name of the HA cluster created.
HA_NAME=ATLAS_GANESHA_01
# The server from which you intend to mount
# the shared volume.
HA_VOL_SERVER="atlas-node2"
# The subset of nodes of the Gluster Trusted Pool
# that forms the ganesha HA cluster. IP/Hostname
# is specified.
HA_CLUSTER_NODES="atlas-node1,atlas-node2"
# Virtual IPs of each of the nodes specified above.
VIP_atlas-node1="x.x.x.1"
VIP_atlas-node2="x.x.x.2"

11) issued gluster nfs-ganesha enable, but it fails with a cryptic message:

# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted 
pool. Do you still want to continue? (y/n) y
nfs-ganesha: failed: Failed to set up HA config for NFS-Ganesha. Please check 
the log file for details

Looking at the logs I found nothing really special but this:

== /var/log/glusterfs/etc-glusterfs-glusterd.vol.log ==
[2015-06-08 17:57:15.672844] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2015-06-08 17:57:15.675395] I [glusterd-ganesha.c:386:check_host_list] 
0-management: ganesha host found Hostname is atlas-node2
[2015-06-08 17:57:15.720692] I [glusterd-ganesha.c:386:check_host_list] 
0-management: ganesha host found Hostname is atlas-node2
[2015-06-08 17:57:15.721161] I [glusterd-ganesha.c:335:is_ganesha_host] 
0-management: ganesha host found Hostname is atlas-node2
[2015-06-08 17:57:16.633048] E [glusterd-ganesha.c:254:glusterd_op_set_ganesha] 
0-management: Initial NFS-Ganesha set up failed
[2015-06-08 17:57:16.641563] E [glusterd-syncop.c:1396:gd_commit_op_phase] 
0-management: Commit of operation 'Volume (null)' failed on localhost : Failed 
to set up HA config for NFS-Ganesha. Please check the log file for details

== /var/log/glusterfs/cmd_history.log ==
[2015-06-08 17:57:16.643615]  : nfs-ganesha enable : FAILED : Failed to set up 
HA config for NFS-Ganesha. Please check the log file for details

== /var/log/glusterfs/cli.log ==
[2015-06-08 17:57:16.643839] I [input.c:36:cli_batch] 0-: Exiting with: -1


Also, pcs seems to be fine for the auth part, although it obviously tells me 
the cluster is not running.

I, [2015-06-08T19:57:16.305323 #7223]  INFO -- : Running: 
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2015-06-08T19:57:16.345457 #7223]  INFO -- : Running: /usr/sbin/pcs cluster 
token-nodes
:::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth 
HTTP/1.1 200 68 0.1919
:::141.108.38.46 - - [08/Jun/2015 19:57:16] GET /remote/check_auth 
HTTP/1.1 200 68 0.1920
atlas-node1.mydomain - - [08/Jun/2015:19:57:16 CEST] GET /remote/check_auth 
HTTP/1.1 200 68
- - /remote/check_auth


What am I doing wrong?
Thanks,

Alessandro

 On 08 Jun 2015, at 19:30, Soumya Koduri skod...@redhat.com wrote:
 
 
 
 
 On 06/08/2015 08:20 PM, Alessandro De Salvo wrote:
 Sorry, just another question:
 
 - in my installation of gluster 3.7.1 the command gluster features.ganesha 
 enable does not work:
 
 # gluster features.ganesha enable
 unrecognized word: features.ganesha (position 0)
 
 Which version has full support for it?
 
 Sorry. This option has recently been changed. It is now
 
 $ gluster nfs-ganesha enable
 
 
 
 - in the documentation the ccs and cman packages are required, but they seem 
 not to be available anymore on CentOS 7 and similar; I guess they are 
 not really required anymore, as pcs should do the full job
 
 Thanks,
 
  Alessandro
 
 Looks like so from 

[Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Alessandro De Salvo
Hi,
I have seen the demo video on ganesha HA, 
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the shared volume. How is 
it really used, and what should be a reasonable size for it?
Also, are the slides from the video available somewhere, as well as a 
documentation on all this? I did not manage to find them.
Thanks,

Alessandro

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Alessandro De Salvo
Great, many thanks Soumya!
Cheers,

Alessandro

 On 08 Jun 2015, at 13:53, Soumya Koduri skod...@redhat.com wrote:
 
 Hi,
 
 Please find the slides of the demo video at [1]
 
 We recommend having a distributed replicated volume as the shared volume, for better 
 data availability.
 
 The size of the volume depends on your workload. Since it is used to maintain the state 
 of NLM/NFSv4 clients, you can take as a minimum size the aggregate, over all NFS servers, of
 (typical_size_of_'/var/lib/nfs'_directory + 
 ~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
 
 We shall document this feature soon in the gluster docs as well.
 
 Thanks,
 Soumya
 
 [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
 
 On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
 Hi,
 I have seen the demo video on ganesha HA, 
 https://www.youtube.com/watch?v=Z4mvTQC-efM
 However there is no advice on the appropriate size of the shared volume. How 
 is it really used, and what should be a reasonable size for it?
 Also, are the slides from the video available somewhere, as well as a 
 documentation on all this? I did not manage to find them.
 Thanks,
 
  Alessandro
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Soumya Koduri

Hi,

Please find the slides of the demo video at [1]

We recommend having a distributed replicated volume as the shared volume, for 
better data availability.


The size of the volume depends on your workload. Since it is 
used to maintain the state of NLM/NFSv4 clients, you can take as a minimum size 
the aggregate, over all NFS servers, of
 (typical_size_of_'/var/lib/nfs'_directory + 
~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)


We shall document this feature soon in the gluster docs as well.

Thanks,
Soumya

[1] - http://www.slideshare.net/SoumyaKoduri/high-49117846

On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:

Hi,
I have seen the demo video on ganesha HA, 
https://www.youtube.com/watch?v=Z4mvTQC-efM
However there is no advice on the appropriate size of the shared volume. How is 
it really used, and what should be a reasonable size for it?
Also, are the slides from the video available somewhere, as well as a 
documentation on all this? I did not manage to find them.
Thanks,

Alessandro



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Questions on ganesha HA and shared storage size

2015-06-08 Thread Alessandro De Salvo
Sorry, just another question:

- in my installation of gluster 3.7.1 the command gluster features.ganesha 
enable does not work:

# gluster features.ganesha enable
unrecognized word: features.ganesha (position 0)

Which version has full support for it?

- in the documentation the ccs and cman packages are required, but they seem 
not to be available anymore on CentOS 7 and similar; I guess they are not 
really required anymore, as pcs should do the full job

Thanks,

Alessandro

 On 08 Jun 2015, at 15:09, Alessandro De Salvo 
 alessandro.desa...@roma1.infn.it wrote:
 
 Great, many thanks Soumya!
 Cheers,
 
   Alessandro
 
 On 08 Jun 2015, at 13:53, Soumya Koduri skod...@redhat.com wrote:
 
 Hi,
 
 Please find the slides of the demo video at [1]
 
 We recommend having a distributed replicated volume as the shared volume, for better 
 data availability.
 
 The size of the volume depends on your workload. Since it is used to maintain the state 
 of NLM/NFSv4 clients, you can take as a minimum size the aggregate, over all NFS servers, of
 (typical_size_of_'/var/lib/nfs'_directory + 
 ~4k * no_of_clients_connected_to_each_of_the_nfs_servers_at_any_point)
 
 We shall document this feature soon in the gluster docs as well.
 
 Thanks,
 Soumya
 
 [1] - http://www.slideshare.net/SoumyaKoduri/high-49117846
 
 On 06/08/2015 04:34 PM, Alessandro De Salvo wrote:
 Hi,
 I have seen the demo video on ganesha HA, 
 https://www.youtube.com/watch?v=Z4mvTQC-efM
 However there is no advice on the appropriate size of the shared volume. 
 How is it really used, and what should be a reasonable size for it?
 Also, are the slides from the video available somewhere, as well as a 
 documentation on all this? I did not manage to find them.
 Thanks,
 
 Alessandro
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 
 



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users