[ClusterLabs] What is the current state of the art in setting up HA NFS?

2020-02-17 Thread Dennis Jacobfeuerborn
Hi,
what are the current best practices for setting up an HA NFS server? I see
that EPEL no longer contains the drbd packages for CentOS 8, for example.
Also, a lot of documents on the internet still refer to either Pacemaker
1.x or meddle with the fsid, which apparently is no longer recommended.
What is the most up-to-date and comprehensive guide for such a setup out
there at the moment?
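
For what it's worth, one avenue I'm aware of is ELRepo, which (assuming it
still ships them) carries DRBD 9 packages for EL8; the package names below
are from memory and may differ:

yum install https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
yum install drbd90-utils kmod-drbd90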

Regards,
  Dennis


[ClusterLabs] node avoidance still leads to "status=Not installed" error for monitor op

2019-11-29 Thread Dennis Jacobfeuerborn
Hi,
I'm currently trying to set up a drbd 8.4 resource in a 3-node pacemaker
cluster. The idea is to have nodes storage1 and storage2 running the
drbd clones and to use the third node storage3 only for quorum.
This is how I'm trying to do it:

pcs cluster cib cib.xml
pcs -f cib.xml resource create drbd ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs -f cib.xml resource master drbd-clone drbd master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f cib.xml constraint location drbd-clone avoids storage3=INFINITY
pcs cluster cib-push cib.xml

What I get after the cib-push is this failure:
Failed Resource Actions:
* drbd_monitor_0 on storage3 'not installed' (5): call=6, status=Not
installed, exitreason='',
last-rc-change='Fri Nov 29 14:20:52 2019', queued=0ms, exec=0ms

This is unexpected, as there is no need to do any monitoring on a node
that isn't allowed to host the resource. I can clean up the error once
I've installed the drbd tools and kernel module on storage3 as well, but
is there a way to tell the cluster to ignore the storage3 node completely
for any resources and only use it as a voting node?
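
One thing I haven't tried yet: recreating the location constraint with
resource-discovery=never, which should also suppress the probe on storage3
(untested sketch; the option exists since Pacemaker 1.1.13):

pcs -f cib.xml constraint location add drbd-clone-avoid-storage3 drbd-clone storage3 -INFINITY resource-discovery=never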

Regards,
  Dennis


Re: [ClusterLabs] op stop timeout update causes monitor op to fail?

2019-09-17 Thread Dennis Jacobfeuerborn
On 11.09.19 16:51, Ken Gaillot wrote:
> On Tue, 2019-09-10 at 09:54 +0200, Dennis Jacobfeuerborn wrote:
>> Hi,
>> I just updated the timeout for the stop operation on an nfs cluster
>> and while the timeout was updated, the status suddenly showed this:
>>
>> Failed Actions:
>> * nfsserver_monitor_1 on nfs1aqs1 'unknown error' (1): call=41,
>> status=Timed Out, exitreason='none',
>> last-rc-change='Tue Aug 13 14:14:28 2019', queued=0ms, exec=0ms
> 
> Are you sure it wasn't already showing that? The timestamp of that
> error is Aug 13, while the logs show the timeout update happening Sep
> 10.

I'm fairly certain. I did a "pcs status" before that operation to check
the state of the cluster.

> 
> Old errors will keep showing up in status until you manually clean them
> up (with crm_resource --cleanup or a higher-level tool equivalent), or
> any configured failure-timeout is reached.
> 
> In any case, the log excerpt shows that nothing went wrong during the
> time it covers. There were no actions scheduled in that transition in
> response to the timeout change (which is as expected).

What about this line:
pengine:  warning: unpack_rsc_op_failure:   Processing failed op monitor
for nfsserver on nfs1aqs1: unknown error (1)

I cleaned up the error and tried this again and this time it worked. The
corresponding line in the log now reads:
pengine: info: determine_op_status: Operation monitor found resource
nfsserver active on nfs1aqs1
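
(For the record, the cleanup was done with the usual command, i.e. something
like the following:)

pcs resource cleanup nfsserver
# or, with the lower-level tool:
crm_resource --cleanup --resource nfsserver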

What I'm wondering is whether this could be a race condition between
pacemaker updating the resource and the monitor operation running.

Regards,
  Dennis


[ClusterLabs] op stop timeout update causes monitor op to fail?

2019-09-10 Thread Dennis Jacobfeuerborn
Hi,
I just updated the timeout for the stop operation on an nfs cluster and
while the timeout was updated, the status suddenly showed this:

Failed Actions:
* nfsserver_monitor_1 on nfs1aqs1 'unknown error' (1): call=41,
status=Timed Out, exitreason='none',
last-rc-change='Tue Aug 13 14:14:28 2019', queued=0ms, exec=0ms

The command used:
pcs resource update nfsserver op stop timeout=30s

I can't imagine that this is expected to happen. Is there another way to
update the timeout that doesn't cause this?

I attached the log of the transition.

Regards,
  Dennis
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_process_request: Forwarding cib_replace operation for section configuration to all (origin=local/cibadmin/2)
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_perform_op: Diff: --- 0.76.14 2
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_perform_op: Diff: +++ 0.77.0 8b73092b4ee9744fc4eaff60f8ba8388
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_perform_op: + /cib: @epoch=77, @num_updates=0
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_perform_op: + /cib/configuration/resources/primitive[@id='nfsserver']/operations/op[@id='nfsserver-stop-interval-0s']: @timeout=30s
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_perform_op: ++ /cib/configuration/resources/primitive[@id='nfsserver']:
Sep 10 09:39:29 [2378] nfs1a-qs1 cib: info: cib_process_request: Completed cib_replace operation for section configuration: OK (rc=0, origin=nfs1aqs1/cibadmin/2, version=0.77.0)
Sep 10 09:39:29 [2383] nfs1a-qs1 crmd: info: abort_transition_graph: Transition aborted by op.nfsserver-stop-interval-0s 'modify': Configuration change | cib=0.77.0 source=te_update_diff:456 path=/cib/configuration/resources/primitive[@id='nfsserver']/operations/op[@id='nfsserver-stop-interval-0s'] complete=true
Sep 10 09:39:29 [2383] nfs1a-qs1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: determine_online_status: Node nfs1bqs1 is online
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: determine_online_status: Node nfs1aqs1 is online
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for nfsserver on nfs1aqs1: unknown error (1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: unpack_node_loop: Node 2 is already processed
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: unpack_node_loop: Node 1 is already processed
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: unpack_node_loop: Node 2 is already processed
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: unpack_node_loop: Node 1 is already processed
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: clone_print: Master/Slave Set: drbd-clone [drbd]
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: short_print: Masters: [ nfs1aqs1 ]
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: short_print: Slaves: [ nfs1bqs1 ]
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: common_print: metadata-fs (ocf::heartbeat:Filesystem): Started nfs1aqs1
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: common_print: medias-fs (ocf::heartbeat:Filesystem): Started nfs1aqs1
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: common_print: nfsserver (ocf::heartbeat:nfsserver): Started nfs1aqs1
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: common_print: vip (ocf::heartbeat:IPaddr2): Started nfs1aqs1
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: get_failcount_full: nfsserver has failed 1 times on nfs1aqs1
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: check_migration_threshold: nfsserver can fail 99 more times on nfs1aqs1 before being forced off
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: master_color: Promoting drbd:1 (Master nfs1aqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: master_color: drbd-clone: Promoted 1 instances of a possible 1 to master
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: LogActions: Leave drbd:0 (Slave nfs1bqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: LogActions: Leave drbd:1 (Master nfs1aqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: LogActions: Leave metadata-fs (Started nfs1aqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: LogActions: Leave medias-fs (Started nfs1aqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1 pengine: info: LogActions: Leave nfsserver (Started nfs1aqs1)
Sep 10 09:39:29 [2382] nfs1a-qs1

Re: [ClusterLabs] drbd clone not becoming master

2017-11-03 Thread Dennis Jacobfeuerborn
On 03.11.2017 15:49, Ken Gaillot wrote:
> On Thu, 2017-11-02 at 23:18 +0100, Dennis Jacobfeuerborn wrote:
>> On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
>>> Hi,
>>> I'm setting up a redundant NFS server for some experiments but
>>> almost
>>> immediately ran into a strange issue. The drbd clone resource never
>>> promotes either of the two clones to the Master state.
>>>
>>> The state says this:
>>>
>>>  Master/Slave Set: drbd-clone [drbd]
>>>  Slaves: [ nfsserver1 nfsserver2 ]
>>>  metadata-fs(ocf::heartbeat:Filesystem):Stopped
>>>
>>> The resource configuration looks like this:
>>>
>>> Resources:
>>>  Master: drbd-clone
>>>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-
>>> max=1
>>> clone-node-max=1
>>>   Resource: drbd (class=ocf provider=linbit type=drbd)
>>>    Attributes: drbd_resource=r0
>>>    Operations: demote interval=0s timeout=90 (drbd-demote-interval-
>>> 0s)
>>>    monitor interval=60s (drbd-monitor-interval-60s)
>>>    promote interval=0s timeout=90 (drbd-promote-
>>> interval-0s)
>>>    start interval=0s timeout=240 (drbd-start-interval-
>>> 0s)
>>>    stop interval=0s timeout=100 (drbd-stop-interval-0s)
>>>  Resource: metadata-fs (class=ocf provider=heartbeat
>>> type=Filesystem)
>>>   Attributes: device=/dev/drbd/by-res/r0/0
>>> directory=/var/lib/nfs_shared
>>> fstype=ext4 options=noatime
>>>   Operations: monitor interval=20 timeout=40
>>> (metadata-fs-monitor-interval-20)
>>>   start interval=0s timeout=60 (metadata-fs-start-
>>> interval-0s)
>>>   stop interval=0s timeout=60 (metadata-fs-stop-
>>> interval-0s)
>>>
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbd-clone then start metadata-fs (kind:Mandatory)
>>> Colocation Constraints:
>>>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-
>>> role:Master)
>>>
>>> Shouldn't one of the clones be promoted to the Master state
>>> automatically?
>>
>> I think the source of the issue is this:
>>
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called
>> /usr/sbin/crm_master -Q -l reboot -v 1
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
>> Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command
>> output:
>> Nov  2 23:12:03 nfsserver1 lrmd[2163]:  notice:
>> drbd_monitor_6:4673:stderr [ Error signing on to the CIB service:
>> Transport endpoint is not connected ]
>>
>> It seems the drbd resource agent tries to use crm_master to promote
>> the
>> clone but fails because it cannot "sign on to the CIB service". Does
>> anybody know what that means?
>>
>> Regards,
>>   Dennis
>>
> 
> That's odd, it should only happen if the cluster is not running, but
> then the agent wouldn't have been called.
> 
> The CIB is one of the core daemons of pacemaker; it manages the cluster
> configuration and status. If it's not running, the cluster can't do
> anything.
> 
> Perhaps the CIB is crashing, or something is blocking the communication
> between the agent and the CIB.

SELinux was the culprit. After disabling it the problem went away.
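
For anyone hitting the same thing: rather than disabling SELinux outright,
it may be enough to confirm the denials and run permissive while sorting
out a policy (rough sketch):

ausearch -m avc -ts recent   # look for denials around the crm_master/cib calls
setenforce 0                 # permissive until reboot
getenforce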

Regards,
  Dennis




Re: [ClusterLabs] drbd clone not becoming master

2017-11-02 Thread Dennis Jacobfeuerborn
On 02.11.2017 23:08, Dennis Jacobfeuerborn wrote:
> Hi,
> I'm setting up a redundant NFS server for some experiments but almost
> immediately ran into a strange issue. The drbd clone resource never
> promotes either of the two clones to the Master state.
> 
> The state says this:
> 
>  Master/Slave Set: drbd-clone [drbd]
>  Slaves: [ nfsserver1 nfsserver2 ]
>  metadata-fs  (ocf::heartbeat:Filesystem):Stopped
> 
> The resource configuration looks like this:
> 
> Resources:
>  Master: drbd-clone
>   Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1
> clone-node-max=1
>   Resource: drbd (class=ocf provider=linbit type=drbd)
>Attributes: drbd_resource=r0
>Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
>monitor interval=60s (drbd-monitor-interval-60s)
>promote interval=0s timeout=90 (drbd-promote-interval-0s)
>start interval=0s timeout=240 (drbd-start-interval-0s)
>stop interval=0s timeout=100 (drbd-stop-interval-0s)
>  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared
> fstype=ext4 options=noatime
>   Operations: monitor interval=20 timeout=40
> (metadata-fs-monitor-interval-20)
>   start interval=0s timeout=60 (metadata-fs-start-interval-0s)
>   stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)
> 
> Location Constraints:
> Ordering Constraints:
>   promote drbd-clone then start metadata-fs (kind:Mandatory)
> Colocation Constraints:
>   metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)
> 
> Shouldn't one of the clones be promoted to the Master state automatically?

I think the source of the issue is this:

Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Called
/usr/sbin/crm_master -Q -l reboot -v 1
Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Exit code 107
Nov  2 23:12:03 nfsserver1 drbd(drbd)[4673]: ERROR: r0: Command output:
Nov  2 23:12:03 nfsserver1 lrmd[2163]:  notice:
drbd_monitor_6:4673:stderr [ Error signing on to the CIB service:
Transport endpoint is not connected ]

It seems the drbd resource agent tries to use crm_master to promote the
clone but fails because it cannot "sign on to the CIB service". Does
anybody know what that means?

Regards,
  Dennis






[ClusterLabs] drbd clone not becoming master

2017-11-02 Thread Dennis Jacobfeuerborn
Hi,
I'm setting up a redundant NFS server for some experiments but almost
immediately ran into a strange issue. The drbd clone resource never
promotes either of the two clones to the Master state.

The state says this:

 Master/Slave Set: drbd-clone [drbd]
 Slaves: [ nfsserver1 nfsserver2 ]
 metadata-fs(ocf::heartbeat:Filesystem):Stopped

The resource configuration looks like this:

Resources:
 Master: drbd-clone
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: drbd (class=ocf provider=linbit type=drbd)
   Attributes: drbd_resource=r0
   Operations: demote interval=0s timeout=90 (drbd-demote-interval-0s)
               monitor interval=60s (drbd-monitor-interval-60s)
               promote interval=0s timeout=90 (drbd-promote-interval-0s)
               start interval=0s timeout=240 (drbd-start-interval-0s)
               stop interval=0s timeout=100 (drbd-stop-interval-0s)
  Resource: metadata-fs (class=ocf provider=heartbeat type=Filesystem)
   Attributes: device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime
   Operations: monitor interval=20 timeout=40 (metadata-fs-monitor-interval-20)
               start interval=0s timeout=60 (metadata-fs-start-interval-0s)
               stop interval=0s timeout=60 (metadata-fs-stop-interval-0s)

Location Constraints:
Ordering Constraints:
  promote drbd-clone then start metadata-fs (kind:Mandatory)
Colocation Constraints:
  metadata-fs with drbd-clone (score:INFINITY) (with-rsc-role:Master)

Shouldn't one of the clones be promoted to the Master state automatically?
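
(For completeness, the configuration above was created with pcs roughly as
follows; this is reconstructed from the output, so treat it as a sketch:)

pcs resource create drbd ocf:linbit:drbd drbd_resource=r0 op monitor interval=60s
pcs resource master drbd-clone drbd master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs resource create metadata-fs ocf:heartbeat:Filesystem device=/dev/drbd/by-res/r0/0 directory=/var/lib/nfs_shared fstype=ext4 options=noatime op monitor interval=20 timeout=40
pcs constraint order promote drbd-clone then start metadata-fs
pcs constraint colocation add metadata-fs with master drbd-clone INFINITY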

Regards,
  Dennis



Re: [ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-11-02 Thread Dennis Jacobfeuerborn
On 31.10.2017 12:58, Ferenc Wágner wrote:
> Dennis Jacobfeuerborn <denni...@conversis.de> writes:
> 
>> if I create a new unit file for the new mountpoint the services would not
>> depend on it so it wouldn't get automatically mounted when they start.
> 
> Put the new unit file under /etc/systemd/system/x.service.requires to
> have x.service require it.  I don't get the full picture, but this trick
> may help puzzle it together.
> 

It seems the nfsserver resource agent isn't compatible with
RHEL/CentOS 7. These systems always mount /var/lib/nfs/rpc_pipefs on
boot, but the resource agent script checks whether "/var/lib/nfs" is
present in /proc/mounts, and if it is, it refuses to start the NFS
server, thus preventing the fail-over.
I honestly have no good idea how to solve this, as the mounting of
/var/lib/nfs/rpc_pipefs is basically hard-coded into the RHEL/CentOS
service files.
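
For illustration, the check the agent effectively performs can be reproduced
by hand; on a stock CentOS 7 box the rpc_pipefs mount makes it match (this is
an approximation, not the agent's exact code):

grep /var/lib/nfs /proc/mounts
# typically prints something like:
# sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0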

Regards,
  Dennis




[ClusterLabs] How to mount rpc_pipefs on a different mountpoint on RHEL/CentOS 7?

2017-10-31 Thread Dennis Jacobfeuerborn
Hi,
I'm trying to create a redundant NFS system but hit a problem with the
way the nfs packages on RHEL/CentOS 7 handle the sunrpc mount point.

I put /var/lib/nfs on its own redundant drbd device, but on a failover the
nfsserver resource agent complains that something is already mounted
below /var/lib/nfs, which would be the rpc_pipefs mountpoint.

The problem is that mount units don't really support drop-ins, because
the "Where" directive must match the filename of the unit file. As a
result I'm in a situation where I cannot rename/modify the unit file
supplied by nfs-utils, as this would break things after an update; I
cannot mask the unit, as that would prevent the nfs services from
starting; and even if that could be worked around, if I create a new
unit file for the new mountpoint the services would not depend on it, so
it wouldn't get mounted automatically when they start.

What I want to express is "instead of mounting sunrpc on
/var/lib/nfs/rpc_pipefs mount it on /var/lib/rpc_pipefs" but I cannot
see a good way to do this.

Any ideas?
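
For concreteness, here is a rough and untested sketch of the kind of unit I
mean (names are illustrative), although it doesn't by itself stop the stock
var-lib-nfs-rpc_pipefs.mount from being pulled in:

# the unit name must match the escaped mount path
cat > /etc/systemd/system/var-lib-rpc_pipefs.mount <<'EOF'
[Unit]
Description=RPC pipe filesystem (relocated for the HA NFS setup)

[Mount]
What=sunrpc
Where=/var/lib/rpc_pipefs
Type=rpc_pipefs
EOF
# pull it in whenever the NFS server starts (the .requires directory trick)
mkdir -p /etc/systemd/system/nfs-server.service.requires
ln -s /etc/systemd/system/var-lib-rpc_pipefs.mount /etc/systemd/system/nfs-server.service.requires/
systemctl daemon-reload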

Regards,
  Dennis



Re: [ClusterLabs] [corosync] Master branch

2016-10-11 Thread Dennis Jacobfeuerborn
On 11.10.2016 12:42, Christine Caulfield wrote:
> I've just committed a big patch to the master branch of corosync - it is
> now all very experimental, and existing pull requests against master
> might need to be checked. This starts the work on what will hopefully
> become corosync 3.0
> 
> The commit is to make Kronosnet the new, default, transport for
> corosync. It might take a while to get this fully stabilised but I've
> been running it myself for a while now and it seems pretty reliable.
> 
> Here are the commit notes:
> 
> totem: Add Kronosnet transport.
> 
> This is a big update that removes RRP & MRP from the codebase
> and makes knet the default transport for corosync. UDP & UDPU
> are still (currently) supported but are deprecated. Also crypto
> and multiple interfaces are only supported over knet.
> 
> To compile this codebase you will need to install libknet from
> https://github.com/fabbione/kronosnet
> 
> The corosync.conf(5) man page has been updated with info on the new
> options. Older config files should still work but many options
> have changed because of the knet implementation so configs should
> be checked carefully. In particular any cluster using RRP
> over UDP or UDPU will not start as RRP is no longer present. If you
> need multiple interface support then you should be using the knet
> transport.
> 
> Knet brings many benefits to the corosync codebase, it provides support
> for more interfaces than RRP (up to 8), will be more reliable in the
> event
> of network outages and allows dynamic reconfiguration of interfaces.
> It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
> plagued corosync/openais from day 1
> 
> Signed-off-by: Christine Caulfield 
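
(For anyone who wants to try this: a rough sketch of building libknet from
that repository, assuming the usual autotools layout; the project's README
is authoritative.)

git clone https://github.com/fabbione/kronosnet
cd kronosnet
./autogen.sh && ./configure
make && sudo make install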

Is it wise to rely solely on a project that seems to be stuck in an
almost abandoned state? There seems to be no meaningful documentation
available, the readme says that the project is in its early stages of
development (apparently for many years now), and the repo sees very
little activity, mostly from one person. The user mailing list has
received one mail in 2010 and the development mailing list isn't
much more active either.

Regards,
  Dennis





Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Dennis Jacobfeuerborn
On 02.06.2016 09:18, Ferenc Wágner wrote:
> "Stephano-Shachter, Dylan"  writes:
> 
>> I can not figure out why version 4 is not supported.
> 
> Have you got fsid=root (or fsid=0) on your root export?
> See man exports.
> 

This is apparently no longer recommended:
http://wiki.linux-nfs.org/wiki/index.php/Nfsv4_configuration

"The linux implementation allows you to designate a real filesystem as
the pseudofilesystem, identifying that export with the fsid=0 option; we
no longer recommend this. Instead, on any recent linux distribution,
just list exports in /etc/exports exactly as you would for NFSv2 or NFSv3. "
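
In other words, on a recent distribution an export intended for v4 clients
can be written exactly as for v3, for example (path and network are
placeholders):

# /etc/exports: the same entry serves both NFSv3 and NFSv4 clients
/srv/nfs/data  192.168.1.0/24(rw,sync,no_subtree_check)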

Regards,
  Dennis




Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-01 Thread Dennis Jacobfeuerborn
On 01.06.2016 20:25, Stephano-Shachter, Dylan wrote:
> Hello all,
> 
> I have just finished setting up my HA nfs cluster and I am having a small
> problem. I would like to have nfs4 working but whenever I try to mount I
> get the following message,
> 
> mount: no type was given - I'll assume nfs because of the colon

I'm not sure if the type "nfs" is supposed to work with v4 as well, but
on my systems the mounts use the explicit type "nfs4", so you can try
mounting with "-t nfs4".
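
That is, something along the lines of (server path and mountpoint are
placeholders):

mount -t nfs4 server:/export /mnt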

Regards,
  Dennis




Re: [ClusterLabs] Antw: Re: reproducible split brain

2016-03-19 Thread Dennis Jacobfeuerborn
On 18.03.2016 00:50, Digimer wrote:
> On 17/03/16 07:30 PM, Christopher Harvey wrote:
>> On Thu, Mar 17, 2016, at 06:24 PM, Ken Gaillot wrote:
>>> On 03/17/2016 05:10 PM, Christopher Harvey wrote:
 If I ignore pacemaker's existence, and just run corosync, corosync
 disagrees about node membership in the situation presented in the first
 email. While it's true that stonith just happens to quickly correct the
 situation after it occurs it still smells like a bug in the case where
 corosync in used in isolation. Corosync is after all a membership and
 total ordering protocol, and the nodes in the cluster are unable to
 agree on membership.

 The Totem protocol specifies a ring_id in the token passed in a ring.
 Since all of the 3 nodes but one have formed a new ring with a new id
 how is it that the single node can survive in a ring with no other
 members passing a token with the old ring_id?

 Are there network failure situations that can fool the Totem membership
 protocol or is this an implementation problem? I don't see how it could
 not be one or the other, and it's bad either way.
>>>
>>> Neither, really. In a split brain situation, there simply is not enough
>>> information for any protocol or implementation to reliably decide what
>>> to do. That's what fencing is meant to solve -- it provides the
>>> information that certain nodes are definitely not active.
>>>
>>> There's no way for either side of the split to know whether the opposite
>>> side is down, or merely unable to communicate properly. If the latter,
>>> it's possible that they are still accessing shared resources, which
>>> without proper communication, can lead to serious problems (e.g. data
>>> corruption of a shared volume).
>>
>> The totem protocol is silent on the topic of fencing and resources, much
>> the way TCP is.
>>
>> Please explain to me what needs to be fenced in a cluster without
>> resources where membership and total message ordering are the only
>> concern. If fencing were a requirement for membership and ordering,
>> wouldn't stonith be part of corosync and not pacemaker?
> 
> Corosync is a membership and communication layer (and in v2+, a quorum
> provider). It doesn't care about or manage anything higher up. So it
> doesn't care about fencing itself.
> 
> It simply cares about;
> 
> * Who is in the cluster?
> * How do the members communicate?
> * (v2+) Is there enough members for quorum?
> * Notify resource managers of membership changes (join or loss).
> 
> The resource manager, pacemaker or rgmanager, care about resources, so
> it is what cares about making smart decisions. As Ken pointed out,
> without fencing, it can never tell the difference between no access and
> dead peer.
> 
> This is (again) why fencing is critical.

I think the key issue here is that when people think about corosync they
believe there can only be two states for membership (true or false) when
in reality there are three possible states: true, false and unknown.

The problem then is that corosync apparently has no built-in way to deal
with the "unknown" situation and requires guidance from an external
entity for that (in this case pacemaker's fencing).

This means that corosync alone simply cannot give you reliable
membership guarantees. It strictly requires external help to be able to
provide that.

Regards,
  Dennis




Re: [ClusterLabs] PCS, Corosync, Pacemaker, and Bind (Ken Gaillot)

2016-03-19 Thread Dennis Jacobfeuerborn
On 17.03.2016 08:45, Andrei Borzenkov wrote:
> On Wed, Mar 16, 2016 at 9:35 PM, Mike Bernhardt  wrote:
>> I guess I have to say "never mind!" I don't know what the problem was
>> yesterday, but it loads just fine today, even when the named config and the
>> virtual ip don't match! But for your edamacation, ifconfig does NOT show the
>> address although ip addr does:
>>
> 
> That's normal. ifconfig knows nothing about addresses added using "ip
> addr add". You can make it visible to ifconfig by adding label:
> 
> ip addr add dev eth1 ... label eth1:my-label

Just stop using ifconfig/route and keep using the ip command.
ifconfig/route have been deprecated for a decade now, and I know old
habits die hard, but at some point you just have to move on; otherwise you
might end up like those admins who stopped learning in the 80s and now
haunt discussion forums mostly contributing snark and negativity because
they've come to hate their jobs.
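
For the common cases the translation is straightforward, for example
(device name and addresses are illustrative):

ip addr show dev eth1                 # instead of: ifconfig eth1
ip addr add 192.0.2.10/24 dev eth1    # instead of: ifconfig eth1:0 192.0.2.10 netmask 255.255.255.0
ip route show                         # instead of: route -n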

Regards,
  Dennis


