[ClusterLabs] Q: RA metadata description

2019-09-04 Thread Ulrich Windl
Hi!

The description of RA metadata has been poor for years (e.g. the DTD has almost
no comments; I still prefer the DTD over the RNG for readability), and while
working on an RA I have a question:

How is the  to be formatted? A "crm ra info " does not seem to 
wrap long lines at terminal width. So are the long lines in  expected 
to be wrapped in XML? Even if not, for readability the lines may be wrapped
manually in XML.
So are there any formatting rules, or maybe even some markup?

BTW: Is there a reason why the DTD is named "ra-api-1.dtd", while the RNG is 
named "metadata.rng"?

Regards,
Ulrich

P.S. Here's the "well-documented" DTD:

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: Re: Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Ulrich Windl
>>> Ken Gaillot  wrote on 04.09.2019 at 16:26 in
message
<2634f19382b90736bdfb80b9c84997111479d337.ca...@redhat.com>:
> On Wed, 2019‑09‑04 at 10:07 +0200, Jehan‑Guillaume de Rorthais wrote:
>> On Tue, 03 Sep 2019 09:35:39 ‑0500
>> Ken Gaillot  wrote:
>> 
>> > On Mon, 2019‑09‑02 at 15:23 +0200, Ulrich Windl wrote:
>> > > Hi!
>> > > 
>> > > Are there any recommendations where to place (fixed content)
>> > > files an
>> > > RA uses?
>> > > Usually my RAs use a separate XML file for the metadata, just to
>> > > allow editing it in XML mode automatically.
>> > > Traditionally I put the file in the same directory as the RA
>> > > itself
>> > > (like "cat $0.xml" for meta‑data).
>> > > Are there any expectations that every file in the RA directory is
>> > > an
>> > > RA?
>> > > (Currently I'm extending an RA, and I'd like to provide some
>> > > additional user‑modifiable template file, and I wonder which path
>> > > to
>> > > use)
>> > > 
>> > > Regards,
>> > > Ulrich  
>> > 
>> > I believe most (maybe even all modern?) deployments have both lib
>> > and
>> > resource.d under /usr/lib/ocf. If you have a custom provider for
>> > the RA
>> > under resource.d, it would make sense to use the same pattern under
>> > lib.
>> 
>> Shouldn't it be $OCF_FUNCTIONS_DIR?
> 
> Good point ‑‑ if the RA is using ocf‑shellfuncs, yes. $OCF_ROOT/lib
> should be safe if the RA doesn't use ocf‑shellfuncs.
> 
> It's a weird situation; the OCF standard actually specifies /usr/ocf,
> but everyone implemented /usr/lib/ocf. I do plan to add a configure
> option for it in pacemaker, but it shouldn't be changed unless you can
> make the same change in every other cluster component that needs it.

The thing with $OCF_ROOT is: If $OCF_ROOT already contains "/lib", it looks
off to add another "/lib".
To me it looks as if it's time for an $OCF_LIB (which would be $OCF_ROOT if
the latter is /usr/lib/ocf already, otherwise $OCF_ROOT/lib). Personally I
think the /usr/ predates the [/usr][/share]]/lib/.
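
The $OCF_LIB idea can be sketched as follows. Note that $OCF_LIB does not exist today; it is only the proposal made in this message, and the function name ocf_lib and the rule "reuse the root when it already has a lib component" are assumptions made for illustration.

```python
import posixpath


def ocf_lib(ocf_root):
    """Derive a hypothetical OCF_LIB directory from OCF_ROOT.

    Assumed rule: if the root already contains a "lib" path component
    (as in /usr/lib/ocf), reuse it; otherwise append "lib".
    """
    root = posixpath.normpath(ocf_root)
    if "lib" in root.split("/"):
        return root
    return posixpath.join(root, "lib")


if __name__ == "__main__":
    for candidate in ("/usr/lib/ocf", "/usr/ocf"):
        print(candidate, "->", ocf_lib(candidate))
```

This differs from the layout discussed elsewhere in the thread, where a lib directory sits below /usr/lib/ocf, so it only illustrates the proposal and is not a description of current practice.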

> 
>> Could this be generalized to RA for their
>> own lib or permanent dependencies files?
> 
> The OCF standard specifies only the resource.d subdirectory, and
> doesn't comment on adding others. lib/heartbeat is a common choice for
> the resource‑agents package shell includes (an older approach was to
> put them as dot files in resource.d/heartbeat, and there are often
> symlinks at those locations for backward compatibility).
> 
> Since "heartbeat" is a resource agent provider name, and the standard
> specifies that agents go under resource.d/, it does make
> sense that lib/ would be where RA files would go.

I wonder when we will be able to retire "heartbeat" ;-) If it's supposed to be
of "vendor" type, maybe replace it with "clusterlabs" at some time...

Regards,
Ulrich

> ‑‑ 
> Ken Gaillot 
> 

[ClusterLabs] Antw: Re: Antw: Re: Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Ulrich Windl
>>> Ken Gaillot  wrote on 04.09.2019 at 16:13 in 
>>> message
<7a2fd358234b7924982ad2ed2d0df0c6a731adc0.ca...@redhat.com>:
> On Wed, 2019-09-04 at 10:09 +0200, Jehan-Guillaume de Rorthais wrote:
>> On Wed, 04 Sep 2019 07:54:50 +0200
>> "Ulrich Windl"  wrote:
>> 
>> > > > > Ken Gaillot  wrote on 03.09.2019 at
>> > > > > 16:35 in  
[...]
>> > 
>> > So what concrete path are you suggesting?
>> > /usr/lib//?
>> 
>> I would bet on /usr/lib/ocf/lib/ ?

The "double-lib" looks odd, however.

> 
> That was what I had in mind. Parallels "heartbeat"
> -- 
> Ken Gaillot 






Re: [ClusterLabs] Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Ken Gaillot
On Wed, 2019-09-04 at 10:07 +0200, Jehan-Guillaume de Rorthais wrote:
> On Tue, 03 Sep 2019 09:35:39 -0500
> Ken Gaillot  wrote:
> 
> > On Mon, 2019-09-02 at 15:23 +0200, Ulrich Windl wrote:
> > > Hi!
> > > 
> > > Are there any recommendations where to place (fixed content)
> > > files an
> > > RA uses?
> > > Usually my RAs use a separate XML file for the metadata, just to
> > > allow editing it in XML mode automatically.
> > > Traditionally I put the file in the same directory as the RA
> > > itself
> > > (like "cat $0.xml" for meta-data).
> > > Are there any expectations that every file in the RA directory is
> > > an
> > > RA?
> > > (Currently I'm extending an RA, and I'd like to provide some
> > > additional user-modifiable template file, and I wonder which path
> > > to
> > > use)
> > > 
> > > Regards,
> > > Ulrich  
> > 
> > I believe most (maybe even all modern?) deployments have both lib
> > and
> > resource.d under /usr/lib/ocf. If you have a custom provider for
> > the RA
> > under resource.d, it would make sense to use the same pattern under
> > lib.
> 
> Shouldn't it be $OCF_FUNCTIONS_DIR?

Good point -- if the RA is using ocf-shellfuncs, yes. $OCF_ROOT/lib
should be safe if the RA doesn't use ocf-shellfuncs.

It's a weird situation; the OCF standard actually specifies /usr/ocf,
but everyone implemented /usr/lib/ocf. I do plan to add a configure
option for it in pacemaker, but it shouldn't be changed unless you can
make the same change in every other cluster component that needs it.

> Could this be generalized to RA for their
> own lib or permanent dependencies files?

The OCF standard specifies only the resource.d subdirectory, and
doesn't comment on adding others. lib/heartbeat is a common choice for
the resource-agents package shell includes (an older approach was to
put them as dot files in resource.d/heartbeat, and there are often
symlinks at those locations for backward compatibility).

Since "heartbeat" is a resource agent provider name, and the standard
specifies that agents go under resource.d/, it does make
sense that lib/ would be where RA files would go.
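
As a minimal sketch of the parallel layout described here, the following Python snippet builds both directories for a provider. It assumes a per-provider subdirectory under both resource.d and lib, with OCF_ROOT falling back to /usr/lib/ocf when unset; the function name provider_dirs and the provider name "myprovider" are hypothetical.

```python
import os
import posixpath


def provider_dirs(provider, ocf_root=None):
    """Return the agent and auxiliary-file directories for a provider.

    Assumption: OCF_ROOT defaults to /usr/lib/ocf, the location the
    thread says deployments use in practice.
    """
    root = ocf_root or os.environ.get("OCF_ROOT", "/usr/lib/ocf")
    return {
        "agents": posixpath.join(root, "resource.d", provider),
        "lib": posixpath.join(root, "lib", provider),
    }


if __name__ == "__main__":
    for name, path in provider_dirs("myprovider").items():
        print(name, path)
```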
-- 
Ken Gaillot 



Re: [ClusterLabs] Antw: Re: Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Ken Gaillot
On Wed, 2019-09-04 at 10:09 +0200, Jehan-Guillaume de Rorthais wrote:
> On Wed, 04 Sep 2019 07:54:50 +0200
> "Ulrich Windl"  wrote:
> 
> > > > > Ken Gaillot  wrote on 03.09.2019 at
> > > > > 16:35 in  
> > 
> > message
> > <979978d5a488aabd9ed4a941ff4eac60c271c84d.ca...@redhat.com>:
> > > On Mon, 2019‑09‑02 at 15:23 +0200, Ulrich Windl wrote:  
> > > > Hi!
> > > > 
> > > > Are there any recommendations where to place (fixed content)
> > > > files an
> > > > RA uses?
> > > > Usually my RAs use a separate XML file for the metadata, just
> > > > to
> > > > allow editing it in XML mode automatically.
> > > > Traditionally I put the file in the same directory as the RA
> > > > itself
> > > > (like "cat $0.xml" for meta‑data).
> > > > Are there any expectations that every file in the RA directory
> > > > is an
> > > > RA?
> > > > (Currently I'm extending an RA, and I'd like to provide some
> > > > additional user‑modifiable template file, and I wonder which
> > > > path to
> > > > use)
> > > > 
> > > > Regards,
> > > > Ulrich  
> > > 
> > > I believe most (maybe even all modern?) deployments have both lib
> > > and
> > > resource.d under /usr/lib/ocf. If you have a custom provider for
> > > the RA
> > > under resource.d, it would make sense to use the same pattern
> > > under
> > > lib.  
> > 
> > So what concrete path are you suggesting?
> > /usr/lib//?
> 
> I would bet on /usr/lib/ocf/lib/ ?

That was what I had in mind. Parallels "heartbeat"
-- 
Ken Gaillot 


[ClusterLabs] Antw: Re: Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Ulrich Windl
>>> Jehan-Guillaume de Rorthais  wrote on 04.09.2019 at
10:07 in
message <20190904100736.14c93f9e@firost>:
> On Tue, 03 Sep 2019 09:35:39 ‑0500
> Ken Gaillot  wrote:
> 
>> On Mon, 2019‑09‑02 at 15:23 +0200, Ulrich Windl wrote:
>> > Hi!
>> > 
>> > Are there any recommendations where to place (fixed content) files an
>> > RA uses?
>> > Usually my RAs use a separate XML file for the metadata, just to
>> > allow editing it in XML mode automatically.
>> > Traditionally I put the file in the same directory as the RA itself
>> > (like "cat $0.xml" for meta‑data).
>> > Are there any expectations that every file in the RA directory is an
>> > RA?
>> > (Currently I'm extending an RA, and I'd like to provide some
>> > additional user‑modifiable template file, and I wonder which path to
>> > use)
>> > 
>> > Regards,
>> > Ulrich  
>> 
>> I believe most (maybe even all modern?) deployments have both lib and
>> resource.d under /usr/lib/ocf. If you have a custom provider for the RA
>> under resource.d, it would make sense to use the same pattern under
>> lib.
> 
> Shouldn't it be $OCF_FUNCTIONS_DIR? Could this be generalized to RA for 
> their
> own lib or permanent dependencies files?

It all depends (a bit):
I'm adding logrotate support for my RA. As there will be multiple instances,
each using different log files, the RA creates a logrotate config file that
matches the configured instance upon "start" and removes it upon "stop". As I
did not want to use a built-in logrotate template, I put it in an external
file, letting the user customize it (a bit), e.g. the maximum file size, the
frequency of rotation, etc.

The current template looks like this:
# {__NOTE__}
{__LOGDIR__}/*.log {
    rotate 2
    nocreate
    size 1M
    #weekly
    #olddir {__LOGDIR__}/old
    missingok
    postrotate
        kill -HUP $(cat '{__PIDFILE__}')
    endscript
    delaycompress
    #nocompress
}
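
As a rough illustration of how an RA might fill in such a template on "start", here is a minimal Python sketch. The render function, the shortened template, and the example paths (/var/log/myapp, /run/myapp.pid) are assumptions for illustration only; the actual RA code is not shown in this thread. Because the template contains literal braces, the sketch replaces only the {__NAME__} placeholders and does not use str.format.

```python
import re

# Shortened copy of the template shown above.
TEMPLATE = """\
# {__NOTE__}
{__LOGDIR__}/*.log {
    rotate 2
    size 1M
    missingok
    postrotate
        kill -HUP $(cat '{__PIDFILE__}')
    endscript
}
"""

# Matches placeholders such as {__LOGDIR__} and captures the name.
PLACEHOLDER = re.compile(r"\{__([A-Z]+)__\}")


def render(template, values):
    """Replace every {__NAME__} placeholder; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    print(render(TEMPLATE, {
        "NOTE": "generated by the RA, do not edit",
        "LOGDIR": "/var/log/myapp",
        "PIDFILE": "/run/myapp.pid",
    }))
```

Raising an error on an unknown placeholder is a deliberate choice here: a user-edited template with a mistyped name then fails at "start" and does not produce a broken logrotate file.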

And I wondered where to put such files.  Currently I'm using
"/etc/ocf///logrotate.in", which gives me a kind of clean namespace.

Regards,
Ulrich


Re: [ClusterLabs] Corosync main process was not scheduled for 2889.8477 ms (threshold is 800.0000 ms), though it runs with realtime priority and there was not much load on the node

2019-09-04 Thread Jeevan Patnaik
Hi,

On Wed, Sep 4, 2019 at 9:30 AM Andrei Borzenkov  wrote:

> 04.09.2019 0:27, wf...@niif.hu wrote:
> > Jeevan Patnaik  writes:
> >
> >> [16187] node1 corosyncwarning [MAIN  ] Corosync main process was not
> >> scheduled for 2889.8477 ms (threshold is 800.0000 ms). Consider token
> >> timeout increase.
> >> [...]
> >> 2. How to fix this? We have not much load on the nodes, the corosync is
> >> already running with RT priority.
> >
> > Does your corosync daemon use a watchdog device?  (See in the startup
> > logs.)  Watchdog interaction can be *slow*.
> >
>
Watchdog is disabled in pacemaker.

> Can you elaborate? This is the first time I see that corosync has
> anything to do with watchdog. How exactly corosync interacts with
> watchdog? Where in corosync configuration watchdog device is defined?



Regards,
Jeevan.

Re: [ClusterLabs] Corosync main process was not scheduled for 2889.8477 ms (threshold is 800.0000 ms), though it runs with realtime priority and there was not much load on the node

2019-09-04 Thread Jeevan Patnaik
Hi Honza,

On Tue, Sep 3, 2019 at 7:20 PM Jan Friesse  wrote:

> Jeevan,
>
> Jeevan Patnaik wrote:
> >Hi Honza,
> >
> >   Thanks for the response.
> >
> > If you increase token timeout even higher
> > (let's say 12sec) is it still appearing or not?
> > - I will try this.
> >
> >   If you try to run it without RT priority, does it help?
> > - Can RT priority affect the process scheduling negatively?
>
> Actually we've had a report that it can, because it blocks the kernel
> thread which is responsible for sending/receiving packets. I was not able
> to reproduce this behavior myself, and it seemed to be kernel specific,
> but the resolution was that behavior without RT was better.
>
Thanks. I will check this. Also, in theory, can blocking the kernel thread
responsible for sending/receiving packets affect the scheduling of the
corosync process (with RT priority)?

>
> >
> > I don't see any irregular IO activity during the time when we got these
> > errors. Also, swap usage and swap IO are minimal, only in the KB range.
> > We have vm.swappiness set to 1, so I don't think swap is causing any
> > issue.
> >
> > However, I see slight network activity during the issue times (as I
> > understand it, network activity should not affect CPU jobs as long as
> > CPU load is normal and there is no blocking IO).
>
> It shouldn't
>
> >
> > I am thinking of debugging in the following way, unless there is an
> > option to restart corosync in debugger mode:
>
> You can turn on debug messages (debug: on in logging section of
> corosync.conf).
>
Yes, I found this later. I will try debugging, hoping it helps in
finding where the problem is.

> >
> > -> Run strace in the background on the corosync process and redirect
> > the log to an output file
> > -> Add a frequent cron job to rotate the output log (delete old ones),
> > unless there is a flag file to keep the old log
> > -> Add another frequent cron job to check the corosync log for the
> > specific token timeout error and create the above-mentioned flag file
> > so that the strace output is not deleted.
> >
> > I don't know if the above process is safe to run on a production server
> > without much impact on the system resources. Need to check.
> >
>
> Yep. Hopefully you find something.
>
> Regards,
>Honza
>
> >
> > On Mon, Sep 2, 2019 at 5:50 PM Jan Friesse  wrote:
> >
> >> Jeevan,
> >>
> >> Jeevan Patnaik wrote:
> >>> Hi,
> >>>
> >>> Also, both are physical machines.
> >>>
> >>> On Fri, Aug 30, 2019 at 7:23 PM Jeevan Patnaik 
> >> wrote:
> >>>
>  Hi,
> 
>  We see the following messages almost everyday in our 2 node cluster
> and
>  resources gets migrated when it happens:
> 
>  [16187] node1 corosyncwarning [MAIN  ] Corosync main process was not
> >> scheduled for 2889.8477 ms (threshold is 800. ms). Consider token
> >> timeout increase.
>  [16187] node1 corosyncnotice  [TOTEM ] c.
>  [16187] node1 corosyncnotice  [TOTEM ] A new membership (
> >> 192.168.0.1:1268) was formed. Members joined: 2 left: 2
>  [16187] node1 corosyncnotice  [TOTEM ] Failed to receive the leave
> >> message. failed: 2
> 
> 
>  After setting the token timeout to 6000ms, at least the "Failed to
>  receive the leave message" doesn't appear anymore. But we see corosync
>  timeout errors:
>  [16395] node1 corosyncwarning [MAIN  ] Corosync main process was not
>  scheduled for 6660.9043 ms (threshold is 4800. ms). Consider token
>  timeout increase.
> 
>  1. Why is the set timeout not in effect? It's 4800ms instead of
> 6000ms.
> >>
> >> It is in effect. Threshold for pause detector is set as 0.8 * token
> >> timeout.
> >>
>  2. How to fix this? We have not much load on the nodes, the corosync
> is
>  already running with RT priority.
> >>
> >> There must be something wrong. If you increase token timeout even higher
> >> (let's say 12sec) is it still appearing or not? If so, isn't the machine
> >> swapping (for example) or waiting for IO? If you try to run it without
> >> RT priority, does it help?
> >>
> >> Regards,
> >> Honza
> >>
> >>
> 
>  The following is the details of OS and packages:
> 
>  Kernel: 3.10.0-957.el7.x86_64
>  OS: Oracle Linux Server 7.6
> 
>  corosync-2.4.3-4.el7.x86_64
>  corosynclib-2.4.3-4.el7.x86_64
> 
>  Thanks in advance.
> 
>  --
>  Regards,
>  Jeevan.
> >>
> >>
> >
> > Regards,
> > Jeevan.
> >
>
>

Regards,
Jeevan
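
To make the 0.8 factor mentioned by Honza concrete, and as a possible starting point for the log-checking cron job described above, here is a minimal Python sketch. It extracts the pause and threshold from a pause-detector message and derives the token timeout the threshold implies. The function name parse_pause and the regular expression are assumptions based only on the log lines quoted in this thread, not on corosync's source.

```python
import re

# Pattern modelled on the messages quoted in this thread; the fractional
# part is optional because some quoted lines show values like "4800.".
PAUSE_RE = re.compile(
    r"not scheduled for (?P<pause>\d+(?:\.\d*)?) ms "
    r"\(threshold is (?P<threshold>\d+(?:\.\d*)?) ms\)"
)


def parse_pause(line):
    """Return (pause_ms, threshold_ms, implied_token_ms) or None."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pause = float(match.group("pause"))
    threshold = float(match.group("threshold"))
    # threshold = 0.8 * token, so token = threshold * 1.25
    return pause, threshold, threshold * 1.25


if __name__ == "__main__":
    sample = ("Corosync main process was not scheduled for 6660.9043 ms "
              "(threshold is 4800.0000 ms). Consider token timeout increase.")
    print(parse_pause(sample))
```

With the token timeout set to 6000 ms, the implied value matches the 4800 ms threshold in the second message, which is why the setting is in effect even though 6000 never appears in the log.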

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-04 Thread Marco Marino
First of all, thank you for your support.
Andrey: sure, I can reach machines through IPMI.
Here is a short "log":

#From ld1 trying to contact ld1
[root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XX sdr elist all
SEL          | 72h | ns  |  7.1 | No Reading
Intrusion    | 73h | ok  |  7.1 |
iDRAC8       | 00h | ok  |  7.1 | Dynamic MC @ 20h
...

#From ld1 trying to contact ld2
ipmitool -I lanplus -H 192.168.254.251 -U root -P XX sdr elist all
SEL          | 72h | ns  |  7.1 | No Reading
Intrusion    | 73h | ok  |  7.1 |
iDRAC7       | 00h | ok  |  7.1 | Dynamic MC @ 20h
...


#From ld2 trying to contact ld1:
[root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P X sdr elist all
SEL          | 72h | ns  |  7.1 | No Reading
Intrusion    | 73h | ok  |  7.1 |
iDRAC8       | 00h | ok  |  7.1 | Dynamic MC @ 20h
System Board | 00h | ns  |  7.1 | Logical FRU @00h
.

#From ld2 trying to contact ld2
[root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P  sdr elist all
SEL          | 72h | ns  |  7.1 | No Reading
Intrusion    | 73h | ok  |  7.1 |
iDRAC7       | 00h | ok  |  7.1 | Dynamic MC @ 20h
System Board | 00h | ns  |  7.1 | Logical FRU @00h


Jan: Actually the cluster uses /etc/hosts in order to resolve names:
172.16.77.10    ld1.mydomain.it  ld1
172.16.77.11    ld2.mydomain.it  ld2

Furthermore, I'm using IP addresses for the IPMI interfaces in the configuration:
[root@ld1 ~]# pcs stonith show fence-node1
 Resource: fence-node1 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=192.168.254.250 lanplus=1 login=root passwd=X pcmk_host_check=static-list pcmk_host_list=ld1.mydomain.it
  Operations: monitor interval=60s (fence-node1-monitor-interval-60s)


Any idea?
How can I reset the state of the cluster without downtime? Is "pcs resource
cleanup" enough?
Thank you,
Marco


On Wed, 4 Sep 2019 at 10:29, Jan Pokorný 
wrote:

> On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
> > 03.09.2019 11:09, Marco Marino wrote:
> >> Hi, I have a problem with fencing on a two node cluster. It seems that
> >> randomly the cluster cannot complete monitor operation for fence
> devices.
> >> In log I see:
> >> crmd[8206]:   error: Result of monitor operation for fence-node2 on
> >> ld2.mydomain.it: Timed Out
> >
> > Can you actually access IP addresses of your IPMI ports?
>
> [
> Tangentially, interesting aspect beyond that and applicable for any
> non-IP cross-host referential needs, which I haven't seen mentioned
> anywhere so far, is the risk of DNS resolution (when /etc/hosts will
> come short) getting to troubles (stale records, port blocked, DNS
> server overload [DNSSEC, etc.], IPv4/IPv6 parallel records that the SW
> cannot handle gracefully, etc.).  In any case, just a single DNS
> server would apparently be an undesired SPOF, and would be unfortunate
> when unable to fence a node because of that.
>
> I think the most robust approach is to use IP addresses whenever
> possible, and unambiguous records in /etc/hosts when practical.
> ]
>
> >> As attachment there is
> >> - /var/log/messages for node1 (only the important part)
> >> - /var/log/messages for node2 (only the important part) <-- Problem
> starts
> >> here
> >> - pcs status
> >> - pcs stonith show (for both fence devices)
> >>
> >> I think it could be a timeout problem, so how can I see timeout value
> for
> >> monitor operation in stonith devices?
> >> Please, someone can help me with this problem?
> >> Furthermore, how can I fix the state of fence devices without downtime?
>
> --
> Jan (Poki)

Re: [ClusterLabs] stonith-ng - performing action 'monitor' timed out with signal 15

2019-09-04 Thread Jan Pokorný
On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
> 03.09.2019 11:09, Marco Marino wrote:
>> Hi, I have a problem with fencing on a two node cluster. It seems that
>> randomly the cluster cannot complete monitor operation for fence devices.
>> In log I see:
>> crmd[8206]:   error: Result of monitor operation for fence-node2 on
>> ld2.mydomain.it: Timed Out
> 
> Can you actually access IP addresses of your IPMI ports?

[
Tangentially, an interesting aspect beyond that, applicable to any
non-IP cross-host reference and not mentioned anywhere so far as I have
seen, is the risk of DNS resolution (when /etc/hosts falls short)
running into trouble (stale records, blocked port, DNS server overload
[DNSSEC, etc.], parallel IPv4/IPv6 records that the software cannot
handle gracefully, etc.).  In any case, a single DNS server would be
an undesirable SPOF, and it would be unfortunate to be unable to fence
a node because of that.

I think the most robust approach is to use IP addresses whenever
possible, and unambiguous records in /etc/hosts when practical.
]
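
One way to audit a configuration against this advice is to check whether each configured fence address is a literal IP address or a name that would need resolution. The following Python sketch uses the standard ipaddress module; the function name is_ip_literal is hypothetical, and the sample values are the address and host name that appear in this thread.

```python
import ipaddress


def is_ip_literal(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for candidate in ("192.168.254.250", "ld1.mydomain.it"):
        kind = "IP literal" if is_ip_literal(candidate) else "needs name resolution"
        print(candidate, "->", kind)
```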

>> As attachment there is
>> - /var/log/messages for node1 (only the important part)
>> - /var/log/messages for node2 (only the important part) <-- Problem starts
>> here
>> - pcs status
>> - pcs stonith show (for both fence devices)
>> 
>> I think it could be a timeout problem, so how can I see timeout value for
>> monitor operation in stonith devices?
>> Please, someone can help me with this problem?
>> Furthermore, how can I fix the state of fence devices without downtime?

-- 
Jan (Poki)



Re: [ClusterLabs] IPaddr2 RA and multicast mac

2019-09-04 Thread Michael Schwartzkopff
On 04.09.19 at 00:27, Tomer Azran wrote:
> Hello,
>
> When using IPaddr2 RA in order to set a cloned IP address resource:
>
> pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=10.0.0.100 iflabel=vip1 
> cidr_netmask=24 flush_routes=true op monitor interval=30s
> pcs resource clone vip1 clone-max=2 clone-node-max=2 globally-unique=true
>
> Then the cluster set the iptables CLUSTERIP module, and the result is 
> something like that:
>
> # iptables -L -n
> .
> .
> .
> CLUSTERIP  all  --  0.0.0.0/0  10.0.0.100  CLUSTERIP 
> hashmode=sourceip-sourceport clustermac=A1:DE:DE:89:A6:FE total_nodes=2 
> local_node=1 hash_init=0
> .
> .
> .
>
> The problem is that the RA picks a clustermac address which is not on the 
> multicast range (must start with 01:00:5E)
> If not working with a multicast address, the traffic is being treated as 
> broadcast which is bad.
>
> I found that you can set a multicast mac if you use the "mac" parameter, 
> which solves the issue.
>
> Can the RA default be changed to use multicast range?
> In addition, I think that you might need to update the documentation 
> (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_clone_the_ip_address.html)
>  and instruct users to use the mac parameter when creating the resource. In 
> addition, I think that the documentation should instruct the user to enable 
> multicast traffic on the network, which is not enabled by default.
>
> Tomer Azran
> IDM & LINUX Professional Services
>
> tomer.az...@edp.co.il
> m: +972-52-6389961
> t: +972-3-6438222
> f: +972-3-6438004
>
> [http://www.edp.co.il/logo1-small.png]
> www.edp.co.il
>
>
>


Hi,


In Layer 2 frames, the least significant bit of the most significant byte
decides whether the address is multicast/broadcast or unicast. A "0" tells
the switch it is unicast and a "1" indicates a multicast address.

Depending on the switch vendor, the switch may or may not learn the
multicast MAC address for the interface where it sees such a packet
coming in.


An IEEE document explicitly says that a router SHOULD NOT learn multicast
MAC addresses for unicast IP addresses. Cisco is the only vendor that
sticks to that standard. On Cisco devices you have to add the MAC
manually. All other vendors just learn the MAC address.
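
The bit test described above can be sketched in a few lines of Python. The helper names are hypothetical. The second check covers the 01:00:5E range that the original question refers to, which is reserved for MAC addresses mapped from IPv4 multicast groups (01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF).

```python
def parse_mac(mac):
    """Split a colon-separated MAC address into six integer octets."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not a valid MAC address: " + mac)
    return octets


def is_group_address(mac):
    """True if the least significant bit of the first octet is set."""
    return bool(parse_mac(mac)[0] & 0x01)


def is_ipv4_multicast_mac(mac):
    """True if the address lies in 01:00:5E:00:00:00 - 01:00:5E:7F:FF:FF."""
    octets = parse_mac(mac)
    return octets[:3] == [0x01, 0x00, 0x5E] and octets[3] < 0x80


if __name__ == "__main__":
    for mac in ("A1:DE:DE:89:A6:FE", "01:00:5E:00:00:64"):
        print(mac, is_group_address(mac), is_ipv4_multicast_mac(mac))
```

Applied to the clustermac from the iptables listing, A1:DE:DE:89:A6:FE, the first test is true (0xA1 is odd, so the group bit is set) while the second is false: the address is a Layer 2 group address, but it is outside the 01:00:5E range.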



Kind regards,

-- 

[*] sys4 AG
 
https://sys4.de, +49 (89) 30 90 46 64
Schleißheimer Straße 26/MG,80333 München
 
Registered office: München, District Court München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer, Wolfgang Stief
Chairman of the supervisory board: Florian Kirstein




Re: [ClusterLabs] Antw: Re: Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Jehan-Guillaume de Rorthais
On Wed, 04 Sep 2019 07:54:50 +0200
"Ulrich Windl"  wrote:

> >>> Ken Gaillot  wrote on 03.09.2019 at 16:35 in  
> message
> <979978d5a488aabd9ed4a941ff4eac60c271c84d.ca...@redhat.com>:
> > On Mon, 2019‑09‑02 at 15:23 +0200, Ulrich Windl wrote:  
> >> Hi!
> >> 
> >> Are there any recommendations where to place (fixed content) files an
> >> RA uses?
> >> Usually my RAs use a separate XML file for the metadata, just to
> >> allow editing it in XML mode automatically.
> >> Traditionally I put the file in the same directory as the RA itself
> >> (like "cat $0.xml" for meta‑data).
> >> Are there any expectations that every file in the RA directory is an
> >> RA?
> >> (Currently I'm extending an RA, and I'd like to provide some
> >> additional user‑modifiable template file, and I wonder which path to
> >> use)
> >> 
> >> Regards,
> >> Ulrich  
> > 
> > I believe most (maybe even all modern?) deployments have both lib and
> > resource.d under /usr/lib/ocf. If you have a custom provider for the RA
> > under resource.d, it would make sense to use the same pattern under
> > lib.  
> 
> So what concrete path are you suggesting? /usr/lib//?

I would bet on /usr/lib/ocf/lib/ ?

Re: [ClusterLabs] Q: Recommened directory for RA auxillary files?

2019-09-04 Thread Jehan-Guillaume de Rorthais
On Tue, 03 Sep 2019 09:35:39 -0500
Ken Gaillot  wrote:

> On Mon, 2019-09-02 at 15:23 +0200, Ulrich Windl wrote:
> > Hi!
> > 
> > Are there any recommendations where to place (fixed content) files an
> > RA uses?
> > Usually my RAs use a separate XML file for the metadata, just to
> > allow editing it in XML mode automatically.
> > Traditionally I put the file in the same directory as the RA itself
> > (like "cat $0.xml" for meta-data).
> > Are there any expectations that every file in the RA directory is an
> > RA?
> > (Currently I'm extending an RA, and I'd like to provide some
> > additional user-modifiable template file, and I wonder which path to
> > use)
> > 
> > Regards,
> > Ulrich  
> 
> I believe most (maybe even all modern?) deployments have both lib and
> resource.d under /usr/lib/ocf. If you have a custom provider for the RA
> under resource.d, it would make sense to use the same pattern under
> lib.

Shouldn't it be $OCF_FUNCTIONS_DIR? Could this be generalized to RA for their
own lib or permanent dependencies files?