Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Ken Gaillot
On Wed, 2017-08-02 at 18:32 +0200, Lentes, Bernd wrote:
> 
> - On Aug 2, 2017, at 10:42 AM, Ulrich Windl 
> ulrich.wi...@rz.uni-regensburg.de wrote:
> 
> 
> > 
> > I thought the cluster does not perform actions that are not defined in the
> > configuration (e.g. "monitor"). 
> 
> I think the cluster automatically performs and configures start/stop
> operations if they are not defined in the resource,
> but a monitor op has to be configured explicitly, to my knowledge.

Correct. We've considered adding an implicit monitor if none is
explicitly specified, as well as adding implicit master and slave role
monitors if only one monitor is specified for a master/slave resource.
That might happen in a future version.
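In the meantime, every monitor has to be listed explicitly in the
resource configuration. A minimal sketch with crmsh (resource name and
timeout values below are only illustrative):

    primitive p_dummy ocf:pacemaker:Dummy \
        op start timeout=20s \
        op stop timeout=20s \
        op monitor interval=10s timeout=20s

Without the "op monitor" line, no recurring health check is ever
scheduled for the resource (only the one-time probe).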

> >> 2. Set timeouts for any operations that have defaults in the RA
> >>meta-data?
> >> 
> >> What most people seem to expect is 2), but it sounds like what you are
> >> expecting is 1). Crmsh can't read minds, so it would have to pick one or
> >> the other.
> 
> Yes, I expected the cluster to choose the "defaults" from the meta-data of the
> RA.

It is confusing. Pacemaker doesn't use much from the resource agent
meta-data currently. I could see an argument for using the RA defaults,
though it could still be confusing since there are multiple possible
interpretations.

The implementation would be complicated, though. Currently, only the
crmd has the meta-data information; it's not in the CIB, so the policy
engine (which sets the timeouts) doesn't have it. Also, we can schedule
probe, start, and monitor operations in a single transition, before
we've gotten the RA meta-data, so the timeouts couldn't be known when
the actions are scheduled. There are potential ways around that, but it
would be a significant project.

> >> Another thing to consider is that if RA meta-data is preferred over the
> >> global default timeout, then the global default timeout wouldn't be used
> >> at all for operations that happen to have default timeouts in the
> >> meta-data. That seems surprising as well to me.
> 
> Yes. You configure global defaults, but they are not used. Confusing.
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671






___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!

2017-08-02 Thread Eric Robinson
1) iotop did not show any significant I/O, just maybe 30k/second of DRBD traffic.

2) okay. I've never done that before. I'll give it a shot.

3) I'm not sure what I'm looking at there.
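For 2) and 3), this is roughly what I plan to try (the device names are
guesses for my setup, adjust as needed):

    # trace block-layer activity on the DRBD device while a trim is running
    blktrace -d /dev/drbd0 -o - | blkparse -i -

    # sample the request statistics twice and compare the counters
    cat /sys/block/drbd0/stat; sleep 10; cat /sys/block/drbd0/stat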

--
Eric Robinson

> -Original Message-
> From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
> Sent: Tuesday, August 01, 2017 11:28 PM
> To: users@clusterlabs.org
> Subject: [ClusterLabs] Antw: DRBD and SSD TRIM - Slow!
> 
> Hi!
> 
> I know little about trim operations, but you could try one of these:
> 
> 1) iotop to see whether some I/O is done during trimming (assuming
> trimming itself is not considered to be I/O)
> 
> 2) Try blktrace on the affected devices to see what's going on. It's hard to
> set up and to extract the info you are looking for, but it provides deep
> insights.
> 
> 3) Watch /sys/block/$BDEV/stat for performance statistics. I don't know how
> well DRBD supports these, however (e.g. MDRAID shows no wait times and
> no busy operations, while a multipath map has it all).
> 
> Regards,
> Ulrich
> 
> >>> Eric Robinson  wrote on 02.08.2017 at 07:09 in
> message  d03.prod.outlook.com>
> 
> > Does anyone know why trimming a filesystem mounted on a DRBD volume
> > takes so long? I mean like three days to trim a 1.2TB filesystem.
> >
> > Here are some pertinent details:
> >
> > OS: SLES 12 SP2
> > Kernel: 4.4.74-92.29
> > Drives: 6 x Samsung SSD 840 Pro 512GB
> > RAID: 0 (mdraid)
> > DRBD: 9.0.8
> > Protocol: C
> > Network: Gigabit
> > Utilization: 10%
> > Latency: < 1ms
> > Loss: 0%
> > Iperf test: 900 mbits/sec
> >
> > When I write to a non-DRBD partition, I get 400MB/sec (bypassing caches).
> > When I trim a non-DRBD partition, it completes fast.
> > When I write to a DRBD volume, I get 80MB/sec.
> >
> > When I trim a DRBD volume, it takes bloody ages!
> >
> > --
> > Eric Robinson
> 
> 
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Lentes, Bernd


- On Aug 2, 2017, at 10:42 AM, Ulrich Windl 
ulrich.wi...@rz.uni-regensburg.de wrote:


> 
> I thought the cluster does not perform actions that are not defined in the
> configuration (e.g. "monitor"). 

I think the cluster automatically performs and configures start/stop operations
if they are not defined in the resource,
but a monitor op has to be configured explicitly, to my knowledge.

>> 
>> 2. Set timeouts for any operations that have defaults in the RA
>>meta-data?
>> 
>> What most people seem to expect is 2), but it sounds like what you are
>> expecting is 1). Crmsh can't read minds, so it would have to pick one or
>> the other.

Yes, I expected the cluster to choose the "defaults" from the meta-data of the
RA.

>> 
>> Another thing to consider is that if RA meta-data is preferred over the
>> global default timeout, then the global default timeout wouldn't be used
>> at all for operations that happen to have default timeouts in the
>> meta-data. That seems surprising as well to me.

Yes. You configure global defaults, but they are not used. Confusing.

Bernd
 

Helmholtz Zentrum Muenchen
Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
Ingolstaedter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen
Registergericht: Amtsgericht Muenchen HRB 6466
USt-IdNr: DE 129521671


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>
> See my proposal above. ;-)

Hmm, yes. It's a possibility. Magic values rarely end up making things
simpler though :/

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Updated attribute is not displayed in crm_mon

2017-08-02 Thread 井上 和徳
Hi,

In Pacemaker-1.1.17, an attribute updated while pacemaker is starting is not
displayed in crm_mon.
In Pacemaker-1.1.16, it is displayed, so the results differ.

https://github.com/ClusterLabs/pacemaker/commit/fe44f400a3116a158ab331a92a49a4ad8937170d
This commit is the cause, but is the following result (3.) the expected behavior?

[test case]
1. Start pacemaker on two nodes at the same time and update the attribute 
during startup.
   In this case, the attribute is displayed in crm_mon.

   [root@node1 ~]# ssh -f node1 'systemctl start pacemaker ; attrd_updater -n KEY -U V-1' ; \
                   ssh -f node3 'systemctl start pacemaker ; attrd_updater -n KEY -U V-3'
   [root@node1 ~]# crm_mon -QA1
   Stack: corosync
   Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum

   2 nodes configured
   0 resources configured

   Online: [ node1 node3 ]

   No active resources


   Node Attributes:
   * Node node1:
   + KEY   : V-1
   * Node node3:
   + KEY   : V-3


2. Restart pacemaker on node1, and update the attribute during startup.

   [root@node1 ~]# systemctl stop pacemaker
   [root@node1 ~]# systemctl start pacemaker ; attrd_updater -n KEY -U V-10


3. The attribute is registered in attrd but it is not registered in CIB,
   so the updated attribute is not displayed in crm_mon.

   [root@node1 ~]# attrd_updater -Q -n KEY -A
   name="KEY" host="node3" value="V-3"
   name="KEY" host="node1" value="V-10"

   [root@node1 ~]# crm_mon -QA1
   Stack: corosync
   Current DC: node3 (version 1.1.17-1.el7-b36b869) - partition with quorum

   2 nodes configured
   0 resources configured

   Online: [ node1 node3 ]

   No active resources


   Node Attributes:
   * Node node1:
   * Node node3:
   + KEY   : V-3


Best Regards

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Ulrich Windl
>>> Kristoffer Grönlund  wrote on 02.08.2017 at 10:32 in
message <87bmny74ka@suse.com>:
> Ulrich Windl  writes:
> 
> Kristoffer Grönlund  wrote on 02.08.2017 at 10:05 in
>>> 
>>> One idea might be to have a new command which inserts missing operations
>>> and operation timeouts based on the RA metadata.
>>
>> Sometimes there are reasons for not defining some operations. I can't
quite
>> follow your logic. [My item 2) only applies to operations the user has
>> specified]
> 
> Well, what would you want it to do?
> 
> 1. Set timeouts for operations that are defined but don't have explicit
>timeouts set?

I thought the cluster does not perform actions that are not defined in the
configuration (e.g. "monitor"). Maybe having some new magic tokens could help
here: A "global-default" would use the global default value, while a
"ra-default" would use the RA's default value, and all other values are set as
is. The default for all (not configured) operations would then be
"ra-default".

> 
> 2. Set timeouts for any operations that have defaults in the RA
>meta-data?
> 
> What most people seem to expect is 2), but it sounds like what you are
> expecting is 1). Crmsh can't read minds, so it would have to pick one or
> the other.
> 
> Another thing to consider is that if RA meta-data is preferred over the
> global default timeout, then the global default timeout wouldn't be used
> at all for operations that happen to have default timeouts in the
> meta-data. That seems surprising as well to me.

See my proposal above. ;-)

Regards,
Ulrich
P.S. Adding back the list for this discussion, assuming you sent a message to
the list that will arrive later
[...]


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: fence_vmware_soap: reads VM status but fails to reboot/on/off

2017-08-02 Thread Octavian Ciobanu
Hey Hideo,

Yes, I'm using the free license for testing version 6.5. We do have a full
license on the production server, but it is for an older version. I will
install a full-version key and get back with info, but after reading the
information starting from your link, I'm sure that what you pointed out is
the source of my issue.
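For reference, the kind of invocation I'm testing looks like this (host,
credentials, and VM name below are placeholders):

    fence_vmware_soap -a esxi-host.example.com -l root -p '***' \
        -n testvm -z -o status -v
    fence_vmware_soap -a esxi-host.example.com -l root -p '***' \
        -n testvm -z -o off -v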

Thank you for pointing that out.

Best regards
Octavian

On Wed, Aug 2, 2017 at 10:55 AM,  wrote:

> Hi Octavian,
>
> Are you possibly using the free version of ESXi?
> On the free version of ESXi, the operation on or off fails.
>
> The same phenomenon also occurs in connection with virsh.
>
>  - https://communities.vmware.com/thread/542433
>
> Best Regards,
> Hideo Yamauchi.
> - Original Message -
> >From: Octavian Ciobanu 
> >To: Cluster Labs - All topics related to open-source clustering welcomed <
> users@clusterlabs.org>
> >Date: 2017/8/1, Tue 23:07
> >Subject: Re: [ClusterLabs] Antw: fence_vmware_soap: reads VM status but
> fails to reboot/on/off
> >
> >
> >Hey Marek,
> >
> >I've run the command with --action off and uploaded the file on one of
> our servers : https://cloud.iwgate.com/index.php/s/1SpZlG8mBSR1dNE
> >
> >Interesting thing is that at the end of the file I found "Unable to
> connect/login to fencing device" instead of "Failed: Timed out waiting to
> power OFF"
> >
> >As information about my test rig:
> > Host OS: VMware ESXi 6.5 Hypervisor
> > Guest OS: Centos 7.3.1611 minimal with the latest updates
> > Fence agents installed with yum :
> >fence-agents-hpblade-4.0.11-47.el7_3.5.x86_64
> >fence-agents-rsa-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ilo-moonshot-4.0.11-47.el7_3.5.x86_64
> >fence-agents-rhevm-4.0.11-47.el7_3.5.x86_64
> >fence-virt-0.3.2-5.el7.x86_64
> >fence-agents-mpath-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ibmblade-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ipdu-4.0.11-47.el7_3.5.x86_64
> >fence-agents-common-4.0.11-47.el7_3.5.x86_64
> >fence-agents-rsb-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ilo-ssh-4.0.11-47.el7_3.5.x86_64
> >fence-agents-bladecenter-4.0.11-47.el7_3.5.x86_64
> >fence-agents-drac5-4.0.11-47.el7_3.5.x86_64
> >fence-agents-brocade-4.0.11-47.el7_3.5.x86_64
> >fence-agents-wti-4.0.11-47.el7_3.5.x86_64
> >fence-agents-compute-4.0.11-47.el7_3.5.x86_64
> >fence-agents-eps-4.0.11-47.el7_3.5.x86_64
> >fence-agents-cisco-ucs-4.0.11-47.el7_3.5.x86_64
> >fence-agents-intelmodular-4.0.11-47.el7_3.5.x86_64
> >fence-agents-eaton-snmp-4.0.11-47.el7_3.5.x86_64
> >fence-agents-cisco-mds-4.0.11-47.el7_3.5.x86_64
> >fence-agents-apc-snmp-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ilo2-4.0.11-47.el7_3.5.x86_64
> >fence-agents-all-4.0.11-47.el7_3.5.x86_64
> >fence-agents-vmware-soap-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ilo-mp-4.0.11-47.el7_3.5.x86_64
> >fence-agents-apc-4.0.11-47.el7_3.5.x86_64
> >fence-agents-emerson-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ipmilan-4.0.11-47.el7_3.5.x86_64
> >fence-agents-ifmib-4.0.11-47.el7_3.5.x86_64
> >fence-agents-kdump-4.0.11-47.el7_3.5.x86_64
> >fence-agents-scsi-4.0.11-47.el7_3.5.x86_64
> >
> >Thank you
> >
> >
> >
> >On Tue, Aug 1, 2017 at 2:22 PM, Marek Grac  wrote:
> >
> >Hi,
> >>
> >>
> >>> But when I call any of the power actions (on, off, reboot) I get
> "Failed:
>  Timed out waiting to power OFF".
> 
>  I've tried with all the combinations of --power-timeout and
> --power-wait
>  and same error without any change in the response time.
> 
>  Any ideas from where or how to fix this issue ?
> >>>
> >>
> >>
> >>No, you have used the right options and if they were high enough it
> should work. You can try to post verbose (anonymized) output and we can
> take a look at it more deeply.
> >>
> >>>I suspect "power off" is actually a virtual press of the ACPI power
> button (reboot likewise), so your VM tries to shut down cleanly. That could
> take time, and it could hang (I guess). I don't use VMware, but maybe
> there's a "reset" action that presses the virtual reset button of the
> virtual hardware... ;-)
> >>>
> >>
> >>
> >>There should not be a fence agent that will do soft reboot. The 'reset'
> action does  power off/check status/power on so we are sure that machine
> was really down (of course unless --method cycle when 'reboot' button is
> used).
> >>
> >>m,
> >>___
> >>Users mailing list: Users@clusterlabs.org
> >>http://lists.clusterlabs.org/mailman/listinfo/users
> >>
> >>Project Home: http://www.clusterlabs.org
> >>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>Bugs: http://bugs.clusterlabs.org
> >>
> >>
> >
> >___
> >Users mailing list: Users@clusterlabs.org
> >http://lists.clusterlabs.org/mailman/listinfo/users
> >
> >Project Home: http://www.clusterlabs.org
> 

Re: [ClusterLabs] Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
Ulrich Windl  writes:

>
> What about this priority for newly added resources?
> 1) Use the value specified explicitly
> 2) Use the value the RA's metadata specifies
> 3) Use the global default
>
> With "use" I mean "add it to the RA configuration".

Yeah, I've considered it. The main issue I see with making the change to
crmsh now is that it would also be confusing to configure a resource
without any operations and get operations defined anyway. Also, it would
be impossible not to define operations that have defaults in the metadata.

One idea might be to have a new command which inserts missing operations
and operation timeouts based on the RA metadata.
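Until then, the suggested values can at least be read from the agent
meta-data by hand, e.g. (the agent name here is just an example):

    # show the operations and advisory timeouts from the RA meta-data
    crm ra info ocf:heartbeat:IPaddr2

    # or query the raw meta-data XML directly
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data | grep '<action '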

Cheers,
Kristoffer

>
> Regards,
> Ulrich
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: fence_vmware_soap: reads VM status but fails to reboot/on/off

2017-08-02 Thread renayama19661014
Hi Octavian,

Are you possibly using the free version of ESXi? 
On the free version of ESXi, the operation on or off fails.

The same phenomenon also occurs in connection with virsh.

 - https://communities.vmware.com/thread/542433

Best Regards,
Hideo Yamauchi.
- Original Message -
>From: Octavian Ciobanu 
>To: Cluster Labs - All topics related to open-source clustering welcomed 
> 
>Date: 2017/8/1, Tue 23:07
>Subject: Re: [ClusterLabs] Antw: fence_vmware_soap: reads VM status but fails 
>to reboot/on/off
> 
>
>Hey Marek,
>
>I've run the command with --action off and uploaded the file on one of our 
>servers : https://cloud.iwgate.com/index.php/s/1SpZlG8mBSR1dNE
>
>Interesting thing is that at the end of the file I found "Unable to 
>connect/login to fencing device" instead of "Failed: Timed out waiting to 
>power OFF"
>
>As information about my test rig:
> Host OS: VMware ESXi 6.5 Hypervisor
> Guest OS: Centos 7.3.1611 minimal with the latest updates
> Fence agents installed with yum : 
>    fence-agents-hpblade-4.0.11-47.el7_3.5.x86_64
>    fence-agents-rsa-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ilo-moonshot-4.0.11-47.el7_3.5.x86_64
>    fence-agents-rhevm-4.0.11-47.el7_3.5.x86_64
>    fence-virt-0.3.2-5.el7.x86_64
>    fence-agents-mpath-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ibmblade-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ipdu-4.0.11-47.el7_3.5.x86_64
>    fence-agents-common-4.0.11-47.el7_3.5.x86_64
>    fence-agents-rsb-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ilo-ssh-4.0.11-47.el7_3.5.x86_64
>    fence-agents-bladecenter-4.0.11-47.el7_3.5.x86_64
>    fence-agents-drac5-4.0.11-47.el7_3.5.x86_64
>    fence-agents-brocade-4.0.11-47.el7_3.5.x86_64
>    fence-agents-wti-4.0.11-47.el7_3.5.x86_64
>    fence-agents-compute-4.0.11-47.el7_3.5.x86_64
>    fence-agents-eps-4.0.11-47.el7_3.5.x86_64
>    fence-agents-cisco-ucs-4.0.11-47.el7_3.5.x86_64
>    fence-agents-intelmodular-4.0.11-47.el7_3.5.x86_64
>    fence-agents-eaton-snmp-4.0.11-47.el7_3.5.x86_64
>    fence-agents-cisco-mds-4.0.11-47.el7_3.5.x86_64
>    fence-agents-apc-snmp-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ilo2-4.0.11-47.el7_3.5.x86_64
>    fence-agents-all-4.0.11-47.el7_3.5.x86_64
>    fence-agents-vmware-soap-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ilo-mp-4.0.11-47.el7_3.5.x86_64
>    fence-agents-apc-4.0.11-47.el7_3.5.x86_64
>    fence-agents-emerson-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ipmilan-4.0.11-47.el7_3.5.x86_64
>    fence-agents-ifmib-4.0.11-47.el7_3.5.x86_64
>    fence-agents-kdump-4.0.11-47.el7_3.5.x86_64
>    fence-agents-scsi-4.0.11-47.el7_3.5.x86_64
>
>Thank you
>
>
>
>On Tue, Aug 1, 2017 at 2:22 PM, Marek Grac  wrote:
>
>Hi,
>>
>>
>>> But when I call any of the power actions (on, off, reboot) I get "Failed:
 Timed out waiting to power OFF".

 I've tried with all the combinations of --power-timeout and --power-wait
 and same error without any change in the response time.

 Any ideas from where or how to fix this issue ?
>>>
>>
>>
>>No, you have used the right options and if they were high enough it should 
>>work. You can try to post verbose (anonymized) output and we can take a look 
>>at it more deeply. 
>>
>>>I suspect "power off" is actually a virtual press of the ACPI power button 
>>>(reboot likewise), so your VM tries to shut down cleanly. That could take 
>>>time, and it could hang (I guess). I don't use VMware, but maybe there's a 
>>>"reset" action that presses the virtual reset button of the virtual 
>>>hardware... ;-)
>>>
>>
>>
>>There should not be a fence agent that will do soft reboot. The 'reset' 
>>action does  power off/check status/power on so we are sure that machine was 
>>really down (of course unless --method cycle when 'reboot' button is used).
>>
>>m,
>>___
>>Users mailing list: Users@clusterlabs.org
>>http://lists.clusterlabs.org/mailman/listinfo/users
>>
>>Project Home: http://www.clusterlabs.org
>>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>Bugs: http://bugs.clusterlabs.org
>>
>>
>
>___
>Users mailing list: Users@clusterlabs.org
>http://lists.clusterlabs.org/mailman/listinfo/users
>
>Project Home: http://www.clusterlabs.org
>Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>Bugs: http://bugs.clusterlabs.org
>
>
>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: Re: from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Ulrich Windl
>>> Kristoffer Grönlund  wrote on 02.08.2017 at 09:33 in
message <87h8xq77a1@suse.com>:
> "Lentes, Bernd"  writes:
> 
>> Hi,
>>
>> I'm wondering where the default values for operations of a resource
>> come from.
> 
> [snip]
> 
>>
>> Is it hardcoded? All timeouts I found in my config were explicitly related
>> to a dedicated resource.
>> What are the values for the hardcoded defaults ?
>>
>> Does that also mean that what the description of the RA gives as "default"
>> isn't a default, but just a recommendation?
> 
> The default timeout is set by the default-action-timeout property, and
> the default value is 20s.
> 
> You are correct, the timeout values defined in the resource agent are
> not used automatically. They are recommended minimums, and the
> thought as I understand it (this predates my involvement in HA) is that
> any timeouts need to be reviewed carefully by the administrator.
> 
> I agree that it is somewhat surprising.

What about this priority for newly added resources?
1) Use the value specified explicitly
2) Use the value the RA's metadata specifies
3) Use the global default

With "use" I mean "add it to the RA configuration".

Regards,
Ulrich


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] from where does the default value for start/stop op of a resource come ?

2017-08-02 Thread Kristoffer Grönlund
"Lentes, Bernd"  writes:

> Hi,
>
> I'm wondering where the default values for operations of a resource come
> from.

[snip]

>
> Is it hardcoded? All timeouts I found in my config were explicitly related
> to a dedicated resource.
> What are the values for the hardcoded defaults ?
>
> Does that also mean that what the description of the RA gives as "default"
> isn't a default, but just a recommendation?

The default timeout is set by the default-action-timeout property, and
the default value is 20s.

You are correct, the timeout values defined in the resource agent are
not used automatically. They are recommended minimums, and the
thought as I understand it (this predates my involvement in HA) is that
any timeouts need to be reviewed carefully by the administrator.

I agree that it is somewhat surprising.
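For completeness, the global value can be adjusted, e.g. with crmsh
(the 60s value is just an example):

    # raise the cluster-wide default used when an operation has no explicit timeout
    crm configure property default-action-timeout=60s

    # or set operation defaults via the op_defaults section
    crm configure op_defaults timeout=60s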

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: DRBD and SSD TRIM - Slow!

2017-08-02 Thread Ulrich Windl
Hi!

I know little about trim operations, but you could try one of these:

1) iotop to see whether some I/O is done during trimming (assuming trimming 
itself is not considered to be I/O)

2) Try blktrace on the affected devices to see what's going on. It's hard to 
set up and to extract the info you are looking for, but it provides deep 
insights.

3) Watch /sys/block/$BDEV/stat for performance statistics. I don't know how 
well DRBD supports these, however (e.g. MDRAID shows no wait times and no busy 
operations, while a multipath map has it all).

Regards,
Ulrich

>>> Eric Robinson  wrote on 02.08.2017 at 07:09 in
message


> Does anyone know why trimming a filesystem mounted on a DRBD volume takes so 
> long? I mean like three days to trim a 1.2TB filesystem.
> 
> Here are some pertinent details:
> 
> OS: SLES 12 SP2
> Kernel: 4.4.74-92.29
> Drives: 6 x Samsung SSD 840 Pro 512GB
> RAID: 0 (mdraid)
> DRBD: 9.0.8
> Protocol: C
> Network: Gigabit
> Utilization: 10%
> Latency: < 1ms
> Loss: 0%
> Iperf test: 900 mbits/sec
> 
> When I write to a non-DRBD partition, I get 400MB/sec (bypassing caches).
> When I trim a non-DRBD partition, it completes fast.
> When I write to a DRBD volume, I get 80MB/sec.
> 
> When I trim a DRBD volume, it takes bloody ages!
> 
> --
> Eric Robinson





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org