Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-06-04 Thread Jan Pokorný
On 03/06/19 13:39 +0200, Jan Pokorný wrote:
> Yes, there are at least two issues in ocf:heartbeat:VirtualDomain:
> 
> 1/ dealing with a user-input-derived value in an unchecked manner,
>    even though such a value can be an empty string or contain spaces
>    (for the latter, see also the points I raised back then:
>    https://lists.clusterlabs.org/pipermail/users/2015-May/007629.html
>    https://lists.clusterlabs.org/pipermail/developers/2015-May/000620.html
>    )
> 
> 2/ the agent doesn't try to figure out whether it is about to parse
>    a reasonably familiar file; in this case, that means it can end up
>    grep'ing a file spanning up to terabytes of data
> 
> In your case, you mistakenly pointed the agent (via the "config"
> parameter, as quoted above) not to the expected configuration, but
> rather to the disk image itself -- that's not how to talk to
> libvirt -- it is the guest configuration XML that shall point to
> where the disk image itself is located.  See
> ocf_heartbeat_VirtualDomain(7) or the output you get when invoking
> the agent with the "meta-data" argument.
> 
> Such a configuration issue could be indicated reliably with
> "validate-all" passed as the action for the configured set of agent
> parameters, if issues 1/ and/or 2/ did not exist in the agent's
> implementation.
> Please, file issues against the VirtualDomain agent to that effect at
> https://github.com/ClusterLabs/fence-agents/issues

sorry, apparently resource-agents, hence

https://github.com/ClusterLabs/resource-agents/issues

(I don't think GitHub allows after-the-fact issue retargeting, sadly)

-- 
Jan (Poki)



Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-06-03 Thread Jan Pokorný
On 29/05/19 09:29 -0500, Ken Gaillot wrote:
> On Wed, 2019-05-29 at 11:42 +0100, lejeczek wrote:
>> I'm doing something which I believe is fairly simple, namely:
>> 
>> $ pcs resource create HA-work9-win10-kvm VirtualDomain \
>>   hypervisor="qemu:///system" \
>>   config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.qcow2" \
>>   migration_transport=ssh --disable
>> 
>> virt guest is good, runs in libvirt okay, yet pacemaker fails:
>> 
>> ...
>>   notice: Calculated transition 1864, saving inputs in /var/lib/pacemaker/pengine/pe-input-2022.bz2
>>   notice: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
>>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 locally on whale.private
>>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on swir.private
>>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on rider.private
>>  warning: HA-work9-win10-kvm_monitor_0 process (PID 2103512) timed out
>>  warning: HA-work9-win10-kvm_monitor_0:2103512 - timed out after 3ms
>>   notice: HA-work9-win10-kvm_monitor_0:2103512:stderr [ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too many arguments ]
> 
> This looks like a bug in the resource agent, probably due to some
> unexpected configuration value. Double-check your resource
> configuration for what values the various parameters can have. (Or it
> may just be a side effect of the interval issue above, so try fixing
> that first.)

Yes, there are at least two issues in ocf:heartbeat:VirtualDomain:

1/ dealing with a user-input-derived value in an unchecked manner,
   even though such a value can be an empty string or contain spaces
   (for the latter, see also the points I raised back then:
   https://lists.clusterlabs.org/pipermail/users/2015-May/007629.html
   https://lists.clusterlabs.org/pipermail/developers/2015-May/000620.html
   )

2/ the agent doesn't try to figure out whether it is about to parse
   a reasonably familiar file; in this case, that means it can end up
   grep'ing a file spanning up to terabytes of data (a guard sketch
   follows below)
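
To illustrate 2/, here is a sketch of a guard the agent could run
before parsing -- my own illustration, not current agent code, with
an arbitrarily assumed 1 MiB size cap (it relies on ocf_exit_reason
and $OCF_ERR_CONFIGURED from the already-sourced ocf-shellfuncs):

    # refuse to parse anything but a reasonably small regular file
    if [ ! -f "$OCF_RESKEY_config" ] || \
       [ "$(stat -c %s "$OCF_RESKEY_config")" -gt 1048576 ]; then
        ocf_exit_reason "config '${OCF_RESKEY_config}' is not a plausible guest XML"
        exit $OCF_ERR_CONFIGURED
    fi

Note the consistent quoting of $OCF_RESKEY_config, which is also what
fixing 1/ boils down to: an empty or space-containing value then no
longer explodes the test into multiple words.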

In your case, you mistakenly pointed the agent (via the "config"
parameter, as quoted above) not to the expected configuration, but
rather to the disk image itself -- that's not how to talk to
libvirt -- it is the guest configuration XML that shall point to
where the disk image itself is located.  See
ocf_heartbeat_VirtualDomain(7) or the output you get when invoking
the agent with the "meta-data" argument.
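
For instance, a sketch of the fix (the XML path and the exact domain
name here are my assumptions -- adjust to your setup): dump the guest
definition once, then point "config" at that XML file:

    $ virsh --connect qemu:///system dumpxml HA-work9-win10 \
        > /0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.xml
    $ pcs resource create HA-work9-win10-kvm VirtualDomain \
        hypervisor="qemu:///system" \
        config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.xml" \
        migration_transport=ssh --disable

It is then the dumped XML that refers to the qcow2 disk image, via
its <disk> element.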

Such a configuration issue could be indicated reliably with
"validate-all" passed as the action for the configured set of agent
parameters, if issues 1/ and/or 2/ did not exist in the agent's
implementation.
Please, file issues against the VirtualDomain agent to that effect at
https://github.com/ClusterLabs/fence-agents/issues
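
For reference, the "validate-all" action can also be exercised by
hand, following the usual OCF calling convention (parameters travel
as OCF_RESKEY_* environment variables; the XML path is again my
assumption):

    $ OCF_ROOT=/usr/lib/ocf \
      OCF_RESKEY_hypervisor="qemu:///system" \
      OCF_RESKEY_config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.xml" \
      /usr/lib/ocf/resource.d/heartbeat/VirtualDomain validate-all
    $ echo $?    # 0 means the parameters passed the agent's checks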

In addition, there may be some other configuration discrepancies as
pointed out by Ken.  Let us know if any issues persist once all these
are resolved.

-- 
Jan (Poki)



Re: [ClusterLabs] VirtualDomain and Resource_is_Too_Active ?? - problem/error

2019-05-29 Thread Ken Gaillot
On Wed, 2019-05-29 at 11:42 +0100, lejeczek wrote:
> hi guys,
> 
> I'm doing something which I believe is fairly simple, namely:
> 
> $ pcs resource create HA-work9-win10-kvm VirtualDomain
> hypervisor="qemu:///system"
> config="/0-ALL.SYSDATA/QEMU_VMs/HA-work9-win10.qcow2"
> migration_transport=ssh --disable
> 
> virt guest is good, runs in libvirt okay, yet pacemaker fails:
> 
> ...
> 
> 
>   notice: State transition S_IDLE -> S_POLICY_ENGINE
>    error: Invalid recurring action chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
>    error: Invalid recurring action chenbro0.1-raid5-mnt-stop-interval-90 wth name: 'stop'

The "start" and "stop" actions in the configuration must have interval
0 (which is the default if you just omit it). Configuring start/stop is
just a way to be able to set the timeout etc. used with those actions.
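
In pcs terms, that might look like the following sketch (the timeout
values are placeholders, not a recommendation):

    $ pcs resource update chenbro0.1-raid5-mnt \
        op start interval=0 timeout=120s \
        op stop interval=0 timeout=120s \
        op monitor interval=90s

Only "monitor" gets a nonzero recurring interval; start and stop keep
interval=0 and are listed solely to adjust their timeouts.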

>   notice: Calculated transition 1864, saving inputs in /var/lib/pacemaker/pengine/pe-input-2022.bz2
>   notice: Configuration ERRORs found during PE processing.  Please run "crm_verify -L" to identify issues.
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 locally on whale.private
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on swir.private
>   notice: Initiating monitor operation HA-work9-win10-kvm_monitor_0 on rider.private
>  warning: HA-work9-win10-kvm_monitor_0 process (PID 2103512) timed out
>  warning: HA-work9-win10-kvm_monitor_0:2103512 - timed out after 3ms
>   notice: HA-work9-win10-kvm_monitor_0:2103512:stderr [ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too many arguments ]

This looks like a bug in the resource agent, probably due to some
unexpected configuration value. Double-check your resource
configuration for what values the various parameters can have. (Or it
may just be a side effect of the interval issue above, so try fixing
that first.)
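
For what it's worth, that "[: too many arguments" on stderr is the
classic symptom of an unquoted shell variable expanding to several
words inside a test. A minimal reproduction (purely illustrative --
this is not the agent's actual line 981):

    $ cfg="/path/with two spaces/guest.xml"
    $ [ -f $cfg ] && echo found      # unquoted: [ receives 4 arguments
    bash: [: too many arguments
    $ [ -f "$cfg" ] && echo found    # quoted: a single argument, no error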

>    error: Result of probe operation for HA-work9-win10-kvm on whale.private: Timed Out
>   notice: whale.private-HA-work9-win10-kvm_monitor_0:204 [ /usr/lib/ocf/resource.d/heartbeat/VirtualDomain: line 981: [: too many arguments\n ]
>  warning: Action 15 (HA-work9-win10-kvm_monitor_0) on rider.private failed (target: 7 vs. rc: 1): Error
>   notice: Transition aborted by operation HA-work9-win10-kvm_monitor_0 'modify' on rider.private: Event failed
>  warning: Action 17 (HA-work9-win10-kvm_monitor_0) on whale.private failed (target: 7 vs. rc: 1): Error
>  warning: Action 16 (HA-work9-win10-kvm_monitor_0) on swir.private failed (target: 7 vs. rc: 1): Error
>   notice: Transition 1864 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2022.bz2): Complete
>  warning: Processing failed probe of HA-work9-win10-kvm on whale.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on whale.private, see the resource-discovery option for location constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on whale.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on whale.private, see the resource-discovery option for location constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on swir.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on swir.private, see the resource-discovery option for location constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on swir.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on swir.private, see the resource-discovery option for location constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on rider.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on rider.private, see the resource-discovery option for location constraints
>  warning: Processing failed probe of HA-work9-win10-kvm on rider.private: unknown error
>   notice: If it is not possible for HA-work9-win10-kvm to run on rider.private, see the resource-discovery option for location constraints
>    error: Invalid recurring action chenbro0.1-raid5-mnt-start-interval-90 wth name: 'start'
>    error: Invalid recurring action chenbro0.1-raid5-mnt-stop-interval-90 wth name: 'stop'
>    error: Resource HA-work9-win10-kvm is active on 3 nodes (attempting recovery)
>   notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_is_Too_Active for more information
>   notice:  * Stop   HA-work9-win10-kvm   ( whale.private )   due to node availability
>   notice:  * Stop   HA-work9-win10-kvm   (  swir.private )   due to node availability
>   notice:  * Stop   HA-work9-win10-kvm   ( rider.private )   due to node