Hi David,

The resource (the process forked from stonithd) stays until the sleep completes.
stonithd does not call st_child_term().

I set the timeout of the start operation to 61 seconds.
Therefore, the value of track->timeout in run_stonith_agent() is 61000 (ms).
However, st_child_term() is called only after '1000*track->timeout'
milliseconds. That is, approximately _17 hours_ later.
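
As a rough check of the "17 hours" figure, the arithmetic can be sketched in C. The helper names below (scaled_delay_ms, ms_to_hours) are hypothetical and only for illustration; they are not part of Pacemaker. The sketch assumes track->timeout already holds 61000 (ms), as the gdb output suggests.

```c
#include <stdint.h>

/* Hypothetical helper: the delay that results when a value that is
 * already in milliseconds is multiplied by 1000 once more, as in
 * '1000*track->timeout'. */
static uint64_t scaled_delay_ms(uint64_t timeout_ms)
{
    return 1000u * timeout_ms;
}

/* Convert a duration in milliseconds to hours. */
static double ms_to_hours(uint64_t ms)
{
    return (double)ms / (1000.0 * 60.0 * 60.0);
}
```

With timeout_ms = 61000, scaled_delay_ms() gives 61000000 ms, which is 61000 seconds, or roughly 16.9 hours.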

- lib/fencing/st_client.c
 476 run_stonith_agent(const char *agent, const char *action, const char *victim,
 477               GHashTable * device_args, GHashTable * port_map, int *agent_result, char **output,
 478               async_command_t * track)

 541         if (track->timeout) {
 542             track->pid = pid;
 543             track->timer_sigterm = g_timeout_add(1000*track->timeout, st_child_term, track);
 544             track->timer_sigkill = g_timeout_add(1000*(track->timeout+5), st_child_kill, track);
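
To illustrate the unit mismatch, here is a minimal sketch of the two timer intervals, which g_timeout_add() interprets as milliseconds. It is not the actual Pacemaker code or fix. The struct, the function names and KILL_GRACE_MS are hypothetical, and it assumes that the "+5" in the excerpt was meant as a 5 second grace period between SIGTERM and SIGKILL.

```c
#include <stdint.h>

/* Assumption: 5 seconds of grace between SIGTERM and SIGKILL. */
#define KILL_GRACE_MS 5000u

/* Hypothetical container for the two intervals given to g_timeout_add(). */
struct timer_delays {
    uint32_t sigterm_ms;
    uint32_t sigkill_ms;
};

/* Intervals as computed in the excerpt: the timeout is multiplied by 1000. */
static struct timer_delays delays_as_written(uint32_t timeout)
{
    struct timer_delays d = { 1000u * timeout, 1000u * (timeout + 5u) };
    return d;
}

/* Intervals if the timeout is treated as already being in milliseconds. */
static struct timer_delays delays_if_timeout_is_ms(uint32_t timeout_ms)
{
    struct timer_delays d = { timeout_ms, timeout_ms + KILL_GRACE_MS };
    return d;
}
```

For a timeout of 61000, the first function yields 61000000 ms and 61005000 ms, while the second yields 61000 ms and 66000 ms.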

Best Regards,
Kazunori INOUE

(12.08.01 00:01), David Vossel wrote:
----- Original Message -----
From: "Kazunori INOUE" <inouek...@intellilink.co.jp>
To: "pacemaker@oss" <pacemaker@oss.clusterlabs.org>
Cc: shimaza...@intellilink.co.jp
Sent: Monday, July 30, 2012 5:13:40 AM
Subject: [Pacemaker] Timeout value of STONITH resource is too large

Hi,

I am using Pacemaker-1.1.
- glue       (2012 Jul 16) 2719:18489f275f75
- libqb      (2012 Jul 19) 11b20e19beff7f1b6003be0b4c73da8ecf936442
- corosync   (2012 Jul 12) 908ed7dcb390c0eade3dddb0cdfe181eb26b2ce2
- pacemaker  (2012 Jul 29) 33119da31c235710195c783e5c9a32c6e95b3efc

The timeout value of the _start_ operation of a STONITH resource is
too large.
Therefore, even after the start operation has timed out, the plugin
process remains.

How long after the timeout occurs does the process stay around?  Does it 
terminate a few seconds after the timeout, or does the resource wait until the 
entire duration of the sleep 3600 finishes?

-- Vossel



The following is the gdb output taken while the STONITH resource was starting.
----
    [root@dev1 ~]# gdb /usr/libexec/pacemaker/stonithd `pgrep stonithd`
    (gdb) b run_stonith_agent
    Breakpoint 1 at 0x7f03f1e00d69: file st_client.c, line 479.
    (gdb) c
    Continuing.

    Breakpoint 1, run_stonith_agent (agent=0xe0f820 "fence_legacy", action=0xe11fb0 "monitor",
     <snip>
    479     {
    (gdb) bt
    #0  run_stonith_agent (agent=0xe0f820 "fence_legacy", action=0xe11fb0 "monitor",
        victim=0x0, device_args=Traceback (most recent call last):0xcffe30, port_map=
        Traceback (most recent call last):0xcffe80, agent_result=0x7fff70214ef4,
        output=0x0, track=0xe11d20) at st_client.c:479
    #1  0x0000000000406230 in stonith_device_execute (device=0xe10ff0) at commands.c:140
    #2  0x0000000000406404 in stonith_device_dispatch (user_data=0xe10ff0) at commands.c:160
    #3  0x00007f03f224ad00 in crm_trigger_dispatch (source=0xe11160, callback=
        0x4063dd <stonith_device_dispatch>, userdata=0xe11160) at mainloop.c:105
    #4  0x0000003642638f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #5  0x000000364263c938 in ?? () from /lib64/libglib-2.0.so.0
    #6  0x000000364263cd55 in g_main_loop_run () from /lib64/libglib-2.0.so.0
    #7  0x00000000004056dc in main (argc=1, argv=0x7fff70215278) at main.c:853
    (gdb) n 15
    Detaching after fork from child process 28915.
    510       if (pid) {
    (gdb) n 15
    542                 track->pid = pid;
    (gdb) list
    537             track->stdout = p_read_fd;
    538             g_child_watch_add(pid, track->done, track);
    539             crm_trace("Op: %s on %s, pid: %d, timeout: %ds", action, agent, pid, track->timeout);
    540
    541             if (track->timeout) {
    542                 track->pid = pid;
    543                 track->timer_sigterm = g_timeout_add(1000*track->timeout, st_child_term, track);
    544                 track->timer_sigkill = g_timeout_add(1000*(track->timeout+5), st_child_kill, track);
    545
    546             } else {
    (gdb) n
    543                 track->timer_sigterm = g_timeout_add(1000*track->timeout, st_child_term, track);
    (gdb) n
    544                 track->timer_sigkill = g_timeout_add(1000*(track->timeout+5), st_child_kill, track);
    (gdb) p agent
    $1 = 0xe0f820 "fence_legacy"
    (gdb) p action
    $2 = 0xe11fb0 "monitor"
    (gdb) p args
    $3 = 0xe11500 "plugin=external/libvirt\nhostlist=dev2\nhypervisor_uri=qemu+ssh://n8/system\noption=monitor\n"
  * (gdb) p track->timeout
    $4 = 61000
  * (gdb) p 1000*track->timeout
    $5 = 61000000
----
1. I added "sleep 3600" to status() of
    /usr/lib64/stonith/plugins/external/libvirt.

    [root@dev1 external]# diff -u libvirt.ORG libvirt
    --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
    +++ libvirt     2012-07-30 13:36:19.661431208 +0900
    @@ -221,6 +221,7 @@
         ;;

         status)
    +    sleep 3600
         libvirt_check_config
         libvirt_status
         exit $?

2. service corosync start ; service pacemaker start
3. cibadmin -U -x test.xml
4. After waiting for 61 seconds (the timeout value of start):

    [root@dev1 ~]# crm_mon -rf1
    ============
    Last updated: Mon Jul 30 13:18:48 2012
    Last change: Mon Jul 30 13:15:08 2012 via cibadmin on dev1
    Stack: corosync
    Current DC: dev1 (-1788499776) - partition with quorum
    Version: 1.1.7-33119da
    2 Nodes configured, unknown expected votes
    1 Resources configured.
    ============

    Online: [ dev1 dev2 ]

    Full list of resources:

     f-2    (stonith:external/libvirt):     Started dev1 FAILED

    Migration summary:
    * Node dev2:
    * Node dev1:
       f-2: migration-threshold=1 fail-count=1000000

    Failed actions:
  *     f-2_start_0 (node=dev1, call=-1, rc=1, status=Timed Out): unknown error

    [root@dev1 ~]# ps -ef|egrep "UID|corosync|pacemaker|stonith|fence|sleep"
    UID    PID  PPID  C STIME TTY     TIME CMD
    root 28840     1  0 13:13 ?   00:00:01 corosync
    root 28858     1  0 13:13 ?   00:00:00 pacemakerd
    496  28860 28858  0 13:13 ?   00:00:00 /usr/libexec/pacemaker/cib
    root 28861 28858  0 13:13 ?   00:00:00 /usr/libexec/pacemaker/stonithd
    root 28862 28858 73 13:13 ?   00:04:16 /usr/libexec/pacemaker/lrmd
    496  28863 28858  0 13:13 ?   00:00:00 /usr/libexec/pacemaker/attrd
    496  28864 28858  0 13:13 ?   00:00:00 /usr/libexec/pacemaker/pengine
    496  28865 28858 51 13:13 ?   00:02:58 /usr/libexec/pacemaker/crmd
  * root 28915 28861  0 13:15 ?   00:00:00 /usr/bin/perl /usr/sbin/fence_legacy
  * root 28916 28915  0 13:15 ?   00:00:00 stonith -t external/libvirt -E -S
  * root 28921 28916  0 13:15 ?   00:00:00 /bin/sh /usr/lib64/stonith/plugins/external/libvirt status
    root 28925 28921  0 13:15 ?   00:00:00 sleep 3600

    [root@dev1 ~]# top -bn1
    top - 13:21:26 up 5 days,  3:23,  5 users,  load average: 1.99, 1.42, 0.72
    Tasks: 198 total,   3 running, 195 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.7%us,  0.7%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   5089052k total,  2423104k used,  2665948k free,   265756k buffers
    Swap:  1048568k total,        0k used,  1048568k free,  1717712k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  * 28862 root      20   0 83816 3412 2572 R 98.2  0.1   6:17.18 lrmd
  * 28865 hacluste  20   0  166m 6380 3428 R 98.2  0.1   4:59.84 crmd
    28860 hacluste  20   0 93888 7192 4472 S  2.0  0.1   0:00.23 cib
    29052 root      20   0 15024 1136  792 R  2.0  0.0   0:00.01 top
        1 root      20   0 19348 1520 1212 S  0.0  0.0   0:00.77 init
        2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
        3 root      RT   0     0    0    0 S  0.0  0.0   0:06.85 migration/0
        4 root      20   0     0    0    0 S  0.0  0.0  14:25.15 ksoftirqd/0
        5 root      RT   0     0    0    0 S  0.0  0.0   0:00.10 migration/0

Best Regards,
Kazunori INOUE

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started:
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

