Hi, Benjamin
primitive p_jboss_ocf ocf:heartbeat:jboss \
params java_home="/usr/lib64/jvm/java" \
jboss_home="/usr/share/jboss" jboss_pstring="java -Dprogram.name=run.sh" \
jboss_stop_timeout="30" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="240s" \
op monitor interval="20s"
params "jboss_pstring" is properly "pstring".
# crm command should refuse above configuration.
A "pstring" should be configured according to the jboss process name
that is highly depend on your environment.
Maybe your JBOSS_HOME directory is "/data/jboss-4.2.2.GA"?
If so, params "jboss_home" should be "/data/jboss-4.2.2.GA".
Parameters "pstring", "jboss_home" and "java_home" are
highly depend on your environment.
And params statusurl should be change from "http://127.0.0.1:8080"
to your wanted health-check URL.
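Applied to the primitive above, the corrected names would look roughly
like this (a sketch only: the values are Benjamin's, "statusurl" is
shown with its default as a placeholder, and only the parameters
discussed here are included):
--- corrected primitive (sketch) ---
primitive p_jboss_ocf ocf:heartbeat:jboss \
params java_home="/usr/lib64/jvm/java" \
jboss_home="/usr/share/jboss" \
pstring="java -Dprogram.name=run.sh" \
statusurl="http://127.0.0.1:8080" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="240s" \
op monitor interval="20s"
--- corrected primitive (sketch) ---
You can check what "pstring" must match by looking at the running Java
process, e.g. with "ps -ef | grep java".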
In my test environment (RHEL 5.4), the following configuration works fine.
--- crm sample configuration ---
node $id="9a7e6694-20bc-4cac-b371-23c1ef8de652" vm1
node $id="e49f96b7-b49c-4e20-8c57-1ebd585eff9c" vm2
primitive jboss-1 ocf:heartbeat:jboss \
op start interval="0" timeout="120s" on-fail="restart" \
op monitor interval="10s" timeout="60s" on-fail="restart" \
op stop interval="0" timeout="120s" on-fail="fence" \
params jboss_home="/usr/share/jboss" \
java_home="/usr/java/jdk1.6.0_24/" pstring="java -Dprogram.name=run.sh"
property $id="cib-bootstrap-options" \
dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
cluster-infrastructure="Heartbeat" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
default-resource-stickiness="INFINITY" \
default-action-timeout="120s" \
last-lrm-refresh="1304934304"
rsc_defaults $id="rsc-options" \
resource-stickiness="INFINITY" \
migration-threshold="1"
--- crm sample configuration ---
[root@VM1 ~]# crm_mon -rfA1
============
Last updated: Tue May 10 10:29:56 2011
Stack: Heartbeat
Current DC: vm1 (9a7e6694-20bc-4cac-b371-23c1ef8de652) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ vm1 ]
OFFLINE: [ vm2 ]
Full list of resources:
jboss-1 (ocf::heartbeat:jboss): Started vm1
Node Attributes:
* Node vm1:
Migration summary:
* Node vm1:
[root@VM1 ~]# ps -ef | grep jboss | grep -v grep
root 24831 1 0 09:55 ? 00:00:00 /bin/sh
/usr/share/jboss/bin/run.sh -c default -l lpg4j
root 24857 24831 0 09:55 ? 00:00:13
/usr/java/jdk1.6.0_24//bin/java -Dprogram.name=run.sh -server -Xms128m
-Xmx512m -Dsun.rmi.dgc.client.gcInterval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000 -Djava.net.preferIPv4Stack=true
-Djava.endorsed.dirs=/usr/share/jboss/lib/endorsed -classpath
/usr/share/jboss/bin/run.jar:/usr/java/jdk1.6.0_24//lib/tools.jar
org.jboss.Main -c default -l lpg4j
My tested JBoss version is jboss-4.2.2.GA.
I downloaded jboss-4.2.2.GA.zip from the following URL, unzipped the
JBoss files, and moved the "jboss-4.2.2.GA" directory to "/usr/share/jboss":
http://sourceforge.net/projects/jboss/files/JBoss/JBoss-4.2.2.GA/
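A quick sketch of those steps (assuming the zip has already been
downloaded into the current directory):
# unpack the JBoss distribution and move it into place
unzip jboss-4.2.2.GA.zip
mv jboss-4.2.2.GA /usr/share/jboss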
And I did not change any of the JBoss configuration (all defaults).
Best Regards,
NAKAHIRA Kazutomo
(2011/05/06 16:32), Dejan Muhamedagic wrote:
On Thu, May 05, 2011 at 06:39:09PM +0200, Benjamin Knoth wrote:
Hi
Am 05.05.2011 16:35, schrieb Dejan Muhamedagic:
On Thu, May 05, 2011 at 12:26:57PM +0200, Benjamin Knoth wrote:
Hi again,
I copied the jboss OCF script and modified the variables so that the
script uses my variables when I start it. Now every time I start the
OCF script I get the following:
./jboss-test start
jboss-test[6165]: DEBUG: [jboss] Enter jboss start
jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
jboss-test[6165]: DEBUG: start_jboss[jboss]: retry monitor_jboss
Something is wrong.
Typically, the start operation includes a monitor at the end to
make sure that the resource really started. In this case it
looks like the monitor repeatedly fails. You should check the
monitor operation. Take a look at the output of "crm ra info
jboss" for parameters which have an effect on monitoring. BTW, you
can test your resource without the cluster using ocf-tester.
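For example, an invocation could look roughly like this (the parameter
values are only placeholders taken from your configuration; adjust the
RA path to your installation):
# test the jboss RA outside the cluster: -n names the resource,
# -o passes resource parameters
ocf-tester -n jboss-test \
-o java_home="/usr/lib64/jvm/java" \
-o jboss_home="/usr/share/jboss" \
-o pstring="java -Dprogram.name=run.sh" \
/usr/lib/ocf/resource.d/heartbeat/jboss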
I can't find ocf-tester, or I don't know how to use it.
The JBoss log says that JBoss starts, but with the OCF script it
cannot deploy some packages. The most important part is:
18:18:16,654 ERROR [MainDeployer] Could not start deployment:
file:/data/jboss-4.2.2.GA/server/default/tmp/deploy/tmp8457743723406154025escidoc-core.ear-contents/escidoc-core.war
org.jboss.deployment.DeploymentException: URL
file:/data/jboss-4.2.2.GA/server/default/tmp/deploy/tmp8457743723406154025escidoc-core.ear-contents/escidoc-core-exp.war/
deployment failed
--- Incompletely deployed packages ---
org.jboss.deployment.DeploymentInfo@844a3a10 {
url=file:/data/jboss-4.2.2.GA/server/default/deploy/escidoc-core.ear }
deployer: org.jboss.deployment.EARDeployer@40f940f9
status: Deployment FAILED reason: URL
file:/data/jboss-4.2.2.GA/server/default/tmp/deploy/tmp8457743723406154025escidoc-core.ear-contents/escidoc-core-exp.war/
deployment failed
state: FAILED
watch: file:/data/jboss-4.2.2.GA/server/default/deploy/escidoc-core.ear
altDD: null
lastDeployed: 1304612289701
lastModified: 1304612278000
mbeans:
After 4 minutes JBoss is shut down by Pacemaker.
If I run the init script normally, it runs fine and all important
packages deploy.
I checked the difference between the processes started by the init
script and by the OCF script from Pacemaker:
pacemaker
root 20074 0.0 0.0 12840 1792 ? S 17:56 0:00 /bin/sh
/usr/lib/ocf/resource.d//heartbeat/jboss start
root 20079 0.0 0.0 48336 1368 ? S 17:56 0:00 su -
jboss -s /bin/bash -c export JAVA_HOME=/usr/lib64/jvm/java;\n?
export JBOSS_HOME=/usr/share/jboss;\n?
/usr/share/jboss/bin/run.sh -c default
-Djboss.bind.address=0.0.0.0
init-script
root 20079 0.0 0.0 48336 1368 ? S 17:56 0:00 su
jboss -s /bin/bash -c /usr/share/jboss/bin/run.sh -c default
-Djboss.bind.address=0.0.0.0
No idea. Perhaps somebody using jboss here can take a look. Or
you could experiment a bit to find out which part makes the
difference. Apart from the two exported vars, the rest of the
command line is the same. In addition the OCF RA does 'su -'.
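If you want to see exactly what differs, a quick sketch like this may
help (run as root; it compares the environment a login shell gives the
jboss user against a plain su):
# 'su -' starts a login shell and resets the environment;
# plain 'su' inherits most of the caller's environment
su - jboss -s /bin/bash -c 'env | sort' > /tmp/env.login
su jboss -s /bin/bash -c 'env | sort' > /tmp/env.plain
diff /tmp/env.login /tmp/env.plain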
Thanks,
Dejan
Cheers
Benjamin
Thanks,
Dejan
Cheers
Benjamin
Am 05.05.2011 12:03, schrieb Benjamin Knoth:
Hi,
Am 05.05.2011 11:46, schrieb Dejan Muhamedagic:
On Wed, May 04, 2011 at 03:44:02PM +0200, Benjamin Knoth wrote:
Am 04.05.2011 13:18, schrieb Benjamin Knoth:
Hi,
Am 04.05.2011 12:18, schrieb Dejan Muhamedagic:
Hi,
On Wed, May 04, 2011 at 10:37:40AM +0200, Benjamin Knoth wrote:
Am 04.05.2011 09:42, schrieb Florian Haas:
On 05/04/2011 09:31 AM, Benjamin Knoth wrote:
Hi Florian,
I tested it with the OCF agent, but I couldn't get it to run.
Well that's really helpful information. Logs? Error messages? Anything?
Logs
May 4 09:55:10 vm36 lrmd: [19214]: WARN: p_jboss_ocf:start process (PID
27702) timed out (try 1). Killing with signal SIGTERM (15).
You need to set/increase the timeout for the start operation to
match the maximum expected start time. Take a look at "crm ra
info jboss" for minimum values.
May 4 09:55:10 vm36 attrd: [19215]: info: find_hash_entry: Creating
hash entry for fail-count-p_jboss_ocf
May 4 09:55:10 vm36 lrmd: [19214]: WARN: operation start[342] on
ocf::jboss::p_jboss_ocf for client 19217, its parameters:
CRM_meta_name=[start] crm_feature_set=[3.0.1]
java_home=[/usr/lib64/jvm/java] CRM_meta_timeout=[240000]
jboss_stop_timeout=[30] jboss_home=[/usr/share/jboss] jboss_pstring=[java
-Dprogram.name=run.sh] : pid [27702] timed out
May 4 09:55:10 vm36 attrd: [19215]: info: attrd_trigger_update: Sending
flush op to all hosts for: fail-count-p_jboss_ocf (INFINITY)
May 4 09:55:10 vm36 crmd: [19217]: WARN: status_from_rc: Action 64
(p_jboss_ocf_start_0) on vm36 failed (target: 0 vs. rc: -2): Error
May 4 09:55:10 vm36 lrmd: [19214]: info: rsc:p_jboss_ocf:346: stop
May 4 09:55:10 vm36 attrd: [19215]: info: attrd_perform_update: Sent
update 2294: fail-count-p_jboss_ocf=INFINITY
May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
- p_jboss_lsb_monitor_0 failed with rc=5: Preventing p_jboss_lsb from
re-starting on vm36
May 4 09:55:10 vm36 crmd: [19217]: WARN: update_failcount: Updating
failcount for p_jboss_ocf on vm36 after failed start: rc=-2
(update=INFINITY, time=1304495710)
May 4 09:55:10 vm36 attrd: [19215]: info: find_hash_entry: Creating
hash entry for last-failure-p_jboss_ocf
May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
p_jboss_cs_monitor_0 found resource p_jboss_cs active on vm36
May 4 09:55:10 vm36 crmd: [19217]: info: abort_transition_graph:
match_graph_event:272 - Triggered transition abort (complete=0,
tag=lrm_rsc_op, id=p_jboss_ocf_start_0,
magic=2:-2;64:1375:0:fc16910d-2fe9-4daa-834a-348a4c7645ef,
cib=0.535.2) : Event failed
May 4 09:55:10 vm36 attrd: [19215]: info: attrd_trigger_update: Sending
flush op to all hosts for: last-failure-p_jboss_ocf (1304495710)
May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
- p_jboss_init_monitor_0 failed with rc=5: Preventing p_jboss_init from
re-starting on vm36
May 4 09:55:10 vm36 crmd: [19217]: info: match_graph_event: Action
p_jboss_ocf_start_0 (64) confirmed on vm36 (rc=4)
May 4 09:55:10 vm36 attrd: [19215]: info: attrd_perform_update: Sent
update 2297: last-failure-p_jboss_ocf=1304495710
May 4 09:55:10 vm36 pengine: [19216]: WARN: unpack_rsc_op: Processing
failed op p_jboss_ocf_start_0 on vm36: unknown exec error (-2)
May 4 09:55:10 vm36 crmd: [19217]: info: te_rsc_command: Initiating
action 1: stop p_jboss_ocf_stop_0 on vm36 (local)
May 4 09:55:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
p_jboss_ocf_monitor_0 found resource p_jboss_ocf active on vm37
May 4 09:55:10 vm36 crmd: [19217]: info: do_lrm_rsc_op: Performing
key=1:1376:0:fc16910d-2fe9-4daa-834a-348a4c7645ef op=p_jboss_ocf_stop_0 )
May 4 09:55:10 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
(ocf::heartbeat:jboss): Stopped
May 4 09:55:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
has failed INFINITY times on vm36
May 4 09:55:10 vm36 pengine: [19216]: WARN: common_apply_stickiness:
Forcing p_jboss_ocf away from vm36 after 1000000 failures (max=1000000)
May 4 09:59:10 vm36 pengine: [19216]: info: unpack_config: Node scores:
'red' = -INFINITY, 'yellow' = 0, 'green' = 0
May 4 09:59:10 vm36 crmd: [19217]: WARN: status_from_rc: Action 50
(p_jboss_ocf_start_0) on vm37 failed (target: 0 vs. rc: -2): Error
May 4 09:59:10 vm36 pengine: [19216]: info: determine_online_status:
Node vm36 is online
May 4 09:59:10 vm36 crmd: [19217]: WARN: update_failcount: Updating
failcount for p_jboss_ocf on vm37 after failed start: rc=-2
(update=INFINITY, time=1304495950)
May 4 09:59:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Hard error
- p_jboss_lsb_monitor_0 failed with rc=5: Preventing p_jboss_lsb from
re-starting on vm36
May 4 09:59:10 vm36 crmd: [19217]: info: abort_transition_graph:
match_graph_event:272 - Triggered transition abort (complete=0,
tag=lrm_rsc_op, id=p_jboss_ocf_start_0,
magic=2:-2;50:1377:0:fc16910d-2fe9-4daa-834a-348a4c7645ef,
cib=0.535.12) : Event failed
May 4 09:59:10 vm36 pengine: [19216]: notice: unpack_rsc_op: Operation
p_jboss_cs_monitor_0 found resource p_jboss_cs active on vm36
May 4 09:59:10 vm36 crmd: [19217]: info: match_graph_event: Action
p_jboss_ocf_start_0 (50) confirmed on vm37 (rc=4)
May 4 09:59:10 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
(ocf::heartbeat:jboss): Stopped
May 4 09:59:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
has failed INFINITY times on vm37
May 4 09:59:10 vm36 pengine: [19216]: WARN: common_apply_stickiness:
Forcing p_jboss_ocf away from vm37 after 1000000 failures (max=1000000)
May 4 09:59:10 vm36 pengine: [19216]: info: get_failcount: p_jboss_ocf
has failed INFINITY times on vm36
May 4 09:59:10 vm36 pengine: [19216]: info: native_color: Resource
p_jboss_ocf cannot run anywhere
May 4 09:59:10 vm36 pengine: [19216]: notice: LogActions: Leave
resource p_jboss_ocf (Stopped)
May 4 09:59:31 vm36 pengine: [19216]: notice: native_print: p_jboss_ocf
(ocf::heartbeat:jboss): Stopped
....
Now I don't know how I can reset the resource p_jboss_ocf to test it again.
crm resource cleanup p_jboss_ocf
That's the way I know, but when I run this command from the shell or
from the crm shell, in both cases I get:
Cleaning up p_jboss_ocf on vm37
Cleaning up p_jboss_ocf on vm36
But if I look at the monitoring with crm_mon -1, every time I get:
Failed actions:
p_jboss_ocf_start_0 (node=vm36, call=-1, rc=1, status=Timed Out):
unknown error
p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
installed
p_jboss_ocf_start_0 (node=vm37, call=281, rc=-2, status=Timed Out):
unknown exec error
p_jboss was deleted in the config yesterday.
For demonstration:
15:34:22 ~ # crm_mon -1
Failed actions:
p_jboss_ocf_start_0 (node=vm36, call=376, rc=-2, status=Timed Out):
unknown exec error
p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
installed
p_jboss_ocf_start_0 (node=vm37, call=283, rc=-2, status=Timed Out):
unknown exec error
15:35:02 ~ # crm resource cleanup p_jboss_ocf
INFO: no curses support: you won't see colors
Cleaning up p_jboss_ocf on vm37
Cleaning up p_jboss_ocf on vm36
15:39:12 ~ # crm resource cleanup p_jboss
INFO: no curses support: you won't see colors
Cleaning up p_jboss on vm37
Cleaning up p_jboss on vm36
15:39:19 ~ # crm_mon -1
Failed actions:
p_jboss_ocf_start_0 (node=vm36, call=376, rc=-2, status=Timed Out):
unknown exec error
p_jboss_monitor_0 (node=vm37, call=205, rc=5, status=complete): not
installed
p_jboss_ocf_start_0 (node=vm37, call=283, rc=-2, status=Timed Out):
unknown exec error
Strange: after I edited the config, all other failed actions were
removed; only these failed actions are still displayed:
Failed actions:
p_jboss_ocf_start_0 (node=vm36, call=380, rc=-2, status=Timed Out):
unknown exec error
p_jboss_ocf_start_0 (node=vm37, call=287, rc=-2, status=Timed Out):
unknown exec error
Strange, perhaps you ran into a bug here. You can open a bugzilla
with hb_report.
Anyway, you should fix the timeout issue.
I know, but what should I do to resolve this issue?
My config entry for jboss is:
primitive p_jboss_ocf ocf:heartbeat:jboss \
params java_home="/usr/lib64/jvm/java" \
jboss_home="/usr/share/jboss" jboss_pstring="java -Dprogram.name=run.sh" \
jboss_stop_timeout="30" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="240s" \
op monitor interval="20s"
In the worst case JBoss needs at most 120s, and that really is the worst case.
Cheers,
Benjamin
Thanks,
Dejan
And after some tests I have some no-longer-existing resources in the
Failed actions list. How can I delete them?
The same way.
Thanks,
Dejan
Thx
Benjamin
Florian
--
Benjamin Knoth
Max Planck Digital Library (MPDL)
Systemadministration
Amalienstrasse 33
80799 Munich, Germany
http://www.mpdl.mpg.de
Mail: kn...@mpdl.mpg.de
Phone: +49 89 38602 202
Fax: +49-89-38602-280
--
NAKAHIRA Kazutomo
Infrastructure Software Technology Unit
NTT Open Source Software Center
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker