The Trafodion installer looks OK; I checked the related classpath.
I checked trafodion.dtm.log and found errors like the one below. There are also
many core dumps in $TRAF_HOME/sql/scripts (listed after the stack trace).
2017-11-29 00:30:01,360 ERROR transactional.TransactionManager: doAbortX UnknownTransactionException for transaction 1691649 participantNum 6 Location TRAFODION.TPCC.STOCK,,1498022419057.1fc4d0ba0a5191b0325ea196985616d4.
org.apache.hadoop.hbase.client.transactional.UnknownTransactionException: java.io.IOException: UnknownTransactionException
    at org.apache.hadoop.hbase.client.transactional.TransactionManager$TransactionManagerCallable.doAbortX(TransactionManager.java:973)
    at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2405)
    at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2403)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
-rw-------. 1 trafodion trafodion 6422528 Jun 21 21:10 core.11978
-rw-------. 1 trafodion trafodion 14561280 Jun 21 21:10 core.11979
-rw-------. 1 trafodion trafodion 14561280 Jun 21 21:10 core.11980
-rw-------. 1 trafodion trafodion 6422528 Jun 21 21:10 core.12130
-rw-------. 1 trafodion trafodion 6422528 Jun 21 21:10 core.12151
-rw-------. 1 trafodion trafodion 147271680 Nov 28 22:16 core.1506
-rw-r--r--. 1 trafodion trafodion 162172000 Nov 28 04:32 core.2017-11-28_04-24-31.ZSM000.6663.mxssmp
-rw-------. 1 trafodion trafodion 2237276160 Jun 21 04:44 core.24494
-rw-------. 1 trafodion trafodion 987738112 Sep 6 02:36 core.3926
-rw-------. 1 trafodion trafodion 986812416 Sep 6 02:36 core.3970
-rw-------. 1 trafodion trafodion 2353975296 Jun 21 09:03 core.51428
-rw-------. 1 trafodion trafodion 111112192 Nov 28 22:17 core.5279
-rw-------. 1 trafodion trafodion 111112192 Nov 28 23:06 core.55161
-rw-------. 1 trafodion trafodion 1552384 Nov 28 22:05 core.5604
-rw-------. 1 trafodion trafodion 111132672 Nov 28 23:11 core.57098
-rw-------. 1 trafodion trafodion 48193536 Nov 28 23:11 core.57646
-rw-------. 1 trafodion trafodion 111112192 Nov 28 23:07 core.58026
-rw-------. 1 trafodion trafodion 885874688 Nov 28 22:51 core.58335
-rw-------. 1 trafodion trafodion 142344192 Nov 28 23:07 core.58551
-rw-------. 1 trafodion trafodion 133554176 Nov 28 22:17 core.6053
-rw-------. 1 trafodion trafodion 111112192 Nov 28 23:07 core.61491
-rw-------. 1 trafodion trafodion 111112192 Nov 28 22:16 core.65350
Jack Huang
Dell EMC | CTD MRES Cyclone Group
mobile +86-13880577652
[email protected]
From: Prashanth Vasudev [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 1:25 PM
To: [email protected]; Yuan <[email protected]>
Cc: Eric Owhadi <[email protected]>; Narendra Goyal
<[email protected]>
Subject: RE: DCS is not started
1. Please also check that all steps in the installer completed
successfully.
2. From the shell, please check that the hbase classpath includes the trx jars:
$ hbase classpath | grep trx
Prashanth
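The classpath check suggested above can be made easier to read by splitting the colon-separated output into one entry per line. This is only a minimal sketch: the helper name list_trx_entries is hypothetical, and it assumes that `hbase classpath` prints a single colon-separated string.

```shell
# Hypothetical helper: print the classpath entries that mention "trx",
# one per line. Reads a colon-separated classpath string on stdin.
# Returns non-zero (grep's status) when no entry matches.
list_trx_entries() {
    tr ':' '\n' | grep -i 'trx'
}
```

Usage would be `hbase classpath | list_trx_entries`. No output, together with a non-zero exit status, would suggest that the trx jars are missing from the classpath.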
From: Selva Govindarajan [mailto:[email protected]]
Sent: Tuesday, November 28, 2017 9:21 PM
To: [email protected]; Yuan <[email protected]>
Cc: Eric Owhadi <[email protected]>; Narendra Goyal <[email protected]>
Subject: RE: DCS is not started
It looks like the Transaction Manager failed to come up for some reason. The
log directory $TRAF_HOME/logs should have files starting with tm_<nid>.log and
trafodion_dtm.log. These log files might give some clue to the problem.
Unless the Transaction Manager comes up, other processes will not be started.
Also check whether there is a core file from the TM program. The core file can be
found in the directory pointed to by /proc/sys/kernel/core_pattern. If no
directory is configured there, the core file may be found in $TRAF_HOME/sql/scripts.
Selva
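The core file lookup described above can be sketched as a small shell function. The function name core_dir_from_pattern is hypothetical. The sketch assumes standard Linux core_pattern behavior: a pattern starting with "|" is piped to a helper program, an absolute pattern names the dump directory, and anything else is relative to the working directory of the crashing process, which this thread takes to be $TRAF_HOME/sql/scripts.

```shell
# Hypothetical helper: print the directory where core files are expected.
#   $1 = contents of /proc/sys/kernel/core_pattern
#   $2 = fallback directory (assumed working directory of the process)
core_dir_from_pattern() {
    case "$1" in
        '|'*)
            # Cores are handed to a helper program, not written by the kernel.
            echo "core dumps are piped to a program: $1" >&2
            return 1
            ;;
        /*)
            # Absolute pattern: the directory part is where cores land.
            dirname "$1"
            ;;
        */*)
            # Relative pattern containing a directory component.
            printf '%s/%s\n' "$2" "$(dirname "$1")"
            ;;
        *)
            # Bare file name pattern: cores land in the working directory.
            printf '%s\n' "$2"
            ;;
    esac
}
```

A possible invocation is `core_dir_from_pattern "$(cat /proc/sys/kernel/core_pattern)" "$TRAF_HOME/sql/scripts"`, followed by `ls -l` on the printed directory and on $TRAF_HOME/logs to look for the TM log files.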
From: Huang, Jack [mailto:[email protected]]
Sent: Tuesday, November 28, 2017 9:09 PM
To: Yuan <[email protected]>; [email protected]
Cc: Eric Owhadi <[email protected]>; Narendra Goyal <[email protected]>
Subject: RE: DCS is not started
There is only 1 Trafodion node, but the server is configured with 128 GB of memory. Is that enough?
Jack Huang
From: Liu, Yuan (Yuan) [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 1:07 PM
To: Huang, Jack <[email protected]>; [email protected]
Cc: Eric Owhadi <[email protected]>; Narendra Goyal <[email protected]>
Subject: RE: DCS is not started
How many Trafodion nodes do you have? How much memory does each node have? I think
you configured too many mxosrvrs.
Best regards,
Yuan
From: Huang, Jack [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 12:16 PM
To: [email protected]
Cc: Liu, Yuan (Yuan) <[email protected]>; Eric Owhadi <[email protected]>; Narendra Goyal <[email protected]>
Subject: RE: DCS is not started
Sigh! I ran ckillall and sqstart, and several minutes later the Trafodion
environment is still down!
[trafodion@trafodion logs]$ sqcheck
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 3.
The Trafodion environment is not up at all, or partially up and not
operational. Check the logs.
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1
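To pick out the components that are down from sqcheck output like the above, a short awk filter may help. This is a sketch under assumptions: the function name list_down_components is hypothetical, and it assumes whitespace-separated columns in the order Process, Configured, Actual, Down, as they appear in this thread.

```shell
# Hypothetical helper: read sqcheck output on stdin and print every
# component whose Actual count is lower than its Configured count.
# Lines whose 2nd and 3rd fields are not both integers are ignored.
list_down_components() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && ($3 + 0) < ($2 + 0) {
        printf "%s: %d of %d running\n", $1, $3, $2
    }'
}
```

Run as `sqcheck | list_down_components`. For the table above it would report DTM, RMS, DcsServer and mxosrvr, and skip DcsMaster and RestServer because their Actual count equals the Configured count.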
Jack Huang
From: Huang, Jack
Sent: Wednesday, November 29, 2017 10:13 AM
To: '[email protected]' <[email protected]>
Cc: 'Liu, Yuan (Yuan)' <[email protected]>; 'Eric Owhadi' <[email protected]>; 'Narendra Goyal' <[email protected]>
Subject: RE: DCS is not started
Thanks all. ckillall/sqstart is working now.
Jack Huang
From: Liu, Yuan (Yuan) [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 10:07 AM
To: [email protected]
Subject: RE: DCS is not started
Please use cstat to check whether any processes still exist. If so, use ckillall
to kill all of them, and then run cstat again.
Best regards,
Yuan
From: Narendra Goyal [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 10:05 AM
To: [email protected]
Subject: RE: DCS is not started
Hi Jack,
Please try:
- ckillall
  o this should kill all the orphan processes in the environment
- sqstart
-Narendra
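The two steps above, followed by a status check, can be wrapped in one function. This is a sketch only: the function name restart_trafodion is hypothetical, it assumes the Trafodion environment has been sourced so that ckillall, sqstart and sqcheck are on the PATH, and it makes no assumption about the exit status of those commands.

```shell
# Hypothetical wrapper for the restart sequence suggested in this thread:
# kill orphan processes, start the environment, then check its state.
restart_trafodion() {
    # Fail early if any of the Trafodion commands cannot be found.
    for cmd in ckillall sqstart sqcheck; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd (is the Trafodion environment sourced?)" >&2
            return 1
        fi
    done
    ckillall   # kill orphan processes left from a previous instance
    sqstart    # start the Trafodion environment
    sqcheck    # report which processes are up
}
```

Call `restart_trafodion` as the trafodion user and read the sqcheck table it prints at the end. If DTM still shows 0 running, the TM logs and core files are the next place to look.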
From: Huang, Jack [mailto:[email protected]]
Sent: Tuesday, November 28, 2017 6:03 PM
To: [email protected]
Subject: DCS is not started
Hi,
My Trafodion environment is down. How can I recover it?
[trafodion@trafodion ~]$ sqcheck
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
The Trafodion environment is not up at all, or partially up and not
operational. Check the logs.
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               0           1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1
The Trafodion environment is down.
[trafodion@trafodion ~]$ dcsstart
*** Checking Trafodion Environment ***
Checking if processes are up.
Checking attempt: 1; user specified max: 1. Execution time in seconds: 0.
The Trafodion environment is not up at all, or partially up and not
operational. Check the logs.
Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               0           1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1
The Trafodion environment is down.
DCS is not started. Please start Trafodion ...
[trafodion@trafodion ~]$ sqstart
Checking orphan processes: 3.
There are orphan processes from a previous SQ instance.
uid       pid   ppid wchan  rss   vsz    time     stat cmd
---       ---   ---- -----  ---   ---    ----     ---- ---
trafodion 5952  1    hrtime 39412 402572 00:18:09 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion 5953  1    hrtime 39192 402568 00:14:04 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion 5938  1    poll_s 1440  21232  00:00:00 S    mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -env MPI_TMPDIR /home/trafodion/apache-trafodion-2.1.0/tmp -env TRAF_HOME /home/trafodion/apache-trafodion-2.1.0 -np 2 /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion 11720 6953 wait   1780  106556 00:00:00 S+   /bin/bash /home/trafodion/apache-trafodion-2.1.0/sql/scripts/sqstart
[trafodion@trafodion ~]$ sqstop
SQ environment is not up.
Jack Huang