Find the first core file produced when you attempted sqstart. You can list the core files, oldest first, with

    ls -ltr core.*

and compare each core file's timestamp with the time sqstart was issued. Then run

    file <core_file>

to find the program that produced the core file, and get a backtrace from it:

    gdb <program_file> <core_file>
    (gdb) bt

If gdb is not installed on the cluster, you may need to install it with

    yum install gdb

Selva

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 9:34 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

The Trafodion installation looks OK, judging by the related classpath check. I checked trafodion.dtm.log and found errors like the one below, and there are many core dumps in $TRAF_HOME/sql/scripts.

    2017-11-29 00:30:01,360 ERROR transactional.TransactionManager: doAbortX UnknownTransactionException for transaction 1691649 participantNum 6 Location TRAFODION.TPCC.STOCK,,1498022419057.1fc4d0ba0a5191b0325ea196985616d4.
    org.apache.hadoop.hbase.client.transactional.UnknownTransactionException: java.io.IOException: UnknownTransactionException
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$TransactionManagerCallable.doAbortX(TransactionManager.java:973)
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2405)
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2403)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

    -rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.11978
    -rw-------. 1 trafodion trafodion   14561280 Jun 21 21:10 core.11979
    -rw-------. 1 trafodion trafodion   14561280 Jun 21 21:10 core.11980
    -rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.12130
    -rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.12151
    -rw-------. 1 trafodion trafodion  147271680 Nov 28 22:16 core.1506
    -rw-r--r--. 1 trafodion trafodion  162172000 Nov 28 04:32 core.2017-11-28_04-24-31.ZSM000.6663.mxssmp
    -rw-------. 1 trafodion trafodion 2237276160 Jun 21 04:44 core.24494
    -rw-------. 1 trafodion trafodion  987738112 Sep  6 02:36 core.3926
    -rw-------. 1 trafodion trafodion  986812416 Sep  6 02:36 core.3970
    -rw-------. 1 trafodion trafodion 2353975296 Jun 21 09:03 core.51428
    -rw-------. 1 trafodion trafodion  111112192 Nov 28 22:17 core.5279
    -rw-------. 1 trafodion trafodion  111112192 Nov 28 23:06 core.55161
    -rw-------. 1 trafodion trafodion    1552384 Nov 28 22:05 core.5604
    -rw-------. 1 trafodion trafodion  111132672 Nov 28 23:11 core.57098
    -rw-------. 1 trafodion trafodion   48193536 Nov 28 23:11 core.57646
    -rw-------. 1 trafodion trafodion  111112192 Nov 28 23:07 core.58026
    -rw-------. 1 trafodion trafodion  885874688 Nov 28 22:51 core.58335
    -rw-------. 1 trafodion trafodion  142344192 Nov 28 23:07 core.58551
    -rw-------. 1 trafodion trafodion  133554176 Nov 28 22:17 core.6053
    -rw-------. 1 trafodion trafodion  111112192 Nov 28 23:07 core.61491
    -rw-------. 1 trafodion trafodion  111112192 Nov 28 22:16 core.65350

Jack Huang
Dell EMC | CTD MRES Cyclone Group
mobile +86-13880577652
jack.hu...@dell.com

From: Prashanth Vasudev [mailto:prashanth.vasu...@esgyn.com]
Sent: Wednesday, November 29, 2017 1:25 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

1. Please also check that all steps in the installer completed successfully.
2. From the shell, please check that the hbase classpath includes the trx jars:

    $ hbase classpath | grep trx

Prashanth

From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com]
Sent: Tuesday, November 28, 2017 9:21 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

It looks like the Transaction Manager failed to come up for some reason. The log directory $TRAF_HOME/logs should have files whose names start with tm_<nid>.log and trafodion_dtm.log. These log files might give some clue to the problem. Until the Transaction Manager comes up, the other processes will not be started.

Also check whether there is a core file from the TM program. The core file can be found in the directory pointed to by /proc/sys/kernel/core_pattern. If no directory is configured there, the core file may be found in $TRAF_HOME/sql/scripts.

Selva

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 9:09 PM
To: Yuan <yuan....@esgyn.cn>; user@trafodion.incubator.apache.org
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

There is only 1 Trafodion node, but the server has 128G of memory configured. Is that enough?
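
Selva's pointers quoted above (take the core location from /proc/sys/kernel/core_pattern, and fall back to $TRAF_HOME/sql/scripts when no directory is configured) can be sketched as two small shell helpers. This is only a sketch under stated assumptions: the function names core_dir_from_pattern and cores_since are made up for illustration, GNU find and coreutils are assumed, and any pattern without a leading slash is simply mapped to the fallback directory.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Trafodion.

# Print the directory implied by a core_pattern value.
#   $1 = pattern (the contents of /proc/sys/kernel/core_pattern)
#   $2 = fallback directory used when the pattern has no absolute directory
core_dir_from_pattern() {
    local pattern=$1 fallback=$2
    case $pattern in
        '|'*)
            # A leading '|' means the kernel pipes the dump to a helper
            # program, so there is no directory to report.
            echo "core_pattern pipes cores to: ${pattern#|}" >&2
            return 1
            ;;
        /*)
            dirname -- "$pattern"
            ;;
        *)
            # Simplification: treat every relative pattern as the fallback.
            echo "$fallback"
            ;;
    esac
}

# List core files in directory $1 modified after time $2, oldest first.
# Relies on GNU find (-newermt, -printf).
cores_since() {
    local dir=$1 since=$2
    find "$dir" -maxdepth 1 -type f -name 'core*' -newermt "$since" \
        -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-
}

# Example use (the timestamp is a placeholder for when sqstart was issued):
#   dir=$(core_dir_from_pattern "$(cat /proc/sys/kernel/core_pattern)" \
#                               "$TRAF_HOME/sql/scripts")
#   cores_since "$dir" '2017-11-28 22:00:00' | head -n 1
```

The first path printed by cores_since is the oldest core written after the given time, which is the file worth inspecting first.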

From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, November 29, 2017 1:07 PM
To: Huang, Jack <jack.hu...@emc.com>; user@trafodion.incubator.apache.org
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

How many Trafodion nodes do you have? How much memory does each node have? I think you configured too many mxosrvrs.

Best regards,
Yuan

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Wednesday, November 29, 2017 12:16 PM
To: user@trafodion.incubator.apache.org
Cc: Liu, Yuan (Yuan) <yuan....@esgyn.cn>; Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

Sigh! I ran ckillall and sqstart, and several minutes later the Trafodion environment is still down!

    [trafodion@trafodion logs]$ sqcheck
    *** Checking Trafodion Environment ***

    Checking if processes are up.
    Checking attempt: 1; user specified max: 2. Execution time in seconds: 3.

    The Trafodion environment is not up at all, or partially up and not operational. Check the logs.

    Process         Configured      Actual      Down
    -------         ----------      ------      ----
    DTM             2               0           \$TM0 \$TM1
    RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
    DcsMaster       1               1
    DcsServer       1               0           1
    mxosrvr         100             0           100
    RestServer      1               1

From: Huang, Jack
Sent: Wednesday, November 29, 2017 10:13 AM
To: 'user@trafodion.incubator.apache.org' <user@trafodion.incubator.apache.org>
Cc: 'Liu, Yuan (Yuan)' <yuan....@esgyn.cn>; 'Eric Owhadi' <eric.owh...@esgyn.com>; 'Narendra Goyal' <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

Thanks all. ckillall/sqstart is working now.

From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, November 29, 2017 10:07 AM
To: user@trafodion.incubator.apache.org
Subject: RE: DCS is not started

Please use cstat to check whether any processes still exist. If so, use ckillall to kill all of them, then run cstat again.

Best regards,
Yuan

From: Narendra Goyal [mailto:narendra.go...@esgyn.com]
Sent: Wednesday, November 29, 2017 10:05 AM
To: user@trafodion.incubator.apache.org
Subject: RE: DCS is not started

Hi Jack,

Please try:

* ckillall
  * this should kill all the orphan processes in the environment
* sqstart

-Narendra

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 6:03 PM
To: user@trafodion.incubator.apache.org
Subject: DCS is not started

Hi,

My Trafodion environment is down. How can I recover it?

    [trafodion@trafodion ~]$ sqcheck
    *** Checking Trafodion Environment ***

    Checking if processes are up.
    Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.

    The Trafodion environment is not up at all, or partially up and not operational. Check the logs.

    Process         Configured      Actual      Down
    -------         ----------      ------      ----
    DTM             2               0           \$TM0 \$TM1
    RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
    DcsMaster       1               0           1
    DcsServer       1               0           1
    mxosrvr         100             0           100
    RestServer      1               1

    The Trafodion environment is down.
    [trafodion@trafodion ~]$ dcsstart
    *** Checking Trafodion Environment ***

    Checking if processes are up.
    Checking attempt: 1; user specified max: 1. Execution time in seconds: 0.

    The Trafodion environment is not up at all, or partially up and not operational. Check the logs.

    Process         Configured      Actual      Down
    -------         ----------      ------      ----
    DTM             2               0           \$TM0 \$TM1
    RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
    DcsMaster       1               0           1
    DcsServer       1               0           1
    mxosrvr         100             0           100
    RestServer      1               1

    The Trafodion environment is down.
    DCS is not started. Please start Trafodion ...
    [trafodion@trafodion ~]$ sqstart
    Checking orphan processes: 3.
    There are orphan processes from a previous SQ instance.
    uid        pid   ppid  wchan  rss   vsz    time     stat cmd
    ---        ---   ----  -----  ---   ---    ----     ---- ---
    trafodion  5952  1     hrtime 39412 402572 00:18:09 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
    trafodion  5953  1     hrtime 39192 402568 00:14:04 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
    trafodion  5938  1     poll_s 1440  21232  00:00:00 S    mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -env MPI_TMPDIR /home/trafodion/apache-trafodion-2.1.0/tmp -env TRAF_HOME /home/trafodion/apache-trafodion-2.1.0 -np 2 /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
    trafodion  11720 6953  wait   1780  106556 00:00:00 S+   /bin/bash /home/trafodion/apache-trafodion-2.1.0/sql/scripts/sqstart
    [trafodion@trafodion ~]$ sqstop
    SQ environment is not up.
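
Pulling together the advice given in the replies in this thread (cstat, ckillall, cstat again, sqstart, then sqcheck), the recovery sequence might be sketched as below. The function name recover_trafodion is hypothetical. The sketch assumes the Trafodion environment has been sourced so that the four commands are on the PATH, it passes them no options, and it does not try to interpret their output: the operator still has to read what cstat and sqcheck print.

```shell
#!/usr/bin/env bash
# Sketch only: recover_trafodion is an illustrative wrapper, not a Trafodion tool.

recover_trafodion() {
    local cmd

    # Refuse to start if any of the commands named in the thread is missing.
    for cmd in cstat ckillall sqstart sqcheck; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd (is the Trafodion environment sourced?)" >&2
            return 127
        fi
    done

    # Order follows the replies: look for leftover processes, kill them,
    # look again, start the environment, then check it.
    # The function's exit status is that of the final sqcheck.
    for cmd in cstat ckillall cstat sqstart sqcheck; do
        echo "==> $cmd"
        "$cmd"
    done
}

# Example use:
#   recover_trafodion
```

If the second cstat still lists processes, it is probably safer to stop and investigate before letting sqstart run, since sqstart itself complained about orphan processes in the original report.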