Please check that the hbase-site.xml config file on your node is correct. If 
you are using CDH/HDP, the default file is /etc/hbase/conf/hbase-site.xml.


Thanks,
Eason


From: "Huang, Jack" <jack.hu...@dell.com>
Reply-To: "user@trafodion.incubator.apache.org" 
<user@trafodion.incubator.apache.org>
Date: Wednesday, 29 November 2017 at 13:48
To: "user@trafodion.incubator.apache.org" 
<user@trafodion.incubator.apache.org>, Yuan Liu <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>, Narendra Goyal 
<narendra.go...@esgyn.com>
Subject: RE: DCS is not started

Very useful, Thanks.
I will follow the instructions to triage it.


Jack Huang
Dell EMC | CTD MRES Cyclone Group
mobile +86-13880577652
jack.hu...@dell.com




From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com]
Sent: Wednesday, November 29, 2017 1:41 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal 
<narendra.go...@esgyn.com>
Subject: RE: DCS is not started

Find the first core file produced when you ran sqstart. To list the core 
files, oldest first:

ls -ltr core.*

Compare the timestamp of each core file with the time sqstart was issued.

Then run

file <core_file>

to find the program that produced the core file, and get a backtrace with

gdb <program_file> <core_file>
bt

If gdb is not installed on the cluster, you may need to install it with 
yum install gdb.
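
The timestamp comparison above can be sketched as a small shell helper. This is only an illustration, assuming GNU find (for -newermt and -printf); the function name cores_since is hypothetical and not part of Trafodion.

```shell
# Hypothetical helper: list core.* files in a directory that were modified
# after a given time, oldest first. Assumes GNU find.
# Usage: cores_since "<time sqstart was issued>" [directory]
cores_since() {
    since="$1"
    dir="${2:-.}"
    # Print "<mtime in seconds> <path>", sort by mtime, then drop the mtime.
    find "$dir" -maxdepth 1 -type f -name 'core.*' -newermt "$since" \
        -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-
}
```

For example, cores_since "2017-11-28 22:00" "$TRAF_HOME/sql/scripts" | head -1 would print the first core file written after that time, which can then be passed to file and gdb.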

Selva

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 9:34 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

The Trafodion installer looks OK; I checked the related classpath.

I checked trafodion.dtm.log and found errors like the following, and there 
are many core dumps in $TRAF_HOME/sql/scripts:

2017-11-29 00:30:01,360 ERROR transactional.TransactionManager: doAbortX UnknownTransactionException for transaction 1691649 participantNum 6 Location TRAFODION.TPCC.STOCK,,1498022419057.1fc4d0ba0a5191b0325ea196985616d4.
org.apache.hadoop.hbase.client.transactional.UnknownTransactionException: java.io.IOException: UnknownTransactionException
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$TransactionManagerCallable.doAbortX(TransactionManager.java:973)
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2405)
        at org.apache.hadoop.hbase.client.transactional.TransactionManager$10.call(TransactionManager.java:2403)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

-rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.11978
-rw-------. 1 trafodion trafodion   14561280 Jun 21 21:10 core.11979
-rw-------. 1 trafodion trafodion   14561280 Jun 21 21:10 core.11980
-rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.12130
-rw-------. 1 trafodion trafodion    6422528 Jun 21 21:10 core.12151
-rw-------. 1 trafodion trafodion  147271680 Nov 28 22:16 core.1506
-rw-r--r--. 1 trafodion trafodion  162172000 Nov 28 04:32 core.2017-11-28_04-24-31.ZSM000.6663.mxssmp
-rw-------. 1 trafodion trafodion 2237276160 Jun 21 04:44 core.24494
-rw-------. 1 trafodion trafodion  987738112 Sep  6 02:36 core.3926
-rw-------. 1 trafodion trafodion  986812416 Sep  6 02:36 core.3970
-rw-------. 1 trafodion trafodion 2353975296 Jun 21 09:03 core.51428
-rw-------. 1 trafodion trafodion  111112192 Nov 28 22:17 core.5279
-rw-------. 1 trafodion trafodion  111112192 Nov 28 23:06 core.55161
-rw-------. 1 trafodion trafodion    1552384 Nov 28 22:05 core.5604
-rw-------. 1 trafodion trafodion  111132672 Nov 28 23:11 core.57098
-rw-------. 1 trafodion trafodion   48193536 Nov 28 23:11 core.57646
-rw-------. 1 trafodion trafodion  111112192 Nov 28 23:07 core.58026
-rw-------. 1 trafodion trafodion  885874688 Nov 28 22:51 core.58335
-rw-------. 1 trafodion trafodion  142344192 Nov 28 23:07 core.58551
-rw-------. 1 trafodion trafodion  133554176 Nov 28 22:17 core.6053
-rw-------. 1 trafodion trafodion  111112192 Nov 28 23:07 core.61491
-rw-------. 1 trafodion trafodion  111112192 Nov 28 22:16 core.65350

Jack Huang




From: Prashanth Vasudev [mailto:prashanth.vasu...@esgyn.com]
Sent: Wednesday, November 29, 2017 1:25 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

1. Please also make sure that all steps in the installer completed 
successfully.

2. From the shell, check that the hbase classpath includes the trx jars:
    $  hbase classpath | grep trx
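
Because hbase classpath prints a single colon-separated line, a plain grep echoes the whole classpath when anything matches. A minimal sketch that lists only the matching entries follows; the helper name list_trx_jars is hypothetical.

```shell
# Hypothetical helper: read a colon-separated classpath on stdin and print
# only the entries that contain "trx", one per line.
# Returns a non-zero status if no entry matches.
list_trx_jars() {
    tr ':' '\n' | grep -- 'trx'
}

# Usage (requires the hbase command on PATH):
#   hbase classpath | list_trx_jars
```

An empty result with a non-zero exit status would suggest the trx jars are not on the classpath.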

Prashanth

From: Selva Govindarajan [mailto:selva.govindara...@esgyn.com]
Sent: Tuesday, November 28, 2017 9:21 PM
To: user@trafodion.incubator.apache.org; Yuan <yuan....@esgyn.cn>
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

It looks like the Transaction Manager failed to come up for some reason. The 
log directory $TRAF_HOME/logs should contain the files tm_<nid>.log and 
trafodion_dtm.log. These log files might give some clue to the problem.

Unless the Transaction Manager comes up, other processes will not be started.

Also check whether the TM program left a core file. Core files are written to 
the location given by /proc/sys/kernel/core_pattern. If that pattern does not 
specify a directory, look in $TRAF_HOME/sql/scripts.
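
As a rough sketch of how to read core_pattern: on Linux, a pattern starting with "|" pipes the core to a program, an absolute path names the target directory, and anything else is relative to the crashing process's working directory. The helper name core_dir_from_pattern is hypothetical.

```shell
# Hypothetical helper: given a core_pattern value, print where core files
# are expected to end up.
core_dir_from_pattern() {
    pattern="$1"
    case "$pattern" in
        '|'*) echo "piped to: ${pattern#|}" ;;          # handed to a program
        /*)   dirname "$pattern" ;;                     # absolute directory
        *)    echo "working directory of the crashing process" ;;
    esac
}

# Usage:
#   core_dir_from_pattern "$(cat /proc/sys/kernel/core_pattern)"
```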

Selva

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 9:09 PM
To: Yuan <yuan....@esgyn.cn>; user@trafodion.incubator.apache.org
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

There is only one Trafodion node, but the server has 128 GB of memory. Is that 
enough?


Jack Huang




From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, November 29, 2017 1:07 PM
To: Huang, Jack <jack.hu...@emc.com>; user@trafodion.incubator.apache.org
Cc: Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

How many trafodion nodes do you have? What is the memory of each node? I think 
you configured too many mxosrvrs.


Best regards,
Yuan

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Wednesday, November 29, 2017 12:16 PM
To: user@trafodion.incubator.apache.org
Cc: Liu, Yuan (Yuan) <yuan....@esgyn.cn>; Eric Owhadi <eric.owh...@esgyn.com>; Narendra Goyal <narendra.go...@esgyn.com>
Subject: RE: DCS is not started

Sigh! I ran ckillall and sqstart, but several minutes later the Trafodion 
environment is still down!

[trafodion@trafodion logs]$ sqcheck

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 3.

The Trafodion environment is not up at all, or partially up and not 
operational. Check the logs.

Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1
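
The table above can be condensed to just the components that are short of their configured count. This is a minimal sketch, assuming the column layout shown (Process, Configured, Actual); the helper name down_components is hypothetical.

```shell
# Hypothetical helper: read sqcheck output on stdin and print each component
# whose Actual count (column 3) is below its Configured count (column 2).
down_components() {
    awk 'NF >= 3 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 < $2 {
        print $1 ": " $3 "/" $2 " up"
    }'
}

# Usage:
#   sqcheck | down_components
```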


Jack Huang




From: Huang, Jack
Sent: Wednesday, November 29, 2017 10:13 AM
To: 'user@trafodion.incubator.apache.org' <user@trafodion.incubator.apache.org>
Cc: 'Liu, Yuan (Yuan)' <yuan....@esgyn.cn>; 'Eric Owhadi' <eric.owh...@esgyn.com>; 'Narendra Goyal' <narendra.go...@esgyn.com>
Subject: RE: DCS is not started


Thanks all. ckillall/sqstart is working now.

Jack Huang




From: Liu, Yuan (Yuan) [mailto:yuan....@esgyn.cn]
Sent: Wednesday, November 29, 2017 10:07 AM
To: user@trafodion.incubator.apache.org
Subject: RE: DCS is not started

Please use cstat to check whether any processes still exist. If so, use 
ckillall to kill them all, then run cstat again.


Best regards,
Yuan

From: Narendra Goyal [mailto:narendra.go...@esgyn.com]
Sent: Wednesday, November 29, 2017 10:05 AM
To: user@trafodion.incubator.apache.org
Subject: RE: DCS is not started

Hi Jack,

Please try:

  *   ckillall
     *   this should kill all the orphan processes in the environment
  *   sqstart

-Narendra

From: Huang, Jack [mailto:jack.hu...@dell.com]
Sent: Tuesday, November 28, 2017 6:03 PM
To: user@trafodion.incubator.apache.org
Subject: DCS is not started

Hi,
My Trafodion environment is down. How can I recover it?

[trafodion@trafodion ~]$ sqcheck

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.

The Trafodion environment is not up at all, or partially up and not 
operational. Check the logs.

Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               0           1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1


The Trafodion environment is down.
[trafodion@trafodion ~]$ dcsstart

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 1. Execution time in seconds: 0.

The Trafodion environment is not up at all, or partially up and not 
operational. Check the logs.

Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               0           \$TM0 \$TM1
RMS             4               0           \$ZSC000 \$ZSC001 \$ZSM000 \$ZSM001
DcsMaster       1               0           1
DcsServer       1               0           1
mxosrvr         100             0           100
RestServer      1               1


The Trafodion environment is down.
DCS is not started. Please start Trafodion ...


[trafodion@trafodion ~]$ sqstart
Checking orphan processes: 3.
There are orphan processes from a previous SQ instance.
uid          pid   ppid  wchan   rss   vsz   time     stat cmd
---          ---   ----  -----   ---   ---   ----     ---- ---
trafodion     5952     1 hrtime 39412 402572 00:18:09 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion     5953     1 hrtime 39192 402568 00:14:04 Ssl  /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion     5938     1 poll_s  1440  21232 00:00:00 S    mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -env MPI_TMPDIR /home/trafodion/apache-trafodion-2.1.0/tmp -env TRAF_HOME /home/trafodion/apache-trafodion-2.1.0 -np 2 /home/trafodion/apache-trafodion-2.1.0/export/bin64/monitor COLD
trafodion    11720  6953 wait    1780 106556 00:00:00 S+   /bin/bash /home/trafodion/apache-trafodion-2.1.0/sql/scripts/sqstart


[trafodion@trafodion ~]$ sqstop
SQ environment is not up.

Jack Huang



