I noticed the Phoenix config parameters. Are the Phoenix jars in place? Can you capture a jstack of the master when this happens?
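For reference, a minimal sketch of both checks. The paths (/opt/hbase, /var/run/hbase) are taken from the hbase-env.sh quoted below; the jar name pattern and pid file name are assumptions, so adjust them to your layout:

```shell
# Sketch: verify a Phoenix server jar (required by the IndexLoadBalancer /
# IndexMasterObserver classes in hbase-site.xml) is on the master's
# classpath, then capture the master's stacks.
# The phoenix-*-server.jar pattern is an assumption for a Phoenix 4.x install.

phoenix_jar_present() {
  # Succeeds if a Phoenix server jar exists under the given lib directory.
  ls "$1"/phoenix-*-server.jar >/dev/null 2>&1
}

if phoenix_jar_present /opt/hbase/lib; then
  echo "Phoenix server jar found"
else
  echo "Phoenix server jar missing from /opt/hbase/lib"
fi

# While the master is spinning at 100% CPU, dump its stacks. The pid file
# path follows HBASE_PID_DIR=/var/run/hbase from the quoted hbase-env.sh;
# run jstack as the same user (hdfs) that owns the process:
# PID=$(cat /var/run/hbase/hbase-hdfs-master.pid)
# sudo -u hdfs jstack -l "$PID" > /tmp/hmaster.jstack
# If the JVM does not respond, add -F to force the dump.
```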
Cheers

> On Dec 16, 2015, at 7:46 PM, F21 <[email protected]> wrote:
>
> Background:
>
> I am prototyping an HBase cluster using Docker. Docker is 1.9.1 and is running
> in a Ubuntu 15.10 64-bit VM with access to 6 GB of RAM.
>
> Within Docker, I am running 1 ZooKeeper node and HDFS (2.7.1) in HA mode: 1 namenode,
> 1 standby namenode, 3 journal nodes, 2 ZooKeeper failover controllers
> (colocated with the namenodes) and 3 datanodes.
>
> For HBase, I am running 1.1.2 and have 2 masters and 2 region servers
> set up to use the HDFS cluster.
>
> All of the above are running Oracle Java 8.
>
> I am launching all my Docker containers using docker-compose. However, I have
> startup scripts in place to check that the HDFS cluster is up and safemode is
> off before launching the HBase servers.
>
> Problem:
> When launching the region servers and masters, they do not launch reliably.
> Often there will be one or more region servers and masters that do not
> launch properly. In those cases, the failed process uses 100% of the
> CPU core it is launched on and very little memory (about 20 MB). The
> process hangs and we need to terminate it forcefully.
>
> In the log files, we see that hbase-hdfs-master-hmaster2.log is empty and
> hbase-hdfs-master-hmaster2.out contains some information, but not much:
>
> Thu Dec 17 02:37:26 UTC 2015 Starting master on hmaster2
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 23668
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1048576
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 1048576
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> This is the command we are using to launch the HBase process:
>
> sudo -u hdfs /opt/hbase/bin/hbase-daemon.sh --config /opt/hbase/conf start master
>
> The hbase-site.xml looks like this:
>
> <configuration>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://mycluster/hbase</value>
>   </property>
>   <property>
>     <name>zookeeper.znode.parent</name>
>     <value>/hbase</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>zk1</value>
>   </property>
>   <property>
>     <name>hbase.master.loadbalancer.class</name>
>     <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
>   </property>
>   <property>
>     <name>hbase.coprocessor.master.classes</name>
>     <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
>   </property>
> </configuration>
>
> The hdfs-site.xml looks like this:
>
> <configuration>
>   <property>
>     <name>dfs.nameservices</name>
>     <value>mycluster</value>
>   </property>
>   <property>
>     <name>dfs.ha.namenodes.mycluster</name>
>     <value>nn1,nn2</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn1</name>
>     <value>nn1:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.rpc-address.mycluster.nn2</name>
>     <value>nn2:8020</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn1</name>
>     <value>nn1:50070</value>
>   </property>
>   <property>
>     <name>dfs.namenode.http-address.mycluster.nn2</name>
>     <value>nn2:50070</value>
>   </property>
>   <property>
>     <name>dfs.client.failover.proxy.provider.mycluster</name>
>     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>   </property>
> </configuration>
>
> The core-site.xml looks like this:
>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster</value>
>   </property>
> </configuration>
>
> And hbase-env.sh looks like this:
>
> # Set environment variables here.
>
> # This script sets variables multiple times over the course of starting an hbase process,
> # so try to keep things idempotent unless you want to take an even deeper look
> # into the startup scripts (bin/hbase, etc.)
>
> # The java implementation to use. Java 1.7+ required.
> # export JAVA_HOME=/usr/java/jdk1.6.0/
>
> # Extra Java CLASSPATH elements. Optional.
> # export HBASE_CLASSPATH=
>
> # The maximum amount of heap to use. Default is left to JVM default.
> # export HBASE_HEAPSIZE=1G
>
> # Uncomment below if you intend to use off heap cache. For example, to allocate 8G of
> # offheap, set the value to "8G".
> # export HBASE_OFFHEAPSIZE=1G
>
> # Extra Java runtime options.
> # Below are what we set by default. May only work with SUN JVM.
> # For more on why as well as other possible settings,
> # see http://wiki.apache.org/hadoop/PerformanceTuning
> export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
>
> # Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
> export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
> export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
>
> # Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
>
> # This enables basic gc logging to the .out file.
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
>
> # This enables basic gc logging to its own file.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
>
> # This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
> # export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
>
> # Uncomment one of the below three options to enable java garbage collection logging for the client processes.
>
> # This enables basic gc logging to the .out file.
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
>
> # This enables basic gc logging to its own file.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
>
> # This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
> # If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
> # export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
>
> # See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
> # needed setting up off-heap block caching.
>
> # Uncomment and adjust to enable JMX exporting
> # See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
> # More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
> # NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
> # section in HBase Reference Guide for instructions.
>
> # export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
> # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
> # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
> # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
> # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
> # export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
>
> # File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
> # export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
>
> # Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
> #HBASE_REGIONSERVER_MLOCK=true
> #HBASE_REGIONSERVER_UID="hbase"
>
> # File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
> # export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
>
> # Extra ssh options. Empty by default.
> # export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
>
> # Where log files are stored. $HBASE_HOME/logs by default.
> # export HBASE_LOG_DIR=${HBASE_HOME}/logs
>
> # Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
> # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
> # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
> # export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
> # export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
>
> # A string representing this instance of hbase. $USER by default.
> # export HBASE_IDENT_STRING=$USER
>
> # The scheduling priority for daemon processes. See 'man nice'.
> # export HBASE_NICENESS=10
>
> # The directory where pid files are stored. /tmp by default.
> # export HBASE_PID_DIR=/var/hadoop/pids
>
> # Seconds to sleep between slave commands. Unset by default. This
> # can be useful in large clusters, where, e.g., slave rsyncs can
> # otherwise arrive faster than the master can service them.
> # export HBASE_SLAVE_SLEEP=0.1
>
> # Tell HBase whether it should manage it's own instance of Zookeeper or not.
> # export HBASE_MANAGES_ZK=true
>
> # The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
> # RFA appender. Please refer to the log4j.properties file to see more details on this appender.
> # In case one needs to do log rolling on a date change, one should set the environment property
> # HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
> # For example:
> # HBASE_ROOT_LOGGER=INFO,DRFA
> # The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
> # DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
> export HBASE_LOG_DIR=/var/log/hbase
> export HBASE_PID_DIR=/var/run/hbase
> export JAVA_HOME=/usr/lib/jvm/java-8-oracle
>
> The server still has plenty of RAM available (1 GB).
>
> It's not clear what is causing this, as the logs are pretty sparse. Have any
> of you seen a problem like this before?
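On the startup ordering described in the quoted message (wait for HDFS to be up and out of safemode before starting HBase), here is a minimal sketch of such a gate. The hdfs://mycluster URI comes from the quoted core-site.xml; the function name and retry/sleep values are illustrative assumptions, not the poster's actual script:

```shell
# Illustrative pre-start gate: poll until HDFS answers and reports safemode
# OFF, then launch HBase. The function name and retry/sleep values are
# assumptions, not the poster's actual startup script.
wait_for_hdfs() {
  retries=$1
  i=0
  while [ "$i" -lt "$retries" ]; do
    # "dfsadmin -safemode get" prints "Safe mode is OFF" once the active
    # namenode is reachable and out of safemode.
    if hdfs dfsadmin -fs hdfs://mycluster -safemode get 2>/dev/null \
        | grep -q 'Safe mode is OFF'; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage in a container entrypoint (commented out here; the start command
# is the one from the original message):
# wait_for_hdfs 60 && sudo -u hdfs /opt/hbase/bin/hbase-daemon.sh \
#   --config /opt/hbase/conf start master
```

If the hang only happens when HBase starts before HDFS HA has settled, a gate like this should make the failure reproducible (or make it disappear), which narrows down where the master is spinning.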
