Thanks Dima. Now even if I use a network called hadoopnet.com, I still have the same problem. Here are the region servers that get detected:
Region Servers

ServerName                                                     Start time                    Version  Requests Per Second  Num. Regions
hadoop-slave1.hadoopnet.com,16020,1473137128613                Tue Sep 06 04:45:28 UTC 2016  1.2.2    0  0
hadoop-slave1.hadoopnet.com.hadoopnet.com,16020,1473137128613  Tue Sep 06 04:45:28 UTC 2016  Unknown  0  0
hadoop-slave2.hadoopnet.com,16020,1473137127975                Tue Sep 06 04:45:27 UTC 2016  1.2.2    0  0
hadoop-slave2.hadoopnet.com.hadoopnet.com,16020,1473137127975  Tue Sep 06 04:45:27 UTC 2016  Unknown  0  0
Total:4    2 nodes with inconsistent version    0    0

instead of just:

hadoop-slave1.hadoopnet.com,16020,1473137128613
hadoop-slave2.hadoopnet.com,16020,1473137127975

This is the script I used to start the hadoop cluster:

---
#!/bin/bash

# the default node number is 3
N=${1:-3}
NETWORK=hadoopnet.com

docker rm -f zk.$NETWORK &> /dev/null
echo "start zk container..."
docker run -p 2181:2181 --name zk.$NETWORK --hostname zk.$NETWORK --net=$NETWORK -itd -v conf:/opt/zookeeper/conf -v data:/tmp/zookeeper jplock/zookeeper

# start hadoop master container
docker rm -f hadoop-master.$NETWORK &> /dev/null
echo "start hadoop-master container..."
docker run -itd \
        --net=$NETWORK \
        -P \
        --name hadoop-master.$NETWORK \
        --hostname hadoop-master.$NETWORK \
        --add-host zk.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" zk.$NETWORK) \
        casertap/hhb

# start hadoop slave containers
i=1
while [ $i -lt $N ]
do
        docker rm -f hadoop-slave$i.$NETWORK &> /dev/null
        echo "start hadoop-slave$i container..."
        docker run -itd \
                --net=$NETWORK \
                --name hadoop-slave$i.$NETWORK \
                --hostname hadoop-slave$i.$NETWORK \
                --publish-all=false \
                --add-host hadoop-master.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" hadoop-master.$NETWORK) \
                --add-host zk.$NETWORK:$(docker inspect -f "{{with index .NetworkSettings.Networks \"${NETWORK}\"}}{{.IPAddress}}{{end}}" zk.$NETWORK) \
                casertap/hhb
        i=$(( $i + 1 ))
done

# get into hadoop master container
docker exec -it hadoop-master.$NETWORK bash
---

Thanks,
pierre

> On 6 Sep 2016, at 08:47, Dima Spivak <[email protected]> wrote:
>
> Sounds good, Pierre. FWIW, if you want a preview, here's how to get a
> 5-node HBase cluster running based on the master branch of HBase in about a
> minute:
>
> 1. Source the clusterdock.sh script that defines the clusterdock_ helper
> functions:
>
>     source /dev/stdin <<< "$(curl -sL http://tiny.cloudera.com/clusterdock.sh)"
>
> 2. Start up a cluster:
>
>     CLUSTERDOCK_TOPOLOGY_IMAGE=hbasejenkinsuser-docker-hbase.bintray.io/dev/clusterdock:apache_hbase_topology \
>     clusterdock_run ./bin/start_cluster -r hbasejenkinsuser-docker-hbase.bintray.io \
>     --namespace dev apache_hbase --hbase-version=master --hadoop-version=2.7.1 \
>     --secondary-nodes='node-{2..5}'
>
> And that's it. Feel free to put a -h for help information (put it right
> after the ./bin/start_cluster for details about the function, or after the
> apache_hbase for details about the Apache HBase topology).
>
> -Dima
>
> On Mon, Sep 5, 2016 at 3:44 PM, Pierre Caserta <[email protected]> wrote:
>
>> Thanks for your answer.
>> I will check the ticket https://issues.apache.org/jira/browse/HBASE-15961
>> regularly and try clusterdock as soon as the documentation comes out.
>> I will try to use hostnames with a domain, like master.hadoopnet.com, and a
>> network named hadoopnet.com, to see if this resolves the problem.
>> Currently my hostnames are hadoop-master, hadoop-slave1 and hadoop-slave2;
>> maybe that is the problem.
>>
>>> On 5 Sep 2016, at 23:31, Dima Spivak <[email protected]> wrote:
>>>
>>> clusterdock uses --net=host for running the framework out of a container,
>>> but each Hadoop/HBase cluster itself runs with its own bridge network. Just
>>> suggesting clusterdock since it's what we now use for testing HBase
>>> releases and it looks a bit more sophisticated than this other project
>>> (e.g. no need to rebuild images for different cluster sizes).
>>>
>>> The error you're seeing is caused by not using the FQDN of the containers
>>> when referring to them; Docker networks use the network name as the domain.
>>>
>>> On Monday, September 5, 2016, Pierre Caserta <[email protected]> wrote:
>>>
>>>> That is a good script, thanks, but I would like to understand exactly what
>>>> the problem is with my config without adding another level of abstraction
>>>> and just running the clusterdock command.
>>>> In your script I can see that you are using --net=host. I think this is
>>>> the main difference compared to what I am doing, which is creating a
>>>> bridge network for the hadoop cluster.
>>>> I have only 3 machines: hadoop-master, hadoop-slave1, hadoop-slave2.
>>>>
>>>> Why do those strange hadoop-slave2.hadoopnet aliases appear in the web UI?
>>>> It looks like the network name is used as part of the hostname.
>>>> Any idea what is happening in my case?
>>>>
>>>> Pierre
>>>>
>>>>> On 5 Sep 2016, at 16:48, Dima Spivak <[email protected]> wrote:
>>>>>
>>>>> You should try the Apache HBase topology for clusterdock that was
>>>>> committed a few months back. See HBASE-12721 for details.
>>>>>
>>>>> On Sunday, September 4, 2016, Pierre Caserta <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am building a fully distributed hbase cluster with unmanaged zookeeper.
>>>>>> I pretty much used this example and installed hbase on top of it:
>>>>>> https://github.com/kiwenlau/hadoop-cluster-docker
>>>>>>
>>>>>> Hadoop and hdfs work fine but I get this exception with hbase:
>>>>>>
>>>>>> 2016-09-05 06:27:12,268 INFO [hadoop-master:16000.activeMasterManager]
>>>>>> zookeeper.MetaTableLocator: Failed verification of hbase:meta,,1 at
>>>>>> address=hadoop-slave2,16020,1473052276351, exception=org.apache.hadoop.
>>>>>> hbase.NotServingRegionException: Region hbase:meta,,1 is not online on
>>>>>> hadoop-slave2.hadoopnet,16020,1473056813966
>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2910)
>>>>>>
>>>>>> This is blocking, because any command I enter in the hbase shell returns
>>>>>> the following error:
>>>>>>
>>>>>> ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
>>>>>>
>>>>>> The containers are run using --net=hadoopnet,
>>>>>> which is a network created as such:
>>>>>>
>>>>>> docker network create --driver=bridge hadoopnet
>>>>>>
>>>>>> The hbase webui is showing this:
>>>>>>
>>>>>> Region Servers
>>>>>> ServerName                                   Start time                    Version  Requests Per Second  Num. Regions
>>>>>> hadoop-slave1,16020,1473056814064            Mon Sep 05 06:26:54 UTC 2016  1.2.2    0  0
>>>>>> hadoop-slave1.hadoopnet,16020,1473056814064  Mon Sep 05 06:26:54 UTC 2016  Unknown  0  0
>>>>>> hadoop-slave2,16020,1473056813966            Mon Sep 05 06:26:53 UTC 2016  1.2.2    0  0
>>>>>> hadoop-slave2.hadoopnet,16020,1473056813966  Mon Sep 05 06:26:53 UTC 2016  Unknown  0  0
>>>>>> Total:4    2 nodes with inconsistent version    0    0
>>>>>>
>>>>>> I should have only 2 regionservers, but 2 strange hadoop-slave1.hadoopnet
>>>>>> and hadoop-slave2.hadoopnet entries are added to the list.
>>>>>> When I look at zk using:
>>>>>>
>>>>>> /usr/local/hbase/bin/hbase zkcli -server zk:2181 ls /hbase/rs
>>>>>>
>>>>>> I only see my 2 regionservers: hadoop-slave1,16020,1473056814064 and
>>>>>> hadoop-slave2,16020,1473056813966
>>>>>>
>>>>>> Looking at the zookeeper.MetaTableLocator "Failed verification" error, I
>>>>>> see that hadoop-slave2,16020,1473052276351 and
>>>>>> hadoop-slave2.hadoopnet,16020,1473056813966 get mixed up.
>>>>>>
>>>>>> Here is my config on all servers:
>>>>>>
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>>>>>
>>>>>> <configuration>
>>>>>>     <property>
>>>>>>         <name>hbase.rootdir</name>
>>>>>>         <value>hdfs://hadoop-master:9000/hbase</value>
>>>>>>         <description>The directory shared by region servers. Should
>>>>>>         be fully-qualified to include the filesystem to use, e.g.:
>>>>>>         hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.master</name>
>>>>>>         <value>hdfs://hadoop-master:60000</value>
>>>>>>         <description>The host and port that the HBase master runs
>>>>>>         at.</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.cluster.distributed</name>
>>>>>>         <value>true</value>
>>>>>>         <description>The mode the cluster will be in. Possible values are
>>>>>>         false: standalone and pseudo-distributed setups with managed
>>>>>>         Zookeeper
>>>>>>         true: fully-distributed with unmanaged Zookeeper Quorum (see
>>>>>>         hbase-env.sh)</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.master.info.port</name>
>>>>>>         <value>60010</value>
>>>>>>         <description>The port the HBase master web UI runs
>>>>>>         on.</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.zookeeper.quorum</name>
>>>>>>         <value>zk</value>
>>>>>>         <description>The string m_e_m_b_e_r_s is replaced by a
>>>>>>         comma-separated list of hosts. It is generated by
>>>>>>         configure-slaves.sh on the master node.</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.zookeeper.property.maxClientCnxns</name>
>>>>>>         <value>300</value>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.zookeeper.property.datadir</name>
>>>>>>         <value>/tmp/zookeeper</value>
>>>>>>         <description>Location of the storage of zookeeper
>>>>>>         data.</description>
>>>>>>     </property>
>>>>>>     <property>
>>>>>>         <name>hbase.zookeeper.property.clientPort</name>
>>>>>>         <value>2181</value>
>>>>>>     </property>
>>>>>>
>>>>>> </configuration>
>>>>>>
>>>>>> I created a stack overflow question as well:
>>>>>> http://stackoverflow.com/questions/39325041/hbase-on-docker-notservingregionexception-because-of-hostname-alisas
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Dima
>>>>
>>>>
>>>
>>> --
>>> -Dima
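
P.S. The mismatch Dima points at can be sketched without a cluster: HBase identifies a regionserver by the triple (hostname, port, startcode), so the same process seen once as hadoop-slave2 and once as hadoop-slave2.hadoopnet looks like two distinct servers with inconsistent versions. A minimal illustrative sketch in Python (this is not the actual org.apache.hadoop.hbase.ServerName implementation; hostnames and values are taken from the log above):

```python
# Sketch of HBase-style server identity: (hostname, port, startcode).
# Illustrative only -- not the real org.apache.hadoop.hbase.ServerName class.
from collections import namedtuple

ServerName = namedtuple("ServerName", ["hostname", "port", "startcode"])

def same_server(a: ServerName, b: ServerName) -> bool:
    """Two registrations refer to the same server only if all three fields match."""
    return a == b

# What the regionserver registered as (with the Docker network suffix)...
registered = ServerName("hadoop-slave2.hadoopnet", 16020, 1473056813966)
# ...versus what the master cached from hbase:meta (bare hostname, older startcode).
cached_in_meta = ServerName("hadoop-slave2", 16020, 1473052276351)

# The bare name and the network-suffixed name never compare equal, so the
# master keeps both entries in the UI and meta verification fails.
print(same_server(registered, cached_in_meta))  # False
```

This is why making every reference use one consistent FQDN (or none at all) makes the duplicate entries disappear.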
