Hi John, thanks for sharing that. It might help other people who are facing the same issues.
JM

2013/4/30 John Foxinhead <[email protected]>

> Now I post my configuration:
> I use a 3-node cluster with all the nodes running Hadoop, ZooKeeper and
> HBase. The HBase master, a ZooKeeper daemon and the Hadoop namenode run on
> the same host. An HBase regionserver, a ZooKeeper daemon and a Hadoop
> datanode run on each of the other 2 nodes. I called one of the datanodes
> "jobtracker" because of the various configurations I tried: I also
> configured a jobtracker while installing Hadoop, but I never used it as a
> jobtracker, only as a datanode, because HBase doesn't need MapReduce. So it
> is a datanode just like "datanode1".
> Everything runs on the same PC: the 3 nodes are 3 virtual machines running
> on VirtualBox, connected through "internal network" or "bridged adapter"
> network interfaces (these are VirtualBox settings).
> It's important to know that, because I use 3 virtual machines, communication
> is very, very slow, especially at startup of Hadoop, ZooKeeper and HBase.
>
>
> HADOOP:
>
> hadoop-env.sh:
> export JAVA_HOME=/usr/lib/jvm/java-7-oracle
> export HADOOP_CLASSPATH=/home/debian/hadoop-1.0.4/lib
> export HADOOP_HEAPSIZE=1000
>
> core-site.xml:
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://namenode:9000/</value>
>   </property>
> </configuration>
>
> hdfs-site.xml:
> <configuration>
>   <property>
>     <name>dfs.name.dir</name>
>     <value>/home/debian/hadoop-1.0.4/FILESYSTEM/name</value>
>   </property>
>   <property>
>     <name>dfs.data.dir</name>
>     <value>/home/debian/hadoop-1.0.4/FILESYSTEM/data</value>
>   </property>
>   <property>
>     <name>dfs.support.append</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>dfs.datanode.max.xcievers</name>
>     <value>4096</value>
>   </property>
> </configuration>
>
> masters:
>
> slaves:
> jobtracker
> datanode1
>
>
> HBASE:
>
> hbase-env.sh:
> export JAVA_HOME=/usr/lib/jvm/java-7-oracle
> export HBASE_CLASSPATH=/home/debian/hbase-0.94.5/lib
> export HBASE_MANAGES_ZK=false
>
> hbase-site.xml:
> <configuration>
>   <property>
>     <name>dfs.support.append</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.rootdir</name>
>     <value>hdfs://namenode:9000/hbase</value>
>   </property>
>   <property>
>     <name>hbase.cluster.distributed</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>namenode,jobtracker,datanode1</value>
>   </property>
>   <property>
>     <name>hbase.zookeeper.property.dataDir</name>
>     <value>/home/debian/hbase-0.94.5/zookeeper/data</value>
>   </property>
>   <property>
>     <name>hbase.master</name>
>     <value>namenode:60000</value>
>   </property>
> </configuration>
> Note: I think the property hbase.master hasn't worked for years, so it can
> probably be deleted, but after a lot of tries my HBase finally worked, so I
> left it there. I'll try deleting it later.
>
> regionservers:
> jobtracker
> datanode1
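(A quick way to sanity-check a client against this configuration from Java: a minimal sketch, assuming the 0.94-era client API and that the hbase-site.xml quoted above, or equivalent settings, is visible to the client. The class name QuorumCheck is just illustrative.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class QuorumCheck {
    public static void main(String[] args) throws Exception {
        // Loads hbase-site.xml from the classpath if present; the quorum can
        // also be set explicitly to match the configuration quoted above.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "namenode,jobtracker,datanode1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        // The constructor throws MasterNotRunningException if no master is reachable.
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("Master running: " + admin.isMasterRunning());
        admin.close();
    }
}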
>
>
> OS FILES:
>
> /etc/hosts:
>
> 127.0.0.1      localhost
> 127.0.0.1      debian01
> # HADOOP
> 192.168.1.111  jobtracker
> 192.168.1.112  datanode1
> 192.168.1.121  namenode
> # The following lines are desirable for IPv6 capable hosts
> ::1     ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
>
> /etc/hostname:
> namenode (or jobtracker, or datanode1, depending on the node)
>
> /etc/network/interfaces (to set static IPs: on namenode address
> 192.168.1.121, on jobtracker address 192.168.1.111, on datanode1 address
> 192.168.1.112):
> iface eth6 inet static
>   address 192.168.1.121
>   netmask 255.255.255.0
>   network 192.168.1.0
>   broadcast 192.168.1.255
>   gateway 192.168.1.254
>   dns-nameserver 8.8.8.8 8.8.4.4
> Note: eth6 because eth2 (where I had the "bridged network adapter" virtual
> interface) was remapped to eth6 (you can verify it with "$ dmesg | grep eth"),
> so replace eth6 with your own interface.
>
>
> MY PROBLEMS (keep in mind that I copied the hbase and hadoop directories from
> my working pseudo-distributed installation, so the pseudo-distributed setup
> is known to work):
>
> 1) After starting up Hadoop, using some shell commands to put a file into the
> Hadoop filesystem and later get the same file back from HDFS, I get the file,
> but the file is empty.
> SOLUTION: the FILESYSTEM, FILESYSTEM/data and FILESYSTEM/name directories
> must have 755 (rwxr-xr-x) permissions.
>
> 2) After starting up Hadoop, using some shell commands to put a file into the
> Hadoop filesystem and later get the same file back from HDFS, I get
> warnings/errors in the log files about a mismatch between the expected and
> the received ID of the blocks.
> EXPLANATION: this can happen if, after using an HDFS (for example putting
> files into it), I run "bin/hadoop namenode -format" to format a new HDFS, and
> I have changed dfs.data.dir and dfs.name.dir to a persistent location (the
> default is a tmp location, which is cleared when the OS restarts).
> "bin/hadoop namenode -format" formats the dfs.name.dir directory and
> generates a new ID for the HDFS blocks. It does not format the dfs.data.dir
> directories on the datanodes, so the datanodes still expect the old blocks'
> ID and there is a mismatch.
> SOLUTION: clear all dfs.data.dir directories on all datanodes, then format a
> new filesystem with "bin/hadoop namenode -format" on the namenode.
>
> 3) HBase, while managing ZooKeeper itself, can't connect to ZooKeeper:
> SOLUTION: set HBASE_MANAGES_ZK=false, so that HBase does not manage
> ZooKeeper. This is recommended in my case because I launch 3 virtual
> machines, so HBase fails to connect to ZooKeeper: it reaches the retry limit
> before the ZooKeeper cluster has started up completely. So I run ZooKeeper on
> all 3 nodes with "$ bin/hbase-daemon.sh start zookeeper" and I wait some
> minutes. This is because of the slow connections between the 3 virtual
> machines. Then I test the ZooKeeper cluster with some "ls /" commands from
> the zk shell (launch it with "$ bin/hbase zkcli") and I make sure the shell,
> which is a zk client, connects to the right node on the right port.
>
> 4) HBase, without managing ZooKeeper, can't connect to ZooKeeper. All the
> configurations were right, as written above, but HBase launched a 1-node
> ZooKeeper cluster on the master at localhost and connected to it. Also, the
> master didn't start the regionservers. It's a strange problem.
> SOLUTION: This solution is as strange as the problem.
> The configuration files were right, but HBase didn't work, so I opened a
> regionserver virtual machine, completely removed the hbase directories,
> copied over the pseudo-distributed hbase directory and renamed it like the
> previous one. I manually copied all the configuration files from the hbase
> directory on the master. Then I shut down all the virtual machines, made a
> backup of the old master, and deleted all the VMs except the backup and the
> slave VM into which I had re-copied the configuration files. I made 2 more
> clones of this virtual machine (with the new hbase folder) and modified only
> /etc/network/interfaces, setting the proper IP for each of the VMs. Then
> HBase was able to connect to the ZooKeeper cluster and the master was able
> to start the regionservers. I think it was because of some rubbish left over
> from the many tries I made on the master node, so copying the conf files to
> a slave node and making it become the new master solved my problem. Then I
> made another backup, to keep the system clean of future rubbish and avoid
> problems like this.
>
> 5) HBase connects to the running ZooKeeper cluster but there is one last
> problem: the master launches the regionservers, but on the regionservers'
> nodes, when the regionserver daemon starts, it tries to connect to the
> master at localhost:60000 instead of namenode:60000.
> SOLUTION: The property hbase.master is useless because it has not been
> supported for years. So the problem is the file /etc/hostname. Its content
> was "debian01" on all the nodes, but it should be "namenode" on the
> namenode, "datanode1" on the datanode and "jobtracker" on the jobtracker
> (the hostname used in the hbase conf files to refer to each node). This was
> my last configuration change. When I changed this too, HBase finally worked
> properly.
> Note: just logging out and back in will not make the changes in
> /etc/hostname effective; when you log back in you'll still see, for example,
> something like "debian@debian01", even if you already replaced "debian01"
> with "namenode". You need to completely shut down the OS and restart it for
> the changes to take effect.
>
>
> Now Hadoop, ZooKeeper and HBase work, and a jar I compiled to test some
> simple operations like Put and Get, not from the hbase shell but from the
> HBase Java API, works as well (a rough sketch of such a test is appended
> below the quoted replies).
> Thank you all, and I hope someone else can take advantage of my issues.
>
>
> 2013/4/30 John Foxinhead <[email protected]>
>
> > I solved the last problem:
> > I modified the file /etc/hostname and replaced the default hostname
> > "debian01" with "namenode", "jobtracker" or "datanode1", the hostnames I
> > used in the hbase conf files. Now I start HBase from the master with
> > "bin/start-hbase.sh" and the regionservers, instead of trying to connect
> > to the master at localhost:60000, connect to namenode:60000.
> > Now everything is working well. Thank you all. Later I will post my
> > configuration files and make a summary of the problems I encountered, so
> > that other users can take advantage of them.
> >
> >
> > 2013/4/30 John Foxinhead <[email protected]>
> >
> > > I solved my problem with ZooKeeper. I don't know how, maybe it was a
> > > spell xD
> > > I did it this way: on a slave I removed the hbase directory and copied
> > > over the pseudo-distributed hbase directory (which works). Then I
> > > copied all the configurations from the virtual machine which ran as
> > > master into the new directory, making it distributed.
> > > Then I cloned the virtual machine 2 times, changed some configuration,
> > > including the /etc/network/interfaces file, to set the proper IP on
> > > each VM, and then ZooKeeper magically worked. All the configurations
> > > were the same. Maybe I had made some wrong configuration in some OS
> > > file, or there was some rubbish left by the hundreds of tries I made on
> > > the master. In any case, changing which VM works as the master solved
> > > my problem.
> > > Now:
> > > - I start HDFS with "$ ~/hadoop-1.0.4/bin/start-dfs.sh"
> > > - I try some commands from the hadoop shell to make sure it works. (I
> > > found out that the directories on the local fs that the datanodes and
> > > the namenode use as storage space for the HDFS blocks need to have
> > > permission 755; otherwise, even if the permissions are wider, when you
> > > put a file into HDFS the file is created but its content isn't
> > > transferred, so when you get the file back you find out that it is
> > > empty.)
> > > - I start ZooKeeper on my 3 VMs with
> > > "$ ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper" and I wait 2-3
> > > minutes to be sure ZooKeeper has completely started. Then I check the
> > > logs for errors or warnings, and I use "$ ~/hbase-0.94.5/bin/hbase
> > > zkcli" with some "ls" commands to make sure the client connects to
> > > ZooKeeper on the right node and port (2181). Related to ZooKeeper, I
> > > found out that with HBASE_MANAGES_ZK=true in hbase-env.sh there was an
> > > error because ZooKeeper doesn't have time to set up properly before the
> > > HBase master is launched. So, with a lot of VMs (I use 3, and they are
> > > a lot), it's better to set HBASE_MANAGES_ZK=false and start ZooKeeper
> > > manually on the nodes, so that you can wait until it is up before
> > > launching the master.
> > > - Everything works properly so far, so I start HBase with
> > > "$ ~/hbase-0.94.5/bin/start-hbase.sh". Now the output shows that the
> > > master also launches the regionservers on the regionservers' nodes
> > > (good, because before it only showed that the master was launched on
> > > localhost, and nothing about the regionservers). When I look at the log
> > > files in both the master's and the regionservers' logs directories,
> > > they show that the HBase daemons connect properly to the ZooKeeper
> > > cluster set in the hbase.zookeeper.quorum property in hbase-site.xml,
> > > and the port is also right (2181, the same used by the zkcli tool).
> > >
> > > Now the problem is that the master starts on localhost:60000, not on
> > > namenode:60000, so on the master node it's ok, but when the
> > > regionservers try to connect to the master at localhost:60000 they
> > > (naturally) find nothing there and a MasterNotRunningException is
> > > thrown, so the regionservers, after connecting to ZooKeeper, crash
> > > because of that.
> > > I found in the regionservers' log files that they connect to the
> > > ZooKeeper cluster and then crash because they don't find a running
> > > master at localhost:60000, which is consistent. But the strange thing
> > > is that in the conf files I never used "localhost". I also tried to set
> > > the property hbase.master to namenode:60000, but this property hasn't
> > > been used for years, so it doesn't work anymore. What can I do?
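(For completeness, here is a rough sketch of the kind of Put/Get smoke test mentioned above, compiled into a jar and run against the cluster, using the 0.94-era Java client API. The table name "test", the column family "cf" and the class name PutGetTest are just illustrative and the table must already exist; the hostname printout at the top is only a convenient way to see what name the local machine resolves to, which is related to problem 5.)

import java.net.InetAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetTest {
    public static void main(String[] args) throws Exception {
        // Related to problem 5: print what the local machine thinks its own
        // hostname/address is (should be namenode/jobtracker/datanode1, not debian01).
        InetAddress local = InetAddress.getLocalHost();
        System.out.println("Local host: " + local.getHostName() + " / " + local.getHostAddress());

        // Reads hbase-site.xml from the classpath (quorum: namenode,jobtracker,datanode1).
        Configuration conf = HBaseConfiguration.create();

        // Table "test" with column family "cf" must already exist,
        // e.g. created beforehand from the hbase shell.
        HTable table = new HTable(conf, "test");

        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes("value1"));
        table.put(put);

        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        System.out.println("Read back: "
                + Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col1"))));

        table.close();
    }
}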
