I solved my problem with ZooKeeper. I don't know exactly how, maybe it was a spell xD. This is what I did: on a slave I removed the HBase directory and copied over the directory of the pseudo-distributed HBase installation (which works). Then I copied all the configuration from the VM that was running as master into the new directory, making it distributed. Then I cloned that virtual machine twice, adjusted some configuration (including the /etc/network/interfaces file) to set the proper IP on each VM, and ZooKeeper magically worked. All the configuration was the same, so maybe I had some wrong setting in some OS file, or there was some rubbish left over from the hundreds of attempts I made on the old master. In any case, changing which VM works as master solved my problem.

Now:

- I start HDFS with "$ ~/hadoop-1.0.4/bin/start-dfs.sh".

- I try some commands from the Hadoop shell to make sure it works. (I found out that the directories on the local filesystem that the datanodes and the namenode use as storage for HDFS blocks need permission 755. With more permissive permissions, when you put a file into HDFS the file entry is created but its content isn't transferred, so when you get the file back it turns out to be empty. See the sanity check right after this list.)

- I start ZooKeeper on my 3 VMs with "$ ~/hbase-0.94.5/bin/hbase-daemon.sh start zookeeper" and wait 2-3 minutes to be sure ZooKeeper has completely started. Then I check the logs for errors or warnings, and I use "$ ~/hbase-0.94.5/bin/hbase zkcli" with some "ls" commands to make sure the client connects to ZooKeeper on the right node and port (2181). Regarding ZooKeeper, I found out that with HBASE_MANAGES_ZK=true in hbase-env.sh there was an error, because ZooKeeper doesn't have time to come up properly before the HBase master is launched. So, with several VMs (I use 3, and that's already a lot), it's better to set HBASE_MANAGES_ZK=false and start ZooKeeper manually on the nodes, so that you can wait until it is up before launching the master. The configuration I ended up with is sketched after this list as well.

- Everything works properly up to this point, so I start HBase with "$ ~/hbase-0.94.5/bin/start-hbase.sh". Now the output shows that the master also launches the regionservers on the regionserver nodes (good, because before it only showed that the master was launched on localhost and said nothing about the regionservers). Looking at the log files in both the master's and the regionservers' logs directories, the HBase daemons connect properly to the ZooKeeper cluster listed in the hbase.zookeeper.quorum property in hbase-site.xml, and the port is right too (2181, the same one used by the zkcli tool).
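As a quick sanity check for the permission issue above, this is roughly what I run (the storage paths are just placeholders, use whatever dfs.name.dir and dfs.data.dir point at in your hdfs-site.xml):

    # make the local storage dirs exactly 755 on every node (example paths)
    chmod 755 /path/to/dfs/name /path/to/dfs/data

    # round-trip a small file through HDFS and check it comes back non-empty
    ~/hadoop-1.0.4/bin/hadoop fs -put /etc/hostname /test-file
    ~/hadoop-1.0.4/bin/hadoop fs -cat /test-file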
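For reference, this is roughly the configuration I end up with on every node (a sketch; the hostnames slave1/slave2 and the HDFS port 9000 are placeholders for my real ones). In hbase-env.sh:

    # I start ZooKeeper by hand, so HBase must not manage it
    export HBASE_MANAGES_ZK=false

and in hbase-site.xml:

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode:9000/hbase</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>namenode,slave1,slave2</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
    </property>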
Now the problem is that the master starts on localhost:60000 rather than namenode:60000. On the master node that's fine, but when the regionservers try to connect to the master at localhost:60000 they (naturally) find nothing there, a MasterNotRunningException is thrown, and so the regionservers, after connecting to ZooKeeper, crash because of it. The regionserver logs confirm this: they connect to the ZooKeeper cluster and then die because they can't find a running master at localhost:60000, which is consistent. The strange thing is that I never wrote "localhost" anywhere in the conf files. I also tried setting the hbase.master property to namenode:60000, but that property has been unused for years, so it doesn't work anymore. What can I do?
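By the way, this is how I check from the shell which address the master actually registered in ZooKeeper (assuming the default znode parent /hbase; the znode data is partly binary, but the hostname is readable in it):

    $ ~/hbase-0.94.5/bin/hbase zkcli
    # then, at the zkcli prompt:
    ls /hbase
    get /hbase/master

This is another way to confirm whether the master really registered itself as localhost:60000 instead of namenode:60000.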
