Hi J-D,

It turned out to be annoyingly simple: we have ZK servers running on 3
machines and found we have to start ALL the ZK servers before we start
HBase.

With hindsight this is obvious and we could kick ourselves for not thinking
of it two days ago.

But I can't see any reminders about this in the ZK docs or the HBase docs,
and in the context of Hadoop you become used to thinking in terms of masters
controlling slaves... well, it's easy to forget to tell every ZK server to
get going. From Googling for the error/NPE I was getting it looks like a few
people have had the same problem but not solved it (or not let the world
know they fixed it).

Maybe there's a neat tool for controlling ZK ensembles but I haven't found
it. 

Meanwhile I think a note in the HBase Cluster setup docs saying "Start all
your ZKs, wait until you are sure they are running, then start HBase" could
save others from wasting time.
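
In case it helps anyone, here's roughly the kind of check I have in mind, as
a minimal Java sketch rather than a tested tool. It sends ZooKeeper's
four-letter "ruok" command to each server and expects "imok" back.
ZkEnsembleCheck and its method names are my own invention, and it assumes
the four-letter commands are enabled on the servers. Note that "imok" only
says the server process is up, not that the ensemble has formed a quorum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper (not part of ZooKeeper or HBase).
class ZkEnsembleCheck {

    /** Sends "ruok" to one server; true only if it answers "imok" in time. */
    static boolean isServing(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to 4 bytes; the server closes the socket after replying.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == 4
                && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connection refused, timeout, etc. all count as "not serving".
            return false;
        }
    }

    /** True only when every "host:port" entry answers "imok". */
    static boolean allServing(List<String> servers, int timeoutMs) {
        for (String server : servers) {
            int colon = server.lastIndexOf(':');
            String host = server.substring(0, colon);
            int port = Integer.parseInt(server.substring(colon + 1));
            if (!isServing(host, port, timeoutMs)) {
                return false;
            }
        }
        return true;
    }
}
```

You'd call something like
ZkEnsembleCheck.allServing(Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), 2000)
(made-up host names) in a loop until it returns true, and only then start
HBase.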

But thanks for your help - you got us on the right track for sure.

Royston


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Jean-Daniel Cryans
Sent: 16 November 2011 17:30
To: [email protected]
Subject: Re: n00b trying to run HBase example code

Ah right:

> 2011-11-16 14:37:14,670 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181

Unless you have a ZK server running on every node (you shouldn't), then it's
not going to find it. Your job needs to know about your zookeeper
configuration.
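
For illustration, the connectString=localhost:2181 in your log is what a
client ends up with when hbase.zookeeper.quorum is left at its default.
Here's a rough, stdlib-only Java sketch of how the quorum and client port
combine into a connect string. QuorumCheck is a made-up helper, not HBase
API, and the real logic in HBase may differ in details:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration only (not an HBase class).
class QuorumCheck {

    /** Builds "host1:port,host2:port,..." from the two HBase property names. */
    static String connectString(Properties conf) {
        // Both defaults match what an unconfigured client falls back to.
        String quorum = conf.getProperty("hbase.zookeeper.quorum", "localhost");
        String port = conf.getProperty("hbase.zookeeper.property.clientPort", "2181");

        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String trimmed = host.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed + ":" + port);
            }
        }
        return String.join(",", servers);
    }

    /** True when the only server in the connect string is localhost. */
    static boolean pointsAtLocalhostOnly(Properties conf) {
        String connect = connectString(conf);
        return connect.startsWith("localhost:") && !connect.contains(",");
    }
}
```

If pointsAtLocalhostOnly comes back true for the configuration your tasks
see, they will look for ZK on whichever node they happen to run on, which
is what your log shows.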

Check this out if you need some more hints:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpath

J-D

On Wed, Nov 16, 2011 at 9:20 AM, Royston Sellman
<[email protected]> wrote:
> Thanks for your suggestion J-D, I hadn't tried that.
>
> So, following your advice, below is the error log from one of the 
> slaves in my cluster. Maybe those "connection refused" messages are 
> the cause of the exception... Does it ring any bells for you?
>
>
>
> 2011-11-16 14:37:14,641 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.home=/home/hadoop1
> 2011-11-16 14:37:14,641 INFO org.apache.zookeeper.ZooKeeper: Client
> environment:user.dir=/mapred/taskTracker/jobcache/job_201111161435_0001/attempt_201111161435_0001_m_000000_0/work
> 2011-11-16 14:37:14,641 INFO org.apache.zookeeper.ZooKeeper: 
> Initiating client connection, connectString=localhost:2181 
> sessionTimeout=180000 watcher=hconnection
> 2011-11-16 14:37:14,670 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181
> 2011-11-16 14:37:14,671 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011-11-16 14:37:15,412 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181
> 2011-11-16 14:37:15,412 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect ...
> (omitting 8 identical retries and fails) ...
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011-11-16 14:37:26,172 INFO org.apache.zookeeper.ClientCnxn: Opening 
> socket connection to server localhost/127.0.0.1:2181
> 2011-11-16 14:37:26,173 WARN org.apache.zookeeper.ClientCnxn: Session 
> 0x0 for server null, unexpected error, closing socket connection and 
> attempting reconnect
> java.net.ConnectException: Connection refused
>        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>        at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>        at
> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011-11-16 14:37:26,373 INFO org.apache.zookeeper.ZooKeeper: Session: 
> 0x0 closed
> 2011-11-16 14:37:26,373 ERROR
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat:
> org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to 
> connect to ZooKeeper but the connection closes immediately. This could 
> be a sign that the server has too many connections (30 is the 
> default). Consider inspecting your ZK server logs for that error and 
> then make sure you are reusing HBaseConfiguration as often as you can. 
> See HTable's javadoc for more information.
> 2011-11-16 14:37:26,375 INFO org.apache.zookeeper.ClientCnxn: 
> EventThread shut down
> 2011-11-16 14:37:26,423 WARN org.apache.hadoop.mapred.TaskTracker: 
> Error running child java.lang.NullPointerException
>        at
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:127)
>        at
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:82)
>        at
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:498)
>        at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>        at SampleUploader$Uploader.map(SampleUploader.java:99)
>        at SampleUploader$Uploader.map(SampleUploader.java:1)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>        at 
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 2011-11-16 14:37:26,425 INFO org.apache.hadoop.mapred.TaskRunner: 
> Runnning cleanup for the task
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Jean-Daniel Cryans
> Sent: 15 November 2011 21:30
> To: [email protected]
> Subject: Re: n00b trying to run HBase example code
>
> If I remember correctly, this NPE doesn't come alone, but you have to
> go into that task's log to find the rest, as the job output you have
> only shows what ended up killing the task. Go to the jobtracker's web
> ui, click on your job, click on the number of failed tasks, look at
> the log of one of those tasks, and look for anything that looks bad at the
> end of the log before the NPE.
>
> Hope this helps,
>
> J-D
>
> On Mon, Nov 14, 2011 at 3:30 PM, Royston Sellman 
> <[email protected]> wrote:
>> Apologies if this is the wrong forum, or has come up before, or is
>> RTFM-class...
>>
>> I'm trying to run HBase example code. I think I have HBase running OK
>> on my 6 node cluster: I can create tables with the HBase shell. I can
>> create tables with the ExampleClient from Chapter 13 of the Hadoop
>> book. I can load the HBase Master: namenode:60000 web page and see tables
>> I've created.
>> But when I try to run SampleUploader from the HBase examples I get the
>> (not very helpful to me) exception below. I get a similar error when
>> trying to run the HBaseTemperatureImporter bulk load example from the
>> Hadoop book too, i.e. mapred.JobClient fails 4 times before giving up.
>>
>> Thanks,
>> Royston.
>
>
