Also make sure that $HBASE_HOME/hbase-<version>.jar, 
$HBASE_HOME/lib/zookeeper-<version>.jar, and $HBASE_HOME/conf/ are all on 
the classpath in your $HADOOP_HOME/conf/hadoop-env.sh file. That configuration 
must be cluster-wide.
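As a sketch, the classpath addition in hadoop-env.sh might look like the 
following (the <version> placeholders stand in for whatever jar versions your 
install actually ships; the line must be identical on every node):

```shell
# In $HADOOP_HOME/conf/hadoop-env.sh on every node in the cluster.
# Replace the <version> placeholders with your actual jar file names.
export HADOOP_CLASSPATH="$HBASE_HOME/hbase-<version>.jar:$HBASE_HOME/lib/zookeeper-<version>.jar:$HBASE_HOME/conf:$HADOOP_CLASSPATH"
```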

With that in place, your map and reduce tasks can access ZooKeeper and HBase 
objects. You can then use TableInputFormat with TableOutputFormat, or you can 
use TableInputFormat alone and have your reduce tasks write data directly back 
into HBase. Your problem, and your dataset, will dictate which of those 
methods is more efficient.
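As a rough sketch of the first approach (TableInputFormat together with 
TableOutputFormat), the job setup below copies one column from a source table 
to a sink table via TableMapReduceUtil. The table names ("input_table", 
"output_table"), the column family "cf", and the qualifier "col" are all 
hypothetical, and the exact HBase API differs somewhat between releases:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class HBaseMrSketch {

  // Reads each row handed in by TableInputFormat and emits a Put for
  // the sink table. Family/qualifier names here are illustrative only.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context ctx)
        throws IOException, InterruptedException {
      byte[] v = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
      if (v != null) {
        Put put = new Put(row.get());
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), v);
        ctx.write(row, put);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    Job job = new Job(conf, "hbase-mr-sketch");
    job.setJarByClass(HBaseMrSketch.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // batch more rows per scanner RPC
    scan.setCacheBlocks(false);  // don't churn the block cache from MR

    // Source table read through TableInputFormat...
    TableMapReduceUtil.initTableMapperJob("input_table", scan,
        CopyMapper.class, ImmutableBytesWritable.class, Put.class, job);
    // ...sink table written through TableOutputFormat. With zero
    // reduce tasks, the mapper's Puts go straight to the sink table.
    TableMapReduceUtil.initTableReducerJob("output_table", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

For the second approach you would skip initTableReducerJob and instead open an 
HTable in your reducer and call put() on it directly.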

Travis Hegner
http://www.travishegner.com/

-----Original Message-----
From: Andrey Stepachev [mailto:[email protected]]
Sent: Monday, July 19, 2010 9:28 AM
To: [email protected]
Subject: Re: Run MR job when my data stays in hbase?

2010/7/19 elton sky <[email protected]>:

> My question is if I wanna run the background process as an MR job, can I get
> data from HBase, rather than HDFS, with Hadoop? How do I do that?
> I appreciate if anyone can provide some simple example code.

Look at org.apache.hadoop.hbase.mapreduce package in hbase sources
and as real example: org.apache.hadoop.hbase.mapreduce.RowCounter

