Also make sure that $HBASE_HOME/hbase-<version>.jar, $HBASE_HOME/lib/zookeeper-<version>.jar, and $HBASE_HOME/conf/ are all on the classpath in your $HADOOP_HOME/conf/hadoop-env.sh file. That configuration must be cluster-wide.
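A sketch of what that hadoop-env.sh entry might look like; HBASE_HOME, HBASE_VERSION, and ZK_VERSION are placeholders you'd substitute from your own installation:

```shell
# Hypothetical hadoop-env.sh fragment -- point these at your actual install.
# HBASE_VERSION and ZK_VERSION stand in for the versions shipped with HBase.
export HBASE_HOME=/path/to/hbase
export HADOOP_CLASSPATH="$HBASE_HOME/hbase-$HBASE_VERSION.jar:$HBASE_HOME/lib/zookeeper-$ZK_VERSION.jar:$HBASE_HOME/conf:$HADOOP_CLASSPATH"
```

Prepending to HADOOP_CLASSPATH (rather than overwriting it) keeps any jars other scripts have already added.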
With that, your map and reduce tasks can access ZooKeeper and HBase objects. You can then use TableInputFormat with TableOutputFormat, or you can use TableInputFormat alone and have your reduce tasks write data directly back into HBase. Your problem, and your dataset, will dictate which of those methods is more efficient.

Travis Hegner
http://www.travishegner.com/

-----Original Message-----
From: Andrey Stepachev [mailto:[email protected]]
Sent: Monday, July 19, 2010 9:28 AM
To: [email protected]
Subject: Re: Run MR job when my data stays in hbase?

2010/7/19 elton sky <[email protected]>:
> My question is: if I want to run the background process as a MR job, can I get
> data from hbase, rather than hdfs, with hadoop? How do I do that?
> I'd appreciate it if anyone could provide some simple example code.

Look at the org.apache.hadoop.hbase.mapreduce package in the HBase sources, and, as a real example: org.apache.hadoop.hbase.mapreduce.RowCounter
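A minimal sketch of the table-to-table pattern described above, wired up with TableMapReduceUtil (the same helper RowCounter uses). It reads every row of one table with TableInputFormat and writes aggregated Puts back into a second table via TableOutputFormat. The table names "metrics" and "summaries", the column family "cf", and the class names are all assumptions for illustration; it also assumes a running HBase cluster reachable from the job's configuration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TableToTableExample {

  // Emits (qualifier, 1) for every cell in family "cf" of each input row.
  static class CellCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      for (byte[] qualifier : value.getFamilyMap(Bytes.toBytes("cf")).keySet()) {
        context.write(new Text(Bytes.toString(qualifier)), ONE);
      }
    }
  }

  // Sums the counts and writes one row per qualifier back into HBase.
  static class SumReducer
      extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      byte[] rowKey = Bytes.toBytes(key.toString());
      Put put = new Put(rowKey);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
      context.write(new ImmutableBytesWritable(rowKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-to-table example");
    job.setJarByClass(TableToTableExample.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // larger scanner batches for MR throughput
    scan.setCacheBlocks(false);  // don't pollute the block cache with a full scan

    TableMapReduceUtil.initTableMapperJob("metrics", scan,
        CellCountMapper.class, Text.class, IntWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("summaries", SumReducer.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Dropping the initTableReducerJob call and using a plain Reducer instead would give the second variant Travis mentions, where the reduce tasks open an HTable themselves and write directly.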
