Thanks, Dave. I have only 2 records in my HDFS file for testing.
Could you give an example of the setup and cleanup functions you are
referring to? This is my first MR HBase job using the new API.
The commented-out code in the email thread below runs fine, but it is non-MR.
Is there any other setting needed for an MR job to update HBase?
The MR jobs are run with `hadoop jar <jar_name> <class_name> <hdfs_file_name>`,
and the non-MR HBase jobs are run simply with `hbase <class_name>`.
I suspect I am missing some setting; I have made sure the CLASSPATHs are all
correct.
This is how I configure the MR job -
public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "blogposts");

    Job job = new Job(conf, NAME);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setJarByClass(mapRedImport_from_hdfs.class);
    job.setMapperClass(myMap.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class);
    job.waitForCompletion(true);
    return 0;
}

public static void main(String[] args) throws Exception {
    int errCode = ToolRunner.run(new mapRedImport_from_hdfs(), args);
    System.exit(errCode);
}
-----Original Message-----
From: Buttler, David [mailto:[email protected]]
Sent: Thursday, June 17, 2010 8:06 AM
To: [email protected]
Subject: RE: MapReduce job runs fine, but nothing is written to HTable
It looks to me as if you are not defining your input format correctly. Notice
that you only had two map input records.
Other issues:
- You are not flushing.
- You are creating a new HTable on each map call. Put the HTable creation in
  the setup and the flush in the cleanup.
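Roughly, as a sketch (note the LongWritable key: with the default
TextInputFormat, a map() declared with an ImmutableBytesWritable key will not
match the records the framework feeds it, so the no-op default map can run
instead, which would explain the silent job):

// imports: java.io.IOException, org.apache.hadoop.hbase.HBaseConfiguration,
// org.apache.hadoop.hbase.client.HTable and Put, org.apache.hadoop.hbase.util.Bytes,
// org.apache.hadoop.io.LongWritable and Text, org.apache.hadoop.mapreduce.Mapper
public static class myMap extends Mapper<LongWritable, Text, LongWritable, Text> {
    private HTable table;

    // Runs once per task, before any map() calls: create the HTable here.
    @Override
    protected void setup(Context context) throws IOException {
        table = new HTable(new HBaseConfiguration(), "blogposts");
        table.setAutoFlush(false);   // buffer puts client-side
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] splits = value.toString().split("\t");
        Put p = new Put(Bytes.toBytes(splits[0]));
        p.add(Bytes.toBytes("post"), Bytes.toBytes("title"), Bytes.toBytes(splits[1]));
        table.put(p);   // buffered until cleanup() flushes
    }

    // Runs once per task, after the last map() call: flush here.
    @Override
    protected void cleanup(Context context) throws IOException {
        table.flushCommits();
    }
}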
Dave
-----Original Message-----
From: Sharma, Avani [mailto:[email protected]]
Sent: Wednesday, June 16, 2010 7:06 PM
To: [email protected]
Subject: MapReduce job runs fine, but nothing is written to HTable
Hi,
I am running a job to write some data from an HDFS file to an HBase table
using the new API.
The job runs fine without any errors, but I do not see the rows added to the
HBase table.
This is what my code looks like -
I am running this as `hadoop jar <jar_file_name> <class_name> <hdfs_file_name>`.
private HTable table;

protected void map(ImmutableBytesWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    table = new HTable(new HBaseConfiguration(), "blogposts");

    // Split input line on tab character
    String[] splits = value.toString().split("\t");
    String rowID = splits[0];
    String cellValue = splits[1];

    Put p = new Put(Bytes.toBytes(rowID));
    p.add(Bytes.toBytes("post"), Bytes.toBytes("title"), Bytes.toBytes(splits[1]));
    table.put(p);
    table.flushCommits();
}
/*
This commented code, when run separately in a main program, runs fine and
does update the table:

HTable table = new HTable(new HBaseConfiguration(), "blogposts");
Put p = new Put(Bytes.toBytes("post3"));
p.add(Bytes.toBytes("post"), Bytes.toBytes("title"), Bytes.toBytes("abx"));
p.add(Bytes.toBytes("post"), Bytes.toBytes("author"), Bytes.toBytes("hadings"));
p.add(Bytes.toBytes("image"), Bytes.toBytes("body"), Bytes.toBytes("123.jpg"));
p.add(Bytes.toBytes("image"), Bytes.toBytes("header"), Bytes.toBytes("7657.jpg"));
table.put(p);
*/
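For what it's worth, I believe the standalone version gets away without an
explicit flushCommits() because HTable's autoFlush is on by default, so each
put() is committed immediately. Complete, it is just the snippet above wrapped
in a main():

public static void main(String[] args) throws IOException {
    // autoFlush defaults to true, so every put() is sent to the server
    // right away and no explicit flushCommits() is needed.
    HTable table = new HTable(new HBaseConfiguration(), "blogposts");
    Put p = new Put(Bytes.toBytes("post3"));
    p.add(Bytes.toBytes("post"), Bytes.toBytes("title"), Bytes.toBytes("abx"));
    p.add(Bytes.toBytes("post"), Bytes.toBytes("author"), Bytes.toBytes("hadings"));
    table.put(p);
}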
Run log
10/06/16 19:00:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
10/06/16 19:00:35 INFO input.FileInputFormat: Total input paths to process : 1
10/06/16 19:00:36 INFO mapred.JobClient: Running job: job_201003301510_0157
10/06/16 19:00:37 INFO mapred.JobClient: map 0% reduce 0%
10/06/16 19:00:45 INFO mapred.JobClient: map 100% reduce 0%
10/06/16 19:00:47 INFO mapred.JobClient: Job complete: job_201003301510_0157
10/06/16 19:00:47 INFO mapred.JobClient: Counters: 5
10/06/16 19:00:47 INFO mapred.JobClient: Job Counters
10/06/16 19:00:47 INFO mapred.JobClient: Rack-local map tasks=1
10/06/16 19:00:47 INFO mapred.JobClient: Launched map tasks=1
10/06/16 19:00:47 INFO mapred.JobClient: FileSystemCounters
10/06/16 19:00:47 INFO mapred.JobClient: HDFS_BYTES_READ=31
10/06/16 19:00:47 INFO mapred.JobClient: Map-Reduce Framework
10/06/16 19:00:47 INFO mapred.JobClient: Map input records=2
10/06/16 19:00:47 INFO mapred.JobClient: Spilled Records=0
Thanks,
Avani