Hi,
I am reading data from raw XML files and inserting it into HBase using
TableOutputFormat in a MapReduce job, but because of the sheer volume of
Put calls it takes many hours to process the data. Here is my sample code:
conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable");
conf.set("xmlinput.start", "<adc>");
conf.set("xmlinput.end", "</adc>");
conf.set("io.serializations",
    "org.apache.hadoop.io.serializer.JavaSerialization,"
        + "org.apache.hadoop.io.serializer.WritableSerialization");

Job job = new Job(conf, "Populate Table with Data");
FileInputFormat.setInputPaths(job, input);
job.setJarByClass(ParserDriver.class);
job.setMapperClass(MyParserMapper.class);
job.setNumReduceTasks(0);
job.setInputFormatClass(XmlInputFormat.class);
job.setOutputFormatClass(TableOutputFormat.class);
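
One thing I have been looking at is the client-side write buffer. A
minimal sketch of what I mean (assuming the HTable created inside
TableOutputFormat honours hbase.client.write.buffer, which I have not
verified; the 8 MB figure is just an example):

// Enlarge the client-side write buffer so Puts are flushed to the
// region servers in larger batches (the default is 2 MB).
conf.setLong("hbase.client.write.buffer", 8 * 1024 * 1024);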
And here is the mapper code:
public class MyParserMapper extends
    Mapper<LongWritable, Text, NullWritable, Writable> {

  @Override
  public void map(LongWritable key, Text value1, Context context)
      throws IOException, InterruptedException {
    // ... some processing: parse the XML record and derive rowId,
    // counter and the element iterator rItr (elided) ...
    while (rItr.hasNext()) {
      // This put statement runs 132,622,560 times to insert the data.
      context.write(NullWritable.get(),
          new Put(rowId).add(Bytes.toBytes("CounterValues"),
              Bytes.toBytes(counter.toString()),
              Bytes.toBytes(rElement.getTextTrim())));
    }
  }
}
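
One idea I had: many of these cells seem to share the same rowId, so
perhaps they could be folded into a single Put per row instead of one
Put per cell. A rough sketch of what I mean (rItr, rowId, counter and
rElement come from the elided processing above; I am assuming counter
advances per element so the qualifiers stay unique, and that rElement
is a JDOM Element taken from rItr):

// Accumulate all cells for one row into a single Put, so one write
// carries many key/values instead of one Put per cell.
Put put = new Put(rowId);
while (rItr.hasNext()) {
    Element rElement = rItr.next();
    put.add(Bytes.toBytes("CounterValues"),
        Bytes.toBytes(counter.toString()),
        Bytes.toBytes(rElement.getTextTrim()));
}
context.write(NullWritable.get(), put);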
Is there any other way of doing this task so I can improve performance?
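For example, would bulk loading via HFileOutputFormat be the right
direction? A rough sketch of what I have in mind (assuming the table
already exists and that configureIncrementalLoad wires up the reducer
and partitioner as the HBase docs describe; the /tmp/hfiles path is
just a placeholder):

// Write HFiles directly and bulk-load them, instead of issuing Puts
// through TableOutputFormat.
Job job = new Job(conf, "Populate Table with Data (bulk load)");
job.setJarByClass(ParserDriver.class);
job.setMapperClass(MyParserMapper.class);
job.setInputFormatClass(XmlInputFormat.class);
// The mapper would need to emit ImmutableBytesWritable/Put instead
// of NullWritable/Put for this to work.
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
FileInputFormat.setInputPaths(job, input);
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles"));
HTable table = new HTable(conf, "mytable");
HFileOutputFormat.configureIncrementalLoad(job, table);
// After the job finishes, load the HFiles into the table:
// new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles"), table);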
--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>