Not yet. Can you explain more about how to do it? Thanks.

On Fri, Nov 5, 2010 at 8:28 PM, Buttler, David <[email protected]> wrote:
> Have you tried turning off auto flush, and managing the flush in your own
> code (say every 1000 puts?)
> Dave
>
>
> -----Original Message-----
> From: Shuja Rehman [mailto:[email protected]]
> Sent: Friday, November 05, 2010 8:04 AM
> To: [email protected]
> Subject: Re: Best Way to Insert data into Hbase using Map Reduce
>
> Michael
>
> hum....so u are storing xml record in the hbase and in second job, u r
> parsing. but in my case i am parsing it also in first phase. what i do, i
> get xml file and i parse it using jdom and then putting data in hbase. so
> parsing+putting both operations are in 1 phase and in mapper code.
>
> My actual problem is that after parsing file, i need to use put statement
> millions of times and i think for each statement it connects to hbase and
> then insert it and this might be the reason of slow processing. So i am
> trying to figure out some way we i can first buffer data and then insert in
> batch fashion. it means in one put statement, i can insert many records and
> i think if i do in this way then the process will be very fast.
>
> secondly what does it means? "we write the raw record in via a single put()
> so the map() method is a null writable."
>
> can u explain it more?
>
> Thanks
>
>
> On Fri, Nov 5, 2010 at 5:05 PM, Michael Segel <[email protected]> wrote:
>
> >
> > Suja,
> >
> > Just did a quick glance.
> >
> > What is it that you want to do exactly?
> >
> > Here's how we do it... (at a high level.)
> >
> > Input is an XML file where we want to store the raw XML records in hbase,
> > one record per row.
> >
> > Instead of using the output of the map() method, we write the raw record
> > in via a single put() so the map() method is a null writable.
> >
> > Its pretty fast. However fast is relative.
> >
> > Another thing... we store the xml record as a string (converted to
> > bytecode) rather than a serialized object.
> >
> > Then you can break it down in to individual fields in a second batch job.
> > (You can start with a DOM parser, and later move to a Stax parser.
> > Depending on which DOM parser you have and the size of the record, it
> > should be 'fast enough'. A good implementation of Stax tends to be
> > recursive/re-entrant code which is harder to maintain.)
> >
> > HTH
> >
> > -Mike
> >
> >
> > > Date: Fri, 5 Nov 2010 16:13:02 +0500
> > > Subject: Best Way to Insert data into Hbase using Map Reduce
> > > From: [email protected]
> > > To: [email protected]
> > >
> > > Hi
> > >
> > > I am reading data from raw xml files and inserting data into hbase
> > > using TableOutputFormat in a map reduce job. but due to heavy put
> > > statements, it takes many hours to process the data. here is my
> > > sample code.
> > >
> > > conf.set(TableOutputFormat.OUTPUT_TABLE, "mytable");
> > > conf.set("xmlinput.start", "<adc>");
> > > conf.set("xmlinput.end", "</adc>");
> > > conf.set(
> > >     "io.serializations",
> > >     "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization");
> > >
> > > Job job = new Job(conf, "Populate Table with Data");
> > >
> > > FileInputFormat.setInputPaths(job, input);
> > > job.setJarByClass(ParserDriver.class);
> > > job.setMapperClass(MyParserMapper.class);
> > > job.setNumReduceTasks(0);
> > > job.setInputFormatClass(XmlInputFormat.class);
> > > job.setOutputFormatClass(TableOutputFormat.class);
> > >
> > >
> > > *and mapper code*
> > >
> > > public class MyParserMapper extends
> > >         Mapper<LongWritable, Text, NullWritable, Writable> {
> > >
> > >     @Override
> > >     public void map(LongWritable key, Text value1, Context context)
> > >             throws IOException, InterruptedException {
> > >         // doing some processing
> > >         while (rItr.hasNext()) {
> > >             // and this put statement runs for 132,622,560 times to
> > >             // insert the data.
> > >             context.write(NullWritable.get(),
> > >                     new Put(rowId).add(Bytes.toBytes("CounterValues"),
> > >                             Bytes.toBytes(counter.toString()),
> > >                             Bytes.toBytes(rElement.getTextTrim())));
> > >         }
> > >     }
> > > }
> > >
> > > Is there any other way of doing this task so i can improve the
> > > performance?
> > >
> > >
> > > --
> > > Regards
> > > Shuja-ur-Rehman Baig
> > > <http://BLOCKEDpk.linkedin.com/in/shujamughal>
> >
> >
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://BLOCKEDpk.linkedin.com/in/shujamughal>
>

--
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>
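
Dave's suggestion in the quoted thread (turn off auto flush and flush manually, say every 1000 puts) amounts to buffering records and writing them in batches. A minimal, library-free sketch of that pattern follows. `BatchingBuffer`, `BatchingDemo`, and the sink callback are hypothetical names used only for illustration; they are not HBase API. With the HBase client of that era, the sink would presumably hand each batch to `HTable.put(List<Put>)`, or one would call `HTable.setAutoFlush(false)` and then `flushCommits()` periodically; that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects records and hands them to a sink in fixed-size batches. */
class BatchingBuffer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // hypothetical: would wrap the real table write
    private final List<T> pending;

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.pending = new ArrayList<>(batchSize);
    }

    /** Buffers one record; writes a batch once batchSize records are pending. */
    void add(T record) {
        pending.add(record);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is pending to the sink as one batch. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink owns its list
        pending.clear();
    }

    /** Flushes the final partial batch, e.g. from the mapper's cleanup step. */
    @Override
    public void close() {
        flush();
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // The sink here only records batch sizes; a real one would write to the table.
        try (BatchingBuffer<String> buffer =
                new BatchingBuffer<>(1000, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 2500; i++) {
                buffer.add("record-" + i);
            }
        }
        System.out.println(batchSizes); // prints [1000, 1000, 500]
    }
}
```

In a mapper, `add` would replace the per-record write inside the loop, and `close` would be called once when the task finishes so the last partial batch is not lost.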
