What do you need? Bulk upload, in the scheme of things, is a well-documented feature. It's also one that has had some exercise and is known to work well. For a 0.89 release and trunk, documentation is here: http://hbase.apache.org/docs/r0.89.20100924/bulk-loads.html. The unit test you refer to below is good for figuring out how to run a job (bulk upload was redone for 0.89/trunk and is much improved over what was available in 0.20.x).
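For the command-line route the docs describe, the flow is roughly the two steps below: run a job that writes HFiles, then hand them to the cluster. This is a sketch per the 0.89 bulk-loads page; the exact jar name, column spec, and paths here are placeholders you'd substitute for your own setup.

```shell
# Step 1: run a MapReduce job that writes HFiles instead of Puts.
# importtsv is the stock example job; your own job would use
# HFileOutputFormat.configureIncrementalLoad() the same way the
# unit test does. (Jar version, columns and paths are placeholders.)
hadoop jar hbase-VERSION.jar importtsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,f1:c1 \
  -Dimporttsv.bulk.output=/user/me/hfile-output \
  mytable /user/me/input

# Step 2: move the generated HFiles into the running table.
# This is the CLI wrapper around LoadIncrementalHFiles.
hadoop jar hbase-VERSION.jar completebulkload /user/me/hfile-output mytable
```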
On TotalOrderPartitioner: this is a partitioner class from Hadoop. The MR partitioner -- the class that dictates which reducers get which map outputs -- is pluggable. The default partitioner hashes the output key to pick a reducer. That won't work if you are looking to have your HFile output totally sorted. If you can't figure out what it's about, I'd suggest you check out the Hadoop book, where it gets a good explication.

On incremental upload, the doc suggests you look at the output of the LoadIncrementalHFiles command. Have you done that? You run the command and it'll add in whatever is ready for loading.

St.Ack

On Wed, Nov 10, 2010 at 6:47 AM, Shuja Rehman <[email protected]> wrote:
> Hey Community,
>
> Well... it seems that nobody has experience with the bulk load option. I
> have found one class which might help to write the code for it.
>
> https://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java
>
> From this, you can get the idea of how to write a map reduce job to output
> in HFile format. But there is a little confusion about these two things:
>
> 1- TotalOrderPartitioner
> 2- configureIncrementalLoad
>
> Does anybody have an idea about these things and how to configure them for
> the job?
>
> Thanks
>
>
>
> On Wed, Nov 10, 2010 at 1:02 AM, Shuja Rehman <[email protected]> wrote:
>
>> Hi
>>
>> I am trying to investigate the bulk load option as described in the
>> following link.
>>
>> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
>>
>> Does anybody have sample code or have used it before?
>> Can it be helpful to insert data into an existing table? In my scenario,
>> I have one table with 1 column family in which data will be inserted
>> every 15 minutes.
>>
>> Kindly share your experiences.
>>
>> Thanks
>> --
>> Regards
>> Shuja-ur-Rehman Baig
>> <http://pk.linkedin.com/in/shujamughal>
>>
>
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>
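To make the TotalOrderPartitioner idea concrete: it routes keys to reducers by range rather than by hash, using a sorted list of split points, so that concatenating the sorted reducer outputs yields one globally sorted result (which is what HFile output needs). The class below is a toy sketch of that behavior, not the Hadoop class itself; the names and split points are made up for illustration.

```java
import java.util.Arrays;

// Toy illustration (not Hadoop's TotalOrderPartitioner): given sorted
// split points, reducer 0 gets all keys before split[0], reducer 1 gets
// keys in [split[0], split[1]), and so on. Each reducer's output is
// sorted, and the reducers themselves are ordered, so the whole output
// is totally sorted -- a hash partitioner gives you no such guarantee.
public class TotalOrderSketch {
    private final String[] splitPoints; // sorted; length = numReducers - 1

    public TotalOrderSketch(String[] splitPoints) {
        this.splitPoints = splitPoints;
    }

    public int getPartition(String key) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key
        // is absent; recover the insertion point in that case.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        // Two split points -> three reducers.
        TotalOrderSketch p = new TotalOrderSketch(new String[] {"g", "p"});
        System.out.println(p.getPartition("a")); // 0
        System.out.println(p.getPartition("k")); // 1
        System.out.println(p.getPartition("z")); // 2
    }
}
```

In the real job you don't pick split points by hand: HFileOutputFormat.configureIncrementalLoad() samples the table's current region boundaries and wires up the partitioner and reducer count for you.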
