I used bulk load to import data. The step of writing HFiles using M/R is
fast, but the step of loading the HFiles into HBase takes a lot of time.
It says "HFile at ****** no longer fits inside a single region.
Splitting...". Even worse, sometimes it throws a "Region is not online"
exception.
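
My understanding is that the splitting happens when an HFile straddles a
region boundary: if the table's regions have split since the HFiles were
written, the loader has to split those HFiles before it can hand them to
the regions, and a region that is mid-split can also surface "Region is
not online" errors. One mitigation is to generate the HFiles against the
table's current region boundaries and load them promptly. A minimal
sketch of the kind of driver I mean (the input format, the table name
"mytable", and the column family "cf" are hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class HFileGen {
    // Hypothetical mapper: turns "rowkey<TAB>value" lines into Puts.
    static class LineMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
      protected void map(LongWritable key, Text line, Context ctx)
          throws IOException, InterruptedException {
        String[] f = line.toString().split("\t", 2);
        byte[] row = Bytes.toBytes(f[0]);
        Put put = new Put(row);
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(f[1]));
        ctx.write(new ImmutableBytesWritable(row), put);
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = new Job(conf, "hfile-gen");
      job.setJarByClass(HFileGen.class);
      job.setMapperClass(LineMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(Put.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      // Reads the table's current region boundaries and configures the
      // partitioner/reducer so each output HFile fits a single region.
      HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "mytable"));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The sooner the load step runs after this job finishes, the less chance
the regions have split again in between.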

Thanks

On Fri, May 27, 2011 at 1:18 PM, Chris Tarnas <[email protected]> wrote:

> Yes, it does deal with data merging, and yes, a major compaction would
> be needed to guarantee the number of store files is as small as
> possible.
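>
> A major compaction can be kicked off from the HBase shell (the table
> name 'mytable' below is hypothetical):
>
>   hbase> major_compact 'mytable'
>
> or programmatically with HBaseAdmin#majorCompact("mytable").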
>
> -chris
>
>
>
> On May 26, 2011, at 7:00 PM, Weihua JIANG <[email protected]> wrote:
>
> > Thanks. It seems quite useful.
> >
> > Does bulk load support data merging? That is, there is a table with
> > existing data and I want to add more data to it. The new data's row
> > key range is interleaved with the existing data's row key range, so
> > the final effect is that the new data should be inserted into
> > existing regions.
> >
> > If bulk load supports this, it would be the ideal solution for me.
> >
> > And do I need to perform a major compaction after the bulk load to
> > keep the number of store files small?
> >
> >
> > Thanks
> > Weihua
> >
> > 2011/5/27 Chris Tarnas <[email protected]>:
> >> Your second solution sounds quite similar to the bulk loader.
> >> Actually, the bulk loader is a bit simpler and bypasses even more of
> >> the regionserver's overhead:
> >>
> >> http://hbase.apache.org/bulk-loads.html
> >>
> >> Using M/R, it creates HFiles directly in HDFS, then adds the HFiles
> >> to the existing regionservers.
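> >>
> >> The load step itself is the completebulkload tool described on that
> >> page; a usage sketch (the HFile directory and table name here are
> >> hypothetical):
> >>
> >>   hadoop jar hbase-VERSION.jar completebulkload /user/me/hfiles mytable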
> >>
> >> -chris
> >>
> >>
> >> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
> >>
> >>> Hi all,
> >>>
> >>> As far as I know, the WAL is used to ensure data is safe even if a
> >>> certain RS or the whole HBase cluster goes down. But it is still a
> >>> burden on every put.
> >>>
> >>> I am wondering: is there any way to disable the WAL while keeping
> >>> the data safe?
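> >>>
> >>> (I know the WAL can be turned off per put with Put#setWriteToWAL; a
> >>> minimal sketch follows, with a hypothetical table "mytable" and
> >>> family "cf". But on its own that gives up exactly the safety I want
> >>> to keep.)
> >>>
> >>>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>   import org.apache.hadoop.hbase.client.HTable;
> >>>   import org.apache.hadoop.hbase.client.Put;
> >>>   import org.apache.hadoop.hbase.util.Bytes;
> >>>
> >>>   public class NoWalPut {
> >>>     public static void main(String[] args) throws Exception {
> >>>       HTable table = new HTable(HBaseConfiguration.create(), "mytable");
> >>>       Put put = new Put(Bytes.toBytes("row1"));
> >>>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
> >>>           Bytes.toBytes("value"));
> >>>       // Skip the WAL: faster, but this edit is lost if the RS dies
> >>>       // before the memstore is flushed.
> >>>       put.setWriteToWAL(false);
> >>>       table.put(put);
> >>>       table.close();
> >>>     }
> >>>   }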
> >>>
> >>> An ideal solution to me looks like this (see the sketch after the
> >>> list):
> >>> 1. Clients continually put records with the WAL disabled.
> >>> 2. Clients call a certain HBase method to ensure all the
> >>> previously-put records are safely persisted; then they can remove
> >>> the records on the client side.
> >>> 3. On error, clients re-put the maybe-lost records.
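> >>>
> >>> For step 2, the closest call I can find is HBaseAdmin#flush, which
> >>> asks the regionservers to flush a table's memstores to HDFS. A
> >>> minimal sketch (the table name is hypothetical); note the request
> >>> is asynchronous, so by itself it does not confirm the flush
> >>> completed:
> >>>
> >>>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>>   import org.apache.hadoop.hbase.client.HBaseAdmin;
> >>>
> >>>   public class FlushTable {
> >>>     public static void main(String[] args) throws Exception {
> >>>       HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
> >>>       // Request a flush of every region of the table; the data in
> >>>       // the memstores is written out as HFiles on HDFS.
> >>>       admin.flush("mytable");
> >>>     }
> >>>   }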
> >>>
> >>> Or a slightly different solution (sketched below the list) is:
> >>> 1. Clients continually append records to HDFS using a sequence file.
> >>> 2. Clients periodically flush the HDFS file and remove the
> >>> previously-put records on the client side.
> >>> 3. After all records are stored on HDFS, use a map-reduce job to put
> >>> the records into HBase with the WAL disabled.
> >>> 4. Before each map-reduce task finishes, a certain HBase method is
> >>> called to flush the in-memory data onto HDFS.
> >>> 5. On error, the affected map-reduce task is re-executed (equivalent
> >>> to replaying the log).
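> >>>
> >>> For step 1, a minimal sketch of staging records in a sequence file
> >>> (the path and the Text key/value layout are hypothetical, and
> >>> syncFs() assumes a Hadoop build that supports it):
> >>>
> >>>   import org.apache.hadoop.conf.Configuration;
> >>>   import org.apache.hadoop.fs.FileSystem;
> >>>   import org.apache.hadoop.fs.Path;
> >>>   import org.apache.hadoop.io.SequenceFile;
> >>>   import org.apache.hadoop.io.Text;
> >>>
> >>>   public class StageRecords {
> >>>     public static void main(String[] args) throws Exception {
> >>>       Configuration conf = new Configuration();
> >>>       FileSystem fs = FileSystem.get(conf);
> >>>       SequenceFile.Writer writer = SequenceFile.createWriter(
> >>>           fs, conf, new Path("/staging/records.seq"),
> >>>           Text.class, Text.class);
> >>>       // One entry per record: row key as the key, payload as value.
> >>>       writer.append(new Text("row1"), new Text("value1"));
> >>>       // Push buffered data to the datanodes so it survives a client
> >>>       // crash (this is the "flush" in step 2).
> >>>       writer.syncFs();
> >>>       writer.close();
> >>>     }
> >>>   }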
> >>>
> >>> Is there any way to do this in HBase? If not, do you have any plan
> >>> to support such a usage model in the near future?
> >>>
> >>>
> >>> Thanks
> >>> Weihua
> >>
> >>
>



-- 
Best wishes
Gan, Xiyun
