Re: Is there any way to disable WAL while keeping data safety

Gan, Xiyun Mon, 30 May 2011 18:40:01 -0700

Thanks a lot

Are there any suggestions for the "Region is not online" exception?


On Tue, May 31, 2011 at 9:36 AM, Joey Echeverria <[email protected]> wrote:

> If you have a well defined key space, you'll get better performance if
> you pre-split your table and use the TotalOrderPartitioner with your
> MapReduce job.
>
> You can see an example of pre-splitting here:
> http://hbase.apache.org/book.html#precreate.regions.
>
> -Joey
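
A minimal sketch of how split points for a well-defined key space might be computed before creating the table. It assumes row keys are fixed-width, zero-padded, lowercase hex strings, which the thread does not state; `even_split_keys` and its parameters are hypothetical names, not part of HBase.

```python
from typing import List


def even_split_keys(num_regions: int, key_width: int = 8) -> List[bytes]:
    """Return the num_regions - 1 boundaries that cut the key space evenly.

    Assumption: row keys are fixed-width, zero-padded, lowercase hex strings,
    so byte-wise ordering matches numeric ordering.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    key_space = 16 ** key_width  # number of distinct keys of this width
    return [
        format(i * key_space // num_regions, "0{}x".format(key_width)).encode("ascii")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Print the boundaries for a table pre-split into four regions.
    for key in even_split_keys(4, key_width=2):
        print(key.decode("ascii"))
```

With four regions and two-character keys this gives the boundaries 40, 80 and c0. The resulting byte arrays would then be supplied as split keys when the table is created, as described in the book section linked above.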
>
> On Mon, May 30, 2011 at 9:31 PM, Gan, Xiyun <[email protected]> wrote:
> > I used BulkLoad to import data. The step of writing HFiles using m/r is
> > fast, but the step of loading HFiles to hbase takes lots of time. It
> > says "HFile at ****** no longer fits inside a single region. Splitting...."
> > Even worse, sometimes it throws a "Region is not online" exception.
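
One way to picture the "no longer fits inside a single region" message is the sketch below. It assumes regions are described by a sorted list of start keys (the first being the empty key) and that an HFile is summarized by its first and last row key. The function names are hypothetical and this is not the loader's actual code.

```python
from bisect import bisect_right
from typing import List


def region_index(region_start_keys: List[bytes], row_key: bytes) -> int:
    """Index of the region whose [start, next_start) range contains row_key."""
    # bisect_right counts the start keys <= row_key; the last of those owns the key.
    return bisect_right(region_start_keys, row_key) - 1


def fits_single_region(region_start_keys: List[bytes],
                       first_key: bytes, last_key: bytes) -> bool:
    """True if every key between first_key and last_key lands in one region."""
    return (region_index(region_start_keys, first_key)
            == region_index(region_start_keys, last_key))


if __name__ == "__main__":
    starts = [b"", b"40", b"80", b"c0"]  # hypothetical region boundaries
    print(fits_single_region(starts, b"41", b"7f"))  # file inside one region
    print(fits_single_region(starts, b"3f", b"41"))  # file straddles a boundary
```

A file that straddles a boundary has to be split before it can be handed to the regions, which is the slow path reported above. If the reducers are partitioned along the same boundaries as the table, each HFile should fall inside one region.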
> >
> > Thanks
> >
> > On Fri, May 27, 2011 at 1:18 PM, Chris Tarnas <[email protected]> wrote:
> >
> >> Yes, it does deal with data merging and yes, doing a major compaction
> >> would be needed to guarantee that the number of store files is as small
> >> as possible.
> >>
> >> -chris
> >>
> >>
> >>
> >> On May 26, 2011, at 7:00 PM, Weihua JIANG <[email protected]> wrote:
> >>
> >> > Thanks. It seems quite useful.
> >> >
> >> > Does bulk load support data merging? I.e. there is a table with
> >> > existing data and I want to add more data into it. The row key range
> >> > of the new data overlaps the row key range of the existing data. So,
> >> > the final effect is that the new data shall be inserted into existing
> >> > regions.
> >> >
> >> > If bulk load supports this feature, then it is the ideal solution for
> >> > me.
> >> >
> >> > And do I need to perform a major compaction after bulk load to ensure
> >> > the number of store files is small?
> >> >
> >> >
> >> > Thanks
> >> > Weihua
> >> >
> >> > 2011/5/27 Chris Tarnas <[email protected]>:
> >> >> Your second solution sounds quite similar to the bulk loader. Actually,
> >> >> the bulk load is a bit simpler and bypasses even more of the
> >> >> regionserver's overhead:
> >> >>
> >> >> http://hbase.apache.org/bulk-loads.html
> >> >>
> >> >> Using M/R, it creates HFiles in HDFS directly, then adds them to the
> >> >> existing regionservers.
> >> >>
> >> >> -chris
> >> >>
> >> >>
> >> >> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
> >> >>
> >> >>> Hi all,
> >> >>>
> >> >>> As far as I know, the WAL is used to ensure the data is safe even if a
> >> >>> certain RS or the whole HBase cluster is down. But it is still a
> >> >>> burden on each put.
> >> >>>
> >> >>> I am wondering: is there any way to disable the WAL while keeping data
> >> >>> safety?
> >> >>>
> >> >>> An ideal solution to me looks like this:
> >> >>> 1. clients continually put records with WAL disabled.
> >> >>> 2. clients call a certain HBase method to ensure all the
> >> >>> previously-put records are safely stored persistently, then they can
> >> >>> remove the records at the client side.
> >> >>> 3. on error, clients re-put the maybe-lost records.
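
The three steps above can be sketched as a small client-side buffer. This is only an illustration of the proposed protocol: `put` and `flush` are hypothetical callables standing in for "write with WAL disabled" and "make everything written so far durable", and the thread does not confirm that HBase offers a call with the second guarantee.

```python
from typing import Callable, List, Tuple


class PendingWrites:
    """Keeps every record that was put but not yet confirmed durable."""

    def __init__(self,
                 put: Callable[[bytes, bytes], None],
                 flush: Callable[[], None]) -> None:
        self._put = put      # hypothetical: write one record, WAL disabled
        self._flush = flush  # hypothetical: persist all earlier writes
        self._pending: List[Tuple[bytes, bytes]] = []

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def write(self, row: bytes, value: bytes) -> None:
        # Step 1: put the record, then remember it until a flush succeeds.
        self._put(row, value)
        self._pending.append((row, value))

    def checkpoint(self) -> bool:
        # Step 2: ask the store to persist everything written so far.
        try:
            self._flush()
        except Exception:
            # Step 3: on error, re-put the records that may have been lost.
            for row, value in self._pending:
                self._put(row, value)
            return False
        # Only now is it safe to drop the client-side copies.
        self._pending.clear()
        return True
```

A caller would invoke `checkpoint()` periodically and retry it whenever it returns `False`. Re-putting is only safe here because writing the same row and value twice leaves the same final cell contents.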
> >> >>>
> >> >>> Or a slightly different solution is:
> >> >>> 1. clients continually put records on HDFS using a sequential file.
> >> >>> 2. clients periodically flush the HDFS file and remove the previously
> >> >>> put records at the client side.
> >> >>> 3. after all records are stored on HDFS, use a map-reduce job to put
> >> >>> the records into HBase with WAL disabled.
> >> >>> 4. before each map-reduce task finishes, a certain HBase method is
> >> >>> called to flush the in-memory data onto HDFS.
> >> >>> 5. on error, the affected map-reduce task is re-executed (equivalent
> >> >>> to replaying the log).
> >> >>>
> >> >>> Is there any way to do so in HBase? If not, do you have any plans to
> >> >>> support such a usage model in the near future?
> >> >>>
> >> >>>
> >> >>> Thanks
> >> >>> Weihua
> >> >>
> >> >>
> >>
> >
> >
> >
> > --
> > Best wishes
> > Gan, Xiyun
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>



-- 
Best wishes
Gan, Xiyun
