Re: Is there any way to disable WAL while keeping data safety

Joey Echeverria Mon, 30 May 2011 18:36:29 -0700

If you have a well-defined key space, you'll get better performance if
you pre-split your table and use the TotalOrderPartitioner with your
MapReduce job.
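A rough, standalone sketch of the idea behind total-order partitioning follows. It is not Hadoop's `TotalOrderPartitioner` class, and `TotalOrderSketch`/`partitionFor` are hypothetical names. When the table is pre-split, the region boundaries become the partition boundaries. Each row key is routed by binary search to the reducer that owns its region, so every reducer writes one contiguous, non-overlapping key range. The sketch assumes ASCII row keys, for which Java `String` ordering matches HBase's byte-wise ordering.

```java
import java.util.Arrays;

// Hypothetical sketch of total-order partitioning; not Hadoop's TotalOrderPartitioner.
class TotalOrderSketch {
    // splitPoints must be sorted; n split points define n + 1 partitions:
    // partition 0 holds keys below splitPoints[0], partition i holds
    // keys in [splitPoints[i - 1], splitPoints[i]).
    static int partitionFor(String rowKey, String[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, rowKey);
        // Exact match on split point idx: that point starts partition idx + 1.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // insertion point is exactly the partition index.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"g", "n", "t"}; // four regions: [, g) [g, n) [n, t) [t, )
        for (String key : new String[] {"apple", "g", "kiwi", "pear", "zebra"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, splits));
        }
    }
}
```

Run as-is, it routes `apple` to reducer 0, `g` and `kiwi` to reducer 1, `pear` to reducer 2 and `zebra` to reducer 3. A key equal to a split point goes to the partition that starts at it, which matches a region's start key being inclusive.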


You can see an example of pre-splitting here:
http://hbase.apache.org/book.html#precreate.regions.
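For a key space that is known in advance, the split points can be computed up front. The sketch below makes a hypothetical assumption: every row key begins with a fixed-width lowercase hex prefix, such as a hash of the natural key. It spaces the boundaries evenly over that prefix space. The resulting strings would then be converted to `byte[][]` and supplied when the table is created. In the Java client, the `HBaseAdmin.createTable(HTableDescriptor, byte[][])` overload accepts split keys. `SplitKeySketch` and `hexSplits` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evenly spaced split keys, assuming every row key starts
// with a fixed-width lowercase hex prefix (an assumption about the key space).
class SplitKeySketch {
    static List<String> hexSplits(int numRegions, int width) {
        if (numRegions < 2) throw new IllegalArgumentException("need at least 2 regions");
        if (width < 1 || width > 7) throw new IllegalArgumentException("width must be 1..7");
        long keySpace = 1L << (4 * width); // number of distinct prefixes
        if (numRegions > keySpace) throw new IllegalArgumentException("more regions than prefixes");
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            long boundary = keySpace * i / numRegions; // start key of region i
            splits.add(String.format("%0" + width + "x", boundary));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four regions over two-digit prefixes: [, 40) [40, 80) [80, c0) [c0, )
        System.out.println(hexSplits(4, 2));
    }
}
```

Evenly spaced boundaries only balance the regions if the prefixes are roughly uniform, which is why a hashed prefix is assumed here.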

-Joey

On Mon, May 30, 2011 at 9:31 PM, Gan, Xiyun wrote:
> I used bulk load to import data. The step of writing HFiles with M/R is
> fast, but the step of loading the HFiles into HBase takes a lot of time.
> It says "HFile at ****** no longer fits inside a single region.
> Splitting...". Even worse, sometimes it throws a "Region is not online"
> exception.
>
> Thanks
>
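One way to read the "no longer fits inside a single region. Splitting..." message: at load time, each HFile's first and last row keys are compared against the table's current region boundaries. An HFile whose keys span two regions has to be split at the boundary before a region can take it, and that costs extra time. This would suggest that the region boundaries at load time did not line up with the key ranges the job wrote. The sketch below models only this fit check, using string keys and hypothetical names (`RegionFitSketch`, `regionFor`, `fitsInOneRegion`, `splitPoint`). It is not the loader's actual code.

```java
import java.util.Arrays;

// Hypothetical model of the bulk-load fit check; not HBase's loader code.
// Regions are described by their sorted start keys; region 0 starts at "".
class RegionFitSketch {
    // Index of the region whose [startKey, nextStartKey) range contains rowKey.
    static int regionFor(String rowKey, String[] startKeys) {
        int idx = Arrays.binarySearch(startKeys, rowKey);
        return idx >= 0 ? idx : -(idx + 1) - 1;
    }

    // An HFile fits if its first and last row keys fall in the same region.
    static boolean fitsInOneRegion(String firstKey, String lastKey, String[] startKeys) {
        return regionFor(firstKey, startKeys) == regionFor(lastKey, startKeys);
    }

    // Only meaningful when the file does not fit: the boundary to split at is
    // the start key of the region after the one holding firstKey.
    static String splitPoint(String firstKey, String[] startKeys) {
        return startKeys[regionFor(firstKey, startKeys) + 1];
    }

    public static void main(String[] args) {
        String[] startKeys = {"", "g", "n"}; // regions [, g) [g, n) [n, )
        System.out.println(fitsInOneRegion("h", "k", startKeys)); // true
        System.out.println(fitsInOneRegion("h", "p", startKeys)); // false
        System.out.println(splitPoint("h", startKeys));           // n
    }
}
```

With regions `[, g)`, `[g, n)` and `[n, )`, a file spanning `h` to `k` fits, while one spanning `h` to `p` would have to be split at `n`.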
> On Fri, May 27, 2011 at 1:18 PM, Chris Tarnas wrote:
>
>> Yes, it does deal with data merging, and yes, a major compaction would
>> be needed to guarantee that the number of store files is as small as
>> possible.
>>
>> -chris
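Conceptually, bulk-loaded HFiles become additional store files in the regions that already hold data. A major compaction then rewrites each store's files into one by merging them in key order. The sketch below is a simplified model of that merge, not HBase's compaction code. It assumes one version per row and no delete markers, and it lets the value from the more recently added file win on a duplicate key. In HBase itself, cell timestamps decide which version is newest. `CompactionSketch` and `compact` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Simplified model of a major compaction: a k-way merge of sorted store files
// into one. Assumes one version per key and no deletes; on a duplicate key the
// value from the file with the higher index (added later) wins.
class CompactionSketch {
    // Read position within one sorted input file.
    static final class Cursor {
        final int fileIndex;
        final Iterator<Map.Entry<String, String>> it;
        Map.Entry<String, String> current;

        Cursor(int fileIndex, Iterator<Map.Entry<String, String>> it) {
            this.fileIndex = fileIndex;
            this.it = it;
            this.current = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
    }

    static List<String> compact(List<TreeMap<String, String>> files) {
        // Smallest key first; for equal keys, the newest file first.
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
            int byKey = a.current.getKey().compareTo(b.current.getKey());
            return byKey != 0 ? byKey : Integer.compare(b.fileIndex, a.fileIndex);
        });
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) {
                heap.add(new Cursor(i, files.get(i).entrySet().iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            String key = c.current.getKey();
            if (!key.equals(lastKey)) { // first occurrence comes from the newest file
                merged.add(key + "=" + c.current.getValue());
                lastKey = key;
            }
            if (c.advance()) heap.add(c); // re-insert with its next key
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> existing = new TreeMap<>(Map.of("a", "1", "c", "1", "e", "1"));
        TreeMap<String, String> bulkLoaded = new TreeMap<>(Map.of("b", "2", "c", "2"));
        System.out.println(compact(List.of(existing, bulkLoaded))); // [a=1, b=2, c=2, e=1]
    }
}
```

The heap holds one cursor per input file, so the merge walks the files in a single pass rather than re-sorting them. The output is collected into a list here only so it can be printed.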
>>
>>
>>
>> On May 26, 2011, at 7:00 PM, Weihua JIANG wrote:
>>
>> > Thanks. It seems quite useful.
>> >
>> > Does bulk load support data merging? That is, there is a table with
>> > existing data and I want to add more data to it. The row key range of
>> > the new data is interleaved with the existing row key range, so in the
>> > end the new data has to be inserted into the existing regions.
>> >
>> > If bulk load supports this, then it is the ideal solution for me.
>> >
>> > Also, do I need to run a major compaction after the bulk load to keep
>> > the number of store files small?
>> >
>> >
>> > Thanks
>> > Weihua
>> >
>> > 2011/5/27 Chris Tarnas:
>> >> Your second solution sounds quite similar to the bulk loader. Actually,
>> >> bulk loading is a bit simpler and bypasses even more of the
>> >> regionserver's overhead:
>> >>
>> >> http://hbase.apache.org/bulk-loads.html
>> >>
>> >> It uses M/R to write HFiles directly into HDFS, then adds those HFiles
>> >> to the existing regionservers.
>> >>
>> >> -chris
>> >>
>> >>
>> >> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> As I understand it, the WAL is used to keep data safe even if an RS
>> >>> or the whole HBase cluster goes down. But it is still a burden on
>> >>> every put.
>> >>>
>> >>> I am wondering: is there any way to disable the WAL while keeping data
>> >>> safe?
>> >>>
>> >>> An ideal solution for me would look like this:
>> >>> 1. Clients continually put records with the WAL disabled.
>> >>> 2. Clients call some HBase method to ensure that all previously put
>> >>> records are safely and persistently stored; then they can remove the
>> >>> records on the client side.
>> >>> 3. On error, clients re-put the possibly lost records.
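The first proposal amounts to a client-side checkpoint protocol. The sketch below models it against a hypothetical in-memory `Store`, which is not an HBase API. Its `put` is unlogged, like a put with the WAL turned off. Its `flush` makes everything put so far durable, and `crash` discards whatever was not flushed. The client keeps every record until a flush returns, and after a failure it re-puts that buffer. For reference, the 0.90-era client has `Put.setWriteToWAL(false)` for step 1 and `HBaseAdmin.flush(...)` to request a memstore flush. This sketch does not establish that such a flush gives step 2's guarantee across every region the client wrote to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of proposal 1. Store stands in for the cluster and is
// NOT an HBase API: put() is unlogged (like a put with the WAL off), flush()
// makes everything put so far durable, and crash() loses unflushed data.
class UnloggedPutClient {
    interface Store {
        void put(String key, String value);
        void flush();
    }

    static final class FakeStore implements Store {
        private final Map<String, String> memory = new HashMap<>();
        private final Map<String, String> disk = new TreeMap<>();

        public void put(String key, String value) { memory.put(key, value); }
        public void flush() { disk.putAll(memory); memory.clear(); }
        void crash() { memory.clear(); }
        Map<String, String> durable() { return disk; }
    }

    private final Store store;
    private final List<String[]> pending = new ArrayList<>(); // not yet known durable

    UnloggedPutClient(Store store) { this.store = store; }

    // Step 1: put without logging, but keep a client-side copy.
    void put(String key, String value) {
        store.put(key, value);
        pending.add(new String[] {key, value});
    }

    // Step 2: once flush() returns, every pending record is durable.
    // If flush() throws, pending is left intact for recovery.
    void checkpoint() {
        store.flush();
        pending.clear();
    }

    // Step 3: after a failure, re-put everything that may have been lost.
    void recover() {
        for (String[] record : pending) store.put(record[0], record[1]);
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        UnloggedPutClient client = new UnloggedPutClient(store);
        client.put("r1", "a");
        client.checkpoint(); // r1 is durable and dropped from the client buffer
        client.put("r2", "b");
        store.crash();       // r2 was only in memory and is lost
        client.recover();    // re-put r2 from the client buffer
        client.checkpoint();
        System.out.println(store.durable()); // {r1=a, r2=b}
    }
}
```

Replaying the buffer is only safe if re-putting a record that did survive is harmless. That holds in this model because a repeated put simply rewrites the same key with the same value.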
>> >>>
>> >>> Or a slightly different solution:
>> >>> 1. Clients continually write records to HDFS using a sequential file.
>> >>> 2. Clients periodically flush the HDFS file and remove the previously
>> >>> written records on the client side.
>> >>> 3. After all records are stored on HDFS, a MapReduce job puts the
>> >>> records into HBase with the WAL disabled.
>> >>> 4. Before each MapReduce task finishes, some HBase method is called to
>> >>> flush the in-memory data onto HDFS.
>> >>> 5. On error, the affected MapReduce task is re-executed (equivalent to
>> >>> replaying the log).
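Steps 4 and 5 of the second proposal rely on a task's output becoming durable only at its final flush. Re-running a failed task from scratch then plays the role of log replay. Below is a minimal model of that with hypothetical names (`TaskReplaySketch`, `runTask`, `runWithRetry`) and a single in-memory "server". It assumes that a failure aborts the task (the framework then re-runs it) and that re-putting already-written rows is harmless. It is not an HBase or Hadoop API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of proposal 2, steps 3-5: each task puts its input split
// with the WAL off, flushes before finishing, and is re-run from scratch if
// it fails. Not an HBase or Hadoop API.
class TaskReplaySketch {
    private final Map<String, String> memstore = new HashMap<>(); // lost on failure
    private final Map<String, String> durable = new TreeMap<>();  // survives failure

    // failAfter < 0 means no failure; otherwise the server fails after that
    // many puts, discarding the task's unflushed puts.
    boolean runTask(List<String[]> split, int failAfter) {
        for (int i = 0; i < split.size(); i++) {
            if (i == failAfter) {
                memstore.clear();
                return false;
            }
            memstore.put(split.get(i)[0], split.get(i)[1]);
        }
        durable.putAll(memstore); // step 4: flush before the task reports success
        memstore.clear();
        return true;
    }

    // Step 5: re-executing a failed task plays the role of replaying the log.
    void runWithRetry(List<String[]> split, int failAfterOnFirstAttempt) {
        if (!runTask(split, failAfterOnFirstAttempt)) {
            runTask(split, -1);
        }
    }

    Map<String, String> durable() { return durable; }

    public static void main(String[] args) {
        List<String[]> split1 = List.of(new String[] {"r1", "a"}, new String[] {"r2", "b"});
        List<String[]> split2 = List.of(new String[] {"r3", "c"}, new String[] {"r4", "d"});
        TaskReplaySketch cluster = new TaskReplaySketch();
        cluster.runWithRetry(split1, -1); // succeeds on the first attempt
        cluster.runWithRetry(split2, 1);  // fails after putting r3, then re-runs
        System.out.println(cluster.durable()); // {r1=a, r2=b, r3=c, r4=d}
    }
}
```

The second task fails after putting `r3` and loses it along with the rest of its unflushed state. On the retry it writes both of its rows, so the durable result matches a failure-free run.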
>> >>>
>> >>> Is there any way to do this in HBase? If not, do you have any plans to
>> >>> support such a usage model in the near future?
>> >>>
>> >>>
>> >>> Thanks
>> >>> Weihua
>> >>
>> >>
>>
>
>
>
> --
> Best wishes
> Gan, Xiyun
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434
