Re: Is there any way to disable WAL while keeping data safety

Weihua JIANG Thu, 26 May 2011 19:00:52 -0700

Thanks. It seems quite useful.

Does bulk load support data merging? That is, I have a table with
existing data and want to add more data to it. The row key range of the
new data overlaps the row key range of the existing data, so the new
data has to be inserted into existing regions.


If bulk load supports this, then it is the ideal solution for me.

And do I need to run a major compaction after the bulk load to keep
the number of store files small?
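
To make the merge case concrete, here is a minimal sketch in plain Python
of how new rows with interleaved keys would map onto existing regions. It
does not use any HBase API and says nothing about how the bulk loader
itself behaves; the region start keys and the function names are
hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical, sorted start keys of the existing regions. The first region
# starts at the empty key so that the regions cover the whole key space.
REGION_START_KEYS = [b"", b"g", b"n", b"t"]


def region_index(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region that owns the key is the one just before it.
    return bisect_right(start_keys, row_key) - 1


def group_by_region(rows, start_keys=REGION_START_KEYS):
    """Group (row_key, value) pairs by target region, sorted within each group."""
    groups = defaultdict(list)
    for row_key, value in rows:
        groups[region_index(row_key, start_keys)].append((row_key, value))
    return {idx: sorted(batch) for idx, batch in sorted(groups.items())}


if __name__ == "__main__":
    new_rows = [(b"zebra", 1), (b"apple", 2), (b"monkey", 3), (b"grape", 4)]
    for idx, batch in group_by_region(new_rows).items():
        print(idx, batch)
```

In this toy example the new rows land in three of the four existing
regions, which is the situation I am asking about.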


Thanks
Weihua

2011/5/27 Chris Tarnas <[email protected]>:
> Your second solution sounds quite similar to the bulk loader. Actually the 
> bulk load is a bit simpler and bypasses even more of the regionserver's 
> overhead:
>
> http://hbase.apache.org/bulk-loads.html
>
> Using M/R, it creates HFiles in HDFS directly, then adds the HFiles to the 
> existing regionservers.
>
> -chris
>
>
> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
>
>> Hi all,
>>
>> As far as I know, the WAL is used to ensure that data is safe even if a
>> certain RS or the whole HBase cluster goes down. But it is still a burden
>> on each put.
>>
>> I am wondering: is there any way to disable the WAL while keeping data safe?
>>
>> An ideal solution to me looks like this:
>> 1. clients continuously put records with the WAL disabled.
>> 2. clients call a certain HBase method to ensure all the
>> previously-put records are safely persisted, then they can
>> remove the records on the client side.
>> 3. on error, clients re-put the possibly lost records.
>>
>> Or a slightly different solution is:
>> 1. clients continuously write records to HDFS using a sequential file.
>> 2. clients periodically flush the HDFS file and remove the previously
>> written records on the client side.
>> 3. after all records are stored on HDFS, use a map-reduce job to put
>> the records into HBase with the WAL disabled.
>> 4. before each map-reduce task finishes, a certain HBase method is
>> called to flush the in-memory data to HDFS.
>> 5. on error, the affected map-reduce task is re-executed (equivalent to
>> replaying the log).
>>
>> Is there any way to do so in HBase? If not, do you have any plan to
>> support such a usage model in the near future?
>>
>>
>> Thanks
>> Weihua
>
>
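
The first usage model in the quoted question (put with the WAL disabled,
confirm persistence, re-put on error) can be sketched as follows. This is
a plain-Python simulation under stated assumptions, not HBase client code:
FakeStore, BufferingClient and all of their methods are hypothetical names,
and the "flush" here only stands in for whatever call would make
previously-put records durable.

```python
class FakeStore:
    """In-memory stand-in for a server: unflushed writes are lost on a crash."""

    def __init__(self):
        self.memstore = {}  # volatile, not protected by any log
        self.durable = {}   # survives a crash

    def put(self, key, value):
        self.memstore[key] = value

    def flush(self):
        # Move everything written so far into durable storage.
        self.durable.update(self.memstore)
        self.memstore.clear()

    def crash(self):
        # Simulate a server failure: only flushed data remains.
        self.memstore.clear()


class BufferingClient:
    """Keeps a local copy of each record until it is known to be durable."""

    def __init__(self, store):
        self.store = store
        self.pending = {}  # records not yet confirmed as persisted

    def put(self, key, value):
        self.pending[key] = value
        self.store.put(key, value)

    def checkpoint(self):
        # Step 2: force persistence, then drop the client-side copies.
        self.store.flush()
        self.pending.clear()

    def recover(self):
        # Step 3: after an error, re-put every record that may have been lost.
        for key, value in self.pending.items():
            self.store.put(key, value)


if __name__ == "__main__":
    store = FakeStore()
    client = BufferingClient(store)
    client.put("a", 1)
    client.put("b", 2)
    client.checkpoint()
    client.put("c", 3)
    store.crash()        # "c" was never flushed, so the server loses it
    client.recover()     # the client still holds "c" and writes it again
    client.checkpoint()
    print(store.durable)
```

The sketch relies on re-putting the same key and value being harmless, so
records that were in fact persisted before the error can be sent again
without changing the result.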
