Your second solution sounds quite similar to the bulk loader. Actually, bulk load is a bit simpler and bypasses even more of the regionserver's overhead:
http://hbase.apache.org/bulk-loads.html

Using M/R, it creates HFiles in HDFS directly, then adds those HFiles to the existing regionservers.

-chris

On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:

> Hi all,
>
> As I know, WAL is used to ensure the data is safe even if certain RS
> or the whole HBase cluster is down. But, it is anyway a burden on each
> put.
>
> I am wondering: is there any way to disable WAL while keeping data safety?
>
> An ideal solution to me looks like this:
> 1. clients continually put records with WAL disabled.
> 2. clients call a certain HBase method to ensure all the
> previously-put records are safely stored persistently, then it can
> remove the records at client side.
> 3. on error, client re-put the maybe-lost records.
>
> Or a slightly different solution is:
> 1. clients continually put records on HDFS using sequential file.
> 2. clients periodically flush HDFS file and remove the previously put
> records at client side.
> 3. after all records are stored on HDFS, use a map-reduce job to put
> the records into HBase with WAL disabled.
> 4. before each map-reduce task finish, a certain HBase method is
> called to flush the memory data onto HDFS.
> 5. if on error, certain map-reduce task is re-executed (equivalent to
> replay log).
>
> Is there any way to do so in HBase? If no, do you have any plan to
> support such usage model in near future?
>
>
> Thanks
> Weihua
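
For reference, the first scheme in the quoted message (put with the WAL off, keep a client-side copy, confirm with a flush, re-put on failure) could be sketched roughly as below. This is a toy in-memory model, not HBase client code: FakeStore, BufferingClient, put_no_wal, flush, crash, checkpoint and recover are all hypothetical names made up for illustration, and the sketch assumes that a flush makes every earlier put durable.

```python
class FakeStore:
    """Toy stand-in for a regionserver (hypothetical, not an HBase API).

    Writes made without a WAL live only in memory until flush() is called.
    """

    def __init__(self):
        self.memstore = {}   # unflushed writes, lost on crash
        self.durable = {}    # flushed writes, survive a crash

    def put_no_wal(self, key, value):
        self.memstore[key] = value

    def flush(self):
        # Assumption: a flush persists everything written so far.
        self.durable.update(self.memstore)
        self.memstore.clear()

    def crash(self):
        # With no WAL there is nothing to replay, so unflushed data is gone.
        self.memstore.clear()


class BufferingClient:
    """Client that keeps its own copy of every record until a flush succeeds."""

    def __init__(self, store):
        self.store = store
        self.pending = {}    # records not yet confirmed durable

    def put(self, key, value):
        # Step 1: put with the WAL disabled, but remember the record locally.
        self.pending[key] = value
        self.store.put_no_wal(key, value)

    def checkpoint(self):
        # Step 2: ask the store to persist, then drop the local copies.
        self.store.flush()
        self.pending.clear()

    def recover(self):
        # Step 3: on error, re-put everything that may have been lost.
        for key, value in self.pending.items():
            self.store.put_no_wal(key, value)


def demo():
    store = FakeStore()
    client = BufferingClient(store)
    client.put("row1", "a")
    client.checkpoint()      # row1 is now durable
    client.put("row2", "b")
    store.crash()            # row2 is lost on the server side
    client.recover()         # but the client still holds it and re-puts it
    client.checkpoint()
    return store.durable
```

The point of the model is only that the client's pending buffer plays the role the WAL normally plays; bulk load avoids the question entirely because the HFiles are already on HDFS before the regionservers see them.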
