Hey there,

On Sun, Feb 19, 2012 at 1:44 PM, Manuel de Ferran <[email protected]> wrote:
> Greetings,
>
> on a testing platform (running HBase-0.90.3 on top of Hadoop-0.20-append), we did the following:
> - create a dummy table
> - put a single row
> - get this row from the shell
> - wait a few minutes
> - kill -9 the datanodes
>
> Because the regionservers could not connect to the datanodes, they shut down.
>
> On restart, the row had vanished. But if we do the same and run "flush 'dummy'" from the shell before killing the datanodes, the row is still there.
>
> Is it related to the WAL? MemStores? What happened?
>
> What are the recommended settings so rows are auto-flushed, or at least flushed more frequently?
>

I can't speak for anyone other than myself, but I flush manually at sane intervals, depending on the amount of data that I put in.

I typically store time series data in HBase; financial time series in my case means intraday market data. I ran some performance tests and found that flushing after every row insert kills write performance. The same is true if I write many thousands of rows before I commit. I found a good balance (but that's data specific, I assume) in inserting 1000 rows and then flushing, the next 1000 rows and flushing again, and a final flush at the end of processing. A sketch of this pattern follows below.

By doing so, I have never had any problems with lost data so far.

Regards
--
Ulrich Staudinger
http://www.activequant.com
Connect online: https://www.xing.com/profile/Ulrich_Staudinger
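A minimal sketch of the batching pattern described above, assuming the HTable client API of the HBase 0.90 era; the table name "dummy", the column family "cf", the qualifier "q", and the batch size of 1000 are placeholders, not settings from the original thread. Note this is the client-side write-buffer flush (flushCommits), not the shell's memstore flush.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "dummy");   // hypothetical table name

            // Buffer puts client-side instead of sending one RPC per put.
            table.setAutoFlush(false);

            int batchSize = 1000;   // rows per client-side flush
            for (int i = 0; i < 10000; i++) {
                Put put = new Put(Bytes.toBytes("row-" + i));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
                table.put(put);

                // Every 1000 rows, push the buffered puts to the regionservers.
                if ((i + 1) % batchSize == 0) {
                    table.flushCommits();
                }
            }

            // Final flush for any puts still sitting in the write buffer.
            table.flushCommits();
            table.close();
        }
    }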
