Does HTable.getWriteBuffer() do a rollback?

Jim

On Mon, Jan 31, 2011 at 8:04 PM, Ryan Rawson <[email protected]> wrote:
> When you are using the buffer, you also need to flush it:
>
> htable.flushCommits();
>
> If the call succeeds, the edits were persisted. If at any point you
> get exceptions, the unfinished edits are left in the write buffer and
> htable.getWriteBuffer() gets you them.
>
> -ryan
>
> On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun
> <[email protected]> wrote:
>> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
>>
>>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
>>> <[email protected]> wrote:
>>> > But how can the client understand which k-v belongs to an individual RS?
>>> > Does it need to scan the .META. table? (if so, it's an expensive op). On
>>> > the RegionServer side, is it like processing multiple requests in a batch
>>> > per RPC?
>>>
>>> The client has to figure out which region each edit has to go to. The
>>> client maintains a local cache of the META table, so when you
>>> frequently use the same working set of regions (which is common for
>>> most applications), the lookups are essentially free.
>>>
>>> The worst case is a client that does random writes to all the regions
>>> in a huge table. In this case, the client will end up discovering the
>>> location of all the regions of that table and keep this in its
>>> in-memory cache. But regions move around, are split, etc. This does
>>> cause extra META lookups, but the latency for a META lookup is
>>> typically very small (even though the penalty incurred by the client
>>> compared to cache hits in its local META cache is huge, comparatively
>>> speaking). Note that right now neither HTable nor asynchbase
>>> pro-actively evicts unused entries from the local META cache to save
>>> memory. I don't think anyone is running HBase at a scale where this
>>> optimization would be useful.
>>>
>>> If you have a write-heavy application, you're always going to get
>>> significantly higher throughput when you send your edits in batch to
>>> the server. The downside to this is that when your client application
>>> dies, you lose all the edits in the un-committed batch. Unlike
>>> HTable, asynchbase puts an upper bound on the amount of time an edit
>>> is allowed to remain in the client's buffer, which helps limit
>>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
>>> so when it dies, you know you lost at most 1s worth of datapoints).
>>>
>>
>> setWriteBufferSize(1024*1024*10); // 10MB
>> setAutoFlush(false);
>> for (i = 0; i < N; i++) {
>>     list.add(putitem[i]);
>> }
>> htable.put(list);
>>
>> For the above pseudo code (using put(List) to commit updates in HBase), can I
>> get a "batch transaction" success notification?
>> * i.e., how can I know all the items have been successfully
>> committed? -- it seems that I can't get such information, all are
>> best-effort. Should I know some commits fail, I can do an application-level
>> retry.
>> * setAutoFlush(true); does not seem to help us to get any more
>> reliable operation either.
>>
>>>
>>> --
>>> Benoit "tsuna" Sigoure
>>> Software Engineer @ www.StumbleUpon.com
>>>
>>
>> --
>> --Sean
>>
>
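
For anyone reading this in the archive, here is a minimal, self-contained Java sketch of the buffer semantics Ryan describes above: edits that were not persisted stay in the client-side buffer after a failed flush, and the application can retry them. This is a toy model under stated assumptions, not HBase code. WriteBufferSketch, Sink and flushWithRetry are hypothetical names made up for illustration; only the method names put, flushCommits and getWriteBuffer are borrowed from the thread, and their behaviour here is assumed, so check it against the HBase version you actually run.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client-side write buffer. NOT HBase's implementation. */
class WriteBufferSketch {

    /** Hypothetical stand-in for a server that may reject an edit. */
    interface Sink {
        void persist(String edit) throws Exception;
    }

    private final List<String> buffer = new ArrayList<>();
    private final Sink sink;

    WriteBufferSketch(Sink sink) {
        this.sink = sink;
    }

    /** Queue an edit locally; nothing is sent yet (as with auto-flush off). */
    void put(String edit) {
        buffer.add(edit);
    }

    /** Send edits in order; an edit leaves the buffer only once persisted. */
    void flushCommits() throws Exception {
        while (!buffer.isEmpty()) {
            sink.persist(buffer.get(0)); // on failure this throws and the edit stays buffered
            buffer.remove(0);
        }
    }

    /** Plain accessor in this model: copies the unfinished edits, undoes nothing. */
    List<String> getWriteBuffer() {
        return new ArrayList<>(buffer);
    }

    /** Application-level retry: flush until it succeeds or attempts run out. */
    boolean flushWithRetry(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                flushCommits();
                return true; // every buffered edit was persisted
            } catch (Exception e) {
                // Unfinished edits are still in the buffer; loop and try again.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> persisted = new ArrayList<>();
        int[] failuresLeft = {1}; // simulate one transient failure on "row2"
        WriteBufferSketch table = new WriteBufferSketch(edit -> {
            if (edit.equals("row2") && failuresLeft[0]-- > 0) {
                throw new Exception("simulated server failure");
            }
            persisted.add(edit);
        });

        table.put("row1");
        table.put("row2");
        table.put("row3");

        try {
            table.flushCommits();
        } catch (Exception e) {
            System.out.println("left in buffer: " + table.getWriteBuffer());
        }
        System.out.println("retry succeeded: " + table.flushWithRetry(3));
        System.out.println("persisted: " + persisted);
    }
}
```

In this model, a true return from flushWithRetry is the "all items committed" signal Sean asks about, and reading the buffer after a failure tells the application which edits are still pending. Edits that were already persisted before the failure are not undone by reading the buffer.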
