When you are using the buffer, you also need to flush it: htable.flushCommits();
If the call succeeds, the edits were persisted. If you get an exception at any point, the unfinished edits are left in the write buffer, and htable.getWriteBuffer() returns them.

-ryan

On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun <[email protected]> wrote:
> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
>
>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
>> <[email protected]> wrote:
>> > But how can the client understand which k-v belongs to an individual RS?
>> > Does it need to scan the .META. table? (if so, it's an expensive op). On the
>> > RegionServer side, is it like processing multiple requests in a batch per
>> > RPC?
>>
>> The client has to figure out which region each edit has to go to. The
>> client maintains a local cache of the META table, so when you
>> frequently use the same working set of regions (which is common for
>> most applications), the lookups are essentially free.
>>
>> The worst case is a client that does random-writes to all the regions
>> in a huge table. In this case, the client will end up discovering the
>> location of all the regions of that table and keep this in its
>> in-memory cache. But regions move around, are split etc. This does
>> cause extra META lookups, but the latency for a META lookup is
>> typically very small (even though the penalty incurred by the client
>> compared to cache hits in its local META cache is huge, comparatively
>> speaking). Note that right now neither HTable nor asynchbase
>> pro-actively evict unused entries from the local META cache to save
>> memory. I don't think anyone is running HBase at a scale where this
>> optimization would be useful.
>>
>> If you have a write-heavy application, you're always going to get
>> significantly higher throughput when you send your edits in batch to
>> the server. The downside to this is that when your client application
>> dies, you lose all the edits in the un-committed batch. Unlike
>> HTable, asynchbase puts an upper bound on the amount of time an edit
>> is allowed to remain in the client's buffer, which helps limit
>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
>> so when it dies, you know you lost at most 1s worth of datapoints).
>>
>
> setWriteBufferSize(1024*1024*10); // 10MB
> setAutoFlush(false);
> for(i=0; i<N; i++) {
>   list.add(putitem[i]);
> }
> htable.put(list);
>
> For the above pseudo code (using put(List) to commit update in HBase), can I
> get a "batch transaction" success notification?
>  * i.e., How can I know all the items have been successfully
>    committed? -- it seems that I can't get such information, all are
>    best-effort. Should I know some commits fail, I can do an application-level
>    retry.
>  * setAutoFlush(true); does not seem to help us to get any more
>    reliable operation either.
>
>>
>> --
>> Benoit "tsuna" Sigoure
>> Software Engineer @ www.StumbleUpon.com
>>
>
> --
> --Sean
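
To make the flush-and-retry idea from the top of this message concrete, here is a minimal sketch of an application-level retry loop. It does not use the HBase client at all: ToyBufferedTable is a hypothetical stand-in written only for this example, with method names chosen to mirror put(List), flushCommits() and getWriteBuffer(), and failuresToInject is an invented knob for simulating a failed flush. The assumption being modelled is the behaviour described above: a normal return from the flush means everything was persisted, and an exception leaves the unfinished edits in the buffer.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a client-side write buffer. This is NOT the HBase HTable class. */
class ToyBufferedTable {
    private final List<String> writeBuffer = new ArrayList<>();
    private final List<String> persisted = new ArrayList<>();
    private int failuresToInject; // hypothetical knob: how many flushes should fail

    ToyBufferedTable(int failuresToInject) {
        this.failuresToInject = failuresToInject;
    }

    /** Buffers edits locally; nothing is sent yet (models auto-flush turned off). */
    void put(List<String> edits) {
        writeBuffer.addAll(edits);
    }

    /** Sends buffered edits. On a simulated failure, unsent edits stay in the buffer. */
    void flushCommits() throws IOException {
        if (failuresToInject > 0) {
            failuresToInject--;
            if (!writeBuffer.isEmpty()) {
                // Partial progress: one edit gets through before the error.
                persisted.add(writeBuffer.remove(0));
            }
            throw new IOException("simulated failure during flush");
        }
        persisted.addAll(writeBuffer);
        writeBuffer.clear();
    }

    List<String> getWriteBuffer() {
        return writeBuffer;
    }

    List<String> getPersisted() {
        return persisted;
    }
}

class WriteBufferSketch {
    /** Retries the flush up to maxAttempts times; true means the buffer was fully committed. */
    static boolean flushWithRetry(ToyBufferedTable table, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                table.flushCommits();
                return true; // normal return: every buffered edit was persisted
            } catch (IOException e) {
                // The unfinished edits are still buffered, so the next attempt resends them.
                System.err.println("attempt " + attempt + " failed, "
                        + table.getWriteBuffer().size() + " edits still buffered");
            }
        }
        return false; // caller can inspect getWriteBuffer() to see what was not committed
    }

    public static void main(String[] args) {
        ToyBufferedTable table = new ToyBufferedTable(1); // first flush fails
        List<String> edits = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            edits.add("row-" + i);
        }
        table.put(edits);
        boolean ok = flushWithRetry(table, 3);
        System.out.println("committed=" + ok + " persisted=" + table.getPersisted().size());
    }
}
```

With the real client the same shape would apply to the flush call and its IOException, but whether a blind resend is safe depends on your edits being idempotent, which this toy model does not address.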
