On Mon, Jan 31, 2011 at 5:13 PM, Jim X <[email protected]> wrote:

> Does HTable.getWriteBuffer() do a roll back?
>

I guess not: it only lets you find out which edits have not been
successfully committed to the server after you catch the exception.
Correct me if I am wrong.

Sean

> Jim
>
> On Mon, Jan 31, 2011 at 8:04 PM, Ryan Rawson <[email protected]> wrote:
> > When you are using the buffer, you also need to flush it:
> >
> > htable.flushCommits();
> >
> > If the call succeeds, the edits were persisted. If at any point you
> > get exceptions, the unfinished edits are left in the write buffer and
> > htable.getWriteBuffer() gets you them.
> >
> > -ryan
> >
> > On Mon, Jan 31, 2011 at 10:48 AM, Sean Bigdatafun
> > <[email protected]> wrote:
> >> On Fri, Jan 14, 2011 at 10:51 PM, tsuna <[email protected]> wrote:
> >>
> >>> On Fri, Jan 14, 2011 at 4:06 PM, Sean Bigdatafun
> >>> <[email protected]> wrote:
> >>> > But how can the client understand which k-v belongs to an
> >>> > individual RS? Does it need to scan the .META. table? (if so, it's
> >>> > an expensive op). On the RegionServer side, is it like processing
> >>> > multiple requests in a batch per RPC?
> >>>
> >>> The client has to figure out which region each edit has to go to. The
> >>> client maintains a local cache of the META table, so when you
> >>> frequently use the same working set of regions (which is common for
> >>> most applications), the lookups are essentially free.
> >>>
> >>> The worst case is a client that does random-writes to all the regions
> >>> in a huge table. In this case, the client will end up discovering the
> >>> location of all the regions of that table and keep this in its
> >>> in-memory cache. But regions move around, are split etc. This does
> >>> cause extra META lookups, but the latency for a META lookup is
> >>> typically very small (even though the penalty incurred by the client
> >>> compared to cache hits in its local META cache is huge, comparatively
> >>> speaking). Note that right now neither HTable nor asynchbase
> >>> pro-actively evict unused entries from the local META cache to save
> >>> memory. I don't think anyone is running HBase at a scale where this
> >>> optimization would be useful.
> >>>
> >>> If you have a write-heavy application, you're always going to get
> >>> significantly higher throughput when you send your edits in batch to
> >>> the server. The downside to this is that when your client application
> >>> dies, you lose all the edits in the un-committed batch. Unlike
> >>> HTable, asynchbase puts an upper bound on the amount of time an edit
> >>> is allowed to remain in the client's buffer, which helps limit
> >>> data-loss when a client crashes (OpenTSDB sets this to 1s by default,
> >>> so when it dies, you know you lost at most 1s worth of datapoints).
> >>>
> >>
> >> htable.setWriteBufferSize(1024*1024*10); // 10MB
> >> htable.setAutoFlush(false);
> >>
> >> for (i = 0; i < N; i++) {
> >>     list.add(putitem[i]);
> >> }
> >>
> >> htable.put(list);
> >>
> >> For the above pseudo code (using put(List) to commit updates in HBase),
> >> can I get a "batch transaction" success notification?
> >> * i.e., how can I know all the items have been successfully
> >> committed? -- it seems that I can't get such information, all are
> >> best-effort. If I knew that some commits had failed, I could do an
> >> application-level retry.
> >> * setAutoFlush(true); does not seem to help us get any more
> >> reliable operation either.
> >>
> >>>
> >>> --
> >>> Benoit "tsuna" Sigoure
> >>> Software Engineer @ www.StumbleUpon.com
> >>>
> >>
> >> --
> >> --Sean
> >
>

--
--Sean
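
A minimal sketch of the flush-then-inspect behaviour Ryan describes above,
written as a self-contained toy model in Java rather than against the real
HBase client. BufferedTable, Server, and the one-edit-at-a-time flush are
assumptions made purely for illustration. The real client groups edits by
region server, so which edits remain in the buffer after a failure may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a client-side write buffer. NOT the HBase API:
// every class and method here is hypothetical and only mirrors the
// semantics discussed in the thread (no rollback, leftovers stay buffered).
final class WriteBufferSketch {

    /** Hypothetical server call: returns true when the edit was persisted. */
    interface Server {
        boolean persist(String edit);
    }

    static final class BufferedTable {
        private final List<String> writeBuffer = new ArrayList<>();
        private final Server server;

        BufferedTable(Server server) {
            this.server = server;
        }

        /** Like put() with auto-flush off: the edit is only queued locally. */
        void put(String edit) {
            writeBuffer.add(edit);
        }

        /**
         * Sends buffered edits in order. Each accepted edit leaves the buffer.
         * On the first rejection this throws, and the rejected edit plus all
         * unsent edits remain in the buffer. Nothing already sent is undone.
         */
        void flushCommits() {
            while (!writeBuffer.isEmpty()) {
                String edit = writeBuffer.get(0);
                if (!server.persist(edit)) {
                    throw new IllegalStateException("flush failed at: " + edit);
                }
                writeBuffer.remove(0);
            }
        }

        /** Returns a copy of the edits that have not been persisted yet. */
        List<String> getWriteBuffer() {
            return new ArrayList<>(writeBuffer);
        }
    }

    public static void main(String[] args) {
        List<String> persisted = new ArrayList<>();
        // Hypothetical server that rejects the edit named "row3".
        Server flaky = edit -> !edit.equals("row3") && persisted.add(edit);

        BufferedTable table = new BufferedTable(flaky);
        for (int i = 1; i <= 4; i++) {
            table.put("row" + i);
        }

        try {
            table.flushCommits();
            System.out.println("all edits persisted");
        } catch (IllegalStateException e) {
            // No rollback: edits accepted before the failure stay persisted.
            System.out.println("persisted so far: " + persisted);
            // Whatever is still buffered is what the application may retry.
            System.out.println("left in buffer:   " + table.getWriteBuffer());
        }
    }
}
```

With the real client the same shape would presumably apply: call
flushCommits() inside a try block and, if it throws, read getWriteBuffer()
to decide what to retry at the application level.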
