Simply don't set your status to 0 when you first write the row. Absence means not read; 1 means read. That way there is no risk that one thread sets 0 while another sets 1.
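A minimal sketch of that suggestion. It deliberately avoids the real HBase client: the table is modelled as a hypothetical in-memory map from row key (the url) to columns, and, like an HBase Put, a write only touches the columns it carries. Discovering a url writes only `depth`, the worker writes `status` = 1, and rows with no `status` column are the unprocessed ones. The class, the method names (`insertUrl`, `unprocessed`, `markProcessed`) and the example url are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AbsenceMeansUnread {
    // Hypothetical in-memory stand-in for a table: row key (url) -> (column -> value).
    // Like an HBase Put, put() writes only the given column; other columns in the row are untouched.
    static final Map<String, Map<String, Integer>> table = new LinkedHashMap<>();

    static void put(String row, String column, int value) {
        table.computeIfAbsent(row, k -> new HashMap<>()).put(column, value);
    }

    // Discovering a url records its depth but never writes "status".
    static void insertUrl(String url, int depth) {
        put(url, "depth", depth);
    }

    // A row with no "status" column has not been processed yet.
    static List<String> unprocessed() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : table.entrySet()) {
            if (!e.getValue().containsKey("status")) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // The worker marks a url as read by writing status = 1.
    static void markProcessed(String url) {
        put(url, "status", 1);
    }

    public static void main(String[] args) {
        String url = "http://example.com/url";
        insertUrl(url, 1);                  // discovered through url1
        for (String u : unprocessed()) {
            markProcessed(u);               // the worker handles it once
        }
        insertUrl(url, 1);                  // discovered again through url2
        System.out.println(unprocessed());  // prints []
    }
}
```

Because the second insert never carries a status cell, it cannot undo the worker's 1, so plain puts are enough here. On a real table the worker would still need a way to find rows that lack the status column (for example a scan with a filter), which this sketch does not show.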
Will that be an option?

2014-04-28 21:23 GMT-04:00 Li Li <[email protected]>:
> I am using HBase to store information for a web spider.
> I have a table that stores information about each webpage; the rowkey is the url,
> and there are other columns such as status (int) and depth (int).
> In the beginning, the status is 0. A worker thread selects urls
> whose status is 0, does something with them and sets the status to 1, ...
> More than one url can link to a given url,
> e.g. url1->url and url2->url,
> so url is inserted twice. If I do not use checkAndPut,
> thread 1 inserts url, and the worker thread processes url
> and sets its status to 1. Then thread 2 inserts url again and resets
> the status to 0, so the worker thread processes it again. That's
> not what I want.
>
> On Tue, Apr 29, 2014 at 8:56 AM, Jean-Marc Spaggiari
> <[email protected]> wrote:
> > Why do you want to make sure the row is only inserted once? If you insert
> > the same row twice, the 2nd one will simply overwrite the first one and
> > HBase will take care of the versions.
> >
> > Regarding the code fragments, I don't think autoflush is going to make a
> > big difference compared to the cost of the check & put...
> >
> >
> > 2014-04-28 20:50 GMT-04:00 Li Li <[email protected]>:
> >
> >> I must use checkAndPut to ensure a row is only inserted once.
> >> If I have 1000 checkAndPut calls, will setAutoFlush(false) be useful?
> >> Is there any performance difference between the following two code fragments?
> >> 1.
> >> table.setAutoFlush(false);
> >> for(int i=0;i<1000;i++){
> >>     Put put=...
> >>     table.checkAndPut(,....put);
> >> }
> >> 2.
> >> table.setAutoFlush(true);
> >> for(int i=0;i<1000;i++){
> >>     Put put=...
> >>     table.checkAndPut(,....put);
> >> }
> >>
> >> On Tue, Apr 29, 2014 at 8:36 AM, Jean-Marc Spaggiari
> >> <[email protected]> wrote:
> >> > It depends. Batching a list of puts/gets will be way faster than checkAndPut,
> >> > but the result will not be the same... a batch of puts will not do any
> >> > check...
> >> >
> >> >
> >> > 2014-04-28 20:17 GMT-04:00 Li Li <[email protected]>:
> >> >
> >> >> But I have many checkAndPut operations.
> >> >> Will using batch be a better solution?
> >> >>
> >> >> On Mon, Apr 28, 2014 at 8:01 PM, Jean-Marc Spaggiari
> >> >> <[email protected]> wrote:
> >> >> > Hi Li Li,
> >> >> >
> >> >> > Yes, threads will impact performance. If you send all your writes with
> >> >> > a single thread, a single HBase handler will take care of them, etc.
> >> >> > HBase does not provide a single handler for a single client connection.
> >> >> > It's able to handle multiple threads and clients.
> >> >> >
> >> >> > However, it also all depends on the way you send your writes. If you
> >> >> > send a single puts(<10000>) per second, it will not be better to send
> >> >> > 10,000 threads each with a single put.
> >> >> >
> >> >> > I would recommend running some perf tests on your installation to find
> >> >> > a good number for your configuration.
> >> >> >
> >> >> > JM
> >> >> >
> >> >> >
> >> >> > 2014-04-28 6:27 GMT-04:00 Li Li <[email protected]>:
> >> >> >
> >> >> >> Hi all,
> >> >> >> With the same read/write data, will the thread count affect performance?
> >> >> >> E.g. I have 10,000 write requests/second. I don't care about the order
> >> >> >> very much.
> >> >> >> How many writer threads should I use to obtain maximum throughput?
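For comparison, a toy model of the two write paths discussed in the thread, again an in-memory stand-in and not the HBase client. `put` is a blind write, so a second insert resets `status` from 1 to 0, exactly the problem Li Li describes. `putIfColumnAbsent` applies the write only when the column is missing, with the check and the write atomic per row; this mirrors what the HBase `checkAndPut` javadoc documents for a null expected value (the check is for the column's absence). The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InsertOnce {
    // Toy stand-in for a table: row key -> (column -> value). Not the HBase client API.
    static final ConcurrentHashMap<String, Map<String, Integer>> table = new ConcurrentHashMap<>();

    // Blind put: writes the column, overwriting any existing value.
    static void put(String row, String column, int value) {
        table.compute(row, (k, cols) -> {
            Map<String, Integer> c = (cols == null) ? new HashMap<String, Integer>() : cols;
            c.put(column, value);
            return c;
        });
    }

    // Check-and-put: write only if the column is absent.
    // compute() runs atomically per key, so the check and the write cannot interleave for one row.
    static boolean putIfColumnAbsent(String row, String column, int value) {
        boolean[] applied = {false};
        table.compute(row, (k, cols) -> {
            Map<String, Integer> c = (cols == null) ? new HashMap<String, Integer>() : cols;
            if (!c.containsKey(column)) {
                c.put(column, value);
                applied[0] = true;
            }
            return c;
        });
        return applied[0];
    }

    public static void main(String[] args) throws InterruptedException {
        String url = "http://example.com/url";

        // Blind puts: re-discovering the url resets status from 1 back to 0.
        put(url, "status", 0);   // inserted via url1
        put(url, "status", 1);   // worker finished
        put(url, "status", 0);   // inserted again via url2
        System.out.println("blind put status = " + table.get(url).get("status"));

        // Check-and-put: of many concurrent inserts, exactly one is applied.
        table.clear();
        AtomicInteger applied = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                if (putIfColumnAbsent(url, "status", 0)) applied.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        put(url, "status", 1);                                      // worker finished
        boolean reinserted = putIfColumnAbsent(url, "status", 0);   // a later inlink
        System.out.println("inserts applied = " + applied.get()
                + ", reinserted = " + reinserted
                + ", status = " + table.get(url).get("status"));
    }
}
```

Running `main` should print `blind put status = 0` followed by `inserts applied = 1, reinserted = false, status = 1`. As the thread points out, a batch of plain puts skips this check entirely, which is why it is faster but does not give insert-once behaviour.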
