bq. write performance would be lower

The above means poorer performance.

bq. I could batch them up application side

Please do that.

bq. I guess there is no way to turn that off?

That's right.
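A minimal sketch of that application-side batching against the 0.94 client API -- the table name, column family, qualifier, and payloads below are made-up placeholders, and the code is untested:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Append;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class AppendBatchSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tasks");   // placeholder table name
    byte[] fam = Bytes.toBytes("t");            // placeholder family

    // Collect appends application-side instead of one append() RPC per task.
    List<Row> actions = new ArrayList<Row>();
    for (int i = 0; i < 100; i++) {
      Append a = new Append(Bytes.toBytes(1365984000L)); // time-bucket row key
      a.add(fam, Bytes.toBytes("all-tasks"), Bytes.toBytes("task-" + i));
      actions.add(a);
    }

    // One multi-call per region server instead of 100 round trips.
    // results[i] holds the server response for actions[i]; for Append that
    // includes the updated value, which (per the above) cannot be suppressed.
    Object[] results = new Object[actions.size()];
    table.batch(actions, results);
    table.close();
  }
}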
On Mon, Apr 15, 2013 at 11:15 AM, Kireet <[email protected]> wrote:

Thanks for the reply. "write performance would be lower" -> meaning Append performs worse?

Also, I think I used the wrong terminology regarding batching. I meant to ask whether appends use the client-side write buffer. I would think not, since the append() method returns a Result. I could batch them up application side, I suppose. Append also seems to return the updated value. This seems like a lot of unnecessary I/O in my case, since I am not immediately interested in the updated value. I guess there is no way to turn that off?

On 4/15/13 1:28 PM, Ted Yu wrote:

I assume you would select HBase 0.94.6.1 (the latest release) for this project.

For #1, write performance would be lower if you choose to use Append (vs. using Put).

bq. Can appends be batched by the client or do they execute immediately?

This depends on your use case. Take a look at the following method in HTable, where you can send a list of actions (Appends):

public void batch(final List<? extends Row> actions, final Object[] results)

For #2:

bq. The other would be to prefix the timestamp row key with a random leading byte.

This technique has been used elsewhere and is better than the first one.

Cheers

On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy <kireet-Teh5dPVPL8nQT0dZR+*[email protected]> wrote:

We are planning to create a "scheduled task list" table in our HBase cluster. Essentially we will define a table keyed by timestamp, and the row contents will be all the tasks that need to be processed within that second (or whatever time period). I am trying to do the "reasonably wide rows" design mentioned in the HBaseCon OpenTSDB talk. A couple of questions:

1. Should we use Append or Put to create tasks? Since these rows will not live forever, storage space is not a concern; read/write performance is more important. As concurrency increases, I would guess the row lock may become an issue with Append? Can appends be batched by the client, or do they execute immediately?

2. I am a little worried about hotspots. This basic design may cause issues in terms of the table's performance. Many tasks will execute and reschedule themselves using the same interval, t + 1 hour for example, so many of the writes may all go to the same block. Also, we have a lot of other data, so I am worried it may impact the performance of unrelated data if the region server gets too busy servicing the task list table. I can think of two strategies to avoid this. One would be to create N different tables and read/write tasks to them randomly. This may spread load across servers, but there is no guarantee HBase will place the tables on different region servers, correct? The other would be to prefix the timestamp row key with a random leading byte. Then, when reading from the task list table, consumers could scan from any/all possible values of the random byte + current timestamp to obtain tasks. Both strategies seem like they could spread out load, but at the cost of more work/complexity to read tasks from the table. Do either of those approaches make sense?
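To make the second strategy concrete, a minimal sketch of the salted write path, assuming a one-byte salt with 16 buckets (the table name, family, and qualifier are placeholders; untested):

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTaskWriter {
  static final int BUCKETS = 16; // arbitrary; bounds the read-side fan-out

  // Row key = 1 random salt byte + 8-byte timestamp, so writes for the
  // same second spread across up to BUCKETS regions.
  static byte[] saltedKey(long second, Random rnd) {
    byte salt = (byte) rnd.nextInt(BUCKETS);
    return Bytes.add(new byte[] { salt }, Bytes.toBytes(second));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tasks"); // placeholder table name
    long slot = System.currentTimeMillis() / 1000L; // scheduled second
    Put put = new Put(saltedKey(slot, new Random()));
    put.add(Bytes.toBytes("t"), Bytes.toBytes("task-0001"),
        Bytes.toBytes("task payload"));
    table.put(put);
    table.close();
  }
}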
On the read side, it seems like a similar problem exists in that all consumers will be reading rows based on the current timestamp. Is this good, because the block will very likely be cached, or bad, because the region server may become overloaded? I have a feeling the answer is going to be "it depends". :) A sketch of that consumer-side scan follows below.

I did see the previous posts on queues and the tips there - use ZooKeeper for coordination, schedule major compactions, etc. Sorry if these questions are basic, I am pretty new to HBase. Thanks!
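A minimal sketch of that consumer-side scan fan-out, assuming the same one-byte-salt key layout as the writer sketch above (untested):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTaskReader {
  static final int BUCKETS = 16; // must match the writer's salt range

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tasks"); // placeholder table name
    long slot = System.currentTimeMillis() / 1000L; // the second to drain

    // One short scan per salt byte: [salt + slot, salt + (slot + 1)).
    for (int salt = 0; salt < BUCKETS; salt++) {
      byte[] start = Bytes.add(new byte[] { (byte) salt }, Bytes.toBytes(slot));
      byte[] stop = Bytes.add(new byte[] { (byte) salt }, Bytes.toBytes(slot + 1));
      ResultScanner scanner = table.getScanner(new Scan(start, stop));
      try {
        for (Result row : scanner) {
          // process the tasks stored in this row here
        }
      } finally {
        scanner.close();
      }
    }
    table.close();
  }
}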
