Re: performance of Get from MR Job

Jean-Daniel Cryans Wed, 20 Jun 2012 10:36:50 -0700

Yeah, I overlooked the versions issue.

What I usually recommend is that if the timestamp is part of your data
model, it should be in the row key, a qualifier or a value. Since you
seem to rely on the timestamp for querying, it should definitely be
part of the row key, but not at the beginning as you proposed: a leading
timestamp makes row keys monotonically increasing, so all new writes
land in a single region. See
http://hbase.apache.org/book.html#rowkey.design
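
A minimal sketch of such a row key, assuming a fixed-length entity key followed by the timestamp stored as Long.MAX_VALUE - timestamp so that newer rows sort first (the "reverse timestamp" pattern; one option, not something this thread prescribes). The class and method names are hypothetical, and the TreeMap only stands in for HBase's sorted row order. Against a real table, latestAtOrBefore would be a short Scan starting at rowKey(entity, T) and stopping at the end of the entity's prefix, which replaces a Get with TimeRange {0, T}. Requires Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class CompositeRowKey {

    // Row key = entity key bytes followed by (Long.MAX_VALUE - ts) as 8 big-endian bytes.
    // Assumes ts >= 0 and a fixed-length entity key, so one entity's prefix never matches another's.
    static byte[] rowKey(String entityKey, long ts) {
        byte[] prefix = entityKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(Long.MAX_VALUE - ts)
                .array();
    }

    // Unsigned lexicographic byte order, which is how HBase sorts row keys.
    static int compareRows(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
                && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    // Stand-in for a short scan: start at rowKey(entity, t) and take the first row
    // that still carries the entity's prefix. Returns the newest value with timestamp <= t.
    static String latestAtOrBefore(TreeMap<byte[], String> table, String entityKey, long t) {
        Map.Entry<byte[], String> first = table.ceilingEntry(rowKey(entityKey, t));
        byte[] prefix = entityKey.getBytes(StandardCharsets.UTF_8);
        if (first == null || !startsWith(first.getKey(), prefix)) {
            return null;
        }
        return first.getValue();
    }

    // Three versions of one entity, stored as three separate rows instead of cell versions.
    static TreeMap<byte[], String> demoTable() {
        TreeMap<byte[], String> table = new TreeMap<>(CompositeRowKey::compareRows);
        table.put(rowKey("user42", 1_000L), "v1000");
        table.put(rowKey("user42", 2_000L), "v2000");
        table.put(rowKey("user42", 3_000L), "v3000");
        return table;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = demoTable();
        System.out.println(latestAtOrBefore(table, "user42", 2_500L)); // v2000
        System.out.println(latestAtOrBefore(table, "user42", 500L));   // null
    }
}
```

Because the newest matching row is the first one the scan reaches, the lookup stops after one row instead of walking through versions, whatever the lower bound of the time window is.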


J-D

On Tue, Jun 19, 2012 at 11:35 PM, Marcin Cylke <[email protected]> wrote:
> On 19/06/12 19:31, Jean-Daniel Cryans wrote:
>> This is a common but hard problem. I do not have a good answer.
>
> Thanks for your write-up. You've given a few suggestions that I will
> surely follow.
>
> But what is bothering me is my use of timestamps. As mentioned before,
> my column family allows 2147483646 versions. I store data there
> using those timestamps: a few rows with the same key but different
> timestamps. When I prepare Gets with a TimeRange of {0, timestamp},
> my performance is poor (~130/sec). But doing something like
> {timestamp-10000, timestamp} results in a great speed improvement (~400/sec).
>
> Although {timestamp-10000, timestamp} is unrealistic in my
> situation, the whole issue seems strange, and I suspect it is related
> in some way to the use of timestamps.
>
> Would you recommend trying composite keys, built from timestamp + my
> current key? Or shouldn't that change much?
>
>
>> Finally kind of like Paul said, if you can emit your rows and somehow
>> batch them reducer-side in order to either do short scans or multi-get
>> (see HTable.get(List<Get>)) it could be faster.
>
> I'll try this solution, but I'm not that optimistic about it. I'll let
> you know whether it helped or not.
>
> Regards
> Marcin
>
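
One plausible contributor to the gap between ~130 and ~400 Gets/sec, offered as an assumption rather than something established in this thread: HBase tracks the timestamp range of each store file and can skip files that lie entirely outside a Get's TimeRange. A range of {0, timestamp} overlaps almost every file, while {timestamp-10000, timestamp} may overlap only the newest ones. The toy model below only counts overlapping files; StoreFile, filesToRead, and the file layout are hypothetical, not HBase classes or real data.

```java
import java.util.List;

// Toy model (an assumption, not HBase source code): each store file is reduced to the
// smallest and largest cell timestamp it holds, and a Get with a TimeRange only has to
// open the files whose timestamps overlap that range.
public class TimeRangePruning {

    static final class StoreFile {
        final long minTs;
        final long maxTs; // inclusive
        StoreFile(long minTs, long maxTs) { this.minTs = minTs; this.maxTs = maxTs; }
    }

    // The query range is [minStamp, maxStamp): maxStamp is exclusive, as with a TimeRange.
    static boolean overlaps(StoreFile f, long minStamp, long maxStamp) {
        return f.minTs < maxStamp && f.maxTs >= minStamp;
    }

    // Number of files a read with this time range would have to consult.
    static int filesToRead(List<StoreFile> files, long minStamp, long maxStamp) {
        int n = 0;
        for (StoreFile f : files) {
            if (overlaps(f, minStamp, maxStamp)) n++;
        }
        return n;
    }

    // Four files flushed one after another, each covering 100,000 ms of timestamps.
    static List<StoreFile> demoFiles() {
        return List.of(
                new StoreFile(0L, 99_999L),
                new StoreFile(100_000L, 199_999L),
                new StoreFile(200_000L, 299_999L),
                new StoreFile(300_000L, 399_999L));
    }

    public static void main(String[] args) {
        long ts = 350_000L;
        System.out.println(filesToRead(demoFiles(), 0L, ts));           // 4
        System.out.println(filesToRead(demoFiles(), ts - 10_000L, ts)); // 1
    }
}
```

In this model the wide range reads all four files and the narrow one reads a single file. After a major compaction merges everything into one file, there is nothing left to skip, so any benefit of this kind depends on how the data happens to be laid out on disk.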
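
A minimal sketch of the reducer-side batching suggested in the quoted text, assuming the reducer can collect its keys before looking them up: sort the keys, then fetch them in fixed-size batches instead of one Get per key. fetchAll, roundTrips, and the fake fetcher are hypothetical; in real client code each batch would become a List<Get> passed to HTable.get(List<Get>), the call named in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public class BatchedLookup {

    // Sort the keys and hand them to fetchBatch in chunks of at most batchSize.
    // fetchBatch is a hypothetical stand-in for one multi-get round trip.
    static <K extends Comparable<K>, V> List<V> fetchAll(
            List<K> keys, int batchSize, Function<List<K>, List<V>> fetchBatch) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<K> sorted = new ArrayList<>(keys);
        // Regions hold contiguous key ranges, so sorted keys in one batch tend to share
        // regions; runs of adjacent keys could also be served by a short scan instead.
        Collections.sort(sorted);
        List<V> results = new ArrayList<>(sorted.size());
        for (int i = 0; i < sorted.size(); i += batchSize) {
            List<K> batch = sorted.subList(i, Math.min(i + batchSize, sorted.size()));
            results.addAll(fetchBatch.apply(batch));
        }
        return results;
    }

    // Runs fetchAll with a fake fetcher and returns how many round trips it made.
    static int roundTrips(List<String> keys, int batchSize) {
        int[] calls = {0};
        List<String> values = fetchAll(keys, batchSize, batch -> {
            calls[0]++;
            List<String> out = new ArrayList<>();
            for (String k : batch) out.add("value-of-" + k);
            return out;
        });
        return calls[0];
    }

    public static void main(String[] args) {
        List<String> keys = List.of("row3", "row1", "row5", "row2", "row4");
        System.out.println(roundTrips(keys, 2) + " round trips instead of " + keys.size());
    }
}
```

With five keys and a batch size of 2, the fake fetcher is called 3 times instead of 5. Note that the values come back in sorted key order, not in the order the keys were supplied, so a real reducer would need to pair each result with its key.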
