Hi Stack,

Thanks for the response and confirmation. I went down the co-processor route and came to the same conclusion re: performance for increments.
Unfortunately the use case generates a large number of reads and writes, so I’ll use the put variant to flag the relationships for now and deal with counts/aggregates in some other way.

Wishing everyone a Merry Christmas and a Happy New Year!
-Alex.

> On Dec 19, 2017, at 9:52 AM, Stack <[email protected]> wrote:
>
>> On Mon, Dec 18, 2017 at 8:17 PM, Alex Loffler <[email protected]> wrote:
>>
>> Hi Stack,
>>
>> Thanks for the response. I am trying to maintain an hourly count of
>> messages between two keys/entities: sender->recipient, e.g. a->b
>>
>> There are multiple ways of modelling this, but one that seems to fit
>> nicely is:
>> Row key = a
>> Col = b
>> Timestamp/version = e.g. hour-of-day or hour-of-epoch
>> Val = count of messages
>>
>> This approach utilizes the three dimensions of rowkey, col & version
>> nicely.
>>
>> I will never need to look messages up by recipient but will be frequently
>> querying for all recipients contacted by a sender (i.e. return the
>> value (count) for each column (recipient) for a specific rowkey (sender)
>> during a particular timespan, i.e. at version x).
>>
>> Everything is in place for this to work except the ability to increment a
>> specific version of a cell per the above.
>>
>> If I don’t keep count (increment) and just write a flag to represent a
>> message between the two, this scheme/approach scales really nicely with the
>> put version of addColumn.
>>
>> If there’s a better pattern/approach, I’d really appreciate a pointer in
>> the right direction.
>>
> I see. Makes sense. Nice.
>
> You can't use increment as is. Its model is hard-baked doing a read of the
> most recent long, an add, and then a write-back of the new long value all
> while under an exclusive row lock. You'd need to change Increment so it
> updated at an explicit version.
>
> The above manner in which we do Increments is 'convenient' but dog slow.
> Rather, there should be a means of recording the increment values only --
> writes -- and then at read time, an aggregation. Can you cast your model
> this way at all?
>
> For now, you could checkAndPut to an explicit coordinate, doing a read of
> the old value and writing back the new, but this will be a costly op. You
> could cut out the client-server round-trips by floating a coprocessor
> endpoint on the server that did your increment-at-an-explicit-coordinate
> but it'd still be a read-modify-write.
>
> Let us know if we can help in any way Alex,
> S
>
>> -Alex.
>>
>>> On Dec 18, 2017, at 8:49 AM, Stack <[email protected]> wrote:
>>>
>>> Hello Alex. We don't have such an ability. Can you say what the use case is
>>> because I at least am having trouble understanding why you would want to do
>>> such a thing.
>>>
>>> Thank you,
>>> S
>>>
>>>> On Wed, Dec 13, 2017 at 2:07 PM, Alex Loffler <[email protected]> wrote:
>>>>
>>>> Hi Folks,
>>>>
>>>> I am using HBase’s timestamp/version concept to track
>>>> aggregates/counts for time periods/spans.
>>>>
>>>> The put function allows me to update a specific version, i.e.
>>>> put(rk).addColumn(cf, column, version, value)
>>>>
>>>> But I can’t find a way of incrementing a specific version, i.e.
>>>> increment(rk).addColumn(cf, column, version, value) doesn’t exist.
>>>>
>>>> I can only find increment(rk).addColumn(cf, column, value), which exhibits
>>>> the default behaviour of taking the latest version of the cell,
>>>> incrementing its value and updating the timestamp/version with
>>>> current-timestamp-millis.
>>>>
>>>> What I’d really like is an increment to the value in the specified
>>>> cell/version without the version update.
>>>>
>>>> Am I missing something, is this not possible for some reason I’m not
>>>> getting, or would it be a good feature request?
>>>>
>>>> Thanks again for a fantastic platform!
>>>> -Alex.
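
For anyone finding this thread later, here is a rough sketch of the "record the increment values only, aggregate at read time" idea Stack raises in the quoted reply. It is plain Java over an in-memory list, not the HBase client API: the DeltaLog class, its methods, and the hour-of-epoch bucketing helper are all hypothetical names made up for illustration, and how the individual deltas would actually be laid out in HBase cells is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of write-only increments.
 * Every message is appended as its own delta; nothing is read back on write.
 */
class DeltaLog {
    /** One recorded increment at (sender, recipient, hour-of-epoch). */
    private static final class Delta {
        final String sender;
        final String recipient;
        final long hour;
        final long amount;

        Delta(String sender, String recipient, long hour, long amount) {
            this.sender = sender;
            this.recipient = recipient;
            this.hour = hour;
            this.amount = amount;
        }
    }

    private final List<Delta> deltas = new ArrayList<>();

    /** Maps a millisecond timestamp to its hour-of-epoch bucket (the "version"). */
    static long hourOfEpoch(long epochMillis) {
        return Math.floorDiv(epochMillis, 3_600_000L);
    }

    /** Write path: append only, so there is no read-modify-write. */
    void record(String sender, String recipient, long epochMillis, long amount) {
        deltas.add(new Delta(sender, recipient, hourOfEpoch(epochMillis), amount));
    }

    /** Read path: sum the deltas per recipient for one sender in one hour bucket. */
    Map<String, Long> countsForSender(String sender, long hour) {
        Map<String, Long> totals = new TreeMap<>();
        for (Delta d : deltas) {
            if (d.sender.equals(sender) && d.hour == hour) {
                totals.merge(d.recipient, d.amount, Long::sum);
            }
        }
        return totals;
    }
}
```

The trade-off is the one discussed in the thread: writes become cheap appends, while the cost of producing a count moves to the read, which has to scan and sum every delta for the requested sender and hour.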
