Lars,
I have been relying on the expected behavior (if I write another cell
with the same {key, family, qualifier, version}, a read won't return
the previous one), so your answer was confusing to me. I did more
research and found that the HBase guide specifies that behavior (see
section 5.8.1 of http://hbase.apache.org/book.html).
Have I misunderstood something? Can I rely on behavior that is
specified in the guide?
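
To make sure I am reading the snippets you quoted correctly, here is a
minimal standalone Java sketch of the skip logic as I understand it. It
is not HBase code: the Cell class, the scan() method, and the assumption
that the cells of one column arrive sorted newest timestamp first are
all mine, for illustration only. It shows only that whichever cell the
scanner meets first for a given timestamp is kept; it says nothing about
which write that will be, which is exactly the part in question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: NOT HBase code. Cell and scan() are hypothetical names.
class SameTimestampSkip {

    // Stand-in for one version of a single column (row/family/qualifier fixed).
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Forward-only pass over the cells of one column, assumed to be sorted
    // newest timestamp first. A cell whose timestamp equals the previous
    // cell's timestamp is skipped, mirroring the sameAsPreviousTS check.
    static List<Cell> scan(List<Cell> sortedCells) {
        List<Cell> result = new ArrayList<>();
        boolean havePrevious = false;
        long previousTs = 0L;
        for (Cell cell : sortedCells) {
            if (havePrevious && cell.timestamp == previousTs) {
                continue; // duplicate timestamp: the first one seen wins
            }
            result.add(cell);
            previousTs = cell.timestamp;
            havePrevious = true;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell(200L, "a"),
            new Cell(100L, "first-seen"),
            new Cell(100L, "second-seen"));
        for (Cell c : scan(cells)) {
            System.out.println(c.timestamp + " -> " + c.value);
        }
    }
}
```

In this sketch the cell labelled "second-seen" is never returned, no
matter which of the two writes happened later.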
Thanks again!
--Tom
On Sun, Aug 26, 2012 at 6:43 AM, Eric Czech <[email protected]> wrote:
> Thanks for the info, Lars!
>
> In the potential use case I have for writing at the same timestamp,
> the values would always be the same anyway, so I should be good.
>
> On Sat, Aug 25, 2012 at 9:12 PM, lars hofhansl <[email protected]> wrote:
>> I checked the code to be sure...
>>
>>
>> In ScanWildcardColumnTracker we have this:
>>
>>   if (sameAsPreviousTSAndType(timestamp, type)) {
>>     return ScanQueryMatcher.MatchCode.SKIP;
>>   }
>>
>>
>> And in ExplicitColumnTracker there is this:
>>
>>   if (sameAsPreviousTS(timestamp)) {
>>     // If duplicate, skip this Key
>>     return ScanQueryMatcher.MatchCode.SKIP;
>>   }
>>
>>
>> I.e. the first KV is kept and the subsequent ones (with the same TS) are
>> skipped.
>>
>> My point remains, though: Do not rely on this.
>> (Though it will probably stay the way it is, because that is the most
>> efficient way to handle this in forward only scanners.)
>>
>> -- Lars
>>
>>
>>
>> ________________________________
>> From: Tom Brown <[email protected]>
>> To: "[email protected]" <[email protected]>; lars hofhansl
>> <[email protected]>
>> Sent: Saturday, August 25, 2012 4:54 PM
>> Subject: Re: MemStore and prefix encoding
>>
>>
>> I thought that when multiple values with the same key, family, qualifier,
>> and timestamp were written, the one that was written latest (as determined
>> by position in the store) would be read. Is that not the case?
>>
>> --Tom
>>
>> On Saturday, August 25, 2012, lars hofhansl <[email protected]> wrote:
>>> The prefix encoding applies to blocks in the HFiles and in the block cache,
>>> but not to the memstore.
>>>
>>>
>>> #1 Yes. Each column family is its own store. All stores are flushed
>>> together, so having many adds overhead (especially if a few tend to hold a
>>> lot of data but the others don't, leading to very many small store files
>>> that need to be compacted).
>>> #2 There is only one cell for a given key, column family, qualifier, and
>>> timestamp (if you write multiple cells with the same timestamp, it is
>>> undefined which one you'll get back on the next read), so the question
>>> does not quite apply. Writes with the same key, column family, and
>>> qualifier (each with a different timestamp) count towards the version
>>> limit.
>>>
>>> -- Lars
>>>
>>>
>>> ----- Original Message -----
>>> From: Eric Czech <[email protected]>
>>> To: user <[email protected]>
>>> Cc:
>>> Sent: Saturday, August 25, 2012 2:44 PM
>>> Subject: MemStore and prefix encoding
>>>
>>> Hi everyone,
>>>
>>> Does prefix encoding apply to rows in MemStores, or does it only apply
>>> to rows on disk in HFiles? I'm trying to decide if I should still
>>> favor larger values in order to not repeat keys, column families, and
>>> qualifiers more than necessary. While prefix encoding seems to negate
>>> that concern for storage on disk, I'm not sure whether the concern
>>> still applies to in-memory storage.
>>>
>>> Also, I had two other quick (unrelated) questions and I assume it'd be
>>> less annoying if I put them all in one email:
>>>
>>> 1. Do column families defined for a table introduce any overhead for
>>> rows that don't put any values in them? I don't think that's the case
>>> but I wanted to be sure.
>>>
>>> 2. Do writes with the same key, column family, qualifier, and
>>> timestamp count towards the version limit?
>>>
>>> Thanks for the help!
>>>
>>>