I would still caution against relying on the sort order among values with the same cf, qualifier and timestamp. For example, if there is a Delete, it will eclipse subsequent Puts with the same timestamp, even though the Puts happened after the Delete.
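To illustrate the masking, here is a minimal, self-contained sketch. It is a simplified model and not HBase code: the `Cell` class, the `STORE_ORDER` comparator and the `read` method are hypothetical names. It assumes only that cells sort newest timestamp first, that a delete marker sorts ahead of puts at an equal timestamp, and that a delete marker masks puts at exactly its own timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SameTimestampSketch {
    // Simplified stand-in for a stored cell (hypothetical, not an HBase class).
    static final class Cell {
        final long timestamp;
        final boolean isDelete;
        final String value;

        Cell(long timestamp, boolean isDelete, String value) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    // Assumed ordering: newest timestamp first; at an equal timestamp the
    // delete marker comes before any put. Write order plays no part here.
    static final Comparator<Cell> STORE_ORDER =
        Comparator.comparingLong((Cell c) -> c.timestamp)
                  .reversed()
                  .thenComparingInt(c -> c.isDelete ? 0 : 1);

    // Forward-only read: remember the timestamps of delete markers seen so
    // far and drop any put that carries one of those timestamps.
    static List<String> read(List<Cell> writesInClientOrder) {
        List<Cell> sorted = new ArrayList<>(writesInClientOrder);
        sorted.sort(STORE_ORDER);

        Set<Long> deletedTimestamps = new HashSet<>();
        List<String> visible = new ArrayList<>();
        for (Cell c : sorted) {
            if (c.isDelete) {
                deletedTimestamps.add(c.timestamp);
            } else if (!deletedTimestamps.contains(c.timestamp)) {
                visible.add(c.value);
            }
        }
        return visible;
    }

    // The client writes an older put, then a delete at ts=100, and only
    // afterwards a put at the same ts=100.
    static List<String> demo() {
        return read(Arrays.asList(
            new Cell(99L, false, "v1"),
            new Cell(100L, true, null),
            new Cell(100L, false, "v2")));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [v1]
    }
}
```

In this model "v2" never becomes visible even though it was written last, because nothing in the ordering records that it arrived after the delete. Giving the later Put a strictly larger timestamp avoids the problem.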
Enis

On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown <[email protected]> wrote:

> Lars,
>
> I have been relying on the expected behavior (if I write another cell
> with the same {key, family, qualifier, version} it won't return the
> previous one) so you're answer was confusing to me. I did more
> research and I found that the HBase guide specifies that behavior (see
> section 5.8.1 of http://hbase.apache.org/book.html).
>
> Have I misunderstood something? Can I rely on behavior that is
> specified in the guide?
>
> Thanks again!
>
> --Tom
>
> On Sun, Aug 26, 2012 at 6:43 AM, Eric Czech <[email protected]> wrote:
> > Thanks for the info lars!
> >
> > In the potential use case I have for writing at the same timestamp,
> > the values would always be the same anyways so I should be good.
> >
> > On Sat, Aug 25, 2012 at 9:12 PM, lars hofhansl <[email protected]> wrote:
> >> I checked the code to be sure...
> >>
> >> In ScanWildcardColumnTracker we have this:
> >>
> >> if (sameAsPreviousTSAndType(timestamp, type)) {
> >>   return ScanQueryMatcher.MatchCode.SKIP;
> >> }
> >>
> >> And in ExplicitColumnTracker there is this:
> >>
> >> if (sameAsPreviousTS(timestamp)) {
> >>   //If duplicate, skip this Key
> >>   return ScanQueryMatcher.MatchCode.SKIP;
> >> }
> >>
> >> I.e. the first KV is kept and the subsequent ones (with the same TS)
> >> are skipped.
> >>
> >> My point remains, though: Do not rely on this.
> >> (Though it will probably stay the way it is, because that is the most
> >> efficient way to handle this in forward only scanners.)
> >>
> >> -- Lars
> >>
> >> ________________________________
> >> From: Tom Brown <[email protected]>
> >> To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
> >> Sent: Saturday, August 25, 2012 4:54 PM
> >> Subject: Re: MemStore and prefix encoding
> >>
> >> I thought when multiple values with the same key, family, qualifier and
> >> timestamps were written, the one that was written latest (as determined by
> >> position in the store) would be read. Is that not the case?
> >>
> >> --Tom
> >>
> >> On Saturday, August 25, 2012, lars hofhansl <[email protected]> wrote:
> >>> The prefix encoding applies to blocks in the HFiles and in the block
> >>> cache, but not to the memstore.
> >>>
> >>> #1 Yes. Each column family is its own store. All stores are flushed
> >>> together, so have many add overhead (especially if a few tend to hold a lot
> >>> of data, but the others don't, leading to very many small store files that
> >>> need to be compacted).
> >>> #2 There is only one key with the same key, column family, qualifier,
> >>> and timestamp (if you write multiple with the same timestamp it is
> >>> undefined which one you'll get back when you read the next time). So that
> >>> does not make sense. Writes with the same key, column family, qualifier
> >>> (each with a different timestamp) count towards the version limit.
> >>>
> >>> -- Lars
> >>>
> >>> ----- Original Message -----
> >>> From: Eric Czech <[email protected]>
> >>> To: user <[email protected]>
> >>> Cc:
> >>> Sent: Saturday, August 25, 2012 2:44 PM
> >>> Subject: MemStore and prefix encoding
> >>>
> >>> Hi everyone,
> >>>
> >>> Does prefix encoding apply to rows in MemStores or does it only apply
> >>> to rows on disk in HFiles? I'm trying to decide if I should still
> >>> favor larger values in order to not repeat keys, column families, and
> >>> qualifiers more than necessary and while prefix encoding seems to
> >>> negate that concern for storage on disk, I'm not sure if it's still
> >>> applicable to in-memory storage.
> >>>
> >>> Also, I had two other quick (unrelated) questions and I assume it'd be
> >>> less annoying if I put them all in one email:
> >>>
> >>> 1. Do column families defined for a table introduce any overhead for
> >>> rows that don't put any values in them? I don't think that's the case
> >>> but I wanted to be sure.
> >>>
> >>> 2. Do writes with the same key, column family, qualifier, and
> >>> timestamp count towards the version limit?
> >>>
> >>> Thanks for the help!
