Hello Todd,

Thanks for pointing this out. The client was running 0.90.1, while the cluster was running 0.90.3. I upgraded both to the latest CDH3 release, 0.90.4, and the problem seems to have gone away: a simultaneous scanner and increment now produce consistent results. I still don't know the root cause, but this simple upgrade was enough to fix it.
Thanks!

On Jan 16, 2012, at 6:21 PM, Todd Lipcon wrote:
> Hi Young,
>
> This is interesting and unexpected behavior. What version are you running?
>
> If you can write a unit test (or system test) that demonstrates the
> problem against a running cluster, that would be excellent.
>
> -Todd
>
> On Fri, Jan 13, 2012 at 4:59 PM, Young wrote:
>> I'm having an odd problem with incrementing counters during a scan, with
>> the increments and the scan running simultaneously in separate processes.
>>
>> For low-rate counters (< 1 increment per second) there is no problem, but
>> for the higher-rate counters (> 10 increments per second) there is an
>> inconsistency in the counter values.
>>
>> Averaging the values over time gives the correct count (i.e., the counter
>> itself is still increasing correctly), but at certain samples the counter
>> drops to a seemingly random number. This random number stays the same
>> for about a day and a half, then jumps to a different random number for
>> the next day and a half; this cycle coincides exactly with compaction of
>> the table in question.
>>
>> Again, the counter value itself, when it is not equal to the random number
>> of the day, is correct. I'm wondering if there is something going on
>> underneath that would cause:
>> 1) the incorrect but consistent number when incrementing and scanning
>> simultaneously
>> 2) the random number reset and its relationship with compaction of the table
>>
>> Keep in mind that most of the HBase settings are at their defaults.
>>
>> Thanks!
>>
>> P.S. I ran a smaller experiment using the HBase shell and found the
>> counters to be consistent even for the high-rate counters. I am wondering
>> if there is a buffering issue with the HTable scanner object: if it is
>> unable to obtain a lock on the row, does it fall back to the data on disk?
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
