I'm using CDH 5.9; the documentation shows its HBase version is hbase-1.2.0+cdh5.9.1+222 ( https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html ). I have no idea whether HBASE-15378 is included.

On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <[email protected]> wrote:

> Which hbase version are you using ?
>
> Does it include HBASE-15378 ?
>
> > On Mar 1, 2017, at 5:02 AM, Hef <[email protected]> wrote:
> >
> > Hi,
> > I'm encountering a strange behavior on MapReduce when using HBase as input
> > format. I run my MR tasks on a same table, same dataset, with a same
> > pattern of Fuzzy Row Filter, multiple times. The Input Records counters
> > shown are not consistent, the smallest number can be 40% less than the
> > largest one.
> >
> > More specifically,
> > - the table is split into 18 regions, distributed on 3 region server. The
> > TTL is set to 10 days for the record, though the dataset for MR only
> > includes those inserted in 7days.
> >
> > - The row key is defined as:
> > sault(1byte) + time_of_hour(4bytes) + uuid(36bytes)
> >
> > - The scan is created as below:
> >
> > Scan scan = new Scan();
> > scan.setBatch(100);
> > scan.setCaching(10000);
> > scan.setCacheBlocks(false);
> > scan.setMaxVersions(1);
> >
> > And the row filter for the scan is a FuzzyRowFilter that filters only
> > events of a given time_of_hour.
> >
> > Everything looks fine while the result is out of expect.
> > A same task runs 10 times, the Input Records counters show 6 different
> > numbers, and the final output shows 6 different results.
> >
> > Does anyone has every faced this problem before?
> > What could be the cause of this inconsistency of HBase scan result?
> >
> > Thanks
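
For reference, here is a minimal sketch of how a fuzzy key and mask for the row key layout quoted above (salt + time_of_hour + uuid) might be built. It is an illustration under assumptions, not the code from the actual job: time_of_hour is assumed to be stored as a big-endian 4-byte int, the class and method names are hypothetical, and matches() is only a simplified model of the filter's fixed-byte check, not HBase's implementation. It uses only the JDK so it can be run without a cluster.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: models the key/mask pair a FuzzyRowFilter would receive.
class FuzzyKeySketch {
    static final int SALT_LEN = 1;   // salt(1 byte)
    static final int HOUR_LEN = 4;   // time_of_hour(4 bytes)
    static final int UUID_LEN = 36;  // uuid(36 bytes)
    static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN;

    // Fuzzy key: only the time_of_hour bytes carry meaning; the rest stay zero.
    // Assumption: time_of_hour is encoded as a big-endian int.
    static byte[] fuzzyKey(int timeOfHour) {
        byte[] key = new byte[KEY_LEN];
        ByteBuffer.wrap(key, SALT_LEN, HOUR_LEN).putInt(timeOfHour);
        return key;
    }

    // Mask: 0 marks a fixed byte that must match, 1 marks a byte that may be anything.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0);
        return mask;
    }

    // Builds a full row key in the quoted layout, for local experiments.
    static byte[] rowKey(byte salt, int timeOfHour, String uuid) {
        byte[] uuidBytes = uuid.getBytes(StandardCharsets.US_ASCII);
        if (uuidBytes.length != UUID_LEN) {
            throw new IllegalArgumentException("uuid must be 36 ASCII characters");
        }
        return ByteBuffer.allocate(KEY_LEN)
                .put(salt).putInt(timeOfHour).put(uuidBytes).array();
    }

    // Simplified model of the fixed-byte comparison (not HBase's implementation).
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String uuid = "123e4567-e89b-12d3-a456-426614174000";
        byte[] key = fuzzyKey(413000);
        byte[] mask = fuzzyMask();
        System.out.println(matches(rowKey((byte) 7, 413000, uuid), key, mask));
        System.out.println(matches(rowKey((byte) 7, 413001, uuid), key, mask));
    }
}
```

In an HBase job the same pair would be wrapped in org.apache.hadoop.hbase.util.Pair and passed to the scan, roughly scan.setFilter(new FuzzyRowFilter(Arrays.asList(new Pair<>(key, mask)))). Checking the key and mask locally like this can help rule out the filter definition itself before looking at the server side.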
