Since the cache for ClientScanner might or might not be empty during your test runs, it is hard to tell whether you hit the bug described in HBASE-15378.
I would suggest you upgrade to a release with HBASE-15378.

On Thu, Mar 2, 2017 at 7:59 PM, Hef <[email protected]> wrote:
> Thanks for the hint, which led me to investigate from the client side and
> finally get this problem resolved.
>
> I reviewed the code and found that 1.0.0-cdh5.6.1, an old version of
> hbase-client, was used in my project. After updating to 1.2.0-cdh5.9.0,
> consistent with the one the server is running, my tasks work correctly.
>
> I looked into the source of HBase 1.2.0-cdh5.9.0; HBASE-15378 is not
> patched. I also went through all release notes from CDH HBase 5.6 to
> 5.9, and nothing about this inconsistent scan behavior is mentioned.
> Though the problem has been resolved for now, I have no idea what the root
> cause actually is, or whether it will come back if my dataset grows
> larger without HBASE-15378.
>
> On Thu, Mar 2, 2017 at 12:09 AM, Sean Busbey <[email protected]> wrote:
> > The place to check for included JIRAs on top of those in the ASF
> > release is here:
> >
> > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.releasenotes.html
> >
> > HBASE-15378 is not in CDH5.9.1.
> >
> > On Wed, Mar 1, 2017 at 9:58 AM, Ted Yu <[email protected]> wrote:
> > > I don't see it here:
> > >
> > > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.CHANGES.txt?_ga=1.10311413.1914112506.1454459553
> > >
> > > On Wed, Mar 1, 2017 at 5:46 AM, Hef <[email protected]> wrote:
> > > > I'm using CDH 5.9; the documentation shows its HBase version is
> > > > hbase-1.2.0+cdh5.9.1+222. (
> > > > https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html
> > > > )
> > > > I have no idea if HBASE-15378 is included.
> > > >
> > > > On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <[email protected]> wrote:
> > > > > Which hbase version are you using?
> > > > >
> > > > > Does it include HBASE-15378?
> > > > > > On Mar 1, 2017, at 5:02 AM, Hef <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > I'm encountering strange behavior in MapReduce when using HBase
> > > > > > as the input format. I run my MR tasks on the same table, same
> > > > > > dataset, with the same FuzzyRowFilter pattern, multiple times.
> > > > > > The Input Records counters shown are not consistent; the
> > > > > > smallest number can be 40% less than the largest one.
> > > > > >
> > > > > > More specifically:
> > > > > >
> > > > > > - The table is split into 18 regions, distributed on 3 region
> > > > > > servers. The TTL is set to 10 days for the records, though the
> > > > > > dataset for the MR job only includes those inserted in the last
> > > > > > 7 days.
> > > > > >
> > > > > > - The row key is defined as:
> > > > > > salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
> > > > > >
> > > > > > - The scan is created as below:
> > > > > >
> > > > > > Scan scan = new Scan();
> > > > > > scan.setBatch(100);
> > > > > > scan.setCaching(10000);
> > > > > > scan.setCacheBlocks(false);
> > > > > > scan.setMaxVersions(1);
> > > > > >
> > > > > > The row filter for the scan is a FuzzyRowFilter that matches
> > > > > > only events of a given time_of_hour.
> > > > > >
> > > > > > Everything looks fine, but the result is not what I expect.
> > > > > > The same task run 10 times, the Input Records counters show 6
> > > > > > different numbers, and the final output shows 6 different
> > > > > > results.
> > > > > >
> > > > > > Has anyone faced this problem before?
> > > > > > What could be the cause of this inconsistency in HBase scan
> > > > > > results?
> > > > > >
> > > > > > Thanks
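For reference, the fuzzy key/mask construction described in the original message can be sketched in plain Java. Only the row-key layout (salt 1 byte + time_of_hour 4 bytes + uuid 36 bytes) comes from the thread; the class and method names here are hypothetical, and the HBase calls are shown only in comments. With FuzzyRowFilter, a mask byte of 0 marks a position that must match and 1 marks a "don't care" position, so the salt and uuid are masked out while time_of_hour is fixed:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch of the fuzzy key/mask pair for a row key shaped
// salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes).
public class FuzzyKeyBuilder {
    static final int SALT_LEN = 1, TIME_LEN = 4, UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + TIME_LEN + UUID_LEN; // 41 bytes

    /** Fuzzy key: the target hour in the fixed positions, zeros elsewhere. */
    static byte[] fuzzyKey(int timeOfHour) {
        ByteBuffer buf = ByteBuffer.allocate(KEY_LEN);
        buf.put((byte) 0);      // salt: value is ignored because it is masked
        buf.putInt(timeOfHour); // time_of_hour: must match
        // remaining 36 uuid bytes stay 0 and are masked out
        return buf.array();
    }

    /** Mask: 1 = don't care (salt, uuid), 0 = must match (time_of_hour). */
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + TIME_LEN, (byte) 0);
        return mask;
    }

    public static void main(String[] args) {
        byte[] key = fuzzyKey(425000);
        byte[] mask = fuzzyMask();
        // The pair would then be handed to the filter, e.g.:
        //   scan.setFilter(new FuzzyRowFilter(
        //       Arrays.asList(new Pair<>(key, mask))));
        System.out.println(key.length + " " + mask.length
                + " " + mask[0] + " " + mask[1]);
        // → 41 41 1 0
    }
}
```

Because every salt value is masked, one fuzzy pair covers all 18 regions; this only selects rows, though, and does not change how the scanner pages results, so the setBatch/setCaching interaction discussed above is unaffected.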
