Re: HBase scan returns inconsistent results on multiple runs for same dataset

Hef Wed, 01 Mar 2017 05:46:32 -0800

I'm using CDH 5.9. The documentation shows its HBase version is
hbase-1.2.0+cdh5.9.1+222
(https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html).
I don't know whether HBASE-15378 is included.


On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <[email protected]> wrote:

> Which hbase version are you using ?
>
> Does it include HBASE-15378 ?
>
> > On Mar 1, 2017, at 5:02 AM, Hef <[email protected]> wrote:
> >
> > Hi,
> > I'm encountering strange behavior in MapReduce when using HBase as the
> > input format. I run my MR tasks on the same table and the same dataset,
> > with the same Fuzzy Row Filter pattern, multiple times. The Input Records
> > counters are not consistent: the smallest number can be 40% less than the
> > largest one.
> >
> > More specifically,
> > - The table is split into 18 regions, distributed across 3 region servers.
> > The TTL is set to 10 days for the records, though the dataset for MR only
> > includes those inserted in the last 7 days.
> >
> > - The row key is defined as:
> > salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
> >
> >
> > - The scan is created as below:
> >
> > Scan scan = new Scan();
> > scan.setBatch(100);
> > scan.setCaching(10000);
> > scan.setCacheBlocks(false);
> > scan.setMaxVersions(1);
> >
> >
> > The row filter for the scan is a FuzzyRowFilter that matches only
> > events of a given time_of_hour.
> >
> > Everything looks fine, but the results are not what I expect.
> > When the same task runs 10 times, the Input Records counters show 6
> > different numbers, and the final output shows 6 different results.
> >
> > Has anyone ever faced this problem before?
> > What could be the cause of this inconsistency in the HBase scan results?
> >
> > Thanks
>
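
For reference, the fuzzy key and mask described in the quoted message can be sketched in plain Java without any HBase dependency. This is only an illustration under stated assumptions: the class and method names are hypothetical, time_of_hour is assumed to be stored as a big-endian 4-byte int, and the uuid is assumed to be 36 ASCII characters. The mask follows the FuzzyRowFilter convention (0 = byte must match, 1 = any byte), and the pair would be passed to the filter as a Pair<byte[], byte[]>. The local matcher is a simplified stand-in, not HBase's implementation; it can give a deterministic expected count over a known set of row keys to compare against the Input Records counter.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: fuzzy key and mask for salt(1) + time_of_hour(4) + uuid(36). */
final class FuzzyKeySketch {
    static final int SALT_LEN = 1;
    static final int TIME_LEN = 4;
    static final int UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + TIME_LEN + UUID_LEN; // 41 bytes

    /** Template key: only the time_of_hour bytes carry meaning, the rest stay zero. */
    static byte[] buildFuzzyKey(int timeOfHour) {
        ByteBuffer buf = ByteBuffer.allocate(KEY_LEN); // zero-filled, big-endian by default
        buf.position(SALT_LEN);
        buf.putInt(timeOfHour); // assumption: big-endian 4-byte int encoding
        return buf.array();
    }

    /** Mask: 0 = byte must equal the template, 1 = any byte is accepted. */
    static byte[] buildMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        for (int i = SALT_LEN; i < SALT_LEN + TIME_LEN; i++) {
            mask[i] = 0; // pin the time_of_hour bytes
        }
        return mask;
    }

    /** Simplified local matcher; not HBase's own implementation. */
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    /** Builds a full row key in the layout from the thread (uuid must be 36 ASCII chars). */
    static byte[] rowKey(byte salt, int timeOfHour, String uuid) {
        ByteBuffer buf = ByteBuffer.allocate(KEY_LEN);
        buf.put(salt).putInt(timeOfHour).put(uuid.getBytes(StandardCharsets.US_ASCII));
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] key = buildFuzzyKey(412345);
        byte[] mask = buildMask();
        byte[] row = rowKey((byte) 7, 412345, "123e4567-e89b-12d3-a456-426614174000");
        System.out.println("matches: " + matches(row, key, mask));
    }
}
```

Running the same matcher over an exported list of row keys for one time_of_hour should always give the same count, which makes it easier to tell whether the variation comes from the scan itself or from the job.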
