Unfortunately, without already knowing that is the reason, it is difficult to
get to that point. Neither the container logs nor the NodeManager logs indicated
anything incorrect was happening, other than the inconsistent export/rowcounter
results. I had reviewed all the hbase/yarn/hdfs bugs in the list but didn't see
one that seemed like a smoking gun, just a bunch of possible ones. My ignorance
of the inner workings of HBase/YARN likely played a big part in that, though.
I do appreciate you pointing out 'the one'!

From: Ted Yu
Sent: Tuesday, February 20, 11:15 PM
Subject: Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.
To: user@hbase.apache.org


If you look at
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_58.html#fixed_issues585
you would see the following:

HBASE-15378 - Scanner cannot handle heartbeat message with no results

which fixed what you observed in the previous release.

FYI
On Tue, Feb 20, 2018 at 9:07 PM, Andrew Kettmann <andrew.kettm...@evolve24.com> wrote:

> Josh,
>
> We upgraded from CDH 5.8.0 -> 5.8.5, and that seems to have fixed the issue.
> 3 rowcounts in a row that were not consistent before on a static table are
> now consistent. We are doing some further testing, but it looks like you
> called it with:
>
> 'scans on RegionServers stop prematurely before all of the data is read'
>
> Thanks for the pointer in that direction; I was bashing my face against this
> for two weeks trying to figure out this inconsistency. I appreciate the clue!
>
> Andrew Kettmann
> Consultant, Platform Services Group
>
> -----Original Message-----
> From: Josh Elser [mailto:els...@apache.org]
> Sent: Monday, February 12, 2018 11:59 AM
> To: user@hbase.apache.org
> Subject: Re: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.
>
> Hi Andrew,
>
> Yes. The answer is, of course, that you should see consistent results from
> HBase if there are no mutations in flight to that table. Whether you're
> reading "current" or "back-in-time", as long as you're not dealing with raw
> scans (where compactions may persist delete tombstones), this should hold
> just the same.
>
> Are you modifying older cells with newer data when you insert data?
> Remember that MAX_VERSIONS for a table defaults to 1. Consider the following:
>
> * Timestamps are of the form "tX", and t1 < t2 < t3 < ...
> * You are querying the time range [t1, t5].
> * You have a cell for "row1" at t3 with value "foo".
> * RowCounter over [t1, t5] would return "1".
> * Your ingest writes a new cell for "row1" of "bar" at t6.
> * RowCounter over [t1, t5] would return "0" normally, or "1" if you use RAW scans ***
> * A compaction would run over the region containing "row1".
> * RowCounter over [t1, t5] would return "0" (RAW or normal).
>
> It's also possible that you're hitting some sort of bug around missing
> records at query time. I'm not sure what the CDH versions you're using line
> up to, but there have certainly been issues in the past around query-time
> data loss (e.g. scans on RegionServers stop prematurely before all of the
> data is read).
>
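Josh's timeline above can be sketched as a little thought experiment in code. This is a toy model of the described semantics only, not HBase client code; `count_rows`, `compact`, and the in-memory layout are invented for illustration:

```python
# Toy model: one table with VERSIONS => 1, scanned over the time range
# [t1, t5] = [1, 5]. Versions for each row are kept newest-first.

def count_rows(cells, start, end, raw=False):
    """Count rows with at least one visible cell in [start, end].

    With VERSIONS => 1, a normal scan only ever sees the newest version;
    a RAW scan can still see older versions until a compaction has
    physically removed them.
    """
    count = 0
    for row, versions in cells.items():
        visible = versions if raw else versions[:1]  # VERSIONS => 1
        if any(start <= ts <= end for ts, _ in visible):
            count += 1
    return count

def compact(cells, max_versions=1):
    """Major compaction: physically drop versions beyond MAX_VERSIONS."""
    return {row: versions[:max_versions] for row, versions in cells.items()}

# A cell for "row1" at t3 with value "foo".
table = {"row1": [(3, "foo")]}
print(count_rows(table, 1, 5))            # 1

# Ingest writes a new cell for "row1" at t6; it becomes the newest version.
table["row1"].insert(0, (6, "bar"))
print(count_rows(table, 1, 5))            # 0 for a normal scan...
print(count_rows(table, 1, 5, raw=True))  # ...but 1 for a RAW scan

# After a compaction, the t3 version is gone for good.
table = compact(table)
print(count_rows(table, 1, 5, raw=True))  # 0, RAW or normal
```

The footnote's "going off of memory" caveat applies to the RAW-scan line here as well.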
> Good luck!
>
> *** Going off of memory here. I think this is how it works, but you should
> be able to test easily ;)
>
> On 2/9/18 5:30 PM, Andrew Kettmann wrote:
> > A simpler question would be this:
> >
> > Given:
> >
> > * a set timeframe in the past (2-3 days, roughly a year ago)
> > * we are NOT removing records from the table at all
> > * we ARE inserting into this table actively
> >
> > Should I expect two consecutive runs of the rowcounter mapreduce job to
> > return an identical number?
> >
> > Andrew Kettmann
> > Consultant, Platform Services Group
> >
> > From: Andrew Kettmann
> > Sent: Thursday, February 08, 2018 11:35 AM
> > To: user@hbase.apache.org
> > Subject: Inconsistent rows exported/counted when looking at a set, unchanged past time frame.
> >
> > First the version details:
> >
> > Running HBase/YARN/HDFS using Cloudera Manager 5.12.1.
> > HBase: Version 1.2.0-cdh5.8.0
> > HDFS/YARN: Hadoop 2.6.0-cdh5.8.0
> > hbck and hdfs fsck return healthy
> >
> > 15 nodes, sized down recently from 30 (other service requirements
> > reduced: Solr, etc.)
> >
> > The simplest example of the inconsistency is using rowcounter. If I run
> > the same mapreduce job twice in a row, I get different counts:
> >
> > hbase org.apache.hadoop.hbase.mapreduce.Driver rowcounter \
> >   -Dmapreduce.map.speculative=false TABLENAME \
> >   --starttime=1485907200000 --endtime=1486058400000
> >
> > Looking at org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters:
> > Run 1: 4876683
> > Run 2: 4866351
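A quick way to demonstrate the inconsistency is to script consecutive runs and diff the counter. A hedged sketch: it assumes the `hbase ... rowcounter` invocation quoted above, and assumes the MapReduce client prints a `ROWS=<n>` counter line in its console output; running the job itself obviously needs a live cluster, so only the parsing helper can be checked offline.

```python
import re
import subprocess

# Assumed counter-line format in the MapReduce client's console output.
ROWS_RE = re.compile(r"\bROWS=(\d+)")

def parse_rows(job_output: str) -> int:
    """Pull the ROWS counter value out of the job's console output."""
    m = ROWS_RE.search(job_output)
    if m is None:
        raise ValueError("no ROWS counter found in job output")
    return int(m.group(1))

def run_rowcounter(table: str, start_ms: int, end_ms: int) -> int:
    """Run the rowcounter job once and return its ROWS counter.

    Needs a working `hbase` CLI and a live cluster.
    """
    cmd = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.Driver", "rowcounter",
        "-Dmapreduce.map.speculative=false",
        table, f"--starttime={start_ms}", f"--endtime={end_ms}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # The MapReduce client typically logs counters on stderr; search both.
    return parse_rows(result.stderr + result.stdout)

def counts_consistent(table: str, start_ms: int, end_ms: int, runs: int = 2) -> bool:
    """True if every run over the same frozen time range returns the same count."""
    return len({run_rowcounter(table, start_ms, end_ms) for _ in range(runs)}) == 1
```

On a cluster this would be invoked as, e.g., `counts_consistent("TABLENAME", 1485907200000, 1486058400000)`; for a past, unmodified time range it should always return True once the scanner bug is fixed.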
> > Similarly with exports of the same date/time. Consecutive runs of the
> > export get different results:
> >
> > hbase org.apache.hadoop.hbase.mapreduce.Export \
> >   -Dmapred.map.tasks.speculative.execution=false \
> >   -Dmapred.reduce.tasks.speculative.execution=false \
> >   TABLENAME HDFSPATH 1 1485907200000 1486058400000
> >
> > From Map Input/Output records:
> > Run 1: 4296778
> > Run 2: 4297307
> >
> > None of the results show anything for spilled records, and there are no
> > failed maps. Sometimes the row count increases, sometimes it decreases.
> > We aren't using any row filter queries; we just want to export chunks of
> > the data for a specific time range. This table is actively being
> > read/written to, but I am asking about a date range in early 2017 in this
> > case, so I would have thought that should have no impact. Another point
> > is that the rowcount job and the export return ridiculously different
> > numbers. There should be no older versions of rows involved, as we are
> > set to only keep the newest, and I can confirm that there are rows that
> > are consistently missing from the exports. The table definition is below.
> >
> > hbase(main):001:0> describe 'TABLENAME'
> > Table TABLENAME is ENABLED
> > TABLENAME
> > COLUMN FAMILIES DESCRIPTION
> > {NAME => 'text', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> > REPLICATION_SCOPE => '0', COMPRESSION => 'SNAPPY', VERSIONS => '1',
> > MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE',
> > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> > 1 row(s) in 0.2800 seconds
> >
> > Any advice/suggestions would be greatly appreciated. Are some of my
> > assumptions wrong regarding import/export, i.e. that it should be
> > consistent given consistent date/times?
> >
> > Andrew Kettmann
> > Platform Services Group
