Hi,
I am scanning HBase through Hive. I am using a coprocessor that implements
postScannerNext(). For each result, I execute a Get operation:
public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e,
        final InternalScanner s, final List<Result> results, final int limit,
        final boolean hasMore) throws IOException {
    // Re-read each scanned row with a Get and return those results instead.
    List<Result> newResults = new ArrayList<Result>(results.size());
    for (Result result : results) {
        Get get = new Get(result.getRow());
        Result newResult = e.getEnvironment().getRegion().get(get);
        newResults.add(newResult);
    }
    // Replace the scanner's batch in place with the Get results.
    results.clear();
    results.addAll(newResults);
    return hasMore;
}
I am scanning 100 million records with a TimeRange. Sometimes I get
duplicate rows. I think postScannerNext() is sending the result twice for the
same row.
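To check this, a small helper along the following lines could count row keys that arrive back to back. This is only a sketch in plain Java: AdjacentDuplicateDetector is a hypothetical name, not an HBase class, and it assumes rows reach the coprocessor in sorted order within a region, so a repeated row would show up immediately after its first copy.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of HBase): flags row keys repeated back to back. */
final class AdjacentDuplicateDetector {
    // Last row key observed so far; null until the first call to observe().
    private byte[] lastRow;
    // Number of times a row key matched the one seen immediately before it.
    private long duplicates;

    /** Returns true if rowKey is byte-for-byte equal to the previous row key. */
    boolean observe(byte[] rowKey) {
        boolean duplicate = lastRow != null && Arrays.equals(lastRow, rowKey);
        if (duplicate) {
            duplicates++;
        }
        // Copy defensively so later changes to the caller's array do not affect us.
        lastRow = Arrays.copyOf(rowKey, rowKey.length);
        return duplicate;
    }

    /** Total number of adjacent duplicates seen so far. */
    long duplicateCount() {
        return duplicates;
    }
}
```

Inside the loop above, observe(result.getRow()) could be called for each result and the row key logged whenever it returns true. Keeping only the last key avoids holding 100 million keys in memory, at the cost of missing duplicates that are not adjacent.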
From Hive, I am NOT setting allowPartialResults.
MaxResultSize : 2097152
Caching : 2147483647
Batch : -1
To my knowledge, the server returns partial results only when we explicitly
call scan.setAllowPartialResults(true). I am not doing this.
Any idea why this happens?
Thanks,
Raju,
(972)273-0155.