Hi,

I am scanning HBase through Hive.  I am using a coprocessor that implements
postScannerNext().  For each result, I execute a Get operation.

public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e,
        final InternalScanner s, final List<Result> results, final int limit,
        final boolean hasMore) throws IOException {

    // Re-read every scanned row with a Get and return the Get results instead.
    List<Result> newResults = new ArrayList<Result>(results.size());
    for (Result result : results) {
        Get get = new Get(result.getRow());
        Result newResult = e.getEnvironment().getRegion().get(get);
        newResults.add(newResult);
    }
    results.clear();
    results.addAll(newResults);
    return hasMore;
}


I am scanning 100 million records with a TimeRange.  Sometimes I get
duplicate rows.  I think postScannerNext() is sending a result twice for
the same row.
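
To confirm this, I could log repeated row keys from inside the hook.  Below
is a minimal sketch of that check, assuming the keys come from
Result.getRow().  It takes plain byte[] keys so it runs without HBase, and
DuplicateRowCheck and findDuplicateRows are hypothetical names, not HBase
APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic helper (not part of HBase).
final class DuplicateRowCheck {

    // Returns each row key that occurs more than once, reported a single time,
    // in the order its first repeat was seen.
    static List<String> findDuplicateRows(List<byte[]> rowKeys) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (byte[] row : rowKeys) {
            // ISO-8859-1 maps every byte to exactly one char, so distinct
            // keys never collapse into the same string.
            String key = new String(row, StandardCharsets.ISO_8859_1);
            if (!seen.add(key) && reported.add(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<byte[]> batch = Arrays.asList(
                "r1".getBytes(StandardCharsets.UTF_8),
                "r2".getBytes(StandardCharsets.UTF_8),
                "r2".getBytes(StandardCharsets.UTF_8),
                "r3".getBytes(StandardCharsets.UTF_8));
        System.out.println(findDuplicateRows(batch)); // prints [r2]
    }
}
```

In the coprocessor, the list of keys would be built from result.getRow() for
each entry in results.  Keeping the last key of the previous call would also
show whether a row repeats across two consecutive calls.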



From Hive, I am NOT setting allowPartialResults.  The scan settings are:
MaxResultSize : 2097152 (2 MB)
Caching : 2147483647 (Integer.MAX_VALUE)
Batch : -1

To my knowledge, the server returns partial results only when we explicitly
call scan.setAllowPartialResults(true).  I am not doing this.

Any idea why this happens?

Thanks,
Raju,
(972)273-0155.
