The benefit of using a BatchScanner in the AccumuloRowInputFormat is that
it can fetch multiple ranges in parallel within each Mapper. This may
help you manage your MapReduce job resources a bit better (see the
discussion in the JIRA issue for details). If you don't need it, I
wouldn't use that option. If you do need it for performance reasons,
you can mitigate the row-splitting problem with the WholeRowIterator,
but that comes with its own performance implications.
You might also be able to mitigate it by resolving the
single-row-represented-as-multiple-rows problem with a Combiner or in your
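One reading of "resolving it downstream" (an assumption about the intended approach, since the sentence above is cut off) is to key the output on the row ID and merge the fragments after grouping, as a Reducer keyed on the row would. The plain-Java sketch below models that merge without any Accumulo or Hadoop classes. `Cell` and `mergeFragments` are hypothetical names for illustration, not part of Accumulo's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeRowFragments {

    // Hypothetical stand-in for an Accumulo key-value pair: row ID, column, value.
    static final class Cell {
        final String row;
        final String column;
        final String value;

        Cell(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + ":" + column + "=" + value;
        }
    }

    // Each fragment is the list of cells that one map() call would see as a "row".
    // With a BatchScanner, the same row ID can appear in several fragments, so
    // grouping every cell by its row ID restores each row as a single unit.
    static Map<String, List<Cell>> mergeFragments(List<List<Cell>> fragments) {
        Map<String, List<Cell>> rows = new TreeMap<>();
        for (List<Cell> fragment : fragments) {
            for (Cell cell : fragment) {
                rows.computeIfAbsent(cell.row, r -> new ArrayList<>()).add(cell);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // row1 arrives split into two fragments because row2's cells were interleaved.
        List<List<Cell>> fragments = Arrays.asList(
            Arrays.asList(new Cell("row1", "a", "1")),
            Arrays.asList(new Cell("row2", "a", "2")),
            Arrays.asList(new Cell("row1", "b", "3")));

        Map<String, List<Cell>> rows = mergeFragments(fragments);
        System.out.println(rows.keySet());           // [row1, row2]
        System.out.println(rows.get("row1").size()); // 2
    }
}
```

The trade-off is that merging requires buffering every fragment of a row until all of them have been seen, which is why it fits naturally after a shuffle keyed on the row ID rather than inside a single map() call.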
On Thu, Dec 1, 2016 at 1:51 AM Massimilian Mattetti <massi...@il.ibm.com> wrote:
> I see, so the only solution here would be either to use a WholeRowIterator
> or to avoid enabling the BatchScanner. Since each executor will work on a
> single tablet, I guess that the benefit of using a BatchScanner is that it
> can fetch multiple ranges over the same tablet in parallel, am I correct?
> From: Christopher <ctubb...@apache.org>
> To: email@example.com
> Date: 30/11/2016 18:48
> Subject: Re: BatchScanner behavior with AccumuloRowInputFormat
> You'd only have to worry about this behavior if you set
> AccumuloRowInputFormat.setBatchScan(job, true), available since 1.7.0.
> By default, our InputFormats use a regular Accumulo Scanner.
> See https://issues.apache.org/jira/browse/ACCUMULO-3602 and
> On Wed, Nov 30, 2016 at 9:42 AM Massimilian Mattetti <massi...@il.ibm.com> wrote:
> Hi all,
> as you already know, the AccumuloRowInputFormat internally uses a
> RowIterator to iterate over all the key-value pairs of a single row. In
> the past, when I was using the RowIterator together with a BatchScanner, I
> had the problem of a single row being split into multiple rows, because
> a BatchScanner can interleave key-value pairs of different rows.
> Should I expect the same behavior when using the AccumuloRowInputFormat
> with a BatchScanner (enabled via setBatchScan)?
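The row-splitting described in this question can be reproduced without Accumulo. The sketch below models a RowIterator-style wrapper as something that starts a new "row" whenever the row ID of the next entry changes, which only yields whole rows when the input is sorted. `RowGroupingDemo` and `groupConsecutive` are hypothetical names, and the "interleaved" order is just one order a BatchScanner might produce, not a guaranteed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowGroupingDemo {

    // Groups consecutive equal row IDs into runs: a new group begins whenever
    // the row ID differs from the one that started the current group.
    static List<List<String>> groupConsecutive(List<String> rowIds) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        for (String row : rowIds) {
            if (current == null || !current.get(0).equals(row)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sorted order, as a regular Scanner returns it: one group per row.
        List<String> sorted = Arrays.asList("r1", "r1", "r2", "r2");
        // One possible interleaved order from a BatchScanner: each row is split in two.
        List<String> interleaved = Arrays.asList("r1", "r2", "r1", "r2");

        System.out.println(groupConsecutive(sorted).size());      // 2
        System.out.println(groupConsecutive(interleaved).size()); // 4
    }
}
```

Because the grouping only compares each entry with the group currently being built, it has no way to notice that "r1" reappears later; that is why the fix has to either keep rows atomic (for example with a WholeRowIterator) or merge fragments downstream.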