On Tue, Feb 26, 2013 at 9:12 PM, Mike Hugo wrote:
> Is there a way to "reset" the column iterator back to the "beginning" when
> using the AccumuloRowInputFormat? We have a case in which we need to
> iterate over the columns for a row at least twice and it could be a large
> row that may not fit in memory.
>
> I think we can work around this by having a separate scanner used within
> the map method for this purpose. Other than that, is there a way to clone
> or copy or reset the column iterator such that we can iterate over it more
> than once?
>
Currently, no. It's not immediately obvious how we could change the
InputFormat to accomplish this. The RecordReader creates a scanner, does
the seeking/fetching for the InputSplit once in its initialize method, then
iterates over the scanner, grouping together rows as appropriate. Going
back to the beginning of a row would require us to seek the scanner again,
and replace the old iterator with a new one. We could make a special
RecordReader with a reset method, but I don't know how we could call the
method. Interactions with the RecordReader are handled by the MapContext,
and I don't know if you can use a custom MapContext. Maybe we could have
an InputFormat that gives you a Scanner directly that you could reseek in
the Mapper, but we'd have to spend some time thinking about it to make sure
it would work.
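
To make the "seek again and replace the old iterator" idea concrete, here is a
minimal sketch in plain Java. It is not Accumulo API: ReseekableRow and
ReseekDemo are hypothetical names, and the Supplier stands in for whatever
would re-seek a scanner to the start of the row (for example, a separate
Scanner created in the Mapper with its range set to the current row, as in the
workaround you describe). Nothing is buffered, so a row that doesn't fit in
memory can still be read twice, at the cost of a second seek.

```java
import java.util.Iterator;
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Accumulo: hands out a fresh column
 * iterator per pass instead of buffering the row in memory.
 */
final class ReseekableRow<T> implements Iterable<T> {
    // Assumption: each call re-seeks to the first column of the row,
    // e.g. by building a new scanner restricted to that row.
    private final Supplier<Iterator<T>> reseek;

    ReseekableRow(Supplier<Iterator<T>> reseek) {
        this.reseek = reseek;
    }

    /** Every call starts a new pass from the beginning of the row. */
    @Override
    public Iterator<T> iterator() {
        return reseek.get();
    }
}

final class ReseekDemo {
    /**
     * Two passes over the same row: the first finds the longest column
     * value, the second counts how many columns have that length.
     */
    static int countLongest(Iterable<String> row) {
        int longest = 0;
        for (String column : row) {          // first pass
            longest = Math.max(longest, column.length());
        }
        int matches = 0;
        for (String column : row) {          // second pass, re-seeked
            if (column.length() == longest) {
                matches++;
            }
        }
        return matches;
    }
}
```

In a Mapper, the supplier would be the piece that talks to Accumulo; the
two-pass logic itself only needs an Iterable.
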
Billie
> Thanks,
>
> Mike
>
> public void map(Text key, PeekingIterator<Map.Entry<Key, Value>> columnIterator,
>         Context context) {
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
>
>     // reset column iterator back to the beginning
>
>     while (columnIterator.hasNext()) {
>         Map.Entry<Key, Value> kv = columnIterator.next();
>     }
> }
>