Thanks Billie,
The TimestampFilter is configured with an end time:

    IteratorSetting timestampIterator = new IteratorSetting(1,
        "tsBefore", TimestampFilter.class);
    TimestampFilter.setEnd(timestampIterator, endTime, true);

We have validated that all the records we're interested in have a timestamp
that's less than the end time we're passing in. For example, the end time
passed to the timestamp filter is 1361907184183, and a sample timestamp on
a record in the table is 1361849294237.
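
As a sanity check on those two numbers, here is a minimal plain-Java sketch of the comparison I'd expect an end-inclusive timestamp filter to make. The class and method names are made up for illustration; this is not Accumulo's TimestampFilter code, just the arithmetic:

```java
public class EndTimeCheck {
    // Hypothetical stand-in for an "end time" filter: accept a timestamp
    // that is before the end, or equal to it when the end is inclusive.
    static boolean acceptedByEnd(long timestamp, long end, boolean endInclusive) {
        return endInclusive ? timestamp <= end : timestamp < end;
    }

    public static void main(String[] args) {
        long endTime = 1361907184183L;  // value passed to the filter
        long recordTs = 1361849294237L; // sample record timestamp from the table
        System.out.println(acceptedByEnd(recordTs, endTime, true)); // prints true
    }
}
```

The sample record is roughly 16 hours older than the end time, so on that basis alone it should pass the filter.
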
The only difference between the two runs is whether we set the ranges or
not: AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
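
In the run that works, the range is a prefix range on "foo" (see my earlier message quoted below). As I understand it, that should cover every row whose ID starts with the prefix. A plain-Java sketch of that membership test, using the example rows from this thread; the class and helper names are hypothetical and no Accumulo classes are involved:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixRangeSketch {
    // Hypothetical helper: the rows a prefix range on `prefix` would cover.
    static List<String> rowsWithPrefix(List<String> rows, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String row : rows) {
            if (row.startsWith(prefix)) {
                matched.add(row);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "foo/bar", "foo/baz", "foo/bee", "eee/blah", "eee/boo");
        System.out.println(rowsWithPrefix(rows, "foo")); // prints [foo/bar, foo/baz, foo/bee]
    }
}
```
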
Running a scan from the accumulo shell we see all the data is there, as
well as running a scan via the Java API (not map-reduce, just a straight up
scanner), but for some reason the Mapper just never hits those rows.
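
To pin down exactly which rows the Mapper misses, one option is to diff the row IDs returned by the plain scanner against the row IDs logged by the Mapper, and count the misses per prefix. A rough sketch, assuming both sets of row IDs have already been collected into collections; the names and sample data are illustrative only:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class MissingRowsByPrefix {
    // Counts rows the scanner saw but the mapper did not, keyed by the
    // part of the row ID before the first '/'.
    static Map<String, Integer> missingByPrefix(Collection<String> scannerRows,
                                                Collection<String> mapperRows) {
        Set<String> seen = new HashSet<>(mapperRows);
        Map<String, Integer> counts = new TreeMap<>();
        for (String row : scannerRows) {
            if (seen.contains(row)) {
                continue; // the mapper did process this row
            }
            int slash = row.indexOf('/');
            String prefix = slash < 0 ? row : row.substring(0, slash);
            Integer old = counts.get(prefix);
            counts.put(prefix, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Collection<String> scannerRows = Arrays.asList(
                "foo/bar", "foo/baz", "foo/bee", "eee/blah", "eee/boo");
        Collection<String> mapperRows = Arrays.asList("eee/blah", "eee/boo");
        System.out.println(missingByPrefix(scannerRows, mapperRows)); // prints {foo=3}
    }
}
```
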
Is there some other visibility-type issue I might be hitting? I don't
think so, since the two MapReduce runs (one with a range, one without)
are kicked off the same way, with the same username/password, and
by the same Unix user.
Any other thoughts? I'm sure we're missing something simple but I can't
pinpoint it.
Thanks,
Mike
On Tue, Feb 26, 2013 at 4:45 PM, Billie Rinaldi <[email protected]> wrote:
> On Tue, Feb 26, 2013 at 12:31 PM, Mike Hugo <[email protected]> wrote:
>
>> Our row keys are a combination of two elements, like this:
>>
>> foo/bar
>> foo/baz
>> foo/bee
>>
>> eee/blah
>> eee/boo
>>
>> When running without any ranges set, we're missing an entire prefix's worth
>> of rows, e.g. we don't get any rows that start with "foo".
>>
>
> That sounds like a clue, because Accumulo doesn't know about the format of
> your row keys. If it were dropping arbitrary rows, I would expect you to
> see some foo-prefixed rows and not others. Are there any other differences
> in the two runs? How is the TimestampFilter configured?
>
> Billie
>
>
>
>>
>> When I tried running with the range set, I did a prefix range on "foo",
>> and it then found the rows starting with "foo".
>>
>>
>> On Tue, Feb 26, 2013 at 2:28 PM, Billie Rinaldi <[email protected]> wrote:
>>
>>> Have you noticed any pattern in the rows it seems to be missing? E.g.
>>> every other row, the last row in each tablet, etc.? When you set a range,
>>> what range did you set?
>>>
>>> Billie
>>>
>>>
>>>
>>> On Tue, Feb 26, 2013 at 12:17 PM, Mike Hugo <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm running a map reduce job over a table using AccumuloRowInputFormat.
>>>> For debugging purposes I'm logging the key.getRow() so I can see what rows
>>>> it's finding as it progresses.
>>>>
>>>> If I don't specify any ranges on the input format, it skips a significant
>>>> number of rows - that is, I don't see any logging indicating that it
>>>> traversed them.
>>>>
>>>> To see if it was a visibility issue, I tried explicitly setting a
>>>> range, like this:
>>>>
>>>>     AccumuloRowInputFormat.setRanges(job.getConfiguration(), ranges);
>>>>
>>>> When doing that it does process the rows that it otherwise skips.
>>>>
>>>> The same TimestampFilter is being applied in both scenarios; no other
>>>> filters / iterators are being used.
>>>>
>>>> Any thoughts on why, when run without the ranges specified, it isn't
>>>> seeing a significant portion of the data?
>>>>
>>>> Thanks,
>>>>
>>>> Mike
>>>>
>>>
>>>
>>
>