We have been using HBase Scans to feed MapReduce jobs for over a year now. However, on close inspection, we have seen instances where blocks of rows are inexplicably missing.
We thought this might happen during region splits or in jobs with many mappers, but we have seen, for example, 1000 rows missing from a 150,000-row scan coming from a single mapper. It is not easily reproducible - launching the job again returns all the rows. Does anyone have any insight into what may be going on, or whether there is a bug somewhere? Thank you.
