I have been seeing scans abort silently under certain conditions. In these
cases so many rows were missing that only a fraction of them was read,
without any warning. After a lot of testing I was able to pinpoint and
reproduce the error when scanning over a single region with a single column
family and a single store file, so really just a single (major-compacted)
storefile. I scan over this region using a single Scan in a local
jobtracker context (so not MapReduce, although that shows exactly the same
behaviour). Finally, I noticed that the number of input rows depends on the
hbase.client.scanner.caching property. See the following example runs,
which scan over this region with a specific start and stop key:

-Dhbase.client.scanner.caching=1
inputrows=1506

-Dhbase.client.scanner.caching=10000
inputrows=1240

-Dhbase.client.scanner.caching=1240
inputrows=1506

-Dhbase.client.scanner.caching=1241
inputrows=1240

This is weird, huh? So setting the caching to 1241 in this case aborts the
scan silently. Removing the stoprow yields the same number of rows. Setting
the caching to 1 with no stoprow yields all rows (several hundreds of
thousands).
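
For anyone who wants to repeat the comparison, here is a minimal sketch of
how it could be automated. It assumes a hypothetical row-counting callback
that runs one scan with the given caching value and returns the number of
rows seen; in a real check that callback would wrap a Scan with
setCaching(caching) against the table. The class name, method names and the
stand-in counter are mine for illustration only, and the stand-in merely
replays the four runs above instead of touching HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

public class CachingCheck {

    // Runs the (hypothetical) counter once per caching value and
    // records the row count that each run reported.
    public static Map<Integer, Integer> countPerCaching(
            IntUnaryOperator rowCounter, int... cachingValues) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int caching : cachingValues) {
            counts.put(caching, rowCounter.applyAsInt(caching));
        }
        return counts;
    }

    // True when every caching value produced the same row count.
    public static boolean isConsistent(Map<Integer, Integer> counts) {
        return counts.values().stream().distinct().count() <= 1;
    }

    public static void main(String[] args) {
        // Stand-in that replays the observed runs; NOT a real scan.
        IntUnaryOperator observed = caching -> caching <= 1240 ? 1506 : 1240;

        Map<Integer, Integer> counts =
                countPerCaching(observed, 1, 1240, 1241, 10000);
        counts.forEach((caching, rows) ->
                System.out.println("caching=" + caching + " inputrows=" + rows));
        System.out.println("consistent=" + isConsistent(counts));
    }
}
```

A scan that behaves correctly should report the same count for every
caching value, so isConsistent returning false is the symptom described
here.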

Neither the client nor the regionserver logs any warning whatsoever. I had
hbase.client.scanner.max.result.size set to 90100100. After removing this
property it all works like a charm! All rows are read properly, regardless
of hbase.client.scanner.caching. As an extra verification I checked the
regionserver log for the warnings I would expect without this property, and
they are indeed there:
2012-07-25 11:46:52,889 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020, responseTooLarge for: next(-1937592840574159040, 10000) from x.x.x.x:39398: Size: 338.1m
2012-07-25 11:47:14,359 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020, responseTooLarge for: next(-1937592840574159040, 10000) from x.x.x.x:39407: Size: 186.6m

So, does anyone know what this could be? I am willing to debug this
behaviour at the regionserver level, but before I do I want to make sure I
am not running into something that has already been solved. This is on
hbase-0.90.6-cdh3u4, using snappy.
