Okay, I've tried that test, and I've also made sure speculative execution is turned off. Neither made a difference. It's not only a problem with writing to the target table: the number of map input records for the job is wrong as well. But it's correct when we run jobs that do not write to HBase, such as a row count.
I ran another job to calculate the number of missed rows per region of the source table (which is not consistent between runs) by comparing the source table with the target table. An interesting thing I found is that the number of skipped rows is always a multiple of 999. This is especially interesting because our scanner caching is 1000, so I think we're sometimes skipping over the scanner cache. To give a sense of how many we are missing, the latest run missed 183,816 out of 29,572,075 rows in the source table.

Any ideas?

Thanks,
Sean

On Fri, Mar 18, 2011 at 9:58 AM, Michael Segel <[email protected]> wrote:

> Sean,
>
> Here's a simple test.
>
> Modify your code so that you aren't using the TableOutputFormat class, but
> a null writable and inside the map() method you actually do the write
> yourself.
>
> Also make sure to explicitly flush and close your HTable connection when
> your mapper ends.
>
> > From: [email protected]
> > Date: Fri, 18 Mar 2011 09:50:47 -0400
> > Subject: Scan isn't processing all rows
> > To: [email protected]
> >
> > Hi all,
> >
> > We're experiencing a problem where a map-only job using TableInputFormat
> > and TableOutputFormat to export data from one table into another is not
> > reading all of the rows in the source table. That is, # map input
> > records != # records in the table. Anyone have any clue how that could
> > happen?
> >
> > Some more detail:
> >
> > It appears to only happen when we are writing results to the destination
> > table. If I comment out the lines where data is written from the
> > mapper (context.write), then the number of input records is correct.
> >
> > I verified that the rows did not get written to the output table, so
> > it's not just a counter problem. We aren't using any filter or anything,
> > just a straight-up scan to try to read everything in the table.
> >
> > We're on hbase-0.89.20100924.
> >
> > Thanks,
> > Sean
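
P.S. For anyone who wants to check the multiple-of-999 observation against the numbers from the latest run, here is a minimal sketch in plain Java with no HBase dependency. The class and method names are made up for illustration. The two row counts are the ones quoted in this message, and using 999 (one less than the scanner caching of 1000) as the batch size is only my working assumption about what is being skipped, not something I have confirmed.

```java
public class SkippedRowCheck {

    // Returns how many whole batches of batchSize make up missed,
    // or -1 if missed is not an exact multiple of batchSize.
    static long wholeBatches(long missed, long batchSize) {
        return (missed % batchSize == 0) ? missed / batchSize : -1;
    }

    // Share of the source table that was missed, as a percentage.
    static double missedPercent(long missed, long total) {
        return 100.0 * missed / total;
    }

    public static void main(String[] args) {
        long missed = 183_816L;    // rows missing in the latest run (from this message)
        long total = 29_572_075L;  // rows in the source table (from this message)
        long batch = 999L;         // assumption: scanner caching (1000) minus one

        System.out.println("whole batches of 999: " + wholeBatches(missed, batch));
        System.out.println("percent of table missed: " + missedPercent(missed, total));
    }
}
```

183,816 is exactly 184 x 999, which is roughly 0.62% of the source table. That fits the idea that rows are being lost a whole cache fetch at a time, but it doesn't prove it.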
