I have had replication running for about a week now, and a lot of data has flowed to our slave cluster over that time. I'm now running the verifyrep MapReduce job over a 1-hour window from a couple of days ago (which should be fully replicated), and I'm seeing a small number of "BADROWS". Spot-checking a few of them, the issue seems to be that the rows are present on both clusters and have the same values, but a single cell in the row has a timestamp that is off by 1ms.
For instance, the log reports this error:

java.lang.Exception: This result was different: keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:&s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8} compared to keyvalues={01e581745c6a43aba01adf105af4e4a92013071015/data:!\xDF\xE0\x01/1373470622986/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:&s\xC0\x01/1373470923084/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:/\x9B\x80\x01/1373471523316/Put/vlen=8, 01e581745c6a43aba01adf105af4e4a92013071015/data:4/`\x01/1373471822913/Put/vlen=8}

Some diffing reduces the issue down to:

01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223717/Put/vlen=8
compared to
01e581745c6a43aba01adf105af4e4a92013071015/data:+\x07\xA0\x01/1373471223716/Put/vlen=8

I'm assuming that the value before "/Put" is the cell's timestamp, which means that the copies are off by 1ms. Any idea what could cause this? So far (the job is still running), the problem seems rare (about 0.05% of rows).

Thanks,
Patrick
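
P.S. For anyone who wants to repeat the diffing step on their own BADROWS output, something along these lines should work. This is only a minimal Python sketch, not the exact script used: the regular expression and the function names (parse_keyvalues, timestamp_diffs) are illustrative, it assumes entries are separated by ", " in the form row/family:qualifier/timestamp/Type/vlen=N, and only three of the five cells from the log above are included for brevity.

```python
import re

# One entry looks like: row/family:qualifier/timestamp/Type/vlen=N
# The qualifier may itself contain "/", so the cell part is matched lazily
# and the numeric timestamp, type, and vlen anchor the end of each entry.
KV_PATTERN = re.compile(r"(.+?)/(\d+)/([A-Za-z]+)/vlen=(\d+)(?:, |$)")


def parse_keyvalues(text):
    """Map each 'row/family:qualifier' string to its integer timestamp."""
    return {m.group(1): int(m.group(2)) for m in KV_PATTERN.finditer(text)}


def timestamp_diffs(left, right):
    """Return {cell: (left_ts, right_ts)} for cells present on both sides
    whose timestamps differ."""
    a, b = parse_keyvalues(left), parse_keyvalues(right)
    return {
        cell: (a[cell], b[cell])
        for cell in a.keys() & b.keys()
        if a[cell] != b[cell]
    }


ROW = "01e581745c6a43aba01adf105af4e4a92013071015"

# Raw strings keep the escaped qualifier bytes exactly as printed in the log.
left_kvs = ", ".join([
    ROW + r"/data:&s\xC0\x01/1373470923084/Put/vlen=8",
    ROW + r"/data:+\x07\xA0\x01/1373471223717/Put/vlen=8",
    ROW + r"/data:/\x9B\x80\x01/1373471523316/Put/vlen=8",
])
right_kvs = ", ".join([
    ROW + r"/data:&s\xC0\x01/1373470923084/Put/vlen=8",
    ROW + r"/data:+\x07\xA0\x01/1373471223716/Put/vlen=8",
    ROW + r"/data:/\x9B\x80\x01/1373471523316/Put/vlen=8",
])

for cell, (left_ts, right_ts) in timestamp_diffs(left_kvs, right_kvs).items():
    # Print the mismatched cell and the timestamp gap in milliseconds.
    print(cell, left_ts, right_ts, left_ts - right_ts)
```

On the sample above this reports only the data:+\x07\xA0\x01 cell, with the two timestamps 1 ms apart.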