There's a concurrent thread on the mailing list that refers to atomicity issues in 0.90 and issues with scans. May I suggest you run the test on 0.92.1 or 0.94.0? I did my testing on 0.94 and didn't get any issues after fixing the scanner.
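
For anyone following along, here is a toy sketch of why reusing one Scan object across scans can make the second scan start in the middle of the table, as in the "expected 0000001, got 0496107" error quoted below. It is only a model: ToyScan, ToyTable and firstRowOfSecondScan are hypothetical names and not the HBase client API, and it assumes the scanner rewrites the start row of the Scan it was handed when it crosses a region boundary, which is the behavior HBASE-4891 describes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are NOT the HBase client API.
public class ScanReuseSketch {

    /** Hypothetical stand-in for a Scan: just a mutable start row. */
    static class ToyScan {
        int startRow;

        ToyScan(int startRow) {
            this.startRow = startRow;
        }

        /** Copy constructor, playing the role of new Scan(scan). */
        ToyScan(ToyScan other) {
            this.startRow = other.startRow;
        }
    }

    /** Hypothetical table holding rows 0..rowCount-1, with a region boundary at splitRow. */
    static class ToyTable {
        final int rowCount;
        final int splitRow;

        ToyTable(int rowCount, int splitRow) {
            this.rowCount = rowCount;
            this.splitRow = splitRow;
        }

        /**
         * Returns every row from scan.startRow onward. On reaching the region
         * boundary it resets the start row on the caller's object (assumption:
         * this mirrors the side effect described in HBASE-4891).
         */
        List<Integer> scanAll(ToyScan scan) {
            List<Integer> rows = new ArrayList<>();
            for (int row = scan.startRow; row < rowCount; row++) {
                if (row == splitRow) {
                    scan.startRow = row; // the mutation leaks back to the caller
                }
                rows.add(row);
            }
            return rows;
        }
    }

    /** Runs two full scans and returns the first row seen by the second one. */
    public static int firstRowOfSecondScan(boolean copyScan) {
        ToyTable table = new ToyTable(10, 6);
        ToyScan scan = new ToyScan(0);
        table.scanAll(copyScan ? new ToyScan(scan) : scan);
        return table.scanAll(copyScan ? new ToyScan(scan) : scan).get(0);
    }

    public static void main(String[] args) {
        System.out.println("reused scan: second pass starts at row " + firstRowOfSecondScan(false));
        System.out.println("copied scan: second pass starts at row " + firstRowOfSecondScan(true));
    }
}
```

In this model the reused object makes the second pass begin at the boundary row, while handing each scanner its own copy keeps the caller's start row untouched. It says nothing about the skipped-row error, which showed up even with scanner copies.
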
J-D

On Thu, May 31, 2012 at 3:05 AM, Ondřej Stašek <[email protected]> wrote:
> Hello J-D.
>
> Thanks for the reply. I've modified my code to use scanner copies -
> table.getScanner(new Scan(scan)) and run it again. Even after that I got an
> error:
>
> 12/05/31 10:42:39 INFO hbase.TestPutScan: Run 5 put 1000000 rows
> 12/05/31 10:44:09 INFO hbase.TestPutScan: Run 5 scan + del every 10th row
> 12/05/31 10:44:33 ERROR hbase.TestPutScan: Expected value: value 0402040 0000005, got: value 0402041 0000004
>
> It seems that 1 row was skipped during scan. Strange.
>
> I'll keep testing.
>
> Ondrej Stasek
>
>
> On 30.5.2012 21:05, Jean-Daniel Cryans wrote:
>>
>> There you go:
>>
>> 12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting at row=testtable,,00000000000000 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,test_row_0496107,1338404055995.e9c7a4ca97eb2be372445af4d3772031. is sv4r25s44:62023
>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Removed testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. for tableName=testtable from cache because of test_row_0012550
>> 12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. is sv4r25s44:62023
>> 12/05/30 18:57:47 INFO hbase.TestPutScan: Run 5 scan
>> 12/05/30 18:57:47 ERROR hbase.TestPutScan: Expected value: value 0000001 0000005, got: value 0496107 0000005
>>
>> That's a split so the ClientScanner did a reset on the start row. So
>> I'm going to fix your code and see if I can get anything else.
>>
>> J-D
>>
>> On Wed, May 30, 2012 at 11:56 AM, Jean-Daniel Cryans
>> <[email protected]> wrote:
>>>
>>> I'm running it here, but I just remembered about this issue:
>>>
>>> "HTable.ClientScanner needs to clone the Scan object"
>>> https://issues.apache.org/jira/browse/HBASE-4891
>>>
>>> And since you are reusing that Scan object, you could definitely hit this
>>> issue.
>>>
>>> J-D
>>>
>>> On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
>>> <[email protected]> wrote:
>>>>
>>>> Here it is:
>>>>
>>>> http://pastebin.com/0AgsQjur
>>>>
>>>>
>>>> On 29.5.2012 22:44, Jean-Daniel Cryans wrote:
>>>>>
>>>>> Care to share that TestPutScan? Just attach it in a pastebin
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> My program writes changes to an HBase table by issuing lots of Puts
>>>>>> (autoCommit turned off, flush on end) and afterwards uses ResultScanner
>>>>>> on the whole table to read all rows and act upon them. My problem is
>>>>>> that on several occasions the scan does not return the expected rows.
>>>>>> Either the scan does not start at the beginning of the table or
>>>>>> somewhere during the scan I get old data (not the data written by the
>>>>>> Puts before).
>>>>>>
>>>>>> I have even written a simple test application to simulate this behavior:
>>>>>> 1. write 1M simple numbered rows to a table
>>>>>> 2. scan through table to test output, delete every 10th row
>>>>>> 3. scan again after delete
>>>>>> 4. repeat until error found
>>>>>>
>>>>>> Sample output:
>>>>>>
>>>>>> 12/05/29 00:32:12 INFO hbase.TestPutScan: Run 342 put 1000000 rows
>>>>>> 12/05/29 00:32:35 INFO hbase.TestPutScan: Run 342 scan + del every 10th row
>>>>>> 12/05/29 00:33:29 INFO hbase.TestPutScan: Run 342 scan
>>>>>> 12/05/29 00:33:29 ERROR hbase.TestPutScan: Expected value: value 0000001 0000342, got: value 0281999 0000342
>>>>>>
>>>>>> This means that the program expected to get the first row, but got the
>>>>>> 281999th.
>>>>>>
>>>>>> This test ran on a "minicluster" of 2 regionservers running Cloudera's
>>>>>> cdh3u4 distribution.
>>>>>>
>>>>>> Today I got 3 errors like that, and from the RS's log it seems that at
>>>>>> the same time the hbase balancer issued a reassign command for this
>>>>>> table's region (the table has only 1 region).
>>>>>>
>>>>>> Any pointers on what to check or what to send you to help resolve this
>>>>>> issue?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Ondrej Stasek
>>>>>>
>>>>
>>>> --
>>>> Ondřej Stašek
>>>> Senior Programmer
>>>> Seznam.cz, a.s.
>>>> Nádražní 159/21
>>>> 370 01 České Budějovice 6
>>>>
>>>> tel.: +420 386 325 467
>>>> gsm: +420 603 857 602
>>>> icq: 164660005
>>>> [email protected]
>>>> http://www.seznam.cz
>>>>
>
