Re: Problems with scan after lot of Puts

Jean-Daniel Cryans Thu, 31 May 2012 10:45:37 -0700

There's a concurrent thread on the mailing list that refers to
atomicity issues in 0.90 and issues with scans. May I suggest you run
the test on 0.92.1 or 0.94.0? I did my testing on 0.94 and didn't get
any issues after fixing the scanner.
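
A minimal sketch of why reusing one Scan object can go wrong, assuming
(as HBASE-4891 suggests) that the client scanner updates the start row of
the Scan it was handed while it moves across regions. ToyScan, scanAll and
the row counts are hypothetical stand-ins for illustration, not HBase
classes; in real code the copy is table.getScanner(new Scan(scan)):

```java
import java.util.ArrayList;
import java.util.List;

class ScanReuseDemo {

    /** Toy stand-in for a scan specification: just a mutable start row. */
    static final class ToyScan {
        int startRow;

        ToyScan(int startRow) {
            this.startRow = startRow;
        }

        /** Copy constructor, playing the role of new Scan(scan). */
        ToyScan(ToyScan other) {
            this.startRow = other.startRow;
        }
    }

    /**
     * Toy scanner over rows [scan.startRow, totalRows). When it reaches the
     * (hypothetical) region boundary it overwrites scan.startRow, which is
     * the assumed side effect on the caller's object.
     */
    static List<Integer> scanAll(ToyScan scan, int totalRows, int regionBoundary) {
        List<Integer> rows = new ArrayList<>();
        for (int row = scan.startRow; row < totalRows; row++) {
            if (row == regionBoundary) {
                scan.startRow = row; // mutates the object the caller passed in
            }
            rows.add(row);
        }
        return rows;
    }

    /** Runs two full scans and returns the first row seen by the second one. */
    static int firstRowOfSecondScan(boolean copyScan) {
        int totalRows = 1000;
        int regionBoundary = 500;
        ToyScan scan = new ToyScan(0);

        // First pass: either hand over the shared object or a private copy.
        scanAll(copyScan ? new ToyScan(scan) : scan, totalRows, regionBoundary);

        // Second pass: the shared object may no longer start at row 0.
        List<Integer> second =
            scanAll(copyScan ? new ToyScan(scan) : scan, totalRows, regionBoundary);
        return second.get(0);
    }

    public static void main(String[] args) {
        System.out.println("reused Scan, second pass starts at row "
            + firstRowOfSecondScan(false));
        System.out.println("copied Scan, second pass starts at row "
            + firstRowOfSecondScan(true));
    }
}
```

In this toy model the reused object makes the second pass start in the
middle of the table, which is the same shape as the "expected first row,
got a later row" error in the test output, while the copied object does not.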


J-D

On Thu, May 31, 2012 at 3:05 AM, Ondřej Stašek
<[email protected]> wrote:
> Hello J-D,
>
> Thanks for the reply. I've modified my code to use copies of the Scan,
> table.getScanner(new Scan(scan)), and ran it again. Even after that I got an
> error:
>
> 12/05/31 10:42:39 INFO hbase.TestPutScan: Run 5 put 1000000 rows
> 12/05/31 10:44:09 INFO hbase.TestPutScan: Run 5 scan + del every 10th row
> 12/05/31 10:44:33 ERROR hbase.TestPutScan: Expected value: value 0402040 0000005, got: value 0402041 0000004
>
> It seems that one row was skipped during the scan. Strange.
>
> I'll keep testing.
>
>  Ondrej Stasek
>
>
> On 30.5.2012 21:05, Jean-Daniel Cryans wrote:
>>
>> There you go:
>>
>> 12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting
>> at row=testtable,,00000000000000 for max=10 rows using
>>
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
>> 12/05/30 18:54:17 DEBUG
>> client.HConnectionManager$HConnectionImplementation: Cached location
>> for
>> testtable,test_row_0496107,1338404055995.e9c7a4ca97eb2be372445af4d3772031.
>> is sv4r25s44:62023
>> 12/05/30 18:54:17 DEBUG
>> client.HConnectionManager$HConnectionImplementation: Removed
>> testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. for
>> tableName=testtable from cache because of test_row_0012550
>> 12/05/30 18:54:17 DEBUG
>> client.HConnectionManager$HConnectionImplementation: Cached location
>> for testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. is
>> sv4r25s44:62023
>> 12/05/30 18:57:47 INFO hbase.TestPutScan: Run 5 scan
>> 12/05/30 18:57:47 ERROR hbase.TestPutScan: Expected value: value
>> 0000001 0000005, got: value 0496107 0000005
>>
>> That's a split so the ClientScanner did a reset on the start row. So
>> I'm going to fix your code and see if I can get anything else.
>>
>> J-D
>>
>> On Wed, May 30, 2012 at 11:56 AM, Jean-Daniel Cryans
>> <[email protected]>  wrote:
>>>
>>> I'm running it here, but I just remembered about this issue:
>>>
>>> "HTable.ClientScanner needs to clone the Scan object"
>>> https://issues.apache.org/jira/browse/HBASE-4891
>>>
>>> And since you are reusing that Scan object, you could definitely hit this
>>> issue.
>>>
>>> J-D
>>>
>>> On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
>>> <[email protected]>  wrote:
>>>>
>>>> Here it is:
>>>>
>>>> http://pastebin.com/0AgsQjur
>>>>
>>>>
>>>> On 29.5.2012 22:44, Jean-Daniel Cryans wrote:
>>>>>
>>>>> Care to share that TestPutScan? Just put it in a pastebin.
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek
>>>>> <[email protected]>    wrote:
>>>>>>
>>>>>> My program writes changes to an HBase table by issuing lots of Puts
>>>>>> (autoCommit turned off, flush at the end) and afterwards uses a
>>>>>> ResultScanner on the whole table to read all rows and act upon them.
>>>>>> My problem is that on several occasions the scan did not return the
>>>>>> expected rows. Either the scan does not start at the beginning of the
>>>>>> table, or somewhere during the scan I get old data (not the data
>>>>>> written by the preceding Puts).
>>>>>>
>>>>>> I have even written a simple test application to simulate this behavior:
>>>>>> 1. write 1M simple numbered rows to a table
>>>>>> 2. scan through table to test output, delete every 10th row
>>>>>> 3. scan again after delete
>>>>>> 4. repeat until error found
>>>>>>
>>>>>> Sample output:
>>>>>>
>>>>>> 12/05/29 00:32:12 INFO hbase.TestPutScan: Run 342 put 1000000 rows
>>>>>> 12/05/29 00:32:35 INFO hbase.TestPutScan: Run 342 scan + del every 10th row
>>>>>> 12/05/29 00:33:29 INFO hbase.TestPutScan: Run 342 scan
>>>>>> 12/05/29 00:33:29 ERROR hbase.TestPutScan: Expected value: value 0000001 0000342, got: value 0281999 0000342
>>>>>>
>>>>>> This means that the program expected to get the first row, but got the 281999th.
>>>>>>
>>>>>> This test ran on a "minicluster" of 2 regionservers running Cloudera's
>>>>>> cdh3u4 distribution.
>>>>>>
>>>>>> Today I got 3 errors like that, and from the RS log it seems that at
>>>>>> the same time the HBase balancer issued a reassign command for this
>>>>>> table's region (the table has only 1 region).
>>>>>>
>>>>>> Any pointers on what to check or what to send you to help resolve this
>>>>>> issue?
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Ondrej Stasek
>>>>>>
>>>>
>>>> --
>>>> Ondřej Stašek
>>>> Senior Programmer
>>>> Seznam.cz, a.s.
>>>> Nádražní 159/21
>>>> 370 01 České Budějovice 6
>>>>
>>>> tel.: +420 386 325 467
>>>> gsm: +420 603 857 602
>>>> icq: 164660005
>>>> [email protected]
>>>> http://www.seznam.cz
>>>>
>
