What am I looking at, exactly? The first run fetches 2 rows at a time
(multiple invocations of the method you pasted previously?) and doesn't
use a start row? If those 4 rows were fetched with the same scan, it
looks like the row keys are out of order.
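
A quick way to check this without touching the cluster: HBase returns the rows of a single scan in ascending order of their row-key bytes, compared as unsigned bytes. The sketch below is plain Java with no HBase dependency. The class and method names are made up for illustration, and it assumes the IDs are exactly as printed in your output. It tests whether each batch is in the order one forward scan would produce, and whether every key in the second batch sorts at or after its start row.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of HBase.
public class KeyOrderCheck {

    // Unsigned byte-by-byte comparison; a key that is a prefix of a longer
    // key sorts first. This is the ordering HBase uses for row keys.
    static int compareKeys(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // True if the keys are strictly ascending, as a single forward scan returns them.
    static boolean inScanOrder(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compareKeys(keys.get(i - 1), keys.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    // True if no key sorts before the (inclusive) start row.
    static boolean allAtOrAfter(List<String> keys, String startRow) {
        for (String k : keys) {
            if (compareKeys(k, startRow) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> firstRun = Arrays.asList(
            "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
            "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
            "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399",
            "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f");
        List<String> secondRun = Arrays.asList(
            "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
            "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
            "10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916",
            "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399");
        String startRow = "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f";

        System.out.println("first run in scan order:     " + inScanOrder(firstRun));   // false
        System.out.println("second run in scan order:    " + inScanOrder(secondRun));  // false
        System.out.println("second run >= its start row: " + allAtOrAfter(secondRun, startRow)); // true
    }
}
```

A batch that prints false was not returned as-is by one forward scan, so the order is being changed somewhere between the scanner and the printout, or the list is assembled from more than one scan.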

You could also try writing a unit test: see how we test HTable and try
something similar (HBaseClusterTestCase sets up an in-memory cluster
for you):
http://svn.apache.org/viewvc/hbase/tags/0.20.6/src/test/org/apache/hadoop/hbase/client/TestClient.java?view=markup
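
Before setting up a cluster test, the paging logic itself can be checked in isolation. The sketch below uses a java.util.TreeMap as a stand-in for the table, since both keep rows sorted by key (String order matches HBase's byte order only for ASCII keys such as these hex IDs). It assumes, as in HBase, that the start row is inclusive, so passing the last returned key back in returns that key again. The hypothetical nextKey helper appends a zero character to get the smallest key strictly after it, and the predicate stands in for the SingleColumnValueFilter in the quoted code. All class, method and data names here are illustrative, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical, cluster-free model of paging through a sorted table.
public class PagingSketch {

    // Stand-in for a scan: up to n keys >= startRow (inclusive, like an HBase
    // start row) whose value passes the filter. A null startRow means "from the beginning".
    static List<String> scan(TreeMap<String, String> table, String startRow,
                             Predicate<String> filter, int n) {
        List<String> out = new ArrayList<>();
        Map<String, String> tail = startRow == null ? table : table.tailMap(startRow, true);
        for (Map.Entry<String, String> e : tail.entrySet()) {
            if (out.size() == n) {
                break;
            }
            if (filter.test(e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Smallest key strictly greater than lastKey: append a zero character.
    static String nextKey(String lastKey) {
        return lastKey + "\0";
    }

    // Page through the whole table n keys at a time, restarting after the last key seen.
    static List<String> pageAll(TreeMap<String, String> table, Predicate<String> filter, int n) {
        List<String> all = new ArrayList<>();
        String start = null;
        while (true) {
            List<String> page = scan(table, start, filter, n);
            all.addAll(page);
            if (page.size() < n) {
                return all;
            }
            start = nextKey(page.get(page.size() - 1));
        }
    }

    public static void main(String[] args) {
        // Value plays the role of the MYROW:FLAG column.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("0579", "TRUE");
        table.put("0e3b", "TRUE");
        table.put("1057", "FALSE");
        table.put("10b8", "TRUE");
        table.put("10cf", "TRUE");
        Predicate<String> flagTrue = v -> v.equals("TRUE");

        // Inclusive start row: restarting at the last key returns it again.
        System.out.println(scan(table, "0e3b", flagTrue, 2));          // [0e3b, 10b8]
        // Restarting just after the last key avoids the repeat.
        System.out.println(scan(table, nextKey("0e3b"), flagTrue, 2)); // [10b8, 10cf]
        System.out.println(pageAll(table, flagTrue, 2));               // [0579, 0e3b, 10b8, 10cf]
    }
}
```

The filter only decides which rows inside the scanned range are returned; it does not move where the scan begins, so the start row and the filter act independently in this model.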

J-D

On Tue, Aug 31, 2010 at 9:39 AM, Kelvin Rawls <[email protected]> wrote:
> Thanks for the reply. An example of the output and keys is below: if I fetch 2 at a 
> time it seems to work, but with 4 at a time it fails. Notice the repeated IDs.
>
>>>> Perform getIngestedIds operation on Content MBean for: 4 IDs <<<
>
>>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>>>> Doc Id: 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f
>
>
>>>> Now call getIngestedIds operation on Content MBean for: 4 IDs with lastDoc 
>>>> ID = 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f<<<
>
>>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>>> Doc Id: 10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916
>>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>
> btw, the keys are <hash of normalized content>:<hash of normalized url>
>
> The test cluster is multi-use and can't easily be upgraded just yet. I will work on 
> setting up another test cluster with the latest release. Also, note that in the code 
> below I am also using a regular expression filter to exclude some keys. Can 
> setStartRow and other filters conflict?
>
> Thanks again,
>
> Kelvin
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of Jean-Daniel Cryans 
> [[email protected]]
> Sent: Tuesday, August 31, 2010 12:22 PM
> To: [email protected]
> Subject: Re: Scan startRow seems to be broke in HBase 0.20.2
>
> It's more that the question is missing information, like an example
> output of your query and a sample of your dataset. Also, you are using
> 0.20.2, which is 4 minor revisions old.
>
> So I tried a simple test in the shell using HBase 0.20.2 just as a sanity 
> check:
>
> hbase(main):005:0> scan 't'
> ROW                          COLUMN+CELL
>  1                           column=f:, timestamp=1283271502185, value=val1
>  2                           column=f:, timestamp=1283271507825, value=val2
>  3                           column=f:, timestamp=1283271512665, value=val3
> 3 row(s) in 0.0300 seconds
> hbase(main):006:0> scan 't', {STARTROW => '2'}
> ROW                          COLUMN+CELL
>  2                           column=f:, timestamp=1283271507825, value=val2
>  3                           column=f:, timestamp=1283271512665, value=val3
>
> As you can see, it works, and under the hood it calls exactly the same
> method. Are your keys sorted the way you think they are?
>
> J-D
>
> On Tue, Aug 31, 2010 at 9:06 AM, Kelvin Rawls <[email protected]> wrote:
>> It seems my question is not clear:
>>
>> does this call:
>>
>> scan.setStartRow(Bytes.toBytes(lastDoc))
>>
>> ... have any effect on the rows returned for anyone else?
>>
>> Thanks,
>>
>> Kelvin
>> ________________________________________
>> From: Kelvin Rawls [[email protected]]
>> Sent: Monday, August 30, 2010 11:25 AM
>> To: [email protected]
>> Subject: Scan startRow seems to be broke in HBase 0.20.2
>>
>> No matter what I tell it, this seems to return Row IDs from the beginning of 
>> the table.
>>
>> code
>>
>>    public List<String> getKeys(String lastDoc, int N) {
>>        List<String> results = new ArrayList<String>();
>>        try {
>>            Scan scan = new Scan();
>>            // Begin the scan at lastDoc (inclusive) rather than the first row
>>            scan.setStartRow(Bytes.toBytes(lastDoc));
>>            // Keep only rows whose MYROW:FLAG value matches the regex
>>            SingleColumnValueFilter scvf = new SingleColumnValueFilter(
>>                    Bytes.toBytes("MYROW"), Bytes.toBytes("FLAG"),
>>                    CompareFilter.CompareOp.EQUAL,
>>                    new RegexStringComparator("MYROWFLAGTRUE"));
>>            // Drop rows that have no MYROW:FLAG column at all
>>            scvf.setFilterIfMissing(true);
>>            scan.setFilter(scvf);
>>            ResultScanner scanner = table.getScanner(scan);
>>            try {
>>                // Fetch at most N rows from this scanner
>>                for (Result rr : scanner.next(N)) {
>>                    results.add(Bytes.toString(rr.getRow()));
>>                }
>>            } finally {
>>                // Release the server-side scanner even if next() throws
>>                scanner.close();
>>            }
>>        } catch (IOException ex) {
>>            m_log.error("Error getting keys", ex);
>>        }
>>        m_log.debug("Returning " + results.size() + " ids");
>>        return results;
>>    }
>>
>> Thanks for any help.
>>
>> Kelvin L. Rawls
>>
>
