RE: Scan startRow seems to be broke in HBase 0.20.2

Kelvin Rawls Tue, 31 Aug 2010 09:40:15 -0700

Thanks for the reply. Example output and keys are below. Fetching 2 at a time 
seems to work, but fetching 4 at a time fails. Notice the repeated IDs.


>>> Perform getIngestedIds operation on Content MBean for: 4 IDs <<<

>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>>> Doc Id: 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f


>>> Now call getIngestedIds operation on Content MBean for: 4 IDs with lastDoc 
>>> ID = 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f<<<

>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399

btw, the keys are <hash of normalized content>:<hash of normalized url>

The test cluster is multi-use and not easily upgraded just yet. I will work on 
setting up another test cluster with the latest release. Also, note that in the 
code below I am also using a regular expression filter to exclude some keys. 
Can setStartRow and other filters conflict?
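
For what it is worth, a start row and a filter are not expected to conflict: the start row only decides where the scan begins, and the filter is then applied to each row from that point on. The sketch below models that behaviour with a sorted map in plain Java. It is not HBase code; ScanModel, scan, and the sample rows are hypothetical names made up for illustration. It assumes the start row is inclusive, so resuming after lastDoc means starting at lastDoc plus a trailing zero byte.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ScanModel {
    // Toy stand-in for a table: row key -> flag value, kept in sorted key order.
    // For plain ASCII keys, String order matches byte order.
    static List<String> scan(TreeMap<String, String> table, String startRow,
                             Predicate<String> flagFilter, int limit) {
        List<String> keys = new ArrayList<>();
        // Inclusive start: iteration begins at the first key >= startRow.
        for (Map.Entry<String, String> e : table.tailMap(startRow, true).entrySet()) {
            if (keys.size() >= limit) {
                break;
            }
            // The filter only drops rows; it never moves the starting point.
            if (flagFilter.test(e.getValue())) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Smallest key that sorts strictly after lastKey: append a zero character.
    static String justAfter(String lastKey) {
        return lastKey + '\u0000';
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("a", "TRUE");
        table.put("b", "FALSE");
        table.put("c", "TRUE");
        table.put("d", "TRUE");

        List<String> firstPage = scan(table, "a", "TRUE"::equals, 2);
        System.out.println(firstPage);
        // Resume strictly after the last key of the previous page.
        String last = firstPage.get(firstPage.size() - 1);
        System.out.println(scan(table, justAfter(last), "TRUE"::equals, 2));
    }
}
```

In this model the second page never repeats a key from the first, but only because the last key of a page is also the largest key seen so far.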

Thanks again,

Kelvin 
________________________________________
From: [email protected] [[email protected]] On Behalf Of Jean-Daniel Cryans 
[[email protected]]
Sent: Tuesday, August 31, 2010 12:22 PM
To: [email protected]
Subject: Re: Scan startRow seems to be broke in HBase 0.20.2

It's more about the question missing information, like an example
output of your query and a sample of your dataset. Also, you are using
0.20.2, which is 4 minor revisions old.

So I tried a simple test in the shell using HBase 0.20.2 just as a sanity check:

hbase(main):005:0> scan 't'
ROW                          COLUMN+CELL
 1                           column=f:, timestamp=1283271502185, value=val1
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3
3 row(s) in 0.0300 seconds
hbase(main):006:0> scan 't', {STARTROW => '2'}
ROW                          COLUMN+CELL
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3

As you can see, it works; under the hood the shell calls exactly the same
method. Are your keys sorted the way you think they are?
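
One way to check the ordering is to sort the keys outside HBase. The sketch below is plain Java with no HBase dependency; the class name RowKeyOrder and its methods are made up for illustration. It assumes row keys compare byte by byte as unsigned values, which is how HBase orders rows, and it uses the five distinct Doc Ids from the output quoted earlier in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyOrder {
    // Compare two keys byte by byte, treating each byte as unsigned.
    // A key that is a prefix of another sorts before it.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Return a sorted copy of the keys, ordered the way row keys are ordered.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((x, y) -> compareBytes(
                x.getBytes(StandardCharsets.UTF_8),
                y.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // The five distinct ids from the thread, in the order they were printed.
        List<String> ids = Arrays.asList(
                "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
                "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
                "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399",
                "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f",
                "10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916");
        for (String id : sortAsRowKeys(ids)) {
            System.out.println(id);
        }
    }
}
```

With those ids, the key starting with 0579 sorts first, so a scan started from it would be expected to begin near the front of these keys. That may not be the whole story on the real table, but it suggests the last id in a printed batch is not necessarily the largest key.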

J-D

On Tue, Aug 31, 2010 at 9:06 AM, Kelvin Rawls <[email protected]> wrote:
> It seems my question is not clear:
>
> does this call:
>
> scan.setStartRow(Bytes.toBytes(lastDoc))
>
> .. have any effect on rows returned for anyone else?
>
> Thanks,
>
> Kelvin
> ________________________________________
> From: Kelvin Rawls [[email protected]]
> Sent: Monday, August 30, 2010 11:25 AM
> To: [email protected]
> Subject: Scan startRow seems to be broke in HBase 0.20.2
>
> No matter what I tell it, this seems to return Row IDs from the beginning of 
> the table.
>
> code
>
>    public List<String> getKeys(String lastDoc, int N) {
>        List<String> results = new ArrayList<String>();
>        ResultScanner scanner = null;
>        try {
>            Scan scan = new Scan();
>            // Begin the scan at lastDoc; the start row itself is included.
>            scan.setStartRow(Bytes.toBytes(lastDoc));
>            // Keep only rows whose MYROW:FLAG value matches the regular expression.
>            SingleColumnValueFilter scvf = new SingleColumnValueFilter(
>                    Bytes.toBytes("MYROW"), Bytes.toBytes("FLAG"),
>                    CompareFilter.CompareOp.EQUAL,
>                    new RegexStringComparator("MYROWFLAGTRUE"));
>            // Skip rows that do not have the MYROW:FLAG column at all.
>            scvf.setFilterIfMissing(true);
>            scan.setFilter(scvf);
>            scanner = table.getScanner(scan);
>            // Collect the row keys of up to N matching rows.
>            for (Result rr : scanner.next(N)) {
>                results.add(Bytes.toString(rr.getRow()));
>            }
>        } catch (IOException ex) {
>            m_log.error("Error getting keys", ex);
>        } finally {
>            // Release the scanner even if the scan fails.
>            if (scanner != null) {
>                scanner.close();
>            }
>        }
>        m_log.debug("Returning " + results.size() + " ids");
>        return results;
>    }
>
> Thanks for any help.
>
> Kelvin L. Rawls
>
> 410-290-6240, office
> 301-221-1308, cell
> 703 741-3120, fax
> www.iswcorp.com
>
