Thanks for the reply. An example of the output and keys is below. If I fetch 2 at a time it seems to work, but with 4 at a time it fails. Notice the repeated IDs:
>>> Perform getIngestedIds operation on Content MBean for: 4 IDs <<<
>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>>> Doc Id: 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f
>>> Now call getIngestedIds operation on Content MBean for: 4 IDs with lastDoc
>>> ID = 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f<<<
>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
>>> Doc Id: 10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916
>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399

BTW, the keys are <hash of normalized content>:<hash of normalized url>.

The test cluster is multi-use and can't easily be upgraded just yet. I will work on setting up another test cluster with the latest version.

Also, though it's not in the code below, I am using a regular expression filter to keep some keys from being returned. Can setStartRow and other filters conflict?

Thanks again,
Kelvin
________________________________________
From: [email protected] [[email protected]] On Behalf Of Jean-Daniel Cryans [[email protected]]
Sent: Tuesday, August 31, 2010 12:22 PM
To: [email protected]
Subject: Re: Scan startRow seems to be broke in HBase 0.20.2

It's more that the question is missing information, like example output of your query and a sample of your dataset. Also, you are using 0.20.2, which is 4 minor revisions old.
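HBase returns rows in ascending order of their row-key bytes, compared as unsigned bytes, and Bytes.toBytes(String) encodes the key as UTF-8. Sorting the sample IDs from the log that way is a quick offline check of which rows a scan starting at lastDoc should return. What follows is a minimal plain-Java sketch with no HBase dependency; the class KeyOrder and its compareUnsigned and sortAsHBase helpers are hypothetical and only mimic that ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyOrder {
    // Hypothetical helper: unsigned lexicographic byte comparison,
    // mimicking the order in which HBase sorts row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A key that is a prefix of a longer key sorts first.
        return a.length - b.length;
    }

    // Hypothetical helper: sort string keys by their UTF-8 bytes.
    static List<String> sortAsHBase(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((s, t) -> compareUnsigned(
                s.getBytes(StandardCharsets.UTF_8),
                t.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // The five distinct IDs from the log above.
        List<String> ids = Arrays.asList(
                "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
                "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
                "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399",
                "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f",
                "10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916");
        for (String id : sortAsHBase(ids)) {
            System.out.println(id);
        }
    }
}
```

For the five IDs in the log, this prints the keys beginning 0579, 0e3b, 1057, 10b8 and 10cf, in that order. Because the start row is inclusive, an unfiltered scan starting at 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f would be expected to return that row first and then continue in this order.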

So I tried a simple test in the shell using HBase 0.20.2, just as a sanity check:

hbase(main):005:0> scan 't'
ROW                          COLUMN+CELL
 1                           column=f:, timestamp=1283271502185, value=val1
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3
3 row(s) in 0.0300 seconds
hbase(main):006:0> scan 't', {STARTROW => '2'}
ROW                          COLUMN+CELL
 2                           column=f:, timestamp=1283271507825, value=val2
 3                           column=f:, timestamp=1283271512665, value=val3

As you can see, it works, and under the hood the shell calls exactly the same method. Are your keys sorted the way you think they are?

J-D

On Tue, Aug 31, 2010 at 9:06 AM, Kelvin Rawls <[email protected]> wrote:
> It seems my question was not clear. Does this call:
>
> scan.setStartRow(Bytes.toBytes(lastDoc))
>
> ...have any effect on the rows returned for anyone else?
>
> Thanks,
>
> Kelvin
> ________________________________________
> From: Kelvin Rawls [[email protected]]
> Sent: Monday, August 30, 2010 11:25 AM
> To: [email protected]
> Subject: Scan startRow seems to be broke in HBase 0.20.2
>
> No matter what I tell it, this seems to return row IDs from the beginning of
> the table.
>
> code
>
> public List<String> getKeys(String lastDoc, int N) {
>     List<String> results = new ArrayList<String>();
>     try {
>         Scan scan = new Scan();
>         scan.setStartRow(Bytes.toBytes(lastDoc));
>         StringBuilder regExp = new StringBuilder();
>         regExp.append("MYROWFLAGTRUE");
>         SingleColumnValueFilter scvf = new SingleColumnValueFilter("MYROW".getBytes(),
>                 "FLAG".getBytes(), CompareFilter.CompareOp.EQUAL,
>                 new RegexStringComparator(regExp.toString()));
>         scvf.setFilterIfMissing(true);
>         scan.setFilter(scvf);
>         ResultScanner scanner = table.getScanner(scan);
>         for (Result rr : scanner.next(N)) {
>             String next_str = Bytes.toString(rr.getRow());
>             results.add(next_str);
>         }
>         scanner.close();
>     } catch (IOException ex) {
>         m_log.error("Error getting keys", ex);
>     }
>     m_log.debug("Returning " + results.size() + " ids");
>     return results;
> }
>
> Thanks for any help.
>
> Kelvin L. Rawls
>
> 410-290-6240, office
> 301-221-1308, cell
> 703 741-3120, fax
> www.iswcorp.com
>
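J-D's shell test shows that STARTROW is inclusive (scanning from '2' returns row 2 itself), and Kelvin also asked whether setStartRow and filters can conflict. The plain-Java sketch below models both with no HBase classes: a TreeMap stands in for the table, a hypothetical filter predicate stands in for the SingleColumnValueFilter on MYROW:FLAG, and getKeys and nextStartRow are illustrative names, not HBase's API. In this model the filter only drops rows inside the scan range and never moves the scan back to the start of the table. Feeding the last returned key straight back in as the start row repeats that key; appending a zero byte gives the smallest key that sorts after it, a common way to page without repeats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

public class ScanModel {
    // Models setStartRow + filter + next(n): visit rows >= startRow (inclusive),
    // keep rows whose value passes the filter, stop after n matches.
    // String order equals byte order here because the keys are ASCII.
    static List<String> getKeys(NavigableMap<String, String> table, String startRow,
                                int n, Predicate<String> filter) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, String> row : table.tailMap(startRow, true).entrySet()) {
            if (results.size() == n) {
                break;
            }
            if (filter.test(row.getValue())) {
                results.add(row.getKey());
            }
        }
        return results;
    }

    // Smallest key that sorts strictly after lastRow: lastRow plus a zero byte.
    static String nextStartRow(String lastRow) {
        return lastRow + '\u0000';
    }

    public static void main(String[] args) {
        // Hypothetical table: row key -> value of the flag column.
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("1", "TRUE");
        table.put("2", "FALSE");
        table.put("3", "TRUE");
        table.put("4", "TRUE");
        table.put("5", "TRUE");
        Predicate<String> flagIsTrue = v -> v.equals("TRUE");

        List<String> page1 = getKeys(table, "", 2, flagIsTrue);
        System.out.println(page1); // [1, 3]

        String last = page1.get(page1.size() - 1);
        // Reusing the last key as the start row returns it again...
        System.out.println(getKeys(table, last, 2, flagIsTrue)); // [3, 4]
        // ...while starting just after it does not.
        System.out.println(getKeys(table, nextStartRow(last), 2, flagIsTrue)); // [4, 5]
    }
}
```

Note that in this model, as with next(N) on a filtered scan, the rows the filter drops are skipped rather than counted toward N.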
