What am I looking at exactly? Does the first run fetch 2 rows at a time (multiple invocations of the method you pasted previously?) without using a start row? If those 4 rows were fetched with the same scan, it looks like the row keys are out of order.
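One way to check the ordering question is to sort the Doc Ids from the quoted output the way HBase orders row keys, that is, as raw bytes compared lexicographically and unsigned. The following is a minimal sketch in plain Java with no HBase dependency. The class name `RowKeyOrder` and its methods are made up for illustration, and the comparator is only a stand-in for what `Bytes.compareTo` does, not the library code itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: sorts row keys the way a byte-ordered store would. */
public class RowKeyOrder {

    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    /** Returns a sorted copy of the keys; the input list is left untouched. */
    public static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((k1, k2) -> compareUnsigned(
                k1.getBytes(StandardCharsets.UTF_8),
                k2.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        // The four Doc Ids from the first quoted run, in the order they were printed.
        List<String> printed = Arrays.asList(
                "105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7",
                "0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae",
                "10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399",
                "0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f");
        for (String key : sortAsRowKeys(printed)) {
            System.out.println(key);
        }
    }
}
```

If the table held these keys as plain strings, a single scan would be expected to return the key starting with `0579` first and the one starting with `10b8` last, which is not the order that was printed.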
You could also try writing a unit test; see how we test HTable and try something like that (HBaseClusterTestCase sets up an in-memory cluster for you):

http://svn.apache.org/viewvc/hbase/tags/0.20.6/src/test/org/apache/hadoop/hbase/client/TestClient.java?view=markup

J-D

On Tue, Aug 31, 2010 at 9:39 AM, Kelvin Rawls <[email protected]> wrote:
> thanks for reply, example of output and keys below, if I do 2 at a time it
> seems to work and with 4 at a time fails. Notice repeated IDs
>
> >>>> Perform getIngestedIds operation on Content MBean for: 4 IDs <<<
>
> >>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
> >>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
> >>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
> >>>> Doc Id: 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f
>
> >>>> Now call getIngestedIds operation on Content MBean for: 4 IDs with lastDoc
> >>>> ID = 0579baefde6664f2bb3bc93aee97a:a9b534d23fdd93f0d2336eada3e44f<<<
>
> >>>> Doc Id: 105724125d074b891d385854d04f39:c4ef671418deeced1bc7ee21a5a0c7
> >>>> Doc Id: 0e3b7d681ca79761ca69f43818d9c9:619e9b53ff76d791be6db5ab918ecae
> >>>> Doc Id: 10cfbe25df8835b6f26fc696db84b32:658676d2efd9a4ee4a9e29932ec8916
> >>>> Doc Id: 10b89e4889a147ee32834f22d2afed23:a7cb32c8f81473bb8ab47c40bbb399
>
> btw, the keys are <hash of normalized content>:<hash of normalized url>
>
> Test cluster is multi-use and not easily upgraded just yet. I will work on
> setting up another test cluster with latest. Also, note in the code below I am
> also using a regular expression filter to not get some keys returned. Can
> setStartRow and other filters conflict?
>
> Thanks again,
>
> Kelvin
> ________________________________________
> From: [email protected] [[email protected]] On Behalf Of Jean-Daniel Cryans
> [[email protected]]
> Sent: Tuesday, August 31, 2010 12:22 PM
> To: [email protected]
> Subject: Re: Scan startRow seems to be broke in HBase 0.20.2
>
> It's more about the question missing information, like an example
> output of your query and a sample of your dataset. Also you are using
> 0.20.2, which is 4 minor revisions old.
>
> So I tried a simple test in the shell using HBase 0.20.2 just as a sanity
> check:
>
> hbase(main):005:0> scan 't'
> ROW        COLUMN+CELL
>  1         column=f:, timestamp=1283271502185, value=val1
>  2         column=f:, timestamp=1283271507825, value=val2
>  3         column=f:, timestamp=1283271512665, value=val3
> 3 row(s) in 0.0300 seconds
> hbase(main):006:0> scan 't', {STARTROW => '2'}
> ROW        COLUMN+CELL
>  2         column=f:, timestamp=1283271507825, value=val2
>  3         column=f:, timestamp=1283271512665, value=val3
>
> As you can see it works, under the hood it calls exactly the same
> method. Are your keys sorted the way you think they are?
>
> J-D
>
> On Tue, Aug 31, 2010 at 9:06 AM, Kelvin Rawls <[email protected]> wrote:
>> It seems my question is not clear:
>>
>> does this call:
>>
>> scan.setStartRow(Bytes.toBytes(lastDoc))
>>
>> .. have any effect on rows returned for anyone else?
>>
>> Thanks,
>>
>> Kelvin
>> ________________________________________
>> From: Kelvin Rawls [[email protected]]
>> Sent: Monday, August 30, 2010 11:25 AM
>> To: [email protected]
>> Subject: Scan startRow seems to be broke in HBase 0.20.2
>>
>> No matter what I tell it, this seems to return Row IDs from the beginning of
>> the table.
>>
>> code
>>
>> public List<String> getKeys(String lastDoc, int N) {
>>     List<String> results = new ArrayList<String>();
>>     try {
>>         Scan scan = new Scan();
>>         scan.setStartRow(Bytes.toBytes(lastDoc));
>>         StringBuilder regExp = new StringBuilder();
>>         regExp.append("MYROWFLAGTRUE");
>>         SingleColumnValueFilter scvf = new
>>             SingleColumnValueFilter("MYROW".getBytes(),
>>                 "FLAG".getBytes(), CompareFilter.CompareOp.EQUAL,
>>                 new RegexStringComparator(regExp.toString()));
>>         scvf.setFilterIfMissing(true);
>>         scan.setFilter(scvf);
>>         ResultScanner scanner = table.getScanner(scan);
>>         for (Result rr : scanner.next(N)) {
>>             String next_str = Bytes.toString(rr.getRow());
>>             results.add(next_str);
>>         }
>>         scanner.close();
>>     } catch (IOException ex) {
>>         m_log.error("Error getting keys", ex);
>>     }
>>     m_log.debug("Returning " + results.size() + " ids");
>>     return results;
>> }
>>
>> Thanks for any help.
>>
>> Kelvin L. Rawls
>>
>> 410-290-6240, office
>> 301-221-1308, cell
>> 703 741-3120, fax
>> www.iswcorp.com
>>
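
For paging with a method like the quoted `getKeys`, keep in mind that a scan's start row is inclusive, so passing the last key of one page as `lastDoc` would be expected to return that key again as the first row of the next page. A common workaround is to append a zero byte to the last key, which gives the smallest key that sorts strictly after it. The sketch below simulates this with a `TreeSet` in plain Java rather than a real table. `PagingSketch`, `page`, and `nextStart` are hypothetical names, and string ordering stands in for HBase's byte ordering, which agrees with it for ASCII keys like the ones here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical simulation of paging over a sorted key space with an inclusive start row. */
public class PagingSketch {

    /** Returns up to n keys starting at startRow (inclusive), like a scan with a start row. */
    public static List<String> page(NavigableSet<String> table, String startRow, int n) {
        List<String> out = new ArrayList<>();
        for (String key : table.tailSet(startRow, true)) {
            if (out.size() == n) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    /** Smallest key that sorts strictly after lastKey: lastKey followed by a zero byte. */
    public static String nextStart(String lastKey) {
        return lastKey + '\u0000';
    }

    public static void main(String[] args) {
        NavigableSet<String> table =
                new TreeSet<>(Arrays.asList("k1", "k2", "k3", "k4", "k5"));

        List<String> first = page(table, "", 2);
        String last = first.get(first.size() - 1);

        // Reusing the last key as the start row repeats it on the next page.
        System.out.println(page(table, last, 2));            // [k2, k3]
        // Appending a zero byte skips past it.
        System.out.println(page(table, nextStart(last), 2)); // [k3, k4]
    }
}
```

This only illustrates the inclusive-start behaviour. It says nothing about why the quoted run returned keys from the beginning of the table.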
