I'm confused... You mention that you are hashing your key, and you want to do a scan with a start and stop value?
Could you elaborate? With respect to hashing, if you use a SHA-1 hash, your values will be effectively unique. (You talked about rehashing ...)

Sent from my iPhone

On Jan 25, 2012, at 7:56 AM, "Peter Wolf" <opus...@gmail.com> wrote:

> Hello all,
>
> I am looking for advice on speeding up my Scanning.
>
> I want to iterate over all rows where a particular column (language) equals a
> particular value ("JA").
>
> I am already creating my row keys using that column in the first bytes. And
> I do my scans using partial row matching, like this...
>
> public static byte[] calculateStartRowKey(String language) {
>     int languageHash = language.length() > 0 ? language.hashCode() : 0;
>     byte[] language2 = Bytes.toBytes(languageHash);
>     byte[] accountID2 = Bytes.toBytes(0);
>     byte[] timestamp2 = Bytes.toBytes(0);
>     return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
> }
>
> public static byte[] calculateEndRowKey(String language) {
>     int languageHash = language.length() > 0 ? language.hashCode() : 0;
>     byte[] language2 = Bytes.toBytes(languageHash + 1);
>     byte[] accountID2 = Bytes.toBytes(0);
>     byte[] timestamp2 = Bytes.toBytes(0);
>     return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
> }
>
> Scan scan = new Scan(calculateStartRowKey(language),
>     calculateEndRowKey(language));
>
> Since I am using a hash value for the string, I need to re-check the column
> to make sure that some other string does not get the same hash value
>
> Filter filter = new SingleColumnValueFilter(resultFamily, languageCol,
>     CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
> scan.setFilter(filter);
>
> I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on EC2.
>
> I think that this should be really fast, but it is not. Any advice on how to
> debug/speed it up?
>
> Thanks
> Peter
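
For reference, here is a rough sketch of how the SHA-1 suggestion could replace the `hashCode()` prefix. It is plain Java with no HBase dependency. The class and method names (`PrefixScanKeys`, `startRow`, `stopRow`) are hypothetical, and using the full 20-byte digest as the row key prefix is an assumption, not something stated in this thread. The stop row is computed by incrementing the prefix as an unsigned big-endian value, which avoids the wrap-around that `languageHash + 1` has when the hash is -1.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class PrefixScanKeys {

    /** SHA-1 digest of the language string: a fixed-width, 20-byte key prefix. */
    public static byte[] startRow(String language) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return sha1.digest(language.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Smallest key that sorts after every key beginning with the prefix,
     * comparing bytes as unsigned values. Trailing 0xFF bytes are dropped
     * and the last remaining byte is incremented. An empty array is
     * returned when the prefix is all 0xFF (assumed here to mean "no stop
     * row", i.e. scan to the end of the table).
     */
    public static byte[] stopRow(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }
}
```

The two arrays would take the place of `calculateStartRowKey(language)` and `calculateEndRowKey(language)` in the `Scan` constructor, with the account ID and timestamp still appended after the prefix when rows are written. Existing rows would have to be rewritten under the new prefix, and whether the `SingleColumnValueFilter` re-check can then be dropped depends on how much collision risk is acceptable.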