Hi there,

Quick sanity check: what scanner caching level are you using? (The default is 1.) I know this is basic, but it's always good to double-check.
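To put a rough number on why the caching level matters: the client fetches up to "caching" rows per trip to the region server, so a value of 1 means about one round trip per row. The sketch below is only back-of-the-envelope arithmetic in plain Java. The class name, method name, and the row count are made up for illustration, and it ignores region boundaries and filtering. On the HBase side the setting itself is Scan.setCaching(int), or the hbase.client.scanner.caching property.

```java
// Back-of-the-envelope estimate only; not part of the HBase API.
final class ScanCachingEstimate {

    /** Approximate number of fetches needed to stream `rows` rows, `caching` rows per fetch. */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        // Ceiling division: a partially filled last batch still costs one trip.
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical number of rows in the scanned range
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> about "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

If the scan is currently running with the default, raising the value (for example scan.setCaching(1000), memory permitting) is usually the first thing to try before looking at anything else.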
If "language" is already in the lead position of the rowkey, why use the filter?

As for EC2, that's a wildcard.

On 1/25/12 7:56 AM, "Peter Wolf" <[email protected]> wrote:

>Hello all,
>
>I am looking for advice on speeding up my Scanning.
>
>I want to iterate over all rows where a particular column (language)
>equals a particular value ("JA").
>
>I am already creating my row keys using that column in the first bytes.
>And I do my scans using partial row matching, like this...
>
>    public static byte[] calculateStartRowKey(String language) {
>        int languageHash = language.length() > 0 ? language.hashCode() : 0;
>        byte[] language2 = Bytes.toBytes(languageHash);
>        byte[] accountID2 = Bytes.toBytes(0);
>        byte[] timestamp2 = Bytes.toBytes(0);
>        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>    }
>
>    public static byte[] calculateEndRowKey(String language) {
>        int languageHash = language.length() > 0 ? language.hashCode() : 0;
>        byte[] language2 = Bytes.toBytes(languageHash + 1);
>        byte[] accountID2 = Bytes.toBytes(0);
>        byte[] timestamp2 = Bytes.toBytes(0);
>        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
>    }
>
>    Scan scan = new Scan(calculateStartRowKey(language),
>        calculateEndRowKey(language));
>
>Since I am using a hash value for the string, I need to re-check the
>column to make sure that some other string does not get the same hash
>value
>
>    Filter filter = new SingleColumnValueFilter(resultFamily,
>        languageCol, CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
>    scan.setFilter(filter);
>
>I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on
>EC2.
>
>I think that this should be really fast, but it is not. Any advice on
>how to debug/speed it up?
>
>Thanks
>Peter
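For reference, here is a small plain-Java sketch of what the start and stop keys in the quoted code look like as bytes. It assumes java.nio.ByteBuffer (big-endian by default) as a stand-in for HBase's Bytes.toBytes(int), which also writes 4 big-endian bytes. The class and helper names are hypothetical and not from the original post. Because the stop row of a Scan is exclusive, the range covers exactly the rows whose first four bytes equal the language hash, so the filter only has work to do when two different strings share a hash.

```java
import java.nio.ByteBuffer;

// Illustrative only: mirrors the quoted key layout without depending on HBase.
final class RowKeyLayout {

    /** 12-byte key: 4-byte language hash, 4-byte account id, 4-byte timestamp. */
    static byte[] key(int languageHash, int accountId, int timestamp) {
        return ByteBuffer.allocate(12)
                .putInt(languageHash)
                .putInt(accountId)
                .putInt(timestamp)
                .array();
    }

    /** Same hashing rule as the quoted code: empty string maps to 0. */
    static int hashOf(String language) {
        return language.length() > 0 ? language.hashCode() : 0;
    }

    static byte[] startKey(String language) {
        return key(hashOf(language), 0, 0);
    }

    /** Stop key is the first key of the next hash value. */
    static byte[] endKey(String language) {
        return key(hashOf(language) + 1, 0, 0);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("start: " + hex(startKey("JA")));
        System.out.println("stop:  " + hex(endKey("JA")));
    }
}
```

Printing the two keys for a given language makes it easy to confirm by eye that the scan range is as narrow as intended before spending time on the filter.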
