Does it make sense to have better defaults so the performance out of the box is better?
~Jeff

On 1/25/2012 8:06 AM, Peter Wolf wrote:

Ah ha! I appear to be insane ;-)

Adding the following speeded things up quite a bit:

    scan.setCacheBlocks(true);
    scan.setCaching(1000);

Thank you, it was a duh!

P

On 1/25/12 8:13 AM, Doug Meil wrote:

Hi there-

Quick sanity check: what caching level are you using? (default is 1) I know this is basic, but it's always good to double-check.

If "language" is already in the lead position of the rowkey, why use the filter?

As for EC2, that's a wildcard.

On 1/25/12 7:56 AM, "Peter Wolf" <[email protected]> wrote:

Hello all,

I am looking for advice on speeding up my Scanning.

I want to iterate over all rows where a particular column (language) equals a particular value ("JA").

I am already creating my row keys using that column in the first bytes. And I do my scans using partial row matching, like this...

    public static byte[] calculateStartRowKey(String language) {
        int languageHash = language.length() > 0 ? language.hashCode() : 0;
        byte[] language2 = Bytes.toBytes(languageHash);
        byte[] accountID2 = Bytes.toBytes(0);
        byte[] timestamp2 = Bytes.toBytes(0);
        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
    }

    public static byte[] calculateEndRowKey(String language) {
        int languageHash = language.length() > 0 ? language.hashCode() : 0;
        byte[] language2 = Bytes.toBytes(languageHash + 1);
        byte[] accountID2 = Bytes.toBytes(0);
        byte[] timestamp2 = Bytes.toBytes(0);
        return Bytes.add(Bytes.add(language2, accountID2), timestamp2);
    }

    Scan scan = new Scan(calculateStartRowKey(language), calculateEndRowKey(language));

Since I am using a hash value for the string, I need to re-check the column to make sure that some other string does not get the same hash value.

    Filter filter = new SingleColumnValueFilter(resultFamily, languageCol, CompareFilter.CompareOp.EQUAL, Bytes.toBytes(language));
    scan.setFilter(filter);

I am using the Cloudera 0.09.4 release, and a cluster of 3 machines on EC2.

I think that this should be really fast, but it is not. Any advice on how to debug/speed it up?

Thanks
Peter
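
For readers following the row-key discussion above, here is a minimal, self-contained sketch of the start/end key layout Peter describes (4-byte language hash, then 4-byte account ID and 4-byte timestamp, both zero). It is not the code from the thread: it uses java.nio.ByteBuffer in place of the HBase Bytes utility so it runs without HBase on the classpath, and the class and method names are hypothetical. It also shows why the hash prefix alone cannot identify a language: "Aa" and "BB" have the same String.hashCode() and therefore produce the same key range.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical names; a stand-in for the thread's calculateStartRowKey/calculateEndRowKey.
final class RowKeyRangeSketch {

    // Same rule as in the thread: the empty string maps to 0, otherwise String.hashCode().
    static int languageHash(String language) {
        return language.length() > 0 ? language.hashCode() : 0;
    }

    // 12-byte key: big-endian int prefix followed by two zero ints
    // (account ID and timestamp), matching the layout in the thread.
    static byte[] key(int hashPrefix) {
        return ByteBuffer.allocate(12).putInt(hashPrefix).putInt(0).putInt(0).array();
    }

    static byte[] startRowKey(String language) {
        return key(languageHash(language));
    }

    static byte[] endRowKey(String language) {
        return key(languageHash(language) + 1);
    }

    // Unsigned byte-by-byte comparison, i.e. lexicographic order on raw bytes.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        System.out.println("hash(JA) = " + languageHash("JA"));
        System.out.println("start    = " + Arrays.toString(startRowKey("JA")));
        System.out.println("end      = " + Arrays.toString(endRowKey("JA")));
        // Two different strings, one hash: both map to the same scan range.
        System.out.println("Aa/BB collide: "
                + Arrays.equals(startRowKey("Aa"), startRowKey("BB")));
    }
}
```

Because distinct strings can share a prefix, a range scan on the hash may return rows for more than one language, which is the reason Peter gives for re-checking the column with a filter.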
--
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]
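
As a rough illustration of why the caching setting discussed in this thread matters, the sketch below counts client/server round trips under a simplified model in which each scanner fetch returns up to `caching` rows. The class name, method name, and the one-million-row figure are hypothetical and chosen only for illustration; real scans also involve open/close calls and region boundaries that this model ignores.

```java
// Hypothetical, simplified model: one round trip per batch of up to `caching` rows.
final class ScanCachingSketch {

    // Number of fetches needed to stream rowCount rows at the given caching level.
    static long roundTrips(long rowCount, int caching) {
        if (rowCount < 0 || caching < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and caching >= 1");
        }
        return (rowCount + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed table size, not from the thread
        System.out.println("caching=1    -> " + roundTrips(rows, 1) + " round trips");
        System.out.println("caching=1000 -> " + roundTrips(rows, 1000) + " round trips");
    }
}
```

Under this model, moving from the default of 1 mentioned by Doug to the 1000 Peter settled on cuts the number of fetches by a factor of 1000, at the cost of holding more rows in client memory per fetch.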
