Hi, I'm not intimately familiar with Nutch 2.x's use of HBase, but it seems a lot of time is spent on various scans, even with improvements in GORA-119 and such.
Would the FuzzyRowFilter approach described at http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ help speed things up? If you look at the comments, you'll see the author of Phoenix, who essentially borrowed this idea and added it to Phoenix with great success.

See:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html
https://issues.apache.org/jira/browse/HBASE-6618

For those of you who know more about how Nutch 2.x uses HBase - could this help speed things up?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

