Hi,

I'm not intimately familiar with Nutch 2.x's use of HBase, but it seems a
lot of time is spent on various scans, even with improvements in GORA-119
and such.

Would
http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/
help speed things up?

If you look at the comments, you'll see that the author of Phoenix
essentially borrowed this idea and added it to Phoenix with great success.

See:
http://phoenix-hbase.blogspot.com/2013/05/demystifying-skip-scan-in-phoenix.html
https://issues.apache.org/jira/browse/HBASE-6618

For those of you who know more about how Nutch 2.x uses HBase - could this
help speed things up?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
