Hi,

Sorry for the newbie question, but seeing how long HBase-related operations (all of them HBase scans, I believe?) take in Nutch 2.x, I thought I'd ask: what are the HBase row keys like?
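To make the question concrete, here is a minimal sketch of the kind of key layout I have in mind when I say "designed for scans and against hotspotting". It is purely illustrative and not taken from Nutch's code: the class `RowKeySketch`, the methods `reverseHost` and `saltedKey`, the bucket count of 16, and the `NN|host:path` layout are all hypothetical names and values I made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class RowKeySketch {
    // Hypothetical number of salt buckets (an assumption, not a Nutch setting).
    static final int BUCKETS = 16;

    // "www.example.com" -> "com.example.www", so that rows for one domain
    // sort next to each other and can be read with a single range scan.
    static String reverseHost(String host) {
        String[] parts = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
            sb.append(parts[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }

    // Prefixes the key with a deterministic two-digit bucket derived from the
    // key itself, so that writes are spread over BUCKETS key ranges.
    static String saltedKey(String host, String path) {
        String base = reverseHost(host) + ":" + path;
        int hash = 0;
        for (byte b : base.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff); // simple polynomial hash, may overflow
        }
        int bucket = Math.floorMod(hash, BUCKETS); // always in [0, BUCKETS)
        return String.format(Locale.ROOT, "%02d|%s", bucket, base);
    }

    public static void main(String[] args) {
        System.out.println(reverseHost("www.example.com"));
        System.out.println(saltedKey("www.example.com", "/index.html"));
    }
}
```

The two ideas pull in opposite directions: a reversed host keeps a domain's rows contiguous, which helps range scans, while a salt prefix spreads writes across key ranges but means a full scan has to cover every bucket. Which trade-off (if either) Nutch makes is exactly what I am asking about.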
Specifically: are they designed in such a way that they lend themselves to fast scans and fast writes, and avoid RegionServer hotspotting?

http://hbase.apache.org/book/perf.reading.html contains a good number of performance-oriented HBase tips that, to me, sound applicable to how Nutch uses HBase. For example:

* Maybe scan.setCacheBlocks(false) should be called if there is no point in caching blocks?
* Or maybe the block cache is valuable and its size should be set explicitly?
* Or maybe put.setWriteToWAL(false) should be used to speed up writes, if one is OK living without a WAL?

Any feedback/tips would be appreciated.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

