Hi,

Sorry for the newbie question, but since HBase-related operations (all
HBase scans, I believe?) take a while in Nutch 2.x, I thought I'd ask: what
are the HBase row keys like?

That is:
Are they designed in such a way that they lend themselves to fast scans,
fast writes, and avoid RegionServer hotspotting?
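
To make the question concrete, here is the kind of key scheme I have in
mind. It is only a sketch under my own assumptions: the class and method
names (RowKeySketch, reversedHostKey, saltedKey) and the bucket count are
made up for illustration, and I am not claiming this is what Nutch does.
Reversing the host keeps all pages of one domain adjacent for range scans,
while a salt prefix spreads writes across regions at the cost of one scan
per bucket.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch of two row-key schemes; not taken from Nutch.
final class RowKeySketch {

    // Reverses the host labels so that pages of one domain sort together,
    // e.g. "http://www.example.com/a/b" -> "com.example.www:http/a/b".
    // Port, query string and fragment are ignored to keep the sketch short.
    static String reversedHostKey(String url) {
        URI uri = URI.create(url);
        String[] labels = uri.getHost().split("\\.");
        StringBuilder key = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            key.append(labels[i]);
            if (i > 0) {
                key.append('.');
            }
        }
        key.append(':').append(uri.getScheme());
        String path = uri.getRawPath();
        key.append(path == null || path.isEmpty() ? "/" : path);
        return key.toString();
    }

    // Prefixes the key with a two-digit bucket derived from the host hash.
    // All URLs of one host share a bucket, but different hosts are spread
    // over `buckets` key ranges, which may reduce write hotspotting.
    static String saltedKey(String url, int buckets) {
        String host = URI.create(url).getHost();
        int bucket = Math.floorMod(host.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d|%s", bucket, reversedHostKey(url));
    }
}
```

With something like this, RowKeySketch.reversedHostKey("http://example.com")
would give "com.example:http/", and a scan over one domain becomes a prefix
scan. With salting, the same scan has to be issued once per bucket, so the
trade-off depends on whether reads or writes dominate.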

http://hbase.apache.org/book/perf.reading.html seems to contain a good
number of performance-oriented HBase tips that, to me, sound like they are
applicable to how Nutch uses HBase.

For example:
Maybe scan.setCacheBlocks(false) should be called if there is no point in
caching blocks?  Or maybe the block cache is valuable and its size should be
set explicitly?

Or maybe put.setWriteToWAL(false) should be used to speed up writes if one
is OK living without a WAL?

Any feedback/tips would be appreciated.

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
