Tables "in memory", not cached, etc. - performance advice needed

Otis Gospodnetic Wed, 19 Jan 2011 10:41:17 -0800

Hi,

I've searched high and low (including our own search-hadoop.com, Google, and 
the HBase wiki), but could not find details on marking a table as being "in 
memory".  This recent ML response from Lars was the best I could find: 
http://search-hadoop.com/m/0S2mB1QDpIh


Also, my use case is similar to what the original poster in that thread 
described:

* 1 big table with raw data that is constantly being written to and from which 
a MR job reads some small percentage of rows every N minutes.  The MR job never 
reads the same row twice - it reads only rows inserted after its last run.

* 1 smaller table that the MR job writes to every N minutes and from which data 
is read via scans by users.


It sounds like the main suggestions are:
1) don't foolishly waste precious RAM/heap on the big table whose rows are read 
just once
2) mark the smaller and frequently scanned table as "in memory"
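
For concreteness, here is a rough sketch of how those two suggestions might 
look in the HBase shell.  As far as I can tell, both settings are column family 
attributes rather than table-level ones.  The table names ('raw', 'summary') 
and the family name ('d') are made up for illustration, and this is untested 
against my cluster, so treat it as an assumption rather than a recipe:

```
# Big write-heavy table: keep its blocks out of the block cache,
# since each row is read only once by the MR job.
create 'raw', {NAME => 'd', BLOCKCACHE => 'false'}

# Small, frequently scanned table: give its blocks in-memory priority
# in the block cache.
create 'summary', {NAME => 'd', IN_MEMORY => 'true'}

# For a table that already exists, the family can be altered instead
# (the table has to be disabled first):
disable 'summary'
alter 'summary', {NAME => 'd', IN_MEMORY => 'true'}
enable 'summary'
```

On the MR side, the per-scan equivalent of 1) would be calling 
Scan.setCacheBlocks(false) on the Scan handed to the job, which leaves the 
family definition alone.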

Would anyone have any other performance-related advice that may be applicable 
to this specific setup?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
