Hi,

I've searched high and low (including our own search-hadoop.com, Google, and the HBase wiki), but could not find details on marking a table as "in memory". This recent ML response from Lars was the best I could find: http://search-hadoop.com/m/0S2mB1QDpIh

Also, my use case is similar to what the o.p. in that thread described:

* 1 big table with raw data that is constantly being written to and from which a MR job reads some small percentage of rows every N minutes. The MR job never reads the same row twice - it reads only rows inserted after its last run.
* 1 smaller table that the MR job writes to every N minutes and from which data is read via scans by users.

It sounds like the main suggestions are:

1) don't foolishly waste precious RAM/heap on the big table whose rows are read just once
2) mark the smaller and frequently scanned table as "in memory"

Would anyone have any other performance-related advice that may be applicable to this specific setup?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
