Hi Lin,

On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma <[email protected]> wrote:
> Harsh, thanks for the detailed information.
>
> Two more comments,
>
> 1. I want to confirm my understanding is correct. At the beginning client
> cache has nothing, when it issue request for a table, if the region server
> location is not known, it will request from root META region to get region
> server information step by step, then cache the region server information.
> If cache already contain the requested region information, it will use
> directly from cache. In this way, cache grows when cache miss for requested
> region information;

You have it correct now. Region locations are looked up and cached only
when they are not already in the client's cache. And they are cached on
a need basis, not all at once.

> 2. "far outweighs the other items it caches (scan results, etc.)", you mean
> GET API of HBase cache results? Sorry I am not aware of this feature before.
> How the results are cached, and whether we can control it (supposing a
> client is doing random read pattern, we do not want to cache information
> since each read may be unique row-key access)? Appreciate if you could point
> me to some more detailed information.

I am speaking of Scanner value caching, not Gets exactly. See more about
Scanner (client) caching at
http://hbase.apache.org/book.html#perf.hbase.client.caching

> regards,
> Lin
>
>
> On Thu, Aug 23, 2012 at 9:35 PM, Harsh J <[email protected]> wrote:
>>
>> Hi Lin,
>>
>> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma <[email protected]> wrote:
>> > Thank you Abhishek,
>> >
>> > Two more comments,
>> >
>> > -- "Client only caches information as needed for its queries and not
>> > necessarily for 'all' region servers." -- how did client know which
>> > region
>> > server information is necessary to be cached in current HBase
>> > implementation?
>>
>> What Abhishek meant here is that it caches only the needed table's
>> rows from META. It also only caches the specific region required for
>> the row you're looking up/operating on, AFAICT.
>>
>> > -- When the client loads region server information for the first time?
>> > Did
>> > client persistent cache information at client side about region server
>> > information?
>>
>> The client loads up regionserver information for a table, when it is
>> requested to perform an operation on that table (on a specific row or
>> the whole). It does not immediately, upon initialization, cache the
>> whole of META's contents.
>>
>> Your question makes sense though, that it does seem to be such that a
>> client *may* use quite a bit of memory space in trying to cache the
>> META entries locally, but practically we've not had this cause issues
>> for users yet. The amount of memory cached for META far outweighs the
>> other items it caches (scan results, etc.). At least I have not seen
>> any reports of excessive client memory usage just due to region
>> locations of tables being cached.
>>
>> I think there's more benefits storing/caching it than not doing so,
>> and so far we've not needed the extra complexity of persisting the
>> cache to a local or non-RAM storage than keeping it in memory.
>>
>> --
>> Harsh J
>
>

--
Harsh J
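
P.S. In case it helps to picture the need-based caching discussed above,
here is a minimal sketch in plain Java. It is not the HBase client code:
the class name, the fakeMeta() helper, the region boundaries and the
server names ("rs1", "rs2") are all made up for illustration, and the
real client's data structures may well differ. It only shows the idea
that a location is fetched on a cache miss and reused afterwards.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative sketch only; NOT the actual HBase client implementation.
final class RegionLocationCacheSketch {

    // One cached region: rows in [startKey, endKey) live on 'server'.
    // endKey == null means "up to the end of the table".
    static final class RegionLocation {
        final String startKey;
        final String endKey;
        final String server;

        RegionLocation(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey == null || row.compareTo(endKey) < 0);
        }
    }

    // Cache keyed by region start key, so floorEntry() finds the candidate region.
    private final TreeMap<String, RegionLocation> cache = new TreeMap<>();
    // Stand-in for the (much more involved) META lookup.
    private final Function<String, RegionLocation> metaLookup;
    private int metaLookups = 0;

    RegionLocationCacheSketch(Function<String, RegionLocation> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // Returns the location for 'row', asking "META" only on a cache miss.
    RegionLocation locate(String row) {
        Map.Entry<String, RegionLocation> e = cache.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            return e.getValue(); // cache hit: no lookup needed
        }
        metaLookups++; // cache miss: fetch just this one region
        RegionLocation loc = metaLookup.apply(row);
        cache.put(loc.startKey, loc);
        return loc;
    }

    int cachedRegions() { return cache.size(); }

    int metaLookups() { return metaLookups; }

    // Hypothetical two-region table: ["", "m") on rs1 and ["m", end) on rs2.
    static RegionLocation fakeMeta(String row) {
        return row.compareTo("m") < 0
            ? new RegionLocation("", "m", "rs1")
            : new RegionLocation("m", null, "rs2");
    }
}
```

With this sketch, locating "apple" and then "banana" triggers a single
lookup, since both rows fall in the same made-up region, while locating
"zebra" afterwards triggers a second one. The cache grows one region at
a time, only for regions the client actually touches.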
