Hi all, I'm trying to understand how different configurations will affect performance for my use case. My table has the following schema: I'm storing event logs in a single column family, and the row key is in the format [company][timestamp][uuid].
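To make the key layout concrete, here is a minimal Java sketch of one possible byte encoding. Because the timestamp comes right after the company, all events for one company in a time window form a contiguous row-key range, which is what a start/stop row on a Scan selects. The fixed 16-byte company field, the epoch-millisecond timestamp and the `EventRowKey` class are hypothetical assumptions for illustration, not the real schema.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical byte layout for the [company][timestamp][uuid] row key.
// Assumptions for illustration only: company IDs are ASCII, zero-padded to a
// fixed width; timestamps are non-negative epoch milliseconds stored as
// 8-byte big-endian longs; the uuid is appended as its 16 raw bytes.
class EventRowKey {
    static final int COMPANY_WIDTH = 16; // hypothetical fixed width

    // Company + timestamp, without the uuid suffix.
    static byte[] prefix(String companyId, long timestampMillis) {
        byte[] raw = companyId.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > COMPANY_WIDTH) {
            throw new IllegalArgumentException("company id too long: " + companyId);
        }
        return ByteBuffer.allocate(COMPANY_WIDTH + 8)
                .put(Arrays.copyOf(raw, COMPANY_WIDTH)) // zero padding keeps the timestamp at a fixed offset
                .putLong(timestampMillis)               // ByteBuffer is big-endian by default
                .array();
    }

    // Full row key: the company/timestamp prefix followed by the uuid's 16 bytes.
    static byte[] encode(String companyId, long timestampMillis, UUID uuid) {
        return ByteBuffer.allocate(COMPANY_WIDTH + 8 + 16)
                .put(prefix(companyId, timestampMillis))
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Start row (inclusive) and stop row (exclusive) covering every event for
    // one company with fromMillis <= timestamp < toMillis.
    static byte[][] timeRange(String companyId, long fromMillis, long toMillis) {
        return new byte[][] { prefix(companyId, fromMillis), prefix(companyId, toMillis) };
    }

    public static void main(String[] args) {
        long now = 1_350_000_000_000L; // fixed example time in ms
        byte[][] lastMinute = timeRange("acme", now - 60_000L, now);
        byte[] inside = encode("acme", now - 1_000L, UUID.randomUUID());
        byte[] atEnd = encode("acme", now, UUID.randomUUID());
        // HBase orders row keys as unsigned bytes; compareUnsigned mimics that.
        System.out.println(Arrays.compareUnsigned(lastMinute[0], inside) <= 0
                && Arrays.compareUnsigned(inside, lastMinute[1]) < 0); // true
        System.out.println(Arrays.compareUnsigned(atEnd, lastMinute[1]) < 0); // false
    }
}
```

In `main`, the window is one minute ending at a fixed time: an event one second before the end falls inside the range, while an event stamped exactly at the end is excluded because the stop bound is exclusive.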
My access pattern is fairly simple: every X, retrieve the last X worth of events. X is typically small, e.g. every minute give me the last minute of events, or every hour give me the last hour of events. Occasionally I might request historical data, e.g. give me all events from August 2012. I need the queries for the most recent data to be really fast, and I'm OK with the historical queries being slow.

The configuration options I'm interested in are scanner caching and block cache usage.

I noticed that the Java API for creating column families has an option called "setCacheDataOnWrite". What exactly does this do?

It's also recommended that the block cache be disabled on scans for sequential queries.

How does scanner caching work? Is it per Scan, or is it a shared cache? Does scanner caching use the same cache as the block cache? If I have multiple Scans with caching enabled AND it's a shared cache, how does eviction work? Ideally I always want the most recently written data to be in the cache, with as few cache evictions as possible.

For my use case, if I want the best performance on the most recent events, what configuration of block cache and scanner caching should I use?

Thanks in advance.
- Pradeep