[ https://issues.apache.org/jira/browse/CASSANDRA-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benedict resolved CASSANDRA-6709. --------------------------------- Resolution: Later Closing as the new sstable format most likely makes this unnecessary by eliminating the need for a separate key cache, although we *may* want to revisit this at some point afterwards since a separate cache could still be beneficial by improving memory occupancy rate, so closing as later instead of duplicate. > Changes to KeyCache > ------------------- > > Key: CASSANDRA-6709 > URL: https://issues.apache.org/jira/browse/CASSANDRA-6709 > Project: Cassandra > Issue Type: Improvement > Reporter: Benedict > Priority: Minor > > It seems to me that KeyCache can be improved in a number of ways, but first > let's state the basic goal of KeyCache: to reduce the average query response > time by providing an exact seek position in a file for a given key. > As it stands, KeyCache is both 100% accurate, but requires a lot of overhead > per entry. > I propose to make KeyCache *mostly* accurate (say 99.9999%), by which I means > it will always fail accurately, but may rarely return an incorrect address, > and code the end users of it to be able to retry to confirm they seeked to > the correct position in the file, and to retry without the cache if they did > not. > The advantage of this is that we can both take the cache off-heap easily, and > pack a lot more items into the cache. If we permit collisions across files > and simply use the (full 128-bit) murmur hash of the key + cfId + file > generation, we should get enough uniqueness to rarely have an erroneuous > collision, however we will be using only 20 bytes per key, instead of the > current 100 + <key length> bytes. This should allow us to answer far more > queries from the key cache than before, so the positive improvement to > performance should be greater than the negative drain. > For the structure I propose an associative cache, where a single contiguous > address space is broken up into regions of, say, 8 entries, plus one counter. > The counter tracks the recency of access of each of the entries, so that on > write the least recently accessed/written can be replaced. A linear probe > within the region is used to determine if the entry we're looking for is > present. This should be very quick, as the entire region should fit into one > or two lines of L1. > Advantage: we may see 5x bump in cache hit-rate, or even more for large keys. -- This message was sent by Atlassian JIRA (v6.2#6252)