> What's huge? Number of gigs, ballpark.

The data is in the range of 30-40 GB per calendar day per data source for usage sources such as SWITCH or IN, and in the range of 5-10 GB for non-usage sources such as Billing. We also correlate across multiple sources on this data every day.

> The cassandra row-cache is LRU, and the page cache of OS:es is
> "LRU:ish" (but generally you might see evictions at any time when
> unlucky).

With telecom data records, even the records that have a high occurrence (if we measure the stats after a certain period of time, say at end of day) do not always follow a "frequently used" pattern. So we decided that we need some sort of "list-based caching" instead of LRU, so that we control which records we actually *want* to keep in memory and which we don't.

> If you use an external cache, keep in mind that you instantly have the
> problem that the cache can become inconsistent with data in Cassandra.

Yeah, that's the reason why I'm trying to find out whether Cassandra itself has some trick to do it (maybe some sort of configuration/list support for row caching; wishful thinking!).

Any suggestions?

-SG.

On Fri, Jul 15, 2011 at 9:39 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > As we work on telecom data records (voice call/sms/GPRS xDRs), the data
> > volume is simply HUGE, and we definitely need a “controlled” caching
> > mechanism in front of the Cassandra layer.
>
> What's huge? Number of gigs, ballpark.
>
> > By the term “controlled cache layer”, what I am trying to suggest is
> > something like maybe maintaining a list of most high-usage (and therefore,
> > high occurrence) phone numbers somewhere, and the cache layer will hold all
> > live data and counters for those numbers in memory. Therefore, all
>
> The cassandra row-cache is LRU, and the page cache of OS:es is
> "LRU:ish" (but generally you might see evictions at any time when
> unlucky).
>
> If you use an external cache, keep in mind that you instantly have the
> problem that the cache can become inconsistent with data in Cassandra.
> You may also want to wait for the off-heap row cache support to be in
> a released version to be more efficient w.r.t. memory usage and GC
> overhead than the normal row caching behavior.
>
> But before asking what the appropriate external cache is, make sure
> you actually do need one first since the lack of guaranteed
> consistency with the Cassandra cluster is usually something that is
> nice to avoid.
>
> --
> / Peter Schuller (@scode on twitter)

--
Get me at GMail --> sumanthewhiz[at]gmail[dot]com
... or there's Yahoo --> sumanthewhiz[at]yahoo[dot]com
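
To make the "list-based caching" idea from the reply above concrete, here is a minimal Python sketch of an application-side cache that never evicts keys on an explicit list (for example, high-usage phone numbers) and lets every other key compete in a small LRU. This is not a Cassandra feature. The class name `PinnedListCache`, the `loader` callable (standing in for a read from the backing store), and the `lru_capacity` parameter are all hypothetical names chosen for illustration. The `invalidate` method is there because of the consistency concern raised in the thread: the application would have to call it whenever the underlying row changes.

```python
from collections import OrderedDict


class PinnedListCache:
    """Keys on the pin list stay in memory; all other keys share a small LRU."""

    def __init__(self, pinned_keys, loader, lru_capacity=2):
        self._pinned_keys = set(pinned_keys)  # the "list" of keys we want kept
        self._loader = loader                 # hypothetical: key -> value from the store
        self._pinned = {}                     # values for pinned keys, never evicted
        self._lru = OrderedDict()             # values for everything else
        self._lru_capacity = lru_capacity

    def get(self, key):
        if key in self._pinned_keys:
            # Pinned keys are loaded once and then served from memory.
            if key not in self._pinned:
                self._pinned[key] = self._loader(key)
            return self._pinned[key]
        if key in self._lru:
            # Mark as most recently used.
            self._lru.move_to_end(key)
            return self._lru[key]
        value = self._loader(key)
        self._lru[key] = value
        if len(self._lru) > self._lru_capacity:
            # Evict the least recently used non-pinned entry.
            self._lru.popitem(last=False)
        return value

    def invalidate(self, key):
        # Drop a cached value so the next get() reloads it from the store.
        self._pinned.pop(key, None)
        self._lru.pop(key, None)

    def is_cached(self, key):
        return key in self._pinned or key in self._lru
```

The pin list here is fixed at construction time; a real deployment would presumably refresh it periodically (say, from end-of-day statistics), which this sketch leaves out.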