The size increase that I was noticing was being held onto by the keys in the LRUCache, *after* the fix. :)

In the worst case (which is also the most frequent), my DisMax queries use up to 5 query fields (qf), each with a different boost. They also have up to 4 phrase fields (pf) to boost matches where the terms are found close to each other. The minimum match (mm) varies from 1 to 100%.

So, if there is no canonical representation, then shouldn't the original query URL be enough to reproduce the query, as well as act as a cache key? I do see some overhead in parsing the query again, but that should only be seen during warming, and hopefully(?) isn't huge.

Granted, the URL doesn't encompass all the details that the DisMax queries have in them, but the key only has to last the lifetime of the object in the cache. If the DisMax settings are changed, the server would need a restart and the caches would clear anyway.

I suspect that just using the original query URL as a key would be even lighter weight on simple queries, and has the added benefit of dealing with hashcode clashes very rapidly (equals() being fast on Strings).

-Todd

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 29, 2008 8:35 AM
To: solr-dev@lucene.apache.org
Subject: Re: Query Cache Memory Usage - Could it be better?

: We've recently switched over to using a more complex DisMax search for
: some of our queries. In the process of measuring the performance impact,
: I noticed that my Query Cache had grown considerably with these new
: Queries. I made sure to use the same data set for before and after

I would ask if maybe the cache size increase is coming from the recently discovered bug where dismax (and other types of queries) weren't getting unique hashCodes (SOLR-805) but since you are the one that reported that bug, i'm going to guess you already ruled that out with a local patch :)

: comparisons, and the same query set where I modified the queries to use
: the new search (qt=foo). Before, my queries took up an average of 800 bytes.
: Now they take an average of 3500 bytes. Overall, that will mean I
: will hold 75% less queries in my query cache than I used to. And the
: query cache is very important for performance.

to make any sort of fair comparison, we need to understand what types of queries you were executing against the standard request handler and how that compares with *all* the options you are using on the dismax handler (ie: standard q vs dismax q,pf,qf,bf,bq) ... i suspect the dismax queries are much bigger because they are much more complicated.

: is, could we decrease the memory footprint of the queries in the cache,
: by *not* holding onto the Query object itself in the cache at all. Could
: we turn it into a canonical string that represents the query, and use
: that for the key in the cache? I'm assuming that would take up much less
: memory, at least in the case that I'm experiencing.

part of the problem is that there is no canonical string for an arbitrary query arg -- and even if the strings were unique, there is no one parser that can reconstruct a Query object from its string representation (which would be necessary for autowarming)

but i suspect if you are seeing that big of a difference in the relative memory consumption of the Query objects in the cache, it's going to be because the dismax queries are that much more complicated than what you were using before.

-Hoss
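
For illustration, the string-key idea discussed in this thread could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Solr's LRUCache: the class name StringKeyedLruCache and its methods are hypothetical, and it assumes the raw request query string is available to use as the key. It does not attempt autowarming, which would require re-parsing each key back into a Query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration only (not Solr code): an LRU cache keyed by the
 * raw query string instead of a parsed Query object.
 */
final class StringKeyedLruCache<V> {
    private final Map<String, V> entries;

    StringKeyedLruCache(final int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Looks up a cached result by the raw query string; returns null on a miss. */
    synchronized V get(String rawQueryString) {
        return entries.get(rawQueryString);
    }

    /** Stores a result under the raw query string. */
    synchronized void put(String rawQueryString, V result) {
        entries.put(rawQueryString, result);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

One caveat with this sketch: two requests that differ only in parameter order produce different strings and therefore different keys, so they would be cached separately. That costs memory and hit rate but does not return wrong results, as long as the handler defaults stay fixed for the lifetime of the cache.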