The size increase that I was noticing comes from the keys held in the
LRUCache, and I measured it *after* the fix. :)

In the worst case (which is also the most frequent), my DisMax queries
use up to 5 query fields (qf), each with a different boost. They also
have up to 4 phrase fields (pf) to boost documents where terms are found
close to each other. The minimum match (mm) varies from 1 to 100%.

So, if there is no canonical representation, then shouldn't the original
query URL be enough to reproduce the query, as well as act as a cache
key? I do see some overhead in parsing the query again, but that should
only be seen during warming, and hopefully(?) isn't huge. Granted, the
URL doesn't encompass all the details that the DisMax queries have in
them, but the key only has to last the lifetime of the object in the
cache. If the DisMax settings are changed, the server would need a
restart and the caches would clear anyway.

I suspect that just using the original query URL as a key would be even
lighter weight on simple queries, and has the added benefit of dealing
with hashcode clashes very rapidly (equals() being fast on Strings).
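
A rough sketch of what I mean, in plain Java. This is not Solr's
LRUCache; UrlKeyedLruCache and the q=... values are made up for
illustration, and I'm assuming the raw query string is available
wherever the cache lookup happens. The key is just the String, so
equals() and hashCode() are the String ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical LRU cache keyed by the raw request query string.
 * Illustration only; not Solr's LRUCache implementation.
 */
class UrlKeyedLruCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    UrlKeyedLruCache(int maxEntries) {
        // accessOrder=true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each put; drop the least recently used entry when over capacity
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        UrlKeyedLruCache<String> cache = new UrlKeyedLruCache<>(2);
        cache.put("q=ipod&qt=foo", "results for ipod");
        cache.put("q=zune&qt=foo", "results for zune");
        cache.get("q=ipod&qt=foo");                        // touch: ipod becomes most recent
        cache.put("q=sansa&qt=foo", "results for sansa");  // over capacity: zune is evicted
        System.out.println(cache.keySet());
    }
}
```

One thing this makes obvious: two URLs with the same parameters in a
different order would be different keys. That only costs a duplicate
entry, not a wrong result.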

-Todd

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 29, 2008 8:35 AM
To: solr-dev@lucene.apache.org
Subject: Re: Query Cache Memory Usage - Could it be better?


: We've recently switched over to using a more complex DisMax search for
: some of our queries. In the process of measuring the performance impact,
: I noticed that my Query Cache had grown considerably with these new
: Queries. I made sure to use the same data set for before and after

I would ask if maybe the cache size increase is coming from the recently
discovered bug where dismax (and other types of queries) weren't getting
unique hashCodes (SOLR-805) but since you are the one that reported that
bug, i'm going to guess you already ruled that out with a local patch :)

: comparisons, and the same query set where I modified the queries to use
: the new search (qt=foo). Before, my queries took up an average of 800
: bytes. Now they take an average of 3500 bytes. Overall, that will mean I
: will hold 75% fewer queries in my query cache than I used to. And the
: query cache is very important for performance.

to make any sort of fair comparison, we need to understand what types of
queries you were executing against the standard request handler and how
that compares with *all* the options you are using on the dismax handler
(ie: standard q vs dismax q,pf,qf,bf,bq) ... i suspect the dismax queries
are much bigger because they are much more complicated.

: is, could we decrease the memory footprint of the queries in the cache,
: by *not* holding onto the Query object itself in the cache at all. Could
: we turn it into a canonical string that represents the query, and use
: that for the key in the cache? I'm assuming that would take up much less
: memory, at least in the case that I'm experiencing.

part of the problem is that there is no canonical string for an arbitrary
query arg -- and even if the strings were unique, there is no one parser
that can reconstruct a Query object from its string representation
(which would be necessary for autowarming)

but i suspect if you are seeing that big of a difference in the relative
memory consumption of the Query objects in the cache, it's going to be
because the dismax queries are that much more complicated than what you
were using before.



-Hoss

