Re: caching repeated OR'd terms

2010-05-07 Thread Lance Norskog
I would suggest benchmarking this before designing anything more
complex. A field with only 10k unique integer or string values will
search very quickly.
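
A rough sketch of such a benchmark follows. It is only an illustration: the URL, the field name, and the helper names (build_query, random_id_set, fetch, benchmark) are assumptions, not anything taken from the poster's setup. It times a batch of random OR queries of the same shape as the one in the original post.

```python
import random
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values for illustration; adjust to the real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"
FIELD = "myfield"
DISTINCT_IDS = 10_000


def build_query(ids):
    """Return the OR'd query string in the form used in the original post."""
    return " OR ".join(f"({FIELD}:{i})" for i in ids)


def random_id_set(rng, max_terms=10):
    """Pick a random set of IDs, mimicking the mostly-unique combinations."""
    return rng.sample(range(1, DISTINCT_IDS + 1), rng.randint(1, max_terms))


def fetch(query):
    """Send one query to Solr and discard the response body."""
    params = urlencode({"q": query, "rows": 0})
    with urlopen(f"{SOLR_SELECT_URL}?{params}") as response:
        response.read()


def benchmark(run_query, n_queries=100, seed=0):
    """Time n_queries random OR queries; return the mean seconds per query."""
    rng = random.Random(seed)
    queries = [build_query(random_id_set(rng)) for _ in range(n_queries)]
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    return (time.perf_counter() - start) / n_queries
```

Calling benchmark(fetch) against a running instance gives a mean latency in seconds. Raising max_terms into the hundreds would cover the rarer large queries as well.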

On Thu, May 6, 2010 at 7:54 AM, Nagelberg, Kallin wrote:
> Hey everyone,
>
> I'm having some difficulty figuring out the best way to optimize for a 
> certain query situation. My documents have a many-valued field that stores 
> lists of IDs. All in all there are probably about 10,000 distinct IDs 
> throughout my index. I need to be able to query and find all documents that 
> contain a given set of IDs. I.e., I want to find all documents that contain IDs 
> 3, 202, 3030 or 505. Currently I'm implementing this like so:
>
> q= (myfield:3) OR (myfield:202) OR (myfield:3030) OR (myfield:505)
>
> It's possible that there could be upwards of hundreds of terms, although 90% 
> of the time it will be under 10. Ideally I would like to do this with a 
> filter query, but I have read that it is impossible to cache OR'd terms in a 
> fq, though this feature may come soon. The problem is that the combinations 
> of OR'd terms will almost always be unique, so the query cache will have a 
> very low hit rate. It would be great if the individual terms could be cached 
> individually, but I'm not sure how to accomplish that.
>
> Any suggestions would be welcome!
> -Kallin Nagelberg



-- 
Lance Norskog
goks...@gmail.com


caching repeated OR'd terms

2010-05-06 Thread Nagelberg, Kallin
Hey everyone,

I'm having some difficulty figuring out the best way to optimize for a certain 
query situation. My documents have a many-valued field that stores lists of 
IDs. All in all there are probably about 10,000 distinct IDs throughout my 
index. I need to be able to query and find all documents that contain a given 
set of IDs. I.e., I want to find all documents that contain IDs 3, 202, 3030 or 
505. Currently I'm implementing this like so:

q= (myfield:3) OR (myfield:202) OR (myfield:3030) OR (myfield:505)
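
For illustration, a query of this shape could be generated with a small helper like the one below. The function name build_id_query is hypothetical. It uses the Lucene query parser's field grouping syntax, field:(a OR b), which is equivalent to repeating the field for each term, and it sorts and de-duplicates the IDs so that the same set of IDs always produces the same query string. That normalization is an assumption about what might help a cache keyed on the query, not a measured result.

```python
def build_id_query(field, ids):
    """Build field:(a OR b OR c) from an iterable of integer IDs."""
    # Sort and de-duplicate so equal ID sets map to identical strings.
    unique_sorted = sorted(set(int(i) for i in ids))
    if not unique_sorted:
        raise ValueError("at least one ID is required")
    return f"{field}:({' OR '.join(str(i) for i in unique_sorted)})"


print(build_id_query("myfield", [505, 3, 3030, 202, 3]))
# prints: myfield:(3 OR 202 OR 505 OR 3030)
```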

It's possible that there could be upwards of hundreds of terms, although 90% of 
the time it will be under 10. Ideally I would like to do this with a filter 
query, but I have read that it is impossible to cache OR'd terms in a fq, 
though this feature may come soon. The problem is that the combinations of OR'd 
terms will almost always be unique, so the query cache will have a very low hit 
rate. It would be great if the individual terms could be cached individually, 
but I'm not sure how to accomplish that.
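
To make the idea of caching individual terms concrete, here is a toy in-memory model. It is not Solr's API: the class PerTermCache and the lookup callable are hypothetical, and plain Python sets stand in for the document sets a search engine would keep. Each ID's document set is computed once and reused, and an OR over several IDs is just the union of the cached sets.

```python
class PerTermCache:
    """Toy model: cache the matching documents for each term separately."""

    def __init__(self, lookup):
        # lookup is a hypothetical callable: term -> iterable of doc ids.
        self._lookup = lookup
        self._cache = {}
        self.misses = 0

    def docs_for(self, term):
        """Return the cached doc set for one term, computing it on a miss."""
        if term not in self._cache:
            self.misses += 1
            self._cache[term] = frozenset(self._lookup(term))
        return self._cache[term]

    def docs_for_any(self, terms):
        """OR semantics: union of the per-term doc sets."""
        result = set()
        for term in terms:
            result |= self.docs_for(term)
        return result
```

With about 10,000 distinct IDs, a cache like this would hold at most 10,000 entries no matter how many different combinations are queried, which is the property the unique OR combinations lack.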

Any suggestions would be welcome!
-Kallin Nagelberg