I have a project where we need to search 1B docs and still return results in
under 700 ms. The problem is that we are using geofiltering, and that happens
*before* the main query, so we have to geofilter across all 1B docs to
restrict our set of docs first, and only then run the query on a name field.
It seems it would be better and faster to run the main query first, and only
then filter that subset of docs by geo. Here is what a typical query looks
like:

?shards=<list of 20 nodes>
&q={!boost b=sum(recip(geodist(geo_lat_long,38.2493581,-122.0399663),1,1,1))}(given_name:Barack OR given_name_exact:Barack^4.0) AND family_name:Obama
&fq={!geofilt pt=38.2493581,-122.0399663 sfield=geo_lat_long d=120}
&fq=(-source:somedatasource)
&rows=4
QTime=1040

I've looked at the "cache=false" and "cost=" params, but they're not going
to help much because we still have to do the filtering. (We *will* use
"cache=false" to avoid the overhead of caching queries that will very
rarely be the same.)

Is there any way to indicate that a filter query should be applied *after*
the main query has produced its results? The other fq on source restricts
the docset somewhat, but its different variations don't eliminate a high
number of docs. We could use the "cost" param to run the fq on source before
the fq on geo, but that would help only minimally, and only in some cases.
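
For reference, a rough sketch of what that cost-ordered variant would look
like. The cost values (10 and 50) are arbitrary placeholders chosen for
illustration, not tuned numbers; the only intent is that the source filter
gets the lower cost so it is evaluated before the geo filter, with both
filters left uncached:

```
&fq={!cache=false cost=10}(-source:somedatasource)
&fq={!geofilt cache=false cost=50 pt=38.2493581,-122.0399663 sfield=geo_lat_long d=120}
```

This only changes the order in which the two filters are evaluated; the geo
filter is still not deferred until after the main query.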


Thanks,
-Jay
