Re: Query vs Filter Query Usage

2011-08-25 Thread Joshua Harness
Erick -

Thanks for the insight. The filter cache just caches the internal document
IDs of the result set (as opposed to the documents themselves), correct? If
so, is the following math right:

10,000,000 document index
Internal document ID is a 32-bit int
Max memory used by a single cache slot in the filter cache = 32 bits x
10,000,000 docs = 320,000,000 bits = 40,000,000 bytes, or about 38 MiB (40 MB)

Of course, I realize there is some additional overhead if we're dealing with
Integer objects as opposed to primitives, and the figure doubles if the
internal document ID is implemented as a long.
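The arithmetic above can be checked with a short sketch. It models the question's own assumption (one fixed-width ID per document, worst case where every document in the index matches), not Solr's actual filter cache layout, and it counts raw primitive storage only, ignoring object headers and Integer boxing. The 64-bit row covers the case where the ID is a long.

```python
# Back-of-the-envelope size of one filter cache entry, ASSUMING the entry
# stores one fixed-width internal document ID per matching document.
# Ignores object headers, boxing (Integer vs int), and any other overhead.

BYTES_PER_MB = 1_000_000      # decimal megabyte
BYTES_PER_MIB = 1024 * 1024   # binary mebibyte


def id_list_size_bytes(num_docs: int, bits_per_id: int) -> int:
    """Raw bytes needed to store num_docs IDs of bits_per_id bits each."""
    total_bits = num_docs * bits_per_id
    return total_bits // 8


if __name__ == "__main__":
    docs = 10_000_000
    for bits in (32, 64):  # 32-bit int IDs vs 64-bit long IDs
        size = id_list_size_bytes(docs, bits)
        print(f"{bits}-bit IDs: {size:,} bytes = "
              f"{size / BYTES_PER_MB:.1f} MB = {size / BYTES_PER_MIB:.1f} MiB")
```

For 10,000,000 documents it reports 40,000,000 bytes (about 38.1 MiB) for 32-bit IDs and 80,000,000 bytes (about 76.3 MiB) for 64-bit IDs.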

Also, does SOLR fail gracefully when an OOM occurs (e.g. the cache fails but
the query still succeeds)?

Thanks!

Josh

On Thu, Aug 25, 2011 at 2:55 PM, Erick Erickson wrote:

> The pitfall of filter queries is also their strength. The results will be
> cached and reused if possible. This will take some memory,
> of course. Depending on how big your index is, this could
> be quite a lot.
>
> Yet another time/space tradeoff. But yeah, use filter queries
> until you have OOMs, then get more memory...
>
> Best
> Erick
>
> On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness wrote:
> > Shawn -
> >
> > Thanks for your reply. Given that my application is mainly used for
> > faceted search, would the following types of queries make sense, or are
> > there other pitfalls to consider?
> >
> > q=*:*&fq=someField:someValue&fq=anotherField:anotherValue
> >
> > Thanks!
> >
> > Josh
> >
> > On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey  wrote:
> >
> >> On 8/24/2011 2:02 PM, Joshua Harness wrote:
> >>
> >>> I've done some basic query performance testing on my SOLR instance,
> >>> which allows users to search via a faceted search interface. As such,
> >>> document relevancy is less important to me, since I am performing
> >>> exact-match searching. Filter queries have yielded remarkably better
> >>> performance than a plain query. However, I'm suspicious of statements
> >>> like 'always use filter queries since they are so much faster'. In my
> >>> experience, things are never so straightforward. Can anybody provide
> >>> any further guidance? What are the pitfalls of relying heavily on
> >>> filter queries? When would one want to use plain vanilla SOLR queries
> >>> as opposed to filter queries?
> >>>
> >>
> >> Completely separate from any performance consideration, the key to their
> >> usage lies in their name: They are filters. They are particularly useful
> >> in a faceted situation, because you can have more than one of them, and
> >> the overall result is the intersection (AND) of them all.
> >>
> >> When someone tells the interface to restrict their search by a facet, you
> >> can simply add a filter query with the field:value relating to that facet
> >> and reissue the query. If they decide to remove that restriction, you
> >> just have to remove the filter query. You don't have to try to combine
> >> the various pieces in the query, which means you'll have much less hassle
> >> with parentheses.
> >>
> >> If you need a union (OR) operation with your filters, you'll have to use
> >> a more complex construction within a single filter query, or not use
> >> them at all.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
>


Re: Query vs Filter Query Usage

2011-08-24 Thread Joshua Harness
Shawn -

Thanks for your reply. Given that my application is mainly used for
faceted search, would the following types of queries make sense, or are there
other pitfalls to consider?

q=*:*&fq=someField:someValue&fq=anotherField:anotherValue
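One way to assemble a request like the one above from code is sketched below with Python's standard `urllib.parse`. The endpoint `http://localhost:8983/solr/select` is a placeholder assumption (substitute your own host, port, and core), and the field names are the same placeholders used in the query. Parameters are passed as a list of pairs so that both `fq` entries are kept, rather than one overwriting the other as would happen with a plain dict.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Placeholder endpoint (assumption): adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_query(filters):
    """Build a match-all request narrowed by one fq per (field, value) pair.

    Each fq is a separate parameter; Solr intersects (ANDs) their results.
    Values are assumed to need no Lucene escaping.
    """
    params = [("q", "*:*")]
    params += [("fq", f"{field}:{value}") for field, value in filters]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_facet_query([("someField", "someValue"),
                             ("anotherField", "anotherValue")])
    print(url)
    # Decode the query string again to confirm both fq entries survived.
    print(parse_qsl(urlsplit(url).query))
    # [('q', '*:*'), ('fq', 'someField:someValue'), ('fq', 'anotherField:anotherValue')]
```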

Thanks!

Josh

On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey  wrote:

> On 8/24/2011 2:02 PM, Joshua Harness wrote:
>
>> I've done some basic query performance testing on my SOLR instance,
>> which allows users to search via a faceted search interface. As such,
>> document relevancy is less important to me, since I am performing
>> exact-match searching. Filter queries have yielded remarkably better
>> performance than a plain query. However, I'm suspicious of statements
>> like 'always use filter queries since they are so much faster'. In my
>> experience, things are never so straightforward. Can anybody provide any
>> further guidance? What are the pitfalls of relying heavily on filter
>> queries? When would one want to use plain vanilla SOLR queries as
>> opposed to filter queries?
>>
>
> Completely separate from any performance consideration, the key to their
> usage lies in their name:  They are filters.  They are particularly useful
> in a faceted situation, because you can have more than one of them, and the
> overall result is the intersection (AND) of them all.
>
> When someone tells the interface to restrict their search by a facet, you
> can simply add a filter query with the field:value relating to that facet
> and reissue the query.  If they decide to remove that restriction, you just
> have to remove the filter query. You don't have to try to combine the
> various pieces in the query, which means you'll have much less hassle with
> parentheses.
>
> If you need a union (OR) operation with your filters, you'll have to use
> a more complex construction within a single filter query, or not use them
> at all.
>
> Thanks,
> Shawn
>
>
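The add-a-filter / remove-a-filter workflow described above, along with an OR inside a single filter query, might be sketched as follows. `FacetFilters` is a hypothetical client-side helper, not part of Solr, and it assumes facet values are simple tokens that need no escaping. Each field becomes its own `fq`, so Solr ANDs the fields together, while several values selected for the same field are ORed inside that one `fq` using Lucene's `field:(a OR b)` syntax.

```python
class FacetFilters:
    """Hypothetical client-side record of the facet values a user selected.

    Values are assumed to be simple tokens needing no Lucene escaping.
    """

    def __init__(self):
        self._selected = {}  # field name -> set of selected values

    def add(self, field, value):
        """User restricts the search by a facet: remember the value."""
        self._selected.setdefault(field, set()).add(value)

    def remove(self, field, value):
        """User removes a restriction: forget the value (no-op if absent)."""
        values = self._selected.get(field)
        if values is None:
            return
        values.discard(value)
        if not values:  # last value removed: drop that field's filter
            del self._selected[field]

    def fq_params(self):
        """One fq string per field; values within a field are ORed."""
        params = []
        for field in sorted(self._selected):
            values = sorted(self._selected[field])
            if len(values) == 1:
                params.append(f"{field}:{values[0]}")
            else:
                params.append(f"{field}:({' OR '.join(values)})")
        return params


if __name__ == "__main__":
    facets = FacetFilters()
    facets.add("color", "red")
    facets.add("size", "large")
    facets.add("color", "blue")
    print(facets.fq_params())  # ['color:(blue OR red)', 'size:large']
    facets.remove("color", "blue")
    facets.remove("color", "red")
    print(facets.fq_params())  # ['size:large']
```

Sorting fields and values only makes the generated filter strings stable, which keeps identical selections producing identical `fq` text.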


Query vs Filter Query Usage

2011-08-24 Thread Joshua Harness
All -

I apologize if this question has been asked before; I couldn't seem to
find a straightforward answer by researching it on Google and Stack Overflow.
I am trying to understand when I should use filter queries vs. plain vanilla
queries. Here's what I understand:

* Filter queries can be much faster since, as of SOLR 1.4, they are
parallelized with the main query and are cached in the filter cache. This is
in contrast with SOLR versions before 1.4, where the filter query was run on
the doc set after the main query returned, essentially an O(n) pass over the
main query's results.
* Filter queries do not affect document score. Use them if one doesn't want
the filter query to impact the score.
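The two points above can be illustrated with a toy model; it is a sketch, not Solr's implementation. It assumes a hypothetical in-memory index mapping each `field:value` string to a set of doc IDs, plus a dict of main-query scores. It shows a filter's doc set being computed once, kept in a small LRU cache, and reused, and it shows filters only removing documents while every returned score comes unchanged from the main query.

```python
from collections import OrderedDict


class ToyFilterCache:
    """Tiny LRU cache from an fq string to its set of matching doc IDs."""

    def __init__(self, index, max_entries=128):
        self._index = index  # hypothetical: "field:value" -> set of doc IDs
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self.misses = 0

    def doc_set(self, fq):
        if fq in self._entries:
            self._entries.move_to_end(fq)  # mark as recently used
            return self._entries[fq]
        self.misses += 1  # computed only the first time this fq is seen
        docs = frozenset(self._index.get(fq, ()))
        self._entries[fq] = docs
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return docs


def search(main_scores, fqs, cache):
    """main_scores: doc ID -> score from the main query (q).

    Filters are intersected and only restrict which docs are returned;
    they never change a score.
    """
    allowed = None
    for fq in fqs:
        docs = cache.doc_set(fq)
        allowed = docs if allowed is None else allowed & docs
    hits = {doc: score for doc, score in main_scores.items()
            if allowed is None or doc in allowed}
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = {"type:book": {1, 2, 3}, "lang:en": {2, 3, 4}}
    cache = ToyFilterCache(index)
    scores = {1: 0.9, 2: 0.5, 3: 0.7, 4: 0.2}
    print(search(scores, ["type:book", "lang:en"], cache))  # [(3, 0.7), (2, 0.5)]
    print(search(scores, ["type:book"], cache))  # [(1, 0.9), (3, 0.7), (2, 0.5)]
    print(cache.misses)  # 2
```

In this model the second search finds `type:book` already cached, so only two cache misses occur across both calls, and each returned score is the one supplied by the main query.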

I've done some basic query performance testing on my SOLR instance,
which allows users to search via a faceted search interface. As such,
document relevancy is less important to me, since I am performing exact-match
searching. Filter queries have yielded remarkably better performance than a
plain query. However, I'm suspicious of statements like 'always use filter
queries since they are so much faster'. In my experience, things are never so
straightforward. Can anybody provide any further guidance? What are the
pitfalls of relying heavily on filter queries? When would one want to use
plain vanilla SOLR queries as opposed to filter queries?

Thanks!

Josh


SOLR Support for Lucene Nested Documents

2011-08-04 Thread Joshua Harness
I noticed that Lucene supports 'Nested Documents'. However, I
couldn't find any mention of this feature within SOLR. Does anybody know
how to leverage this Lucene feature through SOLR?

Thanks!

Josh Harness


SOLR Support for Span Queries

2011-08-04 Thread Joshua Harness
How does one issue span queries in SOLR (SpanQuery, SpanNearQuery, etc.)?
I've done a bit of research and it seems that these are not supported. It
would seem that I need to implement a QParserPlugin to accomplish this. Is
this the correct path? Surely this has been done before. Does anybody have
links to examples? I had trouble finding anything.

Thanks!

Josh Harness