Re: retrieving large number of docs

Erick Erickson Wed, 03 Jun 2015 11:13:18 -0700

Are these indexes on different machines? Because if they're in the
same JVM, you might be able to use cross-core joins. Be aware, though,
that joining on high-cardinality fields (which, by definition, docID
probably is) is where pseudo joins perform worst.
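
For reference, a minimal sketch of what a cross-core join filter could look like, built here in Python so the parameters are easy to read. The core name "core1" and the inner query "category:news" are hypothetical placeholders; "id" and "title" are taken from the original question. The {!join from=... to=... fromIndex=...} form is Solr's join query parser syntax; whether it meets the latency target on a high-cardinality field is something you would have to measure.

```python
from urllib.parse import urlencode


def build_join_params(from_core, from_field, to_field, from_query,
                      main_query="*:*", facet_field=None):
    """Build request parameters for a cross-core join filter.

    from_core  : name of the core the IDs come from (hypothetical here)
    from_field : field in from_core holding the join key
    to_field   : field in the queried core holding the same key
    from_query : query run against from_core to select the IDs
    """
    # Solr join query parser: {!join from=F to=T fromIndex=CORE}QUERY
    fq = ("{!join from=" + from_field +
          " to=" + to_field +
          " fromIndex=" + from_core + "}" + from_query)
    params = [("q", main_query), ("fq", fq)]
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]
    return params


# "core1" and "category:news" are assumptions for illustration only.
params = build_join_params("core1", "id", "id", "category:news",
                           facet_field="title")
print(urlencode(params))
```

The resulting parameters would be sent to the select handler of the second core, so the facets are computed over the joined result set rather than over a page of IDs.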


Have you considered flattening the data and including whatever
information you have in your "from" index in your main index? Because
< 100ms response is probably going to be tough if you have to have
two indexes/cores.

Best,
Erick

On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <joels...@gmail.com> wrote:
> You may have to do something custom to meet your needs.
>
> 10,000 DocIDs is not huge, but your latency requirements are pretty low.
>
> Are your DocIDs by any chance integers? This can make custom PostFilters
> run much faster.
>
> You should also be aware of the Streaming API in Solr 5.1 which will give
> you fast Map/Reduce approaches (
> http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <pey...@robustlinks.com> wrote:
>
>> Hey Joel
>>
>> see below
>>
>> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>> > A few questions for you:
>> >
>> > How large can the list of filtering IDs be?
>> >
>>
>> 10k
>>
>>
>> >
>> > What's your expectation on latency?
>> >
>>
>> 10 ms < latency < 100 ms
>>
>>
>> >
>> > What version of Solr are you using?
>> >
>>
>> 5.0.0
>>
>>
>> >
>> > SolrCloud or not?
>> >
>>
>> not
>>
>>
>>
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <pey...@robustlinks.com>
>> > wrote:
>> >
>> > > Hi
>> > >
>> > > I have a set of document IDs from one core and I want to query another
>> > > core using the IDs retrieved from the first core. The constraint is that
>> > > the size of the doc ID set can be very large. I want to:
>> > >
>> > > 1) retrieve these docs from the 2nd index
>> > > 2) facet on the results
>> > >
>> > > I can think of 3 solutions:
>> > >
>> > > 1) boolean query
>> > > 2) terms fq
>> > > 3) use a DB rather than Solr
>> > >
>> > > I am trying to keep latencies down, so I prefer not to use (3). The
>> > > problem with (1) is that maxBooleanClauses is hardwired and I am not sure
>> > > when I will hit the exception. Option (2) also seems to hit limits, so if
>> > > I do
>> > >
>> > > select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}<LONG_LIST_OF_IDS>
>> > >
>> > > Solr just goes blank. I have tried adding cost=200 to try to run the
>> > > query first, fq={!terms f=id cost=200}, but still no good. Paging on doc
>> > > IDs could be a solution, but the problem then is that the faceting
>> > > results correspond to the paged IDs and not the global set.
>> > >
>> > > My filter cache spec is as follows
>> > >
>> > >   <filterCache class="solr.FastLRUCache"
>> > >                  size="1000000"
>> > >                  initialSize="1000000"
>> > >                  autowarmCount="100000"/>
>> > >
>> > >
>> > > What would be the best way for me to solve this problem?
>> > >
>> > > thank you
>> > >
>> >
>>
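
Regarding the terms filter in the original question: one thing that may be worth ruling out is the ID list being sent in the URL of a GET request, since 10k IDs can exceed URL or header size limits in the servlet container. This is only a guess at the cause of the blank response. Below is a minimal sketch, assuming the same parameters are sent as a form-encoded POST body instead. The URL (host, port, and core name "core2") is a hypothetical placeholder, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_terms_request(base_url, ids, field="id", facet_field="title"):
    """Build a POST request carrying a {!terms} filter over a long ID list."""
    # The terms query parser takes a comma-separated list by default.
    terms_fq = "{!terms f=" + field + "}" + ",".join(ids)
    body = urlencode([
        ("q", "*:*"),
        ("fl", "*"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("fq", terms_fq),
        ("wt", "json"),
    ]).encode("utf-8")
    # Supplying data makes urllib issue a POST, so the IDs travel in the
    # request body rather than in the URL.
    return Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical IDs and URL, for illustration only.
ids = [str(i) for i in range(10000)]
req = build_terms_request("http://localhost:8983/solr/core2/select", ids)
print(req.get_method())
```

Passing the returned object to urllib.request.urlopen would send it. Because the whole ID set goes into a single filter, the facet counts would cover the global set rather than one page of IDs.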
