Re: retrieving large number of docs

Robust Links Wed, 03 Jun 2015 12:33:18 -0700

Hi Erick

They are on the same JVM. I had already tried the core join strategy, but
that doesn't solve the faceting problem, i.e. if I have 2 cores, core0 and
core1, and I run this query on core0


/select?q=<QUERY>&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

it has 2 problems:
1) I need to specify the docIDs with the fq (so I'm back to the same
fq={!terms} problem), and
2) faceting doesn't work.


Flattening the data is not possible due to security reasons.

Am I using join correctly?
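For reference, here is a minimal Python sketch of how a cross-core join filter like the one above is usually assembled. In Solr's join syntax the text after the closing `}` is a query run against `fromIndex`. Solr collects the `from` field values of the docs that match it there, then keeps docs in the queried core whose `to` field holds one of those values. If the ID set from core1 can be expressed as a query, putting that query inside the join avoids passing the IDs explicitly. The helper name `build_join_params` and the inner query `owner:alice` are hypothetical. `id1`, `id2`, `core1` and `tag` come from the query above.

```python
from urllib.parse import urlencode


def build_join_params(main_query, inner_query, facet_field,
                      from_field="id1", to_field="id2", from_index="core1"):
    """Return /select parameters for a cross-core join plus faceting.

    Solr runs `inner_query` against `from_index`, gathers the `from_field`
    values of the matching docs, and keeps docs in the queried core whose
    `to_field` holds one of those values.
    """
    join_fq = "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)
    return [
        ("q", main_query),
        ("fq", join_fq),  # urlencode() puts the '&' between q and fq
        ("facet", "true"),
        ("facet.field", facet_field),
    ]


# Hypothetical inner query selecting the core1 docs whose IDs we want.
params = build_join_params("*:*", "owner:alice", "tag")
print("/select?" + urlencode(params))
```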

Thank you, Erick

Peyman

On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Are these indexes on different machines? Because if they're in the
> same JVM, you might be able to use cross-core joins. Be aware, though,
> that joining on high-cardinality fields (which, by definition, docID
> probably is) is where pseudo joins perform worst.
>
> Have you considered flattening the data and including whatever
> information you have in your "from" index in your main index? Getting a
> response under 100ms is probably going to be tough if you have to query
> two indexes/cores.
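To make "flattening" concrete: at indexing time you copy the fields you need from the "from" index onto each main-index document, so a single query and a single facet pass are enough. The following is a minimal Python sketch over in-memory dicts. The function name and sample documents are hypothetical. The join keys mirror `{!join from=id1 to=id2}`. It assumes the main docs do not already carry the copied fields.

```python
def flatten(main_docs, from_docs, from_key="id1", to_key="id2", fields=("tag",)):
    """Copy selected fields from 'from' docs onto the main docs they join to.

    A main doc joins to every 'from' doc whose from_key equals its to_key.
    Copied fields become multi-valued lists. Assumes main docs do not
    already contain those fields (existing values would be overwritten).
    """
    by_key = {}
    for doc in from_docs:
        by_key.setdefault(doc[from_key], []).append(doc)

    flattened = []
    for doc in main_docs:
        merged = dict(doc)  # leave the input documents untouched
        matches = by_key.get(doc.get(to_key), [])
        for field in fields:
            values = [m[field] for m in matches if field in m]
            if values:
                merged[field] = values
        flattened.append(merged)
    return flattened
```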
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <joels...@gmail.com>
> wrote:
> > You may have to do something custom to meet your needs.
> >
> > 10,000 DocIDs is not huge, but your latency requirements are pretty low.
> >
> > Are your DocIDs by any chance integers? This can make custom PostFilters
> > run much faster.
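In Solr, a post filter is a Java class implementing `PostFilter`. Its `DelegatingCollector` sees each document only after the main query and cheaper filters have matched it, and it is used only when the filter is marked `cache=false` with a `cost` of 100 or more. Integer IDs help because the per-document check can be a binary search in a sorted primitive array instead of a string-term lookup. The Python sketch below simulates only that collect-time membership check. It is not Solr's API, and the class name is hypothetical.

```python
from array import array
from bisect import bisect_left


class IntIdPostFilter:
    """Hypothetical stand-in for a collect-time filter on integer IDs."""

    def __init__(self, allowed_ids):
        # Sorted, de-duplicated 64-bit ints in one compact array.
        self._ids = array("q", sorted(set(allowed_ids)))

    def accepts(self, doc_id):
        # Binary search: O(log n) per document, no string comparisons.
        i = bisect_left(self._ids, doc_id)
        return i < len(self._ids) and self._ids[i] == doc_id

    def collect(self, matching_docs):
        # matching_docs: (doc_id, fields) pairs that already satisfied the
        # main query; only those whose ID is in the allowed set pass through.
        for doc_id, fields in matching_docs:
            if self.accepts(doc_id):
                yield doc_id, fields
```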
> >
> > You should also be aware of the Streaming API in Solr 5.1, which will give
> > you fast Map/Reduce approaches
> > (http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <pey...@robustlinks.com>
> > wrote:
> >
> >> Hey Joel
> >>
> >> see below
> >>
> >> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <joels...@gmail.com>
> >> wrote:
> >>
> >> > A few questions for you:
> >> >
> >> > How large can the list of filtering IDs be?
> >> >
> >>
> >> >> 10k
> >>
> >>
> >> >
> >> > What's your expectation on latency?
> >> >
> >>
> >> 10> latency <100
> >>
> >>
> >> >
> >> > What version of Solr are you using?
> >> >
> >>
> >> 5.0.0
> >>
> >>
> >> >
> >> > SolrCloud or not?
> >> >
> >>
> >> not
> >>
> >>
> >>
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <pey...@robustlinks.com>
> >> > wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > I have a set of document IDs from one core and I want to query another
> >> > > core using the IDs retrieved from the first core... the constraint is
> >> > > that the size of the doc ID set can be very large. I want to:
> >> > >
> >> > > 1) retrieve these docs from the 2nd index
> >> > > 2) facet on the results
> >> > >
> >> > > I can think of 3 solutions:
> >> > >
> >> > > 1) boolean query
> >> > > 2) terms fq
> >> > > 3) use a DB rather than Solr
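Options (1) and (2) differ mainly in how the filter string is built. The following is a Python sketch. It assumes the IDs contain no characters that need query-syntax escaping. For the terms form, it also assumes the IDs contain no commas, because the `terms` parser splits on `,` by default. The function names are hypothetical. The 1024 limit is Solr's default `<maxBooleanClauses>`, which may have been changed in solrconfig.xml.

```python
# Solr's default <maxBooleanClauses>; assumed unchanged in solrconfig.xml.
MAX_BOOLEAN_CLAUSES = 1024


def boolean_fq(field, ids):
    """Option (1): one OR clause per ID, subject to maxBooleanClauses."""
    if len(ids) > MAX_BOOLEAN_CLAUSES:
        raise ValueError("%d IDs exceed maxBooleanClauses=%d"
                         % (len(ids), MAX_BOOLEAN_CLAUSES))
    return "%s:(%s)" % (field, " OR ".join(ids))


def terms_fq(field, ids):
    """Option (2): the terms parser takes a comma-separated list of values."""
    return "{!terms f=%s}%s" % (field, ",".join(ids))
```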
> >> > >
> >> > > I am trying to keep latencies down, so I prefer not to use (3). The
> >> > > problem with (1) is that maxBooleanClauses is hardwired and I am not sure
> >> > > when I will hit the exception. Option (2) also seems to hit limits... so
> >> > > if I do
> >> > >
> >> > > select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}<LONG_LIST_OF_IDS>
> >> > >
> >> > > Solr just goes blank. I have tried adding cost=200 to try to run the
> >> > > query first, fq={!terms f=id cost=200}, but still no good. Paging on doc
> >> > > IDs could be a solution, but the problem then is that the faceting
> >> > > results correspond to the paged IDs and not the global set.
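Two things may help here. First, a GET request with 10k IDs in the URL can exceed the servlet container's request-header size limit, which is one plausible cause of the blank response. Solr also accepts the same parameters as a form-encoded POST body. Second, paging on IDs does not have to lose the global facet counts. Each document has exactly one `id`, so disjoint ID batches put every matching document in exactly one batch. Summing the per-batch counts, with `facet.limit=-1` so no values are truncated, therefore gives the counts for the whole set. Below is a minimal stdlib-only Python sketch of both ideas. The helper names and batch size are assumptions, and the HTTP call is omitted. `merge_facet_counts` expects Solr's default flat JSON facet arrays (`[value, count, value, count, ...]`).

```python
from collections import Counter
from urllib.parse import urlencode


def id_batches(ids, batch_size=1000):
    """Split the ID list into disjoint, consecutive batches."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def batch_request_body(ids, facet_field="title", id_field="id"):
    """Form-encoded body for POST /select with one terms filter per batch.

    rows=0 because only facet counts are wanted; facet.limit=-1 so that no
    facet values are dropped before the client merges the batches.
    """
    return urlencode([
        ("q", "*:*"),
        ("fq", "{!terms f=%s}%s" % (id_field, ",".join(ids))),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", "-1"),
        ("wt", "json"),
    ])


def merge_facet_counts(flat_lists):
    """Sum Solr's flat [value, count, value, count, ...] facet arrays."""
    total = Counter()
    for flat in flat_lists:
        for value, count in zip(flat[0::2], flat[1::2]):
            total[value] += count
    return total
```

POST each body to `/select` with `Content-Type: application/x-www-form-urlencoded`. Then pass the `facet_counts.facet_fields.title` arrays from the responses to `merge_facet_counts`. This costs one request per batch, which may or may not fit the latency budget.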
> >> > >
> >> > > My filter cache spec is as follows
> >> > >
> >> > >   <filterCache class="solr.FastLRUCache"
> >> > >                  size="1000000"
> >> > >                  initialSize="1000000"
> >> > >                  autowarmCount="100000"/>
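The filterCache settings above are worth a quick sanity check. A cached filter with a large document set is stored as a bitset with one bit per document in the index, held in 64-bit words. Smaller sets are stored as sorted int arrays. So in the worst case, `size` entries cost roughly `size × maxDoc / 8` bytes, before object overhead. A large `autowarmCount` also means up to that many filters are re-run whenever a new searcher opens. Below is a back-of-the-envelope Python sketch. The 10 million `maxDoc` is a hypothetical index size, not a figure from this thread.

```python
def bitset_bytes(max_doc):
    """Approximate size of one bitset-backed filter: 64 docs per long word."""
    words = (max_doc + 63) // 64
    return words * 8


def filter_cache_worst_case_bytes(cache_size, max_doc):
    """Upper bound if every cached entry is stored as a full bitset."""
    return cache_size * bitset_bytes(max_doc)


MAX_DOC = 10_000_000  # hypothetical index size
per_entry = bitset_bytes(MAX_DOC)
total = filter_cache_worst_case_bytes(1_000_000, MAX_DOC)
print("per entry: %.2f MB" % (per_entry / 1e6))           # per entry: 1.25 MB
print("size=1000000 worst case: %.2f TB" % (total / 1e12))  # 1.25 TB
```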
> >> > >
> >> > >
> >> > > What would be the best way for me to solve this problem?
> >> > >
> >> > > Thank you
> >> > >
> >> >
> >>
>
