Hi Erick,

They are on the same JVM. I had already tried the cross-core join strategy, but that doesn't solve the faceting problem. That is, if I have 2 cores, core0 and core1, and I run this query on core0:

/select?q=<QUERY>&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

it has 2 problems:

1) I need to specify the docIDs with the fq (so back to the same fq={!terms} problem), and
2) faceting doesn't work.

Flattening the data is not possible due to security reasons. Am I using join correctly?

Thank you, Erick.

Peyman

On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Are these indexes on different machines? Because if they're in the
> same JVM, you might be able to use cross-core joins. Be aware, though,
> that joining on high-cardinality fields (which, by definition, docID
> probably is) is where pseudo joins perform worst.
>
> Have you considered flattening the data and including whatever
> information you have in your "from" index in your main index? Because
> < 100ms response is probably not going to be tough if you have to have
> two indexes/cores.
>
> Best,
> Erick
>
> On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <joels...@gmail.com> wrote:
> > You may have to do something custom to meet your needs.
> >
> > 10,000 DocIDs is not huge but your latency requirements are pretty low.
> >
> > Are your DocIDs by any chance integers? This can make custom PostFilters
> > run much faster.
> >
> > You should also be aware of the Streaming API in Solr 5.1 which will give
> > you fast Map/Reduce approaches (
> > http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
> > ).
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <pey...@robustlinks.com> wrote:
> >
> >> Hey Joel
> >>
> >> see below
> >>
> >> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <joels...@gmail.com> wrote:
> >>
> >> > A few questions for you:
> >> >
> >> > How large can the list of filtering IDs be?
> >> >
> >>
> >> >> 10k
> >>
> >> >
> >> > What's your expectation on latency?
> >> >
> >>
> >> 10> latency <100
> >>
> >> >
> >> > What version of Solr are you using?
> >> >
> >>
> >> 5.0.0
> >>
> >> >
> >> > SolrCloud or not?
> >> >
> >>
> >> not
> >>
> >> >
> >> > Joel Bernstein
> >> > http://joelsolr.blogspot.com/
> >> >
> >> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <pey...@robustlinks.com> wrote:
> >> >
> >> > > Hi
> >> > >
> >> > > I have a set of document IDs from one core and I want to query another
> >> > > core using the IDs retrieved from the first core. The constraint is that
> >> > > the size of the doc ID set can be very large. I want to:
> >> > >
> >> > > 1) retrieve these docs from the 2nd index
> >> > > 2) facet on the results
> >> > >
> >> > > I can think of 3 solutions:
> >> > >
> >> > > 1) boolean query
> >> > > 2) terms fq
> >> > > 3) use a DB rather than Solr
> >> > >
> >> > > I am trying to keep latencies down so prefer to not use (3). The problem
> >> > > with (1) is maxBooleanClauses is hardwired and I am not sure when I will
> >> > > hit the exception. Option (2) seems to also hit limits, so if I do
> >> > >
> >> > > select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}<LONG_LIST_OF_IDS>
> >> > >
> >> > > Solr just goes blank. I have tried adding cost=200 to try to run the
> >> > > query first, fq={!terms f=id cost=200}, but still no good. Paging on doc
> >> > > IDs could be a solution but the problem then is that the faceting results
> >> > > correspond to the paged IDs and not the global set.
> >> > >
> >> > > My filter cache spec is as follows
> >> > >
> >> > > <filterCache class="solr.FastLRUCache"
> >> > >              size="1000000"
> >> > >              initialSize="1000000"
> >> > >              autowarmCount="100000"/>
> >> > >
> >> > > What would be the best way for me to solve this problem?
> >> > >
> >> > > thank you
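
A minimal sketch of the two request shapes discussed in this thread, offered as an illustration rather than a confirmed fix. It assumes the join filter is given a "from"-side query after the closing brace (the field `owner` and the value `alice` are hypothetical placeholders, not from the thread), and that a very long {!terms} filter is sent as a form-encoded POST body rather than in the URL. Whether URL length is what makes Solr "go blank" here is an assumption. The helper names are made up for this example.

```python
from urllib.parse import urlencode


def join_facet_params(user_query, from_query, facet_field="tag"):
    """Build /select parameters for a cross-core join with faceting on core0."""
    return {
        "q": user_query,
        # The text after the closing brace is the query evaluated on core1;
        # its matches are mapped from core1's id1 field onto core0's id2 field.
        "fq": "{!join from=id1 to=id2 fromIndex=core1}" + from_query,
        "facet": "true",
        "facet.field": facet_field,
    }


def terms_filter_body(ids, field="id", facet_field="title"):
    """Form-encode a {!terms} filter so it can be sent as a POST body."""
    params = {
        "q": "*:*",
        "fq": "{!terms f=%s}%s" % (field, ",".join(str(i) for i in ids)),
        "facet": "true",
        "facet.field": facet_field,
    }
    return urlencode(params).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical from-side query; replace with whatever selects the IDs on core1.
    print(urlencode(join_facet_params("*:*", "owner:alice")))
    # The body below would be POSTed to /select with
    # Content-Type: application/x-www-form-urlencoded.
    print(len(terms_filter_body(range(10000))), "bytes in POST body")
```

With a from-side query in place, the ID list never has to appear in the request to core0, which is the point of using the join instead of {!terms}.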