Re: retrieving large number of docs

Robust Links Thu, 04 Jun 2015 08:15:50 -0700

That worked, but I seem unable to:

1) run phrase queries, e.g.

*core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}titleNormalized:"*text pdf*"&facet=true&facet.field=tags

or 2) run filters on core0, e.g.

*core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}titleNormalized:"*text pdf*"&fq=user:76&facet=true&facet.field=tags
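
For reference, here is a minimal sketch of how such a request could be assembled with the local-params part URL-encoded, since the braces, quotes and spaces of a phrase query are easy to mangle when pasted raw into a URL. It assumes that "user" is a field of core0; in that case the filter probably has to go inside the joined query, because an fq on this request is evaluated against core1, the core serving /select. The class and method names (JoinUrlSketch, buildJoinSelect) are made up for illustration and are not part of Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class JoinUrlSketch {

    // Percent-encode one query-string value (spaces become '+').
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build a /select request on toCore whose main query is a join
    // from fromCore; innerQuery is evaluated against fromCore.
    static String buildJoinSelect(String toCore, String fromCore,
                                  String innerQuery, String facetField) {
        String q = "{!join from=id to=id fromIndex=" + fromCore + "}" + innerQuery;
        return toCore + "/select?fl=title"
                + "&q=" + enc(q)
                + "&facet=true&facet.field=" + enc(facetField);
    }

    public static void main(String[] args) {
        // Assumption: both clauses refer to core0 fields, so they sit
        // inside the joined query instead of in a separate fq.
        String inner = "+titleNormalized:\"text pdf\" +user:76";
        System.out.println(buildJoinSelect("core1", "core0", inner, "tags"));
    }
}
```

Whether the phrase then matches still depends on how titleNormalized is analyzed in core0, which this sketch does not address.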

I am thinking a better design is to build a custom SearchComponent on core0
and add it as a last-component to the default search handler on core0
(both cores are in the same JVM). The custom, core-aware component will
access core1 as follows:

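// inform() is called on core0's component

public void inform(SolrCore core) {
  // getCore() increments core1's reference count, so core1.close() must be
  // called once the component is done with it
  SolrCore core1 = core.getCoreDescriptor().getCoreContainer().getCore("core1");
  // getNewestSearcher() returns a RefCounted<SolrIndexSearcher>; keep the
  // RefCounted and decref() it when finished
  SolrIndexSearcher core1Searcher = core1.getNewestSearcher(false).get();
}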

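Then I intercept the results of the default search handler in process():

public void process(ResponseBuilder rb) throws IOException {
   SolrIndexSearcher core0 = rb.req.getSearcher();
   SolrParams params = rb.req.getParams();
   // docList holds the internal Lucene doc ids of the returned matches
   Iterator<Integer> docIt = rb.getResults().docList.iterator();
   String tagname;
   String id;
   while (docIt.hasNext()) {
     Integer docID = docIt.next();
     // read the stored "id" field of the core0 match
     id = core0.doc(docID).get("id");
     // pseudocode: look up the core1 doc with the same id and read its tag
     tagname = doc.search(id);
     .....
     // do faceting on the docs
   }
}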
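
The "do faceting on the docs" step above could be sketched roughly as follows. This is plain Java with no Solr calls: a Map stands in for the core1 lookup, and the names (TagFacetSketch, countTags, tagsById) are hypothetical. One caveat I am fairly sure of but have not verified against 5.0.0: docList only holds the page of results being returned (rows), so counting over the full match set would mean iterating the DocSet rather than the DocList.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagFacetSketch {

    // Count how often each tag occurs among the matched ids.
    // tagsById stands in for "fetch the core1 doc with this id and
    // read its tag field"; ids with no core1 doc contribute nothing.
    static Map<String, Integer> countTags(List<String> matchedIds,
                                          Map<String, List<String>> tagsById) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : matchedIds) {
            for (String tag : tagsById.getOrDefault(id, List.of())) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tagsById = Map.of(
                "a", List.of("pdf", "text"),
                "b", List.of("pdf"));
        System.out.println(countTags(List.of("a", "b", "x"), tagsById));
    }
}
```

With >>10K matches, one stored-field lookup per id is likely to be the expensive part, so this only illustrates the counting, not a way to meet the latency target.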





On Thu, Jun 4, 2015 at 10:29 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Hi Rob,
> according to your use case you have to:
>
> Call /select on *core1* in this way:
>
> *core1*/select?fl=title&q={!join from=id to=id fromIndex=*core0*}titleNormalized:pdf&facet=true&facet.field=tags
>
> Hope this clarifies your problem.
>
> Cheers
>
> 2015-06-04 15:00 GMT+01:00 Robust Links <pey...@robustlinks.com>:
>
> > my requirement is to join core1 onto core0. restating the requirements
> > again. I have 2 cores
> >
> > core0
> > --------
> > field:id
> > field: text
> >
> > core1
> > --------
> > field:id
> > field tag
> >
> >
> > I want to
> >
> > 1) query the text field of core0, together with filters,
> > 2) use the {id} of the matches (which can be >>10K) to retrieve the docs in
> > core1 with the same id, and
> > 3) facet on tags in core1
> >
> > so my /select is to run on core0 and facet on the tag field of core1
> >
> > thank you Alessandro
> >
> >
> > On Thu, Jun 4, 2015 at 9:28 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Let's try to make some points clear:
> > >
> > > Index TO: the one you are using to call the select request handler
> > > Index FROM: Tags
> > > Is titleNormalized present in the "Tags" index? Because that is where
> > > the query will run.
> > >
> > > The documents in Tags satisfying the query will be joined with the
> > > TO index.
> > > The resulting documents can be filtered and faceted.
> > > I have used this approach many times,
> > > and I can tell you it works this way.
> > > Maybe you misunderstood the Join feature, or I misunderstood your
> > > requirement.
> > >
> > > Cheers
> > >
> > > 2015-06-04 13:27 GMT+01:00 Robust Links <pey...@robustlinks.com>:
> > >
> > > > Try it for yourself and see if it works, Alessandro. Not only can't I
> > > > get facets, but I even get field errors when I run such join queries:
> > > >
> > > > select?fl=title&q={!join from=id to=id fromIndex=Tags}titleNormalized:pdf
> > > >
> > > > <lst name="error">
> > > > <str name="msg">undefined field titleNormalized</str>
> > > > <int name="code">400</int>
> > > > </lst>
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Jun 4, 2015 at 5:19 AM, Alessandro Benedetti <
> > > > benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Hi Rob,
> > > > > Reading your use case, I cannot understand why the Query Time Join
> > > > > is not a fit for you!
> > > > > The documents returned by the Query Time Join will be from core1, so
> > > > > faceting and filter querying on that core would definitely be
> > > > > possible!
> > > > > Honestly, I cannot see your problem!
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-06-04 1:47 GMT+01:00 Robust Links <pey...@robustlinks.com>:
> > > > >
> > > > > > That doesn't work either, and even if it did, joining is not going
> > > > > > to be a solution since I can't query one core and facet on the
> > > > > > result of the other. To sum up, my problem is
> > > > > >
> > > > > > core0
> > > > > > --------
> > > > > > field:id
> > > > > > field: text
> > > > > >
> > > > > > core1
> > > > > > --------
> > > > > > field:id
> > > > > > field tag
> > > > > >
> > > > > >
> > > > > > I want to
> > > > > >
> > > > > > 1) query the text field of core0,
> > > > > > 2) use the {id} of the matches (which can be >>10K) to retrieve the
> > > > > > docs in core1 with the same id, and
> > > > > > 3) facet on tags in core1
> > > > > >
> > > > > > Is this possible without denormalizing (which is not an option)?
> > > > > >
> > > > > > thank you
> > > > > >
> > > > > > On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky <jack.krupan...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Specify the join query parser for the main query. See:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser
> > > > > > >
> > > > > > >
> > > > > > > -- Jack Krupansky
> > > > > > >
> > > > > > > On Wed, Jun 3, 2015 at 3:32 PM, Robust Links <pey...@robustlinks.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Erick
> > > > > > > >
> > > > > > > > They are on the same JVM. I had already tried the core join
> > > > > > > > strategy, but that doesn't solve the faceting problem, i.e. if I
> > > > > > > > have 2 cores, core0 and core1, and I run this query on core0
> > > > > > > >
> > > > > > > > /select?q=<QUERY>&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag
> > > > > > > >
> > > > > > > > it has 2 problems:
> > > > > > > > 1) I need to specify the docIDs with the fq (so back to the same
> > > > > > > > fq={!terms} problem), and
> > > > > > > > 2) faceting doesn't work
> > > > > > > >
> > > > > > > >
> > > > > > > > Flattening the data is not possible due to security reasons.
> > > > > > > >
> > > > > > > > Am I using join correctly?
> > > > > > > >
> > > > > > > > thank you Erick
> > > > > > > >
> > > > > > > > Peyman
> > > > > > > >
> > > > > > > > On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson <erickerick...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Are these indexes on different machines? Because if they're in
> > > > > > > > > the same JVM, you might be able to use cross-core joins. Be
> > > > > > > > > aware, though, that joining on high-cardinality fields (which,
> > > > > > > > > by definition, docID probably is) is where pseudo joins perform
> > > > > > > > > worst.
> > > > > > > > >
> > > > > > > > > Have you considered flattening the data and including whatever
> > > > > > > > > information you have in your "from" index in your main index?
> > > > > > > > > Because < 100ms response is probably not going to be tough if
> > > > > > > > > you have to have two indexes/cores.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Erick
> > > > > > > > >
> > > > > > > > > On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein <joels...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > > You may have to do something custom to meet your needs.
> > > > > > > > > >
> > > > > > > > > > 10,000 DocIDs is not huge, but your latency requirements are
> > > > > > > > > > pretty low.
> > > > > > > > > >
> > > > > > > > > > Are your DocIDs by any chance integers? This can make custom
> > > > > > > > > > PostFilters run much faster.
> > > > > > > > > >
> > > > > > > > > > You should also be aware of the Streaming API in Solr 5.1, which
> > > > > > > > > > will give you fast Map/Reduce approaches
> > > > > > > > > > (http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).
> > > > > > > > > >
> > > > > > > > > > Joel Bernstein
> > > > > > > > > > http://joelsolr.blogspot.com/
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 3, 2015 at 1:46 PM, Robust Links <
> > > > > > pey...@robustlinks.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hey Joel
> > > > > > > > > >>
> > > > > > > > > >> see below
> > > > > > > > > >>
> > > > > > > > > >> On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein <
> > > > > > joels...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >>
> > > > > > > > > >> > A few questions for you:
> > > > > > > > > >> >
> > > > > > > > > >> > How large can the list of filtering ID's be?
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> >> 10k
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> >
> > > > > > > > > >> > What's your expectation on latency?
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> 10> latency <100
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> >
> > > > > > > > > >> > What version of Solr are you using?
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> 5.0.0
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> >
> > > > > > > > > >> > SolrCloud or not?
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > > >> not
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >>
> > > > > > > > > >> >
> > > > > > > > > >> > Joel Bernstein
> > > > > > > > > >> > http://joelsolr.blogspot.com/
> > > > > > > > > >> >
> > > > > > > > > >> > On Wed, Jun 3, 2015 at 1:23 PM, Robust Links <
> > > > > > > > pey...@robustlinks.com>
> > > > > > > > > >> > wrote:
> > > > > > > > > >> >
> > > > > > > > > >> > > Hi
> > > > > > > > > >> > >
> > > > > > > > > > >> > > I have a set of document IDs from one core and I want to
> > > > > > > > > > >> > > query another core using the ids retrieved from the first
> > > > > > > > > > >> > > core. The constraint is that the size of the doc ID set can
> > > > > > > > > > >> > > be very large. I want to:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > 1) retrieve these docs from the 2nd index
> > > > > > > > > > >> > > 2) facet on the results
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I can think of 3 solutions:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > 1) boolean query
> > > > > > > > > > >> > > 2) terms fq
> > > > > > > > > > >> > > 3) use a DB rather than Solr
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I am trying to keep latencies down, so I prefer not to use
> > > > > > > > > > >> > > (3). The problem with (1) is that maxBooleanClauses is
> > > > > > > > > > >> > > hardwired and I am not sure when I will hit the exception.
> > > > > > > > > > >> > > Option (2) also seems to hit limits, so if I do
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}<LONG_LIST_OF_IDS>
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > Solr just goes blank. I have tried adding cost=200 to try to
> > > > > > > > > > >> > > run the query first, fq={!terms f=id cost=200}, but still no
> > > > > > > > > > >> > > good. Paging on doc IDs could be a solution, but the problem
> > > > > > > > > > >> > > then is that the faceting results correspond to the paged
> > > > > > > > > > >> > > IDs and not the global set.
> > > > > > > > > >> > > My filter cache spec is as follows
> > > > > > > > > >> > >
> > > > > > > > > >> > >   <filterCache class="solr.FastLRUCache"
> > > > > > > > > >> > >                  size="1000000"
> > > > > > > > > >> > >                  initialSize="1000000"
> > > > > > > > > >> > >                  autowarmCount="100000"/>
> > > > > > > > > >> > >
> > > > > > > > > >> > >
> > > > > > > > > >> > > What would be the best way for me to solve this
> > problem?
> > > > > > > > > >> > >
> > > > > > > > > >> > > thank you
> > > > > > > > > >> > >
> > > > > > > > > >> >
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card : http://about.me/alessandro_benedetti
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
> > > > >
> > > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
