Re: getting cached terms inside UpdateRequestProcessor...

Roxana Danger Thu, 22 Oct 2015 09:11:13 -0700

Hi Alexandre,
The DIH runs correctly and the tokenized representation is obtained
correctly, but the URP chain is not executed by this call:
http://localhost:8983/solr/reed_jobs/update/details?commit=true
Is this not the correct URL? Is a parameter missing?
Best,
Roxana




On 22 October 2015 at 16:17, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Well, I guess I imagined three steps:
> 1) Run DIH
> 2) Get the tokenized representation for each document using facets or
> other approaches
> 3) Submit document partial-update request with additional custom
> processing through URP
>
> Your example seems to be skipping step 2, so the URP chain does not
> know which documents to actually work on and is basically an empty
> call.
>
> Again, I suspect knowing the business objectives may bring other
> solutions to the front.
>
> Regards,
>    Alex.
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
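Step 3 above can be sketched as follows. A request to /update/details with no body gives the chain no documents to process, which matches the "empty call" Alex describes. This is a minimal sketch, not a tested recipe: it assumes the unique key field is called "id", and the document ids and the field name "details_terms" are hypothetical. Only the URL comes from the thread. The body uses Solr's atomic-update JSON form, which, as noted elsewhere in the thread, relies on the other fields being stored.

```python
import json
import urllib.request

# The core name and handler path come from the thread. The unique key "id",
# the document ids and the field "details_terms" are assumptions.
SOLR_UPDATE_URL = "http://localhost:8983/solr/reed_jobs/update/details?commit=true"


def build_partial_update(doc_ids, field, value):
    """Return a JSON body holding one atomic "set" update per document id."""
    docs = [{"id": doc_id, field: {"set": value}} for doc_id in doc_ids]
    return json.dumps(docs).encode("utf-8")


def build_request(body):
    """Wrap the body in a POST request; the caller decides when to send it."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_partial_update(["job-1", "job-2"], "details_terms", "pending")
request = build_request(body)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

The list of ids would come from step 2, so each document that should pass through the chain is named explicitly in the request body.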
>
> On 22 October 2015 at 10:49, Roxana Danger
> <roxana.dan...@reedonline.co.uk> wrote:
> > Hi Alex,
> >
> > My idea behind this was to avoid two calls: first the importer and then the
> > updater. As there is an update processor chain that can be used after the
> > DIH, I thought it was possible to get a real-time updater.
> >
> > So, I am taking your advice and dividing the process into separate steps. I
> > have the following configuration:
> >
> > <updateRequestProcessorChain name="retrieveDetails">
> >   <processor class="MyUpdater1"/>
> >   <processor class="MyUpdater2"/>
> >   <processor class="solr.LogUpdateProcessorFactory"/>
> >   <processor class="solr.RunUpdateProcessorFactory"/>
> > </updateRequestProcessorChain>
> >
> > <requestHandler name="/update/details" class="solr.UpdateRequestHandler">
> >   <lst name="defaults">
> >     <str name="update.chain">retrieveDetails</str>
> >   </lst>
> > </requestHandler>
> >
> > <requestHandler name="/dataimport"
> >                 class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">db-data-config.xml</str>
> >     <!-- <str name="update.chain">retrieveDetails</str> -->
> >   </lst>
> > </requestHandler>
> >
> > So, after the import (notice that it does not contain the update.chain), I
> > tried to run the update with the following request:
> > http://localhost:8983/solr/reed_jobs/update/details?commit=true
> > It returns immediately with status 0 but does not execute the update.
> > How should the update be called to reindex/update all the imported docs
> > with my chain?
> >
> >
> > Best regards,
> > Roxana
> >
> >
> > On 22 October 2015 at 14:14, Alexandre Rafalovitch <arafa...@gmail.com>
> > wrote:
> >
> >> You are doing things out of order. It's DIH, URP, then indexer. Any
> >> attempt to subvert that order for the record being indexed will end in
> >> problems.
> >>
> >> Have you considered doing a dual path? Index, then update. Of course,
> >> your fields all need to be stored for that.
> >>
> >> Also, perhaps you need to rethink the problem on a higher level. If
> >> all you need to do is to extract tokenized content of a field during
> >> search, you can do that in several ways, such as faceting on that
> >> field, or - I believe - using terms end-point.
> >>
> >> Regards,
> >>   Alex.
> >> ----
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
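The two lookups Alex mentions (faceting on the field, or the terms end-point) can be expressed as plain query URLs. This sketch only builds the URLs. The field name "description" and the "id" unique key are hypothetical, and it assumes a /terms handler backed by the TermsComponent is configured in solrconfig.xml. Faceting with a query restricted to one document lists that document's indexed terms, whereas the terms end-point reports terms across the whole index.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/reed_jobs"  # core name from the thread
FIELD = "description"  # hypothetical analyzed field


def facet_url(doc_id):
    """Build a facet query on FIELD restricted to one document."""
    params = {
        "q": f"id:{doc_id}",      # assumes the unique key field is "id"
        "rows": 0,                # only the facet counts are needed
        "facet": "true",
        "facet.field": FIELD,
        "facet.mincount": 1,      # skip terms the document does not contain
        "facet.limit": -1,        # no cap on the number of terms returned
        "wt": "json",
    }
    return f"{BASE}/select?{urlencode(params)}"


def terms_url(limit=10):
    """Build a terms-component query on FIELD across the whole index."""
    params = {
        "terms": "true",
        "terms.fl": FIELD,
        "terms.limit": limit,
        "wt": "json",
    }
    return f"{BASE}/terms?{urlencode(params)}"
```

Either URL can be fetched with any HTTP client once the import has been committed, since both read from the committed index.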
> >>
> >> On 22 October 2015 at 06:20, Roxana Danger
> >> <roxana.dan...@reedonline.co.uk> wrote:
> >> > Hello,
> >> >
> >> > I would like to create an updateRequestProcessorChain that should be
> >> > executed after a DB DIH. I am extending the UpdateRequestProcessorFactory
> >> > and UpdateRequestProcessor classes. The processAdd method of my
> >> > UpdateRequestProcessor should be able to update the documents with the
> >> > indexed terms associated with a field. Notice that these terms should have
> >> > been extracted with an analyzer before my updateRequestProcessorChain
> >> > processor begins to execute.
> >> >
> >> > The problem I am getting is that, at the point where processAdd is
> >> > executed, the field containing the terms has not been filled. To retrieve
> >> > the terms I am using the SolrIndexSearcher provided during the request
> >> > (req.getSearcher()). However, it seems that this searcher uses only the
> >> > data physically stored and does not consider any of the imported data.
> >> >
> >> > Any idea how I can access a searcher with all indexed/cached data when
> >> > the processAdd method is executed?
> >> >
> >> > Thank you very much in advance.
> >>



-- 
Roxana Danger | Data Scientist
Dragon Court, 27-29 Macklin Street, London, WC2B 5LX
Tel: 020 7067 4568
reed.co.uk <http://www.reed.co.uk/>
