Re: getting cached terms inside UpdateRequestProcessor...

Alexandre Rafalovitch Thu, 22 Oct 2015 08:45:46 -0700

Well, I guess I imagined three steps:
1) Run DIH
2) Get the tokenized representation of each document, using faceting or
other approaches
3) Submit a partial-update request for each document, with the additional
custom processing done through the URP
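A rough sketch of steps 2 and 3 in Java. It only builds the request strings with the JDK (no SolrJ). The field names `description` (analyzed) and `description_terms` (stored, multi-valued), the unique key `id`, and the class and method names are all hypothetical; the core URL is the one from your example. Step 2 facets on the analyzed field for a single document, so the facet values are that document's indexed tokens. Step 3 is a Solr XML atomic update that sets the stored field to those tokens. It assumes ids contain no query-syntax characters, and that repeated `update="set"` elements on a multi-valued field are collected into one list, which is my understanding of how the XML loader behaves.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class DetailsRequests {

    // Step 2: facet on the analyzed field for a single document. The facet
    // values in the response are that document's indexed tokens.
    // facet.limit=-1 returns all terms; facet.mincount=1 drops zero counts.
    static String facetTermsUrl(String coreUrl, String idField, String id, String analyzedField) {
        String q = URLEncoder.encode(idField + ":" + id, StandardCharsets.UTF_8);
        return coreUrl + "/select?q=" + q
                + "&rows=0&facet=true&facet.mincount=1&facet.limit=-1"
                + "&facet.field=" + URLEncoder.encode(analyzedField, StandardCharsets.UTF_8)
                + "&wt=json";
    }

    // Step 3: Solr XML atomic update. One update="set" element per token,
    // assumed to replace the multi-valued field with all listed values.
    static String atomicUpdateXml(String idField, String id, String targetField, List<String> tokens) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append(field(idField, null, id));
        for (String token : tokens) {
            sb.append(field(targetField, "set", token));
        }
        return sb.append("</doc></add>").toString();
    }

    // Builds one <field> element, escaping XML special characters.
    private static String field(String name, String update, String value) {
        String op = (update == null) ? "" : " update=\"" + update + "\"";
        return "<field name=\"" + escape(name) + "\"" + op + ">" + escape(value) + "</field>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String core = "http://localhost:8983/solr/reed_jobs";
        System.out.println(facetTermsUrl(core, "id", "42", "description"));
        System.out.println(atomicUpdateXml("id", "42", "description_terms", List.of("java", "developer")));
    }
}
```

The XML body would then be posted to /update/details so it passes through the retrieveDetails chain. Atomic updates only work if the document's other fields are stored.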


Your example seems to skip step 2: the request to /update/details carries
no documents, so the URP chain has nothing to work on and the call is
effectively just a commit.
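For comparison, a sketch of a request that does give the chain something to process: the documents travel in the POST body and `commit=true` stays on the URL. It uses the JDK's java.net.http client (Java 11+); `DetailsUpdateRequest` is a hypothetical name, the body is a placeholder document, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class DetailsUpdateRequest {

    // Builds (but does not send) a POST to the custom handler. The documents
    // travel in the body, so the retrieveDetails chain receives one add
    // command per <doc>; without a body the request only performs the commit.
    static HttpRequest build(String coreUrl, String xmlBody) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update/details?commit=true"))
                .header("Content-Type", "application/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String body = "<add><doc><field name=\"id\">42</field>"
                + "<field name=\"description_terms\" update=\"set\">java</field></doc></add>";
        HttpRequest req = build("http://localhost:8983/solr/reed_jobs", body);
        System.out.println(req.method() + " " + req.uri());
        // Sending it would be, e.g.:
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```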

Again, I suspect knowing the business objectives may bring other
solutions to the front.

Regards,
   Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 22 October 2015 at 10:49, Roxana Danger
<roxana.dan...@reedonline.co.uk> wrote:
> Hi Alex,
>
> My idea behind this is to avoid two calls: first the importer and then the
> updater. As there is an update processor chain that can be used after the
> DIH, I thought it was possible to get a real-time updater.
>
> So, I am taking your advice and dividing the process into separate steps. I
> have the following configuration:
>
> <updateRequestProcessorChain name="retrieveDetails">
>   <processor class="MyUpdater1"/>
>   <processor class="MyUpdater2"/>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/update/details" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">retrieveDetails</str>
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>     <!-- <str name="update.chain">retrieveDetails</str> -->
>   </lst>
> </requestHandler>
>
> So, after the import (notice that the DIH handler does not contain the
> update.chain), I have tried to run the update with the following request:
> http://localhost:8983/solr/reed_jobs/update/details?commit=true
> but it returns immediately with status 0 and does not execute the update.
> How should the update be called to reindex/update all the imported docs
> with my chain?
>
>
> Best regards,
> Roxana
>
>
> On 22 October 2015 at 14:14, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> You are doing things out of order. It's DIH, URP, then indexer. Any
>> attempt to subvert that order for the record being indexed will end in
>> problems.
>>
>> Have you considered doing a dual path? Index, then update. Of course,
>> your fields all need to be stored for that.
>>
>> Also, perhaps you need to rethink the problem on a higher level. If
>> all you need to do is to extract the tokenized content of a field during
>> search, you can do that in several ways, such as faceting on that
>> field or, I believe, using the terms endpoint.
>>
>> Regards,
>>   Alex.
>>
>>
>> On 22 October 2015 at 06:20, Roxana Danger
>> <roxana.dan...@reedonline.co.uk> wrote:
>> > Hello,
>> >
>> > I would like to create an updateRequestProcessorChain that should be
>> > executed after a DB DIH. I am extending the UpdateRequestProcessorFactory
>> > and UpdateRequestProcessor classes. The processAdd method of my
>> > UpdateRequestProcessor should be able to update the documents with the
>> > indexed terms associated with a field. Notice that these terms should
>> > have been extracted with an analyzer before my updateRequestProcessorChain
>> > processor begins to execute.
>> >
>> > The problem I am getting is that at the point where processAdd is
>> > executed, the field containing the terms has not been filled. To retrieve
>> > the terms I am using the SolrIndexSearcher provided during the request
>> > (req.getSearcher()). However, it seems that this searcher uses only the
>> > data physically stored and does not consider any of the imported data.
>> >
>> > Any idea how I can access a searcher with all the indexed/cached data
>> > when the processAdd method is executed?
>> >
>> > Thank you very much in advance.
>>
>
>
>
> --
> Roxana Danger | Data Scientist
> Dragon Court, 27-29 Macklin Street, London, WC2B 5LX
> Tel: 020 7067 4568
> http://www.reed.co.uk/
