Well, every day we update a lot of documents (usually several millions) so the DIH is a good fit.
Calling the update chain would make sense there: after all a data import is just a batch update. Otherwise, the same operations would have to be made upfront, possibly in another environment and/or language. That's probably what I'm gonna do anyway.

Thanks for your help!
John

On 08/10/15 13:39, Upayavira wrote:
> You can either specify the update chain via an update.chain request
> parameter, or you can configure a new request parameter with its own URL
> and separate update.chain value.
>
> I have no idea how you would then reference that in the DIH - I've never
> really used it.
>
> Upayavira
>
> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
>> After some further investigation, for those interested: the
>> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
>> guess copied over from another collection). The initial import had been
>> made using a data import handler: I suppose the update chain isn't
>> called in this process and no signature field is created - am I right?
>>
>> The first time a document was updated, a signature field with value
>> "0000000000000000" was added. The next time, the same signature was
>> generated for the new update, which triggered the deletion of all
>> documents with the same signature (i.e. the first one) as overwriteDupes
>> was set to true. Correct behavior but quite tricky...
>>
>> So my conclusion here (please correct me if I'm wrong) is of course to
>> fix the signature configuration problem, but also to manage calling the
>> update chain (or maybe a simplified one, e.g. by skipping logging) in
>> the data import handler. Is there an easy way to do this? Conceptually,
>> shouldn't the update chain be callable from the data import process -
>> maybe it is?
>>
>> John
>>
>>
>> On 08/10/15 09:43, Upayavira wrote:
>>> Yay!
>>>
>>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>>>> Yes indeed, the update chain had been activated... I commented it out
>>>> again and the problem vanished.
>>>>
>>>> Good job, thanks Erick and Upayavira!
>>>> John
>>>>
>>>>
>>>> On 08/10/15 08:58, Upayavira wrote:
>>>>> Look for the DedupUpdateProcessor in an update chain.
>>>>>
>>>>> that is there, but commented out IIRC in the techproducts sample
>>>>> configs.
>>>>>
>>>>> Perhaps you uncommented it to use your own update processors, but didn't
>>>>> remove that component?
>>>>>
>>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>>>>>> INFO level, the update request just gets mentioned. No exception. I
>>>>>> reran it with the DEBUG level, but most of the log was related to jetty.
>>>>>> Here's a line I noticed though:
>>>>>>
>>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>>>
>>>>>> The update.chain parameter wasn't part of the original request, and
>>>>>> "dedupe" looks suspicious to me. Perhaps should I investigate further
>>>>>> there?
>>>>>>
>>>>>> Thanks,
>>>>>> John.
>>>>>>
>>>>>>
>>>>>> On 08/10/15 08:25, John Smith wrote:
>>>>>>> The ids are all different: they're unique numbers followed by a couple
>>>>>>> of keywords. I've made a test with a small collection of 10 documents to
>>>>>>> make sure I can manage them manually: all ids are confirmed as
>>>>>>> different.
>>>>>>>
>>>>>>> I also dumped the exact command, here's one example:
>>>>>>>
>>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>>>
>>>>>>> It's sent as the body of a POST request to
>>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>>>> Content-Type: text/xml header. I still noted the consistent loss of
>>>>>>> another document with the update above.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>>>> What ID are you using? Are you possibly using the same ID field for
>>>>>>>> both, so the second document you visit causes the first to be
>>>>>>>> overwritten?
>>>>>>>>
>>>>>>>> Upayavira
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>>>> This certainly should not be happening. I'd
>>>>>>>>> take a careful look at what you actually send.
>>>>>>>>> My _guess_ is that you're not sending the update
>>>>>>>>> command you think you are....
>>>>>>>>>
>>>>>>>>> As a test you could just curl (or use post.jar) to
>>>>>>>>> send these types of commands up individually.
>>>>>>>>>
>>>>>>>>> Perhaps looking at the solr log would help too...
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Erick
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net>
>>>>>>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm bumping on the following problem with update XML messages. The
>>>>>>>>>> idea is to record the number of clicks for a document: each time, a
>>>>>>>>>> message is sent to .../update such as this one:
>>>>>>>>>>
>>>>>>>>>> <add>
>>>>>>>>>>   <doc>
>>>>>>>>>>     <field name="Id">abc</field>
>>>>>>>>>>     <field name="Clicks" update="set">1</field>
>>>>>>>>>>     <field name="Boost" update="set">1.05</field>
>>>>>>>>>>   </doc>
>>>>>>>>>> </add>
>>>>>>>>>>
>>>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to
>>>>>>>>>> reflect the change in popularity using a formula based on the
>>>>>>>>>> number of clicks).
>>>>>>>>>>
>>>>>>>>>> At the moment in the dev environment, changes are committed
>>>>>>>>>> immediately.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>>>> search results. If I click on the same document again, all goes well.
>>>>>>>>>> But when I click on another document, the latter gets updated as
>>>>>>>>>> expected but the former is plainly deleted. It can no longer be found
>>>>>>>>>> and the admin core Overview page counts 1 document less. If I click
>>>>>>>>>> on a 3rd document, so goes the 2nd one.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The schema is the default one amended to remove unneeded fields and
>>>>>>>>>> add new ones, nothing fancy. All fields are stored="true" and there's
>>>>>>>>>> no <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode,
>>>>>>>>>> with the same outcome. It looks like a bug to me but I might have
>>>>>>>>>> overlooked something? This is my first attempt at atomic updates.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> John.
>>>>>>>>>>
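
For anyone wanting to reproduce the request discussed in this thread, here is a minimal Python sketch that builds the same kind of atomic-update message and prepares the POST described above (same URL, query parameters and Content-Type header). The helper names build_atomic_update and build_request are made up for this illustration and are not part of Solr or of any client library; the request is only prepared here, not sent.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_atomic_update(doc_id, fields, id_field="Id"):
    """Return an <add> message that sets the given fields on one document.

    Hypothetical helper for illustration; "Id" is the unique key name
    used in the messages quoted in this thread.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name=id_field).text = str(doc_id)
    for name, value in fields.items():
        # update="set" marks the field as an atomic "set" operation.
        ET.SubElement(doc, "field", name=name, update="set").text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_request(base_url, payload, params):
    """Prepare (but do not send) the POST request carrying the XML payload."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


payload = build_atomic_update("abc", {"Clicks": 1, "Boost": 1.05})
request = build_request(
    "http://127.0.0.1:8080/solr/ato_test/update",
    payload,
    {"wt": "json", "commit": "true"},
)

# To actually send it against a running Solr instance:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Since the surprise in this thread came from an update.chain value that was not part of the original request, it may help to pass the chain explicitly in params (for example by adding an "update.chain" entry naming a chain defined in your own solrconfig.xml) so the request states which chain it expects to run.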