Hi Alessandro,

In the example I set the value to 1, but it's actually incremented in the code, so over time it should go up. You're right though, I could use an inc update instead.
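For reference, a minimal sketch of what such an inc update might look like. It reuses the Id, Clicks and Boost field names and the placeholder id abc from my original message; the increment of 1 per click is an assumption. Boost still uses set, since it is computed from the click count in the application code:

```xml
<add>
  <doc>
    <field name="Id">abc</field>
    <!-- "inc" adds the given amount to the stored value; 1 per click is an assumption -->
    <field name="Clicks" update="inc">1</field>
    <!-- Boost is still replaced outright with a value computed by the application -->
    <field name="Boost" update="set">1.05</field>
  </doc>
</add>
```

With this form the click count is maintained by Solr itself, so two documents with different click histories no longer end up with the same Clicks value.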
John

On 08/10/15 16:45, Alessandro Benedetti wrote:
> Not related to the deletion problem, only as a curiosity for your use case:
>
> <field name="Clicks" update="set">1</field>
>
> Have I misunderstood your use case, or should you use:
>
> inc
>
> Increments a numeric value by a specific amount.
>
> Must be specified as a single numeric value.
>
> Basically every time you click, you always set the value for that field to
> "1".
> So a document with 1 click will be considered equal to one with 1000 clicks.
> My 2 cents
>
> Cheers
>
> On 8 October 2015 at 14:10, John Smith <solr-u...@remailme.net> wrote:
>
>> Well, every day we update a lot of documents (usually several millions)
>> so the DIH is a good fit.
>>
>> Calling the update chain would make sense there: after all a data import
>> is just a batch update. Otherwise, the same operations would have to be
>> made upfront, possibly in another environment and/or language. That's
>> probably what I'm gonna do anyway.
>>
>> Thanks for your help!
>> John
>>
>>
>> On 08/10/15 13:39, Upayavira wrote:
>>> You can either specify the update chain via an update.chain request
>>> parameter, or you can configure a new request parameter with its own URL
>>> and separate update.chain value.
>>>
>>> I have no idea how you would then reference that in the DIH - I've never
>>> really used it.
>>>
>>> Upayavira
>>>
>>> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
>>>> After some further investigation, for those interested: the
>>>> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
>>>> guess copied over from another collection). The initial import had been
>>>> made using a data import handler: I suppose the update chain isn't
>>>> called in this process and no signature field is created - am I right?
>>>>
>>>> The first time a document was updated, a signature field with value
>>>> "0000000000000000" was added.
>>>> The next time, the same signature was
>>>> generated for the new update, which triggered the deletion of all
>>>> documents with the same signature (i.e. the first one) as overwriteDupes
>>>> was set to true. Correct behavior but quite tricky...
>>>>
>>>> So my conclusion here (please correct me if I'm wrong) is of course to
>>>> fix the signature configuration problem, but also to manage calling the
>>>> update chain (or maybe a simplified one, e.g. by skipping logging) in
>>>> the data import handler. Is there an easy way to do this? Conceptually,
>>>> shouldn't the update chain be callable from the data import process -
>>>> maybe it is?
>>>>
>>>> John
>>>>
>>>>
>>>> On 08/10/15 09:43, Upayavira wrote:
>>>>> Yay!
>>>>>
>>>>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>>>>>> Yes indeed, the update chain had been activated... I commented it out
>>>>>> again and the problem vanished.
>>>>>>
>>>>>> Good job, thanks Erick and Upayavira!
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 08/10/15 08:58, Upayavira wrote:
>>>>>>> Look for the DedupUpdateProcessor in an update chain.
>>>>>>>
>>>>>>> That is there, but commented out IIRC in the techproducts sample
>>>>>>> configs.
>>>>>>>
>>>>>>> Perhaps you uncommented it to use your own update processors, but didn't
>>>>>>> remove that component?
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>>>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
>>>>>>>> INFO level, the update request just gets mentioned. No exception. I
>>>>>>>> reran it with the DEBUG level, but most of the log was related to jetty.
>>>>>>>> Here's a line I noticed though:
>>>>>>>>
>>>>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>>>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>>>>>
>>>>>>>> The update.chain parameter wasn't part of the original request, and
>>>>>>>> "dedupe" looks suspicious to me. Perhaps I should investigate further
>>>>>>>> there?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/10/15 08:25, John Smith wrote:
>>>>>>>>> The ids are all different: they're unique numbers followed by a couple
>>>>>>>>> of keywords. I've made a test with a small collection of 10 documents to
>>>>>>>>> make sure I can manage them manually: all ids are confirmed as different.
>>>>>>>>> I also dumped the exact command, here's one example:
>>>>>>>>>
>>>>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>>>>>
>>>>>>>>> It's sent as the body of a POST request to
>>>>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>>>>>> Content-Type: text/xml header. I still noted the consistent loss of
>>>>>>>>> another document with the update above.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>>>>>> What ID are you using? Are you possibly using the same ID field for
>>>>>>>>>> both, so the second document you visit causes the first to be
>>>>>>>>>> overwritten?
>>>>>>>>>>
>>>>>>>>>> Upayavira
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>>>>>> This certainly should not be happening. I'd
>>>>>>>>>>> take a careful look at what you actually send.
>>>>>>>>>>> My _guess_ is that you're not sending the update
>>>>>>>>>>> command you think you are....
>>>>>>>>>>>
>>>>>>>>>>> As a test you could just curl (or use post.jar) to
>>>>>>>>>>> send these types of commands up individually.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps looking at the solr log would help too...
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Erick
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm bumping on the following problem with update XML messages.
>>>>>>>>>>>> The idea
>>>>>>>>>>>> is to record the number of clicks for a document: each time, a message
>>>>>>>>>>>> is sent to .../update such as this one:
>>>>>>>>>>>>
>>>>>>>>>>>> <add>
>>>>>>>>>>>>   <doc>
>>>>>>>>>>>>     <field name="Id">abc</field>
>>>>>>>>>>>>     <field name="Clicks" update="set">1</field>
>>>>>>>>>>>>     <field name="Boost" update="set">1.05</field>
>>>>>>>>>>>>   </doc>
>>>>>>>>>>>> </add>
>>>>>>>>>>>>
>>>>>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to reflect
>>>>>>>>>>>> the change in popularity using a formula based on the number of clicks).
>>>>>>>>>>>> At the moment in the dev environment, changes are committed immediately.
>>>>>>>>>>>>
>>>>>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>>>>>> search results. If I click on the same document again, all goes well.
>>>>>>>>>>>> But when I click on another document, the latter gets updated as
>>>>>>>>>>>> expected but the former is plainly deleted. It can no longer be found
>>>>>>>>>>>> and the admin core Overview page counts 1 document less. If I click on a
>>>>>>>>>>>> 3rd document, so goes the 2nd one.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The schema is the default one amended to remove unneeded fields and add
>>>>>>>>>>>> new ones, nothing fancy. All fields are stored="true" and there's no
>>>>>>>>>>>> <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode, with
>>>>>>>>>>>> the same outcome. It looks like a bug to me but I might have overlooked
>>>>>>>>>>>> something? This is my first attempt at atomic updates.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> John.