Not related to the deletion problem, just a question out of curiosity about your use case:
<field name="Clicks" update="set">1</field>

Have I misunderstood your use case, or should you be using "inc" instead?

    inc: Increments a numeric value by a specific amount. Must be
    specified as a single numeric value.

Basically, every time you click, you set the value of that field to "1",
so a document with 1 click will be considered equal to one with 1000 clicks.

My 2 cents.

Cheers

On 8 October 2015 at 14:10, John Smith <solr-u...@remailme.net> wrote:

> Well, every day we update a lot of documents (usually several millions)
> so the DIH is a good fit.
>
> Calling the update chain would make sense there: after all a data import
> is just a batch update. Otherwise, the same operations would have to be
> made upfront, possibly in another environment and/or language. That's
> probably what I'm gonna do anyway.
>
> Thanks for your help!
> John
>
>
> On 08/10/15 13:39, Upayavira wrote:
> > You can either specify the update chain via an update.chain request
> > parameter, or you can configure a new request handler with its own URL
> > and separate update.chain value.
> >
> > I have no idea how you would then reference that in the DIH - I've never
> > really used it.
> >
> > Upayavira
> >
> > On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
> >> After some further investigation, for those interested: the
> >> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
> >> guess copied over from another collection). The initial import had been
> >> made using a data import handler: I suppose the update chain isn't
> >> called in this process and no signature field is created - am I right?
> >>
> >> The first time a document was updated, a signature field with value
> >> "0000000000000000" was added. The next time, the same signature was
> >> generated for the new update, which triggered the deletion of all
> >> documents with the same signature (i.e. the first one) as overwriteDupes
> >> was set to true. Correct behavior but quite tricky...
> >>
> >> So my conclusion here (please correct me if I'm wrong) is of course to
> >> fix the signature configuration problem, but also to manage calling the
> >> update chain (or maybe a simplified one, e.g. by skipping logging) in
> >> the data import handler. Is there an easy way to do this? Conceptually,
> >> shouldn't the update chain be callable from the data import process -
> >> maybe it is?
> >>
> >> John
> >>
> >>
> >> On 08/10/15 09:43, Upayavira wrote:
> >>> Yay!
> >>>
> >>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
> >>>> Yes indeed, the update chain had been activated... I commented it out
> >>>> again and the problem vanished.
> >>>>
> >>>> Good job, thanks Erick and Upayavira!
> >>>> John
> >>>>
> >>>>
> >>>> On 08/10/15 08:58, Upayavira wrote:
> >>>>> Look for the DedupUpdateProcessor in an update chain.
> >>>>>
> >>>>> That is there, but commented out IIRC, in the techproducts sample
> >>>>> configs.
> >>>>>
> >>>>> Perhaps you uncommented it to use your own update processors, but
> >>>>> didn't remove that component?
> >>>>>
> >>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> >>>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> >>>>>> INFO level, the update request just gets mentioned. No exception. I
> >>>>>> reran it with the DEBUG level, but most of the log was related to
> >>>>>> jetty. Here's a line I noticed though:
> >>>>>>
> >>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> >>>>>> {wt=json&commit=true&update.chain=dedupe}
> >>>>>>
> >>>>>> The update.chain parameter wasn't part of the original request, and
> >>>>>> "dedupe" looks suspicious to me. Perhaps I should investigate
> >>>>>> further there?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> John.
> >>>>>>
> >>>>>>
> >>>>>> On 08/10/15 08:25, John Smith wrote:
> >>>>>>> The ids are all different: they're unique numbers followed by a
> >>>>>>> couple of keywords.
> >>>>>>> I've made a test with a small collection of 10 documents to
> >>>>>>> make sure I can manage them manually: all ids are confirmed as
> >>>>>>> different.
> >>>>>>>
> >>>>>>> I also dumped the exact command, here's one example:
> >>>>>>>
> >>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
> >>>>>>> name="Clicks" update="set">1</field><field name="Boost"
> >>>>>>> update="set">1.8701925463775</field></doc></add>
> >>>>>>>
> >>>>>>> It's sent as the body of a POST request to
> >>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> >>>>>>> Content-Type: text/xml header. I still noted the consistent loss of
> >>>>>>> another document with the update above.
> >>>>>>>
> >>>>>>> John
> >>>>>>>
> >>>>>>>
> >>>>>>> On 08/10/15 00:38, Upayavira wrote:
> >>>>>>>> What ID are you using? Are you possibly using the same ID field for
> >>>>>>>> both, so the second document you visit causes the first to be
> >>>>>>>> overwritten?
> >>>>>>>>
> >>>>>>>> Upayavira
> >>>>>>>>
> >>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> >>>>>>>>> This certainly should not be happening. I'd
> >>>>>>>>> take a careful look at what you actually send.
> >>>>>>>>> My _guess_ is that you're not sending the update
> >>>>>>>>> command you think you are....
> >>>>>>>>>
> >>>>>>>>> As a test you could just curl (or use post.jar) to
> >>>>>>>>> send these types of commands up individually.
> >>>>>>>>>
> >>>>>>>>> Perhaps looking at the solr log would help too...
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Erick
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net>
> >>>>>>>>> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I'm bumping on the following problem with update XML messages.
> >>>>>>>>>> The idea is to record the number of clicks for a document: each
> >>>>>>>>>> time, a message is sent to .../update such as this one:
> >>>>>>>>>>
> >>>>>>>>>> <add>
> >>>>>>>>>>   <doc>
> >>>>>>>>>>     <field name="Id">abc</field>
> >>>>>>>>>>     <field name="Clicks" update="set">1</field>
> >>>>>>>>>>     <field name="Boost" update="set">1.05</field>
> >>>>>>>>>>   </doc>
> >>>>>>>>>> </add>
> >>>>>>>>>>
> >>>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to
> >>>>>>>>>> reflect the change in popularity using a formula based on the
> >>>>>>>>>> number of clicks).
> >>>>>>>>>>
> >>>>>>>>>> At the moment in the dev environment, changes are committed
> >>>>>>>>>> immediately.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> When a document is updated, the changes are indeed reflected in the
> >>>>>>>>>> search results. If I click on the same document again, all goes well.
> >>>>>>>>>> But when I click on another document, the latter gets updated as
> >>>>>>>>>> expected but the former is plainly deleted. It can no longer be found
> >>>>>>>>>> and the admin core Overview page counts one document fewer. If I
> >>>>>>>>>> click on a 3rd document, so goes the 2nd one.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The schema is the default one amended to remove unneeded fields and
> >>>>>>>>>> add new ones, nothing fancy. All fields are stored="true" and there's
> >>>>>>>>>> no <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode,
> >>>>>>>>>> with the same outcome. It looks like a bug to me but I might have
> >>>>>>>>>> overlooked something? This is my first attempt at atomic updates.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> John.
> >>>>>>>>>>
> >

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"
William Blake - Songs of Experience -1794 England
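
P.S. For reference, the "inc" variant suggested at the top of this reply might look like the message below. This is only a sketch: it reuses the Id, Clicks and Boost field names from John's original message, and both the increment of 1 per click and the Boost value are assumptions about the intended use case, not something tested against this collection.

```xml
<add>
  <doc>
    <!-- Id value taken from the example in the original message -->
    <field name="Id">abc</field>
    <!-- "inc" adds 1 to the stored Clicks value instead of resetting it to 1 -->
    <field name="Clicks" update="inc">1</field>
    <!-- Boost is still overwritten with the recomputed value (1.05 is illustrative) -->
    <field name="Boost" update="set">1.05</field>
  </doc>
</add>
```

With "inc", Solr adds the given amount to the stored value rather than overwriting it, so repeated clicks on the same document accumulate.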