Not related to the deletion problem, just a question about your use case:

<field name="Clicks" update="set">1</field>

Have I misunderstood your use case, or should you be using:

inc

Increments a numeric value by a specific amount.

Must be specified as a single numeric value.

Basically, every time you click, you set the value of that field to "1",
so a document with 1 click will be considered equal to one with 1000 clicks.
My 2 cents
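
To illustrate what I mean, here is a minimal sketch of your original message
with "inc" in place of "set" for Clicks. It assumes the same Id, Clicks and
Boost fields as in your example, and that each click should add 1 to the
stored count; the Boost value is just the one from your message.

```xml
<add>
  <doc>
    <field name="Id">abc</field>
    <!-- inc adds the given amount to the stored value instead of replacing it -->
    <field name="Clicks" update="inc">1</field>
    <!-- Boost is still overwritten with the newly computed value -->
    <field name="Boost" update="set">1.05</field>
  </doc>
</add>
```

With this, a document clicked 1000 times ends up with Clicks=1000 rather
than 1.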

Cheers

On 8 October 2015 at 14:10, John Smith <solr-u...@remailme.net> wrote:

> Well, every day we update a lot of documents (usually several millions)
> so the DIH is a good fit.
>
> Calling the update chain would make sense there: after all a data import
> is just a batch update. Otherwise, the same operations would have to be
> made upfront, possibly in another environment and/or language. That's
> probably what I'm gonna do anyway.
>
> Thanks for your help!
> John
>
>
> On 08/10/15 13:39, Upayavira wrote:
> > You can either specify the update chain via an update.chain request
> > parameter, or you can configure a new request handler with its own URL
> > and separate update.chain value.
> >
> > I have no idea how you would then reference that in the DIH - I've never
> > really used it.
> >
> > Upayavira
> >
> > On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
> >> After some further investigation, for those interested: the
> >> SignatureUpdateProcessorFactory fields were somehow mis-configured (I
> >> guess copied over from another collection). The initial import had been
> >> made using a data import handler: I suppose the update chain isn't
> >> called in this process and no signature field is created - am I right?
> >>
> >> The first time a document was updated, a signature field with value
> >> "0000000000000000" was added. The next time, the same signature was
> >> generated for the new update, which triggered the deletion of all
> >> documents with the same signature (i.e. the first one) as overwriteDupes
> >> was set to true. Correct behavior but quite tricky...
> >>
> >> So my conclusion here (please correct me if I'm wrong) is of course to
> >> fix the signature configuration problem, but also to manage calling the
> >> update chain (or maybe a simplified one, e.g. by skipping logging) in
> >> the data import handler. Is there an easy way to do this? Conceptually,
> >> shouldn't the update chain be callable from the data import process -
> >> maybe it is?
> >>
> >> John
> >>
> >>
> >> On 08/10/15 09:43, Upayavira wrote:
> >>> Yay!
> >>>
> >>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
> >>>> Yes indeed, the update chain had been activated... I commented it out
> >>>> again and the problem vanished.
> >>>>
> >>>> Good job, thanks Erick and Upayavira!
> >>>> John
> >>>>
> >>>>
> >>>> On 08/10/15 08:58, Upayavira wrote:
> >>>>> Look for the DedupUpdateProcessor in an update chain.
> >>>>>
> >>>>> That is there, but commented out IIRC, in the techproducts sample
> >>>>> configs.
> >>>>>
> >>>>> Perhaps you uncommented it to use your own update processors, but
> >>>>> didn't remove that component?
> >>>>>
> >>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
> >>>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual in
> >>>>>> INFO level, the update request just gets mentioned. No exception. I
> >>>>>> reran it with the DEBUG level, but most of the log was related to
> >>>>>> jetty. Here's a line I noticed though:
> >>>>>>
> >>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
> >>>>>> {wt=json&commit=true&update.chain=dedupe}
> >>>>>>
> >>>>>> The update.chain parameter wasn't part of the original request, and
> >>>>>> "dedupe" looks suspicious to me. Perhaps I should investigate further
> >>>>>> there?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> John.
> >>>>>>
> >>>>>>
> >>>>>> On 08/10/15 08:25, John Smith wrote:
> >>>>>>> The ids are all different: they're unique numbers followed by a couple
> >>>>>>> of keywords. I've made a test with a small collection of 10 documents
> >>>>>>> to make sure I can manage them manually: all ids are confirmed as
> >>>>>>> different.
> >>>>>>>
> >>>>>>> I also dumped the exact command, here's one example:
> >>>>>>>
> >>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
> >>>>>>> name="Clicks" update="set">1</field><field name="Boost"
> >>>>>>> update="set">1.8701925463775</field></doc></add>
> >>>>>>>
> >>>>>>> It's sent as the body of a POST request to
> >>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
> >>>>>>> Content-Type: text/xml header. I still noted the consistent loss of
> >>>>>>> another document with the update above.
> >>>>>>>
> >>>>>>> John
> >>>>>>>
> >>>>>>>
> >>>>>>> On 08/10/15 00:38, Upayavira wrote:
> >>>>>>>> What ID are you using? Are you possibly using the same ID field for
> >>>>>>>> both, so the second document you visit causes the first to be
> >>>>>>>> overwritten?
> >>>>>>>>
> >>>>>>>> Upayavira
> >>>>>>>>
> >>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
> >>>>>>>>> This certainly should not be happening. I'd
> >>>>>>>>> take a careful look at what you actually send.
> >>>>>>>>> My _guess_ is that you're not sending the update
> >>>>>>>>> command you think you are....
> >>>>>>>>>
> >>>>>>>>> As a test you could just curl (or use post.jar) to
> >>>>>>>>> send these types of commands up individually.
> >>>>>>>>>
> >>>>>>>>> Perhaps looking at the solr log would help too...
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Erick
> >>>>>>>>>
> >>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net>
> >>>>>>>>> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I'm running into the following problem with update XML messages. The
> >>>>>>>>>> idea is to record the number of clicks for a document: each time, a
> >>>>>>>>>> message is sent to .../update such as this one:
> >>>>>>>>>>
> >>>>>>>>>> <add>
> >>>>>>>>>> <doc>
> >>>>>>>>>> <field name="Id">abc</field>
> >>>>>>>>>> <field name="Clicks" update="set">1</field>
> >>>>>>>>>> <field name="Boost" update="set">1.05</field>
> >>>>>>>>>> </doc>
> >>>>>>>>>> </add>
> >>>>>>>>>>
> >>>>>>>>>> (Clicks is an int field; Boost is a float field, it's updated to
> >>>>>>>>>> reflect the change in popularity using a formula based on the number
> >>>>>>>>>> of clicks).
> >>>>>>>>>>
> >>>>>>>>>> At the moment in the dev environment, changes are committed
> >>>>>>>>>> immediately.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> When a document is updated, the changes are indeed reflected in the
> >>>>>>>>>> search results. If I click on the same document again, all goes well.
> >>>>>>>>>> But when I click on another document, the latter gets updated as
> >>>>>>>>>> expected but the former is plainly deleted. It can no longer be found
> >>>>>>>>>> and the admin core Overview page counts 1 document less. If I click
> >>>>>>>>>> on a 3rd document, so goes the 2nd one.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The schema is the default one amended to remove unneeded fields and
> >>>>>>>>>> add new ones, nothing fancy. All fields are stored="true" and there's
> >>>>>>>>>> no <copyField>. I've tried versions 5.2.1 & 5.3.1 in standalone mode,
> >>>>>>>>>> with the same outcome. It looks like a bug to me but I might have
> >>>>>>>>>> overlooked something? This is my first attempt at atomic updates.
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> John.
> >>>>>>>>>>
>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
