Well, we update a lot of documents every day (usually several million),
so the DIH is a good fit.

Calling the update chain would make sense there: after all, a data import
is just a batch update. Otherwise, the same operations would have to be
performed upfront, possibly in another environment and/or language. That's
probably what I'm going to do anyway.

Thanks for your help!
John


On 08/10/15 13:39, Upayavira wrote:
> You can either specify the update chain via an update.chain request
> parameter, or you can configure a new request handler with its own URL
> and a separate update.chain value.
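A minimal sketch of the first option, assuming the client builds the /update URL itself. The chain name `clicks` is hypothetical and would have to match an `<updateRequestProcessorChain name="...">` defined in solrconfig.xml. Note that a request parameter can override a value the handler declares under `defaults`, but not one declared under `invariants`.

```python
from typing import Optional
from urllib.parse import urlencode, urlunsplit


def update_url(host: str, core: str, chain: Optional[str] = None,
               commit: bool = True) -> str:
    """Build a Solr /update URL, optionally naming the update chain explicitly."""
    params = {"wt": "json"}
    if commit:
        params["commit"] = "true"
    if chain is not None:
        # update.chain names an <updateRequestProcessorChain> in solrconfig.xml
        params["update.chain"] = chain
    return urlunsplit(("http", host, f"/solr/{core}/update", urlencode(params), ""))


print(update_url("127.0.0.1:8080", "ato_test", chain="clicks"))
# http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true&update.chain=clicks
```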
>
> I have no idea how you would then reference that in the DIH - I've never
> really used it.
>
> Upayavira
>
> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
>> After some further investigation, for those interested: the
>> SignatureUpdateProcessorFactory fields were somehow misconfigured (I
>> guess they were copied over from another collection). The initial import
>> was done using a data import handler: I suppose the update chain isn't
>> called in this process, so no signature field is created. Am I right?
>>
>> The first time a document was updated, a signature field with value
>> "0000000000000000" was added. The next time, the same signature was
>> generated for the new update, which triggered the deletion of all
>> documents with the same signature (i.e. the first one), as overwriteDupes
>> was set to true. Correct behavior, but quite tricky...
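The failure mode described above can be reproduced with a toy model. This is not Solr's implementation: it uses a truncated MD5 instead of Solr's signature class, it assumes that a document containing none of the configured fields hashes to the all-zero value seen in the thread, and the field names (`Id`, `signature`, `NoSuchField`) are illustrative.

```python
import hashlib
from typing import Dict, List

Doc = Dict[str, str]
ZERO_SIG = "0" * 16


def signature(doc: Doc, sig_fields: List[str]) -> str:
    """Toy stand-in for a signature: hash the values of the configured fields."""
    content = "".join(doc.get(f, "") for f in sig_fields)
    if not content:
        # Assumption: with nothing to hash, every document gets the same
        # all-zero signature, like the "0000000000000000" seen in the thread.
        return ZERO_SIG
    return hashlib.md5(content.encode("utf-8")).hexdigest()[:16]


def add_with_dedupe(index: Dict[str, Doc], doc: Doc, sig_fields: List[str],
                    overwrite_dupes: bool = True) -> None:
    """Add or replace `doc`, mimicking overwriteDupes=true in a simplified way."""
    sig = signature(doc, sig_fields)
    if overwrite_dupes:
        # Delete every other document that already carries the same signature.
        dupes = [doc_id for doc_id, d in index.items()
                 if d.get("signature") == sig and doc_id != doc["Id"]]
        for doc_id in dupes:
            del index[doc_id]
    index[doc["Id"]] = {**doc, "signature": sig}


# Documents imported without any signature (as via the DIH in this thread).
index = {k: {"Id": k} for k in ("a", "b", "c")}
misconfigured = ["NoSuchField"]  # hypothetical field missing from every doc
add_with_dedupe(index, {"Id": "a", "Clicks": "1"}, misconfigured)
add_with_dedupe(index, {"Id": "b", "Clicks": "1"}, misconfigured)
print(sorted(index))  # "a" is gone: it shared the all-zero signature with "b"
```

With signature fields that every document actually has (for example `["Id"]`), the signatures differ and nothing is deleted, which corresponds to fixing the configuration.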
>>
>> So my conclusion here (please correct me if I'm wrong) is, of course, to
>> fix the signature configuration problem, but also to have the data import
>> handler call the update chain (or maybe a simplified one, e.g. one that
>> skips logging). Is there an easy way to do this? Conceptually,
>> shouldn't the update chain be callable from the data import process?
>> Maybe it already is?
>>
>> John
>>
>>
>> On 08/10/15 09:43, Upayavira wrote:
>>> Yay!
>>>
>>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>>>> Yes indeed, the update chain had been activated... I commented it out
>>>> again and the problem vanished.
>>>>
>>>> Good job, thanks Erick and Upayavira!
>>>> John
>>>>
>>>>
>>>> On 08/10/15 08:58, Upayavira wrote:
>>>>> Look for the dedupe processor (SignatureUpdateProcessorFactory) in an
>>>>> update chain.
>>>>>
>>>>> It is there, but commented out, IIRC, in the techproducts sample
>>>>> configs.
>>>>>
>>>>> Perhaps you uncommented it to use your own update processors, but didn't
>>>>> remove that component?
>>>>>
>>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>>>> Oh, I forgot about Erick's suggestion to check the logs: there's nothing
>>>>>> unusual at INFO level; the update request is just logged, with no
>>>>>> exception. I reran it at DEBUG level, but most of the log was related to
>>>>>> Jetty. Here's a line I noticed, though:
>>>>>>
>>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>>>
>>>>>> The update.chain parameter wasn't part of the original request, and
>>>>>> "dedupe" looks suspicious to me. Should I investigate further there?
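One quick way to spot parameters the server added on its own is to diff the query string that was sent against the one in the log line. Parameters present only in the log most likely come from the request handler's configuration in solrconfig.xml (e.g. a `defaults` block); the helper name below is hypothetical.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def injected_params(sent_query: str, logged_query: str) -> Dict[str, List[str]]:
    """Return parameters that appear in the logged request but were never sent."""
    sent = parse_qs(sent_query)
    # The log prints the params wrapped in braces, e.g. {wt=json&commit=true}
    logged = parse_qs(logged_query.strip("{}"))
    return {k: v for k, v in logged.items() if k not in sent}


print(injected_params("wt=json&commit=true",
                      "{wt=json&commit=true&update.chain=dedupe}"))
# {'update.chain': ['dedupe']}
```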
>>>>>>
>>>>>> Thanks,
>>>>>> John.
>>>>>>
>>>>>>
>>>>>> On 08/10/15 08:25, John Smith wrote:
>>>>>>> The IDs are all different: they're unique numbers followed by a couple
>>>>>>> of keywords. I ran a test with a small collection of 10 documents so I
>>>>>>> could check them manually: all IDs are confirmed to be different.
>>>>>>>
>>>>>>> I also dumped the exact command; here's one example:
>>>>>>>
>>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>>>
>>>>>>> It's sent as the body of a POST request to
>>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>>>> Content-Type: text/xml header. With the update above, another document
>>>>>>> is still consistently lost.
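Erick's suggestion to replay the command by hand can also be scripted. The sketch below uses only Python's standard library to prepare the same POST (URL, XML body, `Content-Type: text/xml` header) without sending it; the function name is illustrative.

```python
import urllib.request


def build_update_request(url: str, xml_body: str) -> urllib.request.Request:
    """Prepare, but do not send, a POST of an XML update message to Solr."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_update_request(
    "http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true",
    '<add><doc><field name="Id">abc</field>'
    '<field name="Clicks" update="set">1</field></doc></add>',
)
# urllib.request.urlopen(req) would actually send it.
```

Sending one such request at a time and checking the document count in between makes it easy to see exactly which update removes a document.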
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>>>> What ID are you using? Are you possibly using the same ID value for
>>>>>>>> both, so that the second document you visit causes the first to be
>>>>>>>> overwritten?
>>>>>>>>
>>>>>>>> Upayavira
>>>>>>>>
>>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>>>> This certainly should not be happening. I'd
>>>>>>>>> take a careful look at what you actually send.
>>>>>>>>> My _guess_ is that you're not sending the update
>>>>>>>>> command you think you are....
>>>>>>>>>
>>>>>>>>> As a test, you could just use curl (or post.jar) to
>>>>>>>>> send these commands individually.
>>>>>>>>>
>>>>>>>>> Perhaps looking at the Solr log would help too...
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Erick
>>>>>>>>>
>>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net>
>>>>>>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm running into the following problem with update XML messages. The
>>>>>>>>>> idea is to record the number of clicks for a document: each time, a
>>>>>>>>>> message such as this one is sent to .../update:
>>>>>>>>>>
>>>>>>>>>> <add>
>>>>>>>>>> <doc>
>>>>>>>>>> <field name="Id">abc</field>
>>>>>>>>>> <field name="Clicks" update="set">1</field>
>>>>>>>>>> <field name="Boost" update="set">1.05</field>
>>>>>>>>>> </doc>
>>>>>>>>>> </add>
>>>>>>>>>>
>>>>>>>>>> (Clicks is an int field; Boost is a float field that is updated to
>>>>>>>>>> reflect the change in popularity, using a formula based on the number
>>>>>>>>>> of clicks.)
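For reference, the message above can be generated rather than concatenated by hand, which also escapes characters such as `&` or `<` in IDs. A minimal sketch with Python's `xml.etree.ElementTree`; the Boost formula isn't given in the thread, so the value is simply passed in.

```python
import xml.etree.ElementTree as ET


def atomic_update_xml(doc_id: str, clicks: int, boost: float) -> str:
    """Build an <add> message that sets Clicks and Boost on an existing document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "field" if False else "doc")
    ET.SubElement(doc, "field", name="Id").text = doc_id
    # update="set" marks an atomic update: only these fields get new values
    ET.SubElement(doc, "field", name="Clicks", update="set").text = str(clicks)
    ET.SubElement(doc, "field", name="Boost", update="set").text = str(boost)
    return ET.tostring(add, encoding="unicode")


print(atomic_update_xml("abc", 1, 1.05))
```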
>>>>>>>>>>
>>>>>>>>>> At the moment, in the dev environment, changes are committed
>>>>>>>>>> immediately.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>>>> search results. If I click on the same document again, all goes well.
>>>>>>>>>> But when I click on another document, the latter gets updated as
>>>>>>>>>> expected, but the former is simply deleted. It can no longer be found,
>>>>>>>>>> and the admin core Overview page shows one document fewer. If I click
>>>>>>>>>> on a third document, the second one disappears in turn.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The schema is the default one, amended to remove unneeded fields and
>>>>>>>>>> add new ones; nothing fancy. All fields are stored="true" and there's
>>>>>>>>>> no <copyField>. I've tried versions 5.2.1 and 5.3.1 in standalone mode,
>>>>>>>>>> with the same outcome. It looks like a bug to me, but might I have
>>>>>>>>>> overlooked something? This is my first attempt at atomic updates.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> John.
>>>>>>>>>>
