Hi Alessandro,

In the example I set the value to 1, but in the actual code it's
incremented, so it goes up over time. You're right, though: I could use
an inc update instead.
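A minimal sketch of the click message rewritten with the `inc` modifier Alessandro describes below, built with Python's standard library. The field names `Id`, `Clicks` and `Boost` come from the thread. The function name is hypothetical. The Boost formula isn't given in the thread, so the caller passes the new value in.

```python
import xml.etree.ElementTree as ET

def click_update_xml(doc_id: str, boost: float, clicks_delta: int = 1) -> str:
    """Build an atomic-update <add> message that increments Clicks and sets Boost."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # The unique key tells Solr which stored document to modify.
    ET.SubElement(doc, "field", name="Id").text = doc_id
    # "inc" adds clicks_delta to the stored value instead of overwriting it.
    ET.SubElement(doc, "field", name="Clicks", update="inc").text = str(clicks_delta)
    # "set" replaces the stored Boost with the newly computed value.
    ET.SubElement(doc, "field", name="Boost", update="set").text = str(boost)
    return ET.tostring(add, encoding="unicode")

print(click_update_xml("abc", 1.05))
```

With `inc`, each click message is identical, so the client no longer needs to know the current count just to bump it.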

John


On 08/10/15 16:45, Alessandro Benedetti wrote:
> Not related to the deletion problem, just out of curiosity about your use case:
>
> <field name="Clicks" update="set">1</field>
>
> Have I misunderstood your use case, or should you be using:
>
> inc
>
> Increments a numeric value by a specific amount.
>
> Must be specified as a single numeric value.
>
> Basically, every time there's a click, you set that field to "1" again.
> So a document with 1 click will be considered equal to one with 1000 clicks.
> My 2 cents
>
> Cheers
>
> On 8 October 2015 at 14:10, John Smith <solr-u...@remailme.net> wrote:
>
>> Well, every day we update a lot of documents (usually several million),
>> so the DIH is a good fit.
>>
>> Calling the update chain would make sense there: after all, a data import
>> is just a batch update. Otherwise, the same operations would have to be
>> performed upfront, possibly in another environment and/or language. That's
>> probably what I'm gonna do anyway.
>>
>> Thanks for your help!
>> John
>>
>>
>> On 08/10/15 13:39, Upayavira wrote:
>>> You can either specify the update chain via an update.chain request
>>> parameter, or you can configure a new request handler with its own URL
>>> and a separate update.chain value.
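A minimal sketch of the first option: selecting a chain per request through the `update.chain` parameter on the `/update` URL used later in this thread. `minimal-chain` is a hypothetical chain name that would have to be defined in solrconfig.xml. Nothing is sent over the network here.

```python
from typing import Optional
from urllib.parse import urlencode

def update_url(core_url: str, chain: Optional[str] = None, commit: bool = True) -> str:
    """Return the core's /update URL, optionally pinned to a named update chain."""
    params = {"wt": "json"}
    if commit:
        params["commit"] = "true"
    if chain is not None:
        # A request-level update.chain takes precedence over a value the
        # handler sets in its <lst name="defaults"> section.
        params["update.chain"] = chain
    return f"{core_url.rstrip('/')}/update?{urlencode(params)}"

print(update_url("http://127.0.0.1:8080/solr/ato_test", chain="minimal-chain"))
```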
>>>
>>> I have no idea how you would then reference that in the DIH - I've never
>>> really used it.
>>>
>>> Upayavira
>>>
>>> On Thu, Oct 8, 2015, at 09:25 AM, John Smith wrote:
>>>> After some further investigation, for those interested: the
>>>> SignatureUpdateProcessorFactory fields were somehow misconfigured (I
>>>> guess they were copied over from another collection). The initial import
>>>> had been made using a data import handler: I suppose the update chain
>>>> isn't called in this process and no signature field is created. Am I right?
>>>>
>>>> The first time a document was updated, a signature field with value
>>>> "0000000000000000" was added. The next time, the same signature was
>>>> generated for the new update, which triggered the deletion of all
>>>> documents with the same signature (i.e. the first one), as overwriteDupes
>>>> was set to true. Correct behavior, but quite tricky...
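The failure mode above can be reproduced with a toy, in-memory model. This is not Solr's code. `toy_signature` is a hypothetical stand-in for the configured signature class. It is assumed, as in the thread, to return the constant `0000000000000000` because none of the configured source fields are present in the documents. The field names `signature`, `name`, `features` and `cat` are hypothetical too.

```python
import hashlib

SIG_FIELD = "signature"                          # hypothetical signatureField
SIG_SOURCE_FIELDS = ["name", "features", "cat"]  # hypothetical; absent from the docs

def toy_signature(doc: dict) -> str:
    """Hypothetical stand-in for the signature class, not Solr's algorithm."""
    values = [str(doc[f]) for f in SIG_SOURCE_FIELDS if f in doc]
    if not values:
        return "0" * 16  # mimic the constant value observed in the thread
    return hashlib.md5("|".join(values).encode()).hexdigest()[:16]

def update_through_dedupe(index: dict, doc_id: str, changes: dict,
                          overwrite_dupes: bool = True) -> None:
    """Apply an atomic update, then emulate the signature step of the chain."""
    doc = {**index.get(doc_id, {"Id": doc_id}), **changes}
    doc[SIG_FIELD] = toy_signature(doc)
    if overwrite_dupes:
        # Delete every stored doc carrying the same signature, then (re)add this one.
        same_sig = [k for k, d in index.items() if d.get(SIG_FIELD) == doc[SIG_FIELD]]
        for other in same_sig:
            del index[other]
    index[doc_id] = doc

# Docs loaded by DIH without going through the chain: no signature field yet.
index = {i: {"Id": i, "Clicks": 0} for i in ("a", "b", "c")}
for clicked in ("a", "b", "c"):
    update_through_dedupe(index, clicked, {"Clicks": index[clicked]["Clicks"] + 1})
    print(clicked, sorted(index))
```

Each click on a new document removes the previously clicked one, matching the symptom reported at the start of the thread; with `overwrite_dupes=False` all three survive.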
>>>>
>>>> So my conclusion here (please correct me if I'm wrong) is of course to
>>>> fix the signature configuration problem, but also to find a way to call
>>>> the update chain (or maybe a simplified one, e.g. one that skips logging)
>>>> from the data import handler. Is there an easy way to do this?
>>>> Conceptually, shouldn't the update chain be callable from the data import
>>>> process? Maybe it already is?
>>>>
>>>> John
>>>>
>>>>
>>>> On 08/10/15 09:43, Upayavira wrote:
>>>>> Yay!
>>>>>
>>>>> On Thu, Oct 8, 2015, at 08:38 AM, John Smith wrote:
>>>>>> Yes indeed, the update chain had been activated... I commented it out
>>>>>> again and the problem vanished.
>>>>>>
>>>>>> Good job, thanks Erick and Upayavira!
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 08/10/15 08:58, Upayavira wrote:
>>>>>>> Look for the dedupe processor (SignatureUpdateProcessorFactory) in an
>>>>>>> update chain.
>>>>>>>
>>>>>>> That chain is there, but commented out IIRC, in the techproducts
>>>>>>> sample configs.
>>>>>>>
>>>>>>> Perhaps you uncommented it to use your own update processors, but
>>>>>>> didn't remove that component?
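One way to check whether a signature-based chain is actually live is to scan solrconfig.xml. A minimal sketch: `SAMPLE_CONFIG` is a trimmed, hypothetical config modeled on this thread (a `dedupe` chain wired into the `/update` handler's defaults), not a copy of the techproducts file. ElementTree drops XML comments by default, so commented-out chains are ignored, as they are by Solr. The sketch only checks handler `defaults` and chains marked `default="true"`. Other places that can set `update.chain`, such as `<initParams>`, are not covered.

```python
import xml.etree.ElementTree as ET

SAMPLE_CONFIG = """<config>
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>
  <!-- <updateRequestProcessorChain name="unused"/> is ignored: it is a comment -->
  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory">
      <bool name="overwriteDupes">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
</config>"""

def signature_chains(config_xml: str) -> list:
    """(name, is_default) for every chain containing a SignatureUpdateProcessorFactory."""
    root = ET.fromstring(config_xml)
    found = []
    for chain in root.iter("updateRequestProcessorChain"):
        classes = [p.get("class", "") for p in chain.iter("processor")]
        if any(c.endswith("SignatureUpdateProcessorFactory") for c in classes):
            found.append((chain.get("name"), chain.get("default") == "true"))
    return found

def handler_chain_defaults(config_xml: str) -> dict:
    """Map each request handler to the update.chain set in its defaults, if any."""
    root = ET.fromstring(config_xml)
    result = {}
    for handler in root.iter("requestHandler"):
        node = handler.find("lst[@name='defaults']/str[@name='update.chain']")
        if node is not None and node.text:
            result[handler.get("name")] = node.text.strip()
    return result

print(signature_chains(SAMPLE_CONFIG))
print(handler_chain_defaults(SAMPLE_CONFIG))
```

Against a real core, the same functions would be called on the text of that core's solrconfig.xml.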
>>>>>>>
>>>>>>> On Thu, Oct 8, 2015, at 07:38 AM, John Smith wrote:
>>>>>>>> Oh, I forgot Erick's mention of the logs: there's nothing unusual at
>>>>>>>> INFO level, the update request just gets logged. No exception. I
>>>>>>>> reran it at DEBUG level, but most of the log was related to Jetty.
>>>>>>>> Here's a line I noticed, though:
>>>>>>>>
>>>>>>>> org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest:
>>>>>>>> {wt=json&commit=true&update.chain=dedupe}
>>>>>>>>
>>>>>>>> The update.chain parameter wasn't part of the original request, and
>>>>>>>> "dedupe" looks suspicious to me. Should I investigate further there?
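The parameter John spotted can be isolated by diffing the parameters Solr logged against the ones the client actually sent. A minimal sketch, assuming the log prints parameters as a `key=value&...` string inside braces, as in the line above. Other log formats would need a different parser. The helper names are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

SENT_URL = "http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true"
LOG_LINE = ("org.apache.solr.servlet.HttpSolrCall; Closing out SolrRequest: "
            "{wt=json&commit=true&update.chain=dedupe}")

def logged_params(log_line: str) -> dict:
    """Parse the {k=v&...} parameter block at the end of the log line."""
    start, end = log_line.index("{"), log_line.rindex("}")
    return dict(parse_qsl(log_line[start + 1:end]))

def injected_params(sent_url: str, log_line: str) -> dict:
    """Parameters Solr saw that the client did not send (e.g. handler defaults)."""
    sent = dict(parse_qsl(urlsplit(sent_url).query))
    return {k: v for k, v in logged_params(log_line).items() if k not in sent}

print(injected_params(SENT_URL, LOG_LINE))
```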
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/10/15 08:25, John Smith wrote:
>>>>>>>>> The ids are all different: they're unique numbers followed by a couple
>>>>>>>>> of keywords. I've run a test with a small collection of 10 documents
>>>>>>>>> to make sure I can check them manually: all ids are confirmed as
>>>>>>>>> different. I also dumped the exact command; here's one example:
>>>>>>>>>
>>>>>>>>> <add><doc><field name="Id">101084385_Sebago_ sebago shoes</field><field
>>>>>>>>> name="Clicks" update="set">1</field><field name="Boost"
>>>>>>>>> update="set">1.8701925463775</field></doc></add>
>>>>>>>>>
>>>>>>>>> It's sent as the body of a POST request to
>>>>>>>>> http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true, with a
>>>>>>>>> Content-Type: text/xml header. Another document is still consistently
>>>>>>>>> lost with the update above.
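The request described above can be reproduced outside the application code, which helps rule out a client-side difference. A minimal sketch using Python's standard library that builds, but does not send, the same POST. `send` is a hypothetical helper and needs a running Solr at that URL.

```python
import urllib.request

UPDATE_URL = "http://127.0.0.1:8080/solr/ato_test/update?wt=json&commit=true"
BODY = ('<add><doc><field name="Id">101084385_Sebago_ sebago shoes</field>'
        '<field name="Clicks" update="set">1</field>'
        '<field name="Boost" update="set">1.8701925463775</field></doc></add>')

def build_update_request(url: str, xml_body: str) -> urllib.request.Request:
    """Build (but do not send) an XML update POST like the one in the thread."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )

def send(req: urllib.request.Request) -> str:
    # Requires a live Solr core; returns the raw JSON response body.
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

req = build_update_request(UPDATE_URL, BODY)
print(req.get_method(), req.full_url)
```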
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/10/15 00:38, Upayavira wrote:
>>>>>>>>>> What ID are you using? Are you possibly using the same ID for
>>>>>>>>>> both, so the second document you visit causes the first to be
>>>>>>>>>> overwritten?
>>>>>>>>>>
>>>>>>>>>> Upayavira
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 7, 2015, at 06:38 PM, Erick Erickson wrote:
>>>>>>>>>>> This certainly should not be happening. I'd
>>>>>>>>>>> take a careful look at what you actually send.
>>>>>>>>>>> My _guess_ is that you're not sending the update
>>>>>>>>>>> command you think you are....
>>>>>>>>>>>
>>>>>>>>>>> As a test you could just curl (or use post.jar) to
>>>>>>>>>>> send these types of commands up individually.
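Following Erick's suggestion, a minimal sketch that sends one update at a time and compares the core's document count before and after. An atomic update to an existing document should not change that count. It reads `response.numFound` from Solr's JSON `/select` response. The core URL and helper names are hypothetical, and only `count_url` and `num_found` run without a live server.

```python
import json
import urllib.request
from urllib.parse import urlencode

def count_url(core_url: str) -> str:
    """/select URL that matches all docs but returns only the hit count."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{query}"

def num_found(response_body: str) -> int:
    """Read response.numFound from a Solr JSON response body."""
    return json.loads(response_body)["response"]["numFound"]

def doc_count(core_url: str) -> int:
    # Needs a live Solr core at core_url.
    with urllib.request.urlopen(count_url(core_url)) as resp:
        return num_found(resp.read().decode("utf-8"))

def update_keeps_count(core_url: str, xml_body: str) -> bool:
    """Send one XML update with commit=true; True if the doc count is unchanged."""
    before = doc_count(core_url)
    req = urllib.request.Request(
        f"{core_url.rstrip('/')}/update?wt=json&commit=true",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass
    return doc_count(core_url) == before
```

With the dedupe chain from this thread active, clicking a first document would return True, and clicking a second, different one would return False.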
>>>>>>>>>>>
>>>>>>>>>>> Perhaps looking at the solr log would help too...
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Erick
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 7, 2015 at 6:32 AM, John Smith <solr-u...@remailme.net> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm running into the following problem with update XML messages.
>>>>>>>>>>>> The idea is to record the number of clicks for a document: each
>>>>>>>>>>>> time, a message such as this one is sent to .../update:
>>>>>>>>>>>>
>>>>>>>>>>>> <add>
>>>>>>>>>>>> <doc>
>>>>>>>>>>>> <field name="Id">abc</field>
>>>>>>>>>>>> <field name="Clicks" update="set">1</field>
>>>>>>>>>>>> <field name="Boost" update="set">1.05</field>
>>>>>>>>>>>> </doc>
>>>>>>>>>>>> </add>
>>>>>>>>>>>>
>>>>>>>>>>>> (Clicks is an int field; Boost is a float field, updated to reflect
>>>>>>>>>>>> the change in popularity using a formula based on the number of
>>>>>>>>>>>> clicks.) At the moment, in the dev environment, changes are
>>>>>>>>>>>> committed immediately.
>>>>>>>>>>>>
>>>>>>>>>>>> When a document is updated, the changes are indeed reflected in the
>>>>>>>>>>>> search results. If I click on the same document again, all goes
>>>>>>>>>>>> well. But when I click on another document, the latter gets updated
>>>>>>>>>>>> as expected, but the former is simply deleted. It can no longer be
>>>>>>>>>>>> found, and the admin core Overview page shows one document fewer. If
>>>>>>>>>>>> I click on a 3rd document, the 2nd one goes the same way.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The schema is the default one, amended to remove unneeded fields
>>>>>>>>>>>> and add new ones; nothing fancy. All fields are stored="true" and
>>>>>>>>>>>> there's no <copyField>. I've tried versions 5.2.1 and 5.3.1 in
>>>>>>>>>>>> standalone mode, with the same outcome. It looks like a bug to me,
>>>>>>>>>>>> but might I have overlooked something? This is my first attempt at
>>>>>>>>>>>> atomic updates.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> John.
>>>>>>>>>>>>
>>
>
