Re: Reindexing using dataimporthandler

Erick Erickson Mon, 27 Apr 2020 06:19:41 -0700

You’re welcome.

Solr is a huge beast; I don’t think any single individual
knows all the bits and pieces… Or, in my case, can
remember them ;)


> On Apr 27, 2020, at 9:15 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
> 
> Wow, thanks, Erick. That's actually much better :-)
> You live and you learn.
> 
> Cheers,
> Bjarke
> 
> On Mon, 27 Apr 2020 at 15:00, Erick Erickson <erickerick...@gmail.com> wrote:
> 
>> What about the Collections API REINDEXCOLLECTION? That has the
>> advantage of being something officially supported, puts the source
>> collection into read-only mode, uses a much more efficient query
>> process (streaming actually) etc.
>> 
>> It has the disadvantage of producing a new collection under the
>> covers and aliasing to it. But you can always rename the collection
>> later.
>> 
>> Best,
>> Erick
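
A minimal sketch of the REINDEXCOLLECTION call suggested above, assuming a SolrCloud node at `localhost:8983` and a collection named `mycollection` (both borrowed from the example URL later in the thread). The helper names `reindex_url` and `start_reindex` are hypothetical, and only the `action` and `name` parameters are set; everything else is left at Solr's defaults.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: host and port taken from the thread's example URL.
SOLR_BASE = "http://localhost:8983/solr"


def reindex_url(collection, base=SOLR_BASE):
    """Build the Collections API URL that starts a REINDEXCOLLECTION run."""
    params = {"action": "REINDEXCOLLECTION", "name": collection}
    return f"{base}/admin/collections?{urlencode(params)}"


def start_reindex(collection, base=SOLR_BASE):
    """Send the request and return the raw response body.

    Needs a reachable SolrCloud node; nothing is sent until this is called.
    """
    with urlopen(reindex_url(collection, base)) as resp:
        return resp.read().decode("utf-8")
```

Calling `start_reindex("mycollection")` against a live cluster would kick off the reindex; `reindex_url` alone can be used to inspect the request first.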
>> 
>>> On Apr 27, 2020, at 8:23 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>> 
>>> Thanks for the reply,
>>> I'm on Solr 8.2, so cursorMark is there.
>>> 
>>> Doing this from one collection to another and then using a
>>> collection alias is probably the way to go, but actually, my suggestion
>>> was a little more bold:
>>> 
>>> I'm indexing on top of the same core, i.e. from
>>> http://localhost:8983/solr/mycollection to
>>> http://localhost:8983/solr/mycollection
>>> 
>>> (This is why I suggested adding a _version_:[* TO <current_highest_version>]
>>> filter to ensure it terminates for large imports.)
>>> 
>>> With this in mind, are you still thinking this is a safe approach?
>>> 
>>> Thanks,
>>> Bjarke
>>> 
>>> 
>>> On Mon, 27 Apr 2020 at 13:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>> 
>>>> Hi Bjarke,
>>>> I don’t see a problem with that approach if you have enough resources to
>>>> handle both cores at the same time, especially if you are doing that while
>>>> serving production queries. The only issue is that if you plan to do that,
>>>> all fields have to be stored. Also note that cursorMark support was added
>>>> to the entity processor a bit later, so if you are running a slightly
>>>> older version of Solr, you might not have cursors. I found that out the
>>>> hard way.
>>>> 
>>>> Emir
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>> 
>>>> 
>>>> 
>>>>> On 27 Apr 2020, at 13:11, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>>>> 
>>>>> Hi list,
>>>>> 
>>>>> Let's say I add a copyField to my Solr schema, change the analysis chain
>>>>> of a field, or make some other change.
>>>>> It seems to me an alluring choice to use a very simple
>>>>> dataimporthandler to reindex all documents, by using a SolrEntityProcessor
>>>>> that points to itself. I have just done this for a very small collection,
>>>>> but I was wondering what the caveats are, since this is not the recommended
>>>>> practice. What can go wrong using this approach?
>>>>> 
>>>>> <document>
>>>>>   <entity name="all_from_self" processor="SolrEntityProcessor"
>>>>>           url="http://localhost:8983/solr/mycollection" qt="lucene"
>>>>>           query="*:*" wt="javabin" rows="1000" cursorMark="true"
>>>>>           sort="id asc" fl="*,orig_version_l:_version_"/>
>>>>> </document>
>>>>> 
>>>>> PS: (It is probably necessary to add a _version_:[* TO
>>>>> <current_highest_version>] filter to ensure it terminates for large imports)
>>>>> PPS: (Obviously you shouldn't add the clean parameter)
>>>>> 
>>>>> /Bjarke
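
The version cap mentioned in the PS could be prepared along these lines. This is only a sketch: it assumes the core from the example URL, the standard `/select` handler, and a `_version_` field that can be sorted on. The helper names `highest_version_url` and `version_cap` are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the core URL used in the entity definition above.
SOLR_CORE = "http://localhost:8983/solr/mycollection"


def highest_version_url(core=SOLR_CORE):
    """URL for a query that returns the document with the largest _version_."""
    params = {"q": "*:*", "sort": "_version_ desc", "rows": 1, "fl": "_version_"}
    return f"{core}/select?{urlencode(params)}"


def version_cap(highest_version):
    """Range clause matching only documents at or below the given version."""
    return f"_version_:[* TO {highest_version}]"
```

The value read from the first query would be passed to `version_cap`, and the resulting clause used in the entity's `query` attribute in place of `*:*`, so that documents rewritten during the import (which receive newer versions) are not picked up again.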
>>>> 
>>>> 
>> 
>>
