Re: Reindexing using dataimporthandler

Erick Erickson Mon, 27 Apr 2020 06:00:47 -0700

What about the Collections API REINDEXCOLLECTION command? It has the
advantage of being officially supported, it puts the source
collection into read-only mode, it uses a much more efficient query
process (streaming, actually), etc.


It has the disadvantage of producing a new collection under the
covers and aliasing to it. But you can always rename the collection
later.
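
For illustration, here is a rough sketch of how such a request could be
built. It assumes SolrCloud on Solr 8.1 or later, the base URL and
collection name used elsewhere in this thread, and a hypothetical helper
name (build_reindex_url); it only constructs the URL and sends nothing.

```python
from urllib.parse import urlencode


def build_reindex_url(base_url, collection, target=None, remove_source=False):
    """Build a Collections API REINDEXCOLLECTION request URL.

    base_url and collection are assumptions taken from the thread;
    target and remove_source map to the optional 'target' and
    'removeSource' request parameters.
    """
    params = {
        "action": "REINDEXCOLLECTION",
        "name": collection,
        "cmd": "start",  # other values: "status", "abort"
    }
    if target is not None:
        params["target"] = target
    if remove_source:
        params["removeSource"] = "true"
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = build_reindex_url("http://localhost:8983/solr", "mycollection")
print(url)
```

Issuing a GET against the printed URL would start the reindex; the same
call with cmd=status reports progress.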

Best,
Erick

> On Apr 27, 2020, at 8:23 AM, Bjarke Buur Mortensen <morten...@eluence.com> 
> wrote:
> 
> Thanks for the reply,
> I'm on solr 8.2 so cursorMark is there.
> 
> Doing this from one collection to another collection, and then using a
> collection alias, is probably the way to go, but actually, my suggestion
> was a little bolder:
> 
> I'm indexing on top of the same core, i.e. from
> http://localhost:8983/solr/mycollection to
> http://localhost:8983/solr/mycollection
> 
> (This is why I suggested adding a _version_:[* TO <current_highest_version>]
> filter to ensure it terminates for large imports.)
> 
> With this in mind, are you still thinking this is a safe approach?
> 
> Thanks,
> Bjarke
> 
> 
> On Mon, 27 Apr 2020 at 13:46, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
> 
>> Hi Bjarke,
>> I don’t see a problem with that approach if you have enough resources to
>> handle both cores at the same time, especially if you are doing that while
>> serving production queries. The only issue is that if you plan to do that,
>> all fields have to be stored. Also note that cursorMark support
>> was added to the entity processor a bit later, so if you are running a
>> somewhat older version of Solr, you might not have cursors. I found that
>> out the hard way.
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 27 Apr 2020, at 13:11, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>> 
>>> Hi list,
>>> 
>>> Let's say I add a copyField to my solr schema, or change the analysis chain
>>> of a field or some other change.
>>> It seems to me to be an alluring choice to use a very simple
>>> dataimporthandler to reindex all documents, by using a SolrEntityProcessor
>>> that points to itself. I have just done this for a very small collection,
>>> but I was wondering what the caveats are, since this is not the recommended
>>> practice. What can go wrong using this approach?
>>> 
>>> <document>
>>>   <entity name="all_from_self" processor="SolrEntityProcessor"
>>>           url="http://localhost:8983/solr/mycollection" qt="lucene"
>>>           query="*:*" wt="javabin" rows="1000" cursorMark="true"
>>>           sort="id asc" fl="*,orig_version_l:_version_"/>
>>> </document>
>>> 
>>> PS: (It is probably necessary to add a _version_:[* TO
>>> <current_highest_version>] filter to ensure it terminates for large imports)
>>> PPS: (Obviously you shouldn't add the clean parameter)
>>> 
>>> /Bjarke
>> 
>>
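
To make the self-reindexing idea from the original question concrete, here
is a minimal sketch of the read side: cursorMark paging over all documents
with an upper bound on the version field. It assumes the bound is meant to
apply to Solr's _version_ field, that "id" is the uniqueKey, and that the
caller supplies a hypothetical fetch(params) function which sends the
parameters to /select and returns the parsed JSON response.

```python
def iter_all_docs(fetch, max_version, rows=1000):
    """Yield every document whose _version_ is <= max_version.

    fetch is a caller-supplied (hypothetical) function that takes a dict
    of query parameters and returns the decoded JSON response as a dict.
    """
    cursor = "*"  # initial cursorMark value
    while True:
        params = {
            "q": "*:*",
            # Upper bound on the version, as suggested in the question.
            "fq": f"_version_:[* TO {max_version}]",
            "sort": "id asc",  # cursors require a sort on the uniqueKey
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        }
        result = fetch(params)
        yield from result["response"]["docs"]
        next_cursor = result["nextCursorMark"]
        if next_cursor == cursor:
            # An unchanged cursor means there are no more results.
            break
        cursor = next_cursor
```

Each yielded document would then be sent back to the same collection as an
update. As noted earlier in the thread, this only works if all fields are
stored.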
