What about the Collections API REINDEXCOLLECTION command? It has the advantage of being officially supported, it puts the source collection into read-only mode, it uses a much more efficient query process (streaming, actually), etc.
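
For illustration, here is a minimal sketch of how such a request could be put together. It assumes a SolrCloud node at http://localhost:8983/solr and the collection name mycollection (both borrowed from the quoted message below); reindex_url is a hypothetical helper, not part of Solr. Only the action, name and cmd parameters are used, so check the Collections API page of the reference guide for your version before relying on anything else.

```python
from urllib.parse import urlencode


def reindex_url(base_url, collection, cmd="start"):
    """Build a Collections API URL for REINDEXCOLLECTION.

    Hypothetical helper for illustration only. `cmd` is assumed to be
    one of "start", "abort" or "status".
    """
    params = {
        "action": "REINDEXCOLLECTION",
        "name": collection,
        "cmd": cmd,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed base URL and collection name, taken from the thread.
start_url = reindex_url("http://localhost:8983/solr", "mycollection")
status_url = reindex_url("http://localhost:8983/solr", "mycollection", cmd="status")

print(start_url)
print(status_url)
```

The sketch only builds the URLs so that it runs without a Solr instance. Sending one is a plain HTTP GET, for example with urllib.request.urlopen(start_url), after which the status URL can be polled.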

It has the disadvantage of producing a new collection under the covers and aliasing to it. But you can always rename the collection later.

Best,
Erick

> On Apr 27, 2020, at 8:23 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>
> Thanks for the reply,
> I'm on Solr 8.2, so cursorMark is there.
>
> Doing this from one collection to another collection, and then using a
> collection alias, is probably the way to go, but actually, my suggestion
> was a little more bold:
>
> I'm indexing on top of the same core, i.e. from
> http://localhost:8983/solr/mycollection to
> http://localhost:8983/solr/mycollection
>
> (This is why I suggested adding a _version_:[* TO <current_highest_version>]
> to ensure it terminates for large imports.)
>
> With this in mind, do you still think this is a safe approach?
>
> Thanks,
> Bjarke
>
>
> On Mon, Apr 27, 2020 at 13:46, Emir Arnautović <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Bjarke,
>> I don't see a problem with that approach if you have enough resources to
>> handle both cores at the same time, especially if you are doing that while
>> serving production queries. The only issue is that if you plan to do that,
>> then you have to have all fields stored. Also note that cursorMark support
>> was added a bit later to the entity processor, so if you are running a
>> somewhat older version of Solr, you might not have cursors. I found that
>> out the hard way.
>>
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 27 Apr 2020, at 13:11, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>>
>>> Hi list,
>>>
>>> Let's say I add a copyField to my Solr schema, or change the analysis
>>> chain of a field, or make some other change.
>>> It seems to me to be an alluring choice to use a very simple
>>> DataImportHandler to reindex all documents, by using a SolrEntityProcessor
>>> that points to itself.
>>> I have just done this for a very small collection, but I was wondering
>>> what the caveats are, since this is not the recommended practice.
>>> What can go wrong using this approach?
>>>
>>> <document>
>>>   <entity name="all_from_self" processor="SolrEntityProcessor"
>>>           url="http://localhost:8983/solr/mycollection"
>>>           qt="lucene" query="*:*" wt="javabin" rows="1000"
>>>           cursorMark="true" sort="id asc"
>>>           fl="*,orig_version_l:_version_"/>
>>> </document>
>>>
>>> PS: (It is probably necessary to add a _version_:[* TO
>>> <current_highest_version>] to ensure it terminates for large imports)
>>> PPS: (Obviously you shouldn't add the clean parameter)
>>>
>>> /Bjarke