You’re welcome. Solr is a huge beast; I don’t think any single individual knows all the bits and pieces… Or, in my case, can remember them ;)

> On Apr 27, 2020, at 9:15 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>
> Wow, thanks, Erick. That's actually much better :-)
> You live and you learn.
>
> Cheers,
> Bjarke
>
> On Mon, Apr 27, 2020 at 15:00, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> What about the Collections API REINDEXCOLLECTION? That has the
>> advantage of being something officially supported, puts the source
>> collection into read-only mode, uses a much more efficient query
>> process (streaming actually) etc.
>>
>> It has the disadvantage of producing a new collection under the
>> covers and aliasing to it. But you can always rename the collection
>> later.
>>
>> Best,
>> Erick
>>
>>> On Apr 27, 2020, at 8:23 AM, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>>
>>> Thanks for the reply,
>>> I'm on solr 8.2 so cursorMark is there.
>>>
>>> Doing this from one collection to another collection, and then use a
>>> collection alias is probably the way to go, but actually, my suggestion
>>> was a little more bold:
>>>
>>> I'm indexing on top of the same core, i.e. from
>>> http://localhost:8983/solr/mycollection to
>>> http://localhost:8983/solr/mycollection
>>>
>>> (This is why I suggested adding a version:[* TO <current_highest_version>]
>>> to ensure it terminates for large imports.)
>>>
>>> With this in mind, are you still thinking this is a safe approach?
>>>
>>> Thanks,
>>> Bjarke
>>>
>>> On Mon, Apr 27, 2020 at 13:46, Emir Arnautović <emir.arnauto...@sematext.com> wrote:
>>>
>>>> Hi Bjarke,
>>>> I don’t see a problem with that approach if you have enough resources to
>>>> handle both cores at the same time, especially if you are doing that while
>>>> serving production queries. The only issue is that if you plan to do that
>>>> then you have to have all fields stored. Also note that cursorMark support
>>>> was added a bit later to entity processor, so if you are running a bit
>>>> older version of Solr, you might not have cursors - I’ve found it the hard
>>>> way.
>>>>
>>>> Emir
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>
>>>>> On 27 Apr 2020, at 13:11, Bjarke Buur Mortensen <morten...@eluence.com> wrote:
>>>>>
>>>>> Hi list,
>>>>>
>>>>> Let's say I add a copyField to my solr schema, or change the analysis chain
>>>>> of a field or some other change.
>>>>> It seems to me to be an alluring choice to use a very simple
>>>>> dataimporthandler to reindex all documents, by using a SolrEntityProcessor
>>>>> that points to itself. I have just done this for a very small collection,
>>>>> but I was wondering what the caveats are, since this is not the recommended
>>>>> practice. What can go wrong using this approach?
>>>>>
>>>>> <document>
>>>>>   <entity name="all_from_self" processor="SolrEntityProcessor"
>>>>>           url="http://localhost:8983/solr/mycollection" qt="lucene" query="*:*"
>>>>>           wt="javabin" rows="1000" cursorMark="true" sort="id asc"
>>>>>           fl="*,orig_version_l:_version_"/>
>>>>> </document>
>>>>>
>>>>> PS: (It is probably necessary to add a version:[* TO
>>>>> <current_highest_version>] to ensure it terminates for large imports)
>>>>> PPS: (Obviously you shouldn't add the clean parameter)
>>>>>
>>>>> /Bjarke
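
For anyone finding this thread later, here is a rough sketch of how the REINDEXCOLLECTION call suggested above could be put together from Python. It only builds the request URLs and does not contact Solr. The base URL and the collection name "mycollection" are taken from the examples in the thread; the helper name reindex_url is made up for illustration, and the parameters shown (action, name, cmd) should be checked against the Collections API reference for your Solr version before relying on them.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at the address used in the thread's examples.
SOLR_BASE = "http://localhost:8983/solr"


def reindex_url(collection, cmd=None, base=SOLR_BASE):
    """Build a Collections API REINDEXCOLLECTION URL for `collection`.

    `cmd` is optional; passing "status" asks about a reindex that is
    already running instead of starting a new one.
    (Hypothetical helper, not part of any Solr client library.)
    """
    params = {"action": "REINDEXCOLLECTION", "name": collection}
    if cmd is not None:
        params["cmd"] = cmd
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # URL that would start a reindex of the collection onto itself.
    print(reindex_url("mycollection"))
    # URL that would report on the progress of that reindex.
    print(reindex_url("mycollection", cmd="status"))
```

Fetching the first URL (with curl, a browser, or urllib.request) is what would actually kick off the reindex, so it is probably worth trying on a small test collection first.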