Hello dear list! I have a problem with the new IndexWriter mechanism introduced in Nutch 1.15. Hopefully someone can point out what I should do differently.
I have a couple of test systems running different versions of a web application, with a separate SOLR core for each of them. A single VM crawls and indexes content from scratch for every test system that has been redeployed. Up to and including 1.14, I would simply specify the target core (solr.server.url) when calling bin/crawl. Say I redeployed test_system_1 today: I would call bin/crawl to update the SOLR core test_system_1.

With 1.15 I can no longer choose a target index explicitly, so I tried the following: in index-writers.xml I defined one IndexWriter for each of my systems/cores, and to choose which IndexWriter to use, I defined one exchange per test system in exchanges.xml. Each exchange maps the host name (unique to each test system) to the correct IndexWriter, and therefore to the correct core. This leaves me with two problems, though:

1. I only ever want to index into one specific core during a crawl cycle, and I already KNOW its name. However, the exchange expressions are evaluated for every single document I index. The expressions do evaluate correctly, so it "works", and since this is a test environment I could live with it.

2. Every IndexWriter referenced by ANY of the exchanges must point to an existing core, even though only one of them is ever actually used. If any of the referenced cores does NOT exist, Nutch gets a 404 for that core during the indexing phase and fails. I assume Nutch checks all referenced IndexWriters before it starts indexing, just to make sure they are all available.

Problem #2 is the crux for me, since I can't reliably guarantee that all (unrelated) cores are available during a given crawl (and why should I need to?). It's possible that my design is broken or my use case is uncommon. But it seems to me that I should be able to achieve fairly easily what I could with 1.14, i.e. explicitly choose the target core for each call of bin/crawl.
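To make the two problems concrete, here is a rough model of the routing as I understand it. It is a hypothetical Python sketch, not Nutch's actual code: the writer ids, host names and core URLs are invented, and the upfront check in open_all_writers only mirrors the behaviour I am seeing (and assuming), not anything I have verified in the Nutch source.

```python
# Hypothetical model of the setup described above; NOT Nutch code.
# Writer ids, host names and core URLs are invented for illustration.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Exchange:
    # Predicate standing in for the exchange expression (a host-name match).
    matches: Callable[[dict], bool]
    writer_id: str


# One hypothetical IndexWriter per test system, each pointing at its own core.
WRITERS: Dict[str, str] = {
    "writer_test_system_1": "http://solr:8983/solr/test_system_1",
    "writer_test_system_2": "http://solr:8983/solr/test_system_2",
}

EXCHANGES: List[Exchange] = [
    Exchange(lambda doc: doc["host"] == "test1.example.org", "writer_test_system_1"),
    Exchange(lambda doc: doc["host"] == "test2.example.org", "writer_test_system_2"),
]


def route(doc: dict) -> List[str]:
    """Return the writer ids whose exchange matches this document.

    Every exchange is evaluated for every document (problem #1)."""
    return [ex.writer_id for ex in EXCHANGES if ex.matches(doc)]


def open_all_writers(existing_cores: Set[str]) -> None:
    """Mimic the observed behaviour (problem #2): every writer referenced by
    any exchange is checked up front, so one missing core aborts the run even
    if no document would ever be routed to it."""
    for ex in EXCHANGES:
        url = WRITERS[ex.writer_id]
        core = url.rsplit("/", 1)[-1]
        if core not in existing_cores:
            raise RuntimeError(f"404 for core {core!r} ({url})")


if __name__ == "__main__":
    print(route({"host": "test1.example.org"}))  # ['writer_test_system_1']
    try:
        # Only test_system_1 exists, yet the run fails on test_system_2.
        open_all_writers({"test_system_1"})
    except RuntimeError as err:
        print("indexing aborted:", err)
```

With both cores present, open_all_writers passes; drop test_system_2 and the whole run fails, even though every document in this crawl would have been routed to test_system_1.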
One solution would, of course, be to set up a separate crawling VM for each test system, each with a single IndexWriter. But that can't be the way to go. Grateful for any pointer towards a solution!

Felix