Hi Alessandro,

Some interesting testing today seems to have gotten me closer to the cause of the issue. When I run the version of the index that is working correctly against my database table that has the extra field in it, the index suddenly increases in size. This happens even though the data importer is running the same SELECT as before (which doesn't include the extra column) and loads the same number of rows.
After scratching my head for a bit and browsing through both versions of the table I am loading from (with and without the extra field), I noticed that the natural ordering of the tables is different. These tables are "staging" tables that I populate with another set of queries and inserts to get the data into a format that is easy to ingest into Solr. When I add the extra field to these queries, it changes the Oracle query plan, as the field is contained in a different table that I need to join to. As I don't specify an "ORDER BY" on the query (I didn't think it would make a difference, and it would slow the query down), Oracle is free to choose how it orders the result set. Adding the extra field changes that natural ordering, which affects the order in which things go into my staging table.

As I also don't specify an "ORDER BY" when I select things out of the staging table, my data in the scenario that works is being loaded in a different order from the scenario that doesn't work. I am currently running full loads to verify this under each scenario, as I have now forced the data in the scenario that doesn't work to be in the same order as in the scenario that does. Will see how this load goes overnight.

This leads to the question: what difference does it make to Solr what order I load the data in?

I also noticed that the .cfs file is quite large in the second scenario, even though the compound file format is supposed to be disabled by default in Solr. I checked my Solr config and there is no override of the default.
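To illustrate the ordering point above, here is a minimal sketch of how an explicit ORDER BY on the staging-table SELECT pins the load order regardless of how the staging table was populated. It is only an illustration under assumptions: sqlite3 stands in for Oracle, and the table names (staging_a, staging_b) and columns (id, address) are hypothetical, not my actual schema.

```python
import sqlite3

# Hypothetical sample rows; the real staging tables hold ~14 million rows.
ROWS = [(3, "3 Third St"), (1, "1 First St"), (2, "2 Second St")]


def build_staging(conn, table, rows):
    """Create a staging table and insert rows in the order given."""
    conn.execute(f"CREATE TABLE {table} (id INTEGER, address TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def read_for_indexing(conn, table):
    """Select rows with an explicit ORDER BY so the load order is fixed."""
    sql = f"SELECT id, address FROM {table} ORDER BY id"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")

# Same data, populated in two different physical orders, mimicking
# two different query plans feeding the staging table.
build_staging(conn, "staging_a", ROWS)
build_staging(conn, "staging_b", list(reversed(ROWS)))

ordered_a = read_for_indexing(conn, "staging_a")
ordered_b = read_for_indexing(conn, "staging_b")

# With ORDER BY, both scenarios hand rows over in the same sequence.
print(ordered_a == ordered_b)
```

Without the ORDER BY, the database is free to return the rows in whatever order suits its plan, which is the difference I am seeing between the two scenarios.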
In answer to your questions:

1) same number of documents - YES, ~14,000,000 documents
2) identical documents (+ 1 new field each, not indexed) - YES, the second scenario has one extra field that is stored but not indexed
3) same number of deleted documents - YES, there are zero deleted documents in both scenarios
4) they both were born from scratch (an empty index) - YES, both start from a brand new virtual server with a brand new installation of Solr

I am using the default auto commit, which I think is 15000.

Thanks again for your assistance.

Regards,
David

David Howe
Java Domain Architect