@Alessandro I will see if I can reproduce the same issue just by turning off omitNorms on the field type. I'll open another mail thread if required. Thanks.
On Thu, Feb 15, 2018 at 6:12 AM, Howe, David <david.h...@auspost.com.au> wrote:

> Hi Alessandro,
>
> Some interesting testing today seems to have gotten me closer to what the issue is. When I run the version of the index that works correctly against my database table that has the extra field in it, the index suddenly increases in size. This happens even though the data importer runs the same SELECT as before (which doesn't include the extra column) and loads the same number of rows.
>
> After scratching my head for a bit and browsing through both versions of the table I am loading from (with and without the extra field), I noticed that the natural ordering of the tables is different. These tables are "staging" tables that I populate with another set of queries and inserts to get the data into a format that is easy to ingest into Solr. When I add the extra field to these queries, it changes the Oracle query plan, because the field is contained in a different table that I need to join to. As I don't specify an "ORDER BY" on the query (I didn't think it would make a difference, and it would slow the query down), Oracle is free to choose how it orders the result set. Adding the extra field changes that natural ordering, which affects the order in which rows go into my staging table. As I don't specify an "ORDER BY" when I select from the staging table either, the data in the scenario that works is loaded in a different order from the data in the scenario that doesn't.
>
> I am currently running full loads to verify this under each scenario, as I have now forced the data in the scenario that doesn't work to be in the same order as in the scenario that does. I will see how this load goes overnight.
>
> This leads to the question: what difference does it make to Solr what order I load the data in?
>
> I also noticed that the .cfs file is quite large in the second scenario, even though compound files are supposed to be disabled by default in Solr. I checked my Solr config and there is no override of the default.
>
> In answer to your questions:
>
> 1) Same number of documents: YES, ~14,000,000 documents.
> 2) Identical documents (+ 1 new field each, not indexed): YES, the second scenario has one extra field that is stored but not indexed.
> 3) Same number of deleted documents: YES, there are zero deleted documents in both scenarios.
> 4) They both were born from scratch (an empty index): YES, both start from a brand new virtual server with a brand new installation of Solr.
>
> I am using the default auto commit, which I think is 15000 ms.
>
> Thanks again for your assistance.
>
> Regards,
>
> David
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T 0391067904
> M 0424036591
> E david.h...@auspost.com.au
> W auspost.com.au
> W startrack.com.au
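On David's question of why load order might matter to Solr: one plausible explanation, not confirmed anywhere in this thread, is compression. Lucene compresses stored fields in blocks that span several documents, so rows that share values tend to compress better when they sit next to each other. The sketch below is a generic illustration using Python's zlib on hypothetical address-like records; it is not Lucene's actual codec, and the record format and block size are assumptions.

```python
import random
import zlib


def make_records(n, seed=0):
    """Generate hypothetical address-like records; each suburb has a fixed postcode."""
    rng = random.Random(seed)
    suburbs = [f"SUBURB_{i:02d}" for i in range(50)]
    postcodes = {s: str(3000 + i) for i, s in enumerate(suburbs)}
    streets = ["HIGH ST", "MAIN RD", "CHURCH ST", "STATION ST", "PARK AVE"]
    records = []
    for _ in range(n):
        suburb = rng.choice(suburbs)
        records.append(f"{rng.choice(streets)}|{suburb}|VIC|{postcodes[suburb]}\n")
    return records


def blocked_size(records, docs_per_block=128):
    """Compress records in fixed-size blocks (a stand-in for chunked stored fields)
    and return the total number of compressed bytes."""
    total = 0
    for start in range(0, len(records), docs_per_block):
        block = "".join(records[start:start + docs_per_block]).encode("utf-8")
        total += len(zlib.compress(block))
    return total


records = make_records(20_000)
# Same rows, two orders: grouped by suburb vs. randomly shuffled.
clustered = sorted(records, key=lambda r: r.split("|")[1])
shuffled = records[:]
random.Random(1).shuffle(shuffled)

print("clustered bytes:", blocked_size(clustered))
print("shuffled bytes: ", blocked_size(shuffled))
```

The clustered order should come out noticeably smaller even though both lists hold exactly the same rows. If something like this is behind the size difference, an explicit ORDER BY on a stable key when selecting from the staging table would at least make the load order, and therefore the index layout, reproducible between runs.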
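For the observation that the .cfs file is unexpectedly large, it may help to compare the two servers' index directories file type by file type. The following is a minimal sketch that totals file sizes in an index directory by extension; the default path `data/index` is an assumption and should be replaced with each core's actual index directory.

```python
import sys
from collections import Counter
from pathlib import Path


def size_by_extension(index_dir):
    """Total the sizes of files directly inside an index directory, keyed by extension."""
    totals = Counter()
    for path in Path(index_dir).iterdir():
        if path.is_file():
            # Files such as segments_N have no extension; group them together.
            totals[path.suffix or "(no extension)"] += path.stat().st_size
    return totals


if __name__ == "__main__":
    # Hypothetical default; pass each server's index directory as the first argument.
    index_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "data/index")
    if not index_dir.is_dir():
        print(f"no index directory at {index_dir}")
    else:
        for ext, size in size_by_extension(index_dir).most_common():
            print(f"{ext:>16} {size / 2**20:10.1f} MiB")
```

Running it on both servers after a full load shows how much of each index sits in .cfs files versus the other per-segment files.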