RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-19 Thread Alessandro Benedetti
Hi David, good to know that sorting solved your problem. I understand perfectly that, given the urgency of your situation, having the solution ready takes priority over continuing the investigation. I would still recommend opening a Jira issue in Apache Solr with all the information

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-18 Thread Howe, David
Hi Erick & Alessandro, I have solved my problem by re-ordering the data in the SQL query. I don't know why it works but it does. I can consistently reproduce the problem without changing anything else except the database table. As our Solr build is scripted and we always build a new Solr

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
I didn't mean to imply that _you'd_ changed things, the _defaults_ may have changed. So the "string" fieldType may be defined with docValues="true" in your new schema and "false" in your old schema without you intentionally changing anything at _all_. That's why the LukeRequestHandler will

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, I'm 99% sure that I haven't changed the field types between the two snapshots as all of my test runs are completely scripted and build a new Solr server from scratch (both the virtual machine and the Solr software). I can diff the scripts between two runs to make sure I haven't

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
Well, I'm not entirely sure either ;) Here is what I'm seeing. And, BTW, I'm making a couple of assumptions here. In the one listing, your biggest segment starts with _7l and in the other it's _zd. The aggregate size is 2,815M for _7l and 705M for _zd. So multiplying the individual files in _zd by 4
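A minimal Python sketch of the comparison outlined in this message: total each segment's files by extension, then scale the smaller segment by the ratio of the aggregate sizes (2,815M / 705M, roughly 4) so the extensions can be compared side by side. The helper names and the byte counts in the usage are hypothetical and are not taken from the listings in the thread. The only assumption about Lucene is that per-segment files start with the segment name (for example `_7l.fdt`). The sketch cannot see inside compound `.cfs` files.

```python
from collections import defaultdict


def segment_of(filename):
    """Return the segment prefix (e.g. '_7l') of an index file, or None."""
    if not filename.startswith("_") or "." not in filename:
        return None  # e.g. segments_N or write.lock
    stem = filename.split(".", 1)[0]
    return "_" + stem[1:].split("_", 1)[0]


def sizes_by_extension(listing, segment):
    """Sum file sizes per extension for one segment.

    listing is an iterable of (filename, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for filename, size in listing:
        if segment_of(filename) == segment:
            totals[filename.rsplit(".", 1)[1]] += size
    return dict(totals)


def scaled_comparison(big, small):
    """Scale `small` so its total matches `big`, then pair up extensions.

    Returns (factor, {ext: (size_in_big, scaled_size_in_small)}).
    """
    factor = sum(big.values()) / sum(small.values())
    rows = {}
    for ext in sorted(set(big) | set(small)):
        rows[ext] = (big.get(ext, 0), small.get(ext, 0) * factor)
    return factor, rows


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    large_index = [("_7l.fdt", 300), ("_7l.dim", 100), ("segments_5", 1)]
    small_index = [("_zd.fdt", 90), ("_zd.dim", 10)]
    factor, rows = scaled_comparison(
        sizes_by_extension(large_index, "_7l"),
        sizes_by_extension(small_index, "_zd"),
    )
    print(f"scale factor: {factor:.2f}")
    for ext, (big_size, scaled_small) in rows.items():
        print(f"{ext}: {big_size} vs {scaled_small:.0f}")
```

An extension whose size in the large segment is far above the scaled value from the small segment is the place to look first.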

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Thinking some more about the differences between the two sort orders has suggested another possibility. We also have a geo spatial field defined in the index: echo "$(date) Creating geoLocation field" curl -X POST -H 'Content-type:application/json' --data-binary '{

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Erick, Below is the file listing for when the index is loaded with the table ordered in a way that produces the smaller index. I have checked the console, and we have no deleted docs and we have the same number of docs in the index as there are rows in the staging table that we load from.

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Alessandro, There are 14,061,990 records in the staging table and that is how many documents we end up with in Solr. I would be surprised if we have a problem with the id, as we use the primary key of the table as the id in Solr so it must be unique. The primary key of the staging

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Alessandro Benedetti
It's a silly thing, but to confirm the direction that Erick is suggesting: how many rows are in the DB? If updates are happening on Solr (causing the deletes), I would expect a greater number of documents in the DB than in the Solr index. Is the DB primary key (if any) the same as the uniqueKey

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David
Hi Emir, We have no copy field definitions. To keep things simple, we have a one to one mapping between the columns in our staging table and the fields in our Solr index. Regards, David

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Emir Arnautović
Hi David, I skimmed through the thread and don’t see whether this was already eliminated, so I will ask: can you check if there are some copyField rules that are triggered when the new field is added? You mentioned that ordering fixed the size of the index, but it might be worth checking. Emir

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
This isn't terribly useful without a similar dump of "the other" index directory. The point is to compare the different extensions for some segment where the sum of all the files in that segment is roughly equal. So if you have a listing of the old index around, that would help. bq: We don't have any

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Erick, I have the full dump of the Solr index file sizes as well if that is of any help. I have attached it below this message. We don't have any deleted docs in our index, as we always build it from a brand new virtual machine with a brand new installation of Solr. The ordering is

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
David: Rats, the cfs files make everything I'd hoped to understand with the sizes ambiguous, since they conceal the underlying sizes of each other extension. We can approach it a bit differently though. Take one segment that's _not_ in cfs format where the total size of all files making up that

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Pratik Patel
@Alessandro I will see if I can reproduce the same issue just by turning off omitNorms on field type. I'll open another mail thread if required. Thanks.

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David
Hi Alessandro, Some interesting testing today that seems to have gotten me closer to what the issue is. When I run the version of the index that is working correctly against my database table that has the extra field in it, the index suddenly increases in size. This is even though the data

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Alessandro Benedetti
@Pratik: you should have investigated. I understand that it solved your issue, but in case you needed norms, it doesn't make sense that they cause your index to grow by a factor of 30. You must have faced a nasty bug if it was just the norms. @Howe : *Compound File* .cfs, .cfe An optional

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David
I have set docValues=false on all of the string fields in our index that have indexed=false and stored=true. This gave a small improvement in the index size from 13.3GB to 12.82GB. I have also tried
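For reference, a sketch of how a stored-only string field with docValues switched off could be expressed as a Solr Schema API `replace-field` command. The field name `extraField` and the URL in the comment are placeholders; the thread does not show the actual field definition. The snippet only builds and prints the JSON body and does not contact Solr.

```python
import json


def stored_only_field(name, field_type="string"):
    """Build a Schema API 'replace-field' command for a field that can be
    retrieved (stored) but is neither indexed nor backed by docValues."""
    return {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": False,
            "stored": True,
            "docValues": False,
        }
    }


if __name__ == "__main__":
    # The body would be POSTed with Content-type: application/json to
    # http://<host>:8983/solr/<collection>/schema (placeholder URL).
    print(json.dumps(stored_only_field("extraField"), indent=2))
```

Setting the attribute explicitly on the field avoids depending on whatever default the "string" fieldType carries in a given schema version.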

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
You are right, in my case this field type was applied to many text fields. These include many copy fields and dynamic fields as well. In my case, only specifying omitNorms=true for field type "text_general" fixed the issue. I didn't do anything else, and I didn't have any other bug.

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Alessandro Benedetti
Hi Pratik, how is it possible that just the norms for a single field were causing such a massive index size increment in your case? In your case I think it was for a field type used by multiple fields, but it's still suspicious in my opinion; norms shouldn't be that big. If I remember correctly in

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Erick Erickson

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Thanks Hoss. I will try setting docValues to false, as we only ever want to be able to retrieve the value of this field. Regards, David

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Erick, Thanks for responding. You are correct that we don't have any deleted docs. When we want to re-index (once a fortnight), we build a brand new installation of Solr from scratch and re-import the new data into an empty index. I will try setting docValues to false and see if that

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, The docker image is like a disk image of the entire server, so it includes the operating system, the Solr installation and the data. Because we run in the cloud and our index isn't that big, this is an easy and fast way for us to scale our Solr cluster without having to

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread David Hastings
To piggy back on this, what would be the right scenarios to use docValues='true'?

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Chris Hostetter
: We are using Solr 7.1.0 to index a database of addresses. We have found
: that our index size increases massively when we add one extra field to
: the index, even though that field is stored and not indexed, and doesn’t
what about docValues?
: When we run an index load without the

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Erick Erickson
David: Right, Optimize Is Evil. Well, actually in your case it's not. In your specific case you can optimize every time you build your index and be OK, gory details here: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ But that's just for background. The key

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Alessandro Benedetti
Hi David, given the fact that you are actually building a new index from scratch, my shot in the dark didn't hit any target. When you say : "Once the import finishes we save the docker image in the AWS docker repository. We then build our cluster using that image as the base" Do you mean just

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David
Hi Alessandro, Thanks for responding. We rebuild the index every time starting from a fresh installation of Solr. Because we are running at AWS, we have automated our deployment so we start with the base docker image, configure Solr and then import our data every time the data changes (it

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Alessandro Benedetti
I assume you re-index in full, right? My shot in the dark is that this increment is temporary. You re-index, so effectively delete and add all documents (this means that even if the new field is just stored, you re-build the entire index for all the fields). Create new segments and the old docs

Index size increases disproportionately to size of added field when indexed=false

2018-02-12 Thread Howe, David
Hi, We are using Solr 7.1.0 to index a database of addresses. We have found that our index size increases massively when we add one extra field to the index, even though that field is stored and not indexed, and doesn’t contain a lot of data. When this occurs, we also observe a significant