Hi Shawn and everyone who replied to the thread,

The Solr version is 5.2.1, and each document returns multivalued fields for the majority of the fields defined in schema.xml. I'm in the process of pasting the contents of my files to a paste website and will update the thread soon.
Thanks,
Srinivas

On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> I have a Solr core with about 20 fields in it (all are stored and indexed).
> For one environment, the number of documents is around 0.29 million. When I
> run the full import through DIH, indexing completes successfully, but the
> index occupies around 5 GB of disk space. Is there a way to check which
> document is consuming the most space? Put another way, can I sort the
> index by document size?

I am not aware of any way to do that. There might be one that I don't know about, but if there were, it seems like I would have come across it before.

It is not very likely that the large index size is due to a single document or a handful of documents. It is more likely that most documents are relatively large. I could be wrong about that, though.

If you have 290000 documents (which is how I interpreted 0.29 million) and the total index size is about 5 GB, then the average size per document in the index is about 18 kilobytes. In my view, this is pretty large. I think most documents are typically 1-2 kilobytes.

Can we get your Solr version, a copy of your schema, and exactly what Solr returns in search results for a typically sized document? You'll need to use a paste website or a file-sharing website ... if you try to attach these things to a message, the mailing list will most likely eat them, and we'll never see them. If you need to redact the information in search results ... please do it in a way that we can still see the exact size of the text -- don't just remove information, replace it with information that's the same length.

Thanks,
Shawn
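Shawn's per-document arithmetic, and the original question of which documents are largest, can be approximated with a short script. This is a minimal Python sketch, not a Solr feature: it assumes the JSON response shape of Solr's /select handler (`wt=json`, documents under `response.docs`) and uses the UTF-8 size of each returned document as a rough proxy. That proxy only reflects stored field values; indexed terms, docValues, and stored-field compression mean it will not match the on-disk footprint. The core name `mycore` and all helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def average_doc_size_kb(index_bytes, num_docs):
    """Average index bytes per document, in kilobytes (1 KB = 1024 bytes)."""
    return index_bytes / num_docs / 1024


def serialized_size(doc):
    """Bytes in the UTF-8 JSON encoding of one returned document (rough proxy only)."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def largest_docs(docs, top=10, id_field="id"):
    """Return (id, size) pairs for the `top` largest documents, biggest first."""
    sized = [(doc.get(id_field), serialized_size(doc)) for doc in docs]
    sized.sort(key=lambda pair: pair[1], reverse=True)
    return sized[:top]


def fetch_docs(base_url, core, rows=1000):
    """Fetch up to `rows` stored documents from a core's /select handler as JSON."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": rows, "wt": "json"})
    url = f"{base_url}/solr/{core}/select?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


if __name__ == "__main__":
    # Figures from the thread: about 5 GB of index for 290,000 documents.
    print(f"{average_doc_size_kb(5 * 1024**3, 290_000):.1f} KB per document")
    # Against a live server (hypothetical core "mycore"; 8983 is Solr's default port):
    # for doc_id, size in largest_docs(fetch_docs("http://localhost:8983", "mycore")):
    #     print(doc_id, size)
```

Only the first `rows` documents are fetched; walking all 290,000 would need paging with `start`/`rows` or `cursorMark`. Sorting by this proxy may still surface unusually large documents, which is the kind of outlier Shawn considers unlikely.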