Worst case is 3X. That happens when there are no merges until the commit.

With tlogs, worst case is more than that. I’ve seen humongous tlogs with a 
batch load and no hard commit until the end. If you do that several times, then 
you have a few old humongous tlogs. Bleah.
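
For a rough feel of the numbers in this thread, here is a minimal sketch of the arithmetic. It assumes "5 GB" means binary gigabytes and "0.29 million" means 290,000 documents. The helper names are made up for illustration and nothing here calls Solr.

```python
GIB = 1024 ** 3  # assumption: "5 GB" is read as binary gigabytes


def avg_doc_bytes(index_bytes, num_docs):
    """Average on-disk index size per document, in bytes."""
    return index_bytes / num_docs


def worst_case_bytes(index_bytes, factor=3):
    """Transient disk footprint using the 3X worst case (tlogs not included)."""
    return index_bytes * factor


index_bytes = 5 * GIB
num_docs = 290_000

print(f"{avg_doc_bytes(index_bytes, num_docs) / 1024:.1f} KiB per document")
print(f"{worst_case_bytes(index_bytes) / GIB:.0f} GiB worst case, before tlogs")
```

That matches the roughly 18 kilobytes per document quoted below. Tlogs come on top of the 3X figure, so treat it as a floor on the disk to plan for, not a ceiling.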

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 19, 2018, at 7:40 AM, David Hastings <hastings.recurs...@gmail.com> 
> wrote:
> 
> Also a full import, assuming the documents were already indexed, will just
> double your index size until a merge/optimize is run since you are just
> marking a document as deleted, not taking back any space, and then adding
> another completely new document on top of it.
> 
> On Mon, Nov 19, 2018 at 10:36 AM Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
>>> I have a solr core with some 20 fields in it (all are stored and
>>> indexed). For an environment, the number of documents is around 0.29
>>> million. When I run the full import through DIH, indexing is completing
>>> successfully. But, it is occupying the disk space of around 5 GB. Is there
>>> a possibility where I can go and check, which document is consuming more
>>> memory? Put in another way, can I sort the index based on size?
>> 
>> I am not aware of any way to do that.  Might be one that I don't know
>> about, but if there were a way, seems like I would have come across it
>> before.
>> 
>> It is not very likely that the large index size is due to a single document or
>> a handful of documents.  It is more likely that most documents are
>> relatively large.  I could be wrong about that, though.
>> 
>> If you have 290000 documents (which is how I interpreted 0.29 million)
>> and the total index size is about 5 GB, then the average size per
>> document in the index is about 18 kilobytes.  This is in my view pretty
>> large.  Typically I think that most documents are 1-2 kilobytes.
>> 
>> Can we get your Solr version, a copy of your schema, and exactly what
>> Solr returns in search results for a typically sized document?  You'll
>> need to use a paste website or a file-sharing website ... if you try to
>> attach these things to a message, the mailing list will most likely eat
>> them, and we'll never see them. If you need to redact the information in
>> search results ... please do it in a way that we can still see the exact
>> size of the text -- don't just remove information, replace it with
>> information that's the same length.
>> 
>> Thanks,
>> Shawn
>> 
>> 
