Re: Slow indexing speed when index size is large?
Hi Shawn,

Thanks for the information.

Regards,
Edwin

On 14 October 2016 at 20:19, Shawn Heisey wrote:

> On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote:
> > Thanks for the reply Shawn. Currently, my heap allocation to each Solr
> > instance is 22GB. Is that big enough?
>
> I can't answer that question. I know little about your install. Even
> if I *did* know a few more things about your install, I could only make
> a *guess* about how much heap you need, and I'd probably be wrong.
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> I did write down what I consider to be a good way to figure out a
> correct heap size, but it requires experimentation with your live
> system, which might cause disruption of your search service:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F
>
> Thanks,
> Shawn
Re: Slow indexing speed when index size is large?
On 10/13/2016 9:58 PM, Zheng Lin Edwin Yeo wrote:

> Thanks for the reply Shawn. Currently, my heap allocation to each Solr
> instance is 22GB. Is that big enough?

I can't answer that question. I know little about your install. Even if I *did* know a few more things about your install, I could only make a *guess* about how much heap you need, and I'd probably be wrong.

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

I did write down what I consider to be a good way to figure out a correct heap size, but it requires experimentation with your live system, which might cause disruption of your search service:

https://wiki.apache.org/solr/SolrPerformanceProblems#How_much_heap_space_do_I_need.3F

Thanks,
Shawn
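One rough input to that kind of heap-size experiment is how much heap is still occupied right after a garbage collection, since that approximates the memory the JVM has to retain. The sketch below is an assumption of this note, not the procedure described on the wiki page above. It uses the standard `java.lang.management` beans and reports on the JVM it runs in; the class name `RetainedHeap` and its helper methods are hypothetical. To look at a Solr JVM instead, the same beans would have to be read over JMX, assuming JMX is enabled on that instance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class RetainedHeap {

    /** Sums the heap bytes still in use after the most recent collection of each heap pool. */
    static long retainedBytesAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            // getCollectionUsage() returns null when the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            if (afterGc != null) {
                total += afterGc.getUsed();
            }
        }
        return total;
    }

    /** Retained bytes as a fraction of the maximum heap; 0.0 if the maximum is not positive. */
    static double retainedFraction(long retainedBytes, long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            return 0.0;
        }
        return (double) retainedBytes / maxHeapBytes;
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        long retained = retainedBytesAfterLastGc();
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("retained after GC: %d MB of %d MB max heap (%.1f%%)%n",
                retained >> 20, maxHeap >> 20, 100.0 * retainedFraction(retained, maxHeap));
    }
}
```

A retained fraction that stays close to 1.0 under normal load would suggest the heap has little headroom; a low fraction would suggest some of that memory could go back to the OS disk cache. Either reading is only a starting point for the experimentation described above.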
Re: Slow indexing speed when index size is large?
Thanks for the reply Shawn. Currently, my heap allocation to each Solr instance is 22GB. Is that big enough?

Regards,
Edwin

On 13 October 2016 at 23:56, Shawn Heisey wrote:

> On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:
> > Would like to find out, will the indexing speed in a collection with a
> > very large index size be much slower than one which is still empty or
> > a very small index size? This is assuming that the configurations,
> > indexing code and the files to be indexed are the same. Currently, I
> > have a setup in which the collection is still empty, and I managed to
> > achieve an indexing speed of more than 7GB/hr. I also have another
> > setup in which the collection has an index size of 1.6TB, and when I
> > tried to index new documents to it, the indexing speed is less than
> > 0.7GB/hr.
>
> I have noticed this phenomenon myself. As the amount of index data
> already present increases, indexing slows down. Best guess as to the
> cause: more frequent and longer-lasting garbage collections.
>
> Indexing involves a LOT of memory allocation. Most of the memory chunks
> that get allocated are quickly discarded because they do not need to be
> retained.
>
> If you understand how the Java memory model works, then you know that
> this means there will be a lot of garbage collection. Each GC will tend
> to take longer if there are a large number of objects allocated that are
> NOT garbage.
>
> When the index is large, Lucene/Solr must allocate and retain a larger
> amount of memory just to ensure that everything works properly. This
> leaves less free memory, so indexing will cause more frequent garbage
> collections ... and because the amount of retained memory is
> correspondingly larger, each garbage collection will take longer than it
> would with a smaller index. A ten to one difference in speed does seem
> extreme, though.
>
> You might want to increase the heap allocated to each Solr instance, so
> GC is less frequent. This can take memory away from the OS disk cache,
> though. If the amount of OS disk cache drops too low, general
> performance may suffer.
>
> Thanks,
> Shawn
Re: Slow indexing speed when index size is large?
On 10/13/2016 9:20 AM, Zheng Lin Edwin Yeo wrote:

> Would like to find out, will the indexing speed in a collection with a
> very large index size be much slower than one which is still empty or
> a very small index size? This is assuming that the configurations,
> indexing code and the files to be indexed are the same. Currently, I
> have a setup in which the collection is still empty, and I managed to
> achieve an indexing speed of more than 7GB/hr. I also have another
> setup in which the collection has an index size of 1.6TB, and when I
> tried to index new documents to it, the indexing speed is less than
> 0.7GB/hr.

I have noticed this phenomenon myself. As the amount of index data already present increases, indexing slows down. Best guess as to the cause: more frequent and longer-lasting garbage collections.

Indexing involves a LOT of memory allocation. Most of the memory chunks that get allocated are quickly discarded because they do not need to be retained.

If you understand how the Java memory model works, then you know that this means there will be a lot of garbage collection. Each GC will tend to take longer if there are a large number of objects allocated that are NOT garbage.

When the index is large, Lucene/Solr must allocate and retain a larger amount of memory just to ensure that everything works properly. This leaves less free memory, so indexing will cause more frequent garbage collections ... and because the amount of retained memory is correspondingly larger, each garbage collection will take longer than it would with a smaller index. A ten to one difference in speed does seem extreme, though.

You might want to increase the heap allocated to each Solr instance, so GC is less frequent. This can take memory away from the OS disk cache, though. If the amount of OS disk cache drops too low, general performance may suffer.

Thanks,
Shawn
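The garbage-collection guess above can be checked by comparing collection counts and accumulated collection time before and after a fixed amount of work. The following is a minimal sketch using the standard `GarbageCollectorMXBean` API; the class name `GcOverhead`, its helpers, and the allocation loop standing in for indexing are all hypothetical. It measures the JVM it runs in, so observing a Solr instance would mean reading the same beans over JMX (assuming JMX is enabled there) or using that JVM's GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    /** Returns {total collections, approximate accumulated collection millis} across all collectors. */
    static long[] snapshot() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            millis += Math.max(0, gc.getCollectionTime());
        }
        return new long[] {count, millis};
    }

    /** Fraction of elapsed wall-clock time attributed to GC between two snapshots. */
    static double gcFraction(long[] before, long[] after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (double) (after[1] - before[1]) / elapsedMillis;
    }

    public static void main(String[] args) {
        long[] before = snapshot();
        long start = System.nanoTime();

        // Stand-in workload: many short-lived allocations, loosely imitating indexing.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] chunk = new byte[256];
            chunk[0] = (byte) i;
            checksum += chunk[0];
        }

        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long[] after = snapshot();
        System.out.printf("collections: %d, gc time: %d ms, elapsed: %d ms, gc share: %.1f%% (checksum %d)%n",
                after[0] - before[0], after[1] - before[1], elapsedMillis,
                100.0 * gcFraction(before, after, elapsedMillis), checksum);
    }
}
```

If the explanation holds, the same indexing batch run against the 1.6TB collection should show both more collections and a larger GC share than against the empty one. A small GC share on the large index would point toward another cause, such as disk I/O from a starved OS disk cache.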