Hi Shawn,

Thank you for the detailed suggestions. However, I would like to understand the maxMergeCount and maxThreadCount parameters better. The documentation <https://lucene.apache.org/solr/guide/7_3/indexconfig-in-solrconfig.html#mergescheduler> mentions that:
maxMergeCount: The maximum number of simultaneous merges that are allowed.
maxThreadCount: The maximum number of simultaneous merge threads that should be running at once.

Since one thread can only do one merge at any given point in time, how does maxMergeCount being greater than maxThreadCount help? I am having difficulty wrapping my head around this, and would appreciate it if you could help clear it up for me.

Thanks,
Rahul

On Thu, Jun 13, 2019 at 7:33 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/6/2019 9:00 AM, Rahul Goswami wrote:
> > *OP Reply*: Total 48 GB per node... I couldn't see another software using
> > a lot of memory.
> > I am honestly not sure about the reason for the change of directory factory to
> > SimpleFSDirectoryFactory. But I was told that with mmap, at one point we
> > started to see the shared memory usage on Windows go up significantly,
> > intermittently freezing the system.
> > Could the choice of DirectoryFactory here be a factor for the long
> > updates/frequent merges?
>
> With about 24GB of RAM to cache 1.4TB of index data, you're never going
> to have good performance. Any query you do is probably going to read
> more than 24GB of data from the index, which means that it cannot all come
> from memory; some of it must come from disk, which is incredibly slow
> compared to memory.
>
> MMap is more efficient than "simple" filesystem access. I do not know
> if you would see markedly better performance, but getting rid of the
> DirectoryFactory config and letting Solr choose its default might help.
>
> > How many total documents (maxDoc, not numDoc) are in that 1.4 TB of
> > space?
>
> > *OP Reply:* Also, there are nearly 12.8 million total docs (maxDoc, NOT
> > numDoc) in that 1.4 TB space
>
> Unless you're doing faceting or grouping on fields with extremely high
> cardinality, which I find to be rarely useful except for data mining,
> 24GB of heap for 12.8 million docs seems very excessive.
> I was expecting this number to be something like 500 million or more ... that
> small document count must mean each document is HUGE. Can you take
> steps to reduce the index size, perhaps by setting stored, indexed,
> and/or docValues to "false" on some of your fields, and having your
> application go to the system of record for full details on each
> document? You will have to reindex after making changes like that.
>
> >> Can you share the GC log that Solr writes?
>
> > *OP Reply:* Please find the GC logs and thread dumps at this location:
> > https://drive.google.com/open?id=1slsYkAcsH7OH-7Pma91k6t5T72-tIPlw
>
> The larger GC log was unrecognized by both GCViewer and gceasy.io ... the
> smaller log shows heap usage of about 10GB, but it only covers 10 minutes,
> so it's not really conclusive for diagnosis. The first thing I can
> suggest trying is to reduce the heap size to 12GB ... but I do not know
> if that's actually going to work. Indexing might require more memory.
> The idea here is to make more memory available to the OS disk cache ...
> with your index size, you're probably going to need to add memory to the
> system (not the heap).
>
> > Another observation is that the CPU usage reaches around 70% (through
> > manual monitoring) when the indexing starts and the merges are observed. It
> > is well below 50% otherwise.
>
> Indexing will increase load, and that increase is often very
> significant. Adding memory to the system is your best bet for better
> performance. I'd want 1TB of memory for a 1.4TB index ... but I know
> that memory sizes that high are extremely expensive, and for most
> servers, not even possible. 512GB or 256GB is more attainable, and
> would have better performance than 48GB.
>
> > Also, should something be altered with the mergeScheduler setting?
> > "mergeScheduler":{
> >   "class":"org.apache.lucene.index.ConcurrentMergeScheduler",
> >   "maxMergeCount":2,
> >   "maxThreadCount":2},
>
> Do not configure maxThreadCount beyond 1 unless your data is on SSD. It
> will slow things down a lot due to the fact that standard disks must
> move the disk head to read/write from different locations, and head
> moves take time. SSD can do I/O from any location without pauses, so
> more threads would probably help performance rather than hurt it.
>
> Increase maxMergeCount to 6 -- at 2, large merges will probably stop
> indexing entirely. With a larger number, Solr can keep indexing even
> when there's a huge segment merge happening.
>
> Thanks,
> Shawn
>
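The interplay Shawn describes can be sketched as a toy model (illustrative Python, not actual Lucene code; the real stall condition inside ConcurrentMergeScheduler is more nuanced): up to maxThreadCount merges run concurrently, further merges wait in a queue, and once the total backlog reaches maxMergeCount the indexing threads themselves are paused until merging catches up. This is why maxMergeCount greater than maxThreadCount helps — it leaves room for small merges to queue behind a large one without halting indexing.

```python
# Toy model of ConcurrentMergeScheduler's two limits (illustrative only,
# not Lucene's actual implementation or its exact stall condition).

def scheduler_state(pending_merges, max_merge_count, max_thread_count):
    """Given a merge backlog, return (running, queued, indexing_stalled)."""
    running = min(pending_merges, max_thread_count)  # merge threads in flight
    queued = pending_merges - running                # merges waiting their turn
    # Once the backlog reaches maxMergeCount, indexing threads are paused
    # so merging can catch up.
    stalled = pending_merges >= max_merge_count
    return running, queued, stalled

# Original settings (maxMergeCount=2, maxThreadCount=2): two pending
# merges already stall indexing.
print(scheduler_state(2, 2, 2))   # (2, 0, True)

# Suggested settings (maxMergeCount=6, maxThreadCount=1): one big merge
# runs, two more wait their turn, and indexing keeps going.
print(scheduler_state(3, 6, 1))   # (1, 2, False)
```

Under this model, raising maxMergeCount does not make merges run any faster; it only raises the backlog threshold at which indexing is throttled.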