I have a SolrCloud setup on Windows Server with the config below:
3 nodes,
24 shards with replication factor 2
Each node hosts 16 cores.

16 CPU cores, or 16 Solr cores? The info may not be all that useful either way, but just in case, it should be clarified.

Index size is 1.4 TB per node
Xms 8 GB, Xmx 24 GB
Directory factory used is SimpleFSDirectoryFactory

How much total memory is in the server? Is there other software using significant amounts of memory?
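
The reason total memory matters: Lucene leans on the operating system's disk cache for fast index reads, and only RAM not claimed by the Java heap or by other software is available for that cache. Here is a rough sketch of the arithmetic. The 64 GB total is purely a hypothetical placeholder, since the real figure is not known yet, and the function names are made up for illustration. The heap and index sizes are the ones quoted above.

```python
def page_cache_budget(total_ram_gb, heap_max_gb, other_software_gb=0.0):
    """RAM (GB) left for the OS disk cache after the heap and other software.

    Ignores non-heap JVM overhead, so the real number is somewhat lower.
    """
    return max(total_ram_gb - heap_max_gb - other_software_gb, 0.0)


def cacheable_fraction(index_size_gb, cache_gb):
    """Fraction of the on-disk index that could fit in the OS disk cache."""
    if index_size_gb <= 0:
        raise ValueError("index_size_gb must be positive")
    return min(cache_gb / index_size_gb, 1.0)


if __name__ == "__main__":
    total_ram_gb = 64.0          # HYPOTHETICAL: actual server RAM is unknown
    heap_max_gb = 24.0           # Xmx from the original post
    index_size_gb = 1.4 * 1024   # 1.4 TB per node, from the original post

    cache_gb = page_cache_budget(total_ram_gb, heap_max_gb)
    frac = cacheable_fraction(index_size_gb, cache_gb)
    print(f"{cache_gb:.0f} GB left for disk cache, {frac:.1%} of the index")
```

Plugging in the real memory total shows how small a slice of the index the OS can keep cached, which is why that number is worth knowing.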

Why did you opt to change the DirectoryFactory away from Solr's default? The default is chosen with care ... any other choice will probably result in lower performance. The default in recent versions of Solr is NRTCachingDirectoryFactory, which uses MMap for file access.


The screenshot described here might become useful for more in-depth troubleshooting:


How many total documents (maxDoc, not numDocs) are in that 1.4 TB of space?
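
I ask for maxDoc because it counts every document still occupying space in the index segments, including deleted ones that merging has not yet purged. The live-document count is lower whenever deletes are present. A small sketch of that relationship follows. The function name and the counts are made up for illustration and are not taken from your system.

```python
def deleted_doc_stats(max_doc, live_docs):
    """Return (deleted_count, deleted_fraction_of_max_doc).

    max_doc   -- all documents in the segments, deleted ones included
    live_docs -- documents that are still searchable
    """
    if live_docs < 0 or live_docs > max_doc:
        raise ValueError("live_docs must be between 0 and max_doc")
    deleted = max_doc - live_docs
    fraction = deleted / max_doc if max_doc else 0.0
    return deleted, fraction


if __name__ == "__main__":
    # Illustrative counts only.
    deleted, fraction = deleted_doc_stats(max_doc=1_000_000, live_docs=800_000)
    print(f"{deleted} deleted documents, {fraction:.0%} of maxDoc")
```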

The cloud is all nice and green for the most part. Only when we start
indexing, within a few seconds, I start seeing Read timeouts and socket
write errors and replica recoveries thereafter. We are indexing in 2
parallel threads in batches of 50 docs per update request. After examining
the thread dump, I see segment merges happening. My understanding is that
this is the cause, and the timeouts and recoveries are the symptoms. Is my
understanding correct? If yes, what steps could I take to help the
situation? I do see that the difference between "Num Docs" and "Max Docs"
is about 20%.

Segment merges are a completely normal part of Lucene's internal operation. They should never cause problems like you have described.

My best guess is that a 24GB heap is too small. Or possibly WAY too large, although with the index size you have mentioned, that seems unlikely.

Can you share the GC log that Solr writes? The problem should occur during the timeframe covered by the log, and the log should be as large as possible. You'll need to use a file sharing site -- attaching it to an email is not going to work.

What version of Solr?


