You may find that buying some more memory will be your best bang for the buck 
in your setup. 32-64 GB isn't expensive.

> On Dec 27, 2017, at 6:57 PM, Suresh Pendap <spen...@walmartlabs.com> wrote:
> 
> What is the downside of configuring ramBufferSizeMB to be equal to 5GB?
> Is it only that the window of time for flush is larger, so recovery time will 
> be higher in case of a crash?
> 
> Thanks
> Suresh
> 
> On 12/27/17, 1:34 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
>    You are probably hitting more and more background merging, which will
>    slow things down. Your system looks to be severely undersized for this
>    scale.
> 
>    One thing you can try (and I emphasize I haven't prototyped this) is
>    to increase your ramBufferSizeMB setting in solrconfig.xml significantly.
>    By default, Solr won't merge segments to greater than 5G, so
>    theoretically you could just set your ramBufferSizeMB to that figure
>    and avoid merging altogether. Or you could try configuring the
>    NoMergePolicy in solrconfig.xml (but beware that you're going to
>    create a lot of segments unless you set the ramBufferSizeMB higher).
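> 
>    A rough sketch of the idea in the <indexConfig> section of
>    solrconfig.xml (the 5120 value and the NoMergePolicyFactory class name
>    are illustrative only; check them against your Solr version):
> 
>        <indexConfig>
>          <!-- buffer roughly 5GB of documents in RAM before flushing a segment -->
>          <ramBufferSizeMB>5120</ramBufferSizeMB>
>          <!-- alternative: disable background merging entirely -->
>          <mergePolicyFactory class="solr.NoMergePolicyFactory"/>
>        </indexConfig>
> 
>    Bear in mind the RAM buffer is a per-core setting, so with 6 shards per
>    node a multi-GB buffer on each will need a correspondingly larger heap.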
> 
>    How this will affect your indexing throughput I frankly don't know; I
>    have no data on that.
>    You can see that with numbers like this, though, a 4G heap is much too
>    small.
> 
>    Best,
>    Erick
> 
>    On Wed, Dec 27, 2017 at 2:18 AM, Prasad Tendulkar
>    <pra...@cumulus-systems.com> wrote:
>> Hello All,
>> 
>> We have been building a Solr-based solution to hold a large amount of data 
>> (approx. 4 TB/day, or > 24 billion documents per day). We are developing a 
>> small-scale prototype just to evaluate Solr performance gradually. Here 
>> is our setup configuration.
>> 
>> Solr cloud:
>> node1: 16 GB RAM, 8 Core CPU, 1TB disk
>> node2: 16 GB RAM, 8 Core CPU, 1TB disk
>> 
>> ZooKeeper is also installed on the above 2 machines in cluster mode.
>> Solr commit intervals: soft commit every 3 minutes, hard commit every 15 
>> seconds (see the solrconfig.xml sketch after this list).
>> Schema: basic configuration. 5 fields indexed (one of which is text_general), 
>> 6 fields stored.
>> Collection: 12 shards (6 per node)
>> Heap memory: 4 GB per node
>> Disk cache: 12 GB per node
>> Document is a syslog message.
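>> 
>> For reference, the commit intervals above are configured along these lines 
>> in solrconfig.xml (sketched from memory rather than quoted verbatim; 
>> openSearcher=false on the hard commit is an assumption here):
>> 
>>     <autoCommit>
>>       <maxTime>15000</maxTime>            <!-- hard commit every 15 seconds -->
>>       <openSearcher>false</openSearcher>  <!-- assumed: hard commits don't open a new searcher -->
>>     </autoCommit>
>>     <autoSoftCommit>
>>       <maxTime>180000</maxTime>           <!-- soft commit every 3 minutes -->
>>     </autoSoftCommit>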
>> 
>> Documents are being ingested into Solr from different nodes. 12 SolrJ 
>> clients ingest data into the Solr cloud.
>> 
>> We are experiencing issues when we keep the setup running for a long time, 
>> after processing around 100 GB of index size (i.e. around 600 million 
>> documents). Note that we are only indexing the data and not querying it, so 
>> there should not be any query overhead. From the VM analysis we figured out 
>> that over time the disk operations start declining, and so do the CPU, RAM, 
>> and network usage of the Solr nodes. We concluded that Solr is unable to 
>> handle one big collection due to index read/write overhead, and most of the 
>> time it ends up doing only the commit (evident in the Solr logs). Because of 
>> that, indexing is getting hampered(?)
>> 
>> So we thought of creating small-sized collections instead of one big 
>> collection, anticipating that the commit performance might improve. But 
>> eventually the performance degrades even with that, and we observe more or 
>> less similar charts for CPU, memory, disk, and network.
>> 
>> To put forth some stats, here are the numbers of documents processed each hour:
>> 
>> 1st hour: 250 million
>> 2nd hour: 250 million
>> 3rd hour: 240 million
>> 4th hour: 200 million
>> .
>> .
>> 11th hour: 80 million
>> 
>> Could you please help us identify the root cause of this performance 
>> degradation? Are we doing something wrong with the Solr configuration or the 
>> collections/sharding, etc.? Due to this performance degradation we are 
>> currently stuck with Solr.
>> 
>> Thank you very much in advance.
>> 
>> Prasad Tendulkar
>> 
>> 
> 
> 
> 
