Re: SolrCloud Large Cluster Performance Issues

2018-06-25 Thread Shawn Heisey

On 6/24/2018 7:38 PM, 苗海泉 wrote:

> Hello everyone. We have encountered two Solr problems and hope to get help.
> Our data volume is very large: 24.5 TB a day, and the number of records is
> 110 billion. We originally used 49 Solr nodes; because of insufficient
> storage, we expanded to 100. We found that the overall performance of 60
> SolrCloud nodes is the same as that of the 49-node Solr cluster. How do we
> optimize it? The cluster currently averages 1.5 million per second. Why is
> that?


I can't really tell what your question is.  You've asked how to optimize 
something, but it's not clear exactly what you want to optimize.  You 
also asked about a cluster speed of 1.5 million per second, but you 
haven't indicated what is happening at that rate.  1.5 million *what*
per second?  If you're talking about queries per second or documents 
indexed per second, you're already getting better performance than I 
would have expected.
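For scale, the daily figures in the question can be turned into average
rates. This is a minimal Python sketch, assuming decimal terabytes
(1 TB = 10^12 bytes), that the 110 billion records are also a per-day
figure, and that the load is spread evenly over 24 hours. The function
name average_rates is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def average_rates(bytes_per_day: float, records_per_day: float, nodes: int) -> dict:
    """Average rates implied by a daily volume, assuming uniform load over 24 h."""
    records_per_sec = records_per_day / SECONDS_PER_DAY
    return {
        "records_per_sec": records_per_sec,
        "bytes_per_sec": bytes_per_day / SECONDS_PER_DAY,
        "bytes_per_record": bytes_per_day / records_per_day,
        "records_per_sec_per_node": records_per_sec / nodes,
    }

# Figures from the question: 24.5 TB/day (decimal TB assumed) and
# 110 billion records (assumed to be per day), for both cluster sizes.
for nodes in (49, 100):
    r = average_rates(24.5e12, 110e9, nodes)
    print(f"{nodes} nodes: {r['records_per_sec']:,.0f} records/s, "
          f"{r['bytes_per_sec'] / 1e6:,.1f} MB/s, "
          f"{r['bytes_per_record']:,.0f} bytes/record, "
          f"{r['records_per_sec_per_node']:,.0f} records/s per node")
```

Under those assumptions it works out to roughly 1.27 million records per
second and about 284 MB/s on average, around 223 bytes per record. Peak
rates can only be equal to or higher than these averages.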


We'll need a lot more detail about exactly what kind of problems you've 
encountered and what you think *should* be happening that isn't happening.



> The second problem: only one Solr home (solrhome) can be specified, but the
> disk is now divided into two directories. Alternatively, Solr can store its
> index on HDFS, but then the overall indexing performance is not up to
> standard. What should we do? Thank you for your attention.


I would use symlinks to point some of the index cores to the second 
directory. It is possible to reduce this to one symlink rather than one 
for each core.  Moving things to the second location will likely be a 
manual process.
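A minimal Python sketch of the one-symlink variant. It assumes Solr is
stopped while it runs, that cores are found by core discovery (a
core.properties file inside each core directory under the Solr home), and
that your Solr version follows the symlink when it walks the Solr home;
verify that last point before relying on it. The paths, the "disk2" link
name, and the relocate_cores function are all hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import List

def relocate_cores(solr_home: Path, second_disk: Path, cores: List[str],
                   link_name: str = "disk2") -> List[Path]:
    """Move core directories to a second disk behind ONE symlink.

    Creates <solr_home>/<link_name> -> <second_disk>, then moves each
    <solr_home>/<core> to <second_disk>/<core>, so the core is reached
    as <solr_home>/<link_name>/<core>. Run only while Solr is stopped.
    """
    second_disk.mkdir(parents=True, exist_ok=True)
    link = solr_home / link_name
    if not os.path.lexists(link):
        # The single symlink that replaces one symlink per core.
        os.symlink(second_disk, link, target_is_directory=True)
    elif link.resolve() != second_disk.resolve():
        raise FileExistsError(f"{link} exists and does not point to {second_disk}")

    moved = []
    for core in cores:
        src = solr_home / core
        dst = second_disk / core
        if dst.exists():
            raise FileExistsError(dst)  # never overwrite an existing index
        shutil.move(str(src), str(dst))
        moved.append(link / core)  # new path as seen from the Solr home
    return moved
```

Usage, with hypothetical paths:
relocate_cores(Path("/var/solr/data"), Path("/data2/solr"), ["core_a"]).
Keeping one link and moving whole core directories under it means adding
another core to the second disk later is just another directory move.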


If you're on Windows, things are a little bit different, but NTFS does 
have a feature that offers very similar functionality to symlinks:


https://en.wikipedia.org/wiki/NTFS_junction_point

Thanks,
Shawn



Re: SolrCloud Large Cluster Performance Issues

2018-06-25 Thread Emir Arnautović
Hi,
With such a big cluster, a lot of things can go wrong, and it is hard to give any
answer without looking into it more closely and understanding your model. I assume
that you are monitoring your system (both Solr/ZK and the components that index/query),
so that should be the first place to look for bottlenecks. If you doubled the number
of nodes and don't see an increase in indexing throughput, it is likely that the
bottleneck is the indexing component or that you did not spread the load across
your entire cluster. With more nodes, there is more pressure on ZK, so check that
as well.
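One way to check whether the load actually reaches the new nodes is to see
where the shard leaders and replicas live: in SolrCloud each update is
forwarded to the leader of its target shard, and adding nodes does not by
itself move existing shards or replicas onto them. This is a minimal Python
sketch, assuming the JSON shape returned by the Collections API
CLUSTERSTATUS action (cluster -> collections -> shards -> replicas, each
replica carrying node_name and, for leaders, leader="true"). The base URL
and the helper names are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_cluster_status(base_url: str, collection: str) -> dict:
    """Fetch CLUSTERSTATUS, e.g. base_url = "http://solr1:8983/solr" (hypothetical)."""
    query = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    with urlopen(f"{base_url}/admin/collections?{query}", timeout=30) as resp:
        return json.load(resp)

def _shards(status: dict, collection: str) -> dict:
    return status["cluster"]["collections"][collection]["shards"]

def leaders_per_node(status: dict, collection: str) -> Counter:
    """Count shard leaders per node; a skewed count suggests skewed indexing load."""
    counts = Counter()
    for shard in _shards(status, collection).values():
        for replica in shard["replicas"].values():
            if replica.get("leader") == "true":
                counts[replica["node_name"]] += 1
    return counts

def idle_nodes(status: dict, collection: str) -> list:
    """Live nodes that host no replica of the collection at all."""
    hosting = {r["node_name"]
               for s in _shards(status, collection).values()
               for r in s["replicas"].values()}
    return sorted(set(status["cluster"]["live_nodes"]) - hosting)
```

If idle_nodes returns most of the nodes that were added in the expansion,
those machines are contributing storage at most, not indexing throughput.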
You will have to dive in and search for the bottleneck, or find a Solr consultant
and let them do it for you.
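For the ZooKeeper check mentioned above, one low-effort source of numbers
is ZooKeeper's mntr four-letter command, which returns tab-separated
key/value lines such as zk_avg_latency, zk_outstanding_requests and
zk_num_alive_connections. This Python sketch assumes mntr is enabled on
your ensemble (newer ZooKeeper releases may require whitelisting it via
4lw.commands.whitelist); the host name is a placeholder and the helper
names are hypothetical.

```python
import socket

def zk_four_letter(host: str, port: int = 2181, cmd: str = "mntr",
                   timeout: float = 5.0) -> str:
    """Send a four-letter command; the server replies and closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mntr(text: str) -> dict:
    """Parse 'key<TAB>value' lines, converting numeric values where possible."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        try:
            metrics[key] = float(value) if "." in value else int(value)
        except ValueError:
            metrics[key] = value  # e.g. zk_version, zk_server_state
    return metrics
```

Usage: parse_mntr(zk_four_letter("zk1.example")). Comparing these values
across ensemble members, and before and after the expansion, may show
whether ZK is under noticeably more pressure.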

Thanks,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 25 Jun 2018, at 03:38, 苗海泉  wrote:
> 
> Hello everyone. We have encountered two Solr problems and hope to get help.
> Our data volume is very large: 24.5 TB a day, and the number of records is
> 110 billion. We originally used 49 Solr nodes; because of insufficient
> storage, we expanded to 100. We found that the overall performance of 60
> SolrCloud nodes is the same as that of the 49-node Solr cluster. How do we
> optimize it? The cluster currently averages 1.5 million per second. Why is
> that?
> 
> The second problem: only one Solr home (solrhome) can be specified, but the
> disk is now divided into two directories. Alternatively, Solr can store its
> index on HDFS, but then the overall indexing performance is not up to
> standard. What should we do? Thank you for your attention.



SolrCloud Large Cluster Performance Issues

2018-06-24 Thread 苗海泉
Hello everyone. We have encountered two Solr problems and hope to get help.
Our data volume is very large: 24.5 TB a day, and the number of records is
110 billion. We originally used 49 Solr nodes; because of insufficient
storage, we expanded to 100. We found that the overall performance of 60
SolrCloud nodes is the same as that of the 49-node Solr cluster. How do we
optimize it? The cluster currently averages 1.5 million per second. Why is
that?

The second problem: only one Solr home (solrhome) can be specified, but the
disk is now divided into two directories. Alternatively, Solr can store its
index on HDFS, but then the overall indexing performance is not up to
standard. What should we do? Thank you for your attention.