It really sounds like you're re-inventing SolrCloud, but you know your requirements best.
Erick

On Wed, Nov 2, 2016 at 8:48 PM, Kent Mu <solr.st...@gmail.com> wrote:
> Thanks Erick!
> Actually, similar to SolrCloud, we split our data into 8 customized
> shards (1 master with 4 slaves), each with one Citrix load balancer and
> two Apache web servers to reduce server pressure through load balancing.
> As we are running an e-commerce site, the number of reviews of selling
> products grows very fast. We take the modulus of the product code to put
> each review in the proper customized Solr shard, so that we can keep the
> index size on each Solr node relatively small.
> We will first try to upgrade the physical memory and see what happens.
> If the query performance is not ideal, we will try to deploy Solr on a
> physical machine, or we can use SSDs instead.
>
> "Rome was not built in a day", so we can explore it step by step.
> Ha ha...
> Best Regards!
> Kent
>
> 2016-11-03 1:10 GMT+08:00 Erick Erickson <erickerick...@gmail.com>:
>
>> You need to move to SolrCloud when it's
>> time to shard ;).....
>>
>> More seriously, at some point simply adding more
>> memory will not be adequate. Either your JVM
>> heap will grow to a point where you start encountering
>> GC pauses, or the time to serve requests will
>> increase unacceptably. "When?" you ask? Well,
>> unfortunately there are no guidelines that can be
>> guaranteed; here's a long blog on the subject:
>>
>> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> The short form is you need to stress-test your
>> index and query patterns.
>>
>> Now, I've seen 20M docs strain a 32G Java heap. I've
>> seen 300M docs give very nice response times with
>> 12G of memory. It Depends (tm).
>>
>> Whether to put Solr on bare metal or not: there's
>> inevitably some penalty for a VM. That said, there are lots
>> of places that use VMs successfully. Again, stress
>> testing is the key.
>>
>> And finally, using docValues for any field that sorts,
>> facets or groups will reduce the JVM requirements
>> significantly, albeit by using OS memory space; see
>> Uwe's excellent blog:
>>
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>>
>> Best,
>> Erick
>>
>> On Tue, Nov 1, 2016 at 10:23 PM, Kent Mu <solr.st...@gmail.com> wrote:
>> > Thanks, I got it, Erick!
>> >
>> > The size of our index data is more than 30GB every year now, and it
>> > is still growing. Our Solr is currently running on a virtual machine,
>> > so I wonder whether we need to deploy Solr on a physical machine, or
>> > whether I can just upgrade the physical memory of our virtual
>> > machines?
>> >
>> > Best,
>> > Kent
>> >
>> > 2016-11-02 11:33 GMT+08:00 Erick Erickson <erickerick...@gmail.com>:
>> >
>> >> Kent: OK, I see now. Then a minor pedantic point...
>> >>
>> >> It'll avoid confusion if you use master and slaves
>> >> rather than master and replicas when talking about
>> >> non-cloud setups.
>> >>
>> >> The equivalent in SolrCloud is leader and replicas.
>> >>
>> >> No big deal either way, just FYI.
>> >>
>> >> Best,
>> >> Erick
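The modulus routing Kent describes above can be sketched in a few lines of
SolrJ 4.x. This is only an illustration, not code from the thread: the shard
count follows Kent's description, but the host naming scheme, core name, and
field names are assumptions, and product codes are assumed to be numeric.

    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    import java.io.IOException;

    public class ReviewRouter {
        private static final int NUM_SHARDS = 8;

        // Hypothetical naming scheme: one master URL per customized shard,
        // chosen by taking the product code modulo the shard count.
        static String masterUrlFor(long productCode) {
            int shard = (int) (productCode % NUM_SHARDS);
            return "http://review-shard" + shard + "-master:8080/solr/commodityReview";
        }

        public static void main(String[] args) throws SolrServerException, IOException {
            long productCode = 123456789L;
            HttpSolrServer master = new HttpSolrServer(masterUrlFor(productCode));

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "review-42");
            doc.addField("productCode", productCode);
            doc.addField("content", "Arrived quickly, works as described.");
            master.add(doc);    // let the shard's autoCommit policy make it visible
            master.shutdown();
        }
    }

The search side has to apply the same modulus so that reviews for a given
product are queried from the shard they were indexed to.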
>> >> On Tue, Nov 1, 2016 at 8:09 PM, Kent Mu <solr.st...@gmail.com> wrote:
>> >> > Thanks a lot for your reply, Shawn!
>> >> >
>> >> > There are no other applications on the server. I agree with you
>> >> > that we need to upgrade the physical memory and allocate a
>> >> > reasonable JVM size, so that the operating system has spare memory
>> >> > available to cache the index.
>> >> >
>> >> > Actually, we have nearly 100 million documents every year now, and
>> >> > the number is still growing. Our Solr is currently running on a
>> >> > virtual machine, so I wonder whether we need to deploy Solr on a
>> >> > physical machine.
>> >> >
>> >> > Best Regards!
>> >> > Kent
>> >> >
>> >> > 2016-11-01 21:18 GMT+08:00 Shawn Heisey <apa...@elyograg.org>:
>> >> >
>> >> >> On 11/1/2016 1:07 AM, Kent Mu wrote:
>> >> >> > Hi friends! We come across an issue when we use SolrJ (4.9.1) to
>> >> >> > connect to the Solr server. Our deployment is one master with 10
>> >> >> > replicas. We index data to the master and search data from the
>> >> >> > replicas via load balancing. The error stack is as below:
>> >> >> > *Timeout occured while waiting response from server at:
>> >> >> > http://review.solrsearch3.cnsuning.com/solr/commodityReview*
>> >> >> > org.apache.solr.client.solrj.SolrServerException: Timeout occured
>> >> >> > while waiting response from server at:
>> >> >>
>> >> >> This shows that you are connecting to port 80. It is relatively
>> >> >> rare to run Solr on port 80, though it is possible. Do you have an
>> >> >> intermediate layer, like a proxy or a load balancer? If so, you'll
>> >> >> need to ensure that there's not a problem there. If it works
>> >> >> normally when replication isn't happening, that's probably not a
>> >> >> worry.
>> >> >>
>> >> >> > It does not happen often. After analysis, we find that it occurs
>> >> >> > only when the replicas synchronize data from the master Solr
>> >> >> > server. It seems that the replicas block search requests while
>> >> >> > synchronizing data from the master; is that true?
>> >> >>
>> >> >> Solr should be able to continue serving requests while replication
>> >> >> happens. I have never heard of this happening before, and I never
>> >> >> ran into it when I was using replication a long time ago on version
>> >> >> 1.4.x. I think it is more likely that you've got a memory issue
>> >> >> than a bug. If it IS a bug, it will *not* be fixed in a 4.x
>> >> >> version; you would need to upgrade to 6.x and see whether it's
>> >> >> still a problem. Version 6.2.1 is the latest at the moment, and
>> >> >> release plans are underway for 6.3 right now.
>> >> >>
>> >> >> > I wonder if it is because our Solr server hardware configuration
>> >> >> > is too low? The physical memory is 8GB with 4 cores, and the JVM
>> >> >> > we set is Xms512m, Xmx7168m.
>> >> >>
>> >> >> The following assumes that there is no other software on the
>> >> >> server, like a database, or an application server, web server, etc.
>> >> >> If there is, any issues are likely to be a result of extreme memory
>> >> >> starvation, and possibly swapping. Additional physical memory is
>> >> >> definitely needed if there is other software on the server beyond
>> >> >> basic OS tools.
>> >> >>
>> >> >> If the total index data that is on your server is larger than about
>> >> >> 1.5 to 2GB, chances are excellent that you do not have enough free
>> >> >> memory to cache that data effectively, which can lead to major
>> >> >> performance issues. You've only left about 1GB of memory in the
>> >> >> system for that purpose, and that memory must also run the entire
>> >> >> operating system, which can take a significant percentage of 1GB.
>> >> >> With a large index, I would strongly recommend adding memory to
>> >> >> this server.
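For reference, the "Timeout occured while waiting response from server" error
quoted earlier in the thread is SolrJ reporting that the socket read timed out
while waiting for the slave's response. On SolrJ 4.x those timeouts are set per
HttpSolrServer instance; a minimal sketch, with the URL taken from the error
message and the millisecond values purely illustrative:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class ReviewSearchClient {
        public static void main(String[] args) throws SolrServerException {
            HttpSolrServer solr = new HttpSolrServer(
                    "http://review.solrsearch3.cnsuning.com/solr/commodityReview");
            solr.setConnectionTimeout(5000); // ms allowed for establishing the TCP connection
            solr.setSoTimeout(30000);        // ms allowed while waiting for the response;
                                             // this is the limit behind the reported error

            SolrQuery q = new SolrQuery("productCode:123456789");
            long hits = solr.query(q).getResults().getNumFound();
            System.out.println(hits + " reviews found");
            solr.shutdown();
        }
    }

Raising these values only hides the slowness that shows up during replication;
the memory sizing Shawn discusses is the real fix.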
>> >> >> https://wiki.apache.org/solr/SolrPerformanceProblems
>> >> >>
>> >> >> As mentioned in that wiki page, for good performance Solr
>> >> >> absolutely requires that the operating system have spare memory
>> >> >> available to cache the index. In general, allocating almost all
>> >> >> your memory to the Java heap is a bad idea with Solr.
>> >> >>
>> >> >> If your index *is* smaller than 1.5 to 2GB, allocating a 7GB heap
>> >> >> is probably not necessary, unless you are doing *incredibly*
>> >> >> memory-hungry queries, such as grouping, faceting, or sorting on
>> >> >> many fields. If you can reduce the heap size, there would be more
>> >> >> memory available for caching.
>> >> >>
>> >> >> Indexing can sometimes cause very large merges to happen, and a
>> >> >> full index optimize would rewrite the entire index. Replication
>> >> >> copies the changed index files, and if the size of the changes is
>> >> >> significant, additional memory can be required for good
>> >> >> performance. See the special note on the wiki page above about
>> >> >> optimizes.
>> >> >>
>> >> >> Thanks,
>> >> >> Shawn
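To make Erick's docValues suggestion concrete: the memory-hungry operations
Shawn lists (grouping, faceting, sorting) are the ones docValues moves off the
Java heap and into OS-cached files. Below is a small SolrJ 4.x sketch of such
a query; the field names and slave URL are illustrative, and each field used
this way would need docValues="true" in schema.xml plus a full reindex for the
advice to apply.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ReviewFacetQuery {
        public static void main(String[] args) throws SolrServerException {
            // Hypothetical slave URL; in Kent's setup this sits behind the load balancer.
            HttpSolrServer slave =
                    new HttpSolrServer("http://review-shard0-slave1:8080/solr/commodityReview");

            SolrQuery q = new SolrQuery("productCode:123456789");
            q.addSort("createTime", SolrQuery.ORDER.desc); // sorting benefits from docValues
            q.setFacet(true);
            q.addFacetField("starRating");                 // so does faceting
            q.setRows(10);

            QueryResponse rsp = slave.query(q);
            System.out.println(rsp.getResults().getNumFound() + " matching reviews");
            slave.shutdown();
        }
    }

SolrJ also provides LBHttpSolrServer, which can spread queries like this across
a list of slave URLs in the client itself, as an alternative to an external
load balancer.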