Re: Solr search and index rate optimization
Hello, and thanks for replying. So that means 3 ZK instances are more than enough in my case?

On Fri, Jan 8, 2016 at 10:07 PM, Erick Erickson wrote:
> [quoted text snipped]
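For reference, a 3-node ensemble is configured with an identical zoo.cfg on each node, plus a myid file in dataDir. This is a minimal sketch; the hostnames and paths below are placeholders, not values from this thread:

```properties
# zoo.cfg -- identical on all three nodes (hostnames are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node additionally needs a file named `myid` in `dataDir` containing just its server number (1, 2, or 3).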
Re: Solr search and index rate optimization
Thanks for replying. Currently my machine specs are: 32 GB RAM, a 4-core processor, Windows Server 2008 64-bit, a 500 GB hard disk, and 16 GB of swap. The already-running machine, with CPU usage no higher than 10%, has already consumed all of the RAM and has now started to use swap; my guess is the server will choke when the swap runs out. I am only running the Solr and ZK instances there. Any wild idea what is happening and why memory consumption is so high? All the field caches and query caches are set to 1 GB in solrconfig, and along with serving queries I am running a delta import every 15 minutes.

On Fri, Jan 8, 2016 at 3:40 PM, Toke Eskildsen wrote:
> [quoted text snipped]
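A back-of-the-envelope memory budget can make the symptom concrete. This is only a hedged sketch: the heap sizes and index size below are assumed placeholders, not figures from this thread, so substitute your actual -Xmx values and index size.

```python
# Rough memory budget for a single box running several JVMs.
# All per-process sizes here are assumptions for illustration.
GB = 1024 ** 3

solr_instances = 2
solr_heap = 8 * GB        # assumed -Xmx per Solr JVM (caches live inside this)
zk_instances = 5
zk_heap = 1 * GB          # assumed -Xmx per ZooKeeper JVM
total_ram = 32 * GB

committed_heap = solr_instances * solr_heap + zk_instances * zk_heap
page_cache_left = total_ram - committed_heap

print(f"heap committed: {committed_heap / GB:.0f} GB")
print(f"left for OS page cache: {page_cache_left / GB:.0f} GB")

# On 64-bit JVMs Lucene memory-maps the index, so the OS page cache holding
# index files makes RAM *look* fully consumed even when nothing is wrong.
# Trouble starts when committed heap squeezes the page cache below the hot
# part of the index: the OS evicts and swaps, matching the symptom above.
```

If the committed heap alone approaches physical RAM, shrinking -Xmx (and the 1 GB caches) usually helps more than adding swap.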
Re: Solr search and index rate optimization
bq: "Well, a good reason would be if you want your system to continue to operate if 2 ZK nodes lose communication with the rest of the cluster or go down completely"

My argument is usually that if you are losing 2 of 3 ZK nodes at the same time with any regularity, you probably have problems that won't be solved by adding more ZK nodes ;) So I agree that if you want to guard against 2 nodes dropping ZK below quorum, going to 5 is an option. I've just seen very few situations where that makes any practical difference, and it does add to maintenance...

BTW, let's say you have a running cluster and _all_ the ZK nodes die. You'll still be able to run queries, but you won't be able to update any docs, and Solr nodes coming online won't be able to make themselves known to the rest of the cluster, etc. But at least you aren't totally dead in the water.

Not really disagreeing, just expressing solidarity with the ops folks who don't want to maintain hardware that has really marginal benefit ;)

Best,
Erick

On Sat, Jan 9, 2016 at 8:52 PM, Steve Davids wrote:
> [quoted text snipped]
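The trade-off Erick and Steve are weighing comes down to simple quorum arithmetic; a minimal sketch (plain math, no ZooKeeper API involved):

```python
def quorum(ensemble_size: int) -> int:
    """Smallest majority of the ensemble: ZK stays writable only while
    at least this many nodes can talk to each other."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can be lost while a quorum still exists."""
    return ensemble_size - quorum(ensemble_size)

for n in (1, 3, 4, 5, 7):
    print(f"{n} nodes: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Note that even ensemble sizes buy nothing: 4 nodes tolerate the same single failure as 3, which is why ensembles are sized 3, 5, 7.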
Re: Solr search and index rate optimization
bq. "There's no good reason to have 5 with a small cluster and by 'small' I mean < 100s of nodes."

Well, a good reason would be if you want your system to continue to operate if 2 ZK nodes lose communication with the rest of the cluster or go down completely. Just to be clear, though, the ZK nodes definitely don't need to be beefy machines compared to your Solr data nodes, since they are just doing light-weight orchestration. But yes, for a 2 data node system one might be willing to go with a 3-node ensemble to tolerate a single ZK node dying; it just depends on how much cash you are willing to spend and the availability level you are looking for.

-Steve

On Fri, Jan 8, 2016 at 12:07 PM, Erick Erickson wrote:
> [quoted text snipped]
Re: Solr search and index rate optimization
On Fri, 2016-01-08 at 10:55 +0500, Zap Org wrote:
> I wanted to ask: I need to index every 15 min with a hard commit
> (real-time records), and currently have 5 ZooKeeper instances and 2 Solr
> instances on one machine serving 200 users with 32 GB RAM. I want to
> serve more than 10,000 users, so what should my machine specs and
> architecture be for that serving rate along with the index rate?

It depends on your system, and if we were forced to guess, our guess would be very loose.

Fortunately you do have a running system with real queries: make a copy on two similar machines (you will probably need more hardware anyway) and simulate growing traffic, measuring response times at appropriate points: 200 users, 500, 1000, 2000, etc.

If you are very lucky, your current system scales all the way. If not, you should have enough data to make an educated guess at the number of machines you need. You should have at least 3 measuring points to extrapolate from, as scaling is not always linear.

- Toke Eskildsen, State and University Library, Denmark
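The "at least 3 measuring points" advice can be sketched numerically: fit a quadratic through three (concurrent users, latency) measurements and project it to a higher load, which captures super-linear degradation that a straight line would miss. The latency numbers below are illustrative placeholders, not measurements from this thread:

```python
def fit_quadratic(points):
    """Exact quadratic a*x^2 + b*x + c through three (load, latency) points,
    solved with Cramer's rule. With more than three points, least squares
    would be the better choice."""
    (x1, y1), (x2, y2), (x3, y3) = points

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    A = [[x1 * x1, x1, 1.0], [x2 * x2, x2, 1.0], [x3 * x3, x3, 1.0]]
    y = [y1, y2, y3]
    d = det3(A)
    coeffs = []
    for i in range(3):           # replace column i with y (Cramer's rule)
        m = [row[:] for row in A]
        for r in range(3):
            m[r][i] = y[r]
        coeffs.append(det3(m) / d)
    a, b, c = coeffs
    return lambda x: a * x * x + b * x + c

# Hypothetical measurements: (concurrent users, median response time in ms).
measured = [(200, 32.4), (500, 37.5), (1000, 50.0)]
predict = fit_quadratic(measured)
print(f"projected latency at 2000 users: {predict(2000):.1f} ms")
```

If the projection at the target load blows past your latency budget, that is the point where you start sizing additional machines rather than extrapolating further.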
Re: Solr search and index rate optimization
Here's a longer form of Toke's answer:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

BTW, on the surface, having 5 ZK nodes isn't doing you any real good. ZooKeeper isn't really involved in serving queries or handling updates; its purpose is to hold the state of the cluster (nodes up, recovering, down, etc.) and notify Solr listeners when that state changes. There's no good reason to have 5 with a small cluster, and by "small" I mean < 100s of nodes.

Best,
Erick

On Fri, Jan 8, 2016 at 2:40 AM, Toke Eskildsen wrote:
> [quoted text snipped]