Re: Can I use multiple cores
You really can't tell until you prototype and measure. Here's a long blog on why what you're asking, although a reasonable request, is just about impossible to answer without prototyping and measuring:

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Tue, Aug 12, 2014 at 10:36 PM, Ramprasad Padmanabhan <ramprasad...@gmail.com> wrote:
> On 12 August 2014 22:12, Noble Paul <noble.p...@gmail.com> wrote:
>> The machines were 32GB RAM boxes. You must do the RAM requirement
>> calculation for your indexes.
>
> And how many machines running the SOLR ? I expect that I will have to add
> more servers. What I am looking for is how do I calculate how many servers
> I need.
Re: Can I use multiple cores
Hi Ramprasad,

You can certainly have a system with hundreds of cores. I know of more than a few people who have done that successfully in their setups.

At the same time, I'd also recommend that you have a look at SolrCloud. SolrCloud takes away operational pains like replication/recovery etc. to a major extent. I don't know about your security requirements and hard bounds on that front, but also look at routing in SolrCloud to figure out a multi-tenancy implementation here:

* SolrCloud Document Routing by Joel: http://searchhub.org/2013/06/13/solr-cloud-document-routing/
* Multi-level composite-id routing in SolrCloud: http://searchhub.org/2014/01/06/10590/

On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan <ramprasad...@gmail.com> wrote:
> I need to store in SOLR all data of my clients' mailing activity. The data
> contains metadata like From, To, Date, Time, Subject etc. I would easily
> have 1000 million records every 2 months.
>
> What I am currently doing is creating cores per client, so I have 400
> cores already. Is this a good idea to do ? What is the general practice
> for creating cores ?

--
Anshum Gupta
http://www.anshumgupta.net
Re: Can I use multiple cores
On Tue, 2014-08-12 at 08:40 +0200, Ramprasad Padmanabhan wrote:
> I need to store in SOLR all data of my clients' mailing activity. The data
> contains metadata like From, To, Date, Time, Subject etc. I would easily
> have 1000 million records every 2 months.

If standard searches are always inside a single client's emails and not across all cores, this should scale simply by adding new machines linearly with the corpus size.

> What I am currently doing is creating cores per client, so I have 400
> cores already. Is this a good idea to do ?

Yes. One core per client ensures that ranking works well. It makes it easy to remove users, and if some of the users are inactive for long periods of time, you can use dynamic loading of cores.

That is under the presumption that you will have a few thousand clients. If your expected scale is millions, I am not sure it will work.

- Toke Eskildsen, State and University Library, Denmark
Re: Can I use multiple cores
I think this question is aimed more at the design and performance of a large number of cores. Solr is designed to handle multiple cores effectively; however, it would be interesting to know if you have observed any performance problems with a growing number of cores, and with which number of nodes and Solr version.

Regards
Harshvardhan Ojha

On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta <ans...@anshumgupta.net> wrote:
> Hi Ramprasad,
>
> You can certainly have a system with hundreds of cores. I know of more
> than a few people who have done that successfully in their setups.
>
> At the same time, I'd also recommend that you have a look at SolrCloud.
> SolrCloud takes away operational pains like replication/recovery etc. to
> a major extent. I don't know about your security requirements and hard
> bounds on that front, but also look at routing in SolrCloud to figure out
> a multi-tenancy implementation here:
>
> * SolrCloud Document Routing by Joel: http://searchhub.org/2013/06/13/solr-cloud-document-routing/
> * Multi-level composite-id routing in SolrCloud: http://searchhub.org/2014/01/06/10590/
Re: Can I use multiple cores
Are there documented benchmarks with number of cores ?

As of now I just have a test bed. We have 150 million records (will go up to 1000M), distributed in 400 cores. On a single machine with 16GB RAM + 16 cores, search is working fine. But I still am not sure this will work fine in production.

Obviously I can always add more nodes to Solr, but I need to justify how much I need.

On 12 August 2014 12:48, Harshvardhan Ojha <ojha.harshvard...@gmail.com> wrote:
> I think this question is aimed more at the design and performance of a
> large number of cores. Solr is designed to handle multiple cores
> effectively; however, it would be interesting to know if you have observed
> any performance problems with a growing number of cores, and with which
> number of nodes and Solr version.
Re: Can I use multiple cores
On Tue, 2014-08-12 at 11:50 +0200, Ramprasad Padmanabhan wrote:
> Are there documented benchmarks with number of cores ?
>
> As of now I just have a test bed. We have 150 million records (will go up
> to 1000M), distributed in 400 cores. On a single machine with 16GB RAM +
> 16 cores, search is working fine.

About 6M records for a single machine. That is not a lot. What is a typical query rate for a core? I would guess that the CPU is idle most of the time and that you could serve quite a lot more cores from a single machine by increasing RAM or using SSDs (if you are not doing so already). How large is a typical core in GB?

> But I still am not sure this will work fine in production.

16 cores is not many for a single machine, and since you can direct any search to a single core, you can scale up forever. What is it you are worried about?

> Obviously I can always add more nodes to Solr, but I need to justify how
> much I need.

Are you worried about cost?

- Toke Eskildsen, State and University Library, Denmark
Re: Can I use multiple cores
Sorry for the missing information. My Solr cores take less than 200MB of disk each.

What I am worried about is: if I run too many cores on a single Solr machine, there will be a limit to the number of concurrent searches it can support. I am still benchmarking for this.

Another major bottleneck I find is adding data to Solr. I have a cron job that picks data from a MySQL live DB and adds it to Solr. If I run each core's additions serially it works, but if I try a multiprocess setup then the addition simply hangs, even if all the processes are talking to different cores. This means that beyond some point my insertion will take too long and I will have to have multiple servers. Too bad, because there is actually no problem with data search, only with data add.
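One common fix for a hang like the one above is to cap the number of in-flight update requests with a fixed-size worker pool instead of spawning one process per core. The sketch below is only an illustration under assumptions: the Solr base URL, core names, and the shape of the document batches are hypothetical placeholders, not taken from this thread.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import Request, urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed host/port
MAX_WORKERS = 8                           # cap in-flight updates well below the core count

def post_update(core, docs):
    """POST one batch of documents to a single core's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = Request(
        "%s/%s/update?commit=false" % (SOLR_BASE, core),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=60) as resp:
        return resp.status

def index_all(batches, send=post_update, max_workers=MAX_WORKERS):
    """batches: dict mapping core name -> list of documents.

    Runs at most max_workers updates concurrently instead of one
    worker per core, and collects the per-core result."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send, core, docs): core
                   for core, docs in batches.items()}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

Making `send` a parameter also lets you exercise the extraction and batching logic without a running Solr, which helps isolate whether the hang is on the Solr side at all.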
Re: Can I use multiple cores
On Tue, 2014-08-12 at 14:14 +0200, Ramprasad Padmanabhan wrote:
> Sorry for the missing information. My Solr cores take less than 200MB of
> disk each.

So ~3GB/server. If you do not have especially heavy queries, a high query rate, or heavy requirements for index availability, that really sounds like you could put a lot more cores on each machine.

> What I am worried about is: if I run too many cores on a single Solr
> machine, there will be a limit to the number of concurrent searches it
> can support. I am still benchmarking for this.

By all means, benchmark! Try to pinpoint what limits the number of concurrent searches: CPU or IO?

> I have a cron job that picks data from a MySQL live DB and adds it to
> Solr. If I run each core's additions serially it works, but if I try a
> multiprocess setup then the addition simply hangs, even if all the
> processes are talking to different cores.

Are you sure the problem is at the Solr end? Have you tried running the multithreaded extraction without adding the data to Solr?

- Toke Eskildsen, State and University Library, Denmark
Re: Can I use multiple cores
Hi Ramprasad,

I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature, where each Solr had 30,000 cores and at any one time only 2000 cores were in memory. You are just hitting 400, and I don't see much of a problem. What is your h/w BTW?

On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan <ramprasad...@gmail.com> wrote:
> I need to store in SOLR all data of my clients' mailing activity. The data
> contains metadata like From, To, Date, Time, Subject etc. I would easily
> have 1000 million records every 2 months.
>
> What I am currently doing is creating cores per client, so I have 400
> cores already. Is this a good idea to do ? What is the general practice
> for creating cores ?

--
- Noble Paul
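For reference, the on-demand core loading described above is configured in the legacy (pre-SolrCloud) solr.xml. This is only a sketch for the Solr 4.x legacy format; the core name, instance directory, and cache size are placeholders, not values from this thread:

```xml
<solr persistent="true">
  <!-- keep at most 2000 transient cores loaded at once (LRU eviction) -->
  <cores adminPath="/admin/cores" transientCacheSize="2000">
    <!-- transient="true" + loadOnStartup="false": the core is loaded
         on first request and may be unloaded when the cache is full -->
    <core name="client_0001" instanceDir="client_0001"
          transient="true" loadOnStartup="false"/>
  </cores>
</solr>
```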
Re: Can I use multiple cores
Hi Paul and Ramprasad,

I follow your discussion with interest as I will have more or less the same requirement. When you say that you use on-demand core loading, are you talking about the LotsOfCores stuff? Erick told me that it does not work very well in a distributed environment. How do you handle this problem? Do you use multiple single Solr instances? What about failover?

Thanks for your answer,
Aurelien

On 12/08/2014 14:48, Noble Paul wrote:
> Hi Ramprasad,
>
> I have used it in a cluster with millions of users (1 user per core) in
> legacy cloud mode. We used the on-demand core loading feature, where each
> Solr had 30,000 cores and at any one time only 2000 cores were in memory.
> You are just hitting 400, and I don't see much of a problem. What is your
> h/w BTW?
Re: Can I use multiple cores
On 12 August 2014 18:18, Noble Paul <noble.p...@gmail.com> wrote:
> Hi Ramprasad,
>
> I have used it in a cluster with millions of users (1 user per core) in
> legacy cloud mode. We used the on-demand core loading feature, where each
> Solr had 30,000 cores and at any one time only 2000 cores were in memory.
> You are just hitting 400, and I don't see much of a problem. What is your
> h/w BTW?

I have a single machine, 16GB RAM with 16 CPU cores. What is the h/w you are using?
RE: Can I use multiple cores
Ramprasad Padmanabhan [ramprasad...@gmail.com] wrote:
> I have a single machine, 16GB RAM with 16 CPU cores.

Ah! I thought you had more machines, each with 16 Solr cores. This changes a lot.

400 Solr cores of ~200MB ~= 80GB of data. You're aiming for 7 times that, so about 500GB of data. Running that on a single machine with 16GB of RAM is not unrealistic, but it depends a lot on how often a search is issued and whether or not you can unload inactive cores and accept the startup penalty of loading a core the first time a user searches for something. Searches will be really slow if you are using a spinning drive. You might be interested in http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

As for indexing, I can understand if you run into problems with 400 concurrent updates to your single-machine setup. You should limit the number of concurrent updates to a bit more than the number of CPU cores, so try with 20 or 40.

- Toke Eskildsen
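The sizing estimate above can be written out as a quick back-of-envelope calculation. The 400-core, ~200MB-per-core, ~7x-growth numbers come from this thread; the 25% cache fraction is purely an illustrative assumption, not a Solr requirement or anything anyone in the thread recommended:

```python
cores_now = 400
mb_per_core = 200          # "less than 200MB of disk" per core
growth_factor = 7          # 150M records now -> ~1000M records

data_now_gb = cores_now * mb_per_core / 1024.0   # ~78 GB today
data_target_gb = data_now_gb * growth_factor     # ~547 GB at target size

# Rough rule of thumb (an assumption, adjust to taste): how many 16GB
# boxes are needed if you want ~25% of the index in the OS disk cache?
ram_per_box_gb = 16
cache_fraction = 0.25
boxes = data_target_gb * cache_fraction / ram_per_box_gb

print(round(data_now_gb), round(data_target_gb), round(boxes, 1))
```

The point of writing it down is that the answer is dominated by the assumed cache fraction, which is exactly why the thread keeps insisting on benchmarking rather than sizing in the abstract.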
Re: Can I use multiple cores
The machines were 32GB RAM boxes. You must do the RAM requirement calculation for your indexes. Just the number of indexes alone won't be enough to arrive at the RAM requirement.

On Tue, Aug 12, 2014 at 6:59 PM, Ramprasad Padmanabhan <ramprasad...@gmail.com> wrote:
> I have a single machine, 16GB RAM with 16 CPU cores. What is the h/w you
> are using?

--
- Noble Paul
Re: Can I use multiple cores
On 12 August 2014 22:12, Noble Paul <noble.p...@gmail.com> wrote:
> The machines were 32GB RAM boxes. You must do the RAM requirement
> calculation for your indexes.

And how many machines running the SOLR ? I expect that I will have to add more servers. What I am looking for is how do I calculate how many servers I need.