Re: Can I use multiple cores

2014-08-13 Thread Erick Erickson
You really can't tell until you prototype and measure. Here's a long
blog post on why what you're asking, although a reasonable request,
is just about impossible to answer without prototyping and measuring.

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick


On Tue, Aug 12, 2014 at 10:36 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 And how many machines are running Solr?




 On 12 August 2014 22:12, Noble Paul noble.p...@gmail.com wrote:

  The machines were 32GB RAM boxes. You must do the RAM requirement
 

 And how many machines are running Solr?

 I expect that I will have to add more servers. What I am looking for is how
 to calculate how many servers I need.



Re: Can I use multiple cores

2014-08-12 Thread Anshum Gupta
Hi Ramprasad,

You can certainly have a system with hundreds of cores. I know of more than
a few people who have done that successfully in their setups.

At the same time, I'd also recommend having a look at SolrCloud.
SolrCloud takes away operational pains like replication and recovery to a
major extent. I don't know about your security requirements or hard bounds
on that front, but also look at routing in SolrCloud to figure out a
multi-tenancy implementation here (a short sketch follows the links):
* SolrCloud Document Routing by Joel:
http://searchhub.org/2013/06/13/solr-cloud-document-routing/
* Multi-level composite-id routing in SolrCloud:
http://searchhub.org/2014/01/06/10590/
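
Purely as an illustration, here is a minimal Python sketch of that routing
against the plain HTTP API. The collection name "mail", the field names and
the client id are all made up, and on older 4.x releases the query parameter
is shard.keys rather than _route_:

import requests

SOLR = "http://localhost:8983/solr/mail"   # hypothetical SolrCloud collection

# Prefixing the unique key with "<client>!" makes the compositeId router
# hash all of one client's documents onto the same shard.
docs = [
    {"id": "clientA!msg-1001", "from_s": "alice", "subject_t": "hello"},
    {"id": "clientA!msg-1002", "from_s": "bob", "subject_t": "re: hello"},
]
requests.post(SOLR + "/update?commit=true", json=docs).raise_for_status()

# Query only the shard(s) owning this client's route key instead of fanning
# the request out across the whole collection.
params = {"q": "subject_t:hello", "_route_": "clientA!", "wt": "json"}
print(requests.get(SOLR + "/select", params=params).json()["response"]["numFound"])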



On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 I need to store in Solr all the data of my clients' mailing activity.

 The data contains metadata like From, To, Date, Time, Subject, etc.

 I would easily have 1000 million records every 2 months.

 What I am currently doing is creating one core per client, so I have 400 cores
 already.

 Is this a good idea?

 What is the general practice for creating cores?




-- 

Anshum Gupta
http://www.anshumgupta.net


Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 08:40 +0200, Ramprasad Padmanabhan wrote:
 I need to store in Solr all the data of my clients' mailing activity.
 
 The data contains metadata like From, To, Date, Time, Subject, etc.
 
 I would easily have 1000 million records every 2 months.

If standard searches are always within a single client's emails and never
across all cores, this should scale simply by adding new machines linearly
with the corpus size.

 What I am currently doing is creating one core per client, so I have 400 cores
 already.
 
 Is this a good idea?

Yes. One core per client ensures that ranking works well. It makes it easy
to remove users, and if some of the users are inactive for long periods of
time, you can use dynamic loading of cores.
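
As a rough illustration, on-demand loading is driven by a couple of per-core
flags plus a global cache size. The paths, core name and cache size below are
placeholders; check the LotsOfCores documentation for your Solr version:

# cores/client_0417/core.properties -- one lazily loaded core per client
name=client_0417
transient=true
loadOnStartup=false

<!-- solr.xml: how many transient cores may stay loaded at the same time -->
<int name="transientCacheSize">200</int>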

That is under the presumption that you will have a few thousand clients.
If your expected scale is millions, I am not sure it will work.

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Harshvardhan Ojha
I think this question is aimed more at the design and performance of a large
number of cores.
Solr is designed to handle multiple cores effectively; however, it would be
interesting to know if you have observed any performance problems as the
number of cores grows, along with the number of nodes and the Solr version.

Regards
Harshvardhan Ojha


On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta ans...@anshumgupta.net
wrote:

 Hi Ramprasad,

 You can certainly have a system with hundreds of cores. I know of more than
 a few people who have done that successfully in their setups.

 At the same time, I'd also recommend having a look at SolrCloud.
 SolrCloud takes away operational pains like replication and recovery to a
 major extent. I don't know about your security requirements or hard bounds
 on that front, but also look at routing in SolrCloud to figure out a
 multi-tenancy implementation here:
 * SolrCloud Document Routing by Joel:
 http://searchhub.org/2013/06/13/solr-cloud-document-routing/
 * Multi-level composite-id routing in SolrCloud:
 http://searchhub.org/2014/01/06/10590/



 On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
 ramprasad...@gmail.com wrote:

  I need to store in Solr all the data of my clients' mailing activity.
 
  The data contains metadata like From, To, Date, Time, Subject, etc.
 
  I would easily have 1000 million records every 2 months.
 
  What I am currently doing is creating one core per client, so I have 400
  cores already.
 
  Is this a good idea?
 
  What is the general practice for creating cores?
 



 --

 Anshum Gupta
 http://www.anshumgupta.net



Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Are there documented benchmarks for the number of cores?
As of now I just have a test bed.


We have 150 million records (will go up to 1000 M), distributed across 400
cores.
A single machine with 16GB RAM + 16 cores; search is working fine.

But I am still not sure whether this will work fine in production.
Obviously I can always add more nodes to Solr, but I need to justify how
many I need.





On 12 August 2014 12:48, Harshvardhan Ojha ojha.harshvard...@gmail.com
wrote:

 I think this question is aimed more at the design and performance of a large
 number of cores.
 Solr is designed to handle multiple cores effectively; however, it would be
 interesting to know if you have observed any performance problems as the
 number of cores grows, along with the number of nodes and the Solr version.

 Regards
 Harshvardhan Ojha


 On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta ans...@anshumgupta.net
 wrote:

  Hi Ramprasad,
 
  You can certainly have a system with hundreds of cores. I know of more
 than
  a few people who have done that successfully in their setups.
 
  At the same time, I'd also recommend having a look at SolrCloud.
  SolrCloud takes away operational pains like replication and recovery to a
  major extent. I don't know about your security requirements or hard bounds
  on that front, but also look at routing in SolrCloud to figure out a
  multi-tenancy implementation here:
  * SolrCloud Document Routing by Joel:
  http://searchhub.org/2013/06/13/solr-cloud-document-routing/
  * Multi-level composite-id routing in SolrCloud:
  http://searchhub.org/2014/01/06/10590/
 
 
 
  On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan 
  ramprasad...@gmail.com wrote:
 
   I need to store in Solr all the data of my clients' mailing activity.
  
   The data contains metadata like From, To, Date, Time, Subject, etc.
  
   I would easily have 1000 million records every 2 months.
  
   What I am currently doing is creating one core per client, so I have 400
   cores already.
  
   Is this a good idea?
  
   What is the general practice for creating cores?
  
 
 
 
  --
 
  Anshum Gupta
  http://www.anshumgupta.net
 



Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 11:50 +0200, Ramprasad Padmanabhan wrote:
 Are there documented benchmarks for the number of cores?
 As of now I just have a test bed.
 
 
 We have 150 million records (will go up to 1000 M), distributed across 400
 cores.
 A single machine with 16GB RAM + 16 cores; search is working fine.

About 6M records for a single machine. That is not a lot. What is a
typical query rate for a core?

I would guess that the CPU is idle most of the time and that you could
serve quite a lot more cores from a single machine by increasing RAM or
using SSDs (if you are not doing so already). How large is a typical
core in GB?
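
The size can be read straight off the core admin STATUS call. A small sketch,
with a hypothetical base URL and the JSON field names I see on 4.x (verify
against your own install):

import requests

SOLR = "http://localhost:8983/solr"   # hypothetical base URL

# List every core with its on-disk index size and document count.
status = requests.get(SOLR + "/admin/cores",
                      params={"action": "STATUS", "wt": "json"}).json()
for name, core in sorted(status["status"].items()):
    index = core.get("index", {})
    print("%-20s %8.1f MB  %10d docs" % (
        name, index.get("sizeInBytes", 0) / 1e6, index.get("numDocs", 0)))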

 But I am still not sure whether this will work fine in production.

16 cores is not many for a single machine, and since you can direct any
search to a single core, you can scale out more or less indefinitely. What
is it that you are worried about?

 Obviously I can always add more nodes to Solr, but I need to justify how
 many I need.

Are you worried about cost?

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Sorry for the missing information. My Solr cores each take less than 200MB of disk.

What I am worried about is that if I run too many cores on a single Solr
machine, there will be a limit to the number of concurrent searches it can
support. I am still benchmarking this.


Another major bottleneck I find is adding data to Solr.
I have a cron job that picks data from the live MySQL DB and adds it to Solr.
If I run each core's additions serially it works, but if I try a multi-process
approach the additions simply hang, even though all processes are talking to
different cores.

This means that beyond some point my insertion will take too long and I will
have to add multiple servers. Too bad, because there is actually no problem
with data search, only with data adds.


Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 14:14 +0200, Ramprasad Padmanabhan wrote:
 Sorry for missing information. My solr-cores take less than 200MB of
 disk 

So ~3GB/server. If you do not have especially heavy queries, a high query
rate or strict requirements for index availability, it really sounds
like you could put a lot more cores on each machine.

 What I am worried about is that if I run too many cores on a single Solr
 machine, there will be a limit to the number of concurrent searches it
 can support. I am still benchmarking this.

By all means, benchmark! Try to pinpoint what limits the number of
concurrent searches: CPU or I/O?
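
A minimal load-generation sketch, with a hypothetical core URL, made-up
queries and an arbitrary concurrency level; run it while watching top/iostat
on the Solr box to see whether CPU or disk saturates first:

import time
from concurrent.futures import ThreadPoolExecutor
import requests

SOLR_CORE = "http://localhost:8983/solr/client_0001"   # hypothetical core
QUERIES = ["from_s:alice", "subject_t:invoice", "*:*"]  # placeholder queries
CONCURRENCY = 16
TOTAL_REQUESTS = 2000

def one_query(i):
    # Fire one query and return its latency in seconds.
    q = QUERIES[i % len(QUERIES)]
    t0 = time.time()
    requests.get(SOLR_CORE + "/select",
                 params={"q": q, "rows": 10, "wt": "json"}).raise_for_status()
    return time.time() - t0

t_start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(one_query, range(TOTAL_REQUESTS)))
elapsed = time.time() - t_start
print("qps=%.1f  median=%.0f ms  p95=%.0f ms" % (
    TOTAL_REQUESTS / elapsed,
    latencies[len(latencies) // 2] * 1000,
    latencies[int(len(latencies) * 0.95)] * 1000))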

 I have a cron job that picks data from the live MySQL DB and adds it to Solr.
 If I run each core's additions serially it works, but if I try a multi-process
 approach the additions simply hang, even though all processes are talking to
 different cores.

Are you sure the problem is on the Solr end? Have you tried running the
multithreaded extraction without adding the data to Solr?

- Toke Eskildsen, State and University Library, Denmark




Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
Hi Ramprasad,


I have used it in a cluster with millions of users (1 user per core) in
legacy cloud mode. We used the on-demand core loading feature, where each
Solr instance had 30,000 cores and only 2,000 cores were in memory at a time.
You are just hitting 400, and I don't see much of a problem. What is your h/w, BTW?


On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 I need to store in Solr all the data of my clients' mailing activity.

 The data contains metadata like From, To, Date, Time, Subject, etc.

 I would easily have 1000 million records every 2 months.

 What I am currently doing is creating one core per client, so I have 400 cores
 already.

 Is this a good idea?

 What is the general practice for creating cores?




-- 
-
Noble Paul


Re: Can I use multiple cores

2014-08-12 Thread Aurélien MAZOYER

Hi Paul and Ramprasad,

I am following your discussion with interest, as I will have more or less the
same requirement.
When you say that you use on-demand core loading, are you talking about the
LotsOfCores stuff?
Erick told me that it does not work very well in a distributed environment.
How do you handle this problem? Do you use multiple standalone Solr
instances? What about failover?


Thanks for your answer,

Aurelien

On 12/08/2014 14:48, Noble Paul wrote:

Hi Ramprasad,


I have used it in a cluster with millions of users (1 user per core) in
legacy cloud mode. We used the on-demand core loading feature, where each
Solr instance had 30,000 cores and only 2,000 cores were in memory at a time.
You are just hitting 400, and I don't see much of a problem. What is your h/w, BTW?


On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:


I need to store in Solr all the data of my clients' mailing activity.

The data contains metadata like From, To, Date, Time, Subject, etc.

I would easily have 1000 million records every 2 months.

What I am currently doing is creating one core per client, so I have 400 cores
already.

Is this a good idea?

What is the general practice for creating cores?








Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote:

 Hi Ramprasad,


 I have used it in a cluster with millions of users (1 user per core) in
 legacy cloud mode. We used the on-demand core loading feature, where each
 Solr instance had 30,000 cores and only 2,000 cores were in memory at a time.
 You are just hitting 400, and I don't see much of a problem. What is your h/w, BTW?


 On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
 ramprasad...@gmail.com wrote:

  I need to store in Solr all the data of my clients' mailing activity.
 
  The data contains metadata like From, To, Date, Time, Subject, etc.
 
  I would easily have 1000 million records every 2 months.
 
  What I am currently doing is creating one core per client, so I have 400 cores
  already.
 
  Is this a good idea?
 
  What is the general practice for creating cores?
 


I have a single machine with 16GB RAM and 16 CPU cores.

What is the h/w you are using?


RE: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
Ramprasad Padmanabhan [ramprasad...@gmail.com] wrote:
 I have a single machine with 16GB RAM and 16 CPU cores.

Ah! I thought you had more machines, each with 16 Solr cores.

This changes a lot. 400 Solr cores of ~200MB each is roughly 80GB of data. You're
aiming for 7 times that, so about 500GB of data. Running that on a single machine
with 16GB of RAM is not unrealistic, but it depends a lot on how often a search is
issued and on whether you can unload inactive cores and accept the startup
penalty of loading them the first time a user searches for something. Searches
will be really slow if you are using a spinning drive.

You might be interested in 
http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

As for indexing, I can understand if you run into problems with 400 concurrent
updates to your single-machine setup. You should limit the number of concurrent
updates to a bit more than the number of CPU cores, so try with 20 or 40.
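
For what it's worth, a rough sketch of such a throttle in Python; the core
names, the row-fetching function, the batch size and the worker count are all
placeholders, not your actual cron job:

from concurrent.futures import ThreadPoolExecutor
import requests

SOLR = "http://localhost:8983/solr"   # hypothetical base URL
MAX_WORKERS = 20                      # cap on simultaneous /update requests

def fetch_rows(core):
    # Placeholder: pull this core's new rows from MySQL and map them to
    # Solr documents ({"id": ..., "from_s": ..., "date_dt": ...}).
    return []

def index_core(core):
    docs = fetch_rows(core)
    for i in range(0, len(docs), 1000):            # send modest batches
        requests.post("%s/%s/update" % (SOLR, core),
                      json=docs[i:i + 1000], timeout=120).raise_for_status()
    # one commit per core instead of one per batch
    requests.get("%s/%s/update?commit=true" % (SOLR, core),
                 timeout=120).raise_for_status()
    return core, len(docs)

cores = ["client_%04d" % n for n in range(400)]    # hypothetical core names
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    for core, count in pool.map(index_core, cores):
        print(core, count)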

- Toke Eskildsen


Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
The machines were 32GB RAM boxes. You must do the RAM requirement
calculation for your indexes. The number of indexes alone won't be enough
to arrive at the RAM requirement.
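
For what it's worth, a back-of-envelope sketch of that kind of calculation;
every number below is a placeholder to be replaced with your own measurements,
and it is no substitute for prototyping:

# Rough per-node RAM estimate: keep the frequently searched indexes in the
# OS page cache, plus the JVM heap, plus some OS headroom. All placeholders.
cores_per_node = 400      # client cores hosted on one machine
hot_fraction   = 0.25     # share of cores that are actually searched often
core_size_gb   = 0.2      # on-disk index size of one core (~200 MB here)
jvm_heap_gb    = 8        # heap for Solr itself (caches, indexing buffers)
os_headroom_gb = 2        # OS and other processes

page_cache_gb = cores_per_node * hot_fraction * core_size_gb
total_gb = page_cache_gb + jvm_heap_gb + os_headroom_gb
print("rough RAM target per node: %.0f GB (of which page cache %.0f GB)"
      % (total_gb, page_cache_gb))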


On Tue, Aug 12, 2014 at 6:59 PM, Ramprasad Padmanabhan 
ramprasad...@gmail.com wrote:

 On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote:

  Hi Ramprasad,
 
 
  I have used it in a cluster with millions of users (1 user per core) in
  legacy cloud mode. We used the on-demand core loading feature, where each
  Solr instance had 30,000 cores and only 2,000 cores were in memory at a time.
  You are just hitting 400, and I don't see much of a problem. What is your
  h/w, BTW?
 
 
  On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan 
  ramprasad...@gmail.com wrote:
 
   I need to store in Solr all the data of my clients' mailing activity.
  
   The data contains metadata like From, To, Date, Time, Subject, etc.
  
   I would easily have 1000 million records every 2 months.
  
   What I am currently doing is creating one core per client, so I have 400
   cores already.
  
   Is this a good idea?
  
   What is the general practice for creating cores?
  
 
 
 I have a single machine with 16GB RAM and 16 CPU cores.

 What is the h/w you are using?




-- 
-
Noble Paul


Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
And how many machines are running Solr?




On 12 August 2014 22:12, Noble Paul noble.p...@gmail.com wrote:

 The machines were 32GB RAM boxes. You must do the RAM requirement


And how many machines are running Solr?

I expect that I will have to add more servers. What I am looking for is how
to calculate how many servers I need.