Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 05:56, arjit wrote:

Thanks, Erick, for the reply.
I have 6 cores which don't contain duplicated data; every core has some
unique data. What I thought was that a read would query the 6 cores in
parallel, join the results and return them, and that this would be more
efficient than reading one big core.


No, it's not. When you request 10 documents from Solr, it can't know in
advance which shards contain how many of those documents. It could be that
each shard only needs to contribute one or two documents to the result, but
it might also be that a single shard contains all ten. Therefore,
Solr needs to request 10 documents from each shard, then take only the
top 10 of those 60 and drop the rest. And it gets worse
when you set an offset of, say, 100.
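To make the cost concrete, here is a minimal Java sketch of the scatter/gather merge described above. It is a simplified model, not Solr's actual code: the `Hit` and `MergeResult` classes, the random scores and the `book1`..`book6` shard contents are hypothetical. Each shard returns its own top `start + rows` hits, and the coordinator sorts all candidates and keeps only the requested page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class ShardMergeSketch {

    // Hypothetical search hit: a document id and its relevance score.
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // The page returned to the client plus how many candidates the shards shipped.
    static final class MergeResult {
        final List<Hit> page;
        final int fetched;
        MergeResult(List<Hit> page, int fetched) { this.page = page; this.fetched = fetched; }
    }

    static final Comparator<Hit> BY_SCORE_DESC =
            Comparator.comparingDouble((Hit h) -> h.score).reversed();

    // The coordinator cannot know how the global top hits are spread over the
    // shards, so every shard must return its own top (start + rows) hits.
    static List<Hit> shardTop(List<Hit> shard, int start, int rows) {
        List<Hit> sorted = new ArrayList<>(shard);
        sorted.sort(BY_SCORE_DESC);
        return sorted.subList(0, Math.min(start + rows, sorted.size()));
    }

    // Gather all shard candidates, sort them globally and keep only the page
    // [start, start + rows); everything else was fetched only to be discarded.
    static MergeResult merge(List<List<Hit>> shards, int start, int rows) {
        List<Hit> candidates = new ArrayList<>();
        for (List<Hit> shard : shards) {
            candidates.addAll(shardTop(shard, start, rows));
        }
        candidates.sort(BY_SCORE_DESC);
        int from = Math.min(start, candidates.size());
        int to = Math.min(start + rows, candidates.size());
        return new MergeResult(new ArrayList<>(candidates.subList(from, to)), candidates.size());
    }

    // Builds hypothetical shards filled with randomly scored documents.
    static List<List<Hit>> randomShards(int shardCount, int docsPerShard, long seed) {
        Random rnd = new Random(seed);
        List<List<Hit>> shards = new ArrayList<>();
        for (int s = 1; s <= shardCount; s++) {
            List<Hit> docs = new ArrayList<>();
            for (int d = 0; d < docsPerShard; d++) {
                docs.add(new Hit("book" + s + "-" + d, rnd.nextDouble()));
            }
            shards.add(docs);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = randomShards(6, 1000, 42L);
        MergeResult first = merge(shards, 0, 10);
        MergeResult deep = merge(shards, 100, 10);
        System.out.println("start=0:   returned " + first.page.size() + ", fetched " + first.fetched);
        System.out.println("start=100: returned " + deep.page.size() + ", fetched " + deep.fetched);
    }
}
```

With six shards, a 10-row request makes the coordinator handle 60 candidates; asking for rows 100 to 109 makes it handle 660, which is the offset effect mentioned above.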


Sharding is (nearly) always slower than using one big index with 
sufficient hardware resources. Only use sharding when your index is too 
huge to fit into one single machine.


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Sami Siren
 Sharding is (nearly) always slower than using one big index with sufficient
 hardware resources. Only use sharding when your index is too huge to fit
 into one single machine.

If you're not constrained by CPU or IO, in other words if you have plenty of
CPU cores available together with, for example, separate hard discs for
each shard, splitting your index into smaller shards can in some cases
make a huge difference on one box too.

--
 Sami Siren


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 13:22, Sami Siren wrote:

Sharding is (nearly) always slower than using one big index with sufficient
hardware resources. Only use sharding when your index is too huge to fit
into one single machine.


If you're not constrained by CPU or IO, in other words if you have plenty of
CPU cores available together with, for example, separate hard discs for
each shard, splitting your index into smaller shards can in some cases
make a huge difference on one box too.


Do you have an example?

This is hard to believe. If you have several shards on the same machine,
you'll need so much memory that each shard has enough for all its
caches and such. With that much memory, a single Solr core should be
really fast.


If dividing the index is the reason, then a software RAID 0 (striping) 
should be much better.


The only advantage I see is that a single request is searched concurrently.
Maybe, for large requests, this might outweigh the sharding overhead, but only
for long-running requests without disk I/O. I can only imagine that
when using very complicated query functions. And this only stays true as
long as you don't run multiple concurrent requests.
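Both sides of this argument can be put into a deliberately crude Java model. Every cost term and number in it is a hypothetical assumption, not a measurement: one request costs `workMs` of CPU on the big index, sharding splits that work evenly into one task per shard, each task adds a fixed `overheadMs`, merging costs `mergeMsPerShard` per shard, tasks run in waves of `cores` at a time, and disk I/O is ignored.

```java
import java.util.Locale;

public class ShardLatencyModel {

    // One big index: each request runs on one thread; requests beyond the
    // core count wait for a free core (whole "waves" of work).
    static double singleIndexLatency(double workMs, int cores, int concurrentRequests) {
        double waves = Math.ceil((double) concurrentRequests / cores);
        return waves * workMs;
    }

    // N shards on the same box: each request becomes N smaller tasks with a
    // fixed per-task overhead, plus a merge whose cost grows with N.
    static double shardedLatency(double workMs, int shards, double overheadMs,
                                 double mergeMsPerShard, int cores, int concurrentRequests) {
        double taskMs = workMs / shards + overheadMs;
        double waves = Math.ceil((double) concurrentRequests * shards / cores);
        return waves * taskMs + mergeMsPerShard * shards;
    }

    public static void main(String[] args) {
        int cores = 16;      // hypothetical core count
        int shards = 8;      // hypothetical shard count
        double work = 80.0;  // hypothetical CPU ms per request on one index
        for (int load : new int[] {1, 16}) {
            System.out.println(String.format(Locale.ROOT,
                    "load=%d single=%.1f ms sharded=%.1f ms", load,
                    singleIndexLatency(work, cores, load),
                    shardedLatency(work, shards, 2.0, 0.5, cores, load)));
        }
    }
}
```

With these sample values a lone request drops from 80 ms to 16 ms when sharded, but with 16 concurrent requests the sharded setup comes out at 100 ms against 80 ms for the single index, which mirrors the concurrency caveat raised above.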


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Hi Kuli,

In a client engagement, I did see this (N shards on 1 beefy box with lots of 
RAM and CPU cores) be faster than 1 big index.

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 








Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Kuhlmann

On 14.05.2012 16:18, Otis Gospodnetic wrote:

Hi Kuli,

In a client engagement, I did see this (N shards on 1 beefy box with lots of 
RAM and CPU cores) be faster than 1 big index.



I want to believe you, but I also want to understand. Can you explain 
why? And did this only happen for single requests, or even under heavy load?


Greetings,
Kuli


Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Hi Kuli,

As long as there are enough CPUs with spare cycles and disk IO is not a 
bottleneck, this works faster.  This was 12+ months ago.

Otis 





Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Michael Della Bitta
Hi, all,

I've been running into murmurs about this idea elsewhere:

http://stackoverflow.com/questions/8698762/run-multiple-big-solr-shard-instances-on-one-physical-machine

http://java.dzone.com/articles/optimizing-solr-or-how-7x-your?mz=33057-solr_lucene

Michael






Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Robert Stewart
We used to have one large index, then moved to 10 shards (7 million docs each)
searched in parallel, and we get better performance that way. We use a
40-core box with 128 GB RAM. We do a lot of faceting, so maybe that is why,
since facets can be built in parallel on different threads/cores. We also
have the indexes on fast local disks (six 15K RPM disks using RAID striping).
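As a rough illustration of why faceting may benefit: each shard can count facet values for its own documents on its own thread, and the coordinator then only has to add up the per-shard counts. The Java sketch below is a simplified, hypothetical model (the author values are made up); Solr's real distributed faceting has extra steps, such as refining counts for top terms, that it leaves out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FacetMergeSketch {

    // Counts facet values (e.g. author names) among one shard's matching docs.
    static Map<String, Integer> countShard(List<String> fieldValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : fieldValues) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // Counts every shard on its own thread, then sums the per-shard counts.
    static Map<String, Integer> facet(List<List<String>> shards) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (List<String> shard : shards) {
                Callable<Map<String, Integer>> task = () -> countShard(shard);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                // get() blocks until that shard is done, so the merged result
                // cannot be complete before the slowest shard has finished.
                future.get().forEach((value, n) -> total.merge(value, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting facets", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard failed to count facets", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical author values of the matching documents on three shards.
        List<List<String>> shards = List.of(
                List.of("Austen", "Dickens", "Austen"),
                List.of("Dickens", "Tolstoy"),
                List.of("Austen"));
        System.out.println(new TreeMap<>(facet(shards))); // {Austen=3, Dickens=2, Tolstoy=1}
    }
}
```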





Re: Solr Shards multi core slower then single big core

2012-05-14 Thread arjit
Robert, can you explain what you mean when you say "We do a lot of faceting so
maybe that is why since facets can be built in parallel on different
threads/cores"? I am a novice in Solr. Where can I read about it?
Thanks,
Arjit






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3983697.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Shards multi core slower then single big core

2012-05-14 Thread Otis Gospodnetic
Aha!  See, Kuli, I wasn't making it up! ;)

Otis 






Re: Solr Shards multi core slower then single big core

2012-05-13 Thread arjit
Thanks, Erick, for the reply.
I have 6 cores which don't contain duplicated data; every core has some
unique data. What I thought was that a read would query the 6 cores in
parallel, join the results and return them, and that this would be more
efficient than reading one big core.
My question is: wouldn't Solr read from the shards in parallel when a query
is fired at it?

Please let me know if I am assuming something wrong.

Thanks,
Arjit



On Sun, May 13, 2012 at 12:44 AM, Erick Erickson [via Lucene] 
ml-node+s472066n3982950...@n3.nabble.com wrote:

 One of the points of sharding is to use more _machines_. Running multiple
 shards on a single machine is not magically going to make things faster.
 In fact, I'd expect your process to consume more resources, since the
 cores no longer share common data (i.e. a single word that appears in
 more than one core is stored once in each of those cores).

 Best
 Erick
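Erick's point about duplicated terms can be shown with a small Java sketch: each core keeps its own term dictionary, so a word that occurs in several cores is stored once per core. The documents and the lowercase/whitespace split standing in for an analyzer are hypothetical simplifications of what Lucene actually stores.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermDictionarySketch {

    // Distinct terms one core would keep in its own term dictionary, using a
    // naive lowercase/whitespace split as a stand-in for a real analyzer.
    static Set<String> termsOf(List<String> docs) {
        Set<String> terms = new HashSet<>();
        for (String doc : docs) {
            for (String t : doc.toLowerCase().split("\\s+")) {
                if (!t.isEmpty()) terms.add(t);
            }
        }
        return terms;
    }

    // Total dictionary entries when every shard keeps its own dictionary.
    static int shardedEntries(List<List<String>> shards) {
        int total = 0;
        for (List<String> shard : shards) total += termsOf(shard).size();
        return total;
    }

    // Dictionary entries when all documents live in one big index.
    static int singleIndexEntries(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) all.addAll(termsOf(shard));
        return all.size();
    }

    public static void main(String[] args) {
        List<List<String>> shards = List.of(
                List.of("pride and prejudice", "sense and sensibility"),
                List.of("war and peace"),
                List.of("peace and war"));
        // Prints: single index: 7, sharded: 11
        System.out.println("single index: " + singleIndexEntries(shards)
                + ", sharded: " + shardedEntries(shards));
    }
}
```

Here "and" is stored three times and "war" and "peace" twice each once the documents are split over three cores.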




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3983601.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Shards multi core slower then single big core

2012-05-12 Thread arjit
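My query is:
SolrQuery sQuery = new SolrQuery(query.getQueryStr());
// Select the request handler named "dismax" (sets the qt parameter)
sQuery.setQueryType("dismax");

// Each shard is asked for its own top 100 rows before the results are merged
sQuery.setRows(100);

if (!query.isSearchOnDefaultField()) {
    // Override the handler's default query fields
    sQuery.setParam("qf", queryFields.toArray(new String[queryFields.size()]));
}
sQuery.setFields(visibleFields.toArray(new String[visibleFields.size()]));

if (query.isORQuery()) {
    // Minimum-should-match of 1: a document matches if any clause matches
    sQuery.setParam("mm", "1");
}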

My search handler is:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="shards">localhost:9090/solr/book1,localhost:9090/solr/book2,localhost:9090/solr/book3,localhost:9090/solr/book4,localhost:9090/solr/book5,localhost:9090/solr/book6</str>
    <str name="qf">
      text^2.0
    </str>
    <str name="fl">
      title item_id author titleMinusAuthor
    </str>
    <int name="ps">4</int>
    <str name="q.alt">*:*</str>
    <str name="hl.fl">text features name</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Shards-multi-core-slower-then-single-big-core-tp3979115p3979243.html
Sent from the Solr - User mailing list archive at Nabble.com.