Re: Benchmarking Solr

2013-05-27 Thread Otis Gospodnetic
Hi Benson,

We typically use https://github.com/sematext/ActionGenerator

As a matter of fact, we are using it right now to test one of our
search clusters...

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Sun, May 26, 2013 at 10:38 AM, Benson Margulies
 wrote:
> I'd like to run a repeatable test of having Solr ingest a corpus of
> docs on disk, to measure the speed of some alternative things plugged
> in.
>
> Anyone have some advice to share? One approach would be a quick SolrJ
> program that pushed the entire stack as one giant collection with a
> commit at the end.
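
A minimal SolrJ sketch of the kind of repeatable ingest test Benson describes
(read a corpus from disk, add documents in batches, commit once at the end,
and time the whole run) might look roughly like the following, using the
Solr 4.x-era SolrJ API. The Solr URL, corpus directory, and field names are
assumptions and would need to match your own schema:

import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexBenchmark {
    public static void main(String[] args) throws Exception {
        // URL and corpus directory are assumptions -- adjust to your setup.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        File corpusDir = new File("/data/corpus");

        long start = System.currentTimeMillis();
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (File f : corpusDir.listFiles()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", f.getName());
            // "body" is a hypothetical field; map to your own schema here.
            doc.addField("body", new String(Files.readAllBytes(f.toPath()), "UTF-8"));
            batch.add(doc);
            if (batch.size() == 1000) {   // send in batches to keep memory bounded
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();                  // single commit at the end, as suggested
        System.out.println("Indexed in " + (System.currentTimeMillis() - start) + " ms");
        server.shutdown();
    }
}

Keeping the single commit at the end, as Benson suggests, keeps the measurement
focused on the indexing chain itself rather than on commit frequency and
searcher warm-up.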


Re: Benchmarking Solr

2013-05-26 Thread Upayavira
SolrMeter?

Upayavira

On Sun, May 26, 2013, at 03:38 PM, Benson Margulies wrote:
> I'd like to run a repeatable test of having Solr ingest a corpus of
> docs on disk, to measure the speed of some alternative things plugged
> in.
> 
> Anyone have some advice to share? One approach would be a quick SolrJ
> program that pushed the entire stack as one giant collection with a
> commit at the end.


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-30 Thread Daniel Exner

Shawn Heisey wrote:
[..]


> For best results, you'll want to ensure that Solr4 is working completely
> from scratch, that it has never seen a 3.3 index, so that it will use
> its own native format.
That's what I did in the second run. Thanks for clarifying that this is
in fact better. :)



> It may be a good idea to look into the example
> Solr4 config/schema and see whether there are improvements you can
> make.  One note: the updateLog feature in the update handler config will
> generally cause performance to be lower.  The features that require
> updateLog would make this less of an apples to apples comparison, so I
> wouldn't enable it unless I knew I needed it.

I'll have a look at the updateLog feature. But I'm pretty sure it's disabled.


> Unless the lines are labelled wrong in the legend, the graph does show
> higher CPU usage during the push, but lower CPU usage during the
> optimize and most of the rest of the time.
Slightly, but I was expecting higher latency as well. The raw data also shows
the box is unable to deliver CPU stats to the PerfMon plugin because of
high load. Perhaps I was expecting bigger differences, but if you say what I
see is OK, I'm fine.

Can you comment on high CPU load even at low QPS rates?
Is there some parameter to force lower load while testing at the cost of 
higher latencies for better comparison?




> The graph shows that Solr4 has lower latency than 3.3 during both the
> push and the optimize, as well as most of the rest of the time.  The
> latency numbers however are a lot higher than I would expect, seeming to
> average out at around 100 seconds (10^5 ms).  That is terrible
> performance from both versions.  On my own Solr installation, which is
> distributed and has 78 million documents, I have a median latency of 8
> milliseconds and a 95th percentile latency of 248 milliseconds.
OK, I should relabel the y-axis because the data is in fact 1000 times too
high. So latency is more like 10 ms, which is quite good for high QPS rates.




> Is this a 64-bit platform with a 64-bit Java?  How much memory have you
> allocated for the java heap?  How big is the index?


The VM I am using runs openSUSE 10.3 (i586), so no 64-bit Java at all
(but production is using it).

Tomcat Java parameters are:
"-Xms1024m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=256m 
-XX:+UseParallelGC -XX:ParallelGCThreads=4 -XX:GCTimeRatio=10"


The number of docs is 266,249 for both indices, which is quite small, but I
may be able to use a much larger index and a much better machine in the
near future.


Greetings
Daniel Exner
--
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 8:29 AM, Daniel Exner wrote:

I'll answer both your mails in one.

Shawn Heisey wrote:

On 11/29/2012 3:15 AM, Daniel Exner wrote:

I'm currently doing some benchmarking of a real Solr 3.3 instance vs
the same ported to Solr 4.0.

[..]

In the graph you can see high CPU load all the time. This is even the
case if I reduce the QPS down to 5, so CPU is not a good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having
time-outs sometimes.

You can also see no real increase in latency when pushing data into
the index. This is puzzling me, as rumours say one should not push new
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I
can't see the graph.  I can only make general statements, and I can't
guarantee that they'll even be applicable to your scenario.  You may
need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachment.
I dropped it into my Dropbox, here: http://db.tt/EjYCqbpn



Are you seeing lower performance, or just worried about the CPU load?
Solr4 should be able to handle concurrent indexing and querying better
than 3.x.  It is able to do things concurrently that were not possible
before.
In general I'm interested in how much better Solr 4 performs and whether it
may be feasible to use less powerful machines to get the same low
latency, or do more data pushes, etc.



One way that performance improvements happen is that developers find
slow sections of code where the CPU is fairly idle, and rewrite them so
they are faster, but also exercise the CPU harder. When the new code
runs, CPU load goes higher, but it all runs faster.
Graphs show a slightly better latency for Solr 4.0 compared to 3.3, 
but not while pushing data.




Another note specifically related to this part: Have you used the same
configuration and done the minimal changes required to make it run, or
have you tried to update the config for 4.0 and its considerable list of
new features?  Did you start with a blank index on 4.0, or did you copy
the 3.3 index over?

I used the same configuration and made only the minimal changes.
The first runs were using the same data from Solr 3.3 in Solr 4.0 (in
fact it was even the same data dir...), but further runs used freshly
filled, separate indices.


For best results, you'll want to ensure that Solr4 is working completely 
from scratch, that it has never seen a 3.3 index, so that it will use 
its own native format.  It may be a good idea to look into the example 
Solr4 config/schema and see whether there are improvements you can 
make.  One note: the updateLog feature in the update handler config will 
generally cause performance to be lower.  The features that require 
updateLog would make this less of an apples to apples comparison, so I 
wouldn't enable it unless I knew I needed it.


Unless the lines are labelled wrong in the legend, the graph does show 
higher CPU usage during the push, but lower CPU usage during the 
optimize and most of the rest of the time.


The graph shows that Solr4 has lower latency than 3.3 during both the 
push and the optimize, as well as most of the rest of the time.  The 
latency numbers however are a lot higher than I would expect, seeming to 
average out at around 100 seconds (10^5 ms).  That is terrible 
performance from both versions.  On my own Solr installation, which is 
distributed and has 78 million documents, I have a median latency of 8 
milliseconds and a 95th percentile latency of 248 milliseconds.


Is this a 64-bit platform with a 64-bit Java?  How much memory have you 
allocated for the java heap?  How big is the index?


Thanks,
Shawn



Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Daniel Exner

I'll answer both your mails in one.

Shawn Heisey wrote:

On 11/29/2012 3:15 AM, Daniel Exner wrote:

I'm currently doing some benchmarking of a real Solr 3.3 instance vs
the same ported to Solr 4.0.

[..]

In the graph you can see high CPU load all the time. This is even the
case if I reduce the QPS down to 5, so CPU is not a good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having
time-outs sometimes.

You can also see no real increase in latency when pushing data into
the index. This is puzzling me, as rumours say one should not push new
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I
can't see the graph.  I can only make general statements, and I can't
guarantee that they'll even be applicable to your scenario.  You may
need to use an external attachment service and just send us a link.
Indeed, it seems like the mailing list daemon scrubbed my attachment. I
dropped it into my Dropbox, here: http://db.tt/EjYCqbpn



Are you seeing lower performance, or just worried about the CPU load?
Solr4 should be able to handle concurrent indexing and querying better
than 3.x.  It is able to do things concurrently that were not possible
before.
In general I'm interested in how much better Solr 4 performs and whether it
may be feasible to use less powerful machines to get the same low
latency, or do more data pushes, etc.



One way that performance improvements happen is that developers find
slow sections of code where the CPU is fairly idle, and rewrite them so
they are faster, but also exercise the CPU harder.  When the new code
runs, CPU load goes higher, but it all runs faster.
Graphs show a slightly better latency for Solr 4.0 compared to 3.3, but 
not while pushing data.




Another note specifically related to this part: Have you used the same
configuration and done the minimal changes required to make it run, or
have you tried to update the config for 4.0 and its considerable list of
new features?  Did you start with a blank index on 4.0, or did you copy
the 3.3 index over?

I used the same configuration and made only the minimal changes.
The first runs were using the same data from Solr 3.3 in Solr 4.0 (in
fact it was even the same data dir...), but further runs used freshly
filled, separate indices.



Greetings
Daniel Exner
--
Daniel Exner
Softwaredevelopment & Applicationsupport
ESEMOS GmbH


Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 3:15 AM, Daniel Exner wrote:
I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
the same ported to Solr 4.0.


Another note specifically related to this part: Have you used the same 
configuration and done the minimal changes required to make it run, or 
have you tried to update the config for 4.0 and its considerable list of 
new features?  Did you start with a blank index on 4.0, or did you copy 
the 3.3 index over?


There's no wrong answer to these questions.  Depending on exactly what 
you are trying to do, what is right for someone else may not be right 
for you.  The answers will help narrow the discussion.


Thanks,
Shawn



Re: Benchmarking Solr 3.3 vs. 4.0

2012-11-29 Thread Shawn Heisey

On 11/29/2012 3:15 AM, Daniel Exner wrote:
I'm currently doing some benchmarking of a real Solr 3.3 instance vs 
the same ported to Solr 4.0.


Benchmarking is done using JMeter from localhost.
The test scenario is a constant stream of queries from a log file out of
production, at a targeted 50 QPS.
After some time (marked in the graph) I do a push via the REST interface of
the whole index data (796M XML), wait some time and then do an optimize via
REST.


The test machine is a VM on an "Intel(R) Core(TM)2 Quad CPU Q9400 @ 2.66GHz",
with one core and 2 GB RAM attached.
Both Solr instances are running in the same Tomcat and are not used for
anything other than testing.


Expected results were a lower overall load for Solr 4 and lower
latency while pushing new data.


In the graph you can see high CPU load all the time. This is even the
case if I reduce the QPS down to 5, so CPU is not a good metric for
comparison between Solr 3.3 and 4.0 (at least on this machine).
The missing memory data is due to the PerfMon JMeter Plugin having 
time-outs sometimes.


You can also see no real increase in latency when pushing data into 
the index. This is puzzling me, as rumours say one should not push new 
data while under high load, as this would hurt query performance.


I don't see any attachments, or any links to external attachments, so I 
can't see the graph.  I can only make general statements, and I can't 
guarantee that they'll even be applicable to your scenario.  You may 
need to use an external attachment service and just send us a link.


Are you seeing lower performance, or just worried about the CPU load?  
Solr4 should be able to handle concurrent indexing and querying better 
than 3.x.  It is able to do things concurrently that were not possible 
before.


One way that performance improvements happen is that developers find 
slow sections of code where the CPU is fairly idle, and rewrite them so 
they are faster, but also exercise the CPU harder.  When the new code 
runs, CPU load goes higher, but it all runs faster.


Thanks,
Shawn
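
As a point of reference, the push-and-optimize step Daniel describes above
(POSTing the whole XML dump to the /update handler and then issuing an
optimize) can be reproduced outside JMeter with plain HTTP. A rough Java
sketch, in which the update URL and dump file path are assumptions, might
look like this:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PushAndOptimize {
    // Base URL and dump file path are assumptions -- adjust to your instance.
    private static final String UPDATE_URL = "http://localhost:8080/solr/update";

    public static void main(String[] args) throws Exception {
        // Stream the XML dump to /update and commit in the same request.
        postFile(UPDATE_URL + "?commit=true", Paths.get("/data/index-dump.xml"));
        // Then issue the optimize as a separate update command.
        postBytes(UPDATE_URL, "<optimize/>".getBytes("UTF-8"));
    }

    private static void postFile(String url, Path file) throws Exception {
        HttpURLConnection conn = open(url);
        OutputStream out = conn.getOutputStream();
        Files.copy(file, out);            // stream, rather than load ~800M into memory
        out.close();
        System.out.println(url + " -> HTTP " + conn.getResponseCode());
    }

    private static void postBytes(String url, byte[] body) throws Exception {
        HttpURLConnection conn = open(url);
        OutputStream out = conn.getOutputStream();
        out.write(body);
        out.close();
        System.out.println(url + " -> HTTP " + conn.getResponseCode());
    }

    private static HttpURLConnection open(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        conn.setChunkedStreamingMode(1 << 16);   // avoid buffering the whole body
        return conn;
    }
}

Timestamping these two calls from the load-generating side makes it easy to
mark the push and optimize phases on the same latency graph.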



RE: Benchmarking Solr

2010-04-14 Thread Lesyshyn, Erica
Hi,

I agree with Jean-Sebastien. JMeter is great! The threads in my test plan are 
configured to use an "Access Log Sampler". This allows you to feed your 
production requests through JMeter, simulating production traffic. When I 
launch the test, it has access to about 3 million production queries. I also 
use the "user defined variables" in my test plan so I can customize different 
parameters at runtime. 

http://jakarta.apache.org/jmeter/usermanual/jmeter_accesslog_sampler_step_by_step.pdf

- Erica


-Original Message-
From: Jean-Sebastien Vachon [mailto:js.vac...@videotron.ca] 
Sent: Saturday, April 10, 2010 11:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Benchmarking Solr

Hi,

why don't you use JMeter? It would give you greater control over the tests 
you wish to make.
It has many different samplers that will let you run different scenarios 
using your existing set of queries.

ab is great when you want to evaluate the performance of your server under
heavy load. But other than this, I don't see much use for it. JMeter offers
many more options once you get to know it a little.

good luck

- Original Message - 
From: "Blargy" 
To: 
Sent: Friday, April 09, 2010 9:46 PM
Subject: Benchmarking Solr


>
> I am about to deploy Solr into our production environment and I would like to
> do some benchmarking to determine how many slaves I will need to set up.
> Currently the only way I know how to benchmark is to use Apache Benchmark
> but I would like to be able to send random requests to the Solr... not just
> one request over and over.
>
> I have a sample data set of 5000 user entered queries and I would like to be
> able to use AB to benchmark against all these random queries. Is this
> possible?
>
> FYI our current index is ~1.5 gigs with ~5m documents and we will be using
> faceting quite extensively. Our average requests per day is ~2m. We will be
> running RHEL with about 8-12g ram. Any idea how many slaves might be
> required to handle our load?
>
> Thanks
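
For readers who would rather not build a JMeter test plan, a stripped-down
Java sketch of the replay idea Erica describes (read logged production
queries from a file and fire them at Solr in random order at a rough target
rate) might look like this; the file name, Solr URL, and target rate are
assumptions:

import java.io.BufferedReader;
import java.io.FileReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReplayQueries {
    public static void main(String[] args) throws Exception {
        // Query file, Solr URL and target rate are assumptions -- adjust as needed.
        String solr = "http://localhost:8983/solr/select?q=";
        int targetQps = 50;

        List<String> queries = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader("queries.txt"));
        for (String line; (line = in.readLine()) != null; ) {
            queries.add(line);
        }
        in.close();
        Collections.shuffle(queries);      // random order, so runs don't replay an identical sequence

        for (String q : queries) {
            long t0 = System.currentTimeMillis();
            HttpURLConnection conn = (HttpURLConnection)
                new URL(solr + URLEncoder.encode(q, "UTF-8")).openConnection();
            conn.getInputStream().close(); // fetch and discard the response
            long elapsed = System.currentTimeMillis() - t0;
            System.out.println(elapsed + " ms  " + q);
            long pause = (1000 / targetQps) - elapsed;
            if (pause > 0) {
                Thread.sleep(pause);       // crude pacing toward the target QPS
            }
        }
    }
}

JMeter's Access Log Sampler and CSV Data Set Config do the same job with far
more control (ramp-up, concurrency, listeners), so treat this only as a quick
way to sanity-check a setup.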



Re: Benchmarking Solr

2010-04-14 Thread Shawn Heisey

On 4/12/2010 9:57 AM, Shawn Heisey wrote:

On 4/12/2010 8:51 AM, Paolo Castagna wrote:

There are already two related pages:

 - http://wiki.apache.org/solr/SolrPerformanceFactors
 - http://wiki.apache.org/solr/SolrPerformanceData

Why not create a new page?

 - http://wiki.apache.org/solr/BenchmarkingSolr (?)


Done.  I hope you like it.  Please feel free to improve it.

http://wiki.apache.org/solr/BenchmarkingSolr



I've updated the script to clean things up further, but the most 
important change is that it will now do URI escaping on the query string 
before plugging it into the URL.  You can turn this off if your query 
strings are already escaped.
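
The escaping step Shawn mentions matters because raw production queries
usually contain spaces, quotes, and operators that are not valid in a URL.
His script is Perl; the equivalent call in Java is roughly the following,
with a made-up query string for illustration:

import java.net.URLEncoder;

public class EscapeExample {
    public static void main(String[] args) throws Exception {
        String rawQuery = "title:\"solr 4.0\" AND price:[10 TO *]";   // hypothetical query
        String url = "http://localhost:8983/solr/select?q="
                   + URLEncoder.encode(rawQuery, "UTF-8");
        // Without the encode() call, the spaces, quotes and brackets would
        // produce a malformed request (or silently change the query).
        System.out.println(url);
    }
}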


Re: Benchmarking Solr

2010-04-12 Thread Shawn Heisey

On 4/12/2010 8:51 AM, Paolo Castagna wrote:

There are already two related pages:

 - http://wiki.apache.org/solr/SolrPerformanceFactors
 - http://wiki.apache.org/solr/SolrPerformanceData

Why not create a new page?

 - http://wiki.apache.org/solr/BenchmarkingSolr (?)


Done.  I hope you like it.  Please feel free to improve it.

http://wiki.apache.org/solr/BenchmarkingSolr



Re: Benchmarking Solr

2010-04-12 Thread Paolo Castagna

Shawn Heisey wrote:

Anyone got a recommendation about where to put it on the wiki?


There are already two related pages:

 - http://wiki.apache.org/solr/SolrPerformanceFactors
 - http://wiki.apache.org/solr/SolrPerformanceData

Why not create a new page?

 - http://wiki.apache.org/solr/BenchmarkingSolr (?)

It would be good to have someone using JMeter share their config
files as well.

Paolo


Re: Benchmarking Solr

2010-04-12 Thread Paolo Castagna

Paolo Castagna wrote:

I do not have an answer to your questions.
But, I have the same issue/problem you have.


Some related threads:

 - http://markmail.org/message/pns4dtfvt54mu3vs
 - http://markmail.org/message/7on6lvabsosvj7bc
 - http://markmail.org/message/ftz7tkd7ekhnk4bc
 - http://markmail.org/message/db2cv3dzakdp23qm
 - http://markmail.org/message/m3x6ogkfdhcwae6z
 - http://markmail.org/message/xoe3ny7dldnx4wby
 - http://markmail.org/message/eoqty4ralk34rgzk

Paolo



RE: Benchmarking Solr

2010-04-12 Thread Nagelberg, Kallin
I have been using JMeter to perform some load testing. In your case you might
like to take a look at
http://jakarta.apache.org/jmeter/usermanual/component_reference.html#CSV_Data_Set_Config
This will allow you to use a random item from your query list.

Regards,
Kallin Nagelberg

-Original Message-
From: Blargy [mailto:zman...@hotmail.com] 
Sent: Friday, April 09, 2010 9:47 PM
To: solr-user@lucene.apache.org
Subject: Benchmarking Solr


I am about to deploy Solr into our production environment and I would like to
do some benchmarking to determine how many slaves I will need to set up.
Currently the only way I know how to benchmark is to use Apache Benchmark
but I would like to be able to send random requests to the Solr... not just
one request over and over.

I have a sample data set of 5000 user entered queries and I would like to be
able to use AB to benchmark against all these random queries. Is this
possible?

FYI our current index is ~1.5 gigs with ~5m documents and we will be using
faceting quite extensively. Our average requests per day is ~2m. We will be
running RHEL with about 8-12g ram. Any idea how many slaves might be
required to handle our load?

Thanks


Re: Benchmarking Solr

2010-04-12 Thread Shawn Heisey
I've got a very simple Perl script (most of the work is done with
modules) that I wrote which forks off multiple processes and throws
requests at Solr, then gives a little bit of statistical analysis at the
end.  I have planned on sharing it from the beginning; I just have to
clean it up for public consumption.  I will try to do that today, though
I don't know if I can.  Anyone got a recommendation about where to put
it on the wiki?


Shawn

On 4/9/2010 7:46 PM, Blargy wrote:

I am about to deploy Solr into our production environment and I would like to
do some benchmarking to determine how many slaves I will need to set up.
Currently the only way I know how to benchmark is to use Apache Benchmark
but I would like to be able to send random requests to the Solr... not just
one request over and over.

I have a sample data set of 5000 user entered queries and I would like to be
able to use AB to benchmark against all these random queries. Is this
possible?

FYI our current index is ~1.5 gigs with ~5m documents and we will be using
faceting quite extensively. Our average requests per day is ~2m. We will be
running RHEL with about 8-12g ram. Any idea how many slaves might be
required to handle our load?

Thanks
   




Re: Benchmarking Solr

2010-04-12 Thread Markus Jelsma
Hi,


You can use Siege [1] in a similar manner as ab; it supports newline-separated
URL files and can pick random URLs.


[1]: http://freshmeat.net/projects/siege/


Cheers,


On Saturday 10 April 2010 03:46:56 Blargy wrote:
> I am about to deploy Solr into our production environment and I would like
>  to do some benchmarking to determine how many slaves I will need to set
>  up. Currently the only way I know how to benchmark is to use Apache
>  Benchmark but I would like to be able to send random requests to the
>  Solr... not just one request over and over.
> 
> I have a sample data set of 5000 user entered queries and I would like to
>  be able to use AB to benchmark against all these random queries. Is this
>  possible?
> 
> FYI our current index is ~1.5 gigs with ~5m documents and we will be using
> faceting quite extensively. Our average requests per day is ~2m. We will be
> running RHEL with about 8-12g ram. Any idea how many slaves might be
> required to handle our load?
> 
> Thanks
> 

Markus Jelsma - Technical Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
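
Siege reads a plain-text file with one complete URL per line. A small,
hypothetical Java helper for turning a list of raw queries into such a file
(the file names and Solr URL are assumptions) might be:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;
import java.net.URLEncoder;

public class BuildSiegeUrls {
    public static void main(String[] args) throws Exception {
        // Input/output file names and the Solr URL are assumptions.
        BufferedReader in = new BufferedReader(new FileReader("queries.txt"));
        PrintWriter out = new PrintWriter("urls.txt", "UTF-8");
        for (String q; (q = in.readLine()) != null; ) {
            // One fully-formed URL per line -- the format Siege's -f/--file option reads.
            out.println("http://localhost:8983/solr/select?q=" + URLEncoder.encode(q, "UTF-8"));
        }
        out.close();
        in.close();
    }
}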



Re: Benchmarking Solr

2010-04-12 Thread Paolo Castagna

Hi,
I do not have an answer to your questions.
But, I have the same issue/problem you have.

It would be good if the Solr community would agree on and share their approach
for benchmarking Solr. Indeed, it would be good to have a benchmark for
"information retrieval" systems. AFAIK there isn't one. :-/

The content on the wiki [1] is better than nothing, but in practice
more is needed IMHO.

I have seen JMeter being used in ElasticSearch [2].
Solr could do the same to help users and new adopters get started.

Some guidelines/advice (I know it's hard) would be useful as well.

I ended up writing my own "crappy" multi-threaded benchmarking tool.
At a certain point, in particular when you are hitting the Solr cache
and returning a large number of results, the transfer time is a
significant part of your response time.
Tuning Jetty or Tomcat or whatever container you use is essential.

Are you using Jetty or Tomcat?

I would also be interested in understanding the impact of the slave
polling interval on searches and the impact of the number of slaves
and polling interval on updates on the master.

Paolo

 [1] http://wiki.apache.org/solr/SolrPerformanceData
 [2] http://github.com/elasticsearch/elasticsearch/tree/master/modules/benchmark/jmeter


Blargy wrote:

I am about to deploy Solr into our production environment and I would like to
do some benchmarking to determine how many slaves I will need to set up.
Currently the only way I know how to benchmark is to use Apache Benchmark
but I would like to be able to send random requests to the Solr... not just
one request over and over.

I have a sample data set of 5000 user entered queries and I would like to be
able to use AB to benchmark against all these random queries. Is this
possible?

FYI our current index is ~1.5 gigs with ~5m documents and we will be using
faceting quite extensively. Our average requests per day is ~2m. We will be
running RHEL with about 8-12g ram. Any idea how many slaves might be
required to handle our load?

Thanks


Re: Benchmarking Solr

2010-04-10 Thread Jean-Sebastien Vachon

Hi,

why don't you use JMeter? It would give you greater control over the tests 
you wish to make.
It has many different samplers that will let you run different scenarios 
using your existing set of queries.


ab is great when you want to evaluate the performance of your server under
heavy load. But other than this, I don't see much use for it. JMeter offers
many more options once you get to know it a little.


good luck

- Original Message - 
From: "Blargy" 

To: 
Sent: Friday, April 09, 2010 9:46 PM
Subject: Benchmarking Solr




I am about to deploy Solr into our production environment and I would like to
do some benchmarking to determine how many slaves I will need to set up.
Currently the only way I know how to benchmark is to use Apache Benchmark
but I would like to be able to send random requests to the Solr... not just
one request over and over.

I have a sample data set of 5000 user entered queries and I would like to be
able to use AB to benchmark against all these random queries. Is this
possible?

FYI our current index is ~1.5 gigs with ~5m documents and we will be using
faceting quite extensively. Our average requests per day is ~2m. We will be
running RHEL with about 8-12g ram. Any idea how many slaves might be
required to handle our load?

Thanks