Re: Benchmark Solr vs Elastic Search vs Sensei
Hi Erick,

Thanks for the extensive answers. I will try to tune my Solr installation according to your advice and the wiki page you mentioned.

Best regards,
Volodymyr

2012/4/27 Jeremy Taylor jtay...@datastax.com:
DataStax offers a Solr integration that isn't master/slave and is Near Real Time. Essentially, the software offers the great features of Solr without the major shortcomings.

Jeremy

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, April 27, 2012 5:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Benchmark Solr vs Elastic Search vs Sensei

Some observations:

1. I suspect some of your queries aren't doing what you expect, but I'm not sure whether that matters. For example, !tags:chick magnet will be parsed as -tags:chick defaultField:magnet.
2. Typical production Solr setups are master/slave. Your indexing process (the commits) causes new searchers to be opened, warmed, etc. quite regularly, which reduces your throughput. It's not surprising at all that your QPS rate increases when you're not indexing.
3. Near Real Time with soft commits on trunk should change the characteristics of the test with background indexing. You might try that.
4. Examine your cache usage on the Solr admin page. Caches are quite important. Also consider the autowarming characteristics.
5. There's a ton of stuff you can do to tune the query rate. Unfortunately, it's hard to say which specific thing would help in your situation. You might start with: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

Best
Erick

On Thu, Apr 26, 2012 at 9:50 PM, Volodymyr Zhabiuk vzhab...@gmail.com wrote:
[...]
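Erick's first observation can be made concrete. Below is a minimal Python sketch that rebuilds the benchmark query so it does what it appears to intend: any of the listed values, none of the excluded ones (that reading of the intent is an assumption). It quotes multi-word values such as chick magnet so both words stay on the tags field, and it writes OR in upper case, because the standard Lucene query parser only treats upper-case AND/OR/NOT as operators; the lowercase or in the original is parsed as a search term. Each group also carries a required (+) clause, since a nested group made up only of negated clauses matches no documents. The helper names (term, any_but, build_query, build_params) are hypothetical.

```python
from urllib.parse import urlencode


def term(field: str, value: str) -> str:
    """Return field:value, phrase-quoting values that contain spaces."""
    if " " in value:
        # Quote multi-word values so every word stays on `field` instead of
        # the later words falling through to the default search field.
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def any_but(field: str, include: list[str], exclude: list[str]) -> str:
    """Match any `include` value on `field` while excluding every `exclude` value.

    Assumption: this is the intended meaning of the original
    (a or b ...) AND (!x AND !y ...) groups.
    """
    any_of = " OR ".join(term(field, v) for v in include)
    none_of = " ".join("-" + term(field, v) for v in exclude)
    # The required (+) clause keeps the group from being purely negative.
    return f"(+({any_of}) {none_of})"


def build_query() -> str:
    """Rebuild the benchmark query with upper-case operators and quoted phrases."""
    clauses = [
        any_but("tags", ["moon-roof", "electric", "highend", "hybrid"],
                ["family", "chick magnet", "soccer mom"]),
        any_but("color", ["red", "green", "white", "yellow"],
                ["gold", "silver", "black"]),
        "mileage:[15001 TO 17500]",
        "mileage:[17501 TO *]",
        "city:u.s.a.*",
    ]
    return " OR ".join(clauses)


def build_params(q: str) -> str:
    """URL-encode q plus the facet parameters; facet.field is repeated."""
    return urlencode([("q", q), ("facet", "true"),
                      ("facet.field", "tags"), ("facet.field", "color")])


if __name__ == "__main__":
    q = build_query()
    print(q)
    print(build_params(q))
```

Passing a list of pairs to urlencode, rather than a dict, is what allows facet.field to appear twice.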
Re: Benchmark Solr vs Elastic Search vs Sensei
Hi Andy,

I don't want to publish the results yet, since there are still some mistakes in the benchmark. It would also be controversial, because there are too many parameters to tune and take into consideration. Nevertheless, you can find the preliminary results for Sensei in the Sensei Google group.

At first I was using the benchmark for stress testing Sensei. We needed to identify possible memory leaks and bottlenecks in the new release. After that I extended the tool to test Solr and Elasticsearch.

With many thanks,
Volodymyr

2012/4/27 Andy angelf...@yahoo.com:
What is the performance of Elasticsearch and SenseiDB in your benchmark?

From: Volodymyr Zhabiuk vzhab...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 26, 2012 9:50 PM
Subject: Benchmark Solr vs Elastic Search vs Sensei

[...]
Benchmark Solr vs Elastic Search vs Sensei
Hi Solr users,

I've implemented a project to compare the performance of Solr, Elasticsearch and SenseiDB: https://github.com/vzhabiuk/search-perf

Solr version 3.5.0 was used. I used the default configuration, just enabled JSON updates, and used the following schema: https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml

2.5 mln documents were put into the index; after that I launched the indexing process to add another 500k docs, issuing a commit after each 500-doc batch. At the same time I launched the concurrent client, which sent the following type of query:

((tags:moon-roof or tags:electric or tags:highend or tags:hybrid) AND (!tags:family AND !tags:chick magnet AND !tags:soccer mom))
OR ((color:red or color:green or color:white or color:yellow) AND (!color:gold AND !color:silver AND !color:black))
OR mileage:[15001 TO 17500] OR mileage:[17501 TO *]
OR city:u.s.a.*

facet=true&facet.field=tags&facet.field=color

The query is a top-level OR query consisting of 2 term clauses, 2 ranges and 1 prefix. It is designed to hit ~60-70% of all docs.

Here are the performance results with background indexing:

#Threads  min       median    mean      75th pct  qps
1         208.95ms  332.66ms  350.48ms  422.92ms   2.8
2         188.68ms  338.09ms  339.22ms  402.15ms   5.9
3         151.06ms  326.64ms  336.20ms  418.61ms   8.8
4         125.13ms  332.90ms  332.18ms  396.14ms  12.0

If there is no indexing process in the background, the results for 2.6 mln docs are as follows:

#Threads  min       median    mean      75th pct  qps
1         106.70ms  199.66ms  199.40ms  234.89ms   5.1
2         128.61ms  199.12ms  201.81ms  229.89ms   9.9
3         110.99ms  197.43ms  203.13ms  232.25ms  14.7
4          90.24ms  201.46ms  200.46ms  227.75ms  19.9
5         106.14ms  208.75ms  207.69ms  242.88ms  24.0
6         103.75ms  208.91ms  211.23ms  238.60ms  28.3
7         113.54ms  207.07ms  209.69ms  239.99ms  33.3
8         117.32ms  216.38ms  224.74ms  258.74ms  35.5

I've got three questions so far:

1. With background indexing the latency is roughly 1.7 times higher (median ~330 ms vs ~200 ms). Is there any way to overcome this?
2. How can we tune Solr to get better results?
3. In your opinion, what is the preferred type of query to use for the benchmark?

With many thanks,
Volodymyr

BTW, here is the spec of my machine: RedHat 6.1 64-bit, Intel Xeon E5620 @ 2.40 GHz, 8 cores, 63 GB RAM
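The indexing side of the test is easy to quantify: 500k documents in 500-document batches means 1,000 commits during the run, and per Erick's second point each one opens and warms a new searcher. A minimal standard-library sketch of that loop follows. The endpoint assumes Solr 3.5's JSON update handler at /solr/update/json on the default port 8983, and the request body assumes the handler accepts a JSON array of documents; both are assumptions to verify against your setup. Building a request is kept separate from sending it, so the batch arithmetic can be checked without a running server.

```python
import json
from urllib.request import Request, urlopen

# Assumption: Solr 3.5 JSON update handler on the default host and port.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json"


def batches(docs, size=500):
    """Yield consecutive slices of `docs`, `size` documents at a time."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def update_request(batch, commit=True):
    """Build (but do not send) a POST that adds `batch`, optionally committing."""
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    body = json.dumps(batch).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def index_all(docs, size=500, commit_every_batch=True):
    """Send every batch and return how many requests were made.

    With commit_every_batch=True each request ends in a commit, so Solr
    opens and warms a new searcher once per batch.
    """
    sent = 0
    for batch in batches(docs, size):
        with urlopen(update_request(batch, commit_every_batch)) as resp:
            resp.read()
        sent += 1
    return sent
```

Running the same load once with commit_every_batch=False, followed by a single commit request at the end, would be one way to measure how much of the latency gap comes from the repeated searcher reopening.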
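The columns in both tables can be reproduced from raw per-query latencies, and the qps column can be cross-checked against them. The sketch below assumes closed-loop clients with no pause between queries, in which case throughput is roughly threads divided by mean latency (the closed-system form of Little's law). How search-perf actually computes its 75% column is not stated, so the inclusive quartile method used here is an assumption.

```python
import statistics


def summarize(latencies_ms):
    """Return the table columns (min, median, mean, 75th percentile) in ms."""
    data = sorted(latencies_ms)
    # quantiles(n=4) returns the three quartile cut points; index 2 is the 75th.
    # Assumption: inclusive interpolation; the benchmark tool may differ.
    p75 = statistics.quantiles(data, n=4, method="inclusive")[2]
    return {
        "min": data[0],
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "p75": p75,
    }


def expected_qps(threads, mean_ms):
    """Closed-loop Little's law: each thread keeps exactly one query in flight."""
    return threads / (mean_ms / 1000.0)


if __name__ == "__main__":
    # (threads, mean ms, reported qps): 4 threads with indexing,
    # 8 threads without indexing, both copied from the tables.
    rows = [(4, 332.18, 12.0), (8, 224.74, 35.5)]
    for threads, mean_ms, reported in rows:
        print(threads, round(expected_qps(threads, mean_ms), 2), reported)
    # One-thread median latency with vs. without background indexing.
    print(round(332.66 / 199.66, 2))
```

For 4 threads with indexing this predicts about 12.04 qps (reported 12.0), and for 8 threads without indexing about 35.6 qps (reported 35.5), so the qps column closely tracks mean latency. The one-thread median ratio, 332.66 / 199.66, is about 1.67.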