Re: Benchmark Solr vs Elastic Search vs Sensei

2012-04-27 Thread Volodymyr Zhabiuk
Hi Erick

Thanks for the extensive answers. I will try to tune my Solr
installation according to your advice and the wiki page you
mentioned.

Best regards,
Volodymyr

2012/4/27 Jeremy Taylor jtay...@datastax.com:
 DataStax offers a Solr integration that isn't master/slave and is
 near real time. Essentially, the software offers the great features of
 Solr without the major shortcomings.

 Jeremy

 -----Original Message-----
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Friday, April 27, 2012 5:26 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Benchmark Solr vs Elastic Search vs Sensei

 Some observations:
 1. I suspect some of your queries aren't doing what you expect, but
     I'm not sure if that matters. E.g. !tags:chick magnet will be parsed
     as -tags:chick defaultField:magnet.
 2. Typical Solr setups in production are usually master/slave
     setups. Your indexing process (the commits) is causing
     new searchers to be opened/warmed/etc. quite regularly,
     reducing your throughput. It's not surprising at all that
     your QPS rate increases when not indexing.
 3. The trunk Near Real Time with soft commits should change
     the characteristics of the test with background indexing. You
     might try that.
 4. Examine your cache usage; see the Solr admin page. Caches
     are quite important. Also consider autowarming characteristics.
 5. There's a ton of stuff you can do to tune query rate. Unfortunately,
     the specific thing that would help your situation is hard to
     say. You might start with:
     http://wiki.apache.org/lucene-java/ImproveSearchingSpeed

 Best
 Erick
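
 As an illustration of observation 1, here is a minimal Python sketch of how a benchmark client could build the query so that a multi-word tag such as chick magnet stays attached to its field. The helper names (tag_clause, build_query_string) are hypothetical and not part of the search-perf project. The sketch assumes the standard Lucene query syntax, where a quoted phrase keeps its field prefix. It also writes the operator as uppercase OR and the exclusions as top-level - clauses, which is a choice of this sketch rather than something stated in the thread.

```python
from urllib.parse import urlencode


def tag_clause(field, value):
    """Build field:value, quoting values that contain whitespace.

    Without quotes, `tags:chick magnet` is split into `tags:chick`
    plus a separate term on the default field.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if any(ch.isspace() for ch in value):
        return f'{field}:"{escaped}"'
    return f"{field}:{escaped}"


def build_query_string(include, exclude):
    """Return URL-encoded parameters for a faceted tag query (hypothetical helper)."""
    wanted = " OR ".join(tag_clause("tags", t) for t in include)
    unwanted = " ".join("-" + tag_clause("tags", t) for t in exclude)
    q = f"+({wanted}) {unwanted}"
    params = [
        ("q", q),
        ("facet", "true"),
        ("facet.field", "tags"),
        ("facet.field", "color"),
    ]
    # urlencode percent-encodes quotes, colons and spaces, and joins the
    # parameters with '&'.
    return urlencode(params)


if __name__ == "__main__":
    print(build_query_string(
        ["moon-roof", "electric", "highend", "hybrid"],
        ["family", "chick magnet", "soccer mom"],
    ))
```

 Building the parameters as a list of pairs keeps the repeated facet.field entries, which a plain dict would collapse into one.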

 On Thu, Apr 26, 2012 at 9:50 PM, Volodymyr Zhabiuk vzhab...@gmail.com
 wrote:
 Hi Solr users

 I've implemented a project to compare the performance of Solr,
 Elastic Search and SenseiDB: https://github.com/vzhabiuk/search-perf
 Solr version 3.5.0 was used. I used the default configuration,
 only enabled JSON updates, and used the following schema:
 https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml
 2.5 million documents were put into the index. After that I launched
 the indexing process to add another 500k docs, issuing a commit
 after each 500-doc batch. At the same time I launched the
 concurrent client, which sent queries of the following type:
 ((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))%20
 OR%20((color:red%20or%20color:green%20or%20color:white%20or%20color:yellow)%20AND%20(!color:gold%20AND%20!color:silver%20AND%20!color:black))%20
 OR%20mileage:[15001%20TO%2017500]%20OR%20mileage:[17501%20TO%20*]%20
 OR%20city:u.s.a.*
 facet=true&facet.field=tags&facet.field=color
 The query is a top-level OR query consisting of 2 terms, 2
 ranges and 1 prefix. It is designed to hit ~60-70% of all the docs.
 Here is the performance result:
 #Threads   min        median     mean       75%        qps
 1          208.95ms   332.66ms   350.48ms   422.92ms   2.8
 2          188.68ms   338.09ms   339.22ms   402.15ms   5.9
 3          151.06ms   326.64ms   336.20ms   418.61ms   8.8
 4          125.13ms   332.90ms   332.18ms   396.14ms   12.0
 Without an indexing process in the background, the results
 for 2.6 million docs are as follows:
 #Threads   min        median     mean       75%        qps
 1          106.70ms   199.66ms   199.40ms   234.89ms   5.1
 2          128.61ms   199.12ms   201.81ms   229.89ms   9.9
 3          110.99ms   197.43ms   203.13ms   232.25ms   14.7
 4          90.24ms    201.46ms   200.46ms   227.75ms   19.9
 5          106.14ms   208.75ms   207.69ms   242.88ms   24.0
 6          103.75ms   208.91ms   211.23ms   238.60ms   28.3
 7          113.54ms   207.07ms   209.69ms   239.99ms   33.3
 8          117.32ms   216.38ms   224.74ms   258.74ms   35.5
 I've got three questions so far:
 1. With background indexing the latency is almost 2 times
 higher. Is there any way to overcome this?
 2. How can we tune Solr to get better results?
 3. In your opinion, what is the preferred type of query to
 use for the benchmark?

 With many thanks,
 Volodymyr


 BTW here is the spec of my machine
 RedHat 6.1 64bit
 Intel XEON e5620 @2.40 GHz, 8 cores
 63 GB RAM


Re: Benchmark Solr vs Elastic Search vs Sensei

2012-04-27 Thread Volodymyr Zhabiuk
Hi Andy

I don't want to publish the results, since there are still some mistakes
in the benchmark. It would also be controversial, because there are
too many parameters to tune and to take into consideration.
Nevertheless, you can go to the Sensei Google group to see the
preliminary results for Sensei.

At first I was using the benchmark to stress test
Sensei. We needed to identify possible memory leaks and bottlenecks
in the new release. After that I extended the tool to test Solr
and Elastic Search.

With many thanks,
Volodymyr

2012/4/27 Andy angelf...@yahoo.com:
 What is the performance of Elasticsearch and SenseiDB in your benchmark?


 


Benchmark Solr vs Elastic Search vs Sensei

2012-04-26 Thread Volodymyr Zhabiuk
Hi Solr users

I've implemented a project to compare the performance of
Solr, Elastic Search and SenseiDB:
https://github.com/vzhabiuk/search-perf
Solr version 3.5.0 was used. I used the default configuration,
only enabled JSON updates, and used the following schema:
https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml
2.5 million documents were put into the index. After
that I launched the indexing process to add another 500k docs,
issuing a commit after each 500-doc batch. At the
same time I launched the concurrent client, which sent
queries of the following type:
((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))%20
OR%20((color:red%20or%20color:green%20or%20color:white%20or%20color:yellow)%20AND%20(!color:gold%20AND%20!color:silver%20AND%20!color:black))%20
OR%20mileage:[15001%20TO%2017500]%20OR%20mileage:[17501%20TO%20*]%20
OR%20city:u.s.a.*
facet=true&facet.field=tags&facet.field=color
The query is a top-level OR query consisting of 2 terms, 2
ranges and 1 prefix. It is designed to hit ~60-70% of all the docs.
Here is the performance result:
#Threads   min        median     mean       75%        qps
1          208.95ms   332.66ms   350.48ms   422.92ms   2.8
2          188.68ms   338.09ms   339.22ms   402.15ms   5.9
3          151.06ms   326.64ms   336.20ms   418.61ms   8.8
4          125.13ms   332.90ms   332.18ms   396.14ms   12.0
Without an indexing process in the background,
the results for 2.6 million docs are as follows:
#Threads   min        median     mean       75%        qps
1          106.70ms   199.66ms   199.40ms   234.89ms   5.1
2          128.61ms   199.12ms   201.81ms   229.89ms   9.9
3          110.99ms   197.43ms   203.13ms   232.25ms   14.7
4          90.24ms    201.46ms   200.46ms   227.75ms   19.9
5          106.14ms   208.75ms   207.69ms   242.88ms   24.0
6          103.75ms   208.91ms   211.23ms   238.60ms   28.3
7          113.54ms   207.07ms   209.69ms   239.99ms   33.3
8          117.32ms   216.38ms   224.74ms   258.74ms   35.5
I've got three questions so far:
1. With background indexing the latency is almost 2 times
higher. Is there any way to overcome this?
2. How can we tune Solr to get better results?
3. In your opinion, what is the preferred type of query to
use for the benchmark?

With many thanks,
Volodymyr


BTW here is the spec of my machine
RedHat 6.1 64bit
Intel XEON e5620 @2.40 GHz, 8 cores
63 GB RAM
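
The reported numbers look consistent with a closed-loop client, in which each thread sends its next query as soon as the previous one returns, so throughput is approximately the thread count divided by the mean latency. That is an assumption about how the client works; the post does not say. A minimal Python sketch that checks this against the tables above and computes the latency ratio behind question 1 (function names are illustrative only):

```python
# (threads, mean latency in ms, reported qps), copied from the tables above.
WITH_INDEXING = [
    (1, 350.48, 2.8), (2, 339.22, 5.9), (3, 336.20, 8.8), (4, 332.18, 12.0),
]
WITHOUT_INDEXING = [
    (1, 199.40, 5.1), (2, 201.81, 9.9), (3, 203.13, 14.7), (4, 200.46, 19.9),
    (5, 207.69, 24.0), (6, 211.23, 28.3), (7, 209.69, 33.3), (8, 224.74, 35.5),
]


def expected_qps(threads, mean_ms):
    """Throughput of a closed loop: threads / mean latency in seconds."""
    return threads / (mean_ms / 1000.0)


def slowdown(with_rows, without_rows):
    """Ratio of mean latency with vs. without indexing, per thread count."""
    without = {threads: mean_ms for threads, mean_ms, _ in without_rows}
    return {
        threads: mean_ms / without[threads]
        for threads, mean_ms, _ in with_rows
        if threads in without
    }


if __name__ == "__main__":
    for threads, mean_ms, qps in WITH_INDEXING + WITHOUT_INDEXING:
        print(threads, qps, round(expected_qps(threads, mean_ms), 1))
    for threads, ratio in slowdown(WITH_INDEXING, WITHOUT_INDEXING).items():
        print(threads, round(ratio, 2))
```

On these figures the mean latency with background indexing comes out at roughly 1.65 to 1.76 times the latency without it, for 1 to 4 threads.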