Re: Re: How to speed up solr search speed
> My query string is always simple like design, principle of design, tom
> EG: URL: http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on

IMO, indeed with these types of simple searches caching (and thus RAM usage) cannot be fully exploited, i.e. there isn't really anything to cache: no sort-ordering or faceting (Lucene fieldcache), no docsets or faceting (Solr filterCache). The only thing that helps you here would be a big Solr querycache (queryResultCache), depending on how often queries are repeated. Just execute the same query twice; the second time you should see a fast response (say 20ms) - that's the querycache (and thus RAM) working for you.

> Now the issue I found is that searching with the fq argument seems to slow down the search.

This doesn't align with your previous statement that you only search with a q-param (e.g. http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on). For your own sake, explain what you're trying to do, otherwise we really are guessing in the dark.

Anyway, the fq-param lets you cache (using the Solr filterCache) individual docsets that can be used to efficiently intersect your result set. Also, the first time, caches still have to be warmed (i.e. the fq-query has to be executed and its results saved to cache, since there isn't anything there yet); only from the second time on would you start seeing improvements.

For instance:
http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on
would only show documents containing "design" when doctype=pdf. (Again, this is just an example; I'm assuming here that you have defined a field 'doctype'.) Since the number of values of doctype would be pretty low, and it would be used independently of other queries, this would be an excellent candidate for the fq-param.
See http://wiki.apache.org/solr/CommonQueryParameters#fq

This was a longer reply than I wanted it to be. Really think about your use-cases first, then present some real examples of what you want to achieve, and then we can help you in a more useful manner.

Cheers,
Geert-Jan

2010/7/17 marship mars...@126.com:

> Hi Peter and All.
>
> I merged my indexes today. Now each index stores 10M documents and I only have 10 Solr cores. I used "java -Xmx1g -jar -server start.jar" to start the Jetty server.
>
> At first I deployed them all on one server; the search speed was about 3s. I noticed from the cmd output that when a search starts, 4 of the 10 cores' QTime only costs about 10ms-500ms, while the other 5 cost more, up to 2-3s. Then I put 6 cores on the web server and 4 on another (the DB server, under high load most of the time), and the search speed went down to about 1s most of the time. Now most searches take about 1s. That's great. Watching the Jetty output on the web server, when a search starts I see 2 of the 6 cores cost 60ms-80ms and the other 4 cost 170ms-700ms. I do believe the bottleneck is still the hard disk, but at least the search speed is acceptable at the moment. Maybe I should try MemDisk to see if that helps.
>
> As for -Xmx1g: I actually only see Jetty consume about 150M of memory, even though the index is now 10x bigger, so I don't think it works. From googling, -Xmx enlarges the heap size; I'm not sure that can help search. I still have 3.5G of memory free on the server.
>
> Now the issue I found is that searching with the fq argument seems to slow down the search.
>
> Thanks all for your help and suggestions. Thanks. Regards.
> Scott
>
> On 2010-07-17 03:36:19, Peter Karich peat...@yahoo.de wrote:
>
>>> Each solr(jetty) instance consumes 40M-60M memory.
>>
>>> java -Xmx1024M -jar start.jar
>>
>> That's a good suggestion! Please double-check that you are using the -server version of the JVM and the latest, 1.6.0_20 or so.
>> Additionally you can start jvisualvm (shipped with the JDK) and hook into jetty/tomcat easily to see the current CPU and memory load.
>>
>>> But I have 70 solr cores
>>
>> If you ask me: I would reduce them to 10-15 or even less and increase the RAM. Try out Tomcat too. A Solr distributed search's speed is decided by the slowest core, so try to reduce the number of cores.
>>
>> Regards,
>> Peter.
>>
>>> You mentioned that you have a lot of mem free, but your jetty containers are only using between 40-60M. Probably stating the obvious, but have you increased the -Xmx param, for instance: java -Xmx1024M -jar start.jar? That way you're configuring the container to use a maximum of 1024 MB of RAM instead of the default, which is much lower (I'm not sure what exactly, but it could well be 64MB for non -server, aligning with what you're seeing).
>>>
>>> Geert-Jan
>>>
>>> 2010/7/16 marship mars...@126.com:
>>>> Hi Tom Burton-West. Sorry, it looks like my email ISP filtered out your replies. I checked the web version of the mailing list and saw your reply. My query string is always simple like
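Geert-Jan's description of the filterCache above - individual docsets cached per fq value and intersected with the main query's matches - can be sketched in miniature. This is a toy model for illustration only, not Solr's implementation (real docsets are compact bitsets or hash sets, and none of these function names are Solr APIs):

```python
# Toy sketch of Solr's filterCache: each fq string maps to a cached docset,
# which is intersected with the main query's matching doc ids.

filter_cache = {}  # fq string -> cached docset (a set of doc ids)

def run_filter(fq, index):
    """Return the docset matching fq, executing and caching it on first use."""
    if fq not in filter_cache:
        # First use: the filter query must actually run (this is the "warming"
        # cost you pay once); later uses are a dictionary lookup.
        field, _, value = fq.partition(":")
        filter_cache[fq] = {doc_id for doc_id, doc in index.items()
                            if doc.get(field) == value}
    return filter_cache[fq]

def search(q_matches, fq, index):
    """Intersect the main query's matches with the cached filter docset."""
    return q_matches & run_filter(fq, index)

index = {
    1: {"doctype": "pdf"},
    2: {"doctype": "html"},
    3: {"doctype": "pdf"},
}
design_matches = {1, 2, 3}  # pretend these doc ids match q=design
print(sorted(search(design_matches, "doctype:pdf", index)))  # [1, 3]
```

The design point, as in the mail above: a filter with few distinct values (like a doctype field) that is reused across many different q-queries amortizes its one-time warming cost very quickly.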
Re: Re: Re: How to speed up solr search speed
Hi Geert-Jan. Thanks for replying.

I know Solr has a querycache and that it improves search speed from the second time on. But when I talk about search speed, I don't mean the speed of the cache. When users search on our site, I don't want the first search to cost 10s and all following ones 0s; that is unacceptable. I want the first search to be as fast as it can be, so all my speed tests only count the first time.

For fq: yes, I need it. We have 5 different types. For a general search the user doesn't need to specify which type to search over, but sometimes he needs to search over e.g. type:product - that's when I use fq, and I believe I understand it correctly. Before I got today's speed I was always testing against simple searches ("design" etc.); until today even the simple search speed was not acceptable, so I didn't care how fq would perform. Today, as the simple search speed is acceptable, I moved on to check fq, and it sometimes is much slower than the simple search ("slower" meaning it takes more than 2s, maybe 10s).

> The only thing that helps you here would be a big solr querycache, depending on how often queries are repeated.

I don't agree. I don't really care about the speed of the cache, as I know it is always super fast. What I want from Solr is for it to consume as much memory as it can to pre-load the Lucene index (maybe 50% or even 100%), so that when the time comes to run a keyword's first search, it is fast. (I haven't got an answer for this question yet.)

Thanks. Regards.
On 2010-07-17 19:30:26, Geert-Jan Brits gbr...@gmail.com wrote:

> IMO, indeed with these types of simple searches caching (and thus RAM usage) cannot be fully exploited [...] Anyway, the fq-param lets you cache (using the Solr filterCache) individual docsets that can be used to efficiently intersect your result set. Also, the first time, caches still have to be warmed; only from the second time on would you start seeing improvements. [snip - quoted in full above]
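Relevant to Scott's wish that the very first query be fast: Solr can run warming queries whenever a new searcher is opened, so user queries never hit a completely cold cache. A sketch for solrconfig.xml - the query and fq values below are placeholders, not taken from Scott's actual setup:

```xml
<!-- solrconfig.xml: fire these queries when the first searcher is opened,
     populating the caches before real traffic arrives. Replace the example
     queries with your own most common q/fq combinations. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">design</str><str name="fq">doctype:pdf</str></lst>
    <lst><str name="q">principle of design</str></lst>
  </arr>
</listener>
```

The same listener can be registered for the newSearcher event so caches are re-warmed after each commit.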
Re: How to speed up solr search speed
On 7/17/2010 3:28 AM, marship wrote:
> Hi Peter and All. I merged my indexes today. Now each index stores 10M documents and I only have 10 solr cores. I used java -Xmx1g -jar -server start.jar to start the jetty server.

How big are the indexes on each of those cores? You can easily get this info from a URL like this (assuming the bundled Jetty and its standard port):

http://hostname:8983/solr/corename/admin/replication/index.jsp

If your server only has 4GB of RAM, low memory is almost guaranteed to be the true problem. With low RAM levels the disk cache is nearly useless, and high disk I/O is the symptom.

My system runs as virtual machines. I've got six static indexes, each a little over 12GB in size (7 million rows), and an incremental index that gets to about 700MB (300,000 rows). I've only got one active index core per virtual machine, except when doing a full reindex, which is rare. Each static VM is allocated 2 CPUs and 9GB of memory; each incremental VM has 2 CPUs and 3GB of memory. As I'm not using VMware, the memory is not oversubscribed. There is a slight oversubscription of CPUs, but I've never seen a CPU load problem. I've got dedicated VMs for load balancing and for the brokers.

With a max heap of 1.5GB, that leaves over 7GB of RAM to act as disk cache for a 12GB index. My statistics show that each of my two broker cores has 185,000 queries under its belt, with an average query time of about 185 milliseconds. If I had enough memory to fit the entire 12GB index into RAM, I'm sure my query times would be MUCH smaller. Here's a screenshot of the status page that aggregates my Solr statistics:

http://www.flickr.com/photos/52107...@n05/4801491979/sizes/l/
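Shawn's sizing argument comes down to simple arithmetic: whatever RAM the JVM heap (and the OS itself) doesn't claim is left for the kernel's page cache, which is what actually keeps index files in memory. A rough sketch using the numbers from his setup; the 0.5GB OS overhead figure is an assumption, not from his mail:

```python
def free_for_disk_cache(total_ram_gb, jvm_heap_gb, os_overhead_gb=0.5):
    """RAM left for the OS page cache once the JVM heap and OS take their share."""
    return total_ram_gb - jvm_heap_gb - os_overhead_gb

# Shawn's static VMs: 9GB RAM, 1.5GB max heap -> roughly 7GB of page cache
cache_gb = free_for_disk_cache(9, 1.5)
index_gb = 12.0
print(f"{cache_gb:.1f}GB cache for a {index_gb:.0f}GB index "
      f"(~{100 * cache_gb / index_gb:.0f}% of the index can stay in RAM)")
```

The corollary for Scott: a bigger -Xmx is not automatically better, because every GB given to the heap is a GB taken away from the page cache that serves the index reads.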
Re: How to speed up solr search speed
> Each solr(jetty) instance consumes 40M-60M memory.

> java -Xmx1024M -jar start.jar

That's a good suggestion! Please double-check that you are using the -server version of the JVM and the latest, 1.6.0_20 or so. Additionally you can start jvisualvm (shipped with the JDK) and hook into jetty/tomcat easily to see the current CPU and memory load.

> But I have 70 solr cores

If you ask me: I would reduce them to 10-15 or even less and increase the RAM. Try out Tomcat too. A Solr distributed search's speed is decided by the slowest core, so try to reduce the number of cores.

Regards,
Peter.

> You mentioned that you have a lot of mem free, but your jetty containers are only using between 40-60M. Probably stating the obvious, but have you increased the -Xmx param, for instance: java -Xmx1024M -jar start.jar? That way you're configuring the container to use a maximum of 1024 MB of RAM instead of the default, which is much lower (I'm not sure what exactly, but it could well be 64MB for non -server, aligning with what you're seeing).
>
> Geert-Jan
>
> 2010/7/16 marship mars...@126.com:
>> Hi Tom Burton-West. Sorry, it looks like my email ISP filtered out your replies. I checked the web version of the mailing list and saw your reply.
>> My query string is always simple like design, principle of design, tom.
>>
>> EG: URL: http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>> Response:
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">16</int>
>>     <lst name="params">
>>       <str name="indent">on</str>
>>       <str name="start">0</str>
>>       <str name="q">design</str>
>>       <str name="version">2.2</str>
>>       <str name="rows">10</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="5981" start="0">
>>     <doc>
>>       <str name="id">product_208619</str>
>>     </doc>
>>     ...
>>
>> EG: http://localhost:7550/solr/select/?q=Principle&version=2.2&start=0&rows=10&indent=on
>> Response:
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">94</int>
>>     <lst name="params">
>>       <str name="indent">on</str>
>>       <str name="start">0</str>
>>       <str name="q">Principle</str>
>>       <str name="version">2.2</str>
>>       <str name="rows">10</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="104" start="0">
>>     <doc>
>>       <str name="id">product_56926</str>
>>     </doc>
>>     ...
>>
>> As I am querying over a single core and the other cores are not querying at the same time, the QTime looks good. But when I query the distributed node (for this case, 6422ms is still not a bad one; many cost ~20s):
>>
>> URL: http://localhost:7499/solr/select/?q=the+first+world+war&version=2.2&start=0&rows=10&indent=on&debugQuery=true
>> Response:
>>
>> <response>
>>   <lst name="responseHeader">
>>     <int name="status">0</int>
>>     <int name="QTime">6422</int>
>>     <lst name="params">
>>       <str name="debugQuery">true</str>
>>       <str name="indent">on</str>
>>       <str name="start">0</str>
>>       <str name="q">the first world war</str>
>>       <str name="version">2.2</str>
>>       <str name="rows">10</str>
>>     </lst>
>>   </lst>
>>   <result name="response" numFound="4231" start="0">
>>   ...
>>
>> Actually I am thinking about and testing a solution: as I believe the bottleneck is the hard disk, and all our indexes add up to about 10-15G, what about adding another 16G of memory to my server, using MemDisk to map a memory disk, and putting all the indexes into it? Then every time Solr/Jetty needs to load index data from the hard disk, it loads from memory instead. This should give Solr the most throughput and avoid hard-disk access delay.
>> I am testing it. But if there is a way to make Solr better use our limited resources and avoid adding new ones, that would be great.

--
http://karussell.wordpress.com/
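For context, a Solr distributed search like the one against port 7499 above is driven by the shards request parameter, which lists every core the coordinator fans the query out to. A sketch of building such a request URL; all hosts, ports, and the core layout are illustrative, not Scott's actual configuration:

```python
from urllib.parse import urlencode

def distributed_query_url(coordinator, shard_hosts, q, rows=10, start=0):
    """Build a Solr distributed-search URL: the coordinator core fans the
    query out to every core listed in the shards parameter and merges results."""
    params = {
        "q": q,
        "shards": ",".join(f"{host}/solr" for host in shard_hosts),
        "start": start,
        "rows": rows,
        "version": "2.2",
        "indent": "on",
    }
    return f"http://{coordinator}/solr/select/?{urlencode(params)}"

# e.g. 10 cores on consecutive ports of one box (made-up port range)
shards = [f"localhost:{port}" for port in range(7501, 7511)]
url = distributed_query_url("localhost:7499", shards, "the first world war")
print(url)
```

The merge step is why the coordinator's QTime is bounded below by the slowest shard in the list.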
RE: How to speed up solr search speed
Is there any reason why you have to limit each instance to only 1M documents? If you could put more documents in the same core, I think it would dramatically improve your response times.

-----Original Message-----
From: marship [mailto:mars...@126.com]
Sent: donderdag 15 juli 2010 6:23
To: solr-user
Subject: How to speed up solr search speed

Hi All. I have a problem with distributed Solr search. The issue is that I have 76M documents spread over 76 Solr instances, each handling 1M documents. Previously I put all 76 instances on a single server, and when I tested I found each search took several seconds, mostly 10-20s, to finish. Now I have split these instances over 2 servers, 38 instances each, and the search speed is about 5-10s each time. 10s is a bit unacceptable for me.

Based on my observation, the slowness is caused by disk operations, as all these instances are on the same server: when I test each single instance it is quite fast, always ~400ms, but when I use distributed search I see some instances need 7000+ms. Our server has plenty of free memory. Is there a way to make Solr use more memory instead of the on-disk index - e.g. load all indexes into memory - so it can speed up?

Welcome any help. Thanks. Regards.
Scott
Re: How to speed up solr search speed
How do your queries look? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improves our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search. I am not an expert, but without highlighting it should be faster than 100ms, or at least 200ms.

Regards,
Peter.

> Hi. Thanks for replying. My documents have many different fields (about 30 fields, 10 different types of documents, but that is not the point) and I have to search over several fields. I was putting all 76M documents into several Lucene indexes and using the default lucene.net ParaSearch to search over these indexes. That was slow, more than 20s. Then someone suggested I merge all our indexes into one huge index; he thought Lucene could handle 76M documents in one index easily. So I merged all the documents into a single huge index (which took me 3 days). The index folder was about 15G (I don't store info in the index, I just index the fields). The search was still very slow, more than 20s, and looked slower than using several indexes. Then I came to Solr.
>
> Why I put 1M documents into each core: I found that when a core has 1M documents, the search speed is fast, ranging from 0-500ms, which is acceptable. I don't know how many documents per core is proper. The problem is that even if I put 2M documents into each core, I would have 36 cores at the moment, but when our document count doubles in the future the same issue will arise again. So I don't think 1M per core is the issue; the issue is that I put too many cores on one server. I don't have extra servers to spread the Solr cores over, so we have to improve Solr search speed some other way. Any suggestion?
>
> Regards, Scott
>
> On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote:
>> Is there any reason why you have to limit each instance to only 1M documents?
>> If you could put more documents in the same core I think it would dramatically improve your response times.
>>
>> -----Original Message-----
>> From: marship [mailto:mars...@126.com]
>> Sent: donderdag 15 juli 2010 6:23
>> To: solr-user
>> Subject: How to speed up solr search speed
>>
>> [snip - original message quoted in full earlier in the thread]

--
http://karussell.wordpress.com/
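Peter's HashDocSet suggestion maps to an entry in solrconfig.xml (Solr 1.x). A sketch only - the maxSize value here just applies his 0.005 rule of thumb to a hypothetical 1M-document core; tune it for your own document count:

```xml
<!-- solrconfig.xml (Solr 1.x): result docsets smaller than maxSize are kept
     as hash sets instead of bitsets, which is cheaper for small result sets.
     Rule of thumb from the thread: ~0.005 of the total document count,
     so ~5000 for a 1M-doc core. -->
<HashDocSet maxSize="5000" loadFactor="0.75"/>
```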
Re: How to speed up solr search speed
Hi Scott!

> I am aware these cores on the same server are interfering with each other.

That's not good. Try to use only one core per CPU. With more per CPU you won't have any benefit over the single-core version, I think.

> Can solr use more memory to avoid disk operation conflicts?

Yes, but only the memory you have on the machine, of course. Are you using Tomcat or Jetty?

> For my case, I don't think solr can work as fast as 100-200ms on average.

We have indices with a lot of entries - not as large as yours, but in the range of X million - and have response times under 100ms. What about testing only one core with 5-10 million docs? If the response time isn't any better, maybe you need a different field config, or something different is wrong.

> So should I add it, or is the default (without it) ok?

Without is also okay - Solr uses a default. With 75 million docs it should be around 20,000, but I guess something different is wrong: maybe caching or the field definitions. Could you post the latter?

Regards,
Peter.

> Hi Peter. I think I am not using faceting, highlighting, etc. I read about them but don't know how to work with them. I am using the default example, just with the indexed fields changed.
>
> For my case, I don't think Solr can work as fast as 100-200ms on average. I tried some keywords on only a single Solr instance; it sometimes takes more than 20s for just 4 keywords. I agree it depends on the keywords. But the issue is that it doesn't work consistently. When 37 instances on the same server work at the same time (when a distributed search starts), it gets worse: I saw some Solr cores execute very fast - 0ms, ~40ms, ~200ms - but more cores executed at ~2500ms, ~3500ms, ~6700ms, and about 5-10 cores need more than 17s. I have 70 cores running, and the search speed depends on the SLOWEST one: even if 69 cores answer in 1ms, if the last one needs 50s, then the distributed search takes 50s. I am aware these cores on the same server are interfering with each other. As I have lots of free memory:
> I want to know, given that, can Solr use more memory to avoid disk operation conflicts?
>
> Thanks. Regards.
> Scott
>
> On 2010-07-15 17:19:57, Peter Karich peat...@yahoo.de wrote:
>
>> How do your queries look? Do you use faceting, highlighting, ...? Did you try to customize the cache? Setting the HashDocSet to 0.005 of all documents improves our search speed a lot. Did you optimize the index? 500ms seems slow for an 'average' search.
>>
>> Regards, Peter.
>
>>> Hi. Thanks for replying. [snip - quoted in full earlier in the thread]
>>> Any suggestion? Regards.
>>> Scott
>>>
>>> On 2010-07-15 15:24:08, Fornoville, Tom tom.fornovi...@truvo.com wrote:
>>>> Is there any reason why you have to limit each instance to only 1M documents? [snip - quoted in full earlier in the thread]
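Scott's point that "the search speed depends on the SLOWEST one" can be made concrete with a tiny simulation: the coordinator must wait for the last shard, so the expected distributed QTime is the expected maximum over shard times, and the chance of hitting at least one slow core grows quickly with the core count. All numbers below are invented for illustration:

```python
import random

random.seed(42)

def distributed_qtime(shard_times_ms):
    # The coordinator cannot respond before its slowest shard.
    return max(shard_times_ms)

def simulate(n_cores, trials=1000):
    """Average distributed QTime when each core is usually fast (50ms) but
    occasionally stalls on disk I/O (2000ms, 5% of requests)."""
    total = 0
    for _ in range(trials):
        times = [2000 if random.random() < 0.05 else 50
                 for _ in range(n_cores)]
        total += distributed_qtime(times)
    return total / trials

for n in (1, 10, 70):
    print(f"{n:2d} cores: avg QTime ~{simulate(n):.0f}ms")
```

With 70 cores the probability that every core is fast is 0.95^70, under 3%, so nearly every distributed query pays the slow-core penalty - which is exactly why reducing the core count (or removing the disk contention that causes the stalls) helps so much.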