Re: Solr 4.0 simultaneous query problem
So is it a better approach to query for fewer rows, say 500, and keep increasing the start parameter? Wouldn't that be slower, since the start parameter keeps growing and I will also be sorting by the same field in each of the queries made to the multiple shards?

Also, does it make sense to have all these documents in the same shard? I went for this approach because the shard that is queried the most is small, which gives a lot of benefit in terms of the time taken for all the stats queries. This shard is only about 5 GB, whereas the entire index will be about 50 GB.

Thanks for the help,
Rohit

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wun...@wunderwood.org wrote:

Don't query for 5000 documents. That is going to be slow no matter how it is implemented.

wunder

On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:

Hi,

So it seems that when I query multiple shards with the sort criteria for 5000 documents, it queries all shards and gets a list of document ids, then adds the document ids to the original query and queries all the shards again. This process of joining the query results with the unique ids and getting the remaining fields is turning out to be really slow. It takes a while to search for a list of unique ids. Is there any config change to make this process faster?

Also, what does isDistrib=false mean when Solr generates the queries internally?

Thanks,
Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:

Hi,

The same query is always fired for 500 rows. The only thing that differs is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema, but the inherent type of the documents is different. Also, most of the app's queries go to shard A, which has the smallest index size (4 GB). The query is made to a master shard, which by default goes to all 3 shards for results. (Also, the query that I am trying matches documents only in shard A mentioned above.)

Will try debugQuery now and post it here.

Thanks,
Rohit

On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

Maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? Is each of these 3 shards on its own server? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote:

Hi all,

I have an application which queries a Solr instance with 3 shards (4 GB, 13 GB and 30 GB index size respectively) holding 6 million documents in all. When I start 10 threads in my app to make simultaneous queries to Solr (with rows=500 and a different start parameter, a sort on 1 field and no facets), each returning 500 different documents, I sometimes see that most of the responses come back quickly (500ms-1000ms), but the last response takes close to 50 seconds (QTime). I am using the latest 4.0 release.

What is the reason for this delay? Is there a way to prevent this?

Thanks and regards,
Rohit

--
Walter Underwood
wun...@wunderwood.org
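The trade-off raised in this message (one large request versus several smaller pages with a growing start) can be put into rough numbers. The sketch below assumes that, for a sorted distributed query, each shard is asked for its top start + rows entries and the coordinating node merges them. The function names are hypothetical and only illustrate the arithmetic; real cost also depends on caches, on how many documents actually match on each shard, and on the second pass that fetches the stored fields.

```python
def distributed_page_cost(start: int, rows: int, num_shards: int) -> dict:
    """Estimate how many sorted entries are handled for one page of a sharded query.

    Assumption: every shard returns its top (start + rows) entries so the
    coordinator can merge them and keep only the requested window.
    """
    per_shard = start + rows
    return {
        "per_shard": per_shard,
        "merged_by_coordinator": per_shard * num_shards,
        "returned_to_client": rows,
    }


def total_merged(total_docs: int, rows: int, num_shards: int) -> int:
    """Sum the coordinator's merge work over every page needed to read total_docs."""
    return sum(
        distributed_page_cost(start, rows, num_shards)["merged_by_coordinator"]
        for start in range(0, total_docs, rows)
    )


if __name__ == "__main__":
    # One request for 5000 rows versus ten requests of 500 rows, on 3 shards.
    print(total_merged(5000, 5000, 3))  # 15000
    print(total_merged(5000, 500, 3))   # 82500
```

Under that assumption, ten pages of 500 merge 82,500 entries in total against 15,000 for a single 5000-row request, so paging does more work overall. No single page is more expensive than the 5000-row request, though, and the early pages are much cheaper, which matters if most users never go past the first few pages.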
Re: Solr 4.0 simultaneous query problem
Hi,

So it seems that when I query multiple shards with the sort criteria for 5000 documents, it queries all shards and gets a list of document ids, then adds the document ids to the original query and queries all the shards again. This process of joining the query results with the unique ids and getting the remaining fields is turning out to be really slow. It takes a while to search for a list of unique ids. Is there any config change to make this process faster?

Also, what does isDistrib=false mean when Solr generates the queries internally?

Thanks,
Rohit
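The two-pass behaviour described in this message (first collect ids and sort values from every shard, then go back to the shards for the remaining fields) can be modelled in a few lines. This is only an in-memory sketch of the idea, not Solr's actual code: the shards are plain dictionaries, and `phase_one`, `phase_two`, `distributed_search` and the `price` field are hypothetical names chosen for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory "shard": document id -> stored fields.
Shard = Dict[str, dict]


def phase_one(shard: Shard, sort_field: str, limit: int) -> List[Tuple[int, str]]:
    """Return the shard's top `limit` (sort value, id) pairs in ascending order."""
    pairs = [(doc[sort_field], doc_id) for doc_id, doc in shard.items()]
    return heapq.nsmallest(limit, pairs)


def phase_two(shard: Shard, ids: List[str]) -> Dict[str, dict]:
    """Fetch the stored fields for those ids that live on this shard."""
    return {doc_id: shard[doc_id] for doc_id in ids if doc_id in shard}


def distributed_search(shards: List[Shard], sort_field: str, start: int, rows: int) -> List[dict]:
    limit = start + rows
    # Pass 1: every shard contributes only ids and sort values.
    merged = heapq.merge(*(phase_one(s, sort_field, limit) for s in shards))
    window = [doc_id for _, doc_id in list(merged)[start:limit]]
    # Pass 2: ask the shards again, this time by id, for the remaining fields.
    fetched: Dict[str, dict] = {}
    for s in shards:
        fetched.update(phase_two(s, window))
    return [fetched[doc_id] for doc_id in window]


if __name__ == "__main__":
    shards = [
        {"a1": {"id": "a1", "price": 5}, "a2": {"id": "a2", "price": 1}},
        {"b1": {"id": "b1", "price": 3}},
        {"c1": {"id": "c1", "price": 2}, "c2": {"id": "c2", "price": 4}},
    ]
    print([d["id"] for d in distributed_search(shards, "price", start=1, rows=3)])
    # ['c1', 'b1', 'c2']
```

In this model the second pass looks up `rows` ids, so with rows=5000 every shard receives a lookup for up to 5000 ids, which matches the slow step reported above.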
Re: Solr 4.0 simultaneous query problem
Don't query for 5000 documents. That is going to be slow no matter how it is implemented.

wunder

--
Walter Underwood
wun...@wunderwood.org
Re: Solr 4.0 simultaneous query problem
Hi,

Maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? Is each of these 3 shards on its own server? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html
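To narrow things down along these lines, the load pattern from the original question (10 simultaneous requests with rows=500, a different start each, one sort field, optionally debugQuery=true) can be reproduced with a small harness. This is a sketch under assumptions: the endpoint URL, the `price` sort field in the usage example, and the helper names are hypothetical, and the `fetch` callable (which would issue the HTTP request and return a timing) is left to the caller so the code runs without a Solr instance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real host and core.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def page_urls(query: str, sort: str, rows: int, pages: int, debug: bool = False) -> List[str]:
    """Build one URL per page, varying only the start parameter."""
    urls = []
    for page in range(pages):
        params = {"q": query, "sort": sort, "start": page * rows, "rows": rows, "wt": "json"}
        if debug:
            params["debugQuery"] = "true"
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


def run_concurrently(urls: List[str], fetch: Callable[[str], float], threads: int = 10) -> List[float]:
    """Call fetch(url) from a thread pool and return the results in URL order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    urls = page_urls("*:*", "price asc", rows=500, pages=10, debug=True)
    # Stand-in fetch that performs no network call; swap in a real HTTP request.
    timings = run_concurrently(urls, fetch=lambda url: 0.0)
    for url, seconds in zip(urls, timings):
        print(seconds, url)
```

Comparing the per-request timings with and without the other nine requests in flight would show whether the 50-second outlier is tied to a particular start value or only appears under concurrency.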