Re: Solr 4.0 simultaneous query problem

2012-11-06 Thread Rohit Harchandani
So is it better to query for fewer rows, say 500, and keep increasing the
start parameter? Wouldn't that be slower, since the start parameter keeps
growing and each of my queries to the multiple shards also sorts on the
same field?
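To see why paging with a growing start tends to get slower in a sharded setup, here is a simplified cost model. It assumes the usual distributed-search behavior where every shard must return its top start + rows sort entries, because any of them could land on the requested page, and the coordinating node merges them all. It counts merged entries only and ignores per-shard scoring, sorting and I/O.

```python
def entries_merged(start: int, rows: int, num_shards: int) -> int:
    """Sort entries the coordinating node merges for one page.

    Simplified model: each shard returns its top (start + rows) ids plus
    sort values, since any of them could fall on the requested page.
    """
    return num_shards * (start + rows)


def total_entries_for_scan(total_docs: int, rows: int, num_shards: int) -> int:
    """Entries merged over all pages when scanning total_docs, rows at a time."""
    return sum(
        entries_merged(start, rows, num_shards)
        for start in range(0, total_docs, rows)
    )


if __name__ == "__main__":
    # 5000 docs over 3 shards: one big request vs. ten 500-row pages.
    print(entries_merged(0, 5000, 3))            # → 15000
    print(total_entries_for_scan(5000, 500, 3))  # → 82500
```

Under this model the last page (start=4500) merges as many entries as a single rows=5000 request, although it only has to load stored fields for 500 documents, and the ten pages together merge more entries than the single request.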

Also, does it make sense to have all these documents in the same shard? I
chose this approach because the most frequently queried shard is small,
which saves a lot of time on all the stats queries. This shard is only
about 5 GB, whereas the entire index will be about 50 GB.

Thanks for the help,
Rohit







Re: Solr 4.0 simultaneous query problem

2012-11-05 Thread Rohit Harchandani
Hi,
So it seems that when I query multiple shards with a sort for 5000
documents, Solr first queries all shards to get a list of document ids, then
adds those ids to the original query and queries all the shards again.
This second step, matching the query results against the unique ids to
fetch the remaining fields, is turning out to be really slow: searching for
a list of unique ids takes a while. Is there any config change that would
make this process faster?
Also, what does isDistrib=false mean when Solr generates the queries
internally?
Thanks,
Rohit
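The two-step process described in this message can be sketched as a simplified in-memory model. This is not Solr's actual code; the shard contents, field names (price, title) and function names are hypothetical. Phase one gathers only (sort value, id, shard) tuples from every shard and merges them. Phase two fetches the remaining fields for just the ids on the requested page, grouped by the shard that holds them.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: shard name -> {doc id -> stored fields}.
Shards = Dict[str, Dict[str, dict]]
Entry = Tuple[object, str, str]  # (sort value, doc id, shard name)


def phase_one(shards: Shards, sort_field: str, start: int, rows: int) -> List[Entry]:
    """Gather only (sort value, id, shard) from every shard and merge them."""
    per_shard = []
    for name, docs in shards.items():
        # Each shard sorts locally and returns just its top start + rows entries,
        # since any of them could end up on the requested page.
        local = sorted((doc[sort_field], doc_id, name) for doc_id, doc in docs.items())
        per_shard.append(local[: start + rows])
    # heapq.merge combines the already-sorted per-shard lists.
    merged = list(heapq.merge(*per_shard))
    return merged[start : start + rows]


def phase_two(shards: Shards, page: List[Entry]) -> List[dict]:
    """Fetch stored fields for the page's ids only, grouped by owning shard."""
    ids_by_shard: Dict[str, List[str]] = {}
    for _, doc_id, name in page:
        ids_by_shard.setdefault(name, []).append(doc_id)
    fetched = {}
    for name, ids in ids_by_shard.items():
        # In a real cluster each shard here would receive one more request.
        for doc_id in ids:
            fetched[doc_id] = shards[name][doc_id]
    # Return documents in the merged sort order from phase one.
    return [fetched[doc_id] for _, doc_id, _ in page]


if __name__ == "__main__":
    shards = {
        "A": {"a1": {"price": 3, "title": "a1"}, "a2": {"price": 1, "title": "a2"}},
        "B": {"b1": {"price": 2, "title": "b1"}},
    }
    page = phase_one(shards, "price", start=0, rows=2)
    print([doc["title"] for doc in phase_two(shards, page)])  # → ['a2', 'b1']
```

In this model the size of the phase-two lookup grows with rows, which is consistent with a 5000-row request making the id lookup step expensive.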

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:

 Hi,

 The same query is always fired for 500 rows. The only thing that differs is
 the start parameter.

 The 3 shards are in the same instance on the same server. They all have
 the same schema, but the inherent type of the documents is different. Also,
 most of the app's queries go to shard A, which has the smallest index
 size (4 GB).

 The query is made to a master shard, which by default fans out to all 3
 shards for results. (Also, the query I am trying matches documents
 only in shard A mentioned above.)

 Will try debugQuery now and post it here.

 Thanks,
 Rohit
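The ten paged requests described above differ only in start. A sketch of building their parameters, assuming Solr's standard q, sort, start, rows and shards request parameters, where shards is a comma-separated list of shard addresses conventionally written as host:port/path. The host, core names, query and sort field are hypothetical placeholders; on the master shard the shards list may instead come from the request handler's defaults.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses (host:port/path, no http:// prefix).
SHARDS = [
    "localhost:8983/solr/shardA",
    "localhost:8983/solr/shardB",
    "localhost:8983/solr/shardC",
]


def paged_params(query, sort, page_size, num_pages, shards=SHARDS):
    """Return one encoded query string per page; only start changes."""
    pages = []
    for page in range(num_pages):
        params = {
            "q": query,
            "sort": sort,
            "start": page * page_size,
            "rows": page_size,
            "shards": ",".join(shards),
            "wt": "json",
        }
        pages.append(urlencode(params))
    return pages


if __name__ == "__main__":
    # Hypothetical master core name and query.
    for qs in paged_params("type:report", "timestamp desc", 500, 10):
        print("http://localhost:8983/solr/master/select?" + qs)
```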









Re: Solr 4.0 simultaneous query problem

2012-11-05 Thread Walter Underwood
Don't query for 5000 documents. That is going to be slow no matter how it is 
implemented.

wunder


--
Walter Underwood
wun...@wunderwood.org





Re: Solr 4.0 simultaneous query problem

2012-10-18 Thread Otis Gospodnetic
Hi,

Maybe you can narrow this down a little further. Are some queries
fast and others slow? Is there a pattern? Can you share examples of
slow queries? Have you tried debugQuery=true? Is each of these 3 shards
on its own server? Is the slow one always the one that hits the biggest
shard? Do they hold the same type of data? How come their sizes are so
different?
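Turning on debugging only takes one extra request parameter, debugQuery=true, which adds a debug section to the response. A minimal sketch that adds it to an existing request URL while keeping the other parameters; the URL used below is a hypothetical placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url: str) -> str:
    """Return url with debugQuery=true, keeping the existing parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any existing debugQuery value rather than sending it twice.
    params = [(k, v) for k, v in params if k != "debugQuery"]
    params.append(("debugQuery", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical slow request: the deepest of the ten pages.
    print(with_debug("http://localhost:8983/solr/master/select?q=x&start=4500&rows=500"))
```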

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote:
 Hi all,
 I have an application that queries a Solr instance with 3 shards (index
 sizes of 4 GB, 13 GB and 30 GB respectively) holding 6 million documents in all.
 When I start 10 threads in my app to send simultaneous queries to Solr
 (rows=500 with a different start parameter each, sorting on 1 field, no
 facets) so that each query returns 500 different documents, I sometimes see
 that most of the responses come back almost immediately (500-1000 ms), but
 the last response takes close to 50 seconds (QTime).
 I am using the latest 4.0 release. What is the reason for this delay, and
 is there a way to prevent it?
 Thanks and regards,
 Rohit
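To see which of the simultaneous requests is slow, and whether it is always the deepest page, it can help to time each request separately. A sketch using Python's concurrent.futures; fetch_page is a hypothetical stand-in for whatever HTTP call the application makes, so the harness can be exercised without a live Solr instance.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def time_pages(fetch_page: Callable[[int, int], object], rows: int = 500,
               threads: int = 10) -> Dict[int, float]:
    """Fire `threads` page requests at once; return elapsed seconds per start."""

    def timed(start: int):
        t0 = time.perf_counter()
        fetch_page(start, rows)  # hypothetical: issues one Solr request
        return start, time.perf_counter() - t0

    starts = [i * rows for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Consume the results inside the block so every request has finished.
        return dict(pool.map(timed, starts))


if __name__ == "__main__":
    # Fake fetch whose latency grows with start, standing in for real requests.
    def fake_fetch(start, rows):
        time.sleep(0.001 * (start + rows) / rows)

    for start, secs in sorted(time_pages(fake_fetch).items()):
        print(f"start={start:5d}  {secs * 1000:.1f} ms")
```

These are client-side timings; comparing them with the QTime reported in each response separates time spent inside Solr from time spent queuing or transferring results.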