How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi all,

I'm working with Solr 1.4 and noticed that Solr limits the number of documents
returned in a response. This number can be changed with the common query
parameter 'rows'.

In my scenario it is very important that the response contains ALL documents in 
the index! I played around with the 'rows'-parameter but couldn't find a way to 
do it.

I was not able to find any hint in the mailing list.
Thanks a lot in advance.

Cheers,
Egon


RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
I was just thinking along similar lines

As far as I can tell you can use the parameters start & rows in combination to
control the retrieval of query results

So
http://host:port/solr/select/?q=query
Will retrieve up to results 1..10

http://host:port/solr/select/?q=query&start=10&rows=10
Will retrieve up to results 11..20 (start is a zero-based offset)

So it is up to your application to control result traversal/pagination


Question - does this mean that
http://host:port/solr/select/?q=query&start=10&rows=10
runs the query a 2nd time?

And so on
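
For illustration, the start/rows arithmetic described above could be sketched
in Python as follows. This is only a sketch: the base URL and the helper name
page_url are assumptions, not part of Solr, and it relies on 'start' being a
zero-based offset, so the second page of ten begins at start=10. Each URL it
builds is a separate, self-contained request.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and path to your installation.
BASE_URL = "http://localhost:8983/solr/select"


def page_url(query, page, page_size=10, base_url=BASE_URL):
    """Build the select URL for a 1-based page number.

    'start' is a zero-based offset, so page 1 starts at offset 0.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    params = {"q": query, "start": (page - 1) * page_size, "rows": page_size}
    return base_url + "?" + urlencode(params)


print(page_url("query", 1))  # first page of ten
print(page_url("query", 2))  # second page of ten
```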


Regards
Stefan Maric 


Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Hi Stefan,

you are right. I noticed this page-based result handling too. For web pages it
is handy to maintain a number-of-results-per-page parameter together with an
offset to browse result pages. Both can be done by Solr's 'start' and 'rows'
parameters.
But as I don't use Solr in a web context it's important for me to get all
results in one go.

While waiting for answers I was working on a work-around and came across the
LukeRequestHandler (http://wiki.apache.org/solr/LukeRequestHandler). It allows
you to query the index and obtain meta information about it. I found a
parameter in the response called 'numDocs' which seems to contain the current
number of documents in the index.

So I was now thinking about first asking for the number of documents via the
LukeRequestHandler and then setting the 'rows' parameter to this value.
Obviously, this is quite expensive as one front-end query always leads to two
back-end queries. So I'm still searching for a better way to do this!

Cheers,
Egon





RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Egon

If you first run your query with q=query&rows=0

Then you get back an indication of the total number of docs
<result name="response" numFound="53" start="0"/>

Now your app can query again to get the 1st n rows & manage forward|backward
traversal of results by subsequent queries
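
A minimal Python sketch of this two-step idea: ask for rows=0 to read
numFound, then walk the results in batches. The base URL, the batch size, and
the function names are assumptions made for illustration. It requests
wt=json so the response can be parsed with the standard library. If the index
changes between requests, the batches may not line up exactly with the
initial count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; adjust host, port, and path to your installation.
BASE_URL = "http://localhost:8983/solr/select"


def http_fetch(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def select(query, start, rows, fetch=http_fetch, base_url=BASE_URL):
    """Run one select request and return its 'response' section."""
    params = {"q": query, "start": start, "rows": rows, "wt": "json"}
    return fetch(base_url + "?" + urlencode(params))["response"]


def fetch_all(query, batch_size=100, fetch=http_fetch):
    """Read numFound with rows=0, then collect all docs batch by batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    total = select(query, 0, 0, fetch)["numFound"]
    docs = []
    for start in range(0, total, batch_size):
        docs.extend(select(query, start, batch_size, fetch)["docs"])
    return docs
```

The fetch argument exists only so the HTTP call can be swapped out, for
example with a stub when trying the logic without a running server.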



Regards
Stefan Maric 



Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
just set the rows to a very large number, larger than the number of documents 
available 

useful to set the fl parameter with the fields required to avoid memory 
problems, if each document contains a lot of information 
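
As a rough sketch of this suggestion in Python: build one request with a
'rows' value assumed to exceed the index size and an 'fl' list restricted to
the fields actually needed. The base URL, the upper bound, and the field
names 'id' and 'title' are placeholders, not values from this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port, and path to your installation.
BASE_URL = "http://localhost:8983/solr/select"

# Assumed to be larger than the total number of documents in the index.
ROWS_UPPER_BOUND = 10_000_000


def all_docs_url(query, fields, rows=ROWS_UPPER_BOUND, base_url=BASE_URL):
    """Build a select URL that asks for every match but only some fields."""
    params = {"q": query, "rows": rows, "fl": ",".join(fields)}
    return base_url + "?" + urlencode(params)


# 'id' and 'title' are placeholder field names.
print(all_docs_url("*:*", ["id", "title"]))
```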




Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Setting the 'rows' parameter to a number larger than the number of documents
available requires that you know how many are available. That's what I intended
to retrieve via the LukeRequestHandler.

Anyway, nice approach Stefan. I'm afraid I forgot this 'numFound' aspect. :)
But still, it feels like a hack. Originally I was searching more for something
like:

q=query&rows=-1

Which leaves the API to do the job (efficiently!). :)
The question is:
Does Solr support something like this? Or should we write a feature request?

Cheers,
Egon





RE: How to not limit maximum number of documents?

2010-02-10 Thread stefan.maric
Yes, I tried q=query&rows=-1 the other day and gave up

But as you say it wouldn't help because you might get
a) timeouts because you have to wait a 'long' time for the large set of results
to be returned
b) exceptions being thrown because you're retrieving too much info to be thrown
around the system



Regards
Stefan Maric 



Re: How to not limit maximum number of documents?

2010-02-10 Thread Walter Underwood
Solr will not do this efficiently. Getting all rows will be very slow. Adding a 
parameter will not make it fast.

Why do you want to do this?

wunder




Re: How to not limit maximum number of documents?

2010-02-10 Thread egon . o
Okay. So we have to leave this question open for now. There might be other
(more advanced) users who can answer this question. For sure, the solution we
found is not very good.

In the meantime, I will look for a way to submit a feature request. :)





Re: How to not limit maximum number of documents?

2010-02-10 Thread Ron Chan
I meant, available in total, not just what satisfies the particular query

you should have at least an estimate of the total number of documents, even if
it grows daily

and if you are talking about millions of rows, and you are trying to retrieve
them all, IMHO, not getting all of them will be the least of your problems




Re: How to not limit maximum number of documents?

2010-02-10 Thread Chris Hostetter

: Okay. So we have to leave this question open for now. There might be 
: other (more advanced) users that can answer this question. It's for 
: sure, the solution we found is not quite good.

The question really isn't open, it's a FAQ...

http://wiki.apache.org/solr/FAQ#How_can_I_get_ALL_the_matching_documents_back.3F_..._How_can_I_return_an_unlimited_number_of_rows.3F


-Hoss