Re: CursorMarks and 'end of results'

2018-07-02 Thread Erick Erickson
OK, that makes sense then.

I don't think we've mentioned streaming as an alternative. It has some
restrictions (it can only export docValues), and frankly I don't
really remember how much of it was in 5.5 so you'll have to check.

Streaming is designed exactly to, well, stream the entire result set
out. There's some setup cost, so for your use case, where most queries
don't have all that many hits, the setup may be too onerous, but I
thought I'd mention it.
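
For reference, a minimal sketch of what a request to the export handler
might look like. The collection URL and the field names (id, price) are
made up for illustration. The /export handler needs q, sort and fl, and
every field named in sort and fl must have docValues; whether it is
available and behaves the same way on 5.5 still needs checking.

```python
from typing import List
from urllib.parse import urlencode


def export_url(collection_url: str, query: str, sort: str, fields: List[str]) -> str:
    """Build a URL for Solr's /export handler.

    collection_url is a hypothetical base such as
    http://localhost:8983/solr/mycoll; all fields are assumed to have
    docValues enabled in the schema.
    """
    params = {
        "q": query,
        "sort": sort,             # required by /export
        "fl": ",".join(fields),   # required by /export
    }
    return collection_url.rstrip("/") + "/export?" + urlencode(params)


if __name__ == "__main__":
    print(export_url("http://localhost:8983/solr/mycoll", "*:*", "id asc", ["id", "price"]))
```

The response is sent as one stream of documents rather than in pages, so
the client would read it incrementally instead of issuing follow-up
requests.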

Best,
Erick

On Mon, Jul 2, 2018 at 5:14 AM, David Frese wrote:
> On 29.06.18 at 17:42, Erick Erickson wrote:
>>
>> bq. It basically cuts down the search time in half in the usual case
>> for us, so it's an important 'feature'.
>>
>> Wait. You mean that the "extra" call to get back 0 rows doubles your
>> query time? That's surprising, tell us more.
>>
>> How many times does your "usual" use case call using CursorMark? My
>> off-the-cuff explanation would be that you usually get all the rows
>> in the first call.
>>
>> CursorMark is intended to help with the "deep paging" problem, i.e.
>> where start=some_big_number, to allow returning large result sets in
>> chunks, say through 10s of K rows. Part of our puzzlement is that in
>> that case the overhead of the last call is minuscule compared to the
>> rest.
>>
>> There's no reason that it can't be used for small result sets; those
>> are just usually handled by setting the start parameter. Up through,
>> say, 1,000 or so the extra overhead is pretty unnoticeable. So my head
>> was in the "what's the problem with 1 extra call after making the
>> first 50?" mindset.
>>
>> OTOH, if you make 100 successive calls to search with the CursorMark
>> and call 101 takes as long as the previous 100, something's horribly
>> wrong.
>
>
> Hi,
>
> I use it in a server application where I need to process all results in
> every case, which can be between 0 and 100's of thousands. We use pagination
> to have a boundary on the required memory on "our" side by processing
> page-after-page.
>
> Most cases will fit into one page though - a few hundred results. Our Solr
> cluster takes about 5 to 10 seconds (*) for the first 'filled' page _and_
> about the _same time_ again for the second empty page. So if I have the
> guarantee that the second page is always empty, that helps a lot.
>
> Solr 5.5 that is, btw.
>
> (*) Whether it could be faster than 5 seconds is a different issue. But the query
> is quite complex with a lot of AND/OR and BlockJoins too, and I have no idea
> if memory is large enough to hold the indices and things like that. Not
> really optimized yet.
>
>
> David.
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber


Re: CursorMarks and 'end of results'

2018-07-02 Thread David Frese

On 29.06.18 at 17:42, Erick Erickson wrote:

> bq. It basically cuts down the search time in half in the usual case
> for us, so it's an important 'feature'.
>
> Wait. You mean that the "extra" call to get back 0 rows doubles your
> query time? That's surprising, tell us more.
>
> How many times does your "usual" use case call using CursorMark? My
> off-the-cuff explanation would be that you usually get all the rows
> in the first call.
>
> CursorMark is intended to help with the "deep paging" problem, i.e.
> where start=some_big_number, to allow returning large result sets in
> chunks, say through 10s of K rows. Part of our puzzlement is that in
> that case the overhead of the last call is minuscule compared to the
> rest.
>
> There's no reason that it can't be used for small result sets; those
> are just usually handled by setting the start parameter. Up through,
> say, 1,000 or so the extra overhead is pretty unnoticeable. So my head
> was in the "what's the problem with 1 extra call after making the
> first 50?" mindset.
>
> OTOH, if you make 100 successive calls to search with the CursorMark
> and call 101 takes as long as the previous 100, something's horribly
> wrong.


Hi,

I use it in a server application where I need to process all results in 
every case, which can be between 0 and 100's of thousands. We use 
pagination to have a boundary on the required memory on "our" side by 
processing page-after-page.


Most cases will fit into one page though - a few hundred results. Our 
Solr cluster takes about 5 to 10 seconds (*) for the first 'filled' page 
_and_ about the _same time_ again for the second empty page. So if I 
have the guarantee that the second page is always empty, that helps a lot.


Solr 5.5 that is, btw.

(*) Whether it could be faster than 5 seconds is a different issue. But the
query is quite complex with a lot of AND/OR and BlockJoins too, and I 
have no idea if memory is large enough to hold the indices and things 
like that. Not really optimized yet.



David.

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber


Re: CursorMarks and 'end of results'

2018-06-29 Thread Erick Erickson
bq. It basically cuts down the search time in half in the usual case
for us, so it's an important 'feature'.

Wait. You mean that the "extra" call to get back 0 rows doubles your
query time? That's surprising, tell us more.

How many times does your "usual" use case call using CursorMark? My
off-the-cuff explanation would be that you usually get all the rows
in the first call.

CursorMark is intended to help with the "deep paging" problem, i.e.
where start=some_big_number, to allow returning large result sets in
chunks, say through 10s of K rows. Part of our puzzlement is that in
that case the overhead of the last call is minuscule compared to the
rest.

There's no reason that it can't be used for small result sets; those
are just usually handled by setting the start parameter. Up through,
say, 1,000 or so the extra overhead is pretty unnoticeable. So my head
was in the "what's the problem with 1 extra call after making the
first 50?" mindset.

OTOH, if you make 100 successive calls to search with the CursorMark
and call 101 takes as long as the previous 100, something's horribly
wrong.
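
To make the contrast concrete, here is a rough sketch of the request
parameters for the two paging styles. The helper names, the query and
the "id" sort field are only illustrative assumptions; start, rows,
sort and cursorMark are the regular Solr parameters.

```python
from typing import Dict


def start_page_params(query: str, sort: str, rows: int, page: int) -> Dict[str, str]:
    """Classic paging: the offset grows with every page.

    For a large start, Solr has to collect start + rows sorted documents
    just to return the last `rows` of them, which is the "deep paging" cost.
    """
    return {"q": query, "sort": sort, "rows": str(rows), "start": str(page * rows)}


def cursor_page_params(query: str, sort: str, rows: int, cursor: str = "*") -> Dict[str, str]:
    """Cursor paging: no offset; the cursor says where to resume.

    The sort must include the uniqueKey field (assumed to be "id" here),
    and start must be left out or set to 0.
    """
    return {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}


if __name__ == "__main__":
    print(start_page_params("*:*", "id asc", 50, 3))
    print(cursor_page_params("*:*", "id asc", 50))
```

For each following page, the cursor variant passes the nextCursorMark
value from the previous response as cursorMark.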

Best,
Erick


On Fri, Jun 29, 2018 at 4:01 AM, David Frese wrote:
> On 22.06.18 at 02:37, Chris Hostetter wrote:
>>
>>
>> : the documentation of 'cursorMarks' recommends to fetch until a query returns
>> : the cursorMark that was passed in to a request.
>> :
>> : But that always requires an additional request at the end, so I wonder if I
>> : can stop already, if a request returns fewer results than requested (num rows).
>> : There won't be new documents added during the search in my use case, so could
>> : there ever be a non-empty 'page' after a non-full 'page'?
>>
>> You could stop then -- if that fits your use case -- but the documentation
>> (in particular the sentence you are referring to) is trying to be as
>> straightforward and general as possible ... which includes the use case
>> where someone is "tailing" an index and documents may be continually
>> added.
>>
>> When originally writing those docs, I did have a bit in there about
>> *either* getting back fewer than "rows" docs *or* getting back the same
>> cursor you passed in (to try to cover both use cases as efficiently as
>> possible) but it seemed more confusing -- and I was worried people might
>> be surprised/confused when the number of docs was perfectly divisible by
>> "rows" so the "fewer than rows" case could still wind up in a final
>> request that returned "0" docs.
>>
>> The current docs seemed like a good balance between brevity & clarity,
>> with the added bonus of being correct :)
>>
>> But as Anshum said: if you have suggested improvements for rewording,
>> patches/PRs are certainly welcome.  It's hard to have a good perspective on
>> what docs are helpful to new users when you have been working with the
>> software for 14 years and wrote the code in question.
>
>
> Thank you very much for the clarification.
>
> It basically cuts down the search time in half in the usual case for us, so
> it's an important 'feature'.
>
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber


Re: CursorMarks and 'end of results'

2018-06-29 Thread David Frese

On 22.06.18 at 02:37, Chris Hostetter wrote:

> : the documentation of 'cursorMarks' recommends to fetch until a query returns
> : the cursorMark that was passed in to a request.
> :
> : But that always requires an additional request at the end, so I wonder if I
> : can stop already, if a request returns fewer results than requested (num rows).
> : There won't be new documents added during the search in my use case, so could
> : there ever be a non-empty 'page' after a non-full 'page'?
>
> You could stop then -- if that fits your use case -- but the documentation
> (in particular the sentence you are referring to) is trying to be as
> straightforward and general as possible ... which includes the use case
> where someone is "tailing" an index and documents may be continually
> added.
>
> When originally writing those docs, I did have a bit in there about
> *either* getting back fewer than "rows" docs *or* getting back the same
> cursor you passed in (to try to cover both use cases as efficiently as
> possible) but it seemed more confusing -- and I was worried people might
> be surprised/confused when the number of docs was perfectly divisible by
> "rows" so the "fewer than rows" case could still wind up in a final
> request that returned "0" docs.
>
> The current docs seemed like a good balance between brevity & clarity,
> with the added bonus of being correct :)
>
> But as Anshum said: if you have suggested improvements for rewording,
> patches/PRs are certainly welcome.  It's hard to have a good perspective on
> what docs are helpful to new users when you have been working with the
> software for 14 years and wrote the code in question.


Thank you very much for the clarification.

It basically cuts down the search time in half in the usual case for us, 
so it's an important 'feature'.



--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber


Re: CursorMarks and 'end of results'

2018-06-21 Thread Chris Hostetter


: the documentation of 'cursorMarks' recommends to fetch until a query returns
: the cursorMark that was passed in to a request.
: 
: But that always requires an additional request at the end, so I wonder if I
: can stop already, if a request returns fewer results than requested (num rows).
: There won't be new documents added during the search in my use case, so could
: there ever be a non-empty 'page' after a non-full 'page'?

You could stop then -- if that fits your use case -- but the documentation
(in particular the sentence you are referring to) is trying to be as
straightforward and general as possible ... which includes the use case
where someone is "tailing" an index and documents may be continually
added.

When originally writing those docs, I did have a bit in there about
*either* getting back fewer than "rows" docs *or* getting back the same
cursor you passed in (to try to cover both use cases as efficiently as
possible) but it seemed more confusing -- and I was worried people might
be surprised/confused when the number of docs was perfectly divisible by
"rows" so the "fewer than rows" case could still wind up in a final
request that returned "0" docs.
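
The two stopping rules discussed above could be sketched roughly like
this. It is only an illustration, not code from Solr or SolrJ: the names
iter_cursor, solr_fetch and index_is_static are made up, "id" is assumed
to be the uniqueKey field, and the select URL passed to solr_fetch is
hypothetical. The cursorMark, rows and sort parameters and the
nextCursorMark response key are the regular ones.

```python
import json
from typing import Callable, Dict, Iterator, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# A page fetcher takes request parameters and returns (docs, nextCursorMark).
Fetch = Callable[[Dict[str, str]], Tuple[List[dict], str]]


def solr_fetch(select_url: str) -> Fetch:
    """Fetcher for a real /select endpoint (the URL is an assumption)."""
    def fetch(params: Dict[str, str]) -> Tuple[List[dict], str]:
        query = urlencode({**params, "wt": "json"})
        with urlopen(select_url + "?" + query) as resp:
            body = json.load(resp)
        return body["response"]["docs"], body["nextCursorMark"]
    return fetch


def iter_cursor(fetch: Fetch, query: str, sort: str, rows: int,
                index_is_static: bool = False) -> Iterator[dict]:
    """Yield all matching documents page by page.

    The sort must include the uniqueKey field, e.g. "id asc".
    """
    cursor = "*"  # documented starting value for cursorMark
    while True:
        params = {"q": query, "sort": sort, "rows": str(rows), "cursorMark": cursor}
        docs, next_cursor = fetch(params)
        yield from docs
        # General rule from the docs: stop when the cursor comes back unchanged.
        if next_cursor == cursor:
            return
        # Shortcut: only valid when no documents are added while paging.
        # A page with fewer than `rows` docs must then be the last one.
        if index_is_static and len(docs) < rows:
            return
        cursor = next_cursor
```

With index_is_static=True the final empty request is skipped whenever
the last page is short. When the number of hits is an exact multiple of
rows, the last page is full, so one more request that returns 0 docs is
still needed.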

The current docs seemed like a good balance between brevity & clarity,
with the added bonus of being correct :)

But as Anshum said: if you have suggested improvements for rewording,
patches/PRs are certainly welcome.  It's hard to have a good perspective on
what docs are helpful to new users when you have been working with the
software for 14 years and wrote the code in question.



-Hoss
http://www.lucidworks.com/


Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
I might have been wrong there. Having an explicit check for the # results 
returned vs rows requested, would allow you to avoid the last request that 
would otherwise come back with 0 results. That check isn’t automatically done 
within Solr.

 Anshum


> On Jun 19, 2018, at 2:39 PM, Anshum Gupta wrote:
> 
> Hi David,
> 
> The cursormark would be the same if you get back fewer than the max records 
> requested and so you should exit, as per the documentation.
> 
> I think the documentation says just what you are suggesting, but if you think 
> it could be improved, feel free to put up a patch.
> 
> 
>  Anshum
> 
> 
>> On Jun 18, 2018, at 2:09 AM, David Frese wrote:
>> 
>> Hi List,
>> 
>> the documentation of 'cursorMarks' recommends to fetch until a query returns 
>> the cursorMark that was passed in to a request.
>> 
>> But that always requires an additional request at the end, so I wonder if I 
>> can stop already, if a request returns fewer results than requested (num
>> rows). There won't be new documents added during the search in my use case,
>> so could there ever be a non-empty 'page' after a non-full 'page'?
>> 
>> Thanks very much.
>> 
>> --
>> David Frese
>> +49 7071 70896 75
>> 
>> Active Group GmbH
>> Hechinger Str. 12/1, 72072 Tübingen
>> Registergericht: Amtsgericht Stuttgart, HRB 224404
>> Geschäftsführer: Dr. Michael Sperber
> 





Re: CursorMarks and 'end of results'

2018-06-19 Thread Anshum Gupta
Hi David,

The cursormark would be the same if you get back fewer than the max records 
requested and so you should exit, as per the documentation.

I think the documentation says just what you are suggesting, but if you think 
it could be improved, feel free to put up a patch.


 Anshum


> On Jun 18, 2018, at 2:09 AM, David Frese wrote:
> 
> Hi List,
> 
> the documentation of 'cursorMarks' recommends to fetch until a query returns 
> the cursorMark that was passed in to a request.
> 
> But that always requires an additional request at the end, so I wonder if I 
> can stop already, if a request returns fewer results than requested (num
> rows). There won't be new documents added during the search in my use case,
> so could there ever be a non-empty 'page' after a non-full 'page'?
> 
> Thanks very much.
> 
> --
> David Frese
> +49 7071 70896 75
> 
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber





CursorMarks and 'end of results'

2018-06-18 Thread David Frese

Hi List,

the documentation of 'cursorMarks' recommends to fetch until a query 
returns the cursorMark that was passed in to a request.


But that always requires an additional request at the end, so I wonder 
if I can stop already, if a request returns fewer results than requested
(num rows). There won't be new documents added during the search in my
use case, so could there ever be a non-empty 'page' after a non-full
'page'?


Thanks very much.

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber