Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
Hi Toke,
Thanks for the post. Good that things are moving forward! It took a while!

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/






Re: Number of requested rows

2020-02-05 Thread Toke Eskildsen
On Wed, 2020-02-05 at 13:00 +0100, Emir Arnautović wrote:
> I was thinking in that direction. Do you know where it is in the
> codebase or which structure is used - I am guessing some array of
> objects?

Yeah. More precisely a priority queue of Objects, initialized with
sentinel Objects. rows=100 is bad both from a memory allocation point
of view and because the heap structure of the priority queue
implementation has extremely bad memory locality when it is being updated.
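
To make the structure described above concrete, here is a minimal sketch of a
fixed-size priority queue that is filled with sentinel objects up front. It is
an illustration under stated assumptions, not Lucene's actual HitQueue: the
class name SentinelHeapSketch, the Hit type and the methods collect, results
and allocatedSlots are all hypothetical. The point it shows is that the number
of allocated objects depends on the requested rows, not on the number of hits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified top-N queue; an illustration only, not Lucene's HitQueue.
final class SentinelHeapSketch {

    /** One hit: a document id and its score. Mutable so that slots can be reused. */
    static final class Hit {
        int doc;
        float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final Hit[] heap; // 1-based binary min-heap; heap[1] is the weakest entry
    private final int size;

    /** Allocates one sentinel per requested row, before any hit has been seen. */
    SentinelHeapSketch(int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be at least 1");
        }
        size = rows;
        heap = new Hit[rows + 1];
        for (int i = 1; i <= rows; i++) {
            // Sentinel: ranks below every real hit (assumes real doc ids are smaller).
            heap[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
    }

    int allocatedSlots() {
        return size;
    }

    /** True if a ranks below b: lower score, or same score and higher doc id. */
    private static boolean lessThan(Hit a, Hit b) {
        return a.score < b.score || (a.score == b.score && a.doc > b.doc);
    }

    /** Offers a hit; it overwrites the weakest entry if it ranks above it. */
    void collect(int doc, float score) {
        Hit top = heap[1];
        if (score > top.score || (score == top.score && doc < top.doc)) {
            top.doc = doc;
            top.score = score;
            downHeap();
        }
    }

    /** Sifts heap[1] down; each step jumps to index 2*i, far away in a large array. */
    private void downHeap() {
        int i = 1;
        Hit node = heap[i];
        while (2 * i <= size) {
            int child = 2 * i;
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) {
                child++;
            }
            if (!lessThan(heap[child], node)) {
                break;
            }
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    /** Returns the real hits, best first, skipping sentinels that were never replaced. */
    List<Hit> results() {
        List<Hit> out = new ArrayList<>();
        for (int i = 1; i <= size; i++) {
            if (heap[i].doc != Integer.MAX_VALUE) {
                out.add(heap[i]);
            }
        }
        out.sort((a, b) -> a.score != b.score
                ? Float.compare(b.score, a.score)
                : Integer.compare(a.doc, b.doc));
        return out;
    }

    public static void main(String[] args) {
        SentinelHeapSketch queue = new SentinelHeapSketch(5);
        float[] scores = {0.3f, 0.9f, 0.1f};
        for (int doc = 0; doc < scores.length; doc++) {
            queue.collect(doc, scores[doc]);
        }
        for (Hit h : queue.results()) {
            System.out.println("doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

In this sketch the constructor creates rows objects even if only three
documents match, and every replacement walks the array at doubling indices,
which is where the poor locality for large queues comes from.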

I performed some measurements and did some experiments a few years ago:
https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/
and there is https://issues.apache.org/jira/browse/LUCENE-8875 which
takes care of the sentinel thing in Solr 8.2.

- Toke Eskildsen, Royal Danish Library




Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
Thanks a lot!

Emir






Re: Number of requested rows

2020-02-05 Thread Mikhail Khludnev
Hi, Emir.

Please check the callers of org.apache.lucene.search.HitQueue.HitQueue(int,
boolean); you may find the alternative usage you are probably looking for.


-- 
Sincerely yours
Mikhail Khludnev


Re: Number of requested rows

2020-02-05 Thread Emir Arnautović
Hi Mikhail,
I was thinking in that direction. Do you know where it is in the codebase or 
which structure is used - I am guessing some array of objects?

Thanks,
Emir






Re: Number of requested rows

2020-02-05 Thread Mikhail Khludnev
Absolutely. The searcher doesn't know the number of hits a priori. It eagerly
allocates the results heap before collecting results. The only cap I'm aware of
is maxDocs.
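
As a rough illustration of the point above, the following sketch computes how
many heap slots an eager collector would allocate for a given rows value. It
assumes the heap size is simply the requested rows capped by the number of
documents in the index; the class EagerAllocationSketch, its methods and the
per-slot byte figure are hypothetical and JVM-dependent, not taken from the
Lucene source.

```java
// Hypothetical sketch: slots an eager top-N collector allocates before collecting.
final class EagerAllocationSketch {

    // Assumed cost per slot: a 4-byte reference plus a 24-byte entry object.
    // This is a guess for illustration; real numbers depend on the JVM.
    static final long ASSUMED_BYTES_PER_SLOT = 4 + 24;

    /** Slots allocated up front: the requested rows, capped only by the index size. */
    static int slots(int rows, int maxDoc) {
        return Math.min(rows, maxDoc);
    }

    /** Rough memory estimate; note that the number of hits plays no role. */
    static long estimatedBytes(int rows, int maxDoc) {
        return slots(rows, maxDoc) * ASSUMED_BYTES_PER_SLOT;
    }

    public static void main(String[] args) {
        int maxDoc = 5_000_000; // illustrative index size
        for (int rows : new int[] {10, 1_000, 1_000_000, Integer.MAX_VALUE}) {
            System.out.println("rows=" + rows
                    + " slots=" + slots(rows, maxDoc)
                    + " estimatedBytes=" + estimatedBytes(rows, maxDoc));
        }
    }
}
```

Under these assumptions the cost grows with rows until it reaches the index
size, regardless of how many documents actually match the query.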


-- 
Sincerely yours
Mikhail Khludnev


Number of requested rows

2020-02-05 Thread Emir Arnautović
Hi,
Does anybody know whether the requested number of rows is used internally to
set up some temporary structures? In other words, will a query with rows=100 be
more expensive than a query with rows=1000 if the number of hits is 1000?

Thanks,
Emir