Re: Number of requested rows
Hi Toke,

Thanks for the post. Good that things are moving forward! It took a while!

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 5 Feb 2020, at 15:23, Toke Eskildsen wrote:
>
> I performed some measurements and did some experiments a few years ago:
> https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/
> and there is https://issues.apache.org/jira/browse/LUCENE-8875 which
> takes care of the Sentinel thing in solr 8.2.
Re: Number of requested rows
On Wed, 2020-02-05 at 13:00 +0100, Emir Arnautović wrote:
> I was thinking in that direction. Do you know where it is in the
> codebase or which structure is used - I am guessing some array of
> objects?

Yeah. More precisely, a priority queue of Objects, initialized with sentinel Objects. rows=100 is bad both from a memory allocation POV and because the heap structure of the priority queue implementation has extremely bad memory locality when it is being updated.

I performed some measurements and did some experiments a few years ago:
https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/
and there is https://issues.apache.org/jira/browse/LUCENE-8875, which takes care of the sentinel thing in Solr 8.2.

- Toke Eskildsen, Royal Danish Library
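To make the structure Toke describes concrete, here is a minimal sketch in plain Java. It is not Lucene's code (the real classes are org.apache.lucene.search.HitQueue and its PriorityQueue base); the class SentinelTopN and its methods are hypothetical. It assumes a fixed-size binary min-heap sized to rows and filled with one sentinel object per slot before any document is scored, so the allocation cost is paid up front whether or not the slots are ever used. The sift-down loop jumps from slot i to 2i+1 or 2i+2, so on a large heap consecutive accesses land far apart, which is one way to picture the locality problem mentioned above.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene's code): a top-N collector backed by a
 * fixed-size binary min-heap that is pre-populated with sentinel objects.
 */
public class SentinelTopN {

    /** One hit; a sentinel has docId = Integer.MAX_VALUE and score = -infinity. */
    static final class Hit {
        int docId;
        float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }

        boolean isSentinel() {
            return docId == Integer.MAX_VALUE && score == Float.NEGATIVE_INFINITY;
        }
    }

    private final Hit[] heap; // heap[0] is always the weakest retained entry

    SentinelTopN(int rows) {
        heap = new Hit[rows];
        // Eager: one sentinel object per requested row, before any document is scored.
        for (int i = 0; i < rows; i++) {
            heap[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
    }

    /** True if a ranks below b: lower score, or same score and later doc. */
    private static boolean lessThan(Hit a, Hit b) {
        return a.score < b.score || (a.score == b.score && a.docId > b.docId);
    }

    /** Docs arrive in increasing docId order, so a tie with the root loses. */
    void collect(int docId, float score) {
        if (heap.length == 0 || score <= heap[0].score) {
            return;
        }
        heap[0].docId = docId; // reuse the weakest object in place
        heap[0].score = score;
        siftDown();
    }

    /** Restore heap order from the root; each step jumps from slot i to 2i+1 or 2i+2. */
    private void siftDown() {
        Hit node = heap[0];
        int i = 0;
        while (true) {
            int left = 2 * i + 1;
            if (left >= heap.length) {
                break;
            }
            int right = left + 1;
            int child = (right < heap.length && lessThan(heap[right], heap[left])) ? right : left;
            if (!lessThan(heap[child], node)) {
                break;
            }
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    /** Real hits only, best first. */
    Hit[] topHits() {
        return Arrays.stream(heap)
                .filter(h -> !h.isSentinel())
                .sorted((a, b) -> lessThan(a, b) ? 1 : (lessThan(b, a) ? -1 : 0))
                .toArray(Hit[]::new);
    }

    public static void main(String[] args) {
        SentinelTopN topN = new SentinelTopN(5); // 5 rows requested, only 3 matches
        topN.collect(0, 1.5f);
        topN.collect(1, 3.0f);
        topN.collect(2, 0.5f);
        for (Hit h : topN.topHits()) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Because the heap is always full, collect() only ever compares against the root and never needs a size check; that is the convenience sentinels buy, at the price of allocating every slot for every query.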
Re: Number of requested rows
Thanks a lot!

Emir

> On 5 Feb 2020, at 13:27, Mikhail Khludnev wrote:
>
> Please check callers of org.apache.lucene.search.HitQueue.HitQueue(int,
> boolean)
Re: Number of requested rows
Hi, Emir.

Please check the callers of org.apache.lucene.search.HitQueue.HitQueue(int, boolean); you may find there the alternative usage you are probably looking for.

On Wed, Feb 5, 2020 at 3:01 PM Emir Arnautović wrote:
> I was thinking in that direction. Do you know where it is in the codebase
> or which structure is used - I am guessing some array of objects?

--
Sincerely yours
Mikhail Khludnev
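In Lucene, the boolean argument of HitQueue(int size, boolean prePopulate) decides whether the queue is filled with sentinels up front. As a contrasting sketch (plain Java, not Lucene's code; the class LazyTopN is hypothetical), here is a top-N collector that skips pre-population: java.util.PriorityQueue is created without a capacity hint, starts small and grows only as hits arrive, so a large rows value with few matches does not allocate rows slots.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Lucene's code): top-N collection without sentinel
 * pre-population, so storage grows with the number of actual hits.
 */
public class LazyTopN {

    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Weakest first: lower score first; on equal score the later (higher) docId is weaker.
    static final Comparator<Hit> WEAKEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score)
                    .thenComparing((a, b) -> Integer.compare(b.docId, a.docId));

    private final int rows;
    // No capacity hint: the queue starts small and grows only as hits arrive.
    private final PriorityQueue<Hit> pq = new PriorityQueue<>(WEAKEST_FIRST);

    LazyTopN(int rows) {
        this.rows = rows;
    }

    void collect(int docId, float score) {
        if (rows <= 0) {
            return;
        }
        Hit hit = new Hit(docId, score);
        if (pq.size() < rows) {
            pq.add(hit); // still filling up
        } else if (WEAKEST_FIRST.compare(hit, pq.peek()) > 0) {
            pq.poll(); // evict the current weakest hit
            pq.add(hit);
        }
    }

    int size() {
        return pq.size();
    }

    /** Retained hits, best first. */
    List<Hit> topHits() {
        List<Hit> out = new ArrayList<>(pq);
        out.sort(WEAKEST_FIRST.reversed());
        return out;
    }

    public static void main(String[] args) {
        LazyTopN topN = new LazyTopN(10_000); // far more rows than matching docs
        topN.collect(0, 1.5f);
        topN.collect(1, 3.0f);
        topN.collect(2, 0.5f);
        System.out.println("retained=" + topN.size());
        for (Hit h : topN.topHits()) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

The trade-off is an extra size check on every collected hit and occasional growth of the backing array, in exchange for not paying for all rows before the first document is scored.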
Re: Number of requested rows
Hi Mikhail,

I was thinking in that direction. Do you know where it is in the codebase or which structure is used - I am guessing some array of objects?

Thanks,
Emir

> On 5 Feb 2020, at 12:54, Mikhail Khludnev wrote:
>
> Absolutely. Searcher didn't know number of hits a priory. It eagerly
> allocate results heap before collecting results.
Re: Number of requested rows
Absolutely. The searcher doesn't know the number of hits a priori, so it eagerly allocates the results heap before collecting results. The only cap I'm aware of is maxDoc.

On Wed, Feb 5, 2020 at 2:42 PM Emir Arnautović wrote:
> Does somebody know if requested number of rows is used internally to set
> some temp structures? In other words will query with rows=100 be more
> expensive than query with rows=1000 if number of hits is 1000?
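A minimal sketch of the point above, assuming the collector sizes its heap to the requested rows capped at maxDoc, as described (the class EagerSizing and its methods are hypothetical, not Solr or Lucene code). The allocation depends only on rows and maxDoc, never on the number of hits, so two queries that both match 1000 documents allocate very different amounts if they ask for different rows.

```java
/**
 * Hypothetical sketch (not Solr's or Lucene's code): the result heap is sized
 * from the request before any hit is known.
 */
public class EagerSizing {

    /** Slots allocated up front: the requested rows, capped at maxDoc (never below 1). */
    static int heapCapacity(int rows, int maxDoc) {
        return Math.min(rows, Math.max(1, maxDoc));
    }

    /** Slots that never hold a real hit once collection has finished. */
    static int unusedSlots(int rows, int maxDoc, int actualHits) {
        int capacity = heapCapacity(rows, maxDoc);
        return capacity - Math.min(actualHits, capacity);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int hits = 1000;
        for (int rows : new int[] {1000, 50_000}) {
            // Allocated before the first document is scored.
            Object[] heap = new Object[heapCapacity(rows, maxDoc)];
            System.out.println("rows=" + rows + " allocated=" + heap.length
                    + " unused=" + unusedSlots(rows, maxDoc, hits));
        }
    }
}
```

Running it reports 1000 allocated slots (0 unused) for rows=1000 and 50000 allocated slots (49000 unused) for rows=50000, with the same 1000 hits. This sketch only allocates an array of references; in a sentinel-based queue each slot additionally gets its own object.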
Number of requested rows
Hi,

Does anybody know whether the requested number of rows is used internally to set up some temporary structures? In other words, will a query with rows=100 be more expensive than a query with rows=1000 if the number of hits is 1000?

Thanks,
Emir