Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

2021-10-05 Thread Baris Kazar
Hi Adrien,
Is there a best-practice paper or Lucene document that shows the benefit of the
IndexWriter.forceMerge and merge() methods, since you mentioned having too many
segments? Ideally it would demonstrate the concept on a toy dataset as a
best-practice example.
Best regards
baris
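
As a rough illustration of why segment count matters (the bulk scorer is recreated for every segment, as noted elsewhere in this thread), here is a toy sketch in plain Java. It does not use Lucene at all: ToySegmentedIndex, addSegment, mergeToOneSegment and search are hypothetical names, and mergeToOneSegment only loosely mimics what IndexWriter.forceMerge(1) does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a segmented inverted index; not Lucene's real data structures. */
final class ToySegmentedIndex {
    // Each segment maps a term to the IDs of the documents that contain it.
    private final List<Map<String, List<Integer>>> segments = new ArrayList<>();
    private int nextDocId = 0;

    /** Adds a batch of documents as one new segment, roughly like a flush would. */
    void addSegment(String... docs) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (String doc : docs) {
            int docId = nextDocId++;
            for (String term : doc.toLowerCase().split("\\s+")) {
                List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
                // Record each document at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        segments.add(postings);
    }

    /** Rewrites all segments into at most one (assumption: analogous to forceMerge(1)). */
    void mergeToOneSegment() {
        Map<String, List<Integer>> merged = new HashMap<>();
        for (Map<String, List<Integer>> segment : segments) {
            segment.forEach((term, ids) ->
                merged.computeIfAbsent(term, t -> new ArrayList<>()).addAll(ids));
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(merged);
        }
    }

    /** Returns {hits, perSegmentSetups} for a single-term query. */
    int[] search(String term) {
        int hits = 0;
        int setups = 0;
        for (Map<String, List<Integer>> segment : segments) {
            setups++; // stands in for building one scorer per segment
            hits += segment.getOrDefault(term, Collections.emptyList()).size();
        }
        return new int[] {hits, setups};
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        index.addSegment("lucene search", "bulk scorer");
        index.addSegment("lucene merge");
        index.addSegment("segment merge policy", "lucene segment");

        int[] before = index.search("lucene");
        index.mergeToOneSegment();
        int[] after = index.search("lucene");

        System.out.println("before: hits=" + before[0] + " setups=" + before[1]); // hits=3 setups=3
        System.out.println("after:  hits=" + after[0] + " setups=" + after[1]);   // hits=3 setups=1
    }
}
```

The hits stay the same while the per-segment setup work drops from three to one. In real Lucene, forceMerge is an expensive operation and is generally suggested only for indexes that no longer receive updates, so treat this as intuition, not as a benchmark.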


From: Baris Kazar 
Sent: Tuesday, October 5, 2021 3:56 PM
To: Adrien Grand; Lucene Users Mailing List; Baris Kazar
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Hi Adrien,
Thanks for taking a look at it, and sure, it would be very nice to fix those
accessors.
The speed is OK, but I would like it to be faster.
Is there anything else I should look at to help make it faster?
Best regards


From: Adrien Grand 
Sent: Tuesday, October 5, 2021 3:18 PM
To: Lucene Users Mailing List 
Cc: Baris Kazar 
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Hmm we should fix these access$ accessors by fixing the visibility of some 
fields.

These breakdowns do not necessarily signal that something is wrong. Is the 
query executing fast overall?

Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

2021-10-05 Thread Baris Kazar
Hi Adrien,
Thanks for taking a look at it, and sure, it would be very nice to fix those
accessors.
The speed is OK, but I would like it to be faster.
Is there anything else I should look at to help make it faster?
Best regards


From: Adrien Grand 
Sent: Tuesday, October 5, 2021 3:18 PM
To: Lucene Users Mailing List 
Cc: Baris Kazar 
Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and 
BulkScorer.score()

Hmm we should fix these access$ accessors by fixing the visibility of some 
fields.

These breakdowns do not necessarily signal that something is wrong. Is the 
query executing fast overall?

Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()

2021-10-05 Thread Adrien Grand
Hmm we should fix these access$ accessors by fixing the visibility of some
fields.

These breakdowns do not necessarily signal that something is wrong. Is the
query executing fast overall?
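
For readers wondering what the access$ entries are: when a nested or anonymous class (such as WANDScorer$1) touches a private field of its enclosing class, javac targets before Java 11 generate a synthetic static access$NNN method on the enclosing class, and that is what shows up in the profile. A minimal sketch of the pattern and of the visibility fix, using hypothetical names (AccessorDemo, Stepper, privateCounter, visibleCounter) and not the actual WANDScorer code:

```java
import java.lang.reflect.Method;

/** Sketch of the access$NNN pattern; class and field names are hypothetical. */
final class AccessorDemo {
    // Private field touched from an inner class: javac targets before Java 11
    // emit a synthetic access$NNN method on AccessorDemo for this access.
    private int privateCounter = 0;

    // Package-private field: the inner class can touch it directly, so no
    // synthetic accessor is needed on any Java version.
    int visibleCounter = 0;

    final class Stepper {
        int stepPrivate() {
            return ++privateCounter;
        }

        int stepVisible() {
            return ++visibleCounter;
        }
    }

    /** Counts the compiler-generated access$ methods on this class via reflection. */
    static int countSyntheticAccessors() {
        int count = 0;
        for (Method m : AccessorDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("access$")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        AccessorDemo outer = new AccessorDemo();
        Stepper stepper = outer.new Stepper();
        stepper.stepPrivate();
        stepper.stepVisible();
        // The count depends on the compiler target, so no expected value is given.
        System.out.println("synthetic accessors: " + countSyntheticAccessors());
    }
}
```

On Java 11 and later, nest-based access control lets nested classes reach private members directly, so the count would typically be zero either way. On older targets, widening the field to package-private removes the accessor for that field.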

On Mon, Oct 4, 2021 at 11:57 PM Baris Kazar  wrote:

> Hi,
> I did more experiments and this time I looked into these methods:
> org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
>
>
> Let's start with BooleanWeight.bulkScorer() with its call tree and time
> spent:
>
>
> BooleanWeight.bulkScorer()
> -->> Weight.bulkScorer()
> -->>-->> BooleanWeight.scorer()
> -->>-->>-->> BooleanWeight.scorerSupplier()
> -->>-->>-->>-->> Weight.scorerSupplier()
> -->>-->>-->>-->>-->> TermQuery$TermWeight.scorer()
> -->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.impacts()
> -->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.impacts()
> -->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocEnums.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.lucene84.Lucene84SkipReader.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.init()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.codecs.MultiLevelSkipListReader.loadSkipLevels()
> -->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->>-->> org.apache.lucene.store.DataInput.readVLong() (constitutes 100% of BooleanWeight.bulkScorer() time here)
>
>
>
> Next: BulkScorer.score() with its call tree and time spent:
>
>
>
> BulkScorer.score()
> -->> Weight$DefaultBulkScorer.score()
> -->>-->> Weight$DefaultBulkScorer.scoreAll()
> -->>-->>-->> WANDScorer$1.nextDoc()
> -->>-->>-->>-->> WANDScorer$1.advance()
> -->>-->>-->>-->>-->> WANDScorer.access$300() (constitutes 65% of BulkScorer.score() time here)
> -->>-->>-->>-->>-->> WANDScorer.access$100() (constitutes 30% of BulkScorer.score() time here)
> -->>-->>-->>-->>-->> WANDScorer.access$400() (constitutes 5% of BulkScorer.score() time here)
>
> Best regards
>
> 
> From: Baris Kazar 
> Sent: Saturday, October 2, 2021 3:14 PM
> To: Adrien Grand; Lucene Users Mailing List <java-user@lucene.apache.org>
> Cc: Baris Kazar 
> Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and
> BulkScorer.score()
>
> Hi Adrien,-
> Thanks. Next week I will look at the components (units, methods) within
> BulkScorer#score to see which of its called methods takes the most time.
>
> Jvisualvm reports, for each method, the whole time including the time spent
> in the called methods, and when you go down the execution tree it goes until
> the very last called method.
>
> Regarding the second paragraph above:
> When will there be too many segments in the Lucene index? I have 1 text
> field and 1 stored (non-indexed) field.
>
> Most of the time I get a couple of thousand hits and I ask for the top 20 of
> them. Could this be leading to BooleanWeight#bulkScorer spending time?
>
> Both of these units, BooleanWeight#bulkScorer and BulkScorer#score, spend
> equal amounts of time and together make up 75% of IndexSearcher#search, as I
> mentioned before.
>
> Thanks for the swift reply.
> I appreciate it very much.
>
>
> Best regards
> 
> From: Adrien Grand 
> Sent: Saturday, October 2, 2021 1:44:40 AM
> To: Lucene Users Mailing List 
> Cc: Baris Kazar 
> Subject: Re: org.apache.lucene.search.BooleanWeight.bulkScorer() and
> BulkScorer.score()
>
> Is your profiler reporting inclusive or exclusive costs for each function?
> I.e., does it exclude time spent in functions that are called within a
> function? I'm asking because it makes total sense for IndexSearcher#search
> to spend most of its time in BulkScorer#score, which coordinates the whole
> matching+scoring process.
>
> Having much time spent in BooleanWeight#bulkScorer is a bit surprising
> however. This suggests that you have too many segments in your index (since
> the bulk scorer needs to be recreated for every segment) or that your
> average query matches a very low number of documents (so that Lucene spends
> more time figuring out how best to find the matches versus actually finding
> these matches).
>
> On Sat, Oct 2, 2021 at 5:57 AM Baris Kazar <baris.ka...@oracle.com> wrote:
> Hi,
> I profiled my application with jvisualvm and saw that 75% of the time in
> org.apache.lucene.search.IndexSearcher.search() is spent in these units:
> org.apache.lucene.search.BooleanWeight.bulkScorer() and BulkScorer.score()
> Is there any study or project to speed these up, please?
>
> Best regards
>
>
>
> --
> Adrien
>


-- 
Adrien
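
On the inclusive-versus-exclusive question quoted above, a small sketch may help when reading such profiles. It computes exclusive (self) time from inclusive time on a toy call tree. CallNode is a hypothetical class and the timings are made up; they only mirror the shape reported in this thread (two units of equal cost making up 75% of the search) and are not measurements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy call-tree node; the timings used in main are invented for illustration. */
final class CallNode {
    final String name;
    final long inclusiveMillis; // time in this method plus everything it calls
    final List<CallNode> children = new ArrayList<>();

    CallNode(String name, long inclusiveMillis) {
        this.name = name;
        this.inclusiveMillis = inclusiveMillis;
    }

    /** Registers a direct callee and returns this node for chaining. */
    CallNode call(CallNode child) {
        children.add(child);
        return this;
    }

    /** Exclusive (self) time: inclusive time minus the inclusive time of direct callees. */
    long exclusiveMillis() {
        long inCallees = 0;
        for (CallNode child : children) {
            inCallees += child.inclusiveMillis;
        }
        return inclusiveMillis - inCallees;
    }

    public static void main(String[] args) {
        CallNode readVLong = new CallNode("DataInput.readVLong", 75);
        CallNode bulkScorer = new CallNode("BooleanWeight.bulkScorer", 75).call(readVLong);
        CallNode score = new CallNode("BulkScorer.score", 75);
        CallNode search = new CallNode("IndexSearcher.search", 200)
                .call(bulkScorer)
                .call(score);

        for (CallNode node : Arrays.asList(search, bulkScorer, readVLong, score)) {
            System.out.println(String.format("%-26s inclusive=%3d ms  exclusive=%3d ms",
                    node.name, node.inclusiveMillis, node.exclusiveMillis()));
        }
    }
}
```

In this toy tree, BooleanWeight.bulkScorer has a large inclusive time but zero exclusive time because all of it is spent in its callee, which is why a high inclusive figure for a coordinating method is not by itself a sign of a problem.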