Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Thanks Adrien!

On Fri, Apr 12, 2024 at 9:49 AM Adrien Grand wrote:

> You are correct, query rewriting is not affected by the use of search vs.
> searchAfter.
>
> On Fri, Apr 12, 2024 at 3:37 PM Puneeth Bikkumanla wrote:
>
> > Hello,
> > Sorry I should have clarified what I meant by “optimized”. I am familiar
> > with the collector/comparators using the “after” doc to filter out
> > documents but I specifically was talking about the query rewriting phase.
> > Is the query rewritten differently in search vs searchAfter? Looking at the
> > code I think no but would just like to confirm if there are any edge cases
> > here.
> >
> > On Fri, Apr 12, 2024 at 8:46 AM Adrien Grand wrote:
> >
> > > Hello Puneeth,
> > >
> > > When you pass an `after` doc, Lucene will filter out documents that compare
> > > better than this `after` document if it can. See e.g. what LongComparator
> > > does with its `topValue`, which is the value of the `after` doc.
> > >
> > > On Thu, Apr 11, 2024 at 4:34 PM Puneeth Bikkumanla <pbikkuma...@gmail.com> wrote:
> > >
> > > > Hello,
> > > > I was wondering if a user-defined Query is optimized the same way in both
> > > > search/searchAfter provided the index stays the same (no CRUD takes place).
> > > >
> > > > In searchAfter we pass in an "after" doc so I was wondering if that changes
> > > > how a query is optimized at all. By looking at the code, I'm thinking no
> > > > but was wondering if there were any other parameters here that I am not
> > > > aware of that would influence query optimization differently in
> > > > search/searchAfter. Thanks!
> > > >
> > >
> > >
> > > --
> > > Adrien
> > >
> >
>
>
> --
> Adrien
>


Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Hello,
Sorry, I should have clarified what I meant by “optimized”. I am familiar
with the collectors/comparators using the “after” doc to filter out
documents, but I was specifically asking about the query rewriting phase.
Is the query rewritten differently in search vs. searchAfter? Looking at the
code, I think not, but I would like to confirm whether there are any edge
cases here.

On Fri, Apr 12, 2024 at 8:46 AM Adrien Grand wrote:

> Hello Puneeth,
>
> When you pass an `after` doc, Lucene will filter out documents that compare
> better than this `after` document if it can. See e.g. what LongComparator
> does with its `topValue`, which is the value of the `after` doc.
>
> On Thu, Apr 11, 2024 at 4:34 PM Puneeth Bikkumanla wrote:
>
> > Hello,
> > I was wondering if a user-defined Query is optimized the same way in both
> > search/searchAfter provided the index stays the same (no CRUD takes place).
> >
> > In searchAfter we pass in an "after" doc so I was wondering if that changes
> > how a query is optimized at all. By looking at the code, I'm thinking no
> > but was wondering if there were any other parameters here that I am not
> > aware of that would influence query optimization differently in
> > search/searchAfter. Thanks!
> >
>
>
> --
> Adrien
>


Query Optimization in search/searchAfter

2024-04-11 Thread Puneeth Bikkumanla
Hello,
I was wondering whether a user-defined Query is optimized the same way in
both search and searchAfter, provided the index stays the same (no CRUD
operations take place).

In searchAfter we pass in an "after" doc, so I was wondering whether that
changes how a query is optimized at all. From looking at the code, I think
not, but I was wondering whether there are any other parameters that I am
not aware of that would influence query optimization differently in search
vs. searchAfter. Thanks!
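
For readers following this thread, the distinction it arrives at (the query is the same, and the "after" doc only affects which hits are kept during collection) can be illustrated with a small plain-Java sketch. This is not Lucene code: SearchAfterSketch, Hit, and page are hypothetical names, and the ascending order with a doc-id tie-break is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SearchAfterSketch {

    /** Hypothetical hit: a doc id plus the long value it is sorted on. */
    public static final class Hit {
        public final int doc;
        public final long sortValue;

        public Hit(int doc, long sortValue) {
            this.doc = doc;
            this.sortValue = sortValue;
        }

        @Override
        public String toString() {
            return "doc=" + doc + " sort=" + sortValue;
        }
    }

    // Ascending by sort value, ties broken by doc id (an assumed order for this sketch).
    private static final Comparator<Hit> ORDER =
        Comparator.<Hit>comparingLong(h -> h.sortValue).thenComparingInt(h -> h.doc);

    /**
     * Returns up to n hits. The set of matches (what "the query" produced) is the
     * same with or without `after`; `after` only filters hits at collection time.
     */
    public static List<Hit> page(List<Hit> matches, Hit after, int n) {
        List<Hit> competitive = new ArrayList<>();
        for (Hit hit : matches) {
            // Skip anything that sorts at or before the `after` hit.
            if (after != null && ORDER.compare(hit, after) <= 0) {
                continue;
            }
            competitive.add(hit);
        }
        competitive.sort(ORDER);
        return new ArrayList<>(competitive.subList(0, Math.min(n, competitive.size())));
    }

    public static void main(String[] args) {
        List<Hit> matches = List.of(
            new Hit(0, 30L), new Hit(1, 10L), new Hit(2, 20L), new Hit(3, 20L));
        List<Hit> first = page(matches, null, 2);
        List<Hit> second = page(matches, first.get(first.size() - 1), 2);
        System.out.println(first);
        System.out.println(second);
    }
}
```

Passing the last hit of one page as `after` for the next call is the paging pattern the thread describes; the list of matches handed to `page` never changes between calls.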


DoubleLeafComparator Question

2023-02-16 Thread Puneeth Bikkumanla
Hello everyone,
In DoubleLeafComparator::getValueForDoc, when Lucene converts the long
representation back to a double, I see that it uses
Double::longBitsToDouble. My question is: why does Lucene use that and not
NumericUtils::sortableLongToDouble? I am assuming that users would use
NumericUtils::doubleToSortableLong during indexing, so shouldn't Lucene use
NumericUtils::sortableLongToDouble when converting back to a double? Thanks
in advance.
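
One way to frame this question is that the decoder has to match whichever encoder wrote the value. The sketch below re-implements, in plain Java, the order-preserving bit flip that NumericUtils::doubleToSortableLong is understood to perform. DoubleEncodingSketch is a made-up class name, and the bit trick should be treated as an assumption to check against NumericUtils in your Lucene version. It shows that raw IEEE 754 bits sort negative doubles in reverse, that the sortable form does not, and that mixing encoder and decoder corrupts negative values. Which encoding a given doc-values field actually holds depends on how it was indexed, so that is worth confirming for the field being sorted on.

```java
public class DoubleEncodingSketch {

    /** Order-preserving encoding: signed long order matches double order. */
    public static long doubleToSortableLong(double value) {
        return sortableDoubleBits(Double.doubleToLongBits(value));
    }

    /** Inverse of doubleToSortableLong (the bit flip is its own inverse). */
    public static double sortableLongToDouble(long encoded) {
        return Double.longBitsToDouble(sortableDoubleBits(encoded));
    }

    // For negative values (sign bit set), flip the 63 low bits; leave positives unchanged.
    private static long sortableDoubleBits(long bits) {
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    public static void main(String[] args) {
        double a = -2.5;
        double b = -1.0; // a < b as doubles

        // Raw bits: compared as signed longs, the order of negatives is reversed.
        System.out.println("raw bits keep order:      "
            + (Double.doubleToLongBits(a) < Double.doubleToLongBits(b)));

        // Sortable bits: the order is preserved.
        System.out.println("sortable bits keep order: "
            + (doubleToSortableLong(a) < doubleToSortableLong(b)));

        // Matching decoder round-trips; mismatched decoder does not for negatives.
        System.out.println("matched decode:    " + sortableLongToDouble(doubleToSortableLong(b)));
        System.out.println("mismatched decode: " + Double.longBitsToDouble(doubleToSortableLong(b)));
    }
}
```

For non-negative doubles the two encodings produce the same long, so a mismatch would only show up on negative values.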


Re: LongDistanceFeatureQuery for DoublePoint

2022-04-01 Thread Puneeth Bikkumanla
Hello,

I just wanted to bump this email in case it was lost, since there has been
no activity on it for over a week.

On Wed, Mar 23, 2022 at 3:04 PM Puneeth Bikkumanla wrote:

> Hello Adrien,
> Thanks for the quick response! The doubles can be pretty much anything.
> We're implementing something very similar to the Distance Feature Query
> <https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-distance-feature-query.html>
> in Elastic but want it to work with doubles where the "field" can be a double
> value. Users could then find a set of relevant documents with "fields" that
> are closest to specified origin using this query. Would this feature be
> something that we should put up a PR for in the Lucene codebase as well?
>
> On Wed, Mar 23, 2022 at 12:42 AM Adrien Grand wrote:
>
>> Hi Puneeth,
>>
>> Doubles are always a bit more tricky due to rounding for arithmetic
>> operations, but this should still be doable.
>>
>> Out of curiosity, what sort of data do your double fields store? This
>> query had been added with the idea that it would be useful for
>> timestamp fields in order to boost hits by recency. What is your
>> use-case for adding similar functionality to double fields?
>>
>> On Wed, Mar 23, 2022 at 12:38 AM Puneeth Bikkumanla wrote:
>> >
>> > Hello,
>> > I was wondering if there is anything similar to the
>> > LongDistanceFeatureQuery for DoublePoint. We are currently converting our
>> > doubles into longs in order to use this feature but would like to switch
>> > off of that. If nothing exists, are there any immediate challenges that
>> > people foresee for implementing a "DoubleDistanceFeatureQuery" for
>> > DoublePoint?
>>
>>
>>
>> --
>> Adrien
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>


Re: LongDistanceFeatureQuery for DoublePoint

2022-03-23 Thread Puneeth Bikkumanla
Hello Adrien,
Thanks for the quick response! The doubles can be pretty much anything.
We're implementing something very similar to the Distance Feature Query
<https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-distance-feature-query.html>
in Elastic, but we want it to work with doubles, where the "field" can be a
double value. Users could then use this query to find a set of relevant
documents whose "fields" are closest to a specified origin. Would this
feature be something that we should put up a PR for in the Lucene codebase
as well?

On Wed, Mar 23, 2022 at 12:42 AM Adrien Grand wrote:

> Hi Puneeth,
>
> Doubles are always a bit more tricky due to rounding for arithmetic
> operations, but this should still be doable.
>
> Out of curiosity, what sort of data do your double fields store? This
> query had been added with the idea that it would be useful for
> timestamp fields in order to boost hits by recency. What is your
> use-case for adding similar functionality to double fields?
>
> On Wed, Mar 23, 2022 at 12:38 AM Puneeth Bikkumanla wrote:
> >
> > Hello,
> > I was wondering if there is anything similar to the
> > LongDistanceFeatureQuery for DoublePoint. We are currently converting our
> > doubles into longs in order to use this feature but would like to switch
> > off of that. If nothing exists, are there any immediate challenges that
> > people foresee for implementing a "DoubleDistanceFeatureQuery" for
> > DoublePoint?
>
>
>
> --
> Adrien


LongDistanceFeatureQuery for DoublePoint

2022-03-22 Thread Puneeth Bikkumanla
Hello,
I was wondering if there is anything similar to the
LongDistanceFeatureQuery for DoublePoint. We are currently converting our
doubles into longs in order to use this feature but would like to switch
off of that. If nothing exists, are there any immediate challenges that
people foresee for implementing a "DoubleDistanceFeatureQuery" for
DoublePoint?
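
To make the idea concrete, here is a minimal sketch of the per-document arithmetic such a query might use, assuming it keeps the same shape as the score documented for the long-based distance feature query: boost * pivotDistance / (pivotDistance + distance). DoubleDistanceFeatureSketch is a hypothetical class and not part of Lucene, and the sketch covers only the scoring formula, not how a real query would use the points index to skip non-competitive documents.

```java
public class DoubleDistanceFeatureSketch {

    /**
     * Scores a document by how close its double value is to the origin.
     * The score equals boost at distance 0 and boost / 2 at distance == pivotDistance.
     */
    public static float score(double value, double origin, double pivotDistance, float boost) {
        if (!Double.isFinite(value) || !Double.isFinite(origin)) {
            throw new IllegalArgumentException("value and origin must be finite");
        }
        if (!(pivotDistance > 0) || Double.isInfinite(pivotDistance)) {
            throw new IllegalArgumentException("pivotDistance must be positive and finite");
        }
        // For extreme inputs the subtraction can overflow to infinity,
        // in which case the score below evaluates to 0.
        double distance = Math.abs(value - origin);
        return (float) (boost * (pivotDistance / (pivotDistance + distance)));
    }

    public static void main(String[] args) {
        System.out.println(score(10.0, 10.0, 5.0, 2f)); // at the origin
        System.out.println(score(15.0, 10.0, 5.0, 2f)); // one pivot distance away
        System.out.println(score(12.5, 10.0, 5.0, 2f)); // half a pivot distance away
    }
}
```

Unlike the long case, the distance here cannot wrap around, but it can lose precision or overflow to infinity, which is one place where the rounding concerns for doubles would need attention.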


Re: Lucene Explanation

2021-04-23 Thread Puneeth Bikkumanla
Thank you, this was very helpful!

On Mon, Apr 12, 2021 at 9:07 AM Michael Sokolov wrote:

> You might want to check out
> https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to
> implement some debugging utilities on top of Explain. It never got
> committed, but it does explore some of the challenges around
> introducing a more structured explain response.
>
> On Fri, Apr 9, 2021 at 6:40 PM Puneeth Bikkumanla wrote:
> >
> > Hello,
> > I am currently working on a project that would like to implement Document
> > Explain where we can see how a document was scored internally in lucene
> > given a query.
> >
> > I see that the IndexSearcher has an explain
> > <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int->
> > method available that returns an Explanation
> > <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Explanation.html>
> > object. An Explanation object only contains a description field (string)
> > but there is no way to know what part of a score that Explanation object is
> > for without parsing the description field itself. We wanted to implement
> > Document Explain in a more safe way where we could know what part of the
> > score an Explanation object is associated with and not parse the
> > description string field to find out. Here are a few of the options I have
> > thought of:
> >
> > 1. I was thinking about extending the similarity class (BM25Similarity) and
> > then overriding the particular methods that dealt with the different
> > subcomponents of explain but saw that the explainTF
> > <https://github.com/apache/lucene/blob/e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L268>
> > method is private. Is there a reason why this is? It would be very useful
> > if it could be public so that I can override it and store the knowledge
> > that the returned Explanation is for the TF component of the document score.
> >
> > 2. I also thought about extending the IndexSearcher and overriding the
> > createWeight method to store the weight structure and then use that to
> > understand the resulting Explanation structure from the IndexSearcher's
> > explain method.
> >
> > Please let me know if any of that didn't make sense. Also, if anyone has
> > any other ideas on how I could approach this problem suggestions would be
> > greatly appreciated. Lastly, I would be happy to submit a PR to modify
> > Lucene's Explanation to be more aware of where it is created.


Lucene Explanation

2021-04-09 Thread Puneeth Bikkumanla
Hello,
I am currently working on a project in which we would like to implement
Document Explain, where we can see how a document was scored internally in
Lucene for a given query.

I see that the IndexSearcher has an explain method available that returns an
Explanation object. An Explanation object identifies what it describes only
through a description field (a string), so there is no way to know which
part of a score that Explanation object is for without parsing the
description itself. We wanted to implement Document Explain in a safer way,
where we could know what part of the score an Explanation object is
associated with without parsing the description string. Here are a few of
the options I have thought of:

1. I was thinking about extending the similarity class (BM25Similarity) and
then overriding the particular methods that deal with the different
subcomponents of explain, but I saw that the explainTF method is private. Is
there a reason for this? It would be very useful if it could be public so
that I can override it and record that the returned Explanation is for the
TF component of the document score.

2. I also thought about extending the IndexSearcher and overriding the
createWeight method to store the weight structure and then use that to
understand the resulting Explanation structure from the IndexSearcher's
explain method.

Please let me know if any of that didn't make sense. Also, if anyone has
any other ideas on how I could approach this problem, suggestions would be
greatly appreciated. Lastly, I would be happy to submit a PR to modify
Lucene's Explanation to be more aware of where it is created.
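
As a rough illustration of what "more aware of where it is created" could look like, here is a minimal plain-Java sketch of an explanation tree whose nodes carry a machine-readable component tag next to the free-text description. Everything in it (TaggedExplanationSketch, Component, Node, find) is hypothetical and not part of Lucene, and the values are made up for the example.

```java
import java.util.List;
import java.util.Optional;

public class TaggedExplanationSketch {

    /** Hypothetical labels for parts of a score; not a Lucene type. */
    public enum Component { TOTAL, IDF, TF }

    /** Hypothetical explanation node: a tag, a value, a description, and child nodes. */
    public static final class Node {
        public final Component component;
        public final double value;
        public final String description;
        public final List<Node> details;

        public Node(Component component, double value, String description, Node... details) {
            this.component = component;
            this.value = value;
            this.description = description;
            this.details = List.of(details);
        }
    }

    /** Depth-first search for the first node tagged with the wanted component. */
    public static Optional<Node> find(Node root, Component wanted) {
        if (root.component == wanted) {
            return Optional.of(root);
        }
        for (Node child : root.details) {
            Optional<Node> hit = find(child, wanted);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Node idf = new Node(Component.IDF, 2.0, "idf, made-up value");
        Node tf = new Node(Component.TF, 0.5, "tf, made-up value");
        Node total = new Node(Component.TOTAL, 1.0, "score, product of:", idf, tf);

        // The TF part is located by its tag, without parsing any description string.
        find(total, Component.TF).ifPresent(n -> System.out.println(n.value));
    }
}
```

With a tag on each node, a caller can pick out the TF contribution directly, which is the property the message is asking for.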