date:20191021

Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-21 Thread baris . kazar


I wonder if this repeats in version 7.7.2, too?

Best regards


On 10/21/19 5:22 PM, Shifflett, David [USA] wrote:

Baris,

Sorry I neglected to add that piece.
This test was run against 8.0.0,
but I also want it to work in later versions.

Another piece of my project is using 8.2.0.

Thanks again for any info,
David Shifflett


 -

 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org
 
 




Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-21 Thread Shifflett, David [USA]

Baris,

Sorry I neglected to add that piece.
This test was run against 8.0.0,
but I also want it to work in later versions.

Another piece of my project is using 8.2.0.

Thanks again for any info,
David Shifflett


On 10/21/19, 3:23 PM, "baris.ka...@oracle.com"  wrote:

David,-

  which version of Lucene are You using?

Best regards




Re: Iterating Over All Documents On a Changing Index

2019-10-21 Thread Adrien Grand

This is the right place to ask these questions indeed.

This is a good way to iterate over documents. Regarding your 2nd
question, Lucene IndexReaders are point-in-time views of the data, so
changes won't become visible in place. The tricky part of this kind of
problem is usually dealing with documents that get indexed after you
pulled a new reader and while you are still reindexing.
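
A minimal sketch of the point-in-time behaviour described above, assuming Lucene 8.x and an in-memory ByteBuffersDirectory. The class name, the "id" field and the helper method are made up for illustration and are not taken from the gist in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class PointInTimeReaderSketch {

    // Hypothetical helper: a document with a single "id" field.
    private static Document doc(String id) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        return d;
    }

    /** Returns {docs visible to the old reader, docs visible to the refreshed reader}. */
    public static int[] countBeforeAndAfterRefresh() {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(doc("1"));
            try (DirectoryReader oldReader = DirectoryReader.open(writer)) {
                writer.addDocument(doc("2"));        // indexed after the reader was pulled
                int seenByOld = oldReader.numDocs(); // the old reader is still a snapshot
                // openIfChanged returns null when nothing changed since oldReader was opened.
                DirectoryReader refreshed = DirectoryReader.openIfChanged(oldReader, writer);
                if (refreshed == null) {
                    return new int[] {seenByOld, seenByOld};
                }
                try (DirectoryReader newReader = refreshed) {
                    return new int[] {seenByOld, newReader.numDocs()};
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The old reader never sees the second document, so a reindexing loop that walks one reader would presumably need its own way of catching documents that were added or changed after that reader was opened.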

On Sat, Oct 19, 2019 at 1:35 AM Matt Davis  wrote:
>
> Hi All,
>
> I am working on implementing an in-place reindex using Lucene.  In my
> case, I have a BSON document stored in a binary field and a set of rules
> that pull fields out of the BSON and index them into different Lucene
> fields with different analyzers.  I would like to be able to change these
> rules / schema and then iterate over the documents, indexing them using the
> new schema.
>
> I have come up with the following code block:
> https://gist.github.com/mdavis95/f600e0a8233d0a1232eff77645d1dc8a
>
> I have two questions:
> 1) Is this a good way to iterate over the documents?
> 2) How can I handle documents that change while I am doing this?  New documents
> coming in should be fine, I believe, but changes to existing documents could
> be lost, if I understand correctly.
>
> I hope that this is the right place to ask this question and I apologize if
> this is obvious or has been asked and answered.
>
> Thanks,
> Matt



-- 
Adrien


Re: where can I register a new scorer in an existing query?

2019-10-21 Thread Adrien Grand

You could iterate manually over the leaves (IndexReader#leaves) of
your IndexReader and call Weight#matches on every leaf?
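
A rough sketch of that suggestion, assuming the Lucene 8.x API (IndexSearcher#createWeight with a ScoreMode, and the Matches / MatchesIterator classes). The class name, the output format and the demo() index are invented for illustration; the fields must be indexed with positions, which TextField does by default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Matches;
import org.apache.lucene.search.MatchesIterator;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;

public class MatchPositionsSketch {

    /** One entry per match, formatted as "globalDocId field startPosition endPosition". */
    public static List<String> collect(IndexReader reader, Query query) throws IOException {
        IndexSearcher searcher = new IndexSearcher(reader);
        Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
        List<String> out = new ArrayList<>();
        for (LeafReaderContext leaf : reader.leaves()) {
            Scorer scorer = weight.scorer(leaf);
            if (scorer == null) {
                continue; // nothing matches in this segment
            }
            Bits liveDocs = leaf.reader().getLiveDocs(); // null means no deletions
            DocIdSetIterator it = scorer.iterator();
            for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
                if (liveDocs != null && !liveDocs.get(doc)) {
                    continue; // skip deleted documents
                }
                Matches matches = weight.matches(leaf, doc);
                if (matches == null) {
                    continue;
                }
                for (String field : matches) {
                    MatchesIterator mi = matches.getMatches(field);
                    while (mi != null && mi.next()) {
                        out.add((leaf.docBase + doc) + " " + field + " "
                                + mi.startPosition() + " " + mi.endPosition());
                    }
                }
            }
        }
        return out;
    }

    /** Tiny in-memory index, only there to exercise collect(). */
    public static List<String> demo() {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document d = new Document();
                d.add(new TextField("body", "quick brown fox", Field.Store.NO));
                writer.addDocument(d);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return collect(reader, new TermQuery(new Term("body", "brown")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This walks every matching document a second time outside of the normal search, so it trades some speed for not having to write a custom Query, Weight or Scorer.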

On Sat, Oct 19, 2019 at 7:41 PM Yoav Goldberg  wrote:
>
> Hello,
>
> Is there a way to supply a new Scorer implementation to an existing query?
> From what I've been able to understand, the only way to provide a new
> scorer is to override the scorer() method of Weight, which itself requires
> implementing a new Weight, which in turn requires implementing a whole
> new Query. Is there something I am missing here?
>
> To be more concrete about what I want to achieve (maybe there is a
> different / better way):
> I would like to collect the *match positions* of several sub-queries, for
> all matching documents. The ideal interface would be to perform search as
> usual, and supply a collector that has access and collects this
> information. Alternatively, to have it available in the returned results
> object.
>
> The needed information is available during search (in the internal
> iterators), but I did not find a way to access it. The Weight.matches() and
> SpanWeight.getSpans() methods return the iterators I'd need, but they also
> require a LeafReaderContext, which I believe is only available during
> search? So I thought I'd create a scorer (that gets a context and gets
> called for the relevant document), but I do not see a way to supply a
> custom scorer.
>
> Any tips are greatly appreciated.
>
> Thanks!
>
> Yoav



-- 
Adrien


Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler

No. That's how you do it: BooleanQuery with 2 should clauses.

Or use a different query parser that offers this out of the box.

Uwe
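
A minimal sketch of the "BooleanQuery with 2 should clauses" approach, reusing the field names and boost values from the question. The class and method names are hypothetical, and parsing the same input once per field is an assumption about what the application wants.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;

public class PerFieldBoostSketch {

    /** Parses the same input against field1 (boost 2.0) and field2 (default boost 1.0). */
    public static Query build(String userInput) {
        try {
            Analyzer analyzer = new StandardAnalyzer();
            Query onField1 = new QueryParser("field1", analyzer).parse(userInput);
            Query onField2 = new QueryParser("field2", analyzer).parse(userInput);
            return new BooleanQuery.Builder()
                    // The boost argument is a float, applied at query time.
                    .add(new BoostQuery(onField1, 2.0f), BooleanClause.Occur.SHOULD)
                    // A boost of 1.0 is the default, so no wrapper is needed here.
                    .add(onField2, BooleanClause.Occur.SHOULD)
                    .build();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Could not parse: " + userInput, e);
        }
    }
}
```

The result can be passed to IndexSearcher#search like any other query, and the boosts can be changed without reindexing.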

On October 21, 2019 7:16:01 PM UTC, baris.ka...@oracle.com wrote:
>Hi,-
>
>Thanks.
>
>Let's apply this to the following case:
>
>QueryParser parser = new QueryParser("field1", analyzer);
>parser.setPhraseSlop(2);
>Query query = parser.parse("some string value here" + "*");
>TopDocs hits = indexsearcherObject.search(query, 10);
>
>Now I want to use BoostQuery:
>
>QueryParser parser = new QueryParser("field1", analyzerObject);
>parser.setPhraseSlop(2);
>Query query = parser.parse("some string value here" + "*");
>
>// The boost argument is a float, not a String.
>BoostQuery bq = new BoostQuery(query, 2.0f);
>
>TopDocs hits = indexsearcherObject.search(bq, 10);
>
>
>Now how do I handle field2 with a boost value of 1.0f?
>
>Before, this was done at index time.
>
>
>The only way I can see is a BooleanQuery that combines the first
>BoostQuery object bq with a second one, bq2, that I need to define
>for field2.
>
>Is there any other way?
>
>Best regards
>
>
>

Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-21 Thread baris . kazar


David,-

 which version of Lucene are You using?

Best regards


On 10/21/19 1:31 PM, Shifflett, David [USA] wrote:

Hi all,
Using the code snippet:
 ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer());
 String teststr = "\"Foo Bar\"~2";
 Query queryToSearch = qp.parse(teststr);
 System.out.println("Query : " + queryToSearch.toString());
 System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName());

I am getting the output
 Query : "Foo Bar"~2
 Type of query : ComplexPhraseQuery

If I change teststr to "\"Foo Bar\""
I get
 Query : "Foo Bar"
 Type of query : ComplexPhraseQuery

If I change teststr to "Foo Bar"
I get
 Query : content:foo content:bar
 Type of query : BooleanQuery


In the first two cases I was expecting the search terms to be switched to lowercase.

Were the Foo and Bar left as originally specified because the terms are inside double quotes?

How can I specify a search term that I want treated as a Phrase,
but also have the query parser apply the LowerCaseFilter?

I am hoping to avoid the need to handle this using PhraseQuery,
and continue to use the QueryParser.


Thanks in advance for any help you can give me,
David Shifflett




Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread baris . kazar


Hi,-

Thanks.

Let's apply this to the following case:

QueryParser parser = new QueryParser("field1", analyzer);
parser.setPhraseSlop(2);
Query query = parser.parse("some string value here" + "*");
TopDocs hits = indexsearcherObject.search(query, 10);

Now I want to use BoostQuery:

QueryParser parser = new QueryParser("field1", analyzerObject);
parser.setPhraseSlop(2);
Query query = parser.parse("some string value here" + "*");

// The boost argument is a float, not a String.
BoostQuery bq = new BoostQuery(query, 2.0f);

TopDocs hits = indexsearcherObject.search(bq, 10);


Now how do I handle field2 with a boost value of 1.0f?

Before, this was done at index time.


The only way I can see is a BooleanQuery that combines the first
BoostQuery object bq with a second one, bq2, that I need to define
for field2.


Is there any other way?

Best regards



On 10/21/19 2:33 PM, Uwe Schindler wrote:

Hi Baris,

> That is ok, and i can see this case would be best with BoostQuery and
> also i dont have to use lucene expression jar and its dependents.
>
> However, i am curious how to do this kind of field based boosting at
> index time even though i will prefer the query time boosting methodology.

The reason why it was deprecated is exactly the problem I mentioned before: it
never did what the user expected. The boost factor given in the document's
field was multiplied into the per-document norms. Unfortunately, at the same
time, the query normalization was using query statistics and normalized the
scores. As Lucene works per field, the same normalization is done per field,
so the constant factor per field disappears. There was still some effect of
index-time boosting if different documents had different values, but in your
case they are all the same. I am not sure how your queries worked before, but
the constant boost factors per field at index time definitely did not have the
effect you were thinking of. Since the earliest versions of Lucene, boosting at
query time has been the way to give fields different weights.

The new feature in Lucene is that you can now change the score per document
using docvalues and apply that per document at query time. Previously this was
also possible with Document/Field#setBoost, but the flexibility was missing
(only multiplication, and limited precision). In addition, the normalization
effects made the whole thing unreliable.

Uwe



RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler

Hi Baris,

> That is ok, and i can see this case would be best with BoostQuery and
> also i dont have to use lucene expression jar and its dependents.
> 
> However, i am curious how to do this kind of field based boosting at
> index time even though i will prefer the query time boosting methodology.

The reason why it was deprecated is exactly the problem I mentioned before: it
never did what the user expected. The boost factor given in the document's
field was multiplied into the per-document norms. Unfortunately, at the same
time, the query normalization was using query statistics and normalized the
scores. As Lucene works per field, the same normalization is done per field,
so the constant factor per field disappears. There was still some effect of
index-time boosting if different documents had different values, but in your
case they are all the same. I am not sure how your queries worked before, but
the constant boost factors per field at index time definitely did not have the
effect you were thinking of. Since the earliest versions of Lucene, boosting at
query time has been the way to give fields different weights.

The new feature in Lucene is that you can now change the score per document
using docvalues and apply that per document at query time. Previously this was
also possible with Document/Field#setBoost, but the flexibility was missing
(only multiplication, and limited precision). In addition, the normalization
effects made the whole thing unreliable.

Uwe


ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer

2019-10-21 Thread Shifflett, David [USA]

Hi all,
Using the code snippet:
ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer());
String teststr = "\"Foo Bar\"~2";
Query queryToSearch = qp.parse(teststr);
System.out.println("Query : " + queryToSearch.toString());
System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName());

I am getting the output
Query : "Foo Bar"~2
Type of query : ComplexPhraseQuery

If I change teststr to "\"Foo Bar\""
I get
Query : "Foo Bar"
Type of query : ComplexPhraseQuery

If I change teststr to "Foo Bar"
I get
Query : content:foo content:bar
Type of query : BooleanQuery


In the first two cases I was expecting the search terms to be switched to lowercase.

Were the Foo and Bar left as originally specified because the terms are inside double quotes?

How can I specify a search term that I want treated as a Phrase,
but also have the query parser apply the LowerCaseFilter?

I am hoping to avoid the need to handle this using PhraseQuery,
and continue to use the QueryParser.


Thanks in advance for any help you can give me,
David Shifflett
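
The replies in this digest do not answer the question. One possibility, stated here as an assumption and not confirmed anywhere in the thread, is that ComplexPhraseQuery keeps the raw phrase text for toString() and only shows the analyzed terms once the query has been rewritten. A small sketch for checking that against an empty reader (class and method names are made up for illustration):

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ComplexPhraseRewriteSketch {

    /** Parses the text, rewrites the query, and returns the rewritten query as a string. */
    public static String rewritten(String queryText) {
        try {
            ComplexPhraseQueryParser qp =
                    new ComplexPhraseQueryParser("somefield", new StandardAnalyzer());
            Query parsed = qp.parse(queryText);
            // An empty MultiReader is enough for rewriting; no documents are needed.
            IndexSearcher searcher = new IndexSearcher(new MultiReader());
            return searcher.rewrite(parsed).toString();
        } catch (IOException | ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewritten("\"Foo Bar\"~2"));
    }
}
```

If the printed query contains lowercase terms, the analyzer is being applied and only the string form of the unrewritten query is misleading. If it does not, the problem lies elsewhere.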

Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread baris . kazar


Hi,-

That is ok, and I can see this case would be best handled with BoostQuery;
that way I also don't have to use the Lucene expressions jar and its dependencies.


However, I am curious how to do this kind of field-based boosting at
index time, even though I will prefer the query-time boosting approach.


Best regards


On 10/21/19 12:54 PM, Uwe Schindler wrote:

Hi,

As I said before, that is a misuse of index-time boosting. In addition, in
previous versions it did not even work correctly: because of query
normalization it was normalized away anyway. And on top of that, to change it
you have to reindex.

What you intend to do is a typical use case for query-time boosting with
BoostQuery. That is explained in almost every book about search, like those
about Solr or Elasticsearch.

Most query parsers also allow adding boost factors for fields, e.g.
SimpleQueryParser (for humans who need a simple syntax without fields). There
you give a list of fields and boost factors.

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: baris.ka...@oracle.com 
Sent: Monday, October 21, 2019 6:45 PM
To: java-user@lucene.apache.org
Cc: baris.kazar 
Subject: Re: Index-time boosting: Deprecated setBoost method

Hi,-

Thanks and I appreciate the discussion.

Let me please ask this way, I think I gave too much info at one time:

Currently I have this:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);


But this fails with Lucene 7.7.2.


Probably it is more efficient and more flexible to fix this by using
BoostQuery.

However, what could be the fix with index time boosting? the code in my
previous post was trying to do that.

Best regards


On 10/21/19 12:34 PM, Uwe Schindler wrote:

Hi,

sorry, I don't fully understand what you intend to do. If the boost values
per field are static and used with exactly the same value for every document,
they are not needed at index time. You can just boost the field on the query
side (e.g. using BoostQuery). Boosting every document with the same static
values is an anti-pattern; that is better suited to the query side, where you
are more flexible.

If you need a different boost value per document, you can save that boost
value in the index per document using a docvalues field (this consumes extra
space, of course). Then you need the ExpressionQuery on the query side. But
just because it looks like Javascript, it's not slow. The syntax is compiled to
bytecode and directly included into the query execution as a dynamic Java
class, so it's very fast.

In short:
- If you need a different boost factor per field name that's constant
  for all documents, apply it at query time with BoostQuery.
- If you have to boost specific documents (e.g., top selling products), index
  a numeric docvalues field per document. On the query side you can use
  different query types to modify the score of each result based on the
  docvalues field. That can be done with the expressions module (using compiled
  Javascript) or by another query in Lucene that operates on ValueSource (e.g.,
  FunctionQuery). The first one is easier to use for complex formulas.
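
A small sketch of the per-document case, assuming Lucene 7.x/8.x with the lucene-queries module on the classpath. It uses FunctionScoreQuery.boostByValue with a DoubleValuesSource, which is a different route from the expressions module and FunctionQuery named above; the class name and the "name" and "boost" fields are hypothetical.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;

public class PerDocumentBoostSketch {

    /** Index side: store the per-document boost in a numeric docvalues field. */
    public static Document productDoc(String name, double boost) {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        doc.add(new DoubleDocValuesField("boost", boost));
        return doc;
    }

    /** Query side: multiply each hit's score by the value stored in its "boost" field. */
    public static Query boosted(Query base) {
        return FunctionScoreQuery.boostByValue(base, DoubleValuesSource.fromDoubleField("boost"));
    }
}
```

Because the multiplication happens at query time, the formula can be changed later without reindexing, as long as the docvalues field is already in the index.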

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de

eMail: u...@thetaphi.de


-Original Message-
From: baris.ka...@oracle.com 
Sent: Monday, October 21, 2019 5:17 PM
To: java-user@lucene.apache.org
Cc: baris.kazar 
Subject: Re: Index-time boosting: Deprecated setBoost method

Hi,-

Sorry about the missing parts in previous post. please accept my
apologies for that.

i needed to add a few more questions/corrections/additions to the
previous post:

Main Question was: if boost is a single constant value, do we need the
Javascript part below?



=== Indexing code snippet for Lucene version 6.6.0 and before===

Document doc = new Document();


Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

=== end of indexing code snippet for Lucene version 6.6.0 and before ===


This turns into the following, where the _boost1 field is associated with
field1 and the _boost2 field is associated with field2:


In Indexing code:

=== beginning of indexing code snippet ===
Field  f1= new TextField("field1",

RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler

Hi,

As I said before, that is a misuse of index-time boosting. In addition, in
previous versions it did not even work correctly: because of query
normalization it was normalized away anyway. And on top of that, to change it
you have to reindex.

What you intend to do is a typical use case for query-time boosting with
BoostQuery. That is explained in almost every book about search, like those
about Solr or Elasticsearch.

Most query parsers also allow adding boost factors for fields, e.g.
SimpleQueryParser (for humans who need a simple syntax without fields). There
you give a list of fields and boost factors.
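
As a sketch of the SimpleQueryParser route, assuming the lucene-queryparser module is available: the parser takes a map from field name to weight. The class name is hypothetical, and the field names and weights are the ones from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.Query;

public class SimpleParserWeightsSketch {

    /** Parses user input against field1 (weight 2.0) and field2 (weight 1.0). */
    public static Query parse(String userInput) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("field1", 2.0f);
        weights.put("field2", 1.0f);
        SimpleQueryParser parser = new SimpleQueryParser(new StandardAnalyzer(), weights);
        return parser.parse(userInput);
    }
}
```

The weights live in application code, so they can be tuned without touching the index.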

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: baris.ka...@oracle.com 
> Sent: Monday, October 21, 2019 6:45 PM
> To: java-user@lucene.apache.org
> Cc: baris.kazar 
> Subject: Re: Index-time boosting: Deprecated setBoost method
> 
> Hi,-
> 
> Thanks and I appreciate the discussion.
> 
> Let me please ask this way, I think I gave too much info at one time:
> 
> Currently I have this:
> 
> Field f1 = new TextField("field1", "string1", Field.Store.YES);
> doc.add(f1);
> f1.setBoost(2.0f);
> 
> Field f2 = new TextField("field2", "string2", Field.Store.YES);
> doc.add(f2);
> f2.setBoost(1.0f);
> 
> 
> But this fails with Lucene 7.7.2.
> 
> 
> Probably it is more efficient and more flexible to fix this by using
> BoostQuery.
> 
> However, what could be the fix with index time boosting? the code in my
> previous post was trying to do that.
> 
> Best regards
> 
> 
> On 10/21/19 12:34 PM, Uwe Schindler wrote:
> > Hi,
> >
> > sorry I don't fully understand what you intend to do? If the boost values
> per field are static and used with exactly same value for every document, it's
> not needed a index time. You can just boost the field on the query side (e.g.
> using BoostQuery). Boosting every document with the same static values is
> an anti-pattern, that's something better suited for the query side - as you 
> are
> more flexible.
> >
> > If you need a different boost value per document, you can save that boost
> value in the index per document using a docvalues field (this consumes extra
> space, of course). Then you need the ExpressionQuery on the query side. But
> just because it looks like Javascript, it's not slow. The syntax is compiled 
> to
> bytecode and directly included into the query execution as a dynamic java
> class, so it's very fast.
> >
> > In short:
> > - If you need to have a different boost factor per field name that's 
> > constant
> for all documents, apply it at query time with BoostQuery.
> > - If you have to boost specific documents (e.g., top selling products), 
> > index
> a numeric docvalues field per document. On the query side you can use
> different query types to modify the score of each result based on the
> docvalues field. That can be done with Expression modules (using compiled
> Javascript) or by another query in Lucene that operates on ValueSource (e.g.,
> FunctionQuery). The first one is easier to use for complex formulas.4
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__www.thetaphi.de&d=DwIFaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIr
> MUB65eapI_JnE&r=nlG5z5NcNdIbQAiX-
> BKNeyLlULCbaezrgocEvPhQkl4&m=70RoM6loHhMGsp95phVzGQf8w5JxW7gX
> T5XnleMKrOs&s=td7cUfd22mXljSuvkUPXDunkIs_eO4GxdvHHxD2CTk0&e=
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: baris.ka...@oracle.com 
> >> Sent: Monday, October 21, 2019 5:17 PM
> >> To: java-user@lucene.apache.org
> >> Cc: baris.kazar 
> >> Subject: Re: Index-time boosting: Deprecated setBoost method
> >>
> >> Hi,-
> >>
> >> Sorry about the missing parts in previous post. please accept my
> >> apologies for that.
> >>
> >> i needed to add a few more questions/corrections/additions to the
> >> previous post:
> >>
> >> Main Question was: if boost is a single constant value, do we need the
> >> Javascript part below?
> >>
> >>
> >>
> >> === Indexing code snippet for Lucene version 6.6.0 and before===
> >>
> >> Document doc = new Document();
> >>
> >>
> >>   Field  f1= new TextField("field1", "string1", Field.Store.YES); 
> >>
> >> doc.add(f1);  f1.setBoost(2.0f);  
> >>
> >> Field f2 = new TextField("field2", "string2", Field.Store.YES); 
> >>
> >> doc.add(f2); 
> >>
> >> f2.setBoost(1.0f);  
> >>
> >> === end of indexing code snippet for Lucene version 6.6.0 and before ===
> >>
> >>
> >> This turns into this where _boost1 field is associated with field1 and
> >>
> >> _boost2 field is associated with field2 field:
> >>
> >>
> >> In Indexing code:
> >>
> >> === begining of indexing code snippet ===
> >> Field  f1= new TextField("field1", "string1", Field.Store.YES); 
> >>
> >> Field _boost1 = new NumericDocValuesField(“field1”, 2L);
> >> doc.add(_boost1);
> >>
> >> // If this boost value needs to be stored, a separate sto

Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread baris . kazar


Hi,-

Thanks, and I appreciate the discussion.

Let me ask this a different way; I think I gave too much information at once.

Currently I have this:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);


But this fails with Lucene 7.7.2.


It is probably more efficient and more flexible to fix this by using 
BoostQuery.


However, what would the fix be with index-time boosting? The code in my 
previous post was trying to do that.


Best regards



RE: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread Uwe Schindler

Hi,

Sorry, I don't fully understand what you intend to do. If the boost values per 
field are static and used with exactly the same value for every document, they 
are not needed at index time. You can just boost the field on the query side 
(e.g. using BoostQuery). Boosting every document with the same static values is 
an anti-pattern; that is better suited to the query side, where you are 
more flexible.

If you need a different boost value per document, you can save that boost value 
in the index per document using a docvalues field (this consumes extra space, 
of course). Then you need an expression-based query (the expressions module) on 
the query side. Just because it looks like Javascript does not mean it is slow: 
the syntax is compiled to bytecode and included directly in the query execution 
as a dynamic Java class, so it's very fast.

In short:
- If you need a different boost factor per field name that's constant 
for all documents, apply it at query time with BoostQuery.
- If you have to boost specific documents (e.g., top-selling products), index a 
numeric docvalues field per document. On the query side you can use different 
query types to modify the score of each result based on the docvalues field. 
That can be done with the expressions module (using compiled Javascript) or with 
another query in Lucene that operates on ValueSource (e.g., FunctionQuery). The 
first one is easier to use for complex formulas.
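
The second case (a per-document factor kept in a docvalues field) could look roughly like the following sketch. It assumes Lucene 7.5 through 8.x with the lucene-queries module on the classpath. The class DocValuesBoostSketch, the field names "id", "body" and "_boost", and the sample documents are all hypothetical. It uses FunctionScoreQuery.boostByValue instead of a Javascript expression, which should be enough when the score only needs to be multiplied by the stored factor.

```java
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical demo class, not part of Lucene.
public class DocValuesBoostSketch {

    // Builds a document whose per-document factor lives in the "_boost" docvalues field.
    private static Document doc(String id, String body, long boost) {
        Document d = new Document();
        d.add(new StringField("id", id, Field.Store.YES));
        d.add(new TextField("body", body, Field.Store.NO));
        d.add(new NumericDocValuesField("_boost", boost));
        return d;
    }

    /** Indexes two documents with identical text and returns the id of the top hit. */
    public static String topId() throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            w.addDocument(doc("plain", "lucene search", 1L));
            w.addDocument(doc("promoted", "lucene search", 5L));
        }
        try (DirectoryReader r = DirectoryReader.open(dir)) {
            IndexSearcher s = new IndexSearcher(r);
            Query base = new TermQuery(new Term("body", "lucene"));
            // Multiply the text relevance score by the value stored in "_boost".
            Query boosted = FunctionScoreQuery.boostByValue(
                    base, DoubleValuesSource.fromLongField("_boost"));
            TopDocs top = s.search(boosted, 10);
            return s.doc(top.scoreDocs[0].doc).get("id");
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(topId());
    }
}
```

Both documents get the same text score here, so the ranking is decided by the stored factor alone.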

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de


Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread baris . kazar


Hi,-

Sorry about the missing parts in my previous post; please accept my 
apologies for that.


I need to add a few more questions, corrections and additions to the 
previous post.


The main question was: if the boost is a single constant value, do we need 
the Javascript part below?


=== Indexing code snippet for Lucene version 6.6.0 and before ===

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

=== end of indexing code snippet for Lucene version 6.6.0 and before ===


This turns into the following, where the _boost1 field is associated with 
field1 and the _boost2 field is associated with field2.


In the indexing code:

=== beginning of indexing code snippet ===

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);

// The docvalues field name must match the name bound at query time
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)

Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)

=== end of indexing code snippet ===


Now, in the searching code (i.e., at query time), do I still need 
FunctionScoreQuery, given that in this case the boost is just a constant 
value rather than a function? One could argue, though, that a constant is a 
function that returns the same value all the time.


=== beginning of query-time code snippet ===

Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();

// These have to be of LONG type, I think, since NumericDocValuesField
// accepts only "long" values; am I right? Can this be DOUBLE type?
bindings.add(new SortField("_boost1", SortField.Type.LONG));

// Same question here
bindings.add(new SortField("_boost2", SortField.Type.LONG));

// Create a query that matches on field1 but scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);

=== end of code snippet ===
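
For comparison, here is a hedged sketch of how the expressions variant is usually wired when the stored factor should scale the text relevance rather than replace it. It assumes Lucene 7.x or 8.x with the lucene-expressions and lucene-queries modules on the classpath. The class ExpressionBoostSketch is hypothetical, and the names "_boost1" and "field1" are taken from the snippet above. The variables are bound with SimpleBindings.add(String, DoubleValuesSource) rather than with SortField instances.

```java
import java.text.ParseException;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper class, not part of Lucene.
public class ExpressionBoostSketch {

    /** Matches the term in field1 and scores each hit as relevance times the stored factor. */
    public static Query build(String term) throws ParseException {
        // The expression is compiled once and can be reused across searches.
        Expression expr = JavascriptCompiler.compile("_score * _boost1");

        SimpleBindings bindings = new SimpleBindings();
        // "_score" resolves to the score of the wrapped query.
        bindings.add("_score", DoubleValuesSource.SCORES);
        // "_boost1" resolves to the per-document value of the docvalues field of that name.
        bindings.add("_boost1", DoubleValuesSource.fromLongField("_boost1"));

        Query base = new TermQuery(new Term("field1", term));
        return new FunctionScoreQuery(base, expr.getDoubleValuesSource(bindings));
    }
}
```

An expression such as "_boost1 + _boost2" that never mentions _score would ignore text relevance entirely, which is probably not the intent here. Documents without a value in the docvalues field would most likely get a factor of 0 with this setup, so every document should carry one.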


Best regards



Re: Index-time boosting: Deprecated setBoost method

2019-10-21 Thread baris . kazar


Hi,-

I would like to ask the following to make things clearer (for me at least):

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);


This turns into the following, where the _boost1 field is associated with 
field1 and the _boost2 field is associated with field2.


In the indexing code:

Field f1 = new TextField("field1", "string1", Field.Store.YES);

Field _boost1 = new NumericDocValuesField("field1", 2L);
doc.add(_boost1);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)

Field _boost2 = new NumericDocValuesField("field2", 1L);
doc.add(_boost2);

// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well
… ( i will post this soon)


Now, in the searching code (i.e., at query time), do I still need 
FunctionScoreQuery, given that in this case the boost is just a constant 
value rather than a function? One could argue, though, that a constant is a 
function that returns the same value all the time.


Expression expr = JavascriptCompiler.compile("_boost");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();

bindings.add(new SortField("_boost1", SortField.Type.SCORE));

// Create a query that matches on field1 but scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);


So, if the boost is a single constant value, do we need the Javascript part 
above?


Best regards


On 10/18/19 4:07 PM, baris.ka...@oracle.com wrote:

Uwe,-

Can the doc example that you gave, 
https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html 
be extended with the NumericDocValuesField part that has to be done at 
indexing time, too?


I see now why you said that this is a mixed type of boosting (i.e., 
both indexing time and search time).


I then need to include the query from this example, applied to the 
_score field (I would call it the _boost field in my case), in my overall 
BooleanQuery.


I will now try to combine these and post the result here for future reference.

Best regards


On 10/18/19 3:18 PM, Uwe Schindler wrote:

Hi,

Read my original email! The index time values are written using 
NumericDocValuesField. The expressions docs also refer to that when 
the bindings are documented.


It's separate from the indexed data (TextField). Think of it like an 
additional numeric field in your database table with a factor in each 
row.


Uwe

On October 18, 2019 7:14:03 PM UTC, baris.ka...@oracle.com wrote:

Uwe,-

Two questions there:

I guess this is applicable to TextField, too.

And I was expecting an IndexWriter object in the example for index-time
boosting.

Best regards


On 10/18/19 2:57 PM, Uwe Schindler wrote:

Sorry, I was imprecise. It's a mix of both. The factors are stored per
document in the index (this is why I called it index time). At query
time, the expressions use the index-time values to fold them into the
query boost.

What's your problem with that approach?

Uwe

On October 18, 2019 6:50:40 PM UTC, baris.ka...@oracle.com wrote:

Uwe,-

Thanks. If possible, I am looking for a pure Java methodology to do
index-time boosting.

This example looks like a search-time boosting example:

https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html


Best regards

On 10/18/19 2:31 PM, Uwe Schindler wrote:

Hi,


> Is there a working example for this? Is this mentioned in the Lucene
> Javadocs or any other docs so that I can look it up?

To index the docvalues, see NumericDocValuesField (it can be added to
documents like indexed or stored fields). You may have used them for
sorting already.

> This methodology seems to discourage using index-time boosting.

Not really. Many use this all the time. It's one of the killer
features of both Solr and Elasticsearch. The problem was how
Document.setBoost() worked (it did not work correctly, see below).

> The previous setBoost method call was fine and easy to use.
> Did it have some performance issues, and is that why it was deprecated?

No, it was deprecated for several reasons: setBoost
was not doing what the user expected. Internally the boost value
was just multiplied into the document norm factor (which is internally
also a docvalues field). The norm factors are only very imprecise
floats stored i

Re: Parameterized queries in Lucene

2019-10-21 Thread Atri Sharma

I am curious: what use case are you trying to solve here?

In the relational world, this is useful primarily because prepared
statements eliminate the need to re-plan the query, thus saving the
cost of iterating over a potentially large combinatorial space. Lucene,
however, does not have much of a concept of a query plan (yet). Some
queries try to achieve that (IndexOrDocValuesQuery, for example), but it
is a far cry from what relational databases expose.

Atri

-- 
Regards,

Atri
Apache Concerted

Parameterized queries in Lucene

2019-10-21 Thread Stamatis Zampetakis

Hi all,

In the world of relational databases and SQL, the existence of
parameterized queries (aka. PreparedStatement) offers many advantages in
terms of security and performance.

I guess everybody is familiar with the idea that you prepare a statement
and then you execute it multiple times by just changing certain parameters.
A simple use case for demonstrating the idea
is shown below:

// An arbitrarily complex query with a part that has a single
// parameter of type int
Query q = ...
for (int i=0; i<100; i++) {
  int paramValue = i;
  q.visit(new ParameterSetter(paramValue));
  TopDocs docs = searcher.search(q, 10);
}

Note that this is a very simplistic use case and does not correspond to the
reality where the construction and execution are not done side by side.

I already implemented something to satisfy use-cases like the one shown
above by introducing a new subclass of Query. However, I was wondering if
there is already a mechanism to compile and execute queries with parameters
in Lucene and I am just reinventing the wheel.
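
One way to approximate this without a dedicated parameter mechanism, offered only as a sketch and not as a built-in Lucene feature: since Lucene Query objects are immutable and cheap to construct, a "prepared" query can be modeled as a function from the parameter value to a Query, with the fixed part built once and shared. The class QueryTemplateSketch and the field names "body" and "year" are hypothetical. It assumes Lucene 7.x or 8.x.

```java
import java.util.function.IntFunction;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper class, not part of Lucene.
public class QueryTemplateSketch {

    /** Returns a template whose single int parameter is filled in on each call. */
    public static IntFunction<Query> template() {
        // The parameter-independent part is built once and reused by every instantiation.
        Query fixedPart = new TermQuery(new Term("body", "lucene"));

        return param -> new BooleanQuery.Builder()
                .add(fixedPart, BooleanClause.Occur.MUST)
                // The parameterized part: an exact match on an int point field.
                .add(IntPoint.newExactQuery("year", param), BooleanClause.Occur.FILTER)
                .build();
    }
}
```

A caller would obtain the template once and then run something like searcher.search(template.apply(i), 10) inside the loop. Each call produces a fresh Query object, so nothing is mutated between searches.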

Feedback is much appreciated!

Best,
Stamatis
