Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer
I wonder if this repeats in version 7.7.2, too? Best regards On 10/21/19 5:22 PM, Shifflett, David [USA] wrote: Baris, Sorry, I neglected to add that piece. This test was run against 8.0.0, but I also want it to work in later versions. Another piece of my project is using 8.2.0. Thanks again for any info, David Shifflett On 10/21/19, 3:23 PM, "baris.ka...@oracle.com" wrote: David,- which version of Lucene are you using? Best regards On 10/21/19 1:31 PM, Shifflett, David [USA] wrote: > Hi all, > Using the code snippet: > ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer()); > String teststr = "\"Foo Bar\"~2"; > Query queryToSearch = qp.parse(teststr); > System.out.println("Query : " + queryToSearch.toString()); > System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName()); > > I am getting the output > Query : "Foo Bar"~2 > Type of query : ComplexPhraseQuery > > If I change teststr to "\"Foo Bar\"" > I get > Query : "Foo Bar" > Type of query : ComplexPhraseQuery > > If I change teststr to "Foo Bar" > I get > Query : content:foo content:bar > Type of query : BooleanQuery > > > In the first two cases I was expecting the search terms to be switched to lowercase. > > Were the Foo and Bar left as originally specified because the terms are inside double quotes? > > How can I specify a search term that I want treated as a Phrase, > but also have the query parser apply the LowerCaseFilter? > > I am hoping to avoid the need to handle this using PhraseQuery, > and continue to use the QueryParser. 
> > > Thanks in advance for any help you can give me, > David Shifflett > - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: [External] Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer
Baris, Sorry, I neglected to add that piece. This test was run against 8.0.0, but I also want it to work in later versions. Another piece of my project is using 8.2.0. Thanks again for any info, David Shifflett On 10/21/19, 3:23 PM, "baris.ka...@oracle.com" wrote: David,- which version of Lucene are you using? Best regards On 10/21/19 1:31 PM, Shifflett, David [USA] wrote: > Hi all, > Using the code snippet: > ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer()); > String teststr = "\"Foo Bar\"~2"; > Query queryToSearch = qp.parse(teststr); > System.out.println("Query : " + queryToSearch.toString()); > System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName()); > > I am getting the output > Query : "Foo Bar"~2 > Type of query : ComplexPhraseQuery > > If I change teststr to "\"Foo Bar\"" > I get > Query : "Foo Bar" > Type of query : ComplexPhraseQuery > > If I change teststr to "Foo Bar" > I get > Query : content:foo content:bar > Type of query : BooleanQuery > > > In the first two cases I was expecting the search terms to be switched to lowercase. > > Were the Foo and Bar left as originally specified because the terms are inside double quotes? > > How can I specify a search term that I want treated as a Phrase, > but also have the query parser apply the LowerCaseFilter? > > I am hoping to avoid the need to handle this using PhraseQuery, > and continue to use the QueryParser. > > > Thanks in advance for any help you can give me, > David Shifflett
Re: Iterating Over All Documents On a Changing Index
This is the right place to ask these questions indeed. This is a good way to iterate over documents. Regarding your 2nd question, Lucene IndexReaders are point-in-time views of the data, so changes won't become visible in-place. The tricky part of this kind of problem is usually dealing with documents that are getting indexed after you pulled a new reader and while you are in the process of reindexing. On Sat, Oct 19, 2019 at 1:35 AM Matt Davis wrote: > > Hi All, > > I am working on implementing an in-place reindex using Lucene. In my > case, I have a BSON document stored in a binary field and have a set of rules > that pull fields out of the BSON and index them into different Lucene > fields with different analyzers. I would like to be able to change these > rules / schema and then iterate over the documents, indexing them using the > new schema. > > I have come up with the following code block: > https://gist.github.com/mdavis95/f600e0a8233d0a1232eff77645d1dc8a > > I have two questions: > 1) Is this a good way to iterate over the documents? > 2) How can I manage documents changing while I am doing this? New documents > coming in should be fine, I believe, but changes to existing documents could > be lost if I understand correctly. > > I hope that this is the right place to ask this question and I apologize if > this is obvious or has been asked and answered. > > Thanks, > Matt -- Adrien
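The gist itself is not reproduced here, so as a hypothetical illustration of Adrien's two points, the sketch below walks every live (non-deleted) document leaf by leaf, and shows that a reader opened before a later commit keeps seeing the old data. It assumes the Lucene 8.x API (ByteBuffersDirectory, LeafReader#document); the class name, the "id" field and the sample documents are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;

final class LiveDocsWalk {

  /** Stored "id" of every live (non-deleted) document visible to this point-in-time reader. */
  static List<String> liveIds(IndexReader reader) throws IOException {
    List<String> ids = new ArrayList<>();
    for (LeafReaderContext ctx : reader.leaves()) {    // one leaf per segment
      LeafReader leaf = ctx.reader();
      Bits liveDocs = leaf.getLiveDocs();               // null: this segment has no deletions
      for (int docId = 0; docId < leaf.maxDoc(); docId++) {
        if (liveDocs != null && !liveDocs.get(docId)) {
          continue;                                     // skip deleted documents
        }
        Document stored = leaf.document(docId);         // loads stored fields (Lucene 8.x API)
        ids.add(stored.get("id"));
      }
    }
    return ids;
  }

  static Document docWithId(String id) {
    Document doc = new Document();
    doc.add(new StringField("id", id, Field.Store.YES)); // hypothetical key field
    return doc;
  }

  /** What an older reader sees versus a freshly opened one. */
  static final class Result {
    final List<String> snapshotIds;
    final int freshNumDocs;

    Result(List<String> snapshotIds, int freshNumDocs) {
      this.snapshotIds = snapshotIds;
      this.freshNumDocs = freshNumDocs;
    }
  }

  static Result demo() {
    try (Directory dir = new ByteBuffersDirectory();
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
      for (String id : new String[] {"a", "b", "c"}) {
        writer.addDocument(docWithId(id));
      }
      writer.deleteDocuments(new Term("id", "b"));
      writer.commit();
      try (DirectoryReader snapshot = DirectoryReader.open(dir)) {
        writer.addDocument(docWithId("d"));             // indexed after the snapshot was opened
        writer.commit();
        try (DirectoryReader fresh = DirectoryReader.open(dir)) {
          return new Result(liveIds(snapshot), fresh.numDocs());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Result r = demo();
    System.out.println(r.snapshotIds + " / fresh reader sees " + r.freshNumDocs + " docs");
  }
}
```

The older reader reports only "a" and "c" even after "d" is committed; picking up "d" requires opening (or reopening) a reader, which is exactly the window Adrien warns about during a reindex.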
Re: where can I register a new scorer in an existing query?
You could iterate manually over the leaves (IndexReader#leaves) of your IndexReader and call Weight#matches on every leaf? On Sat, Oct 19, 2019 at 7:41 PM Yoav Goldberg wrote: > > Hello, > > Is there a way to supply a new Scorer implementation to an existing query? > From what I've been able to understand, the only way to provide a new > scorer is to change the scorer() method of Weight, which in itself requires > implementing a new Weight, which in itself requires implementing a whole > new Query. Is there something I am missing here? > > To be more concrete about what I want to achieve (maybe there is a > different / better way): > I would like to collect the *match positions* of several sub-queries, for > all matching documents. The ideal interface would be to perform search as > usual, and supply a collector that has access to and collects this > information. Alternatively, to have it available in the returned results > object. > > The needed information is available during search (in the internal > iterators), but I did not find a way to access it. The Weight.matches() and > SpanWeight.getSpans() methods return the iterators I'd need, but they also > require a LeafReaderContext, which I believe is only available during > search? So I thought I'd create a scorer (that gets a context and gets > called for the relevant document), but I do not see a way to supply a > custom scorer. > > Any tips are greatly appreciated. > > Thanks! > > Yoav -- Adrien
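Adrien's suggestion can be sketched roughly as follows, assuming the Lucene 8.x search API: rewrite the query, create one Weight, then for each leaf iterate the matching documents via Weight#scorer and ask Weight#matches for the positions in a given field. The class name, the "body" field and the sample text are hypothetical, and only positions are read because TextField does not index offsets by default.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Matches;
import org.apache.lucene.search.MatchesIterator;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Bits;

final class MatchPositions {

  /** Returns "globalDoc:start-end" for every match of {@code query} in {@code field}. */
  static List<String> collect(IndexSearcher searcher, Query query, String field) throws IOException {
    List<String> out = new ArrayList<>();
    Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE, 1f);
    for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
      Scorer scorer = weight.scorer(ctx);
      if (scorer == null) {
        continue;                                     // nothing matches in this segment
      }
      Bits liveDocs = ctx.reader().getLiveDocs();     // the scorer does not skip deleted docs
      DocIdSetIterator it = scorer.iterator();
      for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
        if (liveDocs != null && !liveDocs.get(doc)) {
          continue;
        }
        Matches matches = weight.matches(ctx, doc);   // per-document match information
        MatchesIterator mi = matches == null ? null : matches.getMatches(field);
        while (mi != null && mi.next()) {
          out.add((ctx.docBase + doc) + ":" + mi.startPosition() + "-" + mi.endPosition());
        }
      }
    }
    return out;
  }

  static List<String> demo() {
    try (Directory dir = new ByteBuffersDirectory()) {
      try (IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        for (String text : new String[] {"quick brown fox", "lazy brown dog"}) {
          Document doc = new Document();
          doc.add(new TextField("body", text, Field.Store.NO));
          w.addDocument(doc);
        }
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        return collect(searcher, new TermQuery(new Term("body", "brown")), "body");
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

This runs as a second pass outside IndexSearcher#search; for several sub-queries, MatchesIterator#getSubMatches may be worth exploring, but that is not shown here.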
Re: Index-time boosting: Deprecated setBoost method
No. That's how you do it: BooleanQuery with 2 SHOULD clauses. Or use a different query parser that offers this out of the box. Uwe On October 21, 2019 7:16:01 PM UTC, baris.ka...@oracle.com wrote: >Hi,- > >Thanks. > > let's apply it to this case: > >QueryParser parser = new QueryParser("field1", analyzer); >parser.setPhraseSlop(2); >Query query = parser.parse("some string value here"+"*"); >TopDocs hits = indexsearcherObject.search(query, 10); > >Now i want to use BoostQuery > >QueryParser parser = new QueryParser("field1", analyzerObject); >parser.setPhraseSlop(2); >Query query = parser.parse("some string value here"+"*"); > >BoostQuery bq = new BoostQuery(query, 2.0f); > >TopDocs hits = indexsearcherObject.search(bq, 10); > > >Now how will i process field2 with boost value 1.0f? > >Before, this was being done at index time. > > >i can see the only way here is the BooleanQuery which combines > >the first boostquery object bq and another one that i need to define >for >bq2 for field2. > >is there any other way? > >Best regards > > > >On 10/21/19 2:33 PM, Uwe Schindler wrote: >> Hi Boris, >> >>> That is ok, and i can see this case would be best with BoostQuery >and >>> also i don't have to use lucene expression jar and its dependents. >>> >>> However, i am curious how to do this kind of field based boosting at >>> index time even though i will prefer the query time boosting >methodology. >> The reason why it was deprecated is exactly the problem I mentioned >before: It never did what the user expected. The boost factor given >in the document's field was multiplied into the per document norms. >Unfortunately, at the same time, the query normalization was using query >statistics and normalized the scores. As Lucene is working per field, >the same normalization is done per field, causing the constant >factor per field to disappear. There was still some effect of index >time boosting if different documents had different values, but in your >case all are the same. 
I am not sure how your queries worked before, but >the constant boost factors per field at index time definitely did not >have the effect you were thinking of. Since the earliest version of >Lucene, boosting at query time was the way to go to have different >weights per field. >> >> The new feature in Lucene is now that you can change the score per >document using docvalues and apply that per document at query time. >Previously this was also possible with Document/Field#setBoost, but the >flexibility was missing (only multiplying and limited precision). In >addition, the normalization effects made the whole thing unreliable. >> >> Uwe >> >>> Best regards >>> >>> >>> On 10/21/19 12:54 PM, Uwe Schindler wrote: Hi, As I said before, that is a misuse of index-time boosting. In >addition in >>> previous versions it did not even work correctly, because of query >>> normalization it was normalized away anyway. And on top, to change >it >>> you have to reindex. What you intend to do is a typical use case for query time boosting >with >>> BoostQuery. That is explained in almost every book about search, >like those >>> about Solr or Elasticsearch. Most query parsers also allow adding boost factors for fields, >e.g. >>> SimpleQueryParser (for humans that need simple syntax without >fields). >>> There you give a list of fields and boost factors. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: baris.ka...@oracle.com > Sent: Monday, October 21, 2019 6:45 PM > To: java-user@lucene.apache.org > Cc: baris.kazar > Subject: Re: Index-time boosting: Deprecated setBoost method > > Hi,- > > Thanks and i appreciate the discussion. 
> > Let me please ask this way, i think i give too much info at one >time: > > Currently i have this: > > > >Field f1= new TextField("field1", "string1", Field.Store.YES); > > > doc.add(f1); >f1.setBoost(2.0f); > > > > Field f2 = new TextField("field2", "string2", Field.Store.YES); > > > doc.add(f2); > > > f2.setBoost(1.0f); > > > > > But this fails with Lucene 7.7.2. > > > Probably it is more efficient and more flexible to fix this by >using > BoostQuery. > > However, what could be the fix with index time boosting? the code >in my > previous post was trying to do that. > > Best regards > > > On 10/21/19 12:34 PM, Uwe Schindler wrote: >> Hi, >> >> sorry I don't fully understand what you intend to do? If the >boost values > per field are static and u
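A minimal sketch of Uwe's answer applied to Baris's two-field case, assuming Lucene 7.x/8.x with the lucene-queryparser module: each field gets its own parsed clause, field1 is wrapped in a BoostQuery with a float boost of 2.0, and the two clauses are combined as SHOULD clauses. The field names and boost values come from the thread; the class name is hypothetical.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;

final class FieldBoosts {

  /** Same user text against field1 (weight 2.0) and field2 (weight 1.0), combined as SHOULD clauses. */
  static Query build(String userText, Analyzer analyzer) {
    try {
      QueryParser p1 = new QueryParser("field1", analyzer);
      p1.setPhraseSlop(2);
      QueryParser p2 = new QueryParser("field2", analyzer);
      p2.setPhraseSlop(2);

      Query q1 = new BoostQuery(p1.parse(userText), 2.0f); // boost is a float, not the String "2.0f"
      Query q2 = p2.parse(userText);                        // 1.0 is the default, so no BoostQuery needed

      return new BooleanQuery.Builder()
          .add(q1, BooleanClause.Occur.SHOULD)              // a document may match either field;
          .add(q2, BooleanClause.Occur.SHOULD)              // scores of matching clauses are summed
          .build();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Unparseable query: " + userText, e);
    }
  }

  static Query build(String userText) {
    return build(userText, new StandardAnalyzer());
  }

  public static void main(String[] args) {
    // e.g. indexsearcherObject.search(FieldBoosts.build("some string value here*"), 10)
    System.out.println(build("string1"));
  }
}
```

Because the weights live in the query, changing them later does not require reindexing, which is the flexibility Uwe points to.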
Re: ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer
David,- which version of Lucene are you using? Best regards On 10/21/19 1:31 PM, Shifflett, David [USA] wrote: Hi all, Using the code snippet: ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer()); String teststr = "\"Foo Bar\"~2"; Query queryToSearch = qp.parse(teststr); System.out.println("Query : " + queryToSearch.toString()); System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName()); I am getting the output Query : "Foo Bar"~2 Type of query : ComplexPhraseQuery If I change teststr to "\"Foo Bar\"" I get Query : "Foo Bar" Type of query : ComplexPhraseQuery If I change teststr to "Foo Bar" I get Query : content:foo content:bar Type of query : BooleanQuery In the first two cases I was expecting the search terms to be switched to lowercase. Were the Foo and Bar left as originally specified because the terms are inside double quotes? How can I specify a search term that I want treated as a Phrase, but also have the query parser apply the LowerCaseFilter? I am hoping to avoid the need to handle this using PhraseQuery, and continue to use the QueryParser. Thanks in advance for any help you can give me, David Shifflett
Re: Index-time boosting: Deprecated setBoost method
Hi,- Thanks. let's apply it to this case: QueryParser parser = new QueryParser("field1", analyzer); parser.setPhraseSlop(2); Query query = parser.parse("some string value here"+"*"); TopDocs hits = indexsearcherObject.search(query, 10); Now i want to use BoostQuery: QueryParser parser = new QueryParser("field1", analyzerObject); parser.setPhraseSlop(2); Query query = parser.parse("some string value here"+"*"); BoostQuery bq = new BoostQuery(query, 2.0f); TopDocs hits = indexsearcherObject.search(bq, 10); Now how will i process field2 with boost value 1.0f? Before, this was being done at index time. i can see the only way here is the BooleanQuery which combines the first boostquery object bq and another one that i need to define for bq2 for field2. is there any other way? Best regards On 10/21/19 2:33 PM, Uwe Schindler wrote: Hi Boris, That is ok, and i can see this case would be best with BoostQuery and also i don't have to use lucene expression jar and its dependents. However, i am curious how to do this kind of field based boosting at index time even though i will prefer the query time boosting methodology. The reason why it was deprecated is exactly the problem I mentioned before: It never did what the user expected. The boost factor given in the document's field was multiplied into the per document norms. Unfortunately, at the same time, the query normalization was using query statistics and normalized the scores. As Lucene is working per field, the same normalization is done per field, causing the constant factor per field to disappear. There was still some effect of index time boosting if different documents had different values, but in your case all are the same. I am not sure how your queries worked before, but the constant boost factors per field at index time definitely did not have the effect you were thinking of. Since the earliest version of Lucene, boosting at query time was the way to go to have different weights per field. 
The new feature in Lucene is now that you can change the score per document using docvalues and apply that per document at query time. Previously this was also possible with Document/Field#setBoost, but the flexibility was missing (only multiplying and limited precision). In addition, the normalization effects made the whole thing unreliable. Uwe Best regards On 10/21/19 12:54 PM, Uwe Schindler wrote: Hi, As I said before, that is a misuse of index-time boosting. In addition in previous versions it did not even work correctly, because of query normalization it was normalized away anyway. And on top, to change it you have to reindex. What you intend to do is a typical use case for query time boosting with BoostQuery. That is explained in almost every book about search, like those about Solr or Elasticsearch. Most query parsers also allow adding boost factors for fields, e.g. SimpleQueryParser (for humans that need simple syntax without fields). There you give a list of fields and boost factors. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: baris.ka...@oracle.com Sent: Monday, October 21, 2019 6:45 PM To: java-user@lucene.apache.org Cc: baris.kazar Subject: Re: Index-time boosting: Deprecated setBoost method Hi,- Thanks and i appreciate the discussion. Let me please ask this way, i think i give too much info at one time: Currently i have this: Field f1= new TextField("field1", "string1", Field.Store.YES); doc.add(f1); f1.setBoost(2.0f); Field f2 = new TextField("field2", "string2", Field.Store.YES); doc.add(f2); f2.setBoost(1.0f); But this fails with Lucene 7.7.2. 
Probably it is more efficient and more flexible to fix this by using BoostQuery. However, what could be the fix with index time boosting? the code in my previous post was trying to do that. Best regards On 10/21/19 12:34 PM, Uwe Schindler wrote: Hi, sorry I don't fully understand what you intend to do? If the boost values per field are static and used with exactly the same value for every document, it's not needed at index time. You can just boost the field on the query side (e.g. using BoostQuery). Boosting every document with the same static values is an anti-pattern, that's something better suited for the query side - as you are more flexible. If you need a different boost value per document, you can save that boost value in the index per document using a docvalues field (this consumes extra space, of course). Then you need the ExpressionQuery on the query side. But just because it looks like Javascript, it's not slow. The syntax is compiled to bytecode and directly in
RE: Index-time boosting: Deprecated setBoost method
Hi Boris, > That is ok, and i can see this case would be best with BoostQuery and > also i don't have to use lucene expression jar and its dependents. > > However, i am curious how to do this kind of field based boosting at > index time even though i will prefer the query time boosting methodology. The reason why it was deprecated is exactly the problem I mentioned before: It never did what the user expected. The boost factor given in the document's field was multiplied into the per document norms. Unfortunately, at the same time, the query normalization was using query statistics and normalized the scores. As Lucene is working per field, the same normalization is done per field, causing the constant factor per field to disappear. There was still some effect of index time boosting if different documents had different values, but in your case all are the same. I am not sure how your queries worked before, but the constant boost factors per field at index time definitely did not have the effect you were thinking of. Since the earliest version of Lucene, boosting at query time was the way to go to have different weights per field. The new feature in Lucene is now that you can change the score per document using docvalues and apply that per document at query time. Previously this was also possible with Document/Field#setBoost, but the flexibility was missing (only multiplying and limited precision). In addition, the normalization effects made the whole thing unreliable. Uwe > Best regards > > > On 10/21/19 12:54 PM, Uwe Schindler wrote: > > Hi, > > > > As I said before, that is a misuse of index-time boosting. In addition in > previous versions it did not even work correctly, because of query > normalization it was normalized away anyway. And on top, to change it > you have to reindex. > > > > What you intend to do is a typical use case for query time boosting with > BoostQuery. 
That is explained in almost every book about search, like those > about Solr or Elasticsearch. > > > > Most query parsers also allow adding boost factors for fields, e.g. > SimpleQueryParser (for humans that need simple syntax without fields). > There you give a list of fields and boost factors. > > > > Uwe > > > > - > > Uwe Schindler > > Achterdiek 19, D-28357 Bremen > > https://www.thetaphi.de > > eMail: u...@thetaphi.de > > > >> -Original Message- > >> From: baris.ka...@oracle.com > >> Sent: Monday, October 21, 2019 6:45 PM > >> To: java-user@lucene.apache.org > >> Cc: baris.kazar > >> Subject: Re: Index-time boosting: Deprecated setBoost method > >> > >> Hi,- > >> > >> Thanks and i appreciate the discussion. > >> > >> Let me please ask this way, i think i give too much info at one time: > >> > >> Currently i have this: > >> > >> Field f1= new TextField("field1", "string1", Field.Store.YES); > >> > >> doc.add(f1); f1.setBoost(2.0f); > >> > >> Field f2 = new TextField("field2", "string2", Field.Store.YES); > >> > >> doc.add(f2); > >> > >> f2.setBoost(1.0f); > >> > >> > >> But this fails with Lucene 7.7.2. > >> > >> > >> Probably it is more efficient and more flexible to fix this by using > >> BoostQuery. > >> > >> However, what could be the fix with index time boosting? the code in my > >> previous post was trying to do that. > >> > >> Best regards > >> > >> > >> On 10/21/19 12:34 PM, Uwe Schindler wrote: > >>> Hi, > >>> > >>> sorry I don't fully understand what you intend to do? If the boost values > >> per field are static and used with exactly the same value for every document, > it's > >> not needed at index time. You can just boost the field on the query side > (e.g. > >> using BoostQuery). 
Boosting every document with the same static values > is > >> an anti-pattern, that's something better suited for the query side - as you > are > >> more flexible. > >>> If you need a different boost value per document, you can save that > boost > >> value in the index per document using a docvalues field (this consumes > extra > >> space, of course). Then you need the ExpressionQuery on the query side. > But > >> just because it looks like Javascript, it's not slow. The syntax is > >> compiled to > >> bytecode and directly included into the query execution as a dynamic java > >> class, so it's very fast. > >>> In short: > >>> - If you need to have a different boost factor per field name that's > constant > >> for all documents, apply it at query time with BoostQuery. > >>> - If you have to boost specific documents (e.g., top selling products), > index > >> a numeric docvalues field per document. On the query side you can use > >> different query types to modify the score of each result based on the > >> docvalues field. That can be done
ComplexPhraseQueryParser isn't switching search terms to lowercase with StandardAnalyzer
Hi all, Using the code snippet: ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", new StandardAnalyzer()); String teststr = "\"Foo Bar\"~2"; Query queryToSearch = qp.parse(teststr); System.out.println("Query : " + queryToSearch.toString()); System.out.println("Type of query : " + queryToSearch.getClass().getSimpleName()); I am getting the output Query : "Foo Bar"~2 Type of query : ComplexPhraseQuery If I change teststr to "\"Foo Bar\"" I get Query : "Foo Bar" Type of query : ComplexPhraseQuery If I change teststr to "Foo Bar" I get Query : content:foo content:bar Type of query : BooleanQuery In the first two cases I was expecting the search terms to be switched to lowercase. Were the Foo and Bar left as originally specified because the terms are inside double quotes? How can I specify a search term that I want treated as a Phrase, but also have the query parser apply the LowerCaseFilter? I am hoping to avoid the need to handle this using PhraseQuery, and continue to use the QueryParser. Thanks in advance for any help you can give me, David Shifflett
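One way to check which terms the phrase actually searches for is to rewrite the parsed query against a reader and print the rewritten form, rather than printing the freshly parsed query. This is a minimal sketch assuming Lucene 8.x with the lucene-queryparser module on the classpath; the class name and the one-document in-memory index are hypothetical. Our reading (not confirmed in this thread) is that ComplexPhraseQuery#toString() echoes the raw phrase text, while the phrase contents are run through the analyzer before the query is rewritten into span queries.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

final class ComplexPhraseDebug {

  /** Parses {@code text} with ComplexPhraseQueryParser and returns the rewritten query as a string. */
  static String rewrittenForm(String text) {
    try (StandardAnalyzer analyzer = new StandardAnalyzer();
         Directory dir = new ByteBuffersDirectory()) {
      // A tiny in-memory index, only so that there is a reader to rewrite against.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
        Document doc = new Document();
        doc.add(new TextField("somefield", "foo bar", Field.Store.NO));
        writer.addDocument(doc);
      } // closing the writer commits
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        ComplexPhraseQueryParser qp = new ComplexPhraseQueryParser("somefield", analyzer);
        Query parsed = qp.parse(text);
        // Rewriting turns the ComplexPhraseQuery into the query that is actually executed.
        Query rewritten = new IndexSearcher(reader).rewrite(parsed);
        return rewritten.toString();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Could not parse: " + text, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(rewrittenForm("\"Foo Bar\"~2"));
    System.out.println(rewrittenForm("\"Foo Bar\""));
  }
}
```

If the rewritten output shows lowercased terms, the analyzer (and its LowerCaseFilter) is being applied and only the printed form of the unrewritten query differs.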
Re: Index-time boosting: Deprecated setBoost method
Hi,- That is ok, and i can see this case would be best with BoostQuery and also i don't have to use lucene expression jar and its dependents. However, i am curious how to do this kind of field based boosting at index time even though i will prefer the query time boosting methodology. Best regards On 10/21/19 12:54 PM, Uwe Schindler wrote: Hi, As I said before, that is a misuse of index-time boosting. In addition in previous versions it did not even work correctly, because of query normalization it was normalized away anyway. And on top, to change it you have to reindex. What you intend to do is a typical use case for query time boosting with BoostQuery. That is explained in almost every book about search, like those about Solr or Elasticsearch. Most query parsers also allow adding boost factors for fields, e.g. SimpleQueryParser (for humans that need simple syntax without fields). There you give a list of fields and boost factors. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: baris.ka...@oracle.com Sent: Monday, October 21, 2019 6:45 PM To: java-user@lucene.apache.org Cc: baris.kazar Subject: Re: Index-time boosting: Deprecated setBoost method Hi,- Thanks and i appreciate the discussion. Let me please ask this way, i think i give too much info at one time: Currently i have this: Field f1= new TextField("field1", "string1", Field.Store.YES); doc.add(f1); f1.setBoost(2.0f); Field f2 = new TextField("field2", "string2", Field.Store.YES); doc.add(f2); f2.setBoost(1.0f); But this fails with Lucene 7.7.2. Probably it is more efficient and more flexible to fix this by using BoostQuery. However, what could be the fix with index time boosting? 
the code in my previous post was trying to do that. Best regards On 10/21/19 12:34 PM, Uwe Schindler wrote: Hi, sorry I don't fully understand what you intend to do? If the boost values per field are static and used with exactly the same value for every document, it's not needed at index time. You can just boost the field on the query side (e.g. using BoostQuery). Boosting every document with the same static values is an anti-pattern, that's something better suited for the query side - as you are more flexible. If you need a different boost value per document, you can save that boost value in the index per document using a docvalues field (this consumes extra space, of course). Then you need the ExpressionQuery on the query side. But just because it looks like Javascript, it's not slow. The syntax is compiled to bytecode and directly included into the query execution as a dynamic java class, so it's very fast. In short: - If you need to have a different boost factor per field name that's constant for all documents, apply it at query time with BoostQuery. - If you have to boost specific documents (e.g., top selling products), index a numeric docvalues field per document. On the query side you can use different query types to modify the score of each result based on the docvalues field. That can be done with Expression modules (using compiled Javascript) or by another query in Lucene that operates on ValueSource (e.g., FunctionQuery). 
The first one is easier to use for complex formulas. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: baris.ka...@oracle.com Sent: Monday, October 21, 2019 5:17 PM To: java-user@lucene.apache.org Cc: baris.kazar Subject: Re: Index-time boosting: Deprecated setBoost method Hi,- Sorry about the missing parts in previous post. please accept my apologies for that. i needed to add a few more questions/corrections/additions to the previous post: Main Question was: if boost is a single constant value, do we need the Javascript part below? === Indexing code snippet for Lucene version 6.6.0 and before=== Document doc = new Document(); Field f1= new TextField("field1", "string1", Field.Store.YES); doc.add(f1); f1.setBoost(2.0f); Field f2 = new TextField("field2", "string2", Field.Store.YES); doc.add(f2); f2.setBoost(1.0f); === end of indexing code snippet for Lucene version 6.6.0 and before === This turns into this where _boost1 field is associated with field1 and _boost2 field is associated with field2 field: In Indexing code: === beginning of indexing code snippet === Field f1= new TextField("field1",
RE: Index-time boosting: Deprecated setBoost method
Hi, As I said before, that is a misuse of index-time boosting. In addition in previous versions it did not even work correctly, because of query normalization it was normalized away anyway. And on top, to change it you have to reindex. What you intend to do is a typical use case for query time boosting with BoostQuery. That is explained in almost every book about search, like those about Solr or Elasticsearch. Most query parsers also allow adding boost factors for fields, e.g. SimpleQueryParser (for humans that need simple syntax without fields). There you give a list of fields and boost factors. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen https://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: baris.ka...@oracle.com > Sent: Monday, October 21, 2019 6:45 PM > To: java-user@lucene.apache.org > Cc: baris.kazar > Subject: Re: Index-time boosting: Deprecated setBoost method > > Hi,- > > Thanks and i appreciate the discussion. > > Let me please ask this way, i think i give too much info at one time: > > Currently i have this: > > Field f1= new TextField("field1", "string1", Field.Store.YES); > > doc.add(f1); f1.setBoost(2.0f); > > Field f2 = new TextField("field2", "string2", Field.Store.YES); > > doc.add(f2); > > f2.setBoost(1.0f); > > > But this fails with Lucene 7.7.2. > > > Probably it is more efficient and more flexible to fix this by using > BoostQuery. > > However, what could be the fix with index time boosting? the code in my > previous post was trying to do that. > > Best regards > > > On 10/21/19 12:34 PM, Uwe Schindler wrote: > > Hi, > > > > sorry I don't fully understand what you intend to do? If the boost values > per field are static and used with exactly the same value for every document, it's > not needed at index time. You can just boost the field on the query side (e.g. > using BoostQuery). 
Boosting every document with the same static values is > an anti-pattern, that's something better suited for the query side - as you > are > more flexible. > > > > If you need a different boost value per document, you can save that boost > value in the index per document using a docvalues field (this consumes extra > space, of course). Then you need the ExpressionQuery on the query side. But > just because it looks like Javascript, it's not slow. The syntax is compiled > to > bytecode and directly included into the query execution as a dynamic java > class, so it's very fast. > > > > In short: > > - If you need to have a different boost factor per field name that's > > constant > for all documents, apply it at query time with BoostQuery. > > - If you have to boost specific documents (e.g., top selling products), > > index > a numeric docvalues field per document. On the query side you can use > different query types to modify the score of each result based on the > docvalues field. That can be done with Expression modules (using compiled > Javascript) or by another query in Lucene that operates on ValueSource (e.g., > FunctionQuery). The first one is easier to use for complex formulas. > > > > Uwe > > > > - > > Uwe Schindler > > Achterdiek 19, D-28357 Bremen > > https://www.thetaphi.de > > eMail: u...@thetaphi.de > > > >> -Original Message- > >> From: baris.ka...@oracle.com > >> Sent: Monday, October 21, 2019 5:17 PM > >> To: java-user@lucene.apache.org > >> Cc: baris.kazar > >> Subject: Re: Index-time boosting: Deprecated setBoost method > >> > >> Hi,- > >> > >> Sorry about the missing parts in previous post. please accept my > >> apologies for that. 
> >> > >> i needed to add a few more questions/corrections/additions to the > >> previous post: > >> > >> Main Question was: if boost is a single constant value, do we need the > >> Javascript part below? > >> > >> > >> > >> === Indexing code snippet for Lucene version 6.6.0 and before=== > >> > >> Document doc = new Document(); > >> > >> > >> Field f1= new TextField("field1", "string1", Field.Store.YES); > >> > >> doc.add(f1); f1.setBoost(2.0f); > >> > >> Field f2 = new TextField("field2", "string2", Field.Store.YES); > >> > >> doc.add(f2); > >> > >> f2.setBoost(1.0f); > >> > >> === end of indexing code snippet for Lucene version 6.6.0 and before === > >> > >> > >> This turns into this where _boost1 field is associated with field1 and > >> > >> _boost2 field is associated with field2 field: > >> > >> > >> In Indexing code: > >> > >> === begining of indexing code snippet === > >> Field f1= new TextField("field1", "string1", Field.Store.YES); > >> > >> Field _boost1 = new NumericDocValuesField(“field1”, 2L); > >> doc.add(_boost1); > >> > >> // If this boost value needs to be stored, a separate sto
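A minimal sketch of the SimpleQueryParser option mentioned above, assuming the lucene-queryparser module is on the classpath. It reuses the field names and factors from the setBoost code discussed in this thread (field1 at 2.0, field2 at 1.0); the weights map is applied at query time, so changing a factor does not require reindexing.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.simple.SimpleQueryParser;
import org.apache.lucene.search.Query;

public class FieldBoostParserExample {

  // Parses free-text user input against several fields, each with a query-time weight.
  // Field names and weights are taken from the thread's example, not from a real schema.
  static Query parseWithFieldBoosts(String userInput) {
    Map<String, Float> weights = new HashMap<>();
    weights.put("field1", 2.0f); // takes over the role of f1.setBoost(2.0f)
    weights.put("field2", 1.0f); // takes over the role of f2.setBoost(1.0f)
    SimpleQueryParser parser = new SimpleQueryParser(new StandardAnalyzer(), weights);
    return parser.parse(userInput);
  }

  public static void main(String[] args) {
    System.out.println(parseWithFieldBoosts("string1"));
  }
}
```

The exact toString() of the parsed query depends on the map's iteration order, so only its parts are worth relying on.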
Re: Index-time boosting: Deprecated setBoost method
Hi,

Thanks, I appreciate the discussion. Let me ask it this way, since I think I gave too much information at once. Currently I have this:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

But this fails with Lucene 7.7.2, where index-time boosts (and Field.setBoost) no longer exist.

Using BoostQuery is probably the more efficient and more flexible fix. However, what would the fix be with index-time boosting? The code in my previous post was trying to do that.

Best regards

On 10/21/19 12:34 PM, Uwe Schindler wrote:
> [...]
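The setBoost snippet above could be expressed at query time roughly as follows. This is a sketch assuming the same term is searched in both fields (the "string1" in main is only a placeholder): field1's clause is wrapped in a BoostQuery with factor 2.0, field2 keeps the default boost of 1.0, and both are combined as SHOULD clauses of a BooleanQuery.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryTimeFieldBoost {

  // Query-time equivalent of f1.setBoost(2.0f) / f2.setBoost(1.0f):
  // a match in field1 counts twice as much as a match in field2.
  static Query boostedFieldQuery(String term) {
    Query q1 = new BoostQuery(new TermQuery(new Term("field1", term)), 2.0f);
    // 1.0 is the default boost, so field2 needs no BoostQuery wrapper.
    Query q2 = new TermQuery(new Term("field2", term));
    return new BooleanQuery.Builder()
        .add(q1, BooleanClause.Occur.SHOULD)
        .add(q2, BooleanClause.Occur.SHOULD)
        .build();
  }

  public static void main(String[] args) {
    System.out.println(boostedFieldQuery("string1"));
  }
}
```

Because the factors live in the query, they can be tuned per request without touching the index.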
RE: Index-time boosting: Deprecated setBoost method
Hi,

Sorry, I don't fully understand what you intend to do. If the boost values per field are static and exactly the same for every document, they are not needed at index time. You can just boost the field on the query side (e.g. using BoostQuery). Boosting every document with the same static values is an anti-pattern; that is better suited to the query side, where you are more flexible.

If you need a different boost value per document, you can save that boost value in the index per document using a docvalues field (this consumes extra space, of course). Then you need an expressions-based query (from the expressions module) on the query side. Although the syntax looks like JavaScript, it is not slow: it is compiled to bytecode and included directly in query execution as a dynamic Java class, so it's very fast.

In short:
- If you need a different boost factor per field name that is constant for all documents, apply it at query time with BoostQuery.
- If you have to boost specific documents (e.g., top-selling products), index a numeric docvalues field per document. On the query side you can use different query types to modify the score of each result based on the docvalues field. That can be done with the expressions module (using compiled JavaScript) or with another Lucene query that operates on a ValueSource (e.g., FunctionQuery). The first one is easier to use for complex formulas.

Uwe

> -----Original Message-----
> From: baris.ka...@oracle.com
> Sent: Monday, October 21, 2019 5:17 PM
> To: java-user@lucene.apache.org
> Cc: baris.kazar
> Subject: Re: Index-time boosting: Deprecated setBoost method
> [...]
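A sketch of the second case in the list above, assuming a hypothetical long-valued docvalues field named popularity and a text field named body. Each document carries its own factor as a NumericDocValuesField, and at query time FunctionScoreQuery.boostByValue multiplies the relevance score by that value. It assumes a Lucene release that provides boostByValue (8.x does); the field names are illustrative only.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class PerDocumentBoost {

  // Index side: the per-document factor is an extra numeric docvalues field.
  // "body" and "popularity" are hypothetical field names.
  static Document productDoc(String text, long popularity) {
    Document doc = new Document();
    doc.add(new TextField("body", text, Field.Store.YES));
    doc.add(new NumericDocValuesField("popularity", popularity));
    return doc;
  }

  // Query side: multiply each matching document's score by its "popularity" value.
  static FunctionScoreQuery boostedByPopularity(String field, String text) {
    Query match = new TermQuery(new Term(field, text));
    DoubleValuesSource popularity = DoubleValuesSource.fromLongField("popularity");
    return FunctionScoreQuery.boostByValue(match, popularity);
  }

  public static void main(String[] args) {
    System.out.println(productDoc("lucene in action", 42L));
    System.out.println(boostedByPopularity("body", "lucene"));
  }
}
```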
Re: Index-time boosting: Deprecated setBoost method
Hi,

Sorry about the missing parts in my previous post; please accept my apologies for that. I need to add a few more questions, corrections and additions to it.

The main question was: if the boost is a single constant value, do we need the JavaScript part below?

=== Indexing code snippet for Lucene version 6.6.0 and before ===
Document doc = new Document();
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);
=== end of indexing code snippet for Lucene version 6.6.0 and before ===

This turns into the following, where the _boost1 field is associated with field1 and the _boost2 field is associated with field2. In the indexing code:

=== beginning of indexing code snippet ===
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField instance needs to be added as well … (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField instance needs to be added as well … (I will post this soon)
=== end of indexing code snippet ===

Now, in the searching code (i.e., at query time), do I need the FunctionScoreQuery, given that in this case the boost is just a constant value and not a function? However, a constant value can be argued to be a function that always returns the same value, too.

=== beginning of query-time code snippet ===
Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
// I think these have to be LONG type, since NumericDocValuesField accepts only "long" values; am I right? Can this be DOUBLE type?
bindings.add(new SortField("_boost1", SortField.Type.LONG));
bindings.add(new SortField("_boost2", SortField.Type.LONG)); // same question here
// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));
searcher.search(query, 10);
=== end of code snippet ===

Best regards

On 10/21/19 11:05 AM, baris.ka...@oracle.com wrote:
> [...]
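On the LONG vs. DOUBLE question in the snippet above: a sketch assuming fractional factors are wanted, which stores them with DoubleDocValuesField and reads them back with DoubleValuesSource.fromDoubleField. It binds variables with SimpleBindings.add(String, DoubleValuesSource) rather than the SortField overload; as far as I know, a SortField binding with SortField.Type.DOUBLE resolves to the same kind of source. Field names follow the thread; the query term is a placeholder.

```java
import java.text.ParseException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoubleDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class DoubleBoostBindings {

  // Index side: DoubleDocValuesField stores the double's raw bits as a numeric docvalue.
  static Document buildDoc() {
    Document doc = new Document();
    doc.add(new TextField("field1", "string1", Field.Store.YES));
    doc.add(new DoubleDocValuesField("_boost1", 2.0));
    doc.add(new TextField("field2", "string2", Field.Store.YES));
    doc.add(new DoubleDocValuesField("_boost2", 1.0));
    return doc;
  }

  // Query side: fromDoubleField decodes those bits back into doubles for the expression.
  static Query buildQuery(String term) throws ParseException {
    Expression expr = JavascriptCompiler.compile("_boost1 + _boost2");
    SimpleBindings bindings = new SimpleBindings();
    bindings.add("_boost1", DoubleValuesSource.fromDoubleField("_boost1"));
    bindings.add("_boost2", DoubleValuesSource.fromDoubleField("_boost2"));
    return new FunctionScoreQuery(
        new TermQuery(new Term("field1", term)),
        expr.getDoubleValuesSource(bindings));
  }

  public static void main(String[] args) throws ParseException {
    System.out.println(buildDoc());
    System.out.println(buildQuery("string1"));
  }
}
```

Note that "_boost1 + _boost2" does not reference _score, so it replaces the relevance score entirely with a per-document constant. To keep relevance, an expression such as "_score * _boost1" with "_score" bound to DoubleValuesSource.SCORES would be needed; and if the factors are identical for every document, the BoostQuery approach avoids the extra docvalues storage.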
Re: Index-time boosting: Deprecated setBoost method
Hi,

I would like to ask the following to make it clearer (for me at least):

Document doc = new Document();
Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);
Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);

This turns into the following, where the _boost1 field is associated with field1 and the _boost2 field is associated with field2. In the indexing code:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField instance needs to be added as well … (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField instance needs to be added as well … (I will post this soon)

Now, in the searching code (i.e., at query time), do I need the FunctionScoreQuery, given that in this case the boost is just a constant value and not a function? However, a constant value can be argued to be a function that always returns the same value, too.

Expression expr = JavascriptCompiler.compile("_boost1");
// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_boost1", SortField.Type.LONG));
// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));
searcher.search(query, 10);

So, if the boost is a single constant value, do we need the JavaScript part above?
Best regards

On 10/18/19 4:07 PM, baris.ka...@oracle.com wrote:
> Uwe,
> Can this doc example that you also gave (https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html) be extended with the NumericDocValuesField part that needs to be done for index-time boosting, too? I now see why you meant that this is a mixed type of boosting (i.e., both index time and search time). I then need to include the query mentioned in this example on the _score field (I would call it the _boost field in my case) in my overall BooleanQuery. I will now try to combine these and post here for future help.
> Best regards
>
> On 10/18/19 3:18 PM, Uwe Schindler wrote:
> Hi,
> Read my original email! The index-time values are written using NumericDocValuesField. The expressions docs also refer to that where the bindings are documented. It's separate from the indexed data (TextField). Think of it like an additional numeric column in your database table with a factor in each row.
> Uwe
>
> On October 18, 2019 7:14:03 PM UTC, baris.ka...@oracle.com wrote:
> Uwe,
> Two questions there: I guess this is applicable to TextField, too. And I was expecting an IndexWriter object in the example for index-time boosting.
> Best regards
>
> On 10/18/19 2:57 PM, Uwe Schindler wrote:
> Sorry, I was imprecise. It's a mix of both. The factors are stored per document in the index (this is why I called it index time). At query time, the expression uses the index-time values to fold them into the query boost. What's your problem with that approach?
> Uwe
>
> On October 18, 2019 6:50:40 PM UTC, baris.ka...@oracle.com wrote:
> Uwe,
> Thanks. If possible, I am looking for a pure Java methodology to do the index-time boosting. This example looks like a search-time boosting example: https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> Best regards
>
> On 10/18/19 2:31 PM, Uwe Schindler wrote:
> Hi,
>> Is there a working example for this? Is this mentioned in the Lucene Javadocs or any other docs so that I can look at it?
> To index the docvalues, see NumericDocValuesField (it can be added to documents like indexed or stored fields). You may have used them for sorting already.
>> This methodology seems sort of like discouraging the use of index-time boosting.
> Not really. Many use this all the time. It's one of the killer features of both Solr and Elasticsearch. The problem was how Document.setBoost() worked (it did not work correctly, see below).
>> The previous setBoost method call was fine and easy to use. Did it have some performance issues, and is that why it was deprecated?
> No, it was deprecated for several reasons: setBoost was not doing what the user expected. Internally the boost value was just multiplied into the document norm factor (which is internally also a docvalues field). The norm factors are only very imprecise floats stored i
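Since an IndexWriter was missing from the examples discussed in this exchange, here is an end-to-end sketch of the mixed approach Uwe describes: the factor is written at index time as a NumericDocValuesField named _boost next to the TextField, and at query time an expression folds it into the score. It assumes the lucene-core, lucene-queries and lucene-expressions modules with 7.x/8.x-era APIs; the field names, sample texts and temporary directory are illustrative.

```java
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.DoubleValuesSource;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class DocValuesBoostExample {

  // A document whose per-document boost lives in a numeric docvalues "column".
  static Document doc(String id, String body, long boost) {
    Document d = new Document();
    d.add(new StringField("id", id, Field.Store.YES));
    d.add(new TextField("body", body, Field.Store.NO));
    d.add(new NumericDocValuesField("_boost", boost));
    return d;
  }

  // Indexes two documents with identical text but different boosts and
  // returns the id of the top-ranked hit.
  static String topHit() throws Exception {
    Path path = Files.createTempDirectory("boost-demo");
    try (Directory dir = FSDirectory.open(path)) {
      // Index time: IndexWriter writes the text and the docvalues factor together.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc("a", "lucene boosting", 1L));
        writer.addDocument(doc("b", "lucene boosting", 5L));
      } // closing the writer commits the changes

      // Query time: the expression folds the stored factor into the relevance score.
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        Expression expr = JavascriptCompiler.compile("_score * _boost");
        SimpleBindings bindings = new SimpleBindings();
        bindings.add("_score", DoubleValuesSource.SCORES);
        bindings.add("_boost", DoubleValuesSource.fromLongField("_boost"));
        Query query = new FunctionScoreQuery(
            new TermQuery(new Term("body", "lucene")),
            expr.getDoubleValuesSource(bindings));
        TopDocs hits = searcher.search(query, 10);
        return searcher.doc(hits.scoreDocs[0].doc).get("id");
      }
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("top hit: " + topHit());
  }
}
```

Both documents have identical text and therefore identical base scores, so the one indexed with _boost = 5 ("b") should rank first. The temporary directory is not cleaned up in this sketch.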
Re: Parameterized queries in Lucene
I am curious: what use case are you trying to solve here? In the relational world, this is useful primarily because prepared statements eliminate the need to re-plan the query, saving the cost of iterating over a potentially large combinatorial space. For Lucene, however, there isn't much of a concept of a query plan (yet). Some queries try to achieve that (IndexOrDocValuesQuery, for example), but that is a far cry from what relational databases expose.

Atri

On Mon, 21 Oct 2019 at 17:42, Stamatis Zampetakis wrote:
> [...]

--
Regards,
Atri
Apache Concerted
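For reference, the IndexOrDocValuesQuery mentioned above wraps two equivalent queries and lets Lucene decide at search time which one to run (roughly: the index query when it leads iteration, the doc values query when it only has to verify candidates produced by other clauses). A sketch assuming a hypothetical long field price that is indexed both as a LongPoint and as a SortedNumericDocValuesField, following the pattern shown in its javadocs:

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.SortedNumericDocValuesField;
import org.apache.lucene.search.IndexOrDocValuesQuery;
import org.apache.lucene.search.Query;

public class IndexOrDocValuesExample {

  // The same range expressed twice: once against the points index, once against doc values.
  // "price" is a hypothetical field; each document would need both
  // new LongPoint("price", v) and new SortedNumericDocValuesField("price", v).
  static IndexOrDocValuesQuery priceRange(long lower, long upper) {
    Query pointQuery = LongPoint.newRangeQuery("price", lower, upper);
    Query dvQuery = SortedNumericDocValuesField.newSlowRangeQuery("price", lower, upper);
    // Lucene chooses per segment, at search time, which of the two to execute.
    return new IndexOrDocValuesQuery(pointQuery, dvQuery);
  }

  public static void main(String[] args) {
    System.out.println(priceRange(10L, 20L));
  }
}
```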
Parameterized queries in Lucene
Hi all,

In the world of relational databases and SQL, parameterized queries (a.k.a. PreparedStatement) offer many advantages in terms of security and performance.

I guess everybody is familiar with the idea: you prepare a statement once and then execute it multiple times, changing only certain parameters. A simple use case demonstrating the idea is shown below:

Query q = ... // An arbitrarily complex query with a part that has a single parameter of type int
for (int i = 0; i < 100; i++) {
  int paramValue = i;
  q.visit(new ParameterSetter(paramValue));
  TopDocs docs = searcher.search(q, 10);
}

Note that this is a very simplistic use case and does not correspond to reality, where construction and execution are not done side by side.

I have already implemented something to satisfy use cases like the one above by introducing a new subclass of Query. However, I was wondering whether Lucene already has a mechanism to compile and execute queries with parameters and I am just reinventing the wheel.

Feedback is much appreciated!

Best,
Stamatis
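Lucene generally treats Query objects as immutable values, so one alternative to mutating a prepared query with a visitor is to "prepare" a factory once and build a fresh query per parameter value. This is a sketch of that swapped-in technique, not an existing Lucene facility; it assumes a hypothetical int field year indexed as an IntPoint and a hypothetical text field body. Whether it covers the same use cases as a custom Query subclass depends on what has to be reused between executions.

```java
import java.util.function.IntFunction;

import org.apache.lucene.document.IntPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ParameterizedQueryFactory {

  // "Prepare" step: build the fixed part once and return a function that
  // plugs a parameter value into a new, immutable query.
  // "body" and "year" are hypothetical field names.
  static IntFunction<Query> prepare(String fixedTerm) {
    Query fixedPart = new TermQuery(new Term("body", fixedTerm));
    return year -> new BooleanQuery.Builder()
        .add(fixedPart, BooleanClause.Occur.MUST)
        .add(IntPoint.newExactQuery("year", year), BooleanClause.Occur.FILTER)
        .build();
  }

  public static void main(String[] args) {
    IntFunction<Query> prepared = prepare("lucene");
    for (int i = 0; i < 3; i++) {
      Query q = prepared.apply(2017 + i); // "execute" step: bind the parameter
      System.out.println(q);
      // With a real IndexSearcher: TopDocs docs = searcher.search(q, 10);
    }
  }
}
```

Because each call returns an equal-by-value query for equal parameters, the results also play well with Lucene's query caching, which keys on Query equality.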