Boost Problem (again), need example !
Hi,

I know there are many topics about scoring issues, but I didn't find an answer in them. This is the problem: imagine I'm a teacher, and I have to index the results, comments and score of each student.

Student:
  String name (e.g. John Smith)
  String comments (e.g. John is a good student, but he needs to be more self-confident bla bla bla)
  float score (e.g. 98)

I have to index all the students, and when I search I want to get the best students first. So if John Smith is a better student than John Mickael, when I search for "John" I want John Smith BEFORE John Mickael.

To do that, I'm using a BooleanQuery to search the name and comment fields.

First, I thought I could call Document.setBoost(float boost) while indexing each student, with boost = Student.score. But the result was not what I expected; it didn't work correctly.

Then I thought I could use a FunctionQuery to search:

  FunctionQuery functionQuery = new FunctionQuery(new ReverseOrdFieldSource("score"));

But the result was still incorrect.

I don't know what I'm doing wrong. Could you help me find a solution?

Thank you :)

--
View this message in context: http://old.nabble.com/Boost-Problem-%28again%29%2C-need-example-%21-tp27684388p27684388.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Boost Problem (again), need example !
Can't you simply sort by descending score (your score, not Lucene's)? Seems to me that would give you what you are asking for.

The setBoost() method is unlikely to work consistently because it only influences the score rather than setting it. If your John Mickael doc happens to have a higher Lucene score because of the normal idf/tf/etc. stuff, then calling setBoost() with a higher value for John Smith may well not be enough to force John Smith to the top.

I don't know enough about function queries to help you much there, but FieldScoreQuery might work. I can't see any sign of a class FunctionQuery in the 3.0.0 core package, so I am not clear what that is.

--
Ian.

On Mon, Feb 22, 2010 at 8:54 AM, pdaures wrote:
> [...]
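To make the "sort by your own score" suggestion concrete, here is a minimal sketch in plain Java. It deliberately avoids the Lucene API: `StudentHit`, `SortByStudentScore`, and the relevance and score values are hypothetical names and numbers chosen for illustration only. It shows that once hits are ordered by the stored student score, the text-match relevance no longer affects the order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one search hit; this is not a Lucene class.
class StudentHit {
    final String name;
    final float relevance;     // text-match score computed by the search engine
    final float studentScore;  // the teacher's own score, stored as a field

    StudentHit(String name, float relevance, float studentScore) {
        this.name = name;
        this.relevance = relevance;
        this.studentScore = studentScore;
    }
}

class SortByStudentScore {
    // Returns a copy of the hits ordered by studentScore, highest first.
    // The relevance value plays no part in the ordering.
    static List<StudentHit> sortDescending(List<StudentHit> hits) {
        List<StudentHit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.studentScore, a.studentScore));
        return sorted;
    }
}
```

In Lucene itself the same effect would presumably come from passing a Sort on the score field (in reverse order) to the searcher rather than sorting in application code; the sketch only models the resulting order.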
RE: Boost Problem (again), need example !
It's CustomScoreQuery in 2.9 and 3.0.

Please wait for 2.9.2 and 3.0.1, which contain an important API change that makes this experimental query type work correctly with the new per-segment search! You can test the release artifacts of both new versions here:
http://people.apache.org/~uschindler/staging-area/lucene-292-301-take2-rev912433/

With e.g. ValueSourceQuery you can score your documents using a separate numeric field from your documents (it uses FieldCache).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Monday, February 22, 2010 10:33 AM
> To: java-user@lucene.apache.org
> Subject: Re: Boost Problem (again), need example !
>
> [...]
range of scores : queryNorm()
Hello,

I have observed that even if we change boosting drastically, the scores are normalized at the end because of the queryNorm value. Is there anything regarding queryNorm that we can rely on? For example, that the score will always be under 10 or some other fixed value?

The main objective is to provide scores in a fixed range to the partner. Has anyone experienced anything like this? Is it possible to do so?

Have you ever seen a strange situation where, for a particular query, the result scores were really high compared to routine? If yes, I would like to know the factor that affected the scores so drastically, because it may help me understand such cases.

Thanks

(NOTE: I am sorry, I have also posted this in the Solr group; there were no replies, and I feel this place is even more apt.)
RE: Boost Problem (again), need example !
Hi!

Thank you for your help. I think I'm not using CustomScoreQuery correctly when I do a "search":

  BooleanQuery combinedQuery = new BooleanQuery();
  combinedQuery.add(textQuery, Occur.MUST);
  combinedQuery.add(titleQuery, Occur.MUST);

  CustomScoreQuery customQuery = new CustomScoreQuery(combinedQuery, new FieldScoreQuery(BOOST_FIELD, Type.INT));

  indexSearcher.search(..., customQuery, ).

To index the BOOST_FIELD, I do this:

  Field boostField = new Field(BOOST_FIELD, Integer.toString(boost), Field.Store.YES, Field.Index.ANALYZED.NO);

Is that correct?

Thank you

Uwe Schindler wrote:
> [...]

--
View this message in context: http://old.nabble.com/Boost-Problem-%28again%29%2C-need-example-%21-tp27684388p27685594.html
Re: Boost Problem (again), need example !
boostField needs to be indexed to be used in the FieldScoreQuery.

Are you now using one of the latest releases that Uwe mentioned, with fixes for CustomScoreQuery?

And unless you provide your own implementation of CustomScoreQuery.customScore(), I think you are still not guaranteed to get what you want, since the default implementation calculates the score as subQueryScore * valSrcScore.

--
Ian.

On Mon, Feb 22, 2010 at 11:00 AM, pdaures wrote:
> [...]
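The point about the default subQueryScore * valSrcScore combination can be seen with a little arithmetic. The sketch below is plain Java and does not use Lucene: `ScoreCombiner` and `Combiners` are hypothetical names that only mirror the idea of a score-combining hook, and the numbers in the usage note are invented for illustration.

```java
// Hypothetical hook for combining the two scores; it only mirrors the
// idea of customScore() and is not Lucene's actual API.
interface ScoreCombiner {
    float combine(float subQueryScore, float valSrcScore);
}

class Combiners {
    // The default behaviour described above: multiply the two scores.
    static final ScoreCombiner PRODUCT = (sub, val) -> sub * val;

    // An override that ignores the text-match score entirely.
    static final ScoreCombiner FIELD_ONLY = (sub, val) -> val;
}
```

Suppose John Mickael has a text-match score of 0.9 and a student score of 71, while John Smith has 0.4 and 98. With PRODUCT the combined scores are roughly 63.9 and 39.2, so the weaker student still ranks first. With FIELD_ONLY they are 71 and 98, which gives the strict ordering the original question asks for.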
RE: Boost Problem (again), need example !
The simple fix for that is to wrap the subQuery using: new ConstantScoreQuery(new QueryWrapperFilter(query)). After that its score is constant and only the ValueSource determines the score.

I recommend using NumericField for indexing this boost (no storing needed, only indexing, precisionStep=Integer.MAX_VALUE). Otherwise (if using a standard Field), the boost field does not need to be "stored", but it must be indexed as NOT_ANALYZED.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Ian Lea [mailto:ian@gmail.com]
> Sent: Monday, February 22, 2010 12:26 PM
> To: java-user@lucene.apache.org
> Subject: Re: Boost Problem (again), need example !
>
> [...]
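Why a constant sub-query score is enough can be modelled without Lucene. The following plain-Java sketch uses hypothetical names (`ConstantScoreModel`, `topDocument`, `constantScores`) and an assumed constant of 1.0; the real constant produced by the wrapped query may differ, but any value shared by all matching documents has the same effect on the order.

```java
import java.util.Arrays;

class ConstantScoreModel {
    // Index of the document with the highest subQueryScore * fieldValue.
    static int topDocument(float[] subQueryScores, float[] fieldValues) {
        int best = 0;
        for (int i = 1; i < fieldValues.length; i++) {
            if (subQueryScores[i] * fieldValues[i]
                    > subQueryScores[best] * fieldValues[best]) {
                best = i;
            }
        }
        return best;
    }

    // Models the effect of the wrap: every matching document gets the same
    // sub-query score (1.0 here is an assumed constant).
    static float[] constantScores(int count) {
        float[] scores = new float[count];
        Arrays.fill(scores, 1.0f);
        return scores;
    }
}
```

With illustrative text-match scores {0.9, 0.4} and field values {71, 98}, topDocument picks document 0 when the raw scores are multiplied in, and document 1 once constantScores(2) replaces them, because the product then depends on the field value alone.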
RE: Boost Problem (again), need example !
It WORKS!

Thank you so much. I spent a lot of time trying to do that; thank you again!

Uwe Schindler wrote:
> [...]
Re: PayloadNearSpanScorer explain method
Patch is in JIRA: LUCENE-2272

On Wed, Feb 17, 2010 at 8:40 PM, Peter Keegan wrote:
> Yes, I will provide a patch. Our new proxy server has broken my access to
> the svn repository, though :-(
>
> On Tue, Feb 16, 2010 at 1:12 PM, Grant Ingersoll wrote:
>
>> That sounds reasonable. Patch?
>>
>> On Feb 15, 2010, at 10:29 AM, Peter Keegan wrote:
>>
>> > The 'explain' method in PayloadNearSpanScorer assumes the
>> > AveragePayloadFunction was used. I don't see an easy way to override this
>> > because 'payloadsSeen' and 'payloadScore' are private/protected. It seems
>> > like the 'PayloadFunction' interface should have an 'explain' method that
>> > the Scorer could call. Any thoughts?
>> >
>> > Peter
Re: Boost Problem (again), need example !
I still don't understand why a simple sort as suggested by Ian wouldn't work. It'd be a lot more reliable than fiddling with doc scores if you want a strict ordering on a particular field (make sure it's untokenized though). Erick On Mon, Feb 22, 2010 at 8:19 AM, pdaures wrote: > > It WORKS ! > > Thank you so much, I spent a lot of time trying to do that, thank you again > ! > > > Uwe Schindler wrote: > > > > The simple fix for that is to wrap the subQuery using: new > > ConstantScoreQuery(new QueryWrapperFilter(query)) - after that its score > > is constant and the ValueSource only scores. > > > > I recommend to use NumericField for indexing this boost (no storing > > needed, only indexing, precisionStep=Integer.MAX_VALUE). Else (if using > > standard Field) the boost field does not need to be "stored", it must be > > indexed as NOT_ANALYZED. > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > >> -Original Message- > >> From: Ian Lea [mailto:ian@gmail.com] > >> Sent: Monday, February 22, 2010 12:26 PM > >> To: java-user@lucene.apache.org > >> Subject: Re: Boost Problem (again), need example ! > >> > >> boostField needs to be indexed to be used in the FieldScoreQuery. > >> > >> Are you now using one of the the latest releases that Uwe mentioned, > >> with fixes for CustomScoreQuery? > >> > >> And unless you provide your own implementation of > >> CustomScoreQuery.customScore() I think that you are still not > >> guaranteed to get what you want since the default implementation is to > >> calculate the score as subQueryScore * valSrcScore. > >> > >> > >> -- > >> Ian. > >> > >> > >> On Mon, Feb 22, 2010 at 11:00 AM, pdaures > >> wrote: > >> > > >> > HI ! > >> > Thank you for your help. > >> > I think I don't use CustomScoreQuery correctly when I do a "search". 
> >> > > >> > BooleanQuery combinedQuery = new BooleanQuery(); > >> > combinedQuery.add(textQuery, Occur.MUST); > >> > combinedQuery.add(titleQuery, Occur.MUST); > >> > > >> > CustomScoreQuery customQuery = new CustomScoreQuery(combinedQuery,new > >> > FieldScoreQuery(BOOST_FIELD,Type.INT)); > >> > > >> > indexSearcher.search(..., customQuery, ). > >> > > >> > in order to index the BOOST_FIELD, I do that : > >> > Field boostField = new Field(BOOST_FIELD, Integer.toString(boost), > >> > Field.Store.YES, Field.Index.ANALYZED.NO); > >> > > >> > > >> > Is that correct ? > >> > Thank you > >> > > >> > > >> > > >> > > >> > Uwe Schindler wrote: > >> >> > >> >> It's CustomScoreQuery in 2.9 and 3.0. > >> >> > >> >> Please wait for 2.9.2 and 3.0.1 for an important API change in this > >> >> experimental query type to work correct with the new per-segment- > >> search! > >> >> You can test the release artifacts of both new versions here: > >> >> http://people.apache.org/~uschindler/staging-area/lucene-292-301- > >> take2-rev912433/ > >> >> > >> >> With e.g. ValueSourceQuery you can score your documents using a > >> separate > >> >> numeric field from your documents (it uses FieldCache). > >> >> > >> >> Uwe > >> >> > >> >> - > >> >> Uwe Schindler > >> >> H.-H.-Meier-Allee 63, D-28213 Bremen > >> >> http://www.thetaphi.de > >> >> eMail: u...@thetaphi.de > >> >> > >> >>> -Original Message- > >> >>> From: Ian Lea [mailto:ian@gmail.com] > >> >>> Sent: Monday, February 22, 2010 10:33 AM > >> >>> To: java-user@lucene.apache.org > >> >>> Subject: Re: Boost Problem (again), need example ! > >> >>> > >> >>> Can't you simply sort by descending score (your score, not > >> lucene's)? > >> >>> Seems to me that would give you what you are asking for. > >> >>> > >> >>> The setBoost() method is unlikely to work consistently because it > >> only > >> >>> infuences the score rather than setting it. 
If your John Mickeal > >> doc > >> >>> happens to have a higher lucene score, because of the normal > >> >>> idf/tf/etc stuff, then the setBoost() with a higher value for John > >> >>> Smith may well not be enough to force John Smith to the top. > >> >>> > >> >>> I don't know enough about function queries to help you much there > >> but > >> >>> FieldScoreQuery might work. I can't see any sign of class > >> >>> FunctionQuery in the 3.0.0 core package so am not clear what that > >> is. > >> >>> > >> >>> > >> >>> -- > >> >>> Ian. > >> >>> > >> >>> > >> >>> > >> >>> On Mon, Feb 22, 2010 at 8:54 AM, pdaures > >> >>> wrote: > >> >>> > > >> >>> > Hi, > >> >>> > I know that there are many topics about scoring issues, but I > >> didn't > >> >>> find an > >> >>> > answer in the topics. > >> >>> > This is the problem : > >> >>> > Imagine I'm a teacher, and I have to index all the results, > >> comments > >> >>> and > >> >>> > score about students. > >> >>> > > >> >>> > Student : > >> >>> > String name (eg : John Smith) > >> >>> > String comments : (eg: John is a good student, but he needs to be > >> >>> more self > >> >>> > confident bla bla bla) > >> >>> > float score (eg :
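The point made in the quoted replies above (the default CustomScoreQuery result is subQueryScore * valSrcScore, so a strong text match can still outrank a better student unless the sub-query score is made constant) can be checked with plain arithmetic. What follows is only a sketch in plain Java, not Lucene code: the class and method names are hypothetical, and every number except the example score of 98 is made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Plain-Java illustration (no Lucene classes) of the score combination discussed in the thread. */
final class BoostOrderingSketch {

    /** Default combination named in the thread: subQueryScore * valSrcScore. */
    static float defaultCustomScore(float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }

    /**
     * Returns document indexes ordered best-first by the combined score.
     * subQueryScores[i] stands for the text-relevance score of document i,
     * boosts[i] for the student score stored in the boost field.
     */
    static Integer[] rank(final float[] subQueryScores, final float[] boosts) {
        Integer[] order = new Integer[boosts.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                // Descending by combined score.
                return Float.compare(
                        defaultCustomScore(subQueryScores[b], boosts[b]),
                        defaultCustomScore(subQueryScores[a], boosts[a]));
            }
        });
        return order;
    }

    public static void main(String[] args) {
        float[] boosts = {98f, 60f};      // doc 0: John Smith, doc 1: John Mickael (60 is a made-up value)
        float[] relevance = {0.2f, 0.9f}; // hypothetical tf/idf-style scores
        float[] constant = {1f, 1f};      // what a constant-score sub-query would contribute

        // With varying relevance, doc 1 wins (0.9 * 60 > 0.2 * 98) despite the lower boost.
        System.out.println(Arrays.toString(rank(relevance, boosts)));
        // With a constant sub-query score, the order follows the boost field alone.
        System.out.println(Arrays.toString(rank(constant, boosts)));
    }
}
```

This is why wrapping the sub-query so that its score is constant, or sorting on the field as Ian and Erick suggest, gives a strict ordering while setBoost() alone does not.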
Re: range of scores : queryNorm()
> I have observed that even if we change boosting drastically, scores
> are being normalized at the end because of queryNorm value. Is there
> anything (regarding to the queryNorm) that we can rely on?

Dunno.

> like score will always be under 10

No.

> or some fixed value?

I think not.

> The main objective is to provide scores in a fixed range to the
> partner. So have you been experienced anything like this? Is it
> possible to do so?

You could normalize the scores yourself, probably most easily in a pass through them once the search has completed. Beware of comparing scores across searches and indexes.

> Have you been experienced any strange situation like for a particular
> query, result scores were really high compared to routine?

Not me, but I rarely look at scores directly. I just care that the right docs get found.

--
Ian.
Re: range of scores : queryNorm()
Could you back up a step and tell us what the upper-level task you're trying to accomplish is? That is, why the partner wants the number? Because the raw score in Lucene is only relevant within that single query, and then only for ranking.

The normalized score *is* in a fixed range already, between 0 and 1. Would it serve to just modify that and send it back to the partner?

Erick

On Mon, Feb 22, 2010 at 5:26 AM, Smith G wrote:
> Hello,
> I have observed that even if we change boosting drastically, scores
> are being normalized at the end because of queryNorm value. Is there
> anything (regarding to the queryNorm) that we can rely on? like score
> will always be under 10 or some fixed value? The main objective is to
> provide scores in a fixed range to the partner. So have you been
> experienced anything like this? Is it possible to do so?
> Have you been experienced any strange situation like for a particular
> query, result scores were really high compared to routine? if yes, I
> would like to know the factor that effected scores drastically,
> because it may help me to proceed or understand the cases.
> Thanks
>
> (NOTE : I am sorry, I have also posted in solr group, there were no
> replies and also I feel this place is even more apt.).
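Ian's suggestion in this thread, normalizing the scores yourself in one pass after the search has completed, could look roughly like the sketch below. It is plain Java over an array of raw scores; the class and method names are hypothetical, scores are assumed to be non-negative, and dividing by the maximum is just one possible choice of normalization.

```java
import java.util.Arrays;

/** Hypothetical helper: rescales the raw scores of ONE result list into [0, 1]. */
final class ScoreNormalizer {

    /** Returns a new array in which the highest score becomes 1.0; assumes non-negative input. */
    static float[] normalizeByMax(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out; // empty input or all zeros: nothing to scale
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4f, 2f, 1f}; // made-up raw scores from a single search
        System.out.println(Arrays.toString(normalizeByMax(raw)));
    }
}
```

The warning from the thread still applies: the rescaled values are relative to the top hit of that one search, so they say nothing about how two different searches or indexes compare.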
Scanning docs at index time
I'd like to scan documents as they're being indexed, to find out immediately if any of them match certain queries. The goal is to find out if there are any new hits for these queries as soon as possible, without re-searching the index over and over (which would be inefficient, and higher latency). The documents still need to be indexed (not just scanned) so they can be searched later with different queries not known at index time.

The indexing throughput is in the tens of millions per day, and there are maybe a thousand queries or so to be matched. So this has to work pretty fast. (-: Fortunately the number and size of fields are both fairly small.

This scanning could of course be completely decoupled from the indexing process. But my thinking was that since we already have the documents in hand, and we'll be analyzing various fields in the course of indexing, we could ideally reuse those token streams somehow for this on-the-fly scanning process.

I took a look at the org.apache.lucene.index.memory.MemoryIndex class in contrib. It looks like that would work, but I'm not sure if it's the most appropriate solution (for one thing, it would have to re-analyze all the fields). Has anyone here done something similar and/or know of other classes that would be suitable?

Thanks,
Chris
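To make the idea of scanning each incoming document against a fixed set of queries concrete, here is a deliberately simplified sketch in plain Java. It is not MemoryIndex and uses no Lucene classes: the class name, the lowercase/split tokenizer, and the query model (a query matches when all of its required terms occur in the document) are assumptions made purely for illustration. The part that mirrors the question is that the document is tokenized once and the resulting token set is reused for every registered query.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an index-time scanner; the query model is AND-of-terms only. */
final class IndexTimeScanner {

    // Query name -> terms that must all be present for the query to match.
    private final Map<String, Set<String>> queries = new LinkedHashMap<String, Set<String>>();

    void register(String name, String... requiredTerms) {
        Set<String> terms = new HashSet<String>();
        for (String t : requiredTerms) {
            terms.add(t.toLowerCase(Locale.ROOT));
        }
        queries.put(name, terms);
    }

    /** Very naive analysis: lowercase, split on anything that is not a letter or digit. */
    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Tokenizes the document once, then tests every registered query against that token set. */
    List<String> matches(String text) {
        Set<String> tokens = tokenize(text);
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> q : queries.entrySet()) {
            if (tokens.containsAll(q.getValue())) {
                hits.add(q.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        IndexTimeScanner scanner = new IndexTimeScanner();
        scanner.register("q1", "lucene", "index");
        scanner.register("q2", "solr");
        System.out.println(scanner.matches("Lucene builds an index quickly."));
    }
}
```

With about a thousand queries, a linear pass like this may or may not be fast enough at the stated throughput; it also ignores phrases, ranges, and scoring, which is where a real query engine over an in-memory index would come in.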
IndexWriter.getReader.getVersion behavior
Using Lucene 2.9.1, I have the following pseudocode which gets repeated at regular intervals:

1. FSDirectory dir = FSDirectory.open(java.io.File);
2. dir.setLockFactory(new SingleInstanceLockFactory());
3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
4. writer.getReader().getVersion();
5. writer.prepareCommit();
6. writer.getReader().getVersion();
7. writer.commit();
8. writer.close();

I'm using the version number to keep external data in synch with the index. Usually, the version number from (6) is 1 greater than from (4) and the version from (4) equals the version from the previous (6). At least once a day, however, the version from (4) is 1 greater than from the previous (6). What would explain this sporadic behavior of version numbers?

Thanks,
Peter
Re: IndexWriter.getReader.getVersion behavior
Peter,

Perhaps other concurrent operations?

Jason

On Tue, Feb 23, 2010 at 10:43 AM, Peter Keegan wrote:
> Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
> regular intervals:
>
> 1. FSDirectory dir = FSDirectory.open(java.io.File);
> 2. dir.setLockFactory(new SingleInstanceLockFactory());
> 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
> 4. writer.getReader().getVersion();
> 5. writer.prepareCommit();
> 6. writer.getReader().getVersion();
> 7. writer.commit();
> 8. writer.close();
>
> I'm using the version number to keep external data in synch with the index.
> Usually, the version number from (6) is 1 greater than from (4) and the
> version from (4) equals the version from the previous (6). At least once a
> day, however, the version from (4) is 1 greater than from the previous (6).
> What would explain this sporadic behavior of version numbers?
>
> Thanks,
> Peter
Re: IndexWriter.getReader.getVersion behavior
That's curious.

It's only on prepareCommit (or, commit, if you didn't first prepare, since that will call prepareCommit internally) that this version should increase.

Is there only 1 thread doing this?

Oh, and, are you passing false for autoCommit?

Mike

On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan wrote:
> Using Lucene 2.9.1, I have the following pseudocode which gets repeated at
> regular intervals:
>
> 1. FSDirectory dir = FSDirectory.open(java.io.File);
> 2. dir.setLockFactory(new SingleInstanceLockFactory());
> 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, maxFieldLen)
> 4. writer.getReader().getVersion();
> 5. writer.prepareCommit();
> 6. writer.getReader().getVersion();
> 7. writer.commit();
> 8. writer.close();
>
> I'm using the version number to keep external data in synch with the index.
> Usually, the version number from (6) is 1 greater than from (4) and the
> version from (4) equals the version from the previous (6). At least once a
> day, however, the version from (4) is 1 greater than from the previous (6).
> What would explain this sporadic behavior of version numbers?
>
> Thanks,
> Peter
Re: IndexWriter.getReader.getVersion behavior
Only one writer thread and one writer process. I'm calling IndexWriter(Directory d, Analyzer a, boolean create, MaxFieldLength mfl), which sets autocommit=false. Peter On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > That's curious. > > It's only on prepareCommit (or, commit, if you didn't first prepare, > since that will call prepareCommit internally) that this version > should increase. > > Is there only 1 thread doing this? > > Oh, and, are you passing false for autoCommit? > > Mike > > On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan > wrote: > > Using Lucene 2.9.1, I have the following pseudocode which gets repeated > at > > regular intervals: > > > > 1. FSDirectory dir = FSDirectory.open(java.io.File); > > 2. dir.setLockFactory(new SingleInstanceLockFactory()); > > 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, > maxFieldLen) > > 4. writer.getReader().getVersion(); > > 5. writer.prepareCommit(); > > 6. writer.getReader().getVersion(); > > 7. writer.commit(); > > 8. writer.close(); > > > > I'm using the version number to keep external data in synch with the > index. > > Usually, the version number from (6) is 1 greater than from (4) and the > > version from (4) equals the version from the previous (6). At least once > a > > day, however, the version from (4) is 1 greater than from the previous > (6). > > What would explain this sporadic behavior of version numbers? > > > > Thanks, > > Peter > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
Re: IndexWriter.getReader.getVersion behavior
Well I'm at a loss then. The version should only increment on commit. Can you make it all happen when infoStream is on, and post back? Mike On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan wrote: > Only one writer thread and one writer process. > I'm calling IndexWriter(Directory d, Analyzer a, boolean create, > MaxFieldLength mfl), which sets autocommit=false. > > Peter > > On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> That's curious. >> >> It's only on prepareCommit (or, commit, if you didn't first prepare, >> since that will call prepareCommit internally) that this version >> should increase. >> >> Is there only 1 thread doing this? >> >> Oh, and, are you passing false for autoCommit? >> >> Mike >> >> On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan >> wrote: >> > Using Lucene 2.9.1, I have the following pseudocode which gets repeated >> at >> > regular intervals: >> > >> > 1. FSDirectory dir = FSDirectory.open(java.io.File); >> > 2. dir.setLockFactory(new SingleInstanceLockFactory()); >> > 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, >> maxFieldLen) >> > 4. writer.getReader().getVersion(); >> > 5. writer.prepareCommit(); >> > 6. writer.getReader().getVersion(); >> > 7. writer.commit(); >> > 8. writer.close(); >> > >> > I'm using the version number to keep external data in synch with the >> index. >> > Usually, the version number from (6) is 1 greater than from (4) and the >> > version from (4) equals the version from the previous (6). At least once >> a >> > day, however, the version from (4) is 1 greater than from the previous >> (6). >> > What would explain this sporadic behavior of version numbers? 
>> > >> > Thanks, >> > Peter >> > >> >> - >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: IndexWriter.getReader.getVersion behavior
I'm pretty sure there are flushes and segment merges going on, but as you said, that shouldn't affect the version increment. I'll see what I can do to get infoStream output. Thanks, Peter On Mon, Feb 22, 2010 at 2:30 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Well I'm at a loss then. The version should only increment on commit. > > Can you make it all happen when infoStream is on, and post back? > > Mike > > On Mon, Feb 22, 2010 at 12:35 PM, Peter Keegan > wrote: > > Only one writer thread and one writer process. > > I'm calling IndexWriter(Directory d, Analyzer a, boolean create, > > MaxFieldLength mfl), which sets autocommit=false. > > > > Peter > > > > On Mon, Feb 22, 2010 at 12:24 PM, Michael McCandless < > > luc...@mikemccandless.com> wrote: > > > >> That's curious. > >> > >> It's only on prepareCommit (or, commit, if you didn't first prepare, > >> since that will call prepareCommit internally) that this version > >> should increase. > >> > >> Is there only 1 thread doing this? > >> > >> Oh, and, are you passing false for autoCommit? > >> > >> Mike > >> > >> On Mon, Feb 22, 2010 at 11:43 AM, Peter Keegan > >> wrote: > >> > Using Lucene 2.9.1, I have the following pseudocode which gets > repeated > >> at > >> > regular intervals: > >> > > >> > 1. FSDirectory dir = FSDirectory.open(java.io.File); > >> > 2. dir.setLockFactory(new SingleInstanceLockFactory()); > >> > 3. IndexWriter writer = new IndexWriter(dir, Analyzer, false, > >> maxFieldLen) > >> > 4. writer.getReader().getVersion(); > >> > 5. writer.prepareCommit(); > >> > 6. writer.getReader().getVersion(); > >> > 7. writer.commit(); > >> > 8. writer.close(); > >> > > >> > I'm using the version number to keep external data in synch with the > >> index. > >> > Usually, the version number from (6) is 1 greater than from (4) and > the > >> > version from (4) equals the version from the previous (6). 
At least > once > >> a > >> > day, however, the version from (4) is 1 greater than from the previous > >> (6). > >> > What would explain this sporadic behavior of version numbers? > >> > > >> > Thanks, > >> > Peter
can IndexWriter.addIndexes de-dupe documents?
When I call IndexWriter.addIndexes, is there anything I can do to make it filter out duplicates based on a certain field (or group of fields)? If I know that the id field of the document is unique, can I make addIndexes know that if it finds a new document with the same id, the new one is valid and the old one should be overwritten (or deleted and the new one added in its place)?

I don't see anything like a unique constraint in the Field class; I know Lucene is not a SQL database, but I just wanted to check to make sure I'm not missing anything.

--
View this message in context: http://old.nabble.com/can-IndexWriter.addIndexes-de-dupe-documents--tp27694763p27694763.html
Re: can IndexWriter.addIndexes de-dupe documents?
addIndexes doesn't make this possible.

Maybe add the indexes but then make a 2nd pass to dedup?

Mike

On Mon, Feb 22, 2010 at 4:26 PM, jchang wrote:
> When I call IndexWriter.addIndexes, is there anything I can do to make it
> filter out duplicates based a certain field (or group of fields)? If I
> know that the id field of the document is unique, can I make addIndexes know
> that if it finds a new document bat the same id, the new one is valid and
> the old one should be overwritten (or deleted and the new one added in its
> place)?
>
> I don't see anything like unique constraint in the Field class; I know
> Lucene is not a SQL database, but i just wanted to check to make sure I'm
> not missing anything.
Re: can IndexWriter.addIndexes de-dupe documents?
What sorts of rules would govern which one should be kept? Say you were adding three indexes and there was a document in each that was identical. Which one should be kept? I suspect any rule would be wrong at least part of the time FWIW Erick On Mon, Feb 22, 2010 at 5:02 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > addIndexes doesn't make this possible. > > Maybe add the indexes but then make a 2nd pass to dedup? > > Mike > > On Mon, Feb 22, 2010 at 4:26 PM, jchang wrote: > > > > When I call IndexWriter.addIndexes, is there anything I can do to make it > > filter out duplicates based a certain field (or group of fields)? If I > > know that the id field of the document is unique, can I make addIndexes > know > > that if it finds a new document bat the same id, the new one is valid and > > the old one should be overwritten (or deleted and the new one added in > its > > place)? > > > > I don't see anything like unique constraint in the Field class; I know > > Lucene is not a SQL database, but i just wanted to check to make sure I'm > > not missing anything. > > > > > > -- > > View this message in context: > http://old.nabble.com/can-IndexWriter.addIndexes-de-dupe-documents--tp27694763p27694763.html > > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > > > > - > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
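The second dedup pass Mike suggests, together with Erick's question about which copy to keep, can be sketched as follows. This is plain Java and does not touch a Lucene index: the class and method names are hypothetical, the input is assumed to be the id-field value of each document listed in document order, and the rule "the later document wins" is an assumption that would have to hold for the order in which the indexes were added. Applying the resulting deletes to the index is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical second pass: decide which documents to delete so each id survives exactly once. */
final class DedupPass {

    /**
     * ids[i] is the unique-key value of document number i.
     * Returns the document numbers to delete, keeping the LAST occurrence of every id.
     */
    static List<Integer> docsToDelete(String[] ids) {
        Map<String, Integer> lastSeen = new HashMap<String, Integer>();
        List<Integer> toDelete = new ArrayList<Integer>();
        for (int doc = 0; doc < ids.length; doc++) {
            // put() returns the previous holder of this id, which is now superseded.
            Integer previous = lastSeen.put(ids[doc], doc);
            if (previous != null) {
                toDelete.add(previous);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "a", "c", "b", "a"}; // made-up id values
        System.out.println(docsToDelete(ids));
    }
}
```

Swapping the rule (for example, keeping the first occurrence) is a small change to the loop, which is the point of Erick's question: the policy has to come from the application, not from addIndexes.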
Re: Scanning docs at index time
I don't know of any classes that would be suitable, but if they are ordered queries, simple code could easily be written.

On Mon, Feb 22, 2010 at 9:59 PM, Nigel wrote:
> I'd like to scan documents as they're being indexed, to find out immediately
> if any of them match certain queries. The goal is to find out of there are
> any new hits for these queries as soon as possible, without re-searching the
> index over and over (which would be inefficient, and higher latency). The
> documents still need to be indexed (not just scanned) so they can be
> searched later with different queries not known at index time.
>
> The indexing throughput is in the tens of millions per day, and there are
> maybe a thousand queries or so to be matched. So this has to work pretty
> fast. (-: Fortunately the number and size of fields are both fairly small.
>
> This scanning could of course be completely decoupled from the indexing
> process. But my thinking was that since we already have the documents in
> hand, and we'll be analyzing various fields in the course of indexing, we
> could ideally reuse those token streams somehow for this on-the-fly scanning
> process.
>
> I took a look at the org.apache.lucene.index.memory.MemoryIndex class in
> contrib. It looks like that would work, but I'm not sure if it's the most
> appropriate solution (for one thing, it would have to re-analyze all the
> fields). Has anyone here done something similar and/or know of other
> classes that would be suitable?
>
> Thanks,
> Chris