Scoring issue
Hello,

I have two documents in my Lucene index:

Document 1: stored/uncompressed stored/uncompressed,indexed,tokenized stored/uncompressed

Document 2: stored/uncompressed stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized stored/uncompressed,indexed,tokenized

I am searching for

    +tagKey:hot +tagKey:dog

which is an exact match for the second document, but I am getting a score of 1.0 for the first document and 0.7 for the second one. I have a custom Similarity where lengthNorm is (1.0 / tokenCount); the other factors are constants.

Why is my first document getting the higher score?

--
View this message in context: http://www.nabble.com/Scoring-issue-tp20707410p20707410.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: how to search for starts with multiple words in lucene
Hi,

I think you can achieve your goal by using StandardAnalyzer both for indexing and for searching, and by using WildcardQuery for the query. I think it will work!

naveen.a wrote:
> Hi,
>
> Below is a document in Lucene:
>
> Field    Value
> ID       1
> 110_a    library and information
>
> I need "starts with" search logic. Below are the search cases for the
> above document:
>
> Query                  Result
> 110_a:l*               ID - 1
> 110_a:library*         ID - 1
> 110_a:library *        No Results
> 110_a:library a*       No Results
> 110_a:"library a*"     No Results
>
> Here, if I apply a single word to the "starts with" search, the document
> is found, but if I add a space after the first word, it is not found.
>
> So, how do I write the query to search for "starts with" over multiple words?

--
View this message in context: http://www.nabble.com/how-to-search-for-starts-with-multiple-words-in-lucene-tp20697741p20707534.html
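A prefix or wildcard query is matched against individual indexed terms, so once an analyzer has split "library and information" into separate words, no single term starts with "library a". One way around this, offered here only as an assumption about what might fit this case, is to also index the whole value as a single untokenized, lowercased term and run the prefix match against that field. The sketch below emulates the idea in plain Java without calling Lucene; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: emulates term-level prefix matching, not Lucene itself.
class PrefixOverTerms {
    // Tokenized field: each whitespace-separated word becomes its own term.
    static List<String> tokenizedTerms(String value) {
        return Arrays.asList(value.toLowerCase().split("\\s+"));
    }

    // Untokenized (keyword-style) field: the whole value is a single term.
    static List<String> keywordTerms(String value) {
        return Arrays.asList(value.toLowerCase());
    }

    // A prefix query matches when any single term starts with the prefix.
    static boolean prefixMatches(List<String> terms, String prefix) {
        String p = prefix.toLowerCase();
        for (String term : terms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "library and information";
        System.out.println(prefixMatches(tokenizedTerms(value), "library"));
        System.out.println(prefixMatches(tokenizedTerms(value), "library a"));
        System.out.println(prefixMatches(keywordTerms(value), "library a"));
    }
}
```

With the tokenized terms, only the single-word prefix matches; with the keyword-style term, the multi-word prefix matches as well.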
Lucene 2.3-2.4 switch: Scoring change
Hello,

I have a project which I am trying to switch from Lucene 2.3.2 to 2.4, and I am getting some strange scores.

Before, my code was:

    Hits hits = searcher.search(query);
    Float score = hits.score(1);

and the scores from Hits ranged from 0 to 1; 1 was a 100% match.

I changed the code to use a hit collector:

    TopDocCollector collector = new TopDocCollector(99);
    searcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    int docId = hits[1].doc;
    Document document = searcher.doc(docId);
    Float score = hits[1].score;

The scores from this class range from 2 to 12.5 for the same query.

How can I get my scores back to the old way?

--
View this message in context: http://www.nabble.com/Lunene-2.3-2.4-switch%3A-Scoring-change-tp21739867p21739867.html
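For context, the old Hits class rescaled scores before returning them: when the top raw score was greater than 1.0, every score was multiplied by 1 / topScore, which is why the best hit showed up as 1.0. TopDocCollector returns raw scores. A minimal sketch of the same rescaling applied to raw scores follows; the class and method names are hypothetical and the code does not call Lucene.

```java
// Hypothetical helper: rescales raw scores the way Hits did.
class ScoreNormalizer {
    static float[] normalize(float[] rawScores) {
        // Find the highest raw score.
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Only scale down when the top score exceeds 1.0; otherwise leave scores alone.
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] scaled = normalize(new float[] {12.5f, 5f, 2f});
        for (float s : scaled) {
            System.out.println(s);
        }
    }
}
```

Note that under this scheme a score of 1.0 only marks the best hit for that query; it is relative to the other hits, not an absolute measure of match quality.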
TopDocCollector vs Hits: TopDocCollector slowing....
Hello,

I was using Lucene 2.3.2 with Hits, and I switched to Lucene 2.4.0, where I am now using TopDocCollector.

I have two queries which run against the same index. One query returns 80 bytes of information; the other returns 2000 bytes.

With the old Hits, the query returning the smaller data was faster and the one returning the bigger data was slower. After I changed to TopDocCollector, both the big and the small one take the same time.

The searcher is exactly the same and the queries are the same; the only difference is that in one place I was using Hits and in the other TopDocCollector.

Does anyone have an idea why, and how I can fix this?

--
View this message in context: http://www.nabble.com/TopDocCollector-vs-Hits%3A-TopDocCollector-slowing-tp21822877p21822877.html
Re: Lucene 2.3-2.4 switch: Scoring change
AlexElba wrote:
> Hello,
> I have a project which I am trying to switch from Lucene 2.3.2 to 2.4, and
> I am getting some strange scores.
>
> Before, my code was:
>
>     Hits hits = searcher.search(query);
>     Float score = hits.score(1);
>
> and the scores from Hits ranged from 0 to 1; 1 was a 100% match.
>
> I changed the code to use a hit collector:
>
>     TopDocCollector collector = new TopDocCollector(99);
>     searcher.search(query, collector);
>     ScoreDoc[] hits = collector.topDocs().scoreDocs;
>     int docId = hits[1].doc;
>     Document document = searcher.doc(docId);
>     Float score = hits[1].score;
>
> The scores from this class range from 2 to 12.5 for the same query.
>
> How can I get my scores back to the old way?

I fixed the problem. The problem was their queue and its pushing and popping. After some optimization of the TopDocCollector it got faster.

--
View this message in context: http://www.nabble.com/Lunene-2.3-2.4-switch%3A-Scoring-change-tp21739867p22092512.html
Re: TopDocCollector vs Hits: TopDocCollector slowing....
Grant Ingersoll-6 wrote:
> I presume they are both now slower, right? Otherwise you wouldn't
> mind the speedup on the bigger one. Hits did caching and prefetched
> things, which has its tradeoffs. Can you describe how you were
> measuring the queries? How many results were you getting?
>
> -Grant
>
> On Feb 3, 2009, at 8:37 PM, AlexElba wrote:
>> Hello,
>>
>> I was using Lucene 2.3.2 with Hits, and I switched to Lucene 2.4.0,
>> where I am now using TopDocCollector.
>>
>> I have two queries which run against the same index. One query
>> returns 80 bytes of information; the other returns 2000 bytes.
>>
>> With the old Hits, the query returning the smaller data was faster
>> and the one returning the bigger data was slower. After I changed to
>> TopDocCollector, both the big and the small one take the same time.
>>
>> The searcher is exactly the same and the queries are the same; the
>> only difference is that in one place I was using Hits and in the
>> other TopDocCollector.
>>
>> Does anyone have an idea why, and how I can fix this?
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

I fixed the problem. The problem was their queue and the pushing and popping they had. After some optimization of the TopDocCollector it got much faster.

--
View this message in context: http://www.nabble.com/TopDocCollector-vs-Hits%3A-TopDocCollector-slowing-tp21822877p22092548.html
Lucene SnowBall unexpected behavior for some terms
Hello,

I was working with Lucene Snowball 2.3.2 and I switched to 2.4.0. After the switch I came across a case where Lucene doesn't do lemmatization correctly. So far I have found only one case: spa - spas. "spas" is not getting lemmatized at all.

BTW, I saw the same behavior on Solr 1.3.

Does anybody have any idea why?

--
View this message in context: http://www.nabble.com/Lucene-SnowBall-unexpected-behavior-for-some-terms-tp22991689p22991689.html
Re: Lucene SnowBall unexpected behavior for some terms
I looked through the source code for Snowball. I think this bug exists in the previous version as well. I asked on their mailing list; no response so far.

This is their demo page, and it has the same issue:
http://snowball.tartarus.org/demo.php

I was trying to find a pattern in the words which do not get lemmatized. So far no success.

--
View this message in context: http://www.nabble.com/Lucene-SnowBall-unexpected-behavior-for-some-terms-tp22991689p23088274.html
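One possible explanation, assuming the analyzer is using the Snowball "English" (Porter2) stemmer rather than the original Porter stemmer: its published step 1a removes a final "s" only when the preceding part of the word contains a vowel that is not immediately before the "s" (so "gas" and "this" keep the s, while "gaps" and "kiwis" lose it). Under that rule "spas" would be left alone by design. The sketch below implements only that single rule, not the full stemmer, and the class name is hypothetical.

```java
// Hypothetical, partial illustration of one Porter2 (Snowball English) rule.
// It handles only the bare final "s" case from step 1a; all other steps are omitted.
class FinalSRule {
    static final String VOWELS = "aeiouy";

    static String stripFinalS(String word) {
        // Very short words and words ending in "ss" or "us" are left unchanged.
        if (word.length() <= 2 || !word.endsWith("s")
                || word.endsWith("ss") || word.endsWith("us")) {
            return word;
        }
        // Look for a vowel anywhere except the position immediately before the "s".
        for (int i = 0; i < word.length() - 2; i++) {
            if (VOWELS.indexOf(word.charAt(i)) >= 0) {
                return word.substring(0, word.length() - 1);
            }
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"spas", "gas", "gaps", "kiwis"}) {
            System.out.println(w + " -> " + stripFinalS(w));
        }
    }
}
```

If this is the cause, the pattern to look for would be short words whose only vowel sits directly before the final "s".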
Re: Best way for paging with TopDocs class?
Why don't you extend HitCollector and put all the logic you need into it?

Ivan Vasilev-2 wrote:
> Hi All,
>
> As the Hits class was deprecated in current Lucene and is expected to be
> removed in Lucene 3.0, we decided to change our code to use the TopDocs
> class. Our app provides paging, and now we are wondering what the best way
> is to do it with TopDocs. I can see only this possibility:
>
> 1. User opens page 1 - we load by the searcher.search(..., docNum, ...)
>    method as many docs as are needed for page 1;
> 2. User opens page 2 - we load as many results as the amount for page 1
>    and page 2 (note that docs for page 1 are loaded again);
> ...
> N. User opens page N - we load as many docs as the amount of all pages
>    from #1 to #N (note that page 1 docs were loaded N-1 times, page 2 docs
>    N-2 times, etc.).
>
> With the Hits class this reloading of documents from previous pages was
> avoided: they were loaded once, and when docs for the next page were
> needed, Hits just loaded the next portion of docs without reloading the
> previous pages.
>
> So my question is: is there a better way of paging with the TopDocs class
> than the one I describe here?
>
> Thanks in advance,
> Ivan

--
View this message in context: http://www.nabble.com/Best-way-for-paging-with-TopDocs-class--tp23079735p23088509.html
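The paging scheme described in the quoted message can be sketched as follows: for page N, ask the searcher for the top N * pageSize hits, then keep only the last page's slice of the ranked results. The search returns document ids and scores, so stored documents only need to be fetched for the ids in the slice. This is a plain-Java sketch over an array of ranked document ids; the class and method names are hypothetical and nothing here calls Lucene.

```java
import java.util.Arrays;

// Hypothetical paging helper over an already ranked list of document ids.
class Pager {
    // Number of top hits to request so that the given 1-based page is covered.
    static int hitsToRequest(int page, int pageSize) {
        return page * pageSize;
    }

    // Returns the ids belonging to the given 1-based page (possibly empty).
    static int[] pageSlice(int[] rankedDocIds, int page, int pageSize) {
        int start = Math.min((page - 1) * pageSize, rankedDocIds.length);
        int end = Math.min(start + pageSize, rankedDocIds.length);
        return Arrays.copyOfRange(rankedDocIds, start, end);
    }

    public static void main(String[] args) {
        int[] ranked = {7, 3, 9, 1, 4};
        System.out.println(hitsToRequest(2, 2));
        System.out.println(Arrays.toString(pageSlice(ranked, 2, 2)));
    }
}
```

The earlier pages are still scored again on each request, but only ids and scores are collected for them, which is usually much cheaper than loading their stored fields.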
Re: Appropriate analyzer
Try to use RegexQuery.

Artyom Sokolov wrote:
> Hello.
>
> Currently I'm trying to find something like an analyzer to solve the
> problem.
>
> Actually, what I need is this: search on a query string step by step,
> trimming the last char on each step. A small example:
>
> In the index we have: abc, abcdef, xyz
> When searching on abcdefgh the most relevant result should be abcdef, while
> when searching on abcde the best one is abc.
>
> Thanks.
>
> Sincerely,
> Artyom Sokolov

--
View this message in context: http://www.nabble.com/Appropriate-analyzer-tp23164855p23166323.html
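The step-by-step trimming described in the quoted question can also be done on the query side: try the full query string as an exact term, and if it is not in the index, drop the last character and try again. A minimal sketch over an in-memory set of indexed terms follows; the class and method names are hypothetical, and in a real application the set lookup would be replaced by a term lookup against the index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of longest-prefix matching by trimming the query.
class LongestPrefixMatch {
    // Returns the longest prefix of the query that is an indexed term, or null.
    static String longestIndexedPrefix(Set<String> indexedTerms, String query) {
        for (int len = query.length(); len > 0; len--) {
            String candidate = query.substring(0, len);
            if (indexedTerms.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<String>(Arrays.asList("abc", "abcdef", "xyz"));
        System.out.println(longestIndexedPrefix(index, "abcdefgh"));
        System.out.println(longestIndexedPrefix(index, "abcde"));
    }
}
```

This costs at most one lookup per character of the query, which is usually cheap compared with a regular-expression scan over all terms.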
Lucene Judge
Hello,

I was looking at the Judge interface and its TrecJudge implementation, and it is not clear to me how to use it. What data do I need to pass into the constructor?

Does anybody have any experience with this class?

Thanks,
Alex

--
View this message in context: http://www.nabble.com/Lucene-Judge-tp24209288p24209288.html
RangeFilter
Hello,

I am currently using Lucene 2.4 and have documents with 3 fields:

id
name
rank

I have a query and a filter. When I try to use a range filter on rank, I do not get any results back:

    RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);

I have documents which are in this interval.

Any suggestion as to what I am doing wrong?

Regards

--
View this message in context: http://old.nabble.com/RangeFilter-tp27148785p27148785.html
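A likely cause is that RangeFilter compares terms as strings, not as numbers. As strings, "3" sorts after "10", so the lower bound is above the upper bound and no term can fall inside the range. Padding the numbers to a fixed width, both at index time and in the filter bounds, makes string order agree with numeric order. The sketch below shows the comparison in plain Java; the class and method names are hypothetical, and the padding helper assumes non-negative values.

```java
// Hypothetical illustration of why string ranges over unpadded numbers fail.
class LexicographicRange {
    // Inclusive range check using plain string comparison.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Zero-pads a non-negative number so string order matches numeric order.
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "5" is not between "3" and "10" as strings.
        System.out.println(inRange("5", "3", "10"));
        // Padded: "05" is between "03" and "10".
        System.out.println(inRange(pad(5, 2), pad(3, 2), pad(10, 2)));
    }
}
```

The width has to be large enough for the biggest value that will ever be indexed, and existing documents need to be re-indexed with the padded form.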
Re: RangeFilter
Thanks, Steve. Mike, for now I cannot upgrade...

--
View this message in context: http://old.nabble.com/RangeFilter-tp27148785p27151315.html
Re: RangeFilter
Hello,

I changed the filter to the following:

    RangeFilter rangeFilter = new RangeFilter(
        "rank",
        NumberTools.longToString(rating),
        NumberTools.longToString(10),
        true, true);

and changed the index to store rank the same way... But I am still not seeing any results :(

AlexElba wrote:
> Hello,
>
> I am currently using Lucene 2.4 and have documents with 3 fields:
>
> id
> name
> rank
>
> I have a query and a filter. When I try to use a range filter on rank, I
> do not get any results back:
>
>     RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);
>
> I have documents which are in this interval.
>
> Any suggestion as to what I am doing wrong?
>
> Regards

--
View this message in context: http://old.nabble.com/RangeFilter-tp27148785p27155102.html
RE: RangeFilter
> Did you completely re-index?

Yes, I did. Here is the method which creates the index:

    public void write(List<Object[]> data, Directory directory, Analyzer analyzer) {
        IndexWriter indexWriter = new IndexWriter(directory, analyzer, MaxFieldLength.LIMITED);
        try {
            for (Object[] obj : data) {
                try {
                    Document document = new Document();
                    // The remaining constructor arguments were cut off in the original message.
                    Field field = new Field("id", obj[0] ...);
                    document.add(field);
                    // rank is encoded with NumberTools so that it sorts as a string.
                    Field rank = new Field("rank",
                        NumberTools.longToString(Long.valueOf(obj[3])),
                        Store.NO, Index.ANALYZED_NO_NORMS);
                    document.add(rank);
                    indexWriter.addDocument(document);
                } catch (CorruptIndexException e) {
                    // ignored
                } catch (IOException e) {
                    // ignored
                }
            }
        } finally {
            try {
                indexWriter.commit();
            } catch (CorruptIndexException e) {
                // ignored
            } catch (IOException e) {
                // ignored
            }
        }
    }

Yep, I am using Luke, but this app uses a RAM-based index...

Steven A Rowe wrote:
> Hi AlexElba,
>
> Did you completely re-index?
>
> If you did, then there is some other problem - can you share (more of)
> your code?
>
> Do you know about Luke? It's an essential tool for Lucene index
> debugging:
>
>     http://www.getopt.org/luke/
>
> Steve
>
> On 01/13/2010 at 8:34 PM, AlexElba wrote:
>> Hello,
>>
>> I changed the filter to the following:
>>
>>     RangeFilter rangeFilter = new RangeFilter(
>>         "rank",
>>         NumberTools.longToString(rating),
>>         NumberTools.longToString(10),
>>         true, true);
>>
>> and changed the index to store rank the same way... But I am still not
>> seeing any results :(
>>
>> AlexElba wrote:
>> > Hello,
>> >
>> > I am currently using Lucene 2.4 and have documents with 3 fields:
>> >
>> > id
>> > name
>> > rank
>> >
>> > I have a query and a filter. When I try to use a range filter on rank,
>> > I do not get any results back:
>> >
>> >     RangeFilter rangeFilter = new RangeFilter("rank", "3", "10", true, true);
>> >
>> > I have documents which are in this interval.
>> >
>> > Any suggestion as to what I am doing wrong?
>> >
>> > Regards

--
View this message in context: http://old.nabble.com/RangeFilter-tp27148785p27166330.html
Lucene search for OR
Hello,

I am trying to search for "or" (Oregon). Even when it is not capitalized, the search does not return any results.

How can I search for 'or'?

--
View this message in context: http://www.nabble.com/Lucene-search-for-OR-tp18990623p18990623.html
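Two things probably interact here. In the query parser syntax an uppercase OR is the boolean operator, and the default English stop word list used by StandardAnalyzer and StopAnalyzer includes "or", so a lowercase "or" is typically dropped both at index time and at query time. Assuming that is the cause, analyzing the field with an analyzer that has no stop words (or a stop list without "or") keeps the term searchable. The sketch below emulates stop word removal in plain Java; the class name is hypothetical and the stop list is only a small subset chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of stop word filtering; not Lucene's analyzer.
class StopWordDemo {
    // Illustrative subset of an English stop list that contains "or".
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("a", "an", "and", "in", "of", "or", "the", "to"));

    // Lowercases, splits on whitespace, and drops any token in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Portland OR", STOP_WORDS));
        System.out.println(analyze("Portland OR", Collections.<String>emptySet()));
    }
}
```

With the stop list in place only "portland" survives; with an empty stop set both "portland" and "or" are kept as terms.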