RE: Query syntax on Keyword field question
Chad Small writes:
> Here is my attempt at a KeywordAnalyzer - although is not working? Excuse the
> length of the message, but wanted to give actual code.
>
> With this output:
>
> Analzying "HW-NCI_TOPICS"
>     org.apache.lucene.analysis.WhitespaceAnalyzer:
>         [HW-NCI_TOPICS]
>     org.apache.lucene.analysis.SimpleAnalyzer:
>         [hw] [nci] [topics]
>     org.apache.lucene.analysis.StopAnalyzer:
>         [hw] [nci] [topics]
>     org.apache.lucene.analysis.standard.StandardAnalyzer:
>         [hw] [nci] [topics]
>     healthecare.domain.lucenesearch.KeywordAnalyzer:
>         [HW-NCI_TOPICS]
>
> query.ToString = category:HW -"nci topics" +space
>
> junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
> Expected:+category:HW-NCI_TOPICS +space
> Actual :category:HW -"nci topics" +space

Well, the query parser currently does not allow `-' within words. So before your analyzer is called, the query parser reads one word HW, a `-' operator, and one word NCI_TOPICS. The latter is analyzed as "nci topics" because it is no longer in the field category, I guess.

I suggested changing this. See http://issues.apache.org/bugzilla/show_bug.cgi?id=27491

Either you escape the - using category:HW\-NCI_TOPICS in your query (untested, and I don't know where the escape character will be removed) or you apply my suggested change.

Another option for using keywords with the query parser might be adding a keyword syntax to the query parser. Something like category:key("HW-NCI_TOPICS") or category="HW-NCI_TOPICS".

HTH
Morus

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
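For readers who want to try the escaping route Morus mentions, here is a minimal sketch of a helper that puts a backslash in front of characters the query syntax treats as operators. The class name, the method name and the character list are assumptions made for illustration; they are not part of the Lucene version discussed in this thread, and, as Morus says, whether that parser honours the escape is untested.

```java
// Hypothetical helper, not part of Lucene: backslash-escapes single characters
// that the query syntax is assumed to treat as operators.
final class QueryEscapeSketch {

    // Assumed list of special single characters; the two-character
    // operators && and || are not handled here.
    private static final String SPECIAL = "+-!(){}[]^\"~*?:\\";

    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() + 4);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix the operator character
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds the query string suggested in the message above.
        System.out.println("category:" + escape("HW-NCI_TOPICS") + " AND SPACE");
    }
}
```

Note that the field name and the colon after it are left out of the escaped part on purpose; only the term text is passed to the helper.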
RE: Query syntax on Keyword field question
Here is my attempt at a KeywordAnalyzer - although it is not working? Excuse the length of the message, but I wanted to give actual code.

    package domain.lucenesearch;

    import java.io.*;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharTokenizer;
    import org.apache.lucene.analysis.TokenStream;

    public class KeywordAnalyzer extends Analyzer
    {
        public TokenStream tokenStream(String s, Reader reader)
        {
            return new KeywordTokenizer(reader);
        }

        private class KeywordTokenizer extends CharTokenizer
        {
            public KeywordTokenizer(Reader in)
            {
                super(in);
            }

            /**
             * Collects all characters.
             */
            protected boolean isTokenChar(char c)
            {
                return true;
            }
        }
    }

However, this test fails:

    import java.io.StringReader;

    import junit.framework.TestCase;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.RAMDirectory;

    public class KeywordAnalyzerTest extends TestCase
    {
        RAMDirectory directory;
        private IndexSearcher searcher;

        public void setUp() throws Exception
        {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true);

            // One document: an untokenized keyword field and an analyzed text field.
            Document doc = new Document();
            doc.add(Field.Keyword("category", "HW-NCI_TOPICS"));
            doc.add(Field.Text("description", "Illidium Space Modulator"));
            writer.addDocument(doc);
            writer.close();

            searcher = new IndexSearcher(directory);
        }

        public void testPerFieldAnalyzer() throws Exception
        {
            analyze("HW-NCI_TOPICS");

            // Use KeywordAnalyzer for "category" and StandardAnalyzer for everything else.
            PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("category", new KeywordAnalyzer()); //|#1

            Query query = QueryParser.parse("category:HW-NCI_TOPICS AND SPACE",
                                            "description", analyzer);
            Hits hits = searcher.search(query);
            System.out.println("query.ToString = " + query.toString("description"));

            assertEquals("HW-NCI_TOPICS kept as-is",
                         "category:HW-NCI_TOPICS +space", query.toString("description"));
            assertEquals("doc found!", 1, hits.length());
        }

        // Prints the tokens each analyzer produces for the given text.
        private void analyze(String text) throws Exception
        {
            Analyzer[] analyzers = new Analyzer[]{
                new WhitespaceAnalyzer(),
                new SimpleAnalyzer(),
                new StopAnalyzer(),
                new StandardAnalyzer(),
                new KeywordAnalyzer(),
                //new SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS)
            };

            System.out.println("Analzying \"" + text + "\"");
            for (int i = 0; i < analyzers.length; i++)
            {
                Analyzer analyzer = analyzers[i];
                System.out.println("\t" + analyzer.getClass().getName() + ":");
                System.out.print("\t\t");
                TokenStream stream = analyzer.tokenStream("category", new StringReader(text));
                while (true)
                {
                    Token token = stream.next();
                    if (token == null) break;
                    System.out.print("[" + token.termText() + "] ");
                }
                System.out.println("\n");
            }
        }
    }

With this output:

    Analzying "HW-NCI_TOPICS"
        org.apache.lucene.analysis.WhitespaceAnalyzer:
            [HW-NCI_TOPICS]
        org.apache.lucene.analysis.SimpleAnalyzer:
            [hw] [nci] [topics]
        org.apache.lucene.analysis.StopAnalyzer:
            [hw] [nci] [topics]
        org.apache.lucene.analysis.standard.StandardAnalyzer:
            [hw] [nci] [topics]
        healthecare.domain.lucenesearch.KeywordAnalyzer:
            [HW-NCI_TOPICS]

    query.ToString = category:HW -"nci topics" +space

    junit.framework.ComparisonFailure: HW-NCI_TOPICS kept as-is
    Expected:+category:HW-NCI_TOPICS +space
    Actual :category:HW -"nci topics" +space

See anything?

thanks,
chad.

-Original Message-
From: Chad Small
Sent: Tue 3/23/2004 8:48 PM
To: Lucene Users List
Cc:
Subject: RE: Query syntax on Keyword field question

Thank you Erik and Incze. I now understand the issue and I'm trying to create a "KeywordAnalyzer" as suggested in your book excerpt, Erik:

http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6727

However, not being all that familiar with the Analyzer framework, I'm not sure how to implement the "KeywordAnalyzer" even though it might be "trivial" :) Any hints, code, or messages to look at?

<<
Ok, here is the section from Lucene in Action. I'll leave the development of KeywordAnalyzer as an exercise for the reader (although its implementation is trivial, one of the simplest analyzers possible - only emit one token of the entire contents). I hope this helps.
Erik
>>

thanks again,
chad.
-Original Message- From: Incze Lajos [mailto:[EMAIL PROTECTED] Sent: Tue 3/23/2004 8:08 PM To: Lucene User
RE: Query syntax on Keyword field question
Thank you Erik and Incze. I now understand the issue and I'm trying to create a "KeywordAnalyzer" as suggested in your book excerpt, Erik:

http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]&msgNo=6727

However, not being all that familiar with the Analyzer framework, I'm not sure how to implement the "KeywordAnalyzer" even though it might be "trivial" :) Any hints, code, or messages to look at?

<<
Ok, here is the section from Lucene in Action. I'll leave the development of KeywordAnalyzer as an exercise for the reader (although its implementation is trivial, one of the simplest analyzers possible - only emit one token of the entire contents). I hope this helps.
Erik
>>

thanks again,
chad.

-Original Message-
From: Incze Lajos [mailto:[EMAIL PROTECTED]
Sent: Tue 3/23/2004 8:08 PM
To: Lucene Users List
Cc:
Subject: Re: Query syntax on Keyword field question

On Tue, Mar 23, 2004 at 08:10:15PM -0500, Erik Hatcher wrote:
> QueryParser and Field.Keyword fields are a strange mix. For some
> background, check the archives as this has been covered pretty
> extensively.
>
> A quick answer is yes you can use MFQP and QP with keyword fields,
> however you need to be careful which analyzer you use.
> PerFieldAnalyzerWrapper is a good solution - you'll just need to use an
> analyzer for your keyword field which simply tokenizes the whole string
> as one chunk. Perhaps such an analyzer should be made part of the
> core?
>
> Erik

I've implemented such an analyzer, but it's only a partial solution if your keyword field contains spaces, as the QP would split the query, e.g.:

NOTTOKNIZED:(term with spaces*)

would give you no hit even with a non-tokenized field "term with spaces and other useful things". The full solution would be to be able to tell the QP not to split at spaces, either by a 'do not split till apos' syntax, or by the good ol' backslash: do\ not\ notice\ these\ spaces.
incze
Re: Query syntax on Keyword field question
On Tue, Mar 23, 2004 at 08:10:15PM -0500, Erik Hatcher wrote:
> QueryParser and Field.Keyword fields are a strange mix. For some
> background, check the archives as this has been covered pretty
> extensively.
>
> A quick answer is yes you can use MFQP and QP with keyword fields,
> however you need to be careful which analyzer you use.
> PerFieldAnalyzerWrapper is a good solution - you'll just need to use an
> analyzer for your keyword field which simply tokenizes the whole string
> as one chunk. Perhaps such an analyzer should be made part of the
> core?
>
> Erik

I've implemented such an analyzer, but it's only a partial solution if your keyword field contains spaces, as the QP would split the query, e.g.:

NOTTOKNIZED:(term with spaces*)

would give you no hit even with a non-tokenized field "term with spaces and other useful things". The full solution would be to be able to tell the QP not to split at spaces, either by a 'do not split till apos' syntax, or by the good ol' backslash: do\ not\ notice\ these\ spaces.

incze
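To make the backslash idea concrete, here is a small stand-alone sketch of a splitter that breaks a query string at whitespace unless the whitespace is preceded by a backslash. It does not use Lucene and is not how the query parser in this thread behaves; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: split at whitespace, but keep "\ " as a literal space.
final class EscapedSpaceSplitSketch {

    static List<String> split(String query) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Escaped character: drop the backslash, keep the next char literally.
                current.append(query.charAt(++i));
            } else if (Character.isWhitespace(c)) {
                // Unescaped whitespace ends the current term.
                if (current.length() > 0) {
                    terms.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(split("term with spaces"));                     // three terms
        System.out.println(split("do\\ not\\ notice\\ these\\ spaces"));   // one term
    }
}
```

With a splitter like this in front of a whole-string analyzer, the escaped form would reach the keyword field as a single term, which is the behaviour the message above asks for.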
Re: Query syntax on Keyword field question
QueryParser and Field.Keyword fields are a strange mix. For some background, check the archives as this has been covered pretty extensively.

A quick answer is yes, you can use MFQP and QP with keyword fields; however, you need to be careful which analyzer you use. PerFieldAnalyzerWrapper is a good solution - you'll just need to use an analyzer for your keyword field which simply tokenizes the whole string as one chunk. Perhaps such an analyzer should be made part of the core?

Erik

On Mar 23, 2004, at 12:58 PM, Chad Small wrote:

I have since learned that using the TermQuery instead of the MultiFieldQueryParser works for the keyword field in question below (HW-NCI_TOPICS).

    apiQuery = new BooleanQuery();
    apiQuery.add(new TermQuery(new Term("category", "HW-NCI_TOPICS")), true, false);

This finds a match. I found a message that talked about having to use the Query API when searching Keyword fields in the index. Is this true? Is there not a way to get the MultiFieldQueryParser to find a match on this keyword?

thanks,
chad.

-Original Message-
From: Chad Small
Sent: Tue 3/23/2004 10:57 AM
To: [EMAIL PROTECTED]
Cc:
Subject: Query syntax on Keyword field question

Hello,

How can I format a query to get a hit? I'm using the StandardAnalyzer() at both index and search time. If I'm indexing a field like this:

    luceneDocument.add(Field.Keyword("category","HW-NCI_TOPICS"));

I've tried the following with no success:

    // String searchArgs = "HW\\-NCI_TOPICS";
    // String searchArgs = "HW\\-NCI_TOPICS".toLowerCase();
    // String searchArgs = "+HW+NCI+TOPICS"; //this works with .Text field
    // String searchArgs = "+hw+nci+topics";
    // String searchArgs = "hw nci topics";

thanks,
chad.
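The mismatch Erik describes can be shown without Lucene at all. The sketch below is a simplified stand-in, not Lucene code: one function splits on non-letters and lowercases (roughly what the SimpleAnalyzer output elsewhere in this thread shows), another returns the whole string as a single token, and a per-field switch picks between them in the spirit of PerFieldAnalyzerWrapper. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for per-field analysis; no Lucene classes are used.
final class KeywordVsLetterSketch {

    // Splits at every non-letter and lowercases the pieces.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Emits the entire input as one token, unchanged.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // Per-field choice: the "category" field is treated as a keyword field.
    static List<String> analyze(String field, String text) {
        return "category".equals(field) ? keywordTokens(text) : letterTokens(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("description", "HW-NCI_TOPICS")); // [hw, nci, topics]
        System.out.println(analyze("category", "HW-NCI_TOPICS"));    // [HW-NCI_TOPICS]
    }
}
```

A keyword field is indexed as the single term HW-NCI_TOPICS, so only the second form can match it; the lowercased fragments never will.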
Re: Cover density ranking?
Boris Goldowsky wrote:
> How difficult would it be to implement something like Cover Density
> ranking for Lucene? Has anyone tried it?
>
> Cover density is described at http://citeseer.ist.psu.edu/558750.html ,
> and is supposed to be particularly good for short queries of the type
> that you get in many web applications.

I just glanced at the paper, so my analysis may be wrong, but I think one could implement cover density ranking in Lucene with spans (only in CVS, not in 1.3). I think spans correspond to covers in this paper. But you'd need to alter SpanScorer.java to implement the cover scoring described in that paper. And you'd probably need to use a custom Similarity implementation, which disables most other scoring (tf=1.0, idf=1.0, etc.), but exaggerates coordination. Finally, you'd need to construct span queries.

Or something like that. If someone tries this, please tell us how it works.

Doug
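As a rough illustration of what a "cover" is, the following stand-alone sketch finds the shortest token intervals that contain every query term and sums a per-cover score. It does not use Lucene spans. The scoring rule (1 for a cover of at most K tokens, K divided by the length otherwise) and the value K = 16 are my reading of the cover density idea and should be treated as assumptions to check against the paper; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of cover finding and scoring over a token array.
final class CoverDensitySketch {

    static final int K = 16; // assumed cut-off length, not taken from this thread

    // Returns [start, end] pairs of minimal intervals containing all query terms.
    static List<int[]> covers(String[] tokens, Set<String> terms) {
        List<int[]> result = new ArrayList<>();
        if (terms.isEmpty()) {
            return result;
        }
        Map<String, Integer> counts = new HashMap<>(); // term counts inside the window
        int left = 0;
        int lastLeft = -1;
        for (int q = 0; q < tokens.length; q++) {
            if (terms.contains(tokens[q])) {
                counts.merge(tokens[q], 1, Integer::sum);
            }
            if (counts.size() < terms.size()) {
                continue; // window does not yet contain every term
            }
            // Shrink from the left while the window still contains every term.
            while (true) {
                String t = tokens[left];
                if (!terms.contains(t)) {
                    left++;
                    continue;
                }
                int c = counts.get(t);
                if (c > 1) {
                    counts.put(t, c - 1);
                    left++;
                } else {
                    break;
                }
            }
            // The window is a new minimal cover only if its start has moved.
            if (left > lastLeft) {
                result.add(new int[] {left, q});
                lastLeft = left;
            }
        }
        return result;
    }

    // Sums the assumed per-cover score.
    static double score(List<int[]> covers) {
        double sum = 0.0;
        for (int[] c : covers) {
            int length = c[1] - c[0] + 1;
            sum += length > K ? (double) K / length : 1.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] tokens = "a x b a b".split(" ");
        List<int[]> found = covers(tokens, Set.of("a", "b"));
        for (int[] c : found) {
            System.out.println("[" + c[0] + "," + c[1] + "]"); // [0,2] then [2,3] then [3,4]
        }
        System.out.println(score(found)); // 3.0
    }
}
```

In Doug's proposal the interval finding would come from span queries and the scoring would live in a modified SpanScorer; this sketch only shows the arithmetic on plain positions.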
Cover density ranking?
Since there have been a few discussions recently of overriding various aspects of Lucene's ranking formula, I got to wondering how difficult it might be to implement something quite different from the base tf/idf ranking system that Lucene has built in.

How difficult would it be to implement something like Cover Density ranking for Lucene? Has anyone tried it?

Cover density is described at http://citeseer.ist.psu.edu/558750.html , and is supposed to be particularly good for short queries of the type that you get in many web applications.

Boris
Search and Update one index with two processes simultaneously
Hello,

Is it possible to have two separate processes, one performing searches and the other performing updates, on the same index?

I have a system in production that uses this design, and occasionally the search program grinds to a halt. I first suspected that this was just a load issue, but there isn't that much load (peak times average 2-3 requests per second, with occasional bursts of 10-20 requests) and I can't replicate the problem.

The logs show that when the slowdown occurs we are usually answering requests to search at first, but ongoing searches have stopped finishing (somewhere inside IndexSearcher.search()). There doesn't seem to be a single expensive query that might be bringing us to our knees either.

So, I was wondering if maybe it is possible that this is a race condition caused by our update program, which is a separate program that updates the index while it is being searched.

Some basic info:

The search program uses a single IndexSearcher to perform all searches. Results are collected with a HitCollector which uses the same IndexSearcher to extract each document - there is a requirement that the documents be returned in a specific order, so we have an external structure to determine the order, once the ID (not the internal ID) has been extracted. A separate HitCollector is used for each search.

This IndexSearcher in the search program is swapped for a new one when the update program has finished an update cycle and notifies the search program.

The index is about 90k documents; the average query returns fewer than 100 hits.

Thanks for any information, or just for your opinion.

Brad
Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

...
> You think it's possible to order by e.g. a date field without retrieving all
> the values from the index??

Yes, the new sorting feature from CVS does that; see Doug's last note on the subject. (It might have been on lucene-dev, I didn't keep a copy.)

Have fun,
Ype
RE: Query syntax on Keyword field question
I have since learned that using the TermQuery instead of the MultiFieldQueryParser works for the keyword field in question below (HW-NCI_TOPICS).

    apiQuery = new BooleanQuery();
    apiQuery.add(new TermQuery(new Term("category", "HW-NCI_TOPICS")), true, false);

This finds a match. I found a message that talked about having to use the Query API when searching Keyword fields in the index. Is this true? Is there not a way to get the MultiFieldQueryParser to find a match on this keyword?

thanks,
chad.

-Original Message-
From: Chad Small
Sent: Tue 3/23/2004 10:57 AM
To: [EMAIL PROTECTED]
Cc:
Subject: Query syntax on Keyword field question

Hello,

How can I format a query to get a hit? I'm using the StandardAnalyzer() at both index and search time. If I'm indexing a field like this:

    luceneDocument.add(Field.Keyword("category","HW-NCI_TOPICS"));

I've tried the following with no success:

    // String searchArgs = "HW\\-NCI_TOPICS";
    // String searchArgs = "HW\\-NCI_TOPICS".toLowerCase();
    // String searchArgs = "+HW+NCI+TOPICS"; //this works with .Text field
    // String searchArgs = "+hw+nci+topics";
    // String searchArgs = "hw nci topics";

thanks,
chad.
RE: SpanXXQuery Usage
Terry,

With regular queries (non-span queries) you cannot request that the results of OR / AND / NOT operations be near one another (i.e. (A or B) near (C or D)). The span queries solve that problem by allowing any span query to be used in a SpanNearQuery (and vice versa). There are other applications for this as well, but this is one of them.

Hope that helps to get you started. Examples of their use can be found in the unit tests (TestBasics.java, I believe).

Cheers,
Jochen

-Original Message-
From: Terry Steichen [mailto:[EMAIL PROTECTED]
Sent: Monday, March 22, 2004 3:37 AM
To: Lucene Users List
Subject: Re: SpanXXQuery Usage

Otis,

Can you give me/us a rough idea of what these are supposed to do? It's hard to extrapolate the terse unit test code into much of a general notion. I searched the archives with little success.

Regards,

Terry

- Original Message -
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, March 22, 2004 2:46 AM
Subject: Re: SpanXXQuery Usage

> Only in unit tests, so far.
>
> Otis
>
> --- Terry Steichen <[EMAIL PROTECTED]> wrote:
> > Is there any documentation (other than that in the source) on how to
> > use the new SpanxxQuery features? Specifically: SpanNearQuery,
> > SpanNotQuery, SpanFirstQuery and SpanOrQuery?
> >
> > Regards,
> >
> > Terry
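The "(A or B) near (C or D)" case can be pictured on plain token positions. The sketch below does not use the span query classes; it simply checks whether any token from one set occurs within a given number of positions of any token from the other set. The distance measure is a simplification chosen for this illustration and is not claimed to match how SpanNearQuery counts slop; all names are hypothetical.

```java
import java.util.Set;

// Hypothetical illustration of "(A or B) near (C or D)" on token positions.
final class OrNearOrSketch {

    // True if a token from `left` and a token from `right` are at most
    // `maxDistance` positions apart, in either order.
    static boolean near(String[] tokens, Set<String> left, Set<String> right, int maxDistance) {
        for (int i = 0; i < tokens.length; i++) {
            if (!left.contains(tokens[i])) {
                continue;
            }
            int from = Math.max(0, i - maxDistance);
            int to = Math.min(tokens.length - 1, i + maxDistance);
            for (int j = from; j <= to; j++) {
                if (j != i && right.contains(tokens[j])) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] tokens = "apache lucene is a search library".split(" ");
        Set<String> aOrB = Set.of("lucene", "nutch");
        Set<String> cOrD = Set.of("search", "index");
        System.out.println(near(tokens, aOrB, cOrD, 3)); // true: positions 1 and 4
        System.out.println(near(tokens, aOrB, cOrD, 2)); // false
    }
}
```

In Lucene terms, each set would correspond to a SpanOrQuery and the proximity test to a SpanNearQuery wrapping the two, which is the nesting Jochen describes.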
Query syntax on Keyword field question
Hello,

How can I format a query to get a hit? I'm using the StandardAnalyzer() at both index and search time. If I'm indexing a field like this:

    luceneDocument.add(Field.Keyword("category","HW-NCI_TOPICS"));

I've tried the following with no success:

    // String searchArgs = "HW\\-NCI_TOPICS";
    // String searchArgs = "HW\\-NCI_TOPICS".toLowerCase();
    // String searchArgs = "+HW+NCI+TOPICS"; //this works with .Text field
    // String searchArgs = "+hw+nci+topics";
    // String searchArgs = "hw nci topics";

thanks,
chad.
Re: Similarity - position in Field[] effects scoring - how to change?
> On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote:
> > Hello,
> >
> > I have run into the following problem. Perhaps somebody can help me.
> >
> > I have an index with different ids in the same field,
> > something like
> >
> > 45678565
> > 87854546
> >
> > Situation: I have different documents with the entry in the
> > same index.
> >
> > document 1)
> >
> > 324235678565
> > 324dssd5678565
> > 45678324565
> >
> > 8785454324326
> >
> > document 2)
> >
> > 324235678565
> >
> > 45678324565
> > 8785454324326
> >
> > When I search for " s: " I receive both docs, but document 1 has
> > a better score than document 2.
>
> Since the s field of document 2 is shorter, I'd expect document 2 to score
> higher. As mentioned, lengthNorm() is responsible for this.
> Something does not add up here. Are the documents in the same index?
>
> > The position of in doc 1 is Field[4] and in doc 2 it's
> > Field[2], so this seems to affect scoring.
>
> Lucene's default scoring is independent of absolute term positions.

hm...

> > How can I disable this behaviour, so that doc 1 has the same score as doc 2?
>
> Simply ignore the score. The easiest way is to use the low level scoring API
> with your own HitCollector. Just make sure not to retrieve document field
> values until you have collected all your hits.

You think it's possible to order by e.g. a date field without retrieving all the values from the index??

> > Which method do I have to override in DefaultSimilarity?
> > Does anybody have any idea? Any help is appreciated.
>
> In which order do you want the resulting documents presented?
> The low level API gives them in index order when the query consists
> of a single search term, afaik.

Index order is OK but not very flexible.

Regards,
yo

> Regards,
> Ype
Re: Similarity - position in Field[] effects scoring - how to change?
Terry,

> I believe you'll have to replace the default Similarity class with one of
> your own. Not sure exactly what the settings should be - maybe some other
> list members can give you specifics. Otherwise, you'll probably have to
> experiment with it.

I tried the new sort feature from CVS and it works well! But it's interesting that nobody seems to know exactly how scoring works ;-)

thanks
yo

> Regards,
>
> Terry
>
> - Original Message -
> From: "Joachim Schreiber" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Tuesday, March 23, 2004 10:05 AM
> Subject: Similarity - position in Field[] effects scoring - how to change?
>
> > Hello,
> >
> > I have run into the following problem. Perhaps somebody can help me.
> >
> > I have an index with different ids in the same field,
> > something like
> >
> > 45678565
> > 87854546
> >
> > Situation: I have different documents with the entry in the
> > same index.
> >
> > document 1)
> >
> > 324235678565
> > 324dssd5678565
> > 45678324565
> >
> > 8785454324326
> >
> > document 2)
> >
> > 324235678565
> >
> > 45678324565
> > 8785454324326
> >
> > When I search for " s: " I receive both docs, but document 1 has
> > a better score than document 2.
> > The position of in doc 1 is Field[4] and in doc 2 it's Field[2],
> > so this seems to affect scoring.
> >
> > How can I disable this behaviour, so that doc 1 has the same score as doc 2?
> > Which method do I have to override in DefaultSimilarity?
> > Does anybody have any idea? Any help is appreciated.
> >
> > Thanks
> >
> > yo
Re: Similarity - position in Field[] effects scoring - how to change?
> Why don't you use the method explain of IndexSearcher?
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html
>
> This is the best way to find why your documents are different. I suspect the
> lengthNorm method, which is used at indexing time.

Yes, but I think this is not a good choice because we would have to retrieve all docs. This is not possible because I have result sets with 300 000 hits and more.

yo

> Julien
>
> > Hello,
> >
> > I have run into the following problem. Perhaps somebody can help me.
> >
> > I have an index with different ids in the same field,
> > something like
> >
> > 45678565
> > 87854546
> >
> > Situation: I have different documents with the entry in the
> > same index.
> >
> > document 1)
> >
> > 324235678565
> > 324dssd5678565
> > 45678324565
> >
> > 8785454324326
> >
> > document 2)
> >
> > 324235678565
> >
> > 45678324565
> > 8785454324326
> >
> > When I search for " s: " I receive both docs, but document 1 has
> > a better score than document 2.
> > The position of in doc 1 is Field[4] and in doc 2 it's Field[2],
> > so this seems to affect scoring.
> >
> > How can I disable this behaviour, so that doc 1 has the same score as doc 2?
> > Which method do I have to override in DefaultSimilarity?
> > Does anybody have any idea? Any help is appreciated.
> >
> > Thanks
> >
> > yo
Re: Similarity - position in Field[] effects scoring - how to change?
On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote:
> Hello,
>
> I have run into the following problem. Perhaps somebody can help me.
>
> I have an index with different ids in the same field,
> something like
>
> 45678565
> 87854546
>
> Situation: I have different documents with the entry in the
> same index.
>
> document 1)
>
> 324235678565
> 324dssd5678565
> 45678324565
>
> 8785454324326
>
> document 2)
>
> 324235678565
>
> 45678324565
> 8785454324326
>
> When I search for " s: " I receive both docs, but document 1 has
> a better score than document 2.

Since the s field of document 2 is shorter, I'd expect document 2 to score higher. As mentioned, lengthNorm() is responsible for this. Something does not add up here. Are the documents in the same index?

> The position of in doc 1 is Field[4] and in doc 2 it's
> Field[2], so this seems to affect scoring.

Lucene's default scoring is independent of absolute term positions.

> How can I disable this behaviour, so that doc 1 has the same score as doc 2?

Simply ignore the score. The easiest way is to use the low level scoring API with your own HitCollector. Just make sure not to retrieve document field values until you have collected all your hits.

> Which method do I have to override in DefaultSimilarity?
> Does anybody have any idea? Any help is appreciated.

In which order do you want the resulting documents presented? The low level API gives them in index order when the query consists of a single search term, afaik.

Regards,
Ype
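The lengthNorm() effect mentioned above is easy to see in isolation. Lucene's DefaultSimilarity computes the length norm of a field as 1/sqrt(numTokens), so a field with fewer tokens gets a larger factor. The sketch below reproduces only that arithmetic in plain Java; the class and method names are made up, and the "flat" variant shows what a custom Similarity returning a constant would do. Because the norm is applied at indexing time, such a change would only take effect after re-indexing.

```java
// Hypothetical stand-alone illustration of the length norm arithmetic.
final class LengthNormSketch {

    // Same formula as the default length norm: 1 / sqrt(number of tokens).
    static float defaultLengthNorm(int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }

    // What an overriding Similarity could return to ignore field length.
    static float flatLengthNorm(int numTokens) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // A field with 2 tokens gets a larger factor than one with 4 tokens.
        System.out.println(defaultLengthNorm(2) > defaultLengthNorm(4)); // true
        System.out.println(defaultLengthNorm(4));                        // 0.5
        System.out.println(flatLengthNorm(2) == flatLengthNorm(4));      // true
    }
}
```

This matches Ype's expectation that the document with the shorter field should score higher, all else being equal.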
RE: Similarity - position in Field[] effects scoring - how to change?
Thanks to Daniel, the solution is quite simple. Use the latest CVS source from the head and try the new sorting feature; it works very well ;-)

This should be documented somewhere, perhaps in the wiki! Cool new feature!

yo
Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

Why don't you use the method explain of IndexSearcher?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/IndexSearcher.html

This is the best way to find why your documents are different. I suspect the lengthNorm method, which is used at indexing time.

Julien

- Original Message -
From: "Joachim Schreiber" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 23, 2004 4:05 PM
Subject: Similarity - position in Field[] effects scoring - how to change?

> Hello,
>
> I have run into the following problem. Perhaps somebody can help me.
>
> I have an index with different ids in the same field,
> something like
>
> 45678565
> 87854546
>
> Situation: I have different documents with the entry in the
> same index.
>
> document 1)
>
> 324235678565
> 324dssd5678565
> 45678324565
>
> 8785454324326
>
> document 2)
>
> 324235678565
>
> 45678324565
> 8785454324326
>
> When I search for " s: " I receive both docs, but document 1 has
> a better score than document 2.
> The position of in doc 1 is Field[4] and in doc 2 it's Field[2],
> so this seems to affect scoring.
>
> How can I disable this behaviour, so that doc 1 has the same score as doc 2?
> Which method do I have to override in DefaultSimilarity?
> Does anybody have any idea? Any help is appreciated.
>
> Thanks
>
> yo
Re: Similarity - position in Field[] effects scoring - how to change?
Joachim,

I believe you'll have to replace the default Similarity class with one of your own. Not sure exactly what the settings should be - maybe some other list members can give you specifics. Otherwise, you'll probably have to experiment with it.

Regards,

Terry

- Original Message -
From: "Joachim Schreiber" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, March 23, 2004 10:05 AM
Subject: Similarity - position in Field[] effects scoring - how to change?

> Hello,
>
> I have run into the following problem. Perhaps somebody can help me.
>
> I have an index with different ids in the same field,
> something like
>
> 45678565
> 87854546
>
> Situation: I have different documents with the entry in the
> same index.
>
> document 1)
>
> 324235678565
> 324dssd5678565
> 45678324565
>
> 8785454324326
>
> document 2)
>
> 324235678565
>
> 45678324565
> 8785454324326
>
> When I search for " s: " I receive both docs, but document 1 has
> a better score than document 2.
> The position of in doc 1 is Field[4] and in doc 2 it's Field[2],
> so this seems to affect scoring.
>
> How can I disable this behaviour, so that doc 1 has the same score as doc 2?
> Which method do I have to override in DefaultSimilarity?
> Does anybody have any idea? Any help is appreciated.
>
> Thanks
>
> yo
Similarity - position in Field[] effects scoring - how to change?
Hello,

I have run into the following problem. Perhaps somebody can help me.

I have an index with different ids in the same field,
something like

45678565
87854546

Situation: I have different documents with the entry in the
same index.

document 1)

324235678565
324dssd5678565
45678324565

8785454324326

document 2)

324235678565

45678324565
8785454324326

When I search for " s: " I receive both docs, but document 1 has
a better score than document 2.
The position of in doc 1 is Field[4] and in doc 2 it's Field[2],
so this seems to affect scoring.

How can I disable this behaviour, so that doc 1 has the same score as doc 2?
Which method do I have to override in DefaultSimilarity?
Does anybody have any idea? Any help is appreciated.

Thanks

yo
Re: code works with 1.3-rc1 but not with 1.3-final??
Or set a big value for minMergeDocs on IndexWriter and keep a low mergeFactor (e.g. 10). You'll have a small number of files on your disk and the indexing should be faster as well.

- Original Message -
From: "Matt Quail" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, March 23, 2004 4:22 AM
Subject: Re: code works with 1.3-rc1 but not with 1.3-final??

> Or use IndexWriter.setUseCompoundFile(true) to reduce the number of files
> created by Lucene.
>
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#setUseCompoundFile(boolean)
>
> =Matt
>
> Kevin A. Burton wrote:
>
> > Dan wrote:
> >
> >> I have some code that creates a lucene index. It has been working fine
> >> with lucene-1.3-rc1.jar but I wanted to upgrade to
> >> lucene-1.3-final.jar. I did this and the indexer breaks. I get the
> >> following error when running the index with 1.3-final:
> >>
> >> Optimizing the index
> >> IOException: /home/danl001/index-Mar-22-14_31_30/_ni.f43 (Too many
> >> open files)
> >> Indexed 884 files in 8 directories
> >> Index creation took 242 seconds
> >> %
> >>
> > No... it's you... ;)
> >
> > Read the FAQ and then run
> >
> > ulimit -n 100 or so...
> >
> > You need to increase your file handles. Chances are you never noticed
> > this before but the problem was still present. If you're on a Linux box
> > you would be amazed to find out that you're only about 200 file handles
> > away from running out of your per-user quota file quota.
> >
> > You might have to su as root to change this.. RedHat is more strict
> > because it uses the glibc resource restrictions thingy. (who's name
> > slips my mind at the moment).
> > Debian is configured better here as per defaults.
> >
> > Also a google query would have solved this for you very quickly ;)..
> >
> > Kevin