RE: Using the highlighter from the sandbox with a prefix query.
I am calling

    query = searcher.rewrite( query );

and it throws java.lang.UnsupportedOperationException. Am I able to use the searcher's rewrite method like this?

Thanks,
Michael

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 17, 2005 4:09 AM
To: Lucene Users List
Subject: Re: Using the highlighter from the sandbox with a prefix query.

On Thursday 17 February 2005 08:37, lucuser4851 wrote:
> We have been using the highlighter from the Lucene sandbox, which works very nicely most of the time. However, when we try to use it with a prefix query (which is what you get from parsing a wildcard query), it doesn't return any highlighted sections. Has anyone else experienced this problem, or found a way around it?

You need to call rewrite() on the query before you pass it to the highlighter.

Regards
Daniel

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Using the highlighter from the sandbox with a prefix query.
On Feb 21, 2005, at 10:20 AM, Michael Celona wrote:
> I am using query = searcher.rewrite( query ); and it is throwing java.lang.UnsupportedOperationException. Am I able to use the searcher rewrite method like this?

What's the full stack trace?

Erik
Sorting isn't working for my date field
Hi,

Do I need to both store and index the field I want to sort on? Currently I am only indexing the field, without storing or tokenizing it. I have a date field indexed as MMdd, and I have two documents with the same date. When I run my search with:

    searcher.search(query, new SortField("date", true));
    searcher.search(query, new SortField("date", false));

both calls return the documents in the same order. Any ideas?

Thanks.
Ben
Re: Iterate through all the document ids in the index?
William Lee wrote:
> Is there a simple and fast way to get a list of document IDs from the Lucene index? I can use a loop to iterate from 0 to IndexReader.maxDoc() and check whether each document id is valid through IndexReader.document(i), but this would imply that I have to retrieve the document's fields.

Use IndexReader.isDeleted() to check if each id is valid. This is quite fast.

Doug
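The loop Doug describes can be sketched as follows. This is a minimal illustration, not code from the thread: DocIdSource is a hypothetical stand-in interface that mirrors only the two IndexReader methods named above (maxDoc() and isDeleted(int)), and toySource() is a made-up helper, so the example runs without Lucene on the classpath. With a real IndexReader the loop body should look the same.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class LiveDocIds {
    /** Hypothetical stand-in mirroring IndexReader.maxDoc() and IndexReader.isDeleted(int). */
    public interface DocIdSource {
        int maxDoc();
        boolean isDeleted(int docId);
    }

    /** Collects every id in [0, maxDoc) that is not marked deleted; no stored fields are loaded. */
    public static List<Integer> liveDocIds(DocIdSource reader) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < reader.maxDoc(); i++) {
            if (!reader.isDeleted(i)) {
                ids.add(i);
            }
        }
        return ids;
    }

    /** Builds a toy source with the given size and deleted ids (illustration only). */
    public static DocIdSource toySource(final int maxDoc, int... deleted) {
        final BitSet gone = new BitSet(maxDoc);
        for (int d : deleted) {
            gone.set(d);
        }
        return new DocIdSource() {
            public int maxDoc() { return maxDoc; }
            public boolean isDeleted(int docId) { return gone.get(docId); }
        };
    }

    public static void main(String[] args) {
        // Six slots, ids 1 and 4 deleted.
        System.out.println(liveDocIds(toySource(6, 1, 4)));
    }
}
```

The point of the design is that the validity check never touches document contents; it only consults the deletion marker for each id.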
Handling Synonyms
Hello;

Does anyone see a problem with the following approach?

For synonyms, rather than putting them in the index, I put the original term and all the synonyms in the query.

Every time I create a query, I check if the term has any synonyms. If it does, I create a BooleanQuery OR'ing one Query object for each synonym.

So if I have a synonym list:

    red = colour, primary, stop

and someone wants to search the desc field for red, I would end up with something like:

    ( (desc:*red*) (desc:*colour*) (desc:*stop*) )

The synonyms wouldn't be in the index; the Query would account for all the possible synonym terms.

Luke
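The query-side expansion described above can be sketched in plain Java. This is an illustration under assumptions, not Luke's code: the class and method names are made up, the synonym table is a simple in-memory map, and the output is a query string of space-separated clauses, which only means OR if the parser's default operator is OR. It emits plain terms rather than the *term* wildcards shown in the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymExpansion {
    /** Returns the term followed by its synonyms (if any), in list order. */
    public static List<String> expand(String term, Map<String, List<String>> synonyms) {
        List<String> terms = new ArrayList<>();
        terms.add(term);
        List<String> extra = synonyms.get(term);
        if (extra != null) {
            terms.addAll(extra);
        }
        return terms;
    }

    /** Renders the expansion as one clause per term on a single field. */
    public static String toQueryString(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(terms.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("red", Arrays.asList("colour", "primary", "stop"));
        // prints (desc:red desc:colour desc:primary desc:stop)
        System.out.println(toQueryString("desc", expand("red", synonyms)));
    }
}
```

Keeping the expansion at query time means the synonym list can change without reindexing, at the cost of larger queries.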
Re: Using the highlighter from the sandbox with a prefix query.
On Feb 21, 2005, at 10:53 AM, Michael Celona wrote:
> That's the only stack I get. One thing to mention is that I am using a MultiSearcher to rewrite the queries. I tried...
>
>     query = searcher_last.rewrite( query );
>     query = searcher_cur.rewrite( query );
>
> using IndexSearcher and I don't get an error... However, I am not able to highlight wildcard queries.

I use Highlighter for lucenebook.com and have two indexes that I search with MultiSearcher. Here's how I highlight:

    IndexReader reader = readers[indexIndex];
    QueryScorer scorer = new QueryScorer(query.rewrite(reader));
    SimpleHTMLFormatter formatter =
        new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");
    Highlighter highlighter = new Highlighter(formatter, scorer);

I get the appropriate IndexReader for the document being highlighted. You can get the index _index_ this way:

    int indexIndex = searcher.subSearcher(hits.id(position));

Hope this helps.

Erik
RE: Using the highlighter from the sandbox with a prefix query.
> One thing to mention that I am using a MultiSearcher to rewrite the queries. I tried...

Ah. I remember this got a little ugly. The highlighter has a JUnit test that demonstrates highlighting fuzzy queries when using a MultiSearcher. Take a look at that. I can't remember the ins and outs of the issues, but I know the code there still runs clean with the latest versions.

Cheers
Mark.
RE: Using the highlighter from the sandbox with a prefix query.
Thank you this helped a lot...

Michael Celona
Query Tuning
Hi All,

How does Lucene handle multi-term queries? Does it use short-circuiting? So if a user entered:

    (a OR b) AND c

but my program knew that testing for c is cheaper than testing for (a OR b), and I rewrote the query as:

    c AND (a OR b)

would the query run faster?

Sorry if this has already been answered, but for some reason the archive search is not working for me today.

Thanks,
Kevin
Re: Query Tuning
On Monday 21 February 2005 19:59, Runde, Kevin wrote:
> Hi All,
>
> How does Lucene handle multi term queries? Does it use short circuiting? So if a user entered:
>     (a OR b) AND c
> But my program knew testing for c is cheaper than testing for (a OR b) and I rewrote the query as:
>     c AND (a OR b)
> Would the query run faster?

Exchanging the operands of AND would not make a noticeable difference in speed. Queries are evaluated by iterating the inverted term index entries for all query terms in parallel, with buffering.

Regards,
Paul Elschot
Re: Query Tuning
Runde, Kevin wrote:
> Hi All,
>
> How does Lucene handle multi term queries? Does it use short circuiting? So if a user entered:
>     (a OR b) AND c
> But my program knew testing for c is cheaper than testing for (a OR b) and I rewrote the query as:
>     c AND (a OR b)
> Would the query run faster?
>
> Sorry if this has already be answered, but for some reason the Archive search is not working for me today.
>
> Thanks,
> Kevin

Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all of the clauses of the BooleanQuery are required and none of the clauses are BooleanQueries, a ConjunctionScorer is returned that offers the optimizations you seek. In the example you gave, there is a clause that is boolean (a or b), which will have to be evaluated independently with a BooleanScorer. This will be performed regardless of the ordering. (BooleanScorer doesn't preserve document order when it returns results, and hence it can't utilize the optimal algorithm provided by ConjunctionScorer.) Others have been down this path, as evidenced by the sigh in the javadoc.

If calculating (a or b) is expensive and the docFreq of a is much less than the union of a and b, you might consider rewriting it to (a and c) or (b and c) using the distributive law. Expansion like this isn't always beneficial and can't be applied blindly. As far as I can tell there is no query planning/optimization aside from the merging of related clauses and attempts to rewrite to simpler queries.

Cheers,
Todd VanderVeen
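The rewrite Todd suggests, turning (a OR b) AND c into (a AND c) OR (b AND c), distributes AND over OR and leaves the set of matching documents unchanged. The sketch below checks that on toy data with plain Java sets; the document ids are made up and nothing here reflects how Lucene scores or evaluates the clauses, only which documents match.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class DistributeAnd {
    /** Documents matching both x and y. */
    public static Set<Integer> and(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.retainAll(y);
        return r;
    }

    /** Documents matching x or y. */
    public static Set<Integer> or(Set<Integer> x, Set<Integer> y) {
        Set<Integer> r = new TreeSet<>(x);
        r.addAll(y);
        return r;
    }

    /** Convenience constructor for a set of document ids. */
    public static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        Set<Integer> a = docs(1, 4, 7);
        Set<Integer> b = docs(2, 4, 9);
        Set<Integer> c = docs(4, 7, 8, 9);
        Set<Integer> original = and(or(a, b), c);           // (a OR b) AND c
        Set<Integer> expanded = or(and(a, c), and(b, c));   // (a AND c) OR (b AND c)
        System.out.println(original + " " + expanded);
    }
}
```

Both forms yield documents 4, 7 and 9 here. The expanded form intersects with c twice, which is the cost being traded against a smaller OR.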
Re: knowing which field contributed the search result
Does anyone have any thoughts on this?

Thanks
-John

On Wed, 16 Feb 2005 14:39:52 -0800, John Wang [EMAIL PROTECTED] wrote:
> Hi:
>
> Is there a way, given a hit from a search, to find out which fields contributed to the hit?
>
> e.g. if my search is for:
>
>     contents1=brown fox OR contents2=black bear
>
> can the document found by this query also carry information on whether it was found via contents1, contents2, or both?
>
> Thanks
> -John
Re: Handling Synonyms
Luke Shannon wrote:
> Hello;
>
> Does anyone see a problem with the following approach?

No, no problem with it, and it's in fact what my WordNet Query Expansion sandbox module does.

The nice thing about Lucene is that you at least have the option of doing things the other way: you can write a custom Analyzer that puts all synonyms at the same token offset so they appear to be in the same place in the token stream.

Thinking about it... this approach, with the Analyzer, lets users search for phrases which would match a synonym. So, using your example below, the text "bright red engine" would be matched by either phrase "bright red" or "bright colour". Doing the query expansion is trickier if you allow phrases.

> For synonyms, rather than putting them in the index, I put the original term and all the synonyms in the query.
>
> Every time I create a query, I check if the term has any synonyms. If it does, I create Boolean Query OR'ing one Query object for each synonym.
>
> So if I have a synonym list:
>
>     red = colour, primary, stop
>
> And someone wants to search the desc field for the red, I would end up with something like:
>
>     ( (desc:*red*) (desc:*colour*) (desc:*stop*) )

I don't like that bit about substring terms, but if it's right for you, OK. If you insist on loosening things I'd consider fuzzy terms (desc:red~ ...etc).

> Now the synonyms wouldn't be in the index, the Query would account for all the possible synonym terms.
>
> Luke
Re: knowing which field contributed the search result
John Wang wrote:
> Anyone has any thoughts on this?

Does this help?

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Searchable.html#explain(org.apache.lucene.search.Query,%20int)

> Thanks
> -John
>
> On Wed, 16 Feb 2005 14:39:52 -0800, John Wang [EMAIL PROTECTED] wrote:
>> Hi:
>>
>> Is there way to find out given a hit from a search, find out which fields contributed to the hit?
>>
>> e.g. If my search for:
>>
>>     contents1=brown fox OR contents2=black bear
>>
>> can the document founded by this query also have information on whether it was found via contents1 or contents2 or both.
>>
>> Thanks
>> -John
Re: Query Tuning
On Monday 21 February 2005 20:43, Todd VanderVeen wrote:
> Runde, Kevin wrote:
>> Hi All,
>>
>> How does Lucene handle multi term queries? Does it use short circuiting? So if a user entered:
>>     (a OR b) AND c
>> But my program knew testing for c is cheaper than testing for (a OR b) and I rewrote the query as:
>>     c AND (a OR b)
>> Would the query run faster?
>>
>> Sorry if this has already be answered, but for some reason the Archive search is not working for me today.
>>
>> Thanks,
>> Kevin
>
> Not sure about what is in CVS, but look at BooleanQuery.scorer(). If all

It's in svn nowadays.

> of the clauses of the BooleanQuery are required and none of the clauses are BooleanQueries a ConjunctionScorer is returned that offers the optimizations you seek. In the example you gave, there is a clause that is boolean (a or b) that will have to be evaluated independently with a boolean scorer. This will be performed regardless of the ordering. (BooleanScorer doesn't preserve document order when it returns results and hence it can't utilize the optimal algorithm provided by ConjunctionScorer). Others have been down this path as evidenced by the sigh in the javadoc.

In the svn version a ConjunctionScorer is used for all top level AND queries.

> If calculating (a or b) is expensive and the docFreq of a is much less than the union of a and b, you might consider rewriting it to (a and c) or (b and c) using the distributive law. Expansion like this isn't always beneficial and can't be applied blindly. As far as I can tell there is

In the svn version the subquery (a or b) is only evaluated for documents matching c. In the current version the expansion to (a and c) or (b and c) might help: the tradeoff is between evaluating c twice and having less work for the OR operator.

> no query planning/optimization aside from the merging of related clauses and attempts to rewrite to simpler queries.

One optimization in the current version is the use of ConjunctionScorer for some cases. One such case, which happens a lot in practice, is a query that has a few required terms.

Another optimization in the current version is that some scoring is done ahead for each clause into an unordered buffer. This helps for top level OR queries, but loses for OR queries that are subqueries of AND.

The svn version does not score ahead. It relies on the buffering done by TermScorer. Perhaps the buffering for a TermScorer should be made dependent on its expected use: more buffering for top level OR, less buffering when used under AND.

Regards,
Paul Elschot
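Paul's remark that the subquery (a or b) is only evaluated for documents matching c can be pictured with sorted posting lists. The following is a toy model under stated assumptions, not Lucene's scorer code: Cursor and advanceTo are hypothetical names, each term is reduced to an ascending array of document ids, and scoring and buffering are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class AndOverOr {
    /** Cursor over an ascending list of doc ids; a toy stand-in for a per-term scorer. */
    static final class Cursor {
        private final int[] docs;
        private int pos = 0;

        Cursor(int[] docs) {
            this.docs = docs;
        }

        /** Moves forward to the first doc id >= target and reports whether it equals target. */
        boolean advanceTo(int target) {
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return pos < docs.length && docs[pos] == target;
        }
    }

    /** Matches c AND (a OR b), consulting a and b only at doc ids present in c. */
    public static List<Integer> match(int[] c, int[] a, int[] b) {
        Cursor ca = new Cursor(a);
        Cursor cb = new Cursor(b);
        List<Integer> hits = new ArrayList<>();
        for (int doc : c) {
            boolean inA = ca.advanceTo(doc);
            boolean inB = cb.advanceTo(doc);
            if (inA || inB) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 7};
        int[] b = {2, 4, 9};
        int[] c = {4, 7, 8, 9};
        System.out.println(match(c, a, b));
    }
}
```

Because every list is visited in ascending order and cursors only move forward, each posting is touched at most once, and entries of a and b that fall between candidates from c are skipped without being tested.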
Re: More Analyzer Question
The problem is that your KeywordSynonymAnalyzer is not truly a keyword analyzer, in that it is tokenizing the field into parts. So Document 1 has [test] and [mario] as tokens that come from the LowerCaseTokenizer.

Look at Lucene's svn repository under contrib/analyzers and you'll see a KeywordTokenizer and corresponding KeywordAnalyzer you can use.

Erik

On Feb 18, 2005, at 5:44 PM, Luke Shannon wrote:
> I have created an Analyzer that I think should just be converting to lower case and adding synonyms in the index (it is at the end of the email).
>
> The problem is, after running it I get one more result than I was expecting (Document 1 should not be there):
>
>     Running testNameCombination1, expecting: 1 result
>     The query: +(type:138) +(name:mario*) returned 2
>
>     Start Listing documents:
>     Document: 0 contains:
>     Name: Textname:mario test
>     Desc: Textdesc:this is test from mario
>     Document: 1 contains:
>     Name: Textname:test mario
>     Desc: Textdesc:retro
>     End Listing documents
>
> Those same 2 documents in Luke look like this:
>
>     Document 0
>     Textname:mario test
>     Textdesc:this is test from mario
>
>     Document 1
>     Textname:test mario
>     Textdesc:retro
>
> That looks correct to me. The query shouldn't match Document 1.
>
> The analyzer used on this field is below and is applied like so:
>
>     //set the default
>     PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new SynonymAnalyzer(new FBSynonymEngine()));
>     //the analyzer for the name field (only converts to lower case and adds synonyms)
>     analyzer.addAnalyzer("name", new KeywordSynonymAnalyzer(new FBSynonymEngine()));
>
> Any help would be appreciated.
>
> Thanks,
>
> Luke
>
>     import org.apache.lucene.analysis.*;
>     import java.io.Reader;
>
>     public class KeywordSynonymAnalyzer extends Analyzer {
>         private SynonymEngine engine;
>
>         public KeywordSynonymAnalyzer(SynonymEngine engine) {
>             this.engine = engine;
>         }
>
>         public TokenStream tokenStream(String fieldName, Reader reader) {
>             TokenStream result = new SynonymFilter(new LowerCaseTokenizer(reader), engine);
>             return result;
>         }
>     }
>
> Luke Shannon | Software Developer
> FutureBrand Toronto
> 207 Queen's Quay, Suite 400
> Toronto, ON, M5J 1A7
> 416 642 7935 (office)
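Erik's diagnosis can be illustrated without Lucene. The sketch below is an approximation under assumptions: letterTokens() imitates what the thread says LowerCaseTokenizer does (split into lowercased letter runs), keywordToken() imitates the keyword-style behavior of keeping the whole value as one token, and prefixMatches() stands in for a prefix query such as name:mario*. All three names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class KeywordVsTokenized {
    /** Splits on non-letters and lowercases: "test mario" becomes [test, mario]. */
    public static List<String> letterTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Keeps the whole lowercased value as a single token: "test mario" stays [test mario]. */
    public static List<String> keywordToken(String value) {
        return Collections.singletonList(value.toLowerCase(Locale.ROOT));
    }

    /** True if any token starts with the prefix, mimicking a prefix query. */
    public static boolean prefixMatches(List<String> tokens, String prefix) {
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc0 = "mario test";
        String doc1 = "test mario";
        // Tokenized: both documents contain a token starting with "mario".
        System.out.println(prefixMatches(letterTokens(doc0), "mario") + " " + prefixMatches(letterTokens(doc1), "mario"));
        // Keyword: only the value that itself starts with "mario" matches.
        System.out.println(prefixMatches(keywordToken(doc0), "mario") + " " + prefixMatches(keywordToken(doc1), "mario"));
    }
}
```

With letter tokenization both documents match, which is the extra hit Luke saw; with a single keyword token only Document 0 matches.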
Optional Terms in a single query
Hi;

I'm trying to create a query that looks for documents whose type field contains 181 and whose name field doesn't contain tim, bill or harry.

    +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
    +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
    +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
    +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

I would really like to do this all in one Query. Is this even possible?

Thanks,

Luke
Re: Optional Terms in a single query
On Monday 21 February 2005 23:23, Luke Shannon wrote:
> Hi;
>
> I'm trying to create a query that look for a field containing type:181 and name doesn't contain tim, bill or harry.

type: 181 -(name: tim name:bill name:harry)

> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))

stillHere is normally lowercased before searching. Is that ok?

> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))

typo? olfaithfull

> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))

typo? (type:1 81)

> I would really think to do this all in one Query. Is this even possible?

How would you want to combine the results?

Regards,
Paul Elschot
Re: Optional Terms in a single query
Sorry about the typos.

What I would like is a document with a type field = 181, olfaithfull=stillHere and a name field not containing tim, bill or harry.

Thanks,

Luke
Re: Optional Terms in a single query
The API I'm working with combines a series of queries into one larger one using a BooleanQuery.

Queries on the same field get OR'd into one big query. All remaining queries are AND'd with this big one.

Working within this system I have:

    arg = "(mario luigi bobby joe)"; //i do have control of how this list is created

I pass this to the QueryParser:

    Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
    Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    BooleanQuery typeNegativeSearch = new BooleanQuery();
    typeNegativeSearch.add(query1, false, true);
    typeNegativeSearch.add(query2, true, false);

This is half the query. It gets AND'd with the other half, to create what you see below:

    +(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))

What I am having trouble with is getting the QueryParser to create this:

    -name:(tim bill harry)

I feel like this is something simple, but for some reason I can't figure it out.

Thanks,

Luke

----- Original Message -----
From: Todd VanderVeen [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
Sent: Monday, February 21, 2005 5:33 PM
Subject: Re: Optional Terms in a single query

> Luke Shannon wrote:
>> Hi;
>>
>> I'm trying to create a query that look for a field containing type:181 and name doesn't contain tim, bill or harry.
>>
>> +(type: 181) +((-name: tim -name:bill -name:harry +oldfaith:stillHere))
>> +(type: 181) +((-name: tim OR bill OR harry +oldfaith:stillHere))
>> +(type: 181) +((-name:*(tim bill harry)* +olfaithfull:stillhere))
>> +(type:1 81) +((-name:*(tim OR bill OR harry)* +olfaithfull:stillhere))
>>
>> I would really think to do this all in one Query. Is this even possible?
>>
>> Thanks,
>>
>> Luke
>
> Are all the queries listed attempts at the same thing?
>
> I'm guessing you want this:
>
>     +type:181 -name:(tim bill harry) +oldfaith:stillHere
Re: Optional Terms in a single query
Luke Shannon wrote:
> The API I'm working with combines a series of queries into one larger one using a boolean query.
>
> Queries on the same field get OR's into one big query. All remaining queries are AND'd with this big one.
>
> Working with in this system I have:
>
>     arg = "(mario luigi bobby joe)"; //i do have control of how this list is created
>
> I pass this to the QueryParser:
>
>     Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
>     Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
>     BooleanQuery typeNegativeSearch = new BooleanQuery();
>     typeNegativeSearch.add(query1, false, true);
>     typeNegativeSearch.add(query2, true, false);
>
> This is half the query. It gets AND'd with the other half, to create what you see below:
>
>     +(type:181) +((-(name:tim name:harry name:bill) +olfaithfull:stillhere))
>
> What I am having trouble with is getting the QueryParser to create this:
>
>     -name:(tim bill harry)
>
> I feel like this is something simple, but for some reason I can't figure it out.
>
> Thanks,
>
> Luke

Is the API something you control?

Let's call the other half of your query query3. To avoid the extra nesting you need to do the composition in a single boolean query.

    Query query1 = QueryParser.parse(arg, "name", new StandardAnalyzer());
    Query query2 = QueryParser.parse("stillhere", "olfaithfull", new StandardAnalyzer());
    Query query3 = /* the other half of your query */;
    BooleanQuery finalQuery = new BooleanQuery();
    finalQuery.add(query1, false, true);   // prohibited
    finalQuery.add(query2, true, false);   // required
    finalQuery.add(query3, true, false);   // required

Cheers,
Todd VanderVeen
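What the flat composition selects can be modeled with sets. This is a toy model under assumptions, not Lucene's BooleanQuery: FlatBooleanSketch is a made-up class, each clause is reduced to the set of document ids it matches (the ids are invented), add() borrows only the (clause, required, prohibited) argument order used in the thread, and optional clauses are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FlatBooleanSketch {
    private final List<Set<Integer>> required = new ArrayList<>();
    private final List<Set<Integer>> prohibited = new ArrayList<>();

    /** Same argument order as add(query, required, prohibited) in the thread. */
    public void add(Set<Integer> matchingDocs, boolean isRequired, boolean isProhibited) {
        if (isRequired == isProhibited) {
            // Covers both "required and prohibited" and "optional", which this sketch leaves out.
            throw new IllegalArgumentException("clause must be exactly one of required or prohibited");
        }
        if (isProhibited) {
            prohibited.add(matchingDocs);
        } else {
            required.add(matchingDocs);
        }
    }

    /** Intersects all required clauses, then removes every prohibited document. */
    public Set<Integer> matches() {
        Set<Integer> result = new TreeSet<>();
        if (required.isEmpty()) {
            return result; // in this sketch, prohibited clauses only subtract from required matches
        }
        result.addAll(required.get(0));
        for (Set<Integer> r : required) {
            result.retainAll(r);
        }
        for (Set<Integer> p : prohibited) {
            result.removeAll(p);
        }
        return result;
    }

    /** Convenience constructor for a set of document ids. */
    public static Set<Integer> docs(Integer... ids) {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args) {
        FlatBooleanSketch finalQuery = new FlatBooleanSketch();
        finalQuery.add(docs(2, 4), false, true);       // name:(tim bill harry), prohibited
        finalQuery.add(docs(1, 2, 3), true, false);    // olfaithfull:stillhere, required
        finalQuery.add(docs(1, 2, 3, 4), true, false); // type:181, required
        System.out.println(finalQuery.matches());
    }
}
```

With all three clauses in one flat list, the prohibited name clause is subtracted from the intersection of the two required clauses, leaving documents 1 and 3 in this toy data.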
Re: Optional Terms in a single query
Hi Todd;

Thanks for your help. I was able to do what you said, but in a much uglier way, using a BooleanQuery and adding WildcardQueries.

The end result looks like this:

    The query: +(type:138) +((-name:*tim* -name:*bill* -name:*harry* +olfaithfull:stillhere))

But this one works as expected.

Thanks!

Luke