Wildcard vs Term query
Hi,

I'm working my way through the Lucene in Action book, and there is one thing I need explained that I didn't find there. While wildcard queries are potentially slower than ordinary term queries, are they slower even if they don't contain a wildcard? Significantly slower?

The reason I ask is this: assume we are going to allow wildcards in a search engine, but we want to optimize for the case where they are NOT used. Do we have to check for the presence of "*" or "?" in the term and create the most appropriate query, or can I assume that when a wildcard is not present, the WildcardQuery will be as fast (or almost as fast) as a plain term query?

Thanks in advance!
John B.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Wildcard vs Term query
Are you using the out-of-the-box Lucene QueryParser? It automatically detects wildcard queries by the presence of * or ? characters. If the user input does not contain these characters, a plain TermQuery is used.

BooleanQuery.setMaxClauseCount can be used to control the upper limit on the number of terms produced by wildcard and fuzzy queries. If this limit is exceeded (e.g. when searching for something like "a*"), an exception is thrown.

Cheers
Mark

----- Original Message -----
From: John Byrne <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 26 September, 2007 9:48:17 AM
Subject: Wildcard vs Term query

> While wildcard queries are potentially slower than ordinary term queries, are they slower even if they don't contain a wildcard? Significantly slower?
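To illustrate why the clause limit matters: a wildcard query is expanded into the index terms that match the pattern, so a short prefix such as "a*" can match a very large number of terms. The sketch below models that expansion in plain Java without using Lucene at all. The names WildcardExpansionSketch, matches, expand and TooManyTermsException are hypothetical and exist only for this illustration; in Lucene the limit is the one set with BooleanQuery.setMaxClauseCount, and the real expansion works against the index's term dictionary rather than an in-memory set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of wildcard expansion; none of these types are Lucene classes.
class WildcardExpansionSketch {

    // Hypothetical stand-in for the exception raised when the limit is exceeded.
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    // Returns true if text matches pattern, where '*' matches any run of
    // characters (including none) and '?' matches exactly one character.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0, star = -1, mark = 0;
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the '*' so we can backtrack to it
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                p = star + 1;        // let the last '*' absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;                     // trailing '*' may match the empty string
        }
        return p == pattern.length();
    }

    // Collects every term that matches the pattern, failing as soon as more
    // than maxClauses terms match (mirroring a clause limit).
    static List<String> expand(String pattern, SortedSet<String> terms, int maxClauses) {
        List<String> expanded = new ArrayList<>();
        for (String term : terms) {
            if (matches(pattern, term)) {
                if (expanded.size() >= maxClauses) {
                    throw new TooManyTermsException(
                            "more than " + maxClauses + " terms match " + pattern);
                }
                expanded.add(term);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                Arrays.asList("apple", "apply", "apricot", "banana", "test", "text"));
        System.out.println(expand("te?t", terms, 10));
        try {
            expand("a*", terms, 2);
        } catch (TooManyTermsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the model is that the cost of a wildcard query grows with the number of matching terms, not with the length of the pattern, which is why a cap on the expansion is useful.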
Re: Wildcard vs Term query
I'm not using the QueryParser at all. I need to do a little more with the terms, so I'm explicitly creating a Query from a single term. What I was hoping to avoid was something like this:

    ...
    if (term.contains("*") || term.contains("?")) {
        return new WildcardQuery(...
    } else {
        return new TermQuery(...
    ...

and instead just do this:

    ...
    return new WildcardQuery(...
    ...

on the basis that the WildcardQuery would only be slower if it does contain a wildcard character. But as you pointed out, the QueryParser makes this optimization, so I suppose I should too.
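The check described in this message can be factored into a small helper. This is a minimal sketch that only decides which kind of query a raw term calls for. QueryKind, QueryKindChooser, containsWildcard and choose are hypothetical names, not Lucene classes, and the construction of the actual TermQuery or WildcardQuery is left out because the snippet in the message elides the constructor arguments.

```java
// Illustration only: decides which kind of query a raw term calls for.
// QueryKind and QueryKindChooser are hypothetical names, not Lucene classes.
class QueryKindChooser {

    enum QueryKind { TERM, WILDCARD }

    // True if the text contains either wildcard character, '*' or '?'.
    static boolean containsWildcard(String text) {
        return text.indexOf('*') >= 0 || text.indexOf('?') >= 0;
    }

    // Picks the cheaper plain-term kind unless a wildcard is actually present.
    static QueryKind choose(String text) {
        return containsWildcard(text) ? QueryKind.WILDCARD : QueryKind.TERM;
    }

    public static void main(String[] args) {
        System.out.println(choose("lucene"));
        System.out.println(choose("luc*ne"));
        System.out.println(choose("lucen?"));
    }
}
```

A caller would switch on the returned value and build the corresponding Lucene query, which keeps the wildcard test in one place.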
Re: Wildcard vs Term query
WildcardQuery won't be slower than TermQuery if there are no wildcard characters. Beyond what QueryParser does, WildcardQuery itself reverts to a TermQuery:

    public Query rewrite(IndexReader reader) throws IOException {
        if (this.termContainsWildcard) {
            return super.rewrite(reader);
        }
        return new TermQuery(getTerm());
    }

I personally would optimize which query gets created, but performance-wise you won't pay a penalty for just using WildcardQuery.

Erik

On Sep 26, 2007, at 5:45 AM, John Byrne wrote:
> I'm not using the QueryParser at all. I need to do a little more with the terms, so I'm explicitly creating a Query from a single term.
Re: user index signature
Would IndexReader:

    /**
     * Reads version number from segments files. The version number is
     * initialized with a timestamp and then increased by one for each change of
     * the index.
     *
     * @param directory where the index resides.
     * @return version number.
     * @throws CorruptIndexException if the index is corrupt
     * @throws IOException if there is a low-level IO error
     */
    public static long getCurrentVersion(Directory directory) throws CorruptIndexException, IOException {

do what you are looking for? Also, why does it have to be in the index if you are concerned about loading the whole IndexReader? That is, if your application is versioning the application, why not just store it in the same location or something like that?

-Grant

On Sep 25, 2007, at 6:51 PM, John Wang wrote:
> Hi:
>
> Is there a way to add custom signature data to a Lucene index, e.g. data version etc.?
>
> Thanks
>
> -John

--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
[JOB] Full-time opportunity in Paris, France
Arisem is a French ISV delivering best-of-breed text analytics software. We have been using Lucene in our products since 2001 and are looking for a Lucene expert to complement our R&D team.

Required skills:
- Master's degree in computer science
- 2+ years of experience working with Lucene
- Strong design and coding skills in Java on Linux platforms
- Strong desire to work in an environment combining development and research
- Innovation and excellent communication skills

Fluency in French is a plus. Ideal candidates will also have research experience and skills in text mining and NLP. Familiarity with C++, SOLR and Eclipse is also desired.

If you are available and interested, please contact me directly at nicolas.dessaigne_at_arisem.com

Nicolas Dessaigne
Chief Technical Officer
ARISEM
Re: user index signature
I have my own versioning system, and I use it to keep the index in sync with other parts of the system. I just wanted to know if there is a shortcut to keep it in the Lucene index and be able to read it using something similar to getCurrentVersion.

I guess I will have to store it somewhere outside of the index then.

-John

On 9/26/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> Also, why does it have to be in the index if you are concerned about loading the whole IndexReader? That is, if your application is versioning the application, why not just store it in the same location or something like that?
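One way to keep an application-level version outside the index, as discussed in this thread, is a small properties file in a directory of the application's choosing, for example the parent of the index directory. The sketch below uses only the JDK. The file name app-version.properties, the key data.version and the class IndexVersionFile are assumptions made for illustration, not anything Lucene defines.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Illustration only: stores an application-defined version string in a
// properties file kept separately from the Lucene index files.
class IndexVersionFile {

    // Both names are assumptions chosen for this sketch.
    private static final String FILE_NAME = "app-version.properties";
    private static final String KEY = "data.version";

    // Writes the version into FILE_NAME inside the given directory,
    // replacing any previously stored value.
    static void write(Path dir, String version) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY, version);
        try (OutputStream out = Files.newOutputStream(dir.resolve(FILE_NAME))) {
            props.store(out, null);
        }
    }

    // Returns the stored version, or null if no version file exists yet.
    static String read(Path dir) throws IOException {
        Path file = dir.resolve(FILE_NAME);
        if (!Files.exists(file)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-meta");
        write(dir, "2007-09-26.1");
        System.out.println(read(dir));
    }
}
```

Reading this file does not require opening an IndexReader, which addresses the concern about loading the whole reader just to check a version.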