Wildcard vs Term query

2007-09-26 Thread John Byrne

Hi,

I'm working my way through the Lucene In Action book, and there is one 
thing I need explained that I didn't find there:

While wildcard queries are potentially slower than ordinary term 
queries, are they slower even if they don't contain a wildcard? 
Significantly slower?

The reason I ask is that if we assume we are going to allow wildcards in 
a search engine, but want to optimize for the case where they are NOT 
used, do we have to check for the presence of "*" or "?" in the term and 
create the most appropriate query, or can I assume that when a wildcard 
is not present, the WildcardQuery will be as fast (or almost as fast) as 
a plain term query?


Thanks in advance!
John B.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Wildcard vs Term query

2007-09-26 Thread mark harwood
Are you using the out-of-the-box Lucene QueryParser?  It automatically 
detects wildcard queries by the presence of * or ? chars.
If the user input does not contain these characters, a plain TermQuery is used.

BooleanQuery.setMaxClauseCount can be used to control the upper limit on terms 
produced by Wildcard/Fuzzy Queries.
If this limit is exceeded (e.g. when searching for something like "a*"), an 
exception is thrown.
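
To illustrate why a broad pattern such as "a*" can run into that limit, here is a minimal, self-contained sketch of a pattern expanding into one clause per matching term. It is a toy model, not Lucene's implementation: the class ExpansionSketch, the method expand, the in-memory term list and the exception type are all hypothetical, and only patterns with a trailing * are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not Lucene code): expands a trailing-* pattern against a list of
// terms and fails once the number of matching terms would exceed a cap.
class ExpansionSketch {

    /** Hypothetical stand-in for the exception raised when the cap is exceeded. */
    static class TooManyTermsException extends RuntimeException {
        TooManyTermsException(String message) {
            super(message);
        }
    }

    /**
     * Returns every term matching the pattern. A pattern ending in '*' matches
     * by prefix; any other pattern must equal a term exactly.
     */
    static List<String> expand(String pattern, List<String> indexedTerms, int maxClauses) {
        boolean isPrefix = pattern.endsWith("*");
        String prefix = isPrefix ? pattern.substring(0, pattern.length() - 1) : pattern;
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            boolean hit = isPrefix ? term.startsWith(prefix) : term.equals(prefix);
            if (!hit) {
                continue;
            }
            // One more match would push the expansion past the cap.
            if (matches.size() == maxClauses) {
                throw new TooManyTermsException(
                        "pattern '" + pattern + "' matches more than " + maxClauses + " terms");
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "apricot", "avocado", "banana");
        System.out.println(expand("ap*", terms, 2)); // two matches, within the cap
        try {
            expand("a*", terms, 2); // three matches, over the cap
        } catch (TooManyTermsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In a real application the cap would be whatever value is passed to BooleanQuery.setMaxClauseCount, and the caller would need to decide how to report the rejected pattern to the user.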

Cheers
Mark
- Original Message 
From: John Byrne <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, 26 September, 2007 9:48:17 AM
Subject: Wildcard vs Term query




Re: Wildcard vs Term query

2007-09-26 Thread John Byrne
I'm not using the QueryParser at all. I need to do a little more with 
the terms, so I'm explicitly creating a Query from a single term. What I 
was hoping to avoid was something like this:

...
if (term.contains("*") || term.contains("?")) {
    return new WildcardQuery(...
}
else {
    return new TermQuery(...
...

and instead just do this:

...
return new WildcardQuery(...
...

on the basis that the WildcardQuery would only be slower if it does 
contain a wildcard character. But as you pointed out, the QueryParser 
makes this optimization, so I suppose I should too.





Re: Wildcard vs Term query

2007-09-26 Thread Erik Hatcher
WildcardQuery won't be slower than TermQuery if there are no wildcard  
characters.  Beyond what QueryParser does, WildcardQuery itself  
reverts to a TermQuery:


  public Query rewrite(IndexReader reader) throws IOException {
      if (this.termContainsWildcard) {
          return super.rewrite(reader);
      }

      return new TermQuery(getTerm());
  }

I personally would optimize which query gets created, but performance-wise 
you won't pay a penalty for just using WildcardQuery.
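
The fallback in the rewrite method above can be modelled without Lucene on the classpath. The sketch below is a toy model under stated assumptions: SketchQuery, SketchTermQuery and SketchWildcardQuery are hypothetical stand-ins rather than Lucene classes, the wildcard flag is assumed to be computed once in the constructor, and the wildcard branch simply returns the query itself where the real class delegates to its superclass.

```java
// Toy model of the rewrite fallback; none of these types are Lucene classes.
interface SketchQuery {
    SketchQuery rewrite();
}

// Stand-in for a plain term query: already in its simplest form.
final class SketchTermQuery implements SketchQuery {
    final String text;

    SketchTermQuery(String text) {
        this.text = text;
    }

    @Override
    public SketchQuery rewrite() {
        return this;
    }
}

// Stand-in for a wildcard query that degrades to a term query when possible.
final class SketchWildcardQuery implements SketchQuery {
    final String text;
    final boolean termContainsWildcard;

    SketchWildcardQuery(String text) {
        this.text = text;
        // Assumption: the flag is derived once from the presence of '*' or '?'.
        this.termContainsWildcard = text.indexOf('*') >= 0 || text.indexOf('?') >= 0;
    }

    @Override
    public SketchQuery rewrite() {
        if (termContainsWildcard) {
            return this; // placeholder for the real pattern expansion
        }
        return new SketchTermQuery(text);
    }
}

class RewriteSketch {
    public static void main(String[] args) {
        System.out.println(new SketchWildcardQuery("lucene").rewrite().getClass().getSimpleName());
        System.out.println(new SketchWildcardQuery("luc*").rewrite().getClass().getSimpleName());
    }
}
```

The point of the pattern is that the caller never has to inspect the term: the cheaper query is substituted at rewrite time, so the only cost of always constructing the wildcard variant is one extra short-lived object.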


Erik





Re: user index signature

2007-09-26 Thread Grant Ingersoll

Would IndexReader:

  /**
   * Reads version number from segments files. The version number is
   * initialized with a timestamp and then increased by one for each
   * change of the index.
   *
   * @param directory where the index resides.
   * @return version number.
   * @throws CorruptIndexException if the index is corrupt
   * @throws IOException if there is a low-level IO error
   */
  public static long getCurrentVersion(Directory directory)
      throws CorruptIndexException, IOException {


do what you are looking for?  Also, why does it have to be in the  
index if you are concerned about loading the whole IndexReader?  That  
is, if your application is versioning the application, why not just  
store it in the same location or something like that?


-Grant

On Sep 25, 2007, at 6:51 PM, John Wang wrote:


Hi:

   Is there a way to add custom signature data to a Lucene index, e.g. a 
data version?

Thanks

-John


--
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






[JOB] Full-time opportunity in Paris, France

2007-09-26 Thread nicolas . dessaigne
Arisem is a French ISV delivering best-of-breed text analytics software. We
have been using Lucene in our products since 2001 and are looking for a Lucene
expert to complement our R&D team.

Required skills:

- Master's degree in computer science
- 2+ years of experience working with Lucene
- Strong design and coding skills in Java on Linux platforms
- Strong desire to work in an environment combining development and research
- Innovation and excellent communication skills

Fluency in French is a plus.

Ideal candidates will also have research experience and skills in text
mining and NLP. Familiarity with C++, SOLR and Eclipse is also desired.

If you are available and interested, please contact me directly at
nicolas.dessaigne_at_arisem.com

Nicolas Dessaigne
Chief Technical Officer
ARISEM



Re: user index signature

2007-09-26 Thread John Wang
I have my own versioning system and I use it to keep the index in sync with
other parts of the system. I just wanted to know if there is a shortcut to
keep it in the Lucene index and read it back using something similar to
getCurrentVersion.
I guess I will have to store it somewhere outside of the index then.
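
One way to keep an application-level version outside the index is a small sidecar file in a directory the application controls. This is a minimal sketch under assumptions: the file name app-version.properties, the key data.version and the class IndexVersionFile are all hypothetical, and nothing here touches Lucene itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper: stores an application-defined version string in a
// properties file kept alongside the index rather than inside it.
class IndexVersionFile {
    private static final String FILE_NAME = "app-version.properties"; // assumed name
    private static final String KEY = "data.version";                 // assumed key

    /** Writes (or overwrites) the version file in the given directory. */
    static void write(Path dir, String version) {
        Properties props = new Properties();
        props.setProperty(KEY, version);
        try (OutputStream out = Files.newOutputStream(dir.resolve(FILE_NAME))) {
            props.store(out, "application data version");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored version, or null if no version file exists yet. */
    static String read(Path dir) {
        Path file = dir.resolve(FILE_NAME);
        if (!Files.exists(file)) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        write(dir, "42");
        System.out.println(read(dir));
    }
}
```

Reading such a file does not require opening an IndexReader, but the application then has to update it whenever it changes the index, since the two are no longer written together.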

-John
