Re: Philosophy(??) question

2004-01-14 Thread Erik Hatcher
On Jan 13, 2004, at 5:23 PM, Scott Smith wrote:
 Some day, I'd be interested to understand the deeper question.

Here is a scenario to ponder about using different analyzers at index 
and query time.

Suppose you have a custom analyzer that places synonyms of words into 
the same token position as the original words.  QueryParser does not 
deal with token position, and neither does PhraseQuery currently.  If 
you use the same analyzer for QueryParser, the query will be mangled.  
Using an analyzer that does everything the indexing analyzer does, but 
without putting the synonyms into the token stream, will do the trick 
(there is no need to look up synonyms at query time; they are already 
indexed anyway).
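
A rough, self-contained sketch of the scenario above, assuming a toy token
model rather than Lucene's own classes: ToyToken, SynonymSketch, and the
synonym table are all hypothetical names made up for illustration.  The
index-side method stacks each synonym on the original word's position
(position increment 0), while the query-side method applies the same
normalization and skips the synonym step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: these are NOT Lucene classes. A token carries its text and a
// position increment; an increment of 0 means "same position as the previous token".
final class ToyToken {
    final String text;
    final int positionIncrement;

    ToyToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return text + "(+" + positionIncrement + ")";
    }
}

final class SynonymSketch {
    // Hypothetical synonym table, used only for this illustration.
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("quick", Arrays.asList("fast", "speedy"));
    }

    // Normalization shared by both sides: lowercase, then split on whitespace.
    static List<String> baseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Index-time analysis: each synonym is stacked on its original word's position.
    static List<ToyToken> analyzeForIndex(String text) {
        List<ToyToken> out = new ArrayList<>();
        for (String t : baseTokens(text)) {
            out.add(new ToyToken(t, 1));
            for (String syn : SYNONYMS.getOrDefault(t, Collections.emptyList())) {
                out.add(new ToyToken(syn, 0));
            }
        }
        return out;
    }

    // Query-time analysis: same normalization, but no synonym injection.
    static List<ToyToken> analyzeForQuery(String text) {
        List<ToyToken> out = new ArrayList<>();
        for (String t : baseTokens(text)) {
            out.add(new ToyToken(t, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyzeForIndex("Quick fox"));
        System.out.println(analyzeForQuery("Quick fox"));
    }
}
```

Under these assumptions, analyzeForIndex("Quick fox") yields quick, fast,
speedy, fox, with fast and speedy sharing quick's position, whereas
analyzeForQuery("Quick fox") yields only quick and fox.  Both sides
lowercase identically, which is the sense in which the two analyzers
still cooperate.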

Also, if you use Field.Keyword at indexing time, it probably makes 
sense for the analyzer used with QueryParser to leave those keyword 
fields unanalyzed too.

	Erik


-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 13, 2004 3:19 AM
To: Lucene Users List
Subject: Re: Philosophy(??) question
On Jan 12, 2004, at 7:59 PM, Scott Smith wrote:
 I have some documents I'm indexing which have multiple languages in them
 (i.e., some fields in the document are always English; other fields may be
 other languages).  Now, I understand why a query against a certain field
 must use the same analyzer as was used when that field was indexed
 (stemming, stop words, etc.).  It seems like different fields could use
 different analyzers and the world would still be a happy place.  However,
 since the analyzer() is passed in as part of the IndexWriter, that can't
 happen.  Is there a way to do this (other than having multiple indexes which
 is a problem trying to do combined searches)?  Or am I missing something
 more subtle?  Sorry if I'm plowing old ground.

The new PerFieldAnalyzerWrapper (in v. 1.3) allows you to specify
different analyzers, as its name says, per field.  You simply specify
which analyzer to use as a default and then any special ones for
individual fields.

As for using the same analyzer for querying as for indexing - that is a
deeper question that I've yet to agree with.  There are some
interesting reasons why you may want a different one - although they
must cooperate in some fashion.

	Erik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Philosophy(??) question

2004-01-13 Thread Morus Walter
Scott Smith writes:
 I have some documents I'm indexing which have multiple languages in them
 (i.e., some fields in the document are always English; other fields may be
 other languages).  Now, I understand why a query against a certain field
 must use the same analyzer as was used when that field was indexed
 (stemming, stop words, etc.).  It seems like different fields could use
 different analyzers and the world would still be a happy place.  However,
 since the analyzer() is passed in as part of the IndexWriter, that can't
 happen.  Is there a way to do this (other than having multiple indexes which
 is a problem trying to do combined searches)?  Or am I missing something
 more subtle?  Sorry if I'm plowing old ground.
 
AFAIK you need to write one analyzer that acts differently based on the
'fieldName' parameter in the tokenStream method.
I haven't done that though.

HTH
Morus
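
For illustration only, the approach described above might look roughly
like the following.  This is a plain-Java stand-in, not Lucene's Analyzer
API: the class FieldSwitchingAnalyzer, its tokens method, the field names
title_en and body_de, and the stop-word lists are all assumptions made up
for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: it mirrors the idea of an analyzer that is handed the field
// name along with the text, but it is not Lucene's Analyzer class.
final class FieldSwitchingAnalyzer {
    // Hypothetical stop-word lists, chosen for illustration.
    private static final Set<String> ENGLISH_STOP =
            new HashSet<>(Arrays.asList("the", "a", "of"));
    private static final Set<String> GERMAN_STOP =
            new HashSet<>(Arrays.asList("der", "die", "das"));

    // Picks the stop-word list from the field name, then lowercases and splits.
    List<String> tokens(String fieldName, String text) {
        Set<String> stopWords = "body_de".equals(fieldName) ? GERMAN_STOP : ENGLISH_STOP;
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        FieldSwitchingAnalyzer analyzer = new FieldSwitchingAnalyzer();
        System.out.println(analyzer.tokens("title_en", "The History of Lucene"));
        System.out.println(analyzer.tokens("body_de", "Die Geschichte der Suche"));
    }
}
```

The same instance can be used for indexing and for querying, since the
field name alone decides which rules apply.  Any field not named body_de
falls through to the English rules in this sketch.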




Re: Philosophy(??) question

2004-01-13 Thread Erik Hatcher
On Jan 12, 2004, at 7:59 PM, Scott Smith wrote:
 I have some documents I'm indexing which have multiple languages in them
 (i.e., some fields in the document are always English; other fields may be
 other languages).  Now, I understand why a query against a certain field
 must use the same analyzer as was used when that field was indexed
 (stemming, stop words, etc.).  It seems like different fields could use
 different analyzers and the world would still be a happy place.  However,
 since the analyzer() is passed in as part of the IndexWriter, that can't
 happen.  Is there a way to do this (other than having multiple indexes which
 is a problem trying to do combined searches)?  Or am I missing something
 more subtle?  Sorry if I'm plowing old ground.

The new PerFieldAnalyzerWrapper (in v. 1.3) allows you to specify 
different analyzers, as its name says, per field.  You simply specify 
which analyzer to use as a default and then any special ones for 
individual fields.
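
The default-plus-overrides idea can be sketched in plain Java as below.
This is a toy stand-in, not PerFieldAnalyzerWrapper itself: ToyAnalyzer,
ToyPerFieldWrapper, PerFieldDemo, and the field names are hypothetical.
In Lucene, the real class takes the default analyzer in its constructor
and registers overrides with addAnalyzer(fieldName, analyzer).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model only: a minimal analyzer abstraction, not Lucene's Analyzer.
interface ToyAnalyzer {
    List<String> analyze(String text);
}

// Delegates to a per-field analyzer when one is registered, else to the default.
final class ToyPerFieldWrapper {
    private final ToyAnalyzer defaultAnalyzer;
    private final Map<String, ToyAnalyzer> perField = new HashMap<>();

    ToyPerFieldWrapper(ToyAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String fieldName, ToyAnalyzer analyzer) {
        perField.put(fieldName, analyzer);
    }

    List<String> analyze(String fieldName, String text) {
        return perField.getOrDefault(fieldName, defaultAnalyzer).analyze(text);
    }
}

final class PerFieldDemo {
    // Default behavior: lowercase, then split on whitespace.
    static final ToyAnalyzer LOWERCASE_WORDS = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    };

    // Override for fields that should stay as one untouched token.
    static final ToyAnalyzer KEYWORD = text -> Collections.singletonList(text);

    static ToyPerFieldWrapper build() {
        ToyPerFieldWrapper wrapper = new ToyPerFieldWrapper(LOWERCASE_WORDS);
        wrapper.addAnalyzer("partnum", KEYWORD); // hypothetical field name
        return wrapper;
    }

    public static void main(String[] args) {
        ToyPerFieldWrapper wrapper = build();
        System.out.println(wrapper.analyze("title", "Hello World"));
        System.out.println(wrapper.analyze("partnum", "AB-123 X"));
    }
}
```

Here the title field falls back to the default and is split into hello
and world, while the partnum field is passed through as the single token
AB-123 X.  Handing the same wrapper to both the indexing and the querying
side is one simple way to keep per-field behavior consistent.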

As for using the same analyzer for querying as for indexing - that is a 
deeper question that I've yet to agree with.  There are some 
interesting reasons why you may want a different one - although they 
must cooperate in some fashion.

	Erik



RE: Philosophy(??) question

2004-01-13 Thread Scott Smith
I looked at PerFieldAnalyzerWrapper.  Seems perfect for what I want.
Thanks.

Some day, I'd be interested to understand the deeper question.

Thanks again

Scott


