Erick,

On Sep 26, 2010, at 8:04 AM, Erick Erickson wrote:

> The reason I ask is that you had to put the payloads into the
> input in the first place, and they don't affect searching unless
> you want them to. So why do you want to remove them
> with a token filter?

Our Tokenizer puts a part-of-speech tag into each Token
as a payload.  There is an accompanying TokenFilter that 
removes Tokens marked with a configurable set of 
part-of-speech tags later in the analysis chain.  

As I understand it, payloads are written to the Lucene index.
In most cases, the search applications do not use the
part-of-speech tags, so the tags shouldn't go into the index.
I'd like to know whether there is an existing TokenFilter that
strips payloads from Tokens at the end of the analysis chain.
If there isn't, I'd like to write one.
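
A minimal sketch of the chain described here, in plain Java with stand-in types rather than the real Lucene classes. Token, removeByPos, stripPayloads and the tag names used with them are all hypothetical, chosen only for illustration. In Lucene itself the stripping step would presumably be a TokenFilter whose incrementToken() calls PayloadAttribute.setPayload(null) on each token, but that mapping is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Stand-in sketch, NOT the real Lucene API: every type and method here is hypothetical.
final class PosChainSketch {

    /** Stand-in for a Lucene token: a term plus an optional payload (here, a POS tag). */
    static final class Token {
        final String term;
        final String payload; // null means "no payload"

        Token(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Drops every token whose payload is one of the configured part-of-speech tags. */
    static List<Token> removeByPos(List<Token> in, Set<String> stopTags) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            if (t.payload == null || !stopTags.contains(t.payload)) {
                out.add(t);
            }
        }
        return out;
    }

    /** Clears the payload of every remaining token so that no tag reaches the index. */
    static List<Token> stripPayloads(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (Token t : in) {
            out.add(new Token(t.term, null));
        }
        return out;
    }

    /** Runs both steps in order: POS-based removal first, payload stripping last. */
    static List<Token> analyze(List<Token> tagged, Set<String> stopTags) {
        return stripPayloads(removeByPos(tagged, stopTags));
    }
}
```

The ordering is the point of the sketch: the stripping step has to come after the part-of-speech filter, because that filter still needs to read the tags.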
----
T. "Kuro" Kurosaka


