Re: TokenFilter that removes payload ?
Robert, Erick,

I appreciate your suggestions, but we use Type for another purpose. Also, the product is already out, and we can't change the design so easily.

So the conclusion seems to be that there is no such TokenFilter. I'll write one.

Thanks.

On Sep 27, 2010, at 1:00 PM, Robert Muir wrote:

> On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka k...@basistech.com wrote:
>
> > As I understand it, payloads go to the Lucene index. In most cases, the part-of-speech tags are not used if retrieved by the search applications. So they shouldn't go to the index. So I'd like to know if there is an existing TokenFilter that does this. Otherwise, I'd like to write one.
>
> I agree with Erick, I think a better approach would be to put the part of speech tags into another attribute. For example, you can put them in TypeAttribute, which is not stored in the index by default. Then, if the user wants to store them in the index, they just add TypeAsPayloadTokenFilterFactory, which copies the type into the payload... but otherwise they would not be stored.
>
> --
> Robert Muir
> rcm...@gmail.com

T. Kuro Kurosaka, 415-227-9600x122, 617-386-7122(direct)
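For readers following the thread, the filter Kuro proposes to write could look roughly like the sketch below. It is a plain-Java model so that it runs without Lucene on the classpath: `MiniTokenStream`, `TaggedTokenSource`, and `PayloadStrippingFilter` are hypothetical stand-ins, not Lucene classes. In Lucene itself, the equivalent would presumably be a `TokenFilter` subclass whose `incrementToken()` calls `setPayload(null)` on the stream's `PayloadAttribute`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

/** Stand-in for a Lucene-style token stream (assumption: NOT Lucene's API). */
abstract class MiniTokenStream {
    // Per-stream state for the current token, loosely mimicking attributes.
    String term;
    byte[] payload;

    /** Advances to the next token; returns false when the stream is exhausted. */
    abstract boolean incrementToken();
}

/** Hypothetical source that tags each term with a part-of-speech payload. */
final class TaggedTokenSource extends MiniTokenStream {
    private final Iterator<String[]> pairs;

    /** Each element is a two-element array: {term, partOfSpeechTag}. */
    TaggedTokenSource(List<String[]> termAndTagPairs) {
        this.pairs = termAndTagPairs.iterator();
    }

    @Override
    boolean incrementToken() {
        if (!pairs.hasNext()) {
            return false;
        }
        String[] pair = pairs.next();
        term = pair[0];
        payload = pair[1].getBytes(StandardCharsets.UTF_8);
        return true;
    }
}

/** Passes every token through unchanged except that its payload is dropped. */
final class PayloadStrippingFilter extends MiniTokenStream {
    private final MiniTokenStream input;

    PayloadStrippingFilter(MiniTokenStream input) {
        this.input = input;
    }

    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) {
            return false;
        }
        term = input.term;
        payload = null; // nothing downstream, including an indexer, sees the tag
        return true;
    }
}
```

Such a filter would need to sit at the end of the analysis chain, after the filter that removes tokens by part-of-speech tag, since that filter still has to read the payload.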
Re: TokenFilter that removes payload ?
Erick,

On Sep 26, 2010, at 8:04 AM, Erick Erickson wrote:

> The reason I ask is that you had to put the payloads into the input in the first place, and they don't affect searching unless you want them to. So why do you want to remove them with a token filter?

Our Tokenizer puts a part-of-speech tag into each Token as a payload. There is an accompanying TokenFilter that removes Tokens marked with a configurable set of part-of-speech tags later in the analysis chain.

As I understand it, payloads go to the Lucene index. In most cases, the part-of-speech tags are not used if retrieved by the search applications. So they shouldn't go to the index. So I'd like to know if there is an existing TokenFilter that does this. Otherwise, I'd like to write one.

T. Kuro Kurosaka
Re: TokenFilter that removes payload ?
Hmmm, why do you want to do this? I'm wondering if this is an XY problem (see: http://people.apache.org/~hossman/#xyproblem).

The reason I ask is that you had to put the payloads into the input in the first place, and they don't affect searching unless you want them to. So why do you want to remove them with a token filter?

Best
Erick

On Fri, Sep 24, 2010 at 1:32 AM, Teruhiko Kurosaka k...@basistech.com wrote:

> Is there an existing TokenFilter that simply removes payloads from the token stream?
>
> Teruhiko Kuro Kurosaka
> RLP + Lucene Solr = powerful search for global contents