Re: TokenFilter that removes payload ?

2010-09-27 Thread Teruhiko Kurosaka
Robert & Erick,
I appreciate your suggestions, but we use Type for another purpose.
Also, the product has already shipped, so we can't change the design easily.

So it seems the conclusion is that there is no such TokenFilter.
I'll write one.
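A filter like the one proposed here only needs to pass each token through and drop its payload. In Lucene itself this would typically be a TokenFilter subclass that clears the PayloadAttribute (for example with setPayload(null)) inside incrementToken(). The sketch below is a stdlib-only toy model of that idea, not Lucene code: Token, TokenSource and PayloadStrippingFilter are hypothetical names chosen to mirror Lucene's pull-style token stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of a Lucene-style analysis chain.
// Token, TokenSource and PayloadStrippingFilter are hypothetical, not Lucene classes.
final class PayloadStripDemo {

  /** A token with its text, a type string and an optional payload. */
  static final class Token {
    final String text;
    final String type;
    final byte[] payload; // null means "no payload"

    Token(String text, String type, byte[] payload) {
      this.text = text;
      this.type = type;
      this.payload = payload;
    }
  }

  /** Pull-style stream, loosely mirroring incrementToken(); returns null at end of stream. */
  interface TokenSource {
    Token next();
  }

  /** Passes every token through unchanged except that its payload is dropped. */
  static final class PayloadStrippingFilter implements TokenSource {
    private final TokenSource input;

    PayloadStrippingFilter(TokenSource input) {
      this.input = input;
    }

    @Override
    public Token next() {
      Token t = input.next();
      if (t == null) {
        return null; // upstream is exhausted
      }
      // Keep text and type; only the payload is removed before indexing.
      return new Token(t.text, t.type, null);
    }
  }

  /** Wraps a list as a TokenSource (stands in for the Tokenizer). */
  static TokenSource fromList(List<Token> tokens) {
    Iterator<Token> it = tokens.iterator();
    return () -> it.hasNext() ? it.next() : null;
  }

  /** Drains a TokenSource into a list (stands in for the indexer). */
  static List<Token> drain(TokenSource source) {
    List<Token> out = new ArrayList<>();
    for (Token t = source.next(); t != null; t = source.next()) {
      out.add(t);
    }
    return out;
  }
}
```

Placing the filter last in the chain matters: any filter that reads the part-of-speech payload (such as the tag-based removal filter) must run before the payload is stripped.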

Thanks.

On Sep 27, 2010, at 1:00 PM, Robert Muir wrote:

 On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka k...@basistech.com wrote:
 
 
 As I understand it, payloads are stored in the Lucene index.
 In most cases, search applications do not use the
 part-of-speech tags even if they retrieve them, so the tags
 shouldn't go into the index.  I'd like to know whether there
 is an existing TokenFilter that strips them; otherwise, I'd
 like to write one.
 
 
 I agree with Erick; I think a better approach would be to put the part-of-speech
 tags into another attribute.
 
 For example, you can put them in TypeAttribute, which is not stored in the
 index by default.
 Then, if the user wants to store them in the index, they just add
 TypeAsPayloadTokenFilterFactory, which copies the type into the payload;
 otherwise they would not be stored.
 
 -- 
 Robert Muir
 rcm...@gmail.com
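Robert's alternative can be sketched as follows: the tokenizer writes the part-of-speech tag into the token's type, leaves the payload empty, and an optional step copies the type into the payload only when the tags should be indexed. This is a stdlib-only toy model; Token, tagged and copyTypeToPayload are hypothetical names, and copyTypeToPayload merely imitates what TypeAsPayloadTokenFilter does (encoding the type as UTF-8 bytes is an assumption of this sketch).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model of the "put the POS tag in the type" approach.
// All names here are hypothetical; this is not Lucene's API.
final class TypeAsPayloadDemo {

  static final class Token {
    final String text;
    final String type;    // e.g. a part-of-speech tag; not indexed by default
    final byte[] payload; // indexed only if non-null

    Token(String text, String type, byte[] payload) {
      this.text = text;
      this.type = type;
      this.payload = payload;
    }
  }

  /** Tokenizer output: the POS tag goes into the type and the payload stays empty. */
  static Token tagged(String text, String pos) {
    return new Token(text, pos, null);
  }

  /** Optional step: copy each token's type into its payload as UTF-8 bytes. */
  static List<Token> copyTypeToPayload(List<Token> tokens) {
    List<Token> out = new ArrayList<>();
    for (Token t : tokens) {
      // Leave the existing payload alone when there is no type to copy.
      byte[] p = (t.type == null || t.type.isEmpty())
          ? t.payload
          : t.type.getBytes(StandardCharsets.UTF_8);
      out.add(new Token(t.text, t.type, p));
    }
    return out;
  }
}
```

With this design nothing needs to be stripped: the default chain never produces payloads, and users who want the tags indexed opt in by adding the copy step.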


T. Kuro Kurosaka, 415-227-9600x122, 617-386-7122 (direct)





Re: TokenFilter that removes payload ?

2010-09-26 Thread Teruhiko Kurosaka
Erick,

On Sep 26, 2010, at 8:04 AM, Erick Erickson wrote:

 The reason I ask is that you had to put the payloads into the
 input in the first place, and they don't affect searching unless
 you want them to. So why do you want to remove them
 with a token filter?

Our Tokenizer puts a part-of-speech tag into each Token
as a payload.  An accompanying TokenFilter, placed later in
the analysis chain, removes Tokens whose part-of-speech
tag is in a configurable set.
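The accompanying tag-based filter can be modeled as "keep a token unless its part-of-speech payload is in the configured stop set". The sketch below is a stdlib-only toy, not the actual product filter: Token and removeByPos are hypothetical names, and it assumes the payload holds the tag as a UTF-8 string, which the thread does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of a filter that drops tokens by part-of-speech payload.
// Names and the UTF-8 payload encoding are assumptions for illustration.
final class PosStopDemo {

  static final class Token {
    final String text;
    final byte[] payload; // assumed: the POS tag as UTF-8 bytes, or null

    Token(String text, byte[] payload) {
      this.text = text;
      this.payload = payload;
    }
  }

  /** Keeps tokens whose decoded POS tag is absent or not in stopTags. */
  static List<Token> removeByPos(List<Token> tokens, Set<String> stopTags) {
    List<Token> out = new ArrayList<>();
    for (Token t : tokens) {
      String pos = (t.payload == null) ? null : new String(t.payload, StandardCharsets.UTF_8);
      if (pos == null || !stopTags.contains(pos)) {
        out.add(t);
      }
    }
    return out;
  }
}
```

Because this filter reads the payload, any step that strips payloads has to come after it in the chain.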

As I understand it, payloads are stored in the Lucene index.
In most cases, search applications do not use the
part-of-speech tags even if they retrieve them, so the tags
shouldn't go into the index.  I'd like to know whether there
is an existing TokenFilter that strips them; otherwise, I'd
like to write one.

T. Kuro Kurosaka





Re: TokenFilter that removes payload ?

2010-09-25 Thread Erick Erickson
Hmmm, why do you want to do this? I'm wondering if this
is an XY problem (see: http://people.apache.org/~hossman/#xyproblem).

The reason I ask is that you had to put the payloads into the
input in the first place, and they don't affect searching unless
you want them to. So why do you want to remove them
with a token filter?

Best
Erick

On Fri, Sep 24, 2010 at 1:32 AM, Teruhiko Kurosaka k...@basistech.com wrote:

 Is there an existing TokenFilter that simply removes
 payloads from the token stream?
 
 Teruhiko Kuro Kurosaka
 RLP + Lucene & Solr = powerful search for global contents