Re: Token filter on multivalue field

2009-06-06 Thread David Giffin
I'm doing a combination of update processor and token filter. The
token filter is necessary to reduce the duplicates after stemming has
occurred.
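
To illustrate why the filter half is still needed, here is a minimal plain-Java sketch, not the actual code from this thread. It assumes a toy suffix-stripping rule (toyStem, a hypothetical stand-in for a real stemmer) and whitespace tokenization. Values that differ as raw strings can still collapse to the same stem, so the set of seen terms has to be kept across all values of the field:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not a Lucene or Solr class.
final class StemDedupSketch {
    private StemDedupSketch() {}

    // Toy stand-in for a real stemmer: strips "ing" (and a doubled
    // final consonant) or a trailing "s". Assumption for this sketch.
    static String toyStem(String word) {
        String s = word.toLowerCase(Locale.ROOT);
        if (s.length() > 4 && s.endsWith("ing")) {
            s = s.substring(0, s.length() - 3);
            int n = s.length();
            if (n >= 2 && s.charAt(n - 1) == s.charAt(n - 2)) {
                s = s.substring(0, n - 1);
            }
        } else if (s.length() > 3 && s.endsWith("s")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // One "seen" set shared across every value of the multivalued field.
    static List<String> uniqueStems(List<String> values) {
        Set<String> seen = new LinkedHashSet<String>();
        for (String value : values) {
            for (String word : value.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    seen.add(toyStem(word));
                }
            }
        }
        return new ArrayList<String>(seen);
    }
}
```

With the values "running shoes", "runs" and "shoe", all three raw strings are distinct, yet only two stems survive.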

David

2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 Isn't it better to use an UpdateProcessor for this?

 On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:

 Hello,

 It's ugly, but the first thing that came to mind was ThreadLocal.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 ----- Original Message -----
 From: David Giffin da...@giffin.org
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 3, 2009 1:57:42 PM
 Subject: Token filter on multivalue field

 Hi There,

 I'm working on a unique token filter, to eliminate duplicates on a
 multivalue field. My filter works properly for a single value field.
 It seems that a new TokenFilter is created for each value in the
 multivalue field. I need to maintain an array of used tokens across
 all of the values in the multivalue field. Is there a good way to do
 this? Here is my current code:

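 import java.io.IOException;
 import java.util.HashSet;
 import java.util.Set;

 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;

 public class UniqueTokenFilter extends TokenFilter {

     // Terms already emitted by this filter instance; a set gives O(1) lookups.
     private final Set<String> words = new HashSet<String>();

     public UniqueTokenFilter(TokenStream input) {
         super(input);
     }

     @Override
     public final Token next(Token in) throws IOException {
         // Pass the reusable token on every call, not just the first.
         for (Token token = input.next(in); token != null; token = input.next(in)) {
             // add() returns false when the term was already present.
             if (words.add(token.term())) {
                 return token;
             }
         }
         return null;
     }
 }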

 Thanks,
 David





 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Token filter on multivalue field

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Isn't it better to use an UpdateProcessor for this?
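
A rough sketch of the step such a processor might perform on the raw values of a multivalued field, written as plain Java so it stands alone. The class and method names are hypothetical, and the Solr wiring (an UpdateRequestProcessor subclass overriding processAdd) is left out. Note that this only removes values that are exact string duplicates; it cannot see duplicates that appear only after analysis.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper; not part of Solr.
final class ValueDedupSketch {
    private ValueDedupSketch() {}

    // Drops repeated values while keeping first-seen order.
    static List<String> dedupe(List<String> values) {
        return new ArrayList<String>(new LinkedHashSet<String>(values));
    }
}
```

A LinkedHashSet is used rather than a HashSet so the surviving values keep the order in which they were supplied.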

-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Token filter on multivalue field

2009-06-03 Thread Otis Gospodnetic

Hello,

It's ugly, but the first thing that came to mind was ThreadLocal.
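
A minimal sketch of what the ThreadLocal idea could look like, assuming every value of a document is analyzed on the same thread and that some caller clears the state at each document boundary. Both are assumptions, not something this thread confirms, and SeenTerms is a hypothetical name:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder for per-thread "already emitted" terms.
final class SeenTerms {
    // Each thread gets its own set, so filter instances created on the
    // same thread share state without any locking.
    private static final ThreadLocal<Set<String>> SEEN =
            ThreadLocal.withInitial(HashSet::new);

    private SeenTerms() {}

    // Returns true the first time a term is seen on this thread.
    static boolean markSeen(String term) {
        return SEEN.get().add(term);
    }

    // Must be called between documents, or terms leak across them.
    static void reset() {
        SEEN.get().clear();
    }
}
```

A filter would call SeenTerms.markSeen(term) in place of its own per-instance list. The need for an explicit reset() between documents is the ugly part: forgetting it silently drops tokens from later documents.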

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


