On 5/12/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
Is your expectation that for situations like Trey's "remove all dups,
regardless of position" type TokenFilter, the implementation of process
would be such that it completely emptied out the input queue, writing all
non-dups to the output queue (effectively ensuring that it will only ever
be called once)?
i.e. ...

class RemoveAllDups extends BufferedTokenStream {
   Set<String> m = new HashSet<String>(23);
   public Token process(Token t) throws IOException {
     while (null != t) {
       if (! m.contains(t.termText()))
          write(t);
       m.add(t.termText());
       t = read();
     }
     return null;
   }
}


That would work, but in that case, the buffering isn't needed.  If you
wanted to use BufferedTokenStream for some reason it would be:

class RemoveAllDupsRegardlessOfPosition extends BufferedTokenStream {
   Set<String> m = new HashSet<String>(23);
   public Token process(Token t) throws IOException {
     // Pass a token through only the first time its term text is seen.
     if (!m.contains(t.termText())) {
       m.add(t.termText());
       return t;
     } else {
       return null;
     }
   }
}

One might as well just throw in a for loop and not use
BufferedTokenStream for that (rather unique) case.
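
A minimal sketch of that loop-based alternative, with no BufferedTokenStream involved. To keep it self-contained, plain strings stand in for Token.termText(), and the class name DedupFilterSketch and its next() method are made up for illustration (they only mimic a pull-style filter that returns null when the input runs out):

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for a TokenFilter: strings replace Token objects.
class DedupFilterSketch {
    private final Iterator<String> input;
    private final Set<String> seen = new HashSet<>();

    DedupFilterSketch(Iterator<String> input) {
        this.input = input;
    }

    /** Returns the next term not seen before, or null once input is exhausted. */
    String next() {
        while (input.hasNext()) {
            String term = input.next();
            // Set.add returns false if the term was already present.
            if (seen.add(term)) {
                return term;
            }
        }
        return null;
    }
}
```

No queue is needed here because each call only ever looks at the current term and the set of terms already emitted.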

   /**
    * This method is guaranteed to be called at least once after the input
    * stream is exhausted, but before the output stream is exhausted.
    * By default it is a No-Op.
    */
   protected void done() throws IOException { /* NOOP */ }

+1 for the general idea of flushing state (but I'm not sure how hard
it would be to implement the exact semantics w.r.t. "before the output
stream is exhausted").
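
One possible way to get those semantics, as a rough sketch only: call done() the first time the input turns out to be empty, then drain whatever it wrote before reporting end of stream. Everything below is an assumption for illustration. Strings stand in for Token, and BufferedStreamSketch and CountAtEndSketch are hypothetical names, not the proposed class. Note that this version calls done() exactly once, which is a stricter choice than the "at least once" in the javadoc above.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

// Hypothetical stand-in for the proposed buffered stream; strings replace Token.
abstract class BufferedStreamSketch {
    private final Iterator<String> input;
    private final Queue<String> outQueue = new ArrayDeque<>();
    private boolean doneCalled = false;

    BufferedStreamSketch(Iterator<String> input) {
        this.input = input;
    }

    /** Handle one term: return it to emit it, or return null to drop it. */
    protected abstract String process(String term);

    /** Called once after the input is exhausted; may write() leftover state. */
    protected void done() { /* no-op by default */ }

    /** Queue a term for output. */
    protected void write(String term) {
        outQueue.add(term);
    }

    /** Returns the next output term, or null when the stream is exhausted. */
    public final String next() {
        while (true) {
            if (!outQueue.isEmpty()) {
                return outQueue.remove();
            }
            if (input.hasNext()) {
                String result = process(input.next());
                if (result != null) {
                    return result;
                }
            } else if (!doneCalled) {
                doneCalled = true;
                done(); // anything written here is drained on the next pass
            } else {
                return null;
            }
        }
    }
}

// Example subclass: passes terms through and appends a count at the end.
class CountAtEndSketch extends BufferedStreamSketch {
    private int count = 0;

    CountAtEndSketch(Iterator<String> input) {
        super(input);
    }

    @Override
    protected String process(String term) {
        count++;
        return term;
    }

    @Override
    protected void done() {
        write("count=" + count);
    }
}
```

The doneCalled flag is what keeps done() from firing again on later calls to next() after the stream has ended.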

-Yonik
