Help! Issue with tokens in custom synonym filter

Lajos Mon, 31 Aug 2009 07:33:54 -0700

Hi all,

I've been writing some custom synonym filters and have run into an issue with returning a list of tokens. I have a synonym filter that uses the WordNet database to extract synonyms. My problem is how to define the offsets and position increments in the new Tokens I'm returning.

For an input token, I get a list of synonyms from the WordNet database. I then create a List<Token> of those results. Each Token is created with the same startOffset, endOffset and positionIncrement as the input Token. Is this correct? My understanding from looking at the Lucene codebase is that the startOffset/endOffset should be the same, as we are referring to the same term in the original text. However, I don't quite get the positionIncrement. I understand that it is relative to the previous term ... does this mean all my synonyms should have a positionIncrement of 0? But whether I use 0 or the positionIncrement of the original input Token, Solr seems to ignore the returned tokens ...
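To make the positionIncrement question concrete, here is a minimal sketch in plain Java with no Lucene dependency. The Tok class and the absolutePositions helper are hypothetical names invented for illustration, and starting the running position at -1 is a convention of this model. It only shows the arithmetic: a token's position is the running sum of increments, so an increment of 0 stacks a token on the same position as the one before it.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionIncrementSketch {

    // Hypothetical stand-in for a Lucene Token: only a term and its increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Position of each token = running sum of the increments seen so far.
    // Assumption of this model: start at -1 so a first token with
    // increment 1 lands on position 0.
    static List<Integer> absolutePositions(List<Tok> stream) {
        List<Integer> positions = new ArrayList<Integer>();
        int position = -1;
        for (Tok t : stream) {
            position += t.positionIncrement;
            positions.add(position);
        }
        return positions;
    }

    public static void main(String[] args) {
        List<Tok> stream = new ArrayList<Tok>();
        stream.add(new Tok("quick", 1));
        stream.add(new Tok("fast", 0));   // synonym, same position as "quick"
        stream.add(new Tok("speedy", 0)); // synonym, same position as "quick"
        stream.add(new Tok("fox", 1));
        System.out.println(absolutePositions(stream)); // prints [0, 0, 0, 1]
    }
}
```

In this model, giving each synonym the original token's increment of 1 would instead place "fast" and "speedy" on positions 1 and 2 and push "fox" to position 3, as if they were extra words in the text.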


This is a summary of what is in my filter:

*************************************************

private Iterator<Token> output;
private ArrayList<Token> synonyms = null;

public Token next(Token in) throws IOException {
  if (output != null) {
    // Here we are just outputting matched synonyms
    // that we previously created from the input token
    // The input token has already been returned
    if (output.hasNext()) {
      return output.next();
    } else {
      return null;
    }
  }

  synonyms = new ArrayList<Token>();

  Token t = input.next(in);
  if (t == null) return null;

  String value = new String(t.termBuffer(), 0,
    t.termLength()).toLowerCase();

  // Get list of WordNet synonyms (code removed)
  // Iterate thru WordNet synonyms
  for (String wordNetSyn : wordNetSyns) {
    Token synonym = new Token(t.startOffset(), t.endOffset(), t.type());
    synonym.setPositionIncrement(t.getPositionIncrement());
    synonym.setTermBuffer(wordNetSyn.toCharArray(), 0,
      wordNetSyn.length());
    synonyms.add(synonym);
  }

  output = synonyms.iterator();

  // Return the original word, we want it
  return t;
}
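
For comparison, the emit-then-drain flow that the filter above is built around can be modelled without Lucene. This is only a sketch under stated assumptions: SynonymDrainSketch, its synonym table and the "term/increment" string encoding are all hypothetical, and the original term is simply given an increment of 1. One design point it illustrates: when the pending queue is empty, the code falls through and pulls the next input term instead of ending the stream.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SynonymDrainSketch {

    // Hypothetical lookup table standing in for the WordNet database.
    private final Map<String, List<String>> synonymTable;
    private final Iterator<String> input;
    // Synonyms still waiting to be emitted for the most recent input term.
    private final Deque<String> pending = new ArrayDeque<String>();

    SynonymDrainSketch(Iterator<String> input, Map<String, List<String>> synonymTable) {
        this.input = input;
        this.synonymTable = synonymTable;
    }

    // Returns "term/positionIncrement", or null once the input is exhausted.
    String next() {
        if (!pending.isEmpty()) {
            // Drain queued synonyms first, stacked on the original (increment 0).
            return pending.removeFirst() + "/0";
        }
        if (!input.hasNext()) {
            return null;
        }
        String term = input.next();
        List<String> synonyms = synonymTable.get(term.toLowerCase());
        if (synonyms != null) {
            pending.addAll(synonyms);
        }
        // Emit the original term before any of its synonyms.
        return term + "/1";
    }

    // Convenience helper: run the whole stream and collect the output.
    static List<String> drain(List<String> terms, Map<String, List<String>> table) {
        SynonymDrainSketch filter = new SynonymDrainSketch(terms.iterator(), table);
        List<String> out = new ArrayList<String>();
        for (String tok = filter.next(); tok != null; tok = filter.next()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new HashMap<String, List<String>>();
        table.put("quick", Arrays.asList("fast", "speedy"));
        // prints [quick/1, fast/0, speedy/0, fox/1]
        System.out.println(drain(Arrays.asList("quick", "fox"), table));
    }
}
```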
