modifying parse implementation

Cam Bazz Fri, 15 Jul 2011 17:21:46 -0700

Hello,

In my quest to create a custom parser, I have modified ParseImpl to
hold another ParseText field called features, like this:


  private ParseText features;  // new field holding the extracted features

  public ParseImpl(String text, String features, ParseData data) {
    this(new ParseText(text), new ParseText(features), data, true);
  }

  public ParseImpl(ParseText text, ParseText features, ParseData data, boolean isCanonical) {
    this.text = text;
    this.data = data;
    this.features = features;
    this.isCanonical = isCanonical;
  }

  public String getFeatures() {
    return this.features.getText();
  }


Although I create the ParseImpl in HtmlParser.java like this:

ParseResult parseResult = ParseResult.createParseResult(content.getUrl(),
    new ParseImpl(text, features, parseData));

I get an error at indexing time. parse.getText() returns the correct
text, but calling parse.getFeatures() in the index-basic plugin fails
with:

SolrIndexer: starting at 2011-07-16 03:06:54
java.io.IOException: Job failed!
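A plausible cause, offered as an assumption since only the top-level IOException is shown (the underlying exception would be in the task logs): Nutch passes parse results between job stages in serialized form. In Nutch 1.x, ParseImpl's write() and readFields() appear to cover only the canonical flag, the ParseText and the ParseData, and the segment stores parse_text and parse_data separately, with the indexer rebuilding the parse from those two. A field that exists only on ParseImpl would then reach index-basic as null, and features.getText() would throw a NullPointerException that the indexer reports only as "Job failed!". The self-contained sketch below reproduces that failure mode with plain java.io streams in place of Hadoop's Writable interface; SketchParse and its fields are hypothetical, and the meta map merely stands in for the parse metadata that ParseData already serializes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class ParseRoundTrip {

  /** Hypothetical stand-in for ParseImpl: only text and meta are serialized. */
  static class SketchParse {
    String text;
    String features;                             // extra field, NOT serialized
    Map<String, String> meta = new HashMap<>();  // stands in for ParseData's parse metadata

    void write(DataOutput out) throws IOException {
      out.writeUTF(text);
      out.writeInt(meta.size());
      for (Map.Entry<String, String> e : meta.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    }

    void readFields(DataInput in) throws IOException {
      text = in.readUTF();
      meta.clear();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        meta.put(in.readUTF(), in.readUTF());
      }
      // features is never read back, so it keeps its default value: null
    }
  }

  /** Serializes and deserializes a parse, as happens between job stages. */
  static SketchParse roundTrip(SketchParse p) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      p.write(out);
      out.flush();
      SketchParse copy = new SketchParse();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    SketchParse p = new SketchParse();
    p.text = "page text";
    p.features = "f1 f2";
    p.meta.put("features", "f1 f2");

    SketchParse copy = roundTrip(p);
    System.out.println(copy.text);                  // prints "page text"
    System.out.println(copy.features);              // prints "null": the field was lost
    System.out.println(copy.meta.get("features"));  // prints "f1 f2": survived via metadata
  }
}
```

If this is the cause, carrying the value in something that is already persisted, such as the parse metadata (parseData.getParseMeta()) set from an HtmlParseFilter and read back in an IndexingFilter, would avoid modifying ParseImpl at all and fits the idea of moving the logic into a separate plugin.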


I am getting a much better understanding of how Nutch works. I don't
think my approach of butchering HtmlParser and ParseImpl is the best
one, and I am sure all of this could be moved into a separate plugin.

Best Regards,
C.B.
