Hello,

I did not understand what you mean by ParseData.parseData.

In ParseData there are getContentMeta and getParseMeta. There is also a
getMeta(String string), but it appears that there is no setter for it.
There is also setParseMeta, but it appears that the content meta is not
settable.

Best Regards,
C.B.

On Sat, Jul 16, 2011 at 3:43 AM, Joye <[email protected]> wrote:
> Hello,
>
> Because ParseImpl implements the Writable interface, it is serialized
> and deserialized when it is transferred between the namenode and the
> datanodes in Hadoop. So, if you add a property to any class that
> implements "Writable", you should add the read and write code for the
> new property to the read and write functions of the ParseImpl class,
> which tell Nutch what to do when serializing and deserializing
> ParseImpl.
>
> P.S. Since "features" is a string, you could put it into
> ParseData.parseData (it's a Map structure) without any changes to the
> base classes of Nutch.
>
> Regards,
> Joey
>
>
> On 07/16/2011 08:21 AM, Cam Bazz wrote:
>>
>> Hello,
>>
>> In my quest to create a custom parser, I have modified ParseImpl to
>> hold another ParseText called features, such as:
>>
>> public ParseImpl(String text, String features, ParseData data) {
>>     this(new ParseText(text), new ParseText(features), data, true);
>> }
>>
>> public ParseImpl(ParseText text, ParseText features, ParseData data,
>>         boolean isCanonical) {
>>     this.text = text;
>>     this.data = data;
>>     this.features = features;
>>     this.isCanonical = isCanonical;
>> }
>>
>> public String getFeatures() {
>>     return this.features.getText();
>> }
>>
>>
>> and although I create the ParseImpl like
>>
>> ParseResult parseResult =
>>     ParseResult.createParseResult(content.getUrl(),
>>         new ParseImpl(text, features, parseData));
>>
>> in HtmlParser.java, I get an error when indexing if I do
>> parse.getFeatures(). parse.getText() will return the correct text,
>> but if I call parse.getFeatures() in the index-basic plugin I get:
>>
>> SolrIndexer: starting at 2011-07-16 03:06:54
>> java.io.IOException: Job failed!
>>
>>
>> I am getting a much better understanding of how Nutch works. I don't
>> think my approach of butchering HtmlParser and ParseImpl is the best,
>> and I am sure all of this can be put inside another plugin.
>>
>> Best Regards,
>> C.B.
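
A footnote on Joey's point about Writable, quoted above: the sketch below is a minimal, self-contained illustration and is not Nutch or Hadoop code. The class FeatureParse and the helper roundTrip are hypothetical names made up for this example. The two methods only mirror the write(DataOutput) / readFields(DataInput) pair that Hadoop's Writable interface requires. It shows that an extra field such as "features" survives a serialization round trip only when both methods handle it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableSketch {

    // Hypothetical stand-in for a parse object with an extra "features" field.
    static class FeatureParse {
        String text = "";
        String features = "";

        FeatureParse() {}

        FeatureParse(String text, String features) {
            this.text = text;
            this.features = features;
        }

        // Every field that must survive the transfer is written here...
        void write(DataOutput out) throws IOException {
            out.writeUTF(text);
            out.writeUTF(features);
        }

        // ...and read back here, in the same order.
        void readFields(DataInput in) throws IOException {
            text = in.readUTF();
            features = in.readUTF();
        }
    }

    // Serializes to a byte buffer and rebuilds a fresh object from it,
    // roughly what happens when an object moves between processes.
    static FeatureParse roundTrip(FeatureParse original) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));
        FeatureParse copy = new FeatureParse();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        FeatureParse copy = roundTrip(new FeatureParse("page text", "f1 f2"));
        System.out.println(copy.text + " | " + copy.features);
    }
}
```

If the features lines were left out of both write and readFields, the copy would come back with an empty features field even though the constructor had set it on the original, which may be related to why getText() works while getFeatures() does not.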

