Hello,

I did not understand what you meant by ParseData.parseData.

In ParseData there are getContentMeta and getParseMeta.

There is also a getMeta(String string), but there appears to be no
setter for it.

There is also setParseMeta, but the content meta does not appear to be
settable.
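
To make the question concrete: if I read the suggestion right, the idea
would be to fetch the parse metadata object and put the value into it,
with getMeta acting only as a read-side lookup. Below is a rough sketch
of that idea using plain maps. ParseDataSketch and everything in it are
hypothetical stand-ins, not the Nutch classes, and the lookup order in
getMeta (parse meta first, then content meta) is my assumption and
should be checked against the Nutch source.

```java
import java.util.HashMap;
import java.util.Map;

public class ParseMetaSketch {

    /** Hypothetical stand-in for ParseData, backed by plain maps. */
    public static class ParseDataSketch {
        private final Map<String, String> contentMeta = new HashMap<>();
        private Map<String, String> parseMeta = new HashMap<>();

        public Map<String, String> getContentMeta() { return contentMeta; }
        public Map<String, String> getParseMeta() { return parseMeta; }
        public void setParseMeta(Map<String, String> meta) { this.parseMeta = meta; }

        // Assumed lookup order: parse metadata first, then content metadata.
        public String getMeta(String name) {
            String value = parseMeta.get(name);
            return value != null ? value : contentMeta.get(name);
        }
    }

    public static void main(String[] args) {
        ParseDataSketch data = new ParseDataSketch();
        // No dedicated setter is needed: write through the returned map.
        data.getParseMeta().put("features", "f1,f2");
        System.out.println(data.getMeta("features"));
    }
}
```

If the real classes behave like this, the missing setter for getMeta
would not matter, because the value is written through getParseMeta.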

Best Regards,
C.B.




On Sat, Jul 16, 2011 at 3:43 AM, Joye <[email protected]> wrote:
> Hello,
>
> ParseImpl implements the Writable interface, so Hadoop serializes and
> deserializes it whenever it is written out or transferred between nodes.
> So, if you add a property to any class that implements "Writable", you
> must also add code for the new property to that class's write and
> readFields methods (here, those of ParseImpl), which tell Nutch how to
> serialize and deserialize it. Otherwise the new property is lost when
> the object is deserialized.
>
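
The point above can be sketched without Hadoop on the classpath. The
class below is a hypothetical stand-in (ParseSketch is not the Nutch
ParseImpl); it only mirrors the two methods of Hadoop's Writable
interface, write(DataOutput) and readFields(DataInput), and shows that
a newly added field survives a round trip only when both methods handle
it, in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class FeaturesWritableSketch {

    /** Hypothetical stand-in for a ParseImpl-like class; NOT the Nutch class. */
    public static class ParseSketch {
        public String text = "";
        public String features = "";      // the newly added property
        public boolean isCanonical = true;

        // Mirrors Writable.write(DataOutput): every field must be written.
        // writeUTF caps a string at 65535 encoded bytes; fine for a sketch.
        public void write(DataOutput out) throws IOException {
            out.writeBoolean(isCanonical);
            out.writeUTF(text);
            out.writeUTF(features);
        }

        // Mirrors Writable.readFields(DataInput): same fields, same order.
        public void readFields(DataInput in) throws IOException {
            isCanonical = in.readBoolean();
            text = in.readUTF();
            features = in.readUTF();
        }
    }

    /** Serializes to bytes and reads back into a fresh object. */
    public static ParseSketch roundTrip(ParseSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            ParseSketch copy = new ParseSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ParseSketch p = new ParseSketch();
        p.text = "page text";
        p.features = "f1,f2";
        ParseSketch copy = roundTrip(p);
        System.out.println(copy.text + " | " + copy.features);
    }
}
```

Dropping the features line from either method would leave the copy
without the value set on the original, which is the kind of gap that
can surface later as a failure at indexing time.
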
> P.S. Since "features" is a string, you could put it into
> ParseData.parseData (it is a Map structure) without any changes to the
> base classes of Nutch.
>
> Regards,
> Joey
>
>
> On 07/16/2011 08:21 AM, Cam Bazz wrote:
>>
>> Hello,
>>
>> In my quest to create a custom parser, I have modified ParseImpl to
>> hold another ParseText called features, like this:
>>
>>   public ParseImpl(String text, String features, ParseData data) {
>>     this(new ParseText(text), new ParseText(features), data, true);
>>   }
>>
>>   public ParseImpl(ParseText text, ParseText features, ParseData data,
>>                    boolean isCanonical) {
>>     this.text = text;
>>     this.data = data;
>>     this.features = features;
>>     this.isCanonical = isCanonical;
>>   }
>>
>>   public String getFeatures() {
>>     return this.features.getText();
>>   }
>>
>>
>> I create the ParseImpl in HtmlParser.java like this:
>>
>> ParseResult parseResult = ParseResult.createParseResult(
>>     content.getUrl(), new ParseImpl(text, features, parseData));
>>
>> However, I get an error when indexing. parse.getText() returns the
>> correct text, but if I call parse.getFeatures() in the index-basic
>> plugin I get:
>>
>> SolrIndexer: starting at 2011-07-16 03:06:54
>> java.io.IOException: Job failed!
>>
>>
>> I am getting a much better understanding of how Nutch works. I don't
>> think my approach of butchering HtmlParser and ParseImpl is the best,
>> and I am sure all of this can be put inside another plugin.
>>
>> Best Regards,
>> C.B.
>
>
