On Mon, 17 Jan 2005 16:17:46 +0100, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> .... Or we could provide a separate hook to call some other type of filter,
> let's say ExtendedContentFilter, after the Content has been parsed:
> 
>         Content filter(Content content, Parse parse);
> 
> This approach has also the benefit that you could replace the original
> content with something more suitable for web interface preview (e.g.
> replace PDF with HTML - currently Nutch doesn't allow you out-of-the-box
> to view cached copies of non-html formats).

Andrzej, I think this is a great idea. The ContentFilter interface
would be much more useful if the parsed data were available for
analysis too. I'd suggest keeping the interface very simple -- perhaps
the above signature is all that's needed. If a given filter doesn't
care about Parse data, it can ignore it.
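
To make that concrete, here is a rough sketch of the single-method
interface. Content and Parse below are simplified stand-ins I made up
for illustration, not the real Nutch classes, and treating a null
return as "reject this page" is my own assumption rather than anything
settled in this thread.

```java
// Hypothetical, simplified stand-ins for Nutch's Content and Parse.
final class Content {
    final String url;
    final String body;

    Content(String url, String body) {
        this.url = url;
        this.body = body;
    }
}

final class Parse {
    final String text;

    Parse(String text) {
        this.text = text;
    }
}

// The signature proposed above, under the suggested name.
interface ExtendedContentFilter {
    Content filter(Content content, Parse parse);
}

// A filter that does not care about Parse data simply ignores it.
final class PassThroughFilter implements ExtendedContentFilter {
    public Content filter(Content content, Parse parse) {
        return content;
    }
}

// A filter that does use the parsed data: it rejects pages whose
// extracted text is empty. Returning null to mean "reject" is an
// assumption made for this sketch.
final class NonEmptyTextFilter implements ExtendedContentFilter {
    public Content filter(Content content, Parse parse) {
        boolean hasText = parse != null && !parse.text.trim().isEmpty();
        return hasText ? content : null;
    }
}
```

Both filters implement the same one-method interface, so the caller
doesn't need to know which of them actually looks at the Parse.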

However, I'm not sure about content-transforming filters. Wouldn't you
want to get both Content and Parse back from filter() if this was the
goal?
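
Something like the following is what I have in mind, purely as a
sketch. FilterResult, TransformingFilter and PdfPreviewFilter are
hypothetical names, and Content and Parse are again simplified
stand-ins rather than the real Nutch classes.

```java
// Hypothetical, simplified stand-ins for Nutch's Content and Parse.
final class Content {
    final String contentType;
    final String body;

    Content(String contentType, String body) {
        this.contentType = contentType;
        this.body = body;
    }
}

final class Parse {
    final String text;

    Parse(String text) {
        this.text = text;
    }
}

// Hypothetical pair type so a filter can hand back both objects.
final class FilterResult {
    final Content content;
    final Parse parse;

    FilterResult(Content content, Parse parse) {
        this.content = content;
        this.parse = parse;
    }
}

// Variant of the proposed hook that returns Content and Parse together.
interface TransformingFilter {
    FilterResult filter(Content content, Parse parse);
}

// Replaces PDF content with a minimal HTML preview built from the
// parsed text; every other content type passes through unchanged.
final class PdfPreviewFilter implements TransformingFilter {
    public FilterResult filter(Content content, Parse parse) {
        if (!"application/pdf".equals(content.contentType)) {
            return new FilterResult(content, parse);
        }
        String html = "<html><body><pre>" + escape(parse.text)
                + "</pre></body></html>";
        return new FilterResult(new Content("text/html", html), parse);
    }

    // Escapes the characters that would otherwise be read as markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

With this shape a transforming filter can swap the Content and still
pass along (or replace) the Parse, so later filters see a consistent
pair.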

