On Mon, 17 Jan 2005 16:17:46 +0100, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> .... Or we could provide a separate hook to call some other type of filter,
> let's say ExtendedContentFilter, after the Content has been parsed:
>
> Content filter(Content content, Parse parse);
>
> This approach has also the benefit that you could replace the original
> content with something more suitable for web interface preview (e.g.
> replace PDF with HTML - currently Nutch doesn't allow you out-of-the-box
> to view cached copies of non-html formats).

Andrzej, I think this is a great idea. The ContentFilter interface would be much more useful if the parsed data were available for analysis too. I'd suggest keeping the interface very simple -- perhaps the above signature is all that's needed. If a given filter doesn't care about Parse data, it can ignore it.

However, I'm not sure about content-transforming filters. Wouldn't you want to get both Content and Parse back from filter() if replacing the content was the goal?

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
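
To make the two options in this thread concrete, here is a rough Java sketch. It is only an illustration under stated assumptions: the Content and Parse classes below are tiny stand-ins, not the real Nutch classes, and FilterResult, TransformingContentFilter, PassThroughFilter, PdfPreviewFilter and FilterSketch are hypothetical names made up for this example. Only the name ExtendedContentFilter and its filter(Content, Parse) signature come from the quoted proposal.

```java
// Stand-in types for illustration only; the real Nutch Content and Parse differ.
final class Content {
    final String contentType;
    final String body;

    Content(String contentType, String body) {
        this.contentType = contentType;
        this.body = body;
    }
}

final class Parse {
    final String text;

    Parse(String text) {
        this.text = text;
    }
}

// Option 1: the signature from the quoted proposal. Only Content comes back.
interface ExtendedContentFilter {
    Content filter(Content content, Parse parse);
}

// Option 2 (hypothetical): hand back both objects, so a transforming filter
// can return a Parse that matches the Content it produced.
final class FilterResult {
    final Content content;
    final Parse parse;

    FilterResult(Content content, Parse parse) {
        this.content = content;
        this.parse = parse;
    }
}

interface TransformingContentFilter {
    FilterResult filter(Content content, Parse parse);
}

// A filter that does not care about Parse data simply ignores the argument.
final class PassThroughFilter implements ExtendedContentFilter {
    public Content filter(Content content, Parse parse) {
        return content;
    }
}

// A filter that swaps PDF content for an HTML preview built from the parsed text.
final class PdfPreviewFilter implements ExtendedContentFilter {
    public Content filter(Content content, Parse parse) {
        if (!"application/pdf".equals(content.contentType)) {
            return content; // leave every other content type untouched
        }
        String html = "<html><body><pre>" + escape(parse.text) + "</pre></body></html>";
        return new Content("text/html", html);
    }

    // Minimal HTML escaping so the parsed text cannot break the preview markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

final class FilterSketch {
    public static void main(String[] args) {
        Content pdf = new Content("application/pdf", "%PDF-1.4 (raw bytes elided)");
        Parse parse = new Parse("Extracted text, e.g. a < b");

        Content preview = new PdfPreviewFilter().filter(pdf, parse);
        System.out.println(preview.contentType);
        System.out.println(preview.body);

        // Option 2 wrapped around an Option 1 filter; here Parse is passed through as-is.
        TransformingContentFilter both =
            (c, p) -> new FilterResult(new PdfPreviewFilter().filter(c, p), p);
        System.out.println(both.filter(pdf, parse).content.contentType);
    }
}
```

With option 1 the caller keeps the original Parse, which is fine for analysis-only filters and for the preview case above. Option 2 only matters if a filter changes the content in a way that should also change the parsed data.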
