Doug,

DC> Sorry, I've been away from Nutch email for a few days.

Absolutely not a problem, I've been busy as well.

[snip]
DC> HTML.  The simplest workaround might be to use, e.g., a SAX-based parser

Yeah, I realized I could decode the summaries, but it seemed like a
small refactoring that would be useful to other people anyway (think
of user interfaces, if you wanted to run Nutch as a local engine with
visual GUI components).

DC> A longer term fix might be to add an option to construct summaries
DC> directly as plain text.  This might be done as follows:
DC>    - replace Summary.toString() with both toText() and toHtml() methods;
DC>    - replace HitSummarizer.getSummary() with getHtmlSummary() and 
DC> getTextSummary().  This would require changes to NutchBean, 
DC> DistributedSearch, etc.

I think the latter approach provides a cleaner solution. I will
implement it and provide a patch to this list.
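
A minimal sketch of how the toText()/toHtml() split might look, assuming
a summary is held as a list of fragments, each flagged as highlighted or
not. SummarySketch, Fragment, add() and escape() are hypothetical names
used for illustration only; they are not taken from the Nutch source, and
the real Summary class may be organized differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Nutch's Summary class; the names and fragment
// layout here are assumptions for illustration, not the actual Nutch code.
class SummarySketch {

    // One piece of a summary: text that may or may not be a highlighted term.
    private static final class Fragment {
        final String text;
        final boolean highlight;

        Fragment(String text, boolean highlight) {
            this.text = text;
            this.highlight = highlight;
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();

    // Appends a fragment and returns this object so calls can be chained.
    SummarySketch add(String text, boolean highlight) {
        fragments.add(new Fragment(text, highlight));
        return this;
    }

    // Plain-text rendering: no markup and no entity escaping.
    String toText() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.text);
        }
        return sb.toString();
    }

    // HTML rendering: escape each fragment and wrap highlights in <b>.
    String toHtml() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            if (f.highlight) sb.append("<b>");
            sb.append(escape(f.text));
            if (f.highlight) sb.append("</b>");
        }
        return sb.toString();
    }

    // Escapes the characters that are special in HTML text and attributes.
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SummarySketch s = new SummarySketch()
            .add("R&D on ", false)
            .add("search", true)
            .add(" engines", false);
        System.out.println(s.toText()); // R&D on search engines
        System.out.println(s.toHtml()); // R&amp;D on <b>search</b> engines
    }
}
```

With this shape, a GUI client would call toText() and never see entities
or tags, while the web front end would keep calling toHtml().
getHtmlSummary() and getTextSummary() on HitSummarizer could then
delegate to the matching method.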

Thanks,
Dawid



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
