Doug,

DC> Sorry, I've been away from Nutch email for a few days.

Absolutely not a problem, I've been busy as well.

[snip]

DC> HTML.  The simplest workaround might be to use, e.g., a SAX-based parser

Yeah, I realized I could decode the summaries, but it seemed like a small
refactoring that would be useful for other people anyway (think of user
interfaces, if you wanted to run Nutch as a local engine with visual GUI
components).

DC> A longer term fix might be to add an option to construct summaries
DC> directly as plain text.  This might be done as follows:
DC>  - replace Summary.toString() with both toText() and toHtml() methods;
DC>  - replace HitSummarizer.getSummary() with getHtmlSummary() and
DC>    getTextSummary().  This would require changes to NutchBean,
DC>    DistributedSearch, etc.

I think the latter approach provides a cleaner solution. I will implement it
and provide a patch to this list.

Thanks,
Dawid

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
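
For illustration, the first bullet of the quoted proposal (splitting
Summary.toString() into toText() and toHtml()) could look roughly like the
sketch below. This is not the actual Nutch code: the Fragment class, the
add() method, and the <b> highlighting are assumptions made up for the
example; only the two method names come from the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Nutch's Summary; only toText()/toHtml() are from the proposal. */
class Summary {
    /** One piece of the summary; highlighted pieces represent matched query terms (assumed model). */
    private static final class Fragment {
        final String text;
        final boolean highlighted;

        Fragment(String text, boolean highlighted) {
            this.text = text;
            this.highlighted = highlighted;
        }
    }

    private final List<Fragment> fragments = new ArrayList<>();

    /** Hypothetical builder method; returns this so calls can be chained. */
    Summary add(String text, boolean highlighted) {
        fragments.add(new Fragment(text, highlighted));
        return this;
    }

    /** Plain text: fragments are concatenated with no markup and no entity escaping. */
    String toText() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            sb.append(f.text);
        }
        return sb.toString();
    }

    /** HTML: each fragment is escaped, and highlighted ones are wrapped in <b>. */
    String toHtml() {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : fragments) {
            String escaped = escape(f.text);
            if (f.highlighted) {
                sb.append("<b>").append(escaped).append("</b>");
            } else {
                sb.append(escaped);
            }
        }
        return sb.toString();
    }

    /** Minimal escaping of the three characters that matter in element content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With this shape, a GUI client would call toText() and never have to decode
entities, while the web front end keeps calling toHtml(). The
HitSummarizer.getHtmlSummary()/getTextSummary() pair from the second bullet
would presumably just delegate to these two methods.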
