Richard Braman wrote:
I too have noticed menu text appearing in the search results.

The proper place to fix it would be in parse-html, perhaps in DOMContentUtils.

However, be warned that this is definitely NOT trivial: pages don't say "this is menu, this is body text". You have to figure it out yourself, and it's hard to come up with a method that works for any layout. You could hardcode something that works well for your target group of hosts, with pre-determined page layouts.
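
To give an idea of what "figuring it out" might look like, here is a rough sketch of one common heuristic, link density: a block whose text is mostly link text is probably navigation. This is NOT the DOMContentUtils code and is not tied to any Nutch API. The class name MenuFilterSketch, the 0.5 threshold and the list of block tags are all made up for illustration, and it assumes the page has already been turned into a well-formed DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only; not part of Nutch.
public class MenuFilterSketch {

    // Assumed cut-off: a block with more than half of its text inside <a> is treated as a menu.
    static final double MAX_LINK_DENSITY = 0.5;

    /** Returns the page text with link-heavy blocks left out. Input must be well-formed XHTML. */
    public static String extractBodyText(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        collect(doc.getDocumentElement(), out);
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    // Walks the tree, appending text but skipping any block that looks like a menu.
    static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue()).append(' ');
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, processing instructions, etc.
        }
        if (isBlock(node) && linkDensity(node) > MAX_LINK_DENSITY) {
            return; // mostly links: assume navigation and drop the whole block
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), out);
        }
    }

    // Assumed set of container tags worth testing; tune for your hosts.
    static boolean isBlock(Node node) {
        String name = node.getNodeName().toLowerCase();
        return name.equals("div") || name.equals("ul") || name.equals("ol")
                || name.equals("table") || name.equals("td") || name.equals("p");
    }

    // Fraction of the block's text characters that sit inside <a> elements.
    static double linkDensity(Node block) {
        int total = textLength(block, false, false);
        if (total == 0) {
            return 0.0;
        }
        return (double) textLength(block, true, false) / total;
    }

    // Counts trimmed text characters; with linkOnly set, only text under an <a> is counted.
    static int textLength(Node node, boolean linkOnly, boolean insideLink) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return (!linkOnly || insideLink) ? node.getNodeValue().trim().length() : 0;
        }
        boolean nowInside = insideLink || "a".equalsIgnoreCase(node.getNodeName());
        int sum = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            sum += textLength(children.item(i), linkOnly, nowInside);
        }
        return sum;
    }
}
```

On a page with a <ul> of links followed by a paragraph containing a single link, this drops the list and keeps the paragraph. It will just as happily drop a legitimate "related links" section or keep a menu padded with plain text, which is exactly why a single rule for every layout is hard to get right.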

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
