Richard Braman wrote:
I too have noticed menu text appearing in the search results.
The proper place to fix it would be in parse-html, perhaps in DOMContentUtils.
However, be warned that this is definitely NOT trivial. Pages don't say "this is a menu, this is body text"; you have to figure it out yourself, and it's hard to come up with a method that works for any layout. If your target group of hosts uses pre-determined page layouts, you could hardcode something that works well for just those hosts.
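To make the "you have to figure it out" part concrete, here is a minimal sketch of one possible heuristic, link density: a block-level element whose text is mostly inside <a> tags is treated as a menu and skipped. Everything in it is an assumption for illustration only: the class name MenuTextFilter, the 0.5 threshold, and the list of block tags are hypothetical, and it is not what DOMContentUtils does. The demo parses well-formed XHTML with the JDK's own XML parser; in parse-html the DOM would come from the HTML parser instead.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: drops blocks that look like menus.
final class MenuTextFilter {
    // Assumed threshold: above this share of link text, a block counts as a menu.
    static final double MAX_LINK_DENSITY = 0.5;

    // Assumed set of container tags that are tested as a whole.
    static final Set<String> BLOCKS = new HashSet<String>(
            Arrays.asList("div", "ul", "ol", "table", "td", "nav"));

    /** Sums trimmed text length under node; with linksOnly, counts only text inside <a>. */
    static int textLength(Node node, boolean insideLink, boolean linksOnly) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            int len = node.getNodeValue().trim().length();
            return (!linksOnly || insideLink) ? len : 0;
        }
        boolean link = insideLink || "a".equalsIgnoreCase(node.getNodeName());
        int total = 0;
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            total += textLength(c, link, linksOnly);
        }
        return total;
    }

    /** True when more than MAX_LINK_DENSITY of the node's text is link text. */
    static boolean looksLikeMenu(Node node) {
        int all = textLength(node, false, false);
        int linked = textLength(node, false, true);
        return all > 0 && (double) linked / all > MAX_LINK_DENSITY;
    }

    /** Appends text to out, skipping block elements that look like menus. */
    static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) out.append(' ');
                out.append(text);
            }
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE
                && BLOCKS.contains(node.getNodeName().toLowerCase())
                && looksLikeMenu(node)) {
            return; // mostly links: assume navigation, do not index
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    /** Demo entry point: parses well-formed XHTML and returns the non-menu text. */
    static String extractBodyText(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        collect(doc, out);
        return out.toString();
    }
}
```

Calling MenuTextFilter.extractBodyText(...) on a page with a <ul> of links followed by a <div> of prose should keep the prose, including links inside it, and drop the list. This also shows why a general method is hard: a legitimate "related links" section or a link-heavy index page would be dropped by the same rule, so the threshold and tag list would need tuning per site.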
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
