On Fri, Aug 21, 2009 at 4:20 AM, Julien Nioche <[email protected]> wrote:
> You'll need to write a custom parser implementing HtmlParseFilter and get it
> to store the keywords found in the Metadata, then write a custom Indexer.
>
> By default the HTML parser does not do anything about meta tags.

That's unfortunate, because org.apache.nutch.parse.html.HtmlParser
actually extracts all the meta tags, and then takes a few and throws
the rest away.  It's mildly annoying that I'm going to have to
re-implement all of HtmlParser just to add two lines to take that data
out of "metaTags" and put it in "content.getMetaData()".
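
The copy step described above can be sketched in isolation. This is only a
standalone illustration, not Nutch code: a java.util.Properties object stands
in for the parsed meta tags, a plain Map stands in for the content metadata,
and the class name MetaTagCopier and the "meta." key prefix are assumptions
made up for the example. In a real plugin the same loop would presumably sit
inside an HtmlParseFilter implementation.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in, NOT part of Nutch: plain java.util types are used
// in place of the parsed meta tags and the page metadata.
class MetaTagCopier {

    // Copy every name/content pair from the meta tags into the metadata map.
    // Keys are lower-cased and given an assumed "meta." prefix so they do not
    // collide with entries that are already present.
    static Map<String, String> copyMetaTags(Properties metaTags,
                                            Map<String, String> metadata) {
        for (String name : metaTags.stringPropertyNames()) {
            String key = "meta." + name.toLowerCase(Locale.ROOT);
            metadata.put(key, metaTags.getProperty(name));
        }
        return metadata;
    }

    public static void main(String[] args) {
        Properties metaTags = new Properties();
        metaTags.setProperty("keywords", "nutch, crawler, lucene");
        metaTags.setProperty("Description", "An example page");

        Map<String, String> metadata = new LinkedHashMap<>();
        copyMetaTags(metaTags, metadata);

        // A custom indexer could later read these values back by key.
        System.out.println(metadata.get("meta.keywords"));
        System.out.println(metadata.get("meta.description"));
    }
}
```

Keeping the tags under a common prefix is just one design choice; it lets a
later indexing step pick out everything that came from meta tags without
knowing the tag names in advance.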

-- 
http://www.linkedin.com/in/paultomblin
