On 5/29/07, Ian Holsman <[EMAIL PROTECTED]> wrote:

> ...What I was planning to do was use the Nutch tool to fetch the URL data
> into segments, and then write a custom tool to extract the HTML out of
> the segment and run it through my code, similar to what the 'crawl'
> does, but dumping the metrics into a MySQL DB.
>
> Is this similar to what you guys had in mind with Tika?...

I think so: the "extract the HTML" part would be a standard Tika
plugin, and your metrics stuff would be a custom plugin.

-Bertrand
