We're planning to run some Ruby parsers on the content fetched by a
Nutch crawl. The best way to do this seems to be an interface like
Hadoop's streaming.jar, but streaming.jar expects line-based input,
and fetched pages are not line-oriented records.
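
To make the mismatch concrete, here is a minimal sketch of the kind of
Ruby mapper that streaming.jar would drive. It assumes the fetched
content has already been flattened to one record per line, as a URL, a
tab, and the Base64-encoded page bytes. That line format and the names
parse_record and run_mapper are our own illustration, not something
Nutch or Hadoop provides.

```ruby
# Hypothetical line format (an assumption, not defined by Nutch or
# streaming.jar): "<url>\t<Base64 of the fetched bytes>", one per line.
def parse_record(line)
  url, encoded = line.chomp.split("\t", 2)
  return nil if url.nil? || encoded.nil?
  # "m0" is strict Base64: it raises ArgumentError on malformed input.
  [url, encoded.unpack1("m0")]
rescue ArgumentError
  nil # skip records whose payload is not valid Base64
end

# Streaming-style mapper: read records from `input` and write
# "key\tvalue" lines to `output`. The "parser" here only reports the
# content size in bytes; a real parser would take its place.
def run_mapper(input, output)
  input.each_line do |line|
    url, content = parse_record(line)
    next if url.nil?
    output.puts "#{url}\t#{content.bytesize}"
  end
end
```

Run as a streaming mapper, the script would end with
run_mapper($stdin, $stdout). The piece still missing is whatever turns
the crawl's fetched content into lines like these.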

Has anyone written a version of streaming.jar for Nutch? We're working
on one, so if you'd like to collaborate (or have any advice), please
reply!

Thanks,
Chris

--
Chris Anderson
http://jchris.mfdz.com


