We're planning to run some Ruby parsers on the fetched content from a Nutch crawl. It seems like the best way to do this would be through an interface like Hadoop's streaming.jar, but streaming.jar expects a line-based input format.
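To make the line-based constraint concrete, here is a minimal sketch of what a Ruby mapper might look like if some earlier step had already flattened each fetched page into a single line. The record layout (URL, a tab, then the page bytes Base64-encoded with no embedded newlines) is purely an assumption for illustration and is not something Nutch produces by default. The names `parse_record`, `extract_title`, and `run_mapper` are hypothetical, and `extract_title` only stands in for whatever real parser would be used.

```ruby
require "base64"

# Hypothetical record layout (an assumption, not a Nutch format):
#   <url> TAB <Base64 of the fetched bytes, no embedded newlines>
# Returns [url, raw_bytes], or nil when the line does not fit that layout.
def parse_record(line)
  url, encoded = line.chomp.split("\t", 2)
  return nil if url.nil? || url.empty? || encoded.nil?
  [url, Base64.strict_decode64(encoded)]
rescue ArgumentError
  nil # malformed Base64 (or undecodable input): skip the record
end

# Placeholder for a real parser: pulls the <title> text out of the page.
def extract_title(html)
  # Treat the bytes as UTF-8 and replace invalid sequences so the regex
  # and the later string interpolation cannot raise encoding errors.
  text = html.dup.force_encoding(Encoding::UTF_8).scrub("?")
  match = text.match(/<title[^>]*>(.*?)<\/title>/im)
  match ? match[1].gsub(/\s+/, " ").strip : ""
end

# Reads one record per line and writes "url TAB title" per valid record,
# which is the key/value shape a streaming job reads back from a mapper.
def run_mapper(input, output)
  input.each_line do |line|
    url, content = parse_record(line)
    next if url.nil?
    output.puts "#{url}\t#{extract_title(content)}"
  end
end

# As an actual streaming mapper, the script would end with:
#   run_mapper($stdin, $stdout)
```

A script like this would be handed to Hadoop streaming through its `-mapper` option. It sidesteps the real problem rather than solving it: something still has to turn the fetched content into that line format first, which is the piece a Nutch-aware streaming.jar would replace.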
Has anyone written a version of streaming.jar for Nutch? We're working on one, so if you'd like to collaborate (or have any advice), please reply!

Thanks,
Chris

--
Chris Anderson
http://jchris.mfdz.com
