Re: Streaming.jar for Nutch?

David Grandinetti Wed, 11 Jun 2008 15:07:46 -0700

Chris,

I'm not sure I understand completely, but I would try to write a parser plugin that pipes content to an external ruby process...or even just use JRuby. This way would keep you from having to worry about the complexities of interacting with Hadoop directly.
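
To make the piping idea concrete, here is a rough sketch in Java, since that is what a Nutch plugin would be written in. It only shows the plumbing: hand a document's bytes to an external command on stdin and collect whatever it prints on stdout. The names ExternalParserPipe, pipe, and parse.rb are made up for illustration, and hooking this into Nutch's parser extension point is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch or Hadoop.
public class ExternalParserPipe {

    // Runs `command`, writes `content` to its stdin, returns its stdout as UTF-8 text.
    public static String pipe(List<String> command, byte[] content)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the child's stderr show up in our own stderr for debugging.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Write stdin on a separate thread so a large document cannot
        // deadlock against a full stdout pipe.
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(content);
            } catch (IOException e) {
                // The child closed stdin early; the exit status check below reports it.
            }
        });
        writer.start();

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream stdout = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = stdout.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }

        writer.join();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("external parser exited with status " + exit);
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Pass the external command as arguments, e.g. `ruby parse.rb`
        // (parse.rb is a hypothetical script). Falls back to `cat`, which
        // just echoes the input back.
        List<String> command = args.length > 0 ? Arrays.asList(args) : Arrays.asList("cat");
        byte[] page = "<html>fetched page</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(pipe(command, page));
    }
}
```

Starting one process per document is simple but probably slow on a large crawl; a long-running Ruby process or JRuby would avoid the startup cost.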

What kind of ruby parsing are you looking to do? I had considered doing the same thing to parse and sanitize news feeds.


-dave

--
david grandinetti
ideas for food and code


On Jun 11, 2008, at 16:46, "Chris Anderson" <[EMAIL PROTECTED]> wrote:

We're planning to run some Ruby parsers on the fetched content from a
Nutch crawl. It seems like the best way to do this would be through an
interface like Hadoop's streaming.jar, but streaming.jar expects a
line-based input format.

Has anyone written a version of streaming.jar for Nutch? We're working
on one, so if you'd like to collaborate (or have any advice), please
reply!

Thanks,
Chris

--
Chris Anderson
http://jchris.mfdz.com



