Hi Eric

To do that you'll need to modify the class o.a.n.crawl.Injector and replace
its first map-reduce job with one that generates a SequenceFile of
CrawlDatum objects straight from Oracle. The second mapred job should work
as is.
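
As a rough illustration of what the replacement for the first job has to
produce, here is a minimal sketch that uses only the JDK. It is not the
Injector's actual code: the class OracleSeedSketch, the Datum class (a
simplified stand-in for CrawlDatum), the query text, and the interval and
score values are all assumptions made for the example. In a real patch each
pair would be written out as Text/CrawlDatum to a Hadoop SequenceFile rather
than kept in a map.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: database rows in, (url, datum) pairs out.
class OracleSeedSketch {

    // Simplified stand-in for CrawlDatum; only what an injected entry needs.
    static final class Datum {
        final String status;
        final int fetchIntervalSeconds;
        final float score;

        Datum(String status, int fetchIntervalSeconds, float score) {
            this.status = status;
            this.fetchIntervalSeconds = fetchIntervalSeconds;
            this.score = score;
        }
    }

    // Reads one URL per row from the first column of the result set.
    // The query is an assumption, e.g. "SELECT url FROM seed_urls".
    static List<String> readUrls(Connection conn, String query) throws SQLException {
        List<String> urls = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(query)) {
            while (rs.next()) {
                urls.add(rs.getString(1));
            }
        }
        return urls;
    }

    // Trims each URL, drops nulls and anything that is not http(s),
    // removes duplicates, and marks every surviving URL as injected.
    static Map<String, Datum> toInjected(List<String> rawUrls,
                                         int intervalSeconds, float score) {
        Map<String, Datum> out = new LinkedHashMap<>();
        for (String raw : rawUrls) {
            if (raw == null) {
                continue;
            }
            String url = raw.trim();
            if (!url.startsWith("http://") && !url.startsWith("https://")) {
                continue;
            }
            out.putIfAbsent(url, new Datum("injected", intervalSeconds, score));
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample rows standing in for the result of readUrls(conn, query).
        List<String> rows = Arrays.asList(" http://example.com/ ", "https://example.org/a");
        // Illustrative values: a 30-day fetch interval and a score of 1.0.
        Map<String, Datum> seeds = toInjected(rows, 30 * 24 * 3600, 1.0f);
        for (Map.Entry<String, Datum> e : seeds.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue().status);
        }
    }
}
```

The JDBC read is kept separate from the conversion so that the conversion
can be checked without a database, and so that URL normalizing and filtering
can be slotted in where the simple http(s) check is now.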

J.
-- 
DigitalPebble Ltd
http://www.digitalpebble.com

On 25 May 2010 13:46, eric park <[email protected]> wrote:

> Hello guys,
>
> I'm trying to get rid of the URL injection text file and read the starting
> URLs from an Oracle database.
> It seems that Nutch is tightly integrated with Hadoop, and I cannot find a
> way to modify this mechanism.
> Has anyone tried a similar modification?
>
> Thank you.
>
