Hi Eric,

You'll need to modify the class org.apache.nutch.crawl.Injector to do that: replace its first map-reduce job with code that generates a SequenceFile of CrawlDatum objects straight from Oracle. The second map-reduce job should work as is.
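A minimal sketch of the database side of that replacement, using only plain JDBC. Everything named here is an assumption for illustration: the class OracleSeedReader, the table seed_urls with its url column, and the Seed class, which only stands in for the (url, CrawlDatum) pair the first job would emit. The trimming and de-duplication are choices of this sketch, not something Nutch requires.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: loads seed URLs over JDBC instead of from a text file. */
class OracleSeedReader {

    /** Stand-in for the (url, CrawlDatum) pair; not a Nutch class. */
    static final class Seed {
        final String url;
        final float score;        // initial score given to an injected URL
        final int fetchInterval;  // seconds between refetches

        Seed(String url, float score, int fetchInterval) {
            this.url = url;
            this.score = score;
            this.fetchInterval = fetchInterval;
        }
    }

    /** Reads one URL per row. Table and column names are assumptions. */
    static List<String> readUrls(Connection conn) throws SQLException {
        List<String> urls = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT url FROM seed_urls")) {
            while (rs.next()) {
                urls.add(rs.getString(1));
            }
        }
        return urls;
    }

    /** Trims each URL, drops nulls, blanks and duplicates, keeps first-seen order. */
    static List<Seed> toSeeds(Iterable<String> rawUrls, float score, int fetchInterval) {
        Set<String> unique = new LinkedHashSet<>();
        for (String raw : rawUrls) {
            if (raw == null) {
                continue;
            }
            String url = raw.trim();
            if (!url.isEmpty()) {
                unique.add(url);
            }
        }
        List<Seed> seeds = new ArrayList<>();
        for (String url : unique) {
            seeds.add(new Seed(url, score, fetchInterval));
        }
        return seeds;
    }
}
```

With Hadoop and Nutch on the classpath, each Seed would instead be written to the SequenceFile as a Text key (the URL) and a CrawlDatum value marked as injected, so that the second job can read it unchanged. That part is left out here because it depends on the Nutch version in use.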
J.

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 25 May 2010 13:46, eric park <[email protected]> wrote:
> Hello guys,
>
> I'm trying to get rid of the url injection text file and read the starting
> urls from oracle database.
> It seems that nutch is integrated tightly with hadoop, and I cannot find the
> way to modify this mechanism.
> Anyone tried a similar modification?
>
> Thank you.

