Re: Problem with crawling using the latest 1.0 trunk

Tony Wang Mon, 02 Mar 2009 13:20:44 -0800

Thanks, Justin. Build #736 works flawlessly!

On Mon, Mar 2, 2009 at 1:34 PM, Justin Yao <[email protected]> wrote:


> Same problem here with build #740 (Mar 2, 2009 4:01:53 AM).
> I switched to build #736 (Feb 26, 2009 4:01:15 AM) and it worked.
>
> Justin
>
> Tony Wang wrote:
> > Man, I have exactly the same problem with Nutch 1.0 in the SVN trunk! I
> > wonder when the Nutch team will release the official 1.0. I really cannot
> > wait.
> >
> > On Mon, Mar 2, 2009 at 12:09 PM, ahammad <[email protected]> wrote:
> >
> >> I am aware that this is still a development version, but I need to test
> >> a few things with Nutch/Solr so I installed the latest dev version of
> >> Nutch 1.0.
> >>
> >> I tried running a crawl like I did with the working 0.9 version. From
> >> the log, it seems to fetch all the pages properly, but it fails at the
> >> indexing:
> >>
> >> CrawlDb update: starting
> >> CrawlDb update: db: kb/crawldb
> >> CrawlDb update: segments: [kb/segments/20090302135858]
> >> CrawlDb update: additions allowed: true
> >> CrawlDb update: URL normalizing: true
> >> CrawlDb update: URL filtering: true
> >> CrawlDb update: Merging segment data into db.
> >> CrawlDb update: done
> >> LinkDb: starting
> >> LinkDb: linkdb: kb/linkdb
> >> LinkDb: URL normalize: true
> >> LinkDb: URL filter: true
> >> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135757
> >> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135807
> >> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135858
> >> LinkDb: done
> >> Indexer: starting
> >> Exception in thread "main" java.io.IOException: Job failed!
> >>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> >>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
> >>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:146)
> >>
> >>
> >> I took a look at all the configuration and as far as I can tell, I did
> >> the same thing with my 0.9 install. Could it be that I didn't install it
> >> properly? I unzipped it and ran ant and ant war in the root directory.
> >>
> >> Thanks
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Problem-with-crawling-using-the-latest-1.0-trunk-tp22294581p22294581.html
> >> Sent from the Nutch - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
>
>


-- 
Are you RCholic? www.RCholic.com
温 良 恭 俭 让 仁 义 礼 智 信
