Hi, Anand.
You wrote on 29 September 2006, 18:22:39:

> I am new to Nutch and am trying to see if we can use it for web search
> functionality.
> I am running the site on my local box on a WebLogic server. I am using
> Nutch 0.8.1 on Windows XP using Cygwin.
> I created a "urls" directory and then created a file called "frontend"
> in that directory.
> The local URL that I have specified in that file is
> http://172.16.10.99:7001/frontend/
> This is the only line in that file.
> I have also changed the crawl-urlfilter file as follows:
> # accept hosts in MY.DOMAIN.NAME
> +^http://172.16.10.99:7001/frontend/

This line is the problem. Remove it from the file. Instead, copy the URL
from your "frontend" seed file into crawl-urlfilter.txt, directly after
the line "# accept hosts in MY.DOMAIN.NAME". Then remove the "+." at the
end of the file and write "-." instead.
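For reference, a minimal sketch of how the tail of conf/crawl-urlfilter.txt
might look after those edits (assuming the stock 0.8.1 file otherwise;
every rule in this file needs a leading "+" or "-", so the copied URL
keeps its "+"):

    # accept hosts in MY.DOMAIN.NAME
    +http://172.16.10.99:7001/frontend/

    # skip everything else
    -.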
> The command I am executing is
> bin/nutch crawl urls -dir _crawloutput -depth 3 -topN 50
> The crawl output I get is as follows:
> crawl started in: _crawloutput
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 50
> Injector: starting
> Injector: crawlDb: _crawloutput/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: _crawloutput/segments/20060929101916
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: _crawloutput/segments/20060929101916
> Fetcher: threads: 10
> fetching http://172.16.10.99:7001/frontend/
> fetch of http://172.16.10.99:7001/frontend/ failed with:
> java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: _crawloutput/crawldb
> CrawlDb update: segment: _crawloutput/segments/20060929101916
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: _crawloutput/segments/20060929101924
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: _crawloutput/segments/20060929101924
> Fetcher: threads: 10
> fetching http://172.16.10.99:7001/frontend/
> fetch of http://172.16.10.99:7001/frontend/ failed with:
> java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: _crawloutput/crawldb
> CrawlDb update: segment: _crawloutput/segments/20060929101924
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: _crawloutput/segments/20060929101932
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: _crawloutput/segments/20060929101932
> Fetcher: threads: 10
> fetching http://172.16.10.99:7001/frontend/
> fetch of http://172.16.10.99:7001/frontend/ failed with:
> java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: _crawloutput/crawldb
> CrawlDb update: segment: _crawloutput/segments/20060929101932
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: _crawloutput/linkdb
> LinkDb: adding segment: _crawloutput/segments/20060929101916
> LinkDb: adding segment: _crawloutput/segments/20060929101924
> LinkDb: adding segment: _crawloutput/segments/20060929101932
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: _crawloutput/linkdb
> Indexer: adding segment: _crawloutput/segments/20060929101916
> Indexer: adding segment: _crawloutput/segments/20060929101924
> Indexer: adding segment: _crawloutput/segments/20060929101932
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: _crawloutput/indexes
> Dedup: done
> Adding _crawloutput/indexes/part-00000
> crawl finished: _crawloutput
> I am not sure what I am doing wrong. Can someone help?
> Thanks,
> Anand Narayan

--
Regards,
Dima
mailto:[EMAIL PROTECTED]
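P.S. In case it helps to map the stages in that output: the one-shot
"bin/nutch crawl" command is roughly equivalent to running the individual
Nutch 0.8 tools in sequence. A sketch from memory of the 0.8 tutorial
(paths here are illustrative, reusing your _crawloutput directory;
double-check the exact arguments against your copy):

    bin/nutch inject _crawloutput/crawldb urls
    bin/nutch generate _crawloutput/crawldb _crawloutput/segments -topN 50
    s1=`ls -d _crawloutput/segments/2* | tail -1`
    bin/nutch fetch $s1
    bin/nutch updatedb _crawloutput/crawldb $s1
    # ...repeat generate/fetch/updatedb once per depth level...
    bin/nutch invertlinks _crawloutput/linkdb _crawloutput/segments/*
    bin/nutch index _crawloutput/indexes _crawloutput/crawldb \
        _crawloutput/linkdb _crawloutput/segments/*
    bin/nutch dedup _crawloutput/indexes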
