Hi, I think you may be getting things tangled up here. Please see the following tutorial for Nutch 1.3 [1].
Please also note that the URL you provided is the old Nutch site and now redirects to http://nutch.apache.org

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr

On Tue, Jul 12, 2011 at 5:23 PM, Sethi, Parampreet <[email protected]> wrote:

> Thanks for updating the tutorial. I tried my setup, the crawl command is
> running. But none of the pages are being crawled.
> I created urls directory inside local folder and added new file nutch with
> url in the same as mentioned in tutorial.
>
> (I also tried file named urls inside nutch/runtime/local directory. The
> contents of urls file is http://lucene.apache.org/nutch/ )
>
> Here's the log:
>
> us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-12 12:22:12
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> Generator: starting at 2011-07-12 12:22:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
>
> Please help.
>
> Thanks
> Param
>
> On 7/12/11 5:52 AM, "Julien Nioche" <[email protected]> wrote:
>
> > On 12 July 2011 10:30, Julien Nioche <[email protected]> wrote:
> >
> >>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's gone since
> >>>>> the crawl command is still there.
> >>>>> You can find the file in the 1.2 release:
> >>>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>>
> >>>> Crawl-urlfilter has been removed purposefully as it did not add anything
> >>>> to the other url filters (automaton | regex) in terms of functionality. By
> >>>> default the urlfilters contain (+.) which IIRC was what the
> >>>> Crawl-urlfilter used to do.
> >>>
> >>> That's reasonable. But now new users are unaware and don't know what to do
> >>> with this error message.
> >>
> >> Yep, the tutorial needs updating indeed
> >
> > done
> >
> >>>>>> Thanks for a quick reply.
> >>>>>>
> >>>>>> I searched in the nutch directory but still do not see that file :(. Here's
> >>>>>> complete file list inside runtime/local/conf directory.
> >>>>>>
> >>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>> automaton-urlfilter.txt domain-urlfilter.txt nutch-default.xml
> >>>>>> prefix-urlfilter.txt solrindex-mapping.xml
> >>>>>> configuration.xsl httpclient-auth.xml nutch-site.xml
> >>>>>> regex-normalize.xml subcollections.xml
> >>>>>> domain-suffixes.xml log4j.properties parse-plugins.dtd
> >>>>>> regex-urlfilter.txt suffix-urlfilter.txt
> >>>>>> domain-suffixes.xsd nutch-conf.xsl parse-plugins.xml
> >>>>>> schema.xml tika-mimetypes.xml
> >>>>>>
> >>>>>> By the way, I tried deploying the code by checking out from svn repository,
> >>>>>> but could not build it.
> >>>>>> I was getting following error:
> >>>>>>
> >>>>>> resolve-default:
> >>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: problems summary ::
> >>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>> [ivy:resolve] module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>> [ivy:resolve] ==== local: tried
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>> [ivy:resolve] module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>> [ivy:resolve] ==== local: tried
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve] :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>> [ivy:resolve] :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>>
> >>>>>> BUILD FAILED
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>> resolve failed - see output for details
> >>>>>>
> >>>>>> -param
> >>>>>>
> >>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
> >>>>>>> Look down a little further for the
> >>>>>>>
> >>>>>>> or
> >>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>>
> >>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>>
> >>>>>>> Jerry E. Craig, Jr.
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Sethi, Parampreet [mailto:[email protected]]
> >>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>> To: [email protected]
> >>>>>>> Subject: Nutch Novice help
> >>>>>>>
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> Sorry for such a naïve question, I downloaded nutch 1.3 binary today and
> >>>>>>> trying to set it up as mentioned in Tutorial at
> >>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>>
> >>>>>>> However I am not able to find crawl-urlfilter.txt inside conf directory.
> >>>>>>> Is there any other place where I should look for this file?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Param
> >>
> >> --
> >> *
> >> *Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >

--
*Lewis*
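
P.S. Since the crawl log in this thread ends with "check your seed list and URL filters", here is a minimal sketch of how one might check both before re-running the crawl. It assumes you are in runtime/local; the file name seed.txt is only a hypothetical choice, and the seed URL is the redirect target mentioned in this thread. It is not guaranteed to fix the "0 records selected" result, it only makes the inputs easy to inspect.

```shell
# Sketch only: prepare a seed list and inspect the URL filter before crawling.
# Assumption: the current directory is runtime/local; seed.txt is a hypothetical name.
mkdir -p urls
printf '%s\n' 'http://nutch.apache.org/' > urls/seed.txt

# Print the active (non-comment, non-blank) rules of the regex URL filter,
# if the file exists in this location.
if [ -f conf/regex-urlfilter.txt ]; then
  grep -v -e '^#' -e '^$' conf/regex-urlfilter.txt || true
fi

# Count the seed lines that start with http, i.e. what the injector will read.
seed_count=$(grep -c '^http' urls/seed.txt)
echo "seed URLs: $seed_count"

# Then re-run the same command as in the quoted log:
# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
```

If the count is zero, or the filter rules reject the seed, the Generator would have nothing to select, which is at least consistent with the log quoted in this thread.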

