Thanks for updating the tutorial. I tried my setup, and the crawl command now runs, but none of the pages are being crawled. I created a urls directory inside the local folder and added a new file named nutch containing the URL, as described in the tutorial.
(I also tried a file named urls inside the nutch/runtime/local directory. The content of the urls file is http://lucene.apache.org/nutch/ )

Here's the log:

us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-12 12:22:12
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
Generator: starting at 2011-07-12 12:22:15
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl

Please help.

Thanks
Param

On 7/12/11 5:52 AM, "Julien Nioche" <[email protected]> wrote:

> On 12 July 2011 10:30, Julien Nioche <[email protected]> wrote:
>
>>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
>>>>> gone since the crawl command is still there. You can find the file in
>>>>> the 1.2 release:
>>>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>>>>
>>>> Crawl-urlfilter has been removed purposefully as it did not add anything
>>>> to the other url filters (automaton | regex) in terms of functionality.
>>>> By default the urlfilters contain (+.) which IIRC was what the
>>>> Crawl-urlfilter used to do.
>>>
>>> That's reasonable. But now new users are unaware and don't know what to
>>> do with this error message.
>>
>> Yep, the tutorial needs updating indeed
>
> done
>
>>>>>> Thanks for a quick reply.
>>>>>>
>>>>>> I searched in the nutch directory but still do not see that file :(.
>>>>>> Here's the complete file list inside the runtime/local/conf directory.
>>>>>>
>>>>>> us137390:conf parampreetsethi$ pwd
>>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
>>>>>> us137390:conf parampreetsethi$ ls -t
>>>>>> automaton-urlfilter.txt domain-urlfilter.txt nutch-default.xml
>>>>>> prefix-urlfilter.txt solrindex-mapping.xml
>>>>>> configuration.xsl httpclient-auth.xml nutch-site.xml
>>>>>> regex-normalize.xml subcollections.xml
>>>>>> domain-suffixes.xml log4j.properties parse-plugins.dtd
>>>>>> regex-urlfilter.txt suffix-urlfilter.txt
>>>>>> domain-suffixes.xsd nutch-conf.xsl parse-plugins.xml
>>>>>> schema.xml tika-mimetypes.xml
>>>>>>
>>>>>> By the way, I tried deploying the code by checking it out from the svn
>>>>>> repository, but could not build it. I was getting the following error:
>>>>>>
>>>>>> resolve-default:
>>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
>>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
>>>>>> [ivy:resolve]
>>>>>> [ivy:resolve] :: problems summary ::
>>>>>> [ivy:resolve] :::: WARNINGS
>>>>>> [ivy:resolve] module not found: org.apache.gora#gora-core;0.2-incubating
>>>>>> [ivy:resolve] ==== local: tried
>>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
>>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
>>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
>>>>>> [ivy:resolve] module not found: org.apache.gora#gora-sql;0.2-incubating
>>>>>> [ivy:resolve] ==== local: tried
>>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
>>>>>> [ivy:resolve] -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
>>>>>> [ivy:resolve] /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
>>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>>> [ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
>>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>>> [ivy:resolve] :: org.apache.gora#gora-core;0.2-incubating: not found
>>>>>> [ivy:resolve] :: org.apache.gora#gora-sql;0.2-incubating: not found
>>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>>> [ivy:resolve]
>>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>>>>>
>>>>>> BUILD FAILED
>>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
>>>>>> resolve failed - see output for details
>>>>>>
>>>>>> -param
>>>>>>
>>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
>>>>>>> Look down a little further for the
>>>>>>>
>>>>>>> or
>>>>>>> runtime/local/bin/nutch (version >= 1.3)
>>>>>>>
>>>>>>> If you download the bin then it's in the runtime directory.
>>>>>>>
>>>>>>> Jerry E. Craig, Jr.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Sethi, Parampreet [mailto:[email protected]]
>>>>>>> Sent: Monday, July 11, 2011 2:51 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Nutch Novice help
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Sorry for such a naïve question. I downloaded the nutch 1.3 binary today
>>>>>>> and am trying to set it up as described in the tutorial at
>>>>>>> http://wiki.apache.org/nutch/NutchTutorial
>>>>>>>
>>>>>>> However, I am not able to find crawl-urlfilter.txt inside the conf
>>>>>>> directory. Is there any other place where I should look for this file?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Param
>>
>> --
>> *
>> *Open Source Solutions for Text Engineering
>>
>> http://digitalpebble.blogspot.com/
>> http://www.digitalpebble.com
>

