No URLs to fetch - check your seed list and URL filters

The error is quite clear. You injected URLs that did not pass your URL
filters. Check your URL filters, likely crawl-urlfilter since you seem to use
the crawl command.
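
To make "did not pass your URL filters" concrete, here is a minimal sketch of
how a regex-style URL filter decides on a seed URL. It assumes the usual
convention of the regex URL filter: each rule is a '+' (accept) or '-'
(reject) followed by a regular expression, the first matching rule wins, and
a URL that matches no rule is dropped. The rule sets and the function names
(parse_rules, accepts) are hypothetical illustrations, not Nutch code and not
the contents of your actual conf/regex-urlfilter.txt.

```python
import re

# Hypothetical rules written in the style of conf/regex-urlfilter.txt.
DEFAULT_STYLE_RULES = """
# skip file:, ftp: and mailto: urls
-^(file|ftp|mailto):
# accept anything else
+.
"""

# Hypothetical restrictive rules left over from editing a tutorial example.
RESTRICTIVE_RULES = r"""
+^http://([a-z0-9]*\.)*example\.org/
"""


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return True if the first matching rule is a '+' rule.

    Assumption: a URL that matches no rule at all is rejected.
    """
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


if __name__ == "__main__":
    seed = "http://lucene.apache.org/nutch/"
    for name, text in (("default-style", DEFAULT_STYLE_RULES),
                       ("restrictive", RESTRICTIVE_RULES)):
        print(name, accepts(seed, parse_rules(text)))
```

If a check like this rejects your seed URL, the injector ends up with nothing
to pass on and the generator selects 0 records, which is the message you are
seeing.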


> Thanks for updating the tutorial. I tried my setup, the crawl command is
> running. But none of the pages are being crawled.
> I created a urls directory inside the local folder and added a new file
> named nutch containing the URL, as mentioned in the tutorial.
> 
> (I also tried a file named urls inside the nutch/runtime/local directory.
> The content of the urls file is http://lucene.apache.org/nutch/ )
> 
> Here's the log:
> 
> us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-12 12:22:12
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> Generator: starting at 2011-07-12 12:22:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
> 
> 
> Please help.
> 
> Thanks
> Param
> 
> On 7/12/11 5:52 AM, "Julien Nioche" <[email protected]> wrote:
> > On 12 July 2011 10:30, Julien Nioche <[email protected]> wrote:
> >>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
> >>>>> gone since the crawl command is still there. You can find the file in
> >>>>> the 1.2 release:
> >>>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>> 
> >>>> Crawl-urlfilter has been removed purposefully as it did not add
> >>>> anything to the other url filters (automaton | regex) in terms of
> >>>> functionality. By default the urlfilters contain (+.) which IIRC was
> >>>> what the Crawl-urlfilter used to do.
> >>> 
> >>> That's reasonable. But now new users are unaware and don't know what
> >>> to do with this error message.
> >> 
> >> Yep, the tutorial needs updating indeed
> > 
> > done
> > 
> >>>>>> Thanks for a quick reply.
> >>>>>> 
> >>>>>> I searched in the nutch directory but still do not see that file :(.
> >>>>>> Here's the complete file list inside the runtime/local/conf directory.
> >>>>>> 
> >>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>> automaton-urlfilter.txt    domain-urlfilter.txt    nutch-default.xml
> >>>>>> prefix-urlfilter.txt    solrindex-mapping.xml
> >>>>>> configuration.xsl    httpclient-auth.xml    nutch-site.xml
> >>>>>> regex-normalize.xml    subcollections.xml
> >>>>>> domain-suffixes.xml    log4j.properties    parse-plugins.dtd
> >>>>>> regex-urlfilter.txt    suffix-urlfilter.txt
> >>>>>> domain-suffixes.xsd    nutch-conf.xsl        parse-plugins.xml
> >>>>>> schema.xml tika-mimetypes.xml
> >>>>>> 
> >>>>>> By the way, I tried deploying the code by checking it out from the svn
> >>>>>> repository, but could not build it. I was getting the following error:
> >>>>>> 
> >>>>>> resolve-default:
> >>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: problems summary ::
> >>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>> 
> >>>>>> BUILD FAILED
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>>     resolve failed - see output for details
> >>>>>> 
> >>>>>> -param
> >>>>>> 
> >>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
> >>>>>>> Look down a little further for the
> >>>>>>> 
> >>>>>>> or
> >>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>> 
> >>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>> 
> >>>>>>> Jerry E. Craig, Jr.
> >>>>>>> 
> >>>>>>> -----Original Message-----
> >>>>>>> From: Sethi, Parampreet [mailto:[email protected]]
> >>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>> To: [email protected]
> >>>>>>> Subject: Nutch Novice help
> >>>>>>> 
> >>>>>>> Hi All,
> >>>>>>> 
> >>>>>>> Sorry for such a naïve question. I downloaded the nutch 1.3 binary
> >>>>>>> today and am trying to set it up as described in the tutorial at
> >>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>> 
> >>>>>>> However, I am not able to find crawl-urlfilter.txt inside the conf
> >>>>>>> directory. Is there any other place where I should look for this file?
> >>>>>>> 
> >>>>>>> Thanks
> >>>>>>> Param
> >> 
> >> --
> >> Open Source Solutions for Text Engineering
> >> 
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
