Hi,

I think you may be getting tangled up here. Please see the following
tutorial for Nutch 1.3 [1].

Please also note that the URL you provided is the old Nutch site and now
redirects to http://nutch.apache.org
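
Since the log below ends with "check your seed list and URL filters", one
thing worth checking is whether your seed URL passes the rules in
conf/regex-urlfilter.txt. Below is a minimal Python sketch of that check. It
is not Nutch's own code. It assumes one rule per line, a leading + (accept)
or - (reject) followed by a regex, with the first matching rule deciding and
an unmatched URL being rejected. The two rule sets and the helper names
(load_rules, accepts) are made up for illustration and are not copied from
any shipped file.

```python
import re


def load_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides.

    A URL that matches no rule is rejected (assumed behaviour).
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Hypothetical rule sets, for illustration only.
ACCEPT_ALL = "+.\n"
NUTCH_ONLY = (
    "# only follow hosts under nutch.apache.org\n"
    "+^http://([a-z0-9]*\\.)*nutch\\.apache\\.org/\n"
)

seed = "http://lucene.apache.org/nutch/"
print(accepts(load_rules(ACCEPT_ALL), seed))  # True: '+.' matches any URL
print(accepts(load_rules(NUTCH_ONLY), seed))  # False: host is lucene.apache.org
```

If your own rules reject the seed in a check like this, the generator would
have nothing to select, which would be consistent with the "0 records
selected for fetching" line. If they accept it, the cause is more likely
elsewhere, for example in the seed directory itself.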

[1] http://wiki.apache.org/nutch/RunningNutchAndSolr

On Tue, Jul 12, 2011 at 5:23 PM, Sethi, Parampreet <
[email protected]> wrote:

> Thanks for updating the tutorial. I tried my setup, and the crawl command
> runs, but none of the pages are being crawled.
> I created a urls directory inside the local folder and added a new file named
> nutch containing the URL, as mentioned in the tutorial.
>
> (I also tried a file named urls inside the nutch/runtime/local directory. The
> content of the urls file is http://lucene.apache.org/nutch/ )
>
> Here's the log:
>
> us137390:local parampreetsethi$ bin/nutch crawl urls -dir crawl -depth 3 -topN 50
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 50
> Injector: starting at 2011-07-12 12:22:12
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2011-07-12 12:22:15, elapsed: 00:00:03
> Generator: starting at 2011-07-12 12:22:15
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
>
>
> Please help.
>
> Thanks
> Param
>
> On 7/12/11 5:52 AM, "Julien Nioche" <[email protected]> wrote:
>
> > On 12 July 2011 10:30, Julien Nioche <[email protected]> wrote:
> >
> >>
> >>
> >>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's gone
> >>>>> since the crawl command is still there. You can find the file in the 1.2
> >>>>> release: http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
> >>>>
> >>>> Crawl-urlfilter has been removed purposefully as it did not add anything
> >>>> to the other url filters (automaton | regex) in terms of functionality. By
> >>>> default the urlfilters contain (+.) which IIRC was what the
> >>>> Crawl-urlfilter used to do.
> >>>>
> >>>
> >>> That's reasonable. But now new users are unaware and don't know what to do
> >>> with this error message.
> >>>
> >>
> >> Yep, the tutorial needs updating indeed
> >>
> >
> > done
> >
> >
> >>
> >>
> >>
> >>>
> >>>>>> Thanks for a quick reply.
> >>>>>>
> >>>>>> I searched in the nutch directory but still do not see that file :(.
> >>>>>> Here's the complete file list inside the runtime/local/conf directory.
> >>>>>>
> >>>>>> us137390:conf parampreetsethi$ pwd
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> >>>>>> us137390:conf parampreetsethi$ ls -t
> >>>>>> automaton-urlfilter.txt    domain-urlfilter.txt    nutch-default.xml
> >>>>>> prefix-urlfilter.txt    solrindex-mapping.xml
> >>>>>> configuration.xsl    httpclient-auth.xml    nutch-site.xml
> >>>>>> regex-normalize.xml    subcollections.xml
> >>>>>> domain-suffixes.xml    log4j.properties    parse-plugins.dtd
> >>>>>> regex-urlfilter.txt    suffix-urlfilter.txt
> >>>>>> domain-suffixes.xsd    nutch-conf.xsl        parse-plugins.xml
> >>>>>> schema.xml tika-mimetypes.xml
> >>>>>>
> >>>>>> By the way, I tried deploying the code by checking it out from the svn
> >>>>>> repository, but could not build it. I was getting the following error:
> >>>>>>
> >>>>>> resolve-default:
> >>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> >>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: problems summary ::
> >>>>>> [ivy:resolve] :::: WARNINGS
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> >>>>>> [ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
> >>>>>> [ivy:resolve]     ==== local: tried
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> >>>>>> [ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> >>>>>> [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
> >>>>>> [ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
> >>>>>> [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> >>>>>> [ivy:resolve]
> >>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >>>>>>
> >>>>>> BUILD FAILED
> >>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >>>>>>     resolve failed - see output for details
> >>>>>>
> >>>>>> -param
> >>>>>>
> >>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
> >>>>>>> Look down a little further for the
> >>>>>>>
> >>>>>>> or
> >>>>>>> runtime/local/bin/nutch (version >= 1.3)
> >>>>>>>
> >>>>>>> If you download the bin then it's in the runtime directory.
> >>>>>>>
> >>>>>>> Jerry E. Craig, Jr.
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Sethi, Parampreet [mailto:[email protected]]
> >>>>>>> Sent: Monday, July 11, 2011 2:51 PM
> >>>>>>> To: [email protected]
> >>>>>>> Subject: Nutch Novice help
> >>>>>>>
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> Sorry for such a naïve question. I downloaded the Nutch 1.3 binary today
> >>>>>>> and am trying to set it up as described in the tutorial at
> >>>>>>> http://wiki.apache.org/nutch/NutchTutorial
> >>>>>>>
> >>>>>>> However, I am not able to find crawl-urlfilter.txt inside the conf
> >>>>>>> directory. Is there any other place where I should look for this file?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>> Param
> >>>
> >>
> >>
> >>
> >> --
> >> *
> >> *Open Source Solutions for Text Engineering
> >>
> >> http://digitalpebble.blogspot.com/
> >> http://www.digitalpebble.com
> >>
> >
> >
>
>


-- 
*Lewis*
