>
> There does indeed seem to be no crawl-urlfilter file. I don't know why it's
> gone, since the crawl command is still there. You can find the file in the
> 1.2 release:
> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>

Crawl-urlfilter was removed on purpose, as it added no functionality beyond
what the other URL filters (automaton | regex) already provide. By default the
urlfilters contain (+.), which IIRC is what crawl-urlfilter used to do.
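
To illustrate what a trailing (+.) rule means, here is a rough Python sketch of
how such a rule file can be evaluated. It is not Nutch's Java code. It assumes
the behaviour described in the comments of regex-urlfilter.txt: each rule is a
'+' or '-' followed by a regex, the first rule whose regex is found in the URL
decides, and a URL that matches no rule is dropped. The function names and the
example.com domain are placeholders made up for this example.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule wins; a URL that matches nothing is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# The catch-all rule mentioned above: any non-empty URL is accepted.
CATCH_ALL = parse_rules("+.")

# Hypothetical restricted rule set; example.com is only a placeholder.
RESTRICTED = parse_rules(r"""
# skip a few binary suffixes
-\.(gif|jpg|png|zip)$
# stay inside one domain
+^http://([a-z0-9]*\.)*example\.com/
""")
```

With CATCH_ALL every URL passes. With RESTRICTED, only non-image pages under
example.com pass, because nothing else matches a '+' rule.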



>
> > Thanks for a quick reply.
> >
> > I searched in the nutch directory but still do not see that file :(.
> > Here's the complete file list inside the runtime/local/conf directory.
> >
> > us137390:conf parampreetsethi$ pwd
> > /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
> > us137390:conf parampreetsethi$ ls -t
> > automaton-urlfilter.txt  domain-urlfilter.txt  nutch-default.xml  prefix-urlfilter.txt  solrindex-mapping.xml
> > configuration.xsl        httpclient-auth.xml   nutch-site.xml     regex-normalize.xml   subcollections.xml
> > domain-suffixes.xml      log4j.properties      parse-plugins.dtd  regex-urlfilter.txt   suffix-urlfilter.txt
> > domain-suffixes.xsd      nutch-conf.xsl        parse-plugins.xml  schema.xml            tika-mimetypes.xml
> >
> > By the way, I tried deploying the code by checking it out from the svn
> > repository, but could not build it. I was getting the following error:
> >
> > resolve-default:
> > [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
> > [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
> > [ivy:resolve]
> > [ivy:resolve] :: problems summary ::
> > [ivy:resolve] :::: WARNINGS
> > [ivy:resolve]         module not found: org.apache.gora#gora-core;0.2-incubating
> > [ivy:resolve]     ==== local: tried
> > [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
> > [ivy:resolve]       -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
> > [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
> > [ivy:resolve]         module not found: org.apache.gora#gora-sql;0.2-incubating
> > [ivy:resolve]     ==== local: tried
> > [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
> > [ivy:resolve]       -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
> > [ivy:resolve]       /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
> > [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> > [ivy:resolve]         ::          UNRESOLVED DEPENDENCIES         ::
> > [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> > [ivy:resolve]         :: org.apache.gora#gora-core;0.2-incubating: not found
> > [ivy:resolve]         :: org.apache.gora#gora-sql;0.2-incubating: not found
> > [ivy:resolve]         ::::::::::::::::::::::::::::::::::::::::::::::
> > [ivy:resolve]
> > [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> >
> > BUILD FAILED
> > /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
> >     resolve failed - see output for details
> >
> >
> > -param
> >
> > On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
> > > Look down a little further for the
> > >
> > > or
> > > runtime/local/bin/nutch (version >= 1.3)
> > >
> > > If you download the bin then it's in the runtime directory.
> > >
> > > Jerry E. Craig, Jr.
> > >
> > > -----Original Message-----
> > > From: Sethi, Parampreet [mailto:[email protected]]
> > > Sent: Monday, July 11, 2011 2:51 PM
> > > To: [email protected]
> > > Subject: Nutch Novice help
> > >
> > > Hi All,
> > >
> > > Sorry for such a naïve question. I downloaded the Nutch 1.3 binary
> > > today and am trying to set it up as described in the tutorial at
> > > http://wiki.apache.org/nutch/NutchTutorial
> > >
> > > However, I am not able to find crawl-urlfilter.txt inside the conf
> > > directory. Is there any other place where I should look for this file?
> > >
> > > Thanks
> > > Param
>




-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
