On 12 July 2011 10:30, Julien Nioche <[email protected]> wrote:

>>>> There seems to be no crawl-urlfilter file indeed. Don't know why it's
>>>> gone since the crawl command is still there. You can find the file in
>>>> the 1.2 release:
>>>> http://svn.apache.org/viewvc/nutch/branches/branch-1.2/conf/
>>>
>>> Crawl-urlfilter has been removed purposefully as it did not add anything
>>> to the other url filters (automaton | regex) in terms of functionality.
>>> By default the urlfilters contain (+.) which IIRC was what the
>>> Crawl-urlfilter used to do.
>>
>> That's reasonable. But now new users are unaware and don't know what to
>> do with this error message.
>
> Yep, the tutorial needs updating indeed

done

>>>>> Thanks for a quick reply.
>>>>>
>>>>> I searched in the nutch directory but still do not see that file :(.
>>>>> Here's the complete file list inside the runtime/local/conf directory.
>>>>>
>>>>> us137390:conf parampreetsethi$ pwd
>>>>> /Users/parampreetsethi/Documents/workspace/nutch/runtime/local/conf
>>>>> us137390:conf parampreetsethi$ ls -t
>>>>> automaton-urlfilter.txt  domain-urlfilter.txt  nutch-default.xml  prefix-urlfilter.txt  solrindex-mapping.xml
>>>>> configuration.xsl        httpclient-auth.xml   nutch-site.xml     regex-normalize.xml   subcollections.xml
>>>>> domain-suffixes.xml      log4j.properties      parse-plugins.dtd  regex-urlfilter.txt   suffix-urlfilter.txt
>>>>> domain-suffixes.xsd      nutch-conf.xsl        parse-plugins.xml  schema.xml            tika-mimetypes.xml
>>>>>
>>>>> By the way, I tried deploying the code by checking out from the svn
>>>>> repository, but could not build it. I was getting the following error:
>>>>>
>>>>> resolve-default:
>>>>> [ivy:resolve] :: Ivy 2.2.0 - 20100923230623 :: http://ant.apache.org/ivy/ ::
>>>>> [ivy:resolve] :: loading settings :: file = /Users/parampreetsethi/Documents/workspace/nutch/ivy/ivysettings.xml
>>>>> [ivy:resolve]
>>>>> [ivy:resolve] :: problems summary ::
>>>>> [ivy:resolve] :::: WARNINGS
>>>>> [ivy:resolve]   module not found: org.apache.gora#gora-core;0.2-incubating
>>>>> [ivy:resolve]   ==== local: tried
>>>>> [ivy:resolve]     /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/ivys/ivy.xml
>>>>> [ivy:resolve]     -- artifact org.apache.gora#gora-core;0.2-incubating!gora-core.jar:
>>>>> [ivy:resolve]     /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-core/0.2-incubating/jars/gora-core.jar
>>>>> [ivy:resolve]   module not found: org.apache.gora#gora-sql;0.2-incubating
>>>>> [ivy:resolve]   ==== local: tried
>>>>> [ivy:resolve]     /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/ivys/ivy.xml
>>>>> [ivy:resolve]     -- artifact org.apache.gora#gora-sql;0.2-incubating!gora-sql.jar:
>>>>> [ivy:resolve]     /Users/parampreetsethi/.ivy2/local/org.apache.gora/gora-sql/0.2-incubating/jars/gora-sql.jar
>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>> [ivy:resolve] ::          UNRESOLVED DEPENDENCIES         ::
>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>> [ivy:resolve] :: org.apache.gora#gora-core;0.2-incubating: not found
>>>>> [ivy:resolve] :: org.apache.gora#gora-sql;0.2-incubating: not found
>>>>> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
>>>>> [ivy:resolve]
>>>>> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>>>>>
>>>>> BUILD FAILED
>>>>> /Users/parampreetsethi/Documents/workspace/nutch/build.xml:458: impossible to resolve dependencies:
>>>>>         resolve failed - see output for details
>>>>>
>>>>> -param
>>>>>
>>>>> On 7/11/11 5:56 PM, "Jerry E. Craig, Jr." <[email protected]> wrote:
>>>>>> Look down a little further for the
>>>>>>
>>>>>> or
>>>>>> runtime/local/bin/nutch (version >= 1.3)
>>>>>>
>>>>>> If you download the bin then it's in the runtime directory.
>>>>>>
>>>>>> Jerry E. Craig, Jr.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Sethi, Parampreet [mailto:[email protected]]
>>>>>> Sent: Monday, July 11, 2011 2:51 PM
>>>>>> To: [email protected]
>>>>>> Subject: Nutch Novice help
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Sorry for such a naïve question, I downloaded the nutch 1.3 binary
>>>>>> today and am trying to set it up as mentioned in the Tutorial at
>>>>>> http://wiki.apache.org/nutch/NutchTutorial
>>>>>>
>>>>>> However I am not able to find crawl-urlfilter.txt inside the conf
>>>>>> directory. Is there any other place where I should look for this file?
>>>>>>
>>>>>> Thanks
>>>>>> Param

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
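For anyone landing on this thread with the same question: the point made above is that the regex url filter (conf/regex-urlfilter.txt) can carry whatever rule used to go into crawl-urlfilter.txt, and that the default catch-all is "+.". The snippet below is only a rough Python sketch of how such rules are commonly understood to behave, not Nutch's actual code. It assumes that '+' accepts, '-' rejects, the first matching rule decides, and a URL that matches no rule is dropped. MY.DOMAIN.NAME is a placeholder, and parse_rules / accepts are hypothetical helper names used only for illustration.

```python
import re

# Rules written in the style of conf/regex-urlfilter.txt.
# MY.DOMAIN.NAME is a placeholder host, not a real one.
DOMAIN_RULES = r"""
# skip file:, ftp: and mailto: urls
-^(file|ftp|mailto):
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"""

# The catch-all mentioned in the thread: accept anything.
DEFAULT_RULES = "+."


def parse_rules(text):
    """Turn rule text into a list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose regex is found in the URL decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: no matching rule means the URL is dropped


if __name__ == "__main__":
    domain = parse_rules(DOMAIN_RULES)
    default = parse_rules(DEFAULT_RULES)
    for url in ("http://www.MY.DOMAIN.NAME/index.html",
                "http://example.org/",
                "ftp://MY.DOMAIN.NAME/file"):
        print(url, accepts(domain, url), accepts(default, url))
```

Under these assumptions, restricting a crawl to one domain means replacing the "+." line in regex-urlfilter.txt with a domain rule like the one shown, which is roughly what the older tutorial asked users to do in crawl-urlfilter.txt.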

