You need to add the domains of those new URLs to conf/crawl-urlfilter.txt.
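As a rough sketch, the filter entries could look like the fragment below. It assumes the stock `+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/` pattern that ships in the file, and `example.org` is only a placeholder for whichever domain you added to urls.txt. Rules are read top to bottom and the first match decides, so the new `+` lines have to come before the final `-.` line.

```
# conf/crawl-urlfilter.txt (sketch; example.org is a hypothetical new domain)

# hosts already present in urls.txt
+^http://([a-z0-9]*\.)*termindoc.de/
+^http://([a-z0-9]*\.)*heise.de/

# newly added domain (placeholder, replace with your own)
+^http://([a-z0-9]*\.)*example.org/

# skip everything else
-.
```

Depending on your setup, the step-by-step commands (inject, generate, fetch) may read conf/regex-urlfilter.txt rather than conf/crawl-urlfilter.txt, so it is worth checking both files for a rule that rejects the new hosts.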

On 7/17/06, Schackenberg, Benedikt <[EMAIL PROTECTED]> wrote:
> Hello folks,
>
> I did the following steps:
>
> 1.
> bin/nutch admin db -create
>
> 2.
> bin/nutch inject db -urlfile urls.txt
> (my urls.txt contains 2 links: http://www.termindoc.de and
> http://www.heise.de)
>
> 3.
> bin/nutch generate db segments
>
> 4.
> s1=`ls -d segments/2* | tail -1`
> bin/nutch fetch $s1
>
> 5.
> bin/nutch updatedb db $s1
>
> 6.
> bin/nutch generate db segments
> s2=`ls -d segments/2* | tail -1`
> bin/nutch fetch $s2
> bin/nutch updatedb db $s2
>
> 7.
> bin/nutch generate db segments
> s3=`ls -d segments/2* | tail -1`
> bin/nutch fetch $s3
> bin/nutch updatedb db $s3
>
> 8.
> bin/nutch index $s1
> bin/nutch index $s2
> bin/nutch index $s3
>
> My problem: the first run works fine, and the sites termindoc.de and
> heise.de are crawled. But when I add new websites/links to urls.txt,
> these new domains are not crawled. What am I doing wrong?
>
> Thanks,
> Benedikt Schackenberg
>
> --
> S&P data GmbH
> W www.termindoc.de

--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech.
Class of 2007, IIT Delhi

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
