You need to add the domains of those new URLs to
conf/crawl-urlfilter.txt
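
A rough way to spot which domains from urls.txt are not mentioned by any
accept rule is sketched below. It assumes the filter file has one rule per
line, with accept rules starting with "+", and that the first matching rule
wins, so a new "+" line has to sit above a catch-all "-." line. The function
name unlisted_hosts is made up for illustration, and the check is only a
literal substring match on the domain, not a real evaluation of the regex
rules, so treat its output as a hint.

```shell
#!/bin/sh
# unlisted_hosts URLFILE FILTERFILE   (hypothetical helper, not part of Nutch)
# Prints each domain from URLFILE that no accept ("+") rule in FILTERFILE mentions.
unlisted_hosts() {
    urlfile=$1
    filterfile=$2
    # Strip scheme, path and a leading "www." to get a bare domain per URL.
    sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||' -e 's|^www\.||' "$urlfile" |
    while IFS= read -r domain; do
        [ -n "$domain" ] || continue
        # Same domain with dots escaped, as it may appear inside a regex.
        escaped=$(printf '%s\n' "$domain" | sed 's|\.|\\.|g')
        # Look only at accept rules; -F makes grep match the text literally.
        if ! grep '^+' "$filterfile" | grep -q -F -e "$domain" -e "$escaped"; then
            printf '%s\n' "$domain"
        fi
    done
}
```

It could be called as "unlisted_hosts urls.txt conf/crawl-urlfilter.txt".
The file name is just a parameter, so the same check can be pointed at
another filter file, such as conf/regex-urlfilter.txt, if that is the one
your setup actually reads.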

On 7/17/06, Schackenberg, Benedikt <[EMAIL PROTECTED]> wrote:
> Hello, folks,
> I did the following steps:
>
> 1.
>  bin/nutch admin db -create
>
> 2.
>  bin/nutch inject db -urlfile urls.txt
>   --> my urls.txt contains 2 links:
>         http://www.termindoc.de
>         http://www.heise.de
> 3.
>  bin/nutch generate db segments
>
> 4.
>  s1=`ls -d segments/2* | tail -1`
>
>  bin/nutch fetch $s1
>
> 5.
>  bin/nutch updatedb db $s1
>
> 6.
>
>  bin/nutch generate db segments
>
>  s2=`ls -d segments/2* | tail -1`
>
>  bin/nutch fetch $s2
>
>  bin/nutch updatedb db $s2
>
>
> 7.
>  bin/nutch generate db segments
>
>  s3=`ls -d segments/2* | tail -1`
>
>  bin/nutch fetch $s3
>
>  bin/nutch updatedb db $s3
>
> 8.
>  bin/nutch index $s1
>
>  bin/nutch index $s2
>
>  bin/nutch index $s3
>
>
> ************************************
>
> My problem: the first run works fine, and the sites termindoc.de and
> heise.de are crawled.
>
> But when I add new websites/links to urls.txt, these new domains are
> not crawled. What am I doing wrong?
>
> Thanks,
> Benedikt Schackenberg
>
>
> --
> S&P data GmbH
> T 06131 218111
> F 06131 218112
> E [EMAIL PROTECTED]
> W www.termindoc.de
>
>
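
Steps 3 to 8 in the quoted message repeat the same generate/fetch/updatedb
cycle, so they could be wrapped in a small loop. This is only a sketch built
from the commands quoted above: the function name crawl_rounds and the NUTCH
variable are made up for illustration, and it assumes segment paths contain
no whitespace.

```shell
#!/bin/sh
# Command used to invoke Nutch; override NUTCH to point elsewhere.
NUTCH=${NUTCH:-bin/nutch}

# crawl_rounds DB SEGMENTS_DIR ROUNDS   (hypothetical helper)
# Runs generate/fetch/updatedb ROUNDS times, then indexes each new segment.
crawl_rounds() {
    db=$1
    segdir=$2
    rounds=$3
    fetched=""
    i=0
    while [ "$i" -lt "$rounds" ]; do
        "$NUTCH" generate "$db" "$segdir" || return 1
        # Pick the newest segment, same trick as in the quoted steps.
        seg=$(ls -d "$segdir"/2* | tail -1)
        "$NUTCH" fetch "$seg" || return 1
        "$NUTCH" updatedb "$db" "$seg" || return 1
        # Remember the segment so it can be indexed afterwards.
        fetched="$fetched $seg"
        i=$((i + 1))
    done
    for seg in $fetched; do
        "$NUTCH" index "$seg" || return 1
    done
}
```

With the layout from the quoted message, this would be called as
"crawl_rounds db segments 3". Note that URLs added to urls.txt later still
have to be injected again (step 2) before a generate round can pick them up.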


-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi
M.Tech. Computer Tech. Class of 2007,
IIT Delhi


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
