OK, the work goes on... I found answers to some of my questions myself.

I found the prune command for deleting pages, but the inject command does not 
seem to work...

I use the script from 
http://www.mail-archive.com/nutch-user%40lucene.apache.org/msg03638.html and 
added the prune command before the merge command. This should work, but when I 
use the inject command, the site is not found in the index after I run the 
script.

Then I have a problem understanding Nutch: with the script above I use the 
whole-web search, so when does Nutch recrawl a site? Is this the adddays option? 
Or how can I control when a site will be recrawled? ==> When I recrawl a site 
and a link or URL is broken, will this site or link be deleted from the index, 
or marked as broken?
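For the recrawl question, a rough model of the timing may help. This is a minimal sketch under stated assumptions, not taken from the Nutch sources: a page is assumed to become due for refetch once its last fetch time plus a fetch interval has passed (the interval is assumed here to be 30 days, the value commonly set by the db.default.fetch.interval property), and the adddays option is assumed to simply move "now" forward by that many days. The function name is_due is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed default refetch interval (db.default.fetch.interval, in days).
DEFAULT_FETCH_INTERVAL = timedelta(days=30)


def is_due(last_fetch, now, adddays=0, interval=DEFAULT_FETCH_INTERVAL):
    """Hypothetical helper: True if a page fetched at last_fetch would be
    selected for refetch. adddays shifts "now" forward, so pages become
    due earlier than the interval alone would allow."""
    effective_now = now + timedelta(days=adddays)
    return last_fetch + interval <= effective_now


if __name__ == "__main__":
    now = datetime(2006, 3, 18)
    fetched = datetime(2006, 3, 1)  # last fetched 17 days ago
    print(is_due(fetched, now))              # False: 17 days < 30 days
    print(is_due(fetched, now, adddays=13))  # True: 17 + 13 days >= 30 days
```

In this model, the refetch schedule is governed by the interval, and adddays is only a way to pull the next recrawl forward.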

Thanks for your help, and excuse my bumpy English. ;)
Christian


-----Original Message-----
From: [email protected]
Sent: 18.03.06 13:46:48
To: [email protected]
Subject: newbie question - urlfilter and crawling


Hi,

I tried to install Nutch for the first time, and after a long night my Nutch 
test installation is working fine... thanks for all that work (software, 
tutorials...)!!

Anyway, I have some questions:

1.) I have built an index from a few (3) start URLs. How can I add other pages 
to the index without deleting the whole directory and starting again from the 
beginning?
2.) I saw that many sites I don't want are being indexed. How can I include a 
blacklist like:
http://wiki.nebel.de/snipsnap/space/comment-Nutch/Blacklist-1
http://wiki.nebel.de/snipsnap/space/Nutch/Blacklist
I searched the whole night but couldn't find a tutorial about this, only tips 
for plugins I don't understand (I think it's due to my bumpy English...).
3.) How can I "recrawl" the index against this blacklist to remove unwanted 
webpages?
4.) Is it right that I have to restart Tomcat after I merge the new pages? I 
can't find any new pages in the index after I run the script I found in the 
mailing list archive...
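For questions 2 and 3, here is a minimal Python sketch of how a regex-based URL blacklist could behave. It assumes rules in the style of Nutch's regex URL filter files (conf/regex-urlfilter.txt, conf/crawl-urlfilter.txt): one rule per line, "+" to accept or "-" to reject followed by a regular expression, the first matching rule decides, and a URL matching no rule is rejected. The host names and the functions parse_rules and accept are hypothetical; entries from a blacklist such as the wiki pages above would become "-" lines placed before the final "+" rule.

```python
import re

# Hypothetical rule set in the assumed "+regex" / "-regex" format.
# Blacklisted hosts come first, because the first matching rule wins.
RULES = r"""
# skip blacklisted hosts (hypothetical examples)
-^http://([a-z0-9-]+\.)*spam-example\.com/
-^http://([a-z0-9-]+\.)*ads-example\.net/
# accept everything else over http
+^http://
"""


def parse_rules(text):
    """Turn rule lines into (allowed, compiled_regex) pairs, skipping
    blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"bad rule: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """The first rule whose regex matches decides; no match means reject."""
    for allowed, regex in rules:
        if regex.search(url):
            return allowed
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES)
    indexed = ["http://www.spam-example.com/page", "http://lucene.apache.org/nutch/"]
    # Re-check URLs that are already in the index against the blacklist.
    for url in indexed:
        print(url, "keep" if accept(url, rules) else "remove")
```

The same accept() check could be run over URLs already in the index to list candidates for removal; how they are then actually deleted (for example with the prune command mentioned above) is not shown here.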

So, these were my first questions. It would be very nice if you could help me 
or give me some links to read. ;o)
Best regards,
Christian
______________________________________________________________
Send romantic, cool and funny pictures via SMS!
Now with WEB.DE FreeMail: http://f.web.de/?mc=021193



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
