I want to use Nutch to search a single (!) internet site (example: www.mysite.de).

I am quite new to Nutch and have just tried it out. In the tutorial I found
the intranet crawl chapter.

I think that is what I need. I followed the example, everything works fine,
and I can search my site.

My questions:

- How do I update/refresh the index? The intranet crawl chapter has no
explanation or example of this!
- What is the refresh period of the index? And how can I change it?
- Which meta tags does Nutch use to decide whether a page is new or modified?
Or is the entire site recrawled with every update?
- I need to refresh / update the index daily. Is that possible? Users make
content updates every day, which I must
- If I deploy the Nutch WAR on an application server, can I update/refresh
the index from a servlet instead of using a shell script? We are using a
Windows box and I don't want to install Cygwin.


Can someone send me a step-by-step explanation or a script that crawls one
site and periodically refreshes / updates its index?

Is there a German speaker out there who can guide me? My English is not as
good as it should be, as you can see.




