Thanks for the help, Jacob.
It is a pity that the results of the previous crawl must be removed, especially because restarting the container (JBoss, in my case) is a problem. Is this behavior inherited from Lucene, or might it be improved in the future?

Thanks again.

Vanderdray, Jacob wrote:

If you look at the intranet crawl section of the tutorial, you should
be able to use it for your small number of websites.  The bin/nutch
script wraps up all the crawl functions for you (fetching, indexing,
deduping, etc.).  You'll just need to delete the results of the previous
day's crawl, copy over the results of the new crawl, and restart Tomcat
each night.
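
A minimal sketch of that nightly routine, assuming a Python wrapper run from cron. The directory names (`crawl`, `crawl.new`), the crawl depth, and the restart command are hypothetical placeholders, and the `bin/nutch crawl` arguments follow the form used in the intranet crawl tutorial, so check them against your Nutch version before relying on this.

```python
import shutil
import subprocess
from pathlib import Path


def swap_crawl_dirs(live: Path, fresh: Path) -> None:
    """Delete the previous crawl results and move the new ones into place."""
    if not fresh.is_dir():
        raise FileNotFoundError(f"new crawl directory missing: {fresh}")
    if live.exists():
        shutil.rmtree(live)              # delete the previous day's results
    shutil.move(str(fresh), str(live))   # copy over the new results


def nightly_recrawl(crawl_cmd, restart_cmd, live: Path, fresh: Path) -> None:
    """Crawl into a scratch directory, swap it in, then restart the container."""
    if fresh.exists():
        shutil.rmtree(fresh)             # leftovers from a failed earlier run
    # check=True aborts before the swap if the crawl fails, so the old
    # index stays in place until a complete new one exists.
    subprocess.run(crawl_cmd, check=True)
    swap_crawl_dirs(live, fresh)
    subprocess.run(restart_cmd, check=True)


def main() -> None:
    live = Path("crawl")        # hypothetical: directory the web app searches
    fresh = Path("crawl.new")   # hypothetical: scratch directory for tonight's crawl
    nightly_recrawl(
        # Assumed invocation, modeled on the intranet crawl tutorial.
        crawl_cmd=["bin/nutch", "crawl", "urls", "-dir", str(fresh), "-depth", "3"],
        # Hypothetical placeholder: substitute your Tomcat or JBoss restart command.
        restart_cmd=["/path/to/restart-container.sh"],
        live=live,
        fresh=fresh,
    )
```

Calling `main()` from a nightly cron job reproduces the three steps: crawl, swap, restart. Crawling into a separate directory first means the live results are only deleted once the new crawl has finished successfully.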

Jake.

-----Original Message-----
From: Sugra Llistaire [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 23, 2006 4:55 AM
To: [email protected]
Subject: Simple indexation and reindexation


Hello,
I have a small number of websites to index. Until now, my search engine was udmGoSearch. I'm glad to see that there is a J2EE search engine, but I'm trying to reproduce the udm search process with Nutch and it doesn't seem to be possible.

The system was simple.
On the first day, I indexed the website.
Every night after that, I ran a script to reindex it.
I didn't have to think about fetching, deduplicating, or injecting. All of that was handled by udm's script.
Of course, reconfiguring the URL filters and similar settings is unavoidable.
Is it possible to use Nutch with a process this simple? Has anyone written scripts that do the whole job: one for the initial indexing and one for the nightly reindexing?

Thanks in advance.


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
