[Nutch-general] RE: Simple indexation and reindexation

Vanderdray, Jacob Thu, 23 Feb 2006 06:52:40 -0800

        If you look at the section of the tutorial on intranet
crawling, you should be able to use that for your small number of
websites.  The "bin/nutch crawl" command wraps up all the crawl steps for
you (fetching, indexing, deduping, etc.).  Each night you'll just need to
delete the results of the previous day's crawl, copy over the results of
the new crawl, and restart Tomcat.

Jake.

-----Original Message-----
From: Sugra Llistaire [mailto:[EMAIL PROTECTED] 
Sent: Thursday, February 23, 2006 4:55 AM
To: [email protected]
Subject: Simple indexation and reindexation

Hello,
I have a small number of websites to be indexed. Previously, my search
engine was udmGoSearch, but I'm glad to see there is this J2EE search
engine.
However, I'm trying to emulate the udm search process with Nutch, and it
doesn't seem to be possible.

The system was simple.
On the first day, I indexed the website.
Every night, I ran a script to reindex the website.
I didn't have to think about fetching, deduplicating or injecting. All of
this was handled by udm's script.
Of course, reconfiguring URL filters and all that stuff is unavoidable.
Is it possible to use Nutch with a process this simple? Has anyone
implemented scripts that do all this work: an initial indexing script
and a nightly reindexing script?

Thanks in advance.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
