Hi,
As Michael Ji suggested, you can crawl into a separate directory and
merge the result into the original one. The "mergesegs" command (which
actually uses the net.nutch.tools.SegmentMergeTool class) merges the
original and new segments and creates a new index. I have written a
script to automate this, and it seems to work.
To avoid a Tomcat restart, you can change search.jsp to check the
timestamp of the index directory each time it is called. If the
timestamp has changed, remove the "nutchBean" attribute from the
application (servlet context) object so that the NutchBean class loads
the new index on the next request.
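
A rough sketch of the timestamp check, in case it helps. The class name
IndexChangeDetector and its hasChanged() method are made up for this
example and are not part of Nutch; only the "nutchBean" attribute name
comes from the setup described above. It assumes that the merge step adds
or removes files directly under the index directory, since that is what
updates a directory's modification time on most file systems.

```java
import java.io.File;

/** Hypothetical helper (not part of Nutch): remembers the last-modified
 *  time of the index directory and reports when it has moved. */
class IndexChangeDetector {
    private final File indexDir;
    private long lastSeen;

    IndexChangeDetector(File indexDir) {
        this.indexDir = indexDir;
        // lastModified() returns 0L if the directory does not exist.
        this.lastSeen = indexDir.lastModified();
    }

    /** Returns true once for each observed change of the timestamp. */
    synchronized boolean hasChanged() {
        long current = indexDir.lastModified();
        if (current != lastSeen) {
            lastSeen = current;
            return true;
        }
        return false;
    }
}

// Possible use near the top of search.jsp (sketch only; "detector" is
// assumed to be stored once in the servlet context):
//
//   if (detector.hasChanged()) {
//       application.removeAttribute("nutchBean");
//   }
```

The detector is synchronized because several requests can reach the JSP
at the same time; only the first one after a change drops the attribute.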

Hope it helps.

Kannan
On Fri, 2005-08-26 at 11:50 -0400, blackwater dev wrote:
> I'll give it a shot, how do I merge?  I guess I should actually look
> it up in the manual.
> 
> On 8/26/05, Michael Ji <[EMAIL PROTECTED]> wrote:
> > How about crawling to a new segment directory and
> > merge to the original one---where tomcat lives in?
> > 
> > But, I guess, in order to reflect your new crawled
> > data, you need at least stop and restart tomcat. ( ? I
> > did such testing, not sure if it is 100% correct, I
> > didn't see a formal document talking about data
> > updating in lucene indexing for nutch segment merging)
> > 
> > Michael Ji
> > 
> > --- blackwater dev <[EMAIL PROTECTED]> wrote:
> > 
> > > I have completed a crawl and have my crawl
> > > directory.  I now want to
> > > set up a cron job to run nightly to keep updating
> > > this directory.  How
> > > do I do this so I don't have to create a new
> > > directory each time (It
> > > dies if the directory exists) and so I can keep
> > > tomcat running without
> > > restarts into the new directory?
> > >
> > > Thanks!
> > >

