Not the crawldb, and surely not entire files, but information about the
indexes. If you modify directory information while a process still has
files open (e.g. by renaming the directory that contains them and creating
a new directory with the old name), the process keeps accessing the
original files on disk until it closes and reopens them (hence my question
about mergesegs and mergedb).
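
The behaviour described above can be reproduced with a few lines of shell. This is only a sketch: the directory and file names are made up for the demonstration and have nothing to do with the actual Nutch layout, and file descriptor 3 stands in for whatever the webapp keeps open.

```shell
#!/bin/sh
# Demonstration only: all paths below are throwaway names, not Nutch paths.
work=$(mktemp -d)
mkdir "$work/index"
echo "old data" > "$work/index/part-0"

# Open the file on descriptor 3, as a long-running process would.
exec 3< "$work/index/part-0"

# Rename the directory, then create a new one under the old name.
mv "$work/index" "$work/index.old"
mkdir "$work/index"
echo "new data" > "$work/index/part-0"

# The already-open descriptor still refers to the original file.
read -r via_open_fd <&3
# A fresh open by path sees the replacement.
read -r via_path < "$work/index/part-0"
exec 3<&-

echo "open descriptor: $via_open_fd"
echo "reopened path:   $via_path"
rm -rf "$work"
```

The open descriptor keeps returning the old contents, while a fresh open of the same path returns the new ones. The same holds if the old file is deleted outright: its data remains reachable until the last descriptor on it is closed.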
----- Original Message -----
From: "Manoharam Reddy" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, May 28, 2007 1:53 PM
Subject: Re: Deleting crawl still gives proper results
The webapp caches the whole crawldb? Can anyone please tell me where
it caches the whole crawldb? I don't think it is possible to cache
it in RAM. Is it cached in some location on the hard disk?
Please clarify this point.
On 5/27/07, Enzo Michelangeli <[EMAIL PROTECTED]> wrote:
----- Original Message -----
From: "Manoharam Reddy" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Saturday, May 26, 2007 6:23 PM
> After running bin/nutch crawl to create the crawldb, I start my
> Tomcat server. It gives proper search results.
>
> What I am wondering is that even after I delete the 'crawl' folder,
> the search page still gives proper search results. How is this
> possible? Only after I restart the Tomcat server does it stop giving
> results.
The webapp seems to cache data. I have a related problem: updates to the
indexes are only noticed after restarting Tomcat (so I have scheduled a
nightly cron job to do that).
Question for the Ones Who Know: in "bin/nutch mergesegs", can I use the
same directory for input and output?
For example:
bin/nutch mergesegs crawl/segments -dir crawl/segments
Same for mergedb: can I issue:
bin/nutch mergedb crawl/crawldb crawl/crawldb
At present I pass through temporary directories, and then I switch them in
place of the old ones with a couple of "mv", but I don't know if that's
necessary, or may even be harmful (for example, leaving the webapp, unaware
of the "mv", pointing to the inode of the old directory). And I noticed that
"bin/nutch mergedb" does not create the output directory until it's done, so
I wonder if the explicit use of a temporary directory in my scripts is
redundant.
Enzo
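
The temporary-directory approach described in the message above (merge into a scratch directory, then switch it in with a couple of "mv") might look roughly like the sketch below. The helper name swap_dir and the ".tmp" / ".old" suffixes are hypothetical. The merge step itself is shown only as a comment, assuming, as in the commands quoted above, that the first argument of mergedb is the output directory. Throwaway directories stand in for crawl/crawldb so the swap can be tried without Nutch.

```shell
#!/bin/sh
# Hypothetical helper: replace the live directory ($2) with a freshly
# built one ($1) using two renames, then discard the old copy.
swap_dir() {
    new=$1
    live=$2
    old="$live.old.$$"
    # Between the two renames there is a brief window in which "$live"
    # does not exist; both paths should be on the same filesystem.
    mv "$live" "$old" &&
    mv "$new" "$live" &&
    rm -rf "$old"
}

# In real use, something like this would come first (argument order assumed):
#   bin/nutch mergedb crawl/crawldb.tmp crawl/crawldb
#   swap_dir crawl/crawldb.tmp crawl/crawldb

# Demo with throwaway directories standing in for the crawldb.
demo=$(mktemp -d)
mkdir "$demo/crawldb" "$demo/crawldb.tmp"
echo "before merge" > "$demo/crawldb/data"
echo "after merge" > "$demo/crawldb.tmp/data"

swap_dir "$demo/crawldb.tmp" "$demo/crawldb"

swapped=$(cat "$demo/crawldb/data")
if [ -e "$demo/crawldb.tmp" ]; then leftover=yes; else leftover=no; fi
echo "live copy now contains: $swapped"
rm -rf "$demo"
```

A process that already has files open under the old directory is not disturbed by the swap, but it also does not see the new data until it closes and reopens those files, which matches the need to restart Tomcat mentioned earlier in the thread.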