The index folder is created when you merge indexes. It is not required, but it usually improves performance. Crawl is probably merging the indexes automagically while the manual process won't.
During crawl/segment creation and indexing, tons of files get created, and the optimize process goes through and cleans this up a bit, so that is most likely where the drop in size comes from.

-byron

-----Original Message-----
From: Ian Reardon <[EMAIL PROTECTED]>
To: [email protected]
Date: Fri, 20 May 2005 08:44:45 -0400
Subject: Crawler/Fetcher Questions

> I've noticed a few things that I'm puzzled about with nutch.
>
> When I just do a "nutch crawl" and give it a directory it creates 3
> folders off the root "db", "index" and "segments".
>
> On the other hand if I just create a root directory by hand.
>
> -Make 2 folders inside "segments" and "db"
> -Create an empty web db
> -Copy my segments from an existing crawl into the new segments folder
> -Run updatedb
> -Run index on those newly copied segments
> (i've been using this method to combine multiple crawls of single
> sites into 1 repository)
>
> it seems to work fine but I do not have an "index" folder like it
> makes when you just do "nutch crawl". What is the index folder? Is
> it ok that I don't have it, everything appears to be working.
>
>
> 2nd question which is not as important.
>
> I've been tracking the size of the folders containing the crawls I'm
> doing. It seems like they go up to say 20 megs, then it will go down
> to 2 megs and slowly go up again. Where is this drastic reduction
> coming from? I just hope I am not losing documents.
>
> Thanks in advance.
>
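
If you want to watch both things discussed in this thread (which of the three folders exist, and how their size changes over time), a small script can report it. This is only a rough sketch and not part of nutch: the function names `tree_stats` and `crawl_report` are made up for illustration, and the only thing taken from the thread is the three folder names "db", "index" and "segments".

```python
import os

# Folder names mentioned in the thread; everything else here is illustrative.
EXPECTED = ("db", "index", "segments")


def tree_stats(path):
    """Return (file_count, total_bytes) for all regular files under path."""
    count = 0
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                count += 1
                total += os.path.getsize(full)
    return count, total


def crawl_report(crawl_dir):
    """Map each expected folder name to its stats, or None if it is missing."""
    report = {}
    for name in EXPECTED:
        sub = os.path.join(crawl_dir, name)
        report[name] = tree_stats(sub) if os.path.isdir(sub) else None
    return report
```

Calling `crawl_report` on the crawl root at intervals should show the file count falling along with the byte total if the shrinkage is only temporary files being cleaned up; it says nothing about how many documents are indexed, so it cannot by itself rule out lost documents.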
