Hi there!
After a crawl/index cycle, a segment directory is created, which usually
contains content, index, and other subdirectories.
Here is what my current segment directory contains after a crawl/index
build of 2 million URLs:
/segments/20070114151631> du -sh *
9.6G content
212M fetcher
5.0G index
0 index.done
5.8G parse_data
3.7G parse_text
The segment directory is copied to a searcher. As you can see, the
content directory is huge: at 9.6G it is the largest subdirectory.
My question is: if I simply remove this directory, would that affect
searching, or later recrawling and reindexing?
Since the content directory is so big, is there a way to avoid copying
it to the searcher?
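
To make the question concrete, the kind of copy I have in mind could be
sketched as below. This is only an illustration in Python: copy_segment
and its skip parameter are hypothetical names, not part of Nutch, and
whether the searcher still works without content is exactly what I am
asking.

```python
import shutil
from pathlib import Path


def copy_segment(src, dst, skip=("content",)):
    """Copy a segment directory to dst, leaving out the top-level
    subdirectories named in skip (hypothetical helper, not a Nutch API)."""
    src = Path(src)

    def ignore(directory, names):
        # Filter only at the top level of the segment, so a nested entry
        # that happens to share a skipped name is still copied.
        if Path(directory) == src:
            return [name for name in names if name in skip]
        return []

    # dst must not exist yet; copytree creates it.
    shutil.copytree(src, dst, ignore=ignore)
    return Path(dst)
```

For example, copy_segment("/segments/20070114151631", "/searcher/20070114151631")
would copy fetcher, index, index.done, parse_data and parse_text, but not
content. The destination path here is made up for the example.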
Thanks,
Ledio
Ledio Ago * Sr. Software Engineer * [EMAIL PROTECTED]
w: 415-348-7693 * f: 415-348-7032
LookSmart - Where To Look For What You Need. - Find. Save. Share.
625 Second Street, San Francisco, CA 94107
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general