The web database will eventually remove URLs that cannot be fetched, and it also removes pages that are not linked to by any other page.
Doug
Matthias Jaekle wrote:
Hi,
Analyzing the webdb seems to require a lot of free disk space on my system: it uses six times as much disk space as the webdb itself occupies.
I am running just a small Nutch system with an 80 GB hard disk. It holds around 25 GB of segments, a 3 GB index and a 6 GB webdb. Together with the OS and the 30 GB I have to keep free for analyzing the webdb, the disk is full.
Is there any way to reduce the amount of space I have to keep free, or am I doing something wrong?
Does the webdb keep growing forever, or is it useful and possible to delete unimportant URLs?
Many thanks for your answers.
Matthias Jaekle
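The figures above are consistent with each other. If analysis peaks at about six times the 6 GB webdb, counting the webdb itself, it needs 36 GB: the 6 GB already on disk plus 30 GB of free space. The following minimal Python sketch computes that disk budget. It assumes the factor of six applies to the full webdb size. `disk_budget_gb` and `analyze_factor` are hypothetical names and are not part of Nutch.

```python
# Hypothetical helper for illustration only; not part of Nutch.
def disk_budget_gb(disk, segments, index, webdb, analyze_factor=6):
    """Return (free space to reserve for analysis, space left for the OS).

    Assumes analysis peaks at `analyze_factor` times the webdb size,
    counting the webdb itself, so the extra free space needed is
    (analyze_factor - 1) * webdb.
    """
    reserve = (analyze_factor - 1) * webdb
    remaining = disk - segments - index - webdb - reserve
    return reserve, remaining


reserve, remaining = disk_budget_gb(disk=80, segments=25, index=3, webdb=6)
print(f"reserve {reserve} GB for analysis, {remaining} GB left for the OS")
# prints: reserve 30 GB for analysis, 16 GB left for the OS
```

Under this assumption, the free space needed grows linearly with the webdb. Shrinking the webdb by removing unfetchable or unlinked URLs therefore reduces both the webdb itself and the reserve five times over.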
-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE.
http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
