Unfortunately, you can't manage disk space usage via configuration parameters, so it is not easy. Keep an eye on services, processes, RAM and swap during the merge (disk swapping happens when there is not enough RAM), and browse the files and folders, clicking the 'refresh' button, to get an idea of what is growing. It is strange that 50G was not enough to merge 2G; maybe the problem is somewhere else (OS X specifics, for instance). Try running Nutch with smaller segment sizes and study its behaviour on your OS.

-Fuad
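
For watching free disk space while the merge runs, a small polling script can stand in for clicking 'refresh' by hand. This is only a sketch and not part of Nutch or Hadoop: the names `free_gb` and `watch`, the watched path, the 5 GiB threshold and the 10 second interval are all assumptions chosen for illustration. Swap on OS X can be checked separately from a terminal with `sysctl vm.swapusage`.

```python
import shutil
import time


def free_gb(path="/"):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def watch(path="/", min_free_gb=5.0, interval_s=10.0, max_checks=None):
    """Poll free space and return the first reading below `min_free_gb`.

    Returns None if `max_checks` readings all stayed at or above the
    threshold. With max_checks=None the loop runs until space gets low.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        free = free_gb(path)
        print(f"{time.strftime('%H:%M:%S')} free on {path}: {free:.2f} GiB")
        if free < min_free_gb:
            return free
        checks += 1
        time.sleep(interval_s)
    return None
```

Calling `watch("/")` in a second terminal before starting the merge would print one reading every 10 seconds and return as soon as free space falls under 5 GiB, which leaves time to stop the merge before the disk fills completely.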

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: August-26-09 6:41 PM
To: [email protected]
Subject: Re: content of hadoop-site.xml

Thanks for the response. How can I check disk swap? The 50GB was the free space before running the merge command. When it crashed, the available space was 1 KB. RAM in my MacPro is 2GB. I deleted the tmp folders created by hadoop during the merge, and after that OS X does not start. I plan to run the merge again and need to reduce the disk space it uses. I have read on the net that to reduce space we must use hadoop-site.xml. But there is no hadoop-default.xml file, and the hadoop-site.xml file is empty.

Thanks.
Alex.

-----Original Message-----
From: Fuad Efendi <[email protected]>
To: [email protected]
Sent: Wed, Aug 26, 2009 3:28 pm
Subject: RE: content of hadoop-site.xml

You can override the default settings (nutch-default.xml) in nutch-site.xml, but it won't help with disk space; an empty file is OK. "merge" may generate temporary files, but 50GB against 2GB looks extremely strange. Try emptying the recycle bin, for instance, and check disk swap. The OS may report 50G available while you are in fact out of space, for instance because of heavy disk swapping during the merge due to low RAM.

-Fuad
http://www.linkedin.com/in/liferay
http://www.tokenizer.org

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: August-26-09 5:33 PM
To: [email protected]
Subject: content of hadoop-site.xml

Hello,

I have run the merge script to merge two crawl dirs, one 1.6G and the other 120MB. But my MacPro, which had 50G of free space, did not start after the merge crashed with a no-space error. I have been told that OS X got corrupted. I looked inside my nutch-1.0/conf/hadoop-site.xml file and it is empty. Can anyone let me know what must be put inside this file so that the merge does not take too much space?

Thanks in advance.
Alex.
