RE: How much RAMs needed...

2007-07-17 Thread Dhruba Borthakur
Thanks, Dhruba -----Original Message----- From: Nguyen Kien Trung [mailto:[EMAIL PROTECTED] Sent: Monday, July 16, 2007 8:13 PM To: [email protected] Subject: Re: How much RAMs needed... Thanks Peter and Ted, your explanations do make some sense to me. The out of memory error is

Re: How much RAMs needed...

2007-07-16 Thread Nguyen Kien Trung
Thanks Peter and Ted, your explanations do make some sense to me. The out of memory error is as follows: java.lang.OutOfMemoryError at sun.misc.Unsafe.allocateMemory(Native Method) at java.nio.DirectByteBuffer.&lt;init&gt;(DirectByteBuffer.java:99) at java.nio.ByteBuffer.allocateDirect(B
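
[Editor's note: since the failure is in ByteBuffer.allocateDirect, the JVM is running out of room for direct buffers as well as heap. A stopgap (the real fix, per the rest of the thread, is fewer and larger files) is to raise the daemon memory limits in conf/hadoop-env.sh. The variable names below are from the stock hadoop-env.sh; the values are purely illustrative and not from this thread:

    # conf/hadoop-env.sh -- illustrative values
    export HADOOP_HEAPSIZE=2000                                     # daemon heap in MB (default is 1000)
    export HADOOP_OPTS="$HADOOP_OPTS -XX:MaxDirectMemorySize=512m"  # raise the direct-buffer ceiling

Note that HADOOP_HEAPSIZE applies to every daemon started from this config, so this only buys headroom; it does not shrink the metadata the namenode has to hold.]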

Re: How much RAMs needed...

2007-07-15 Thread Ted Dunning
Peter is pointing out that he was able to process the equivalent of many small files using very modest hardware (smaller than your hardware). This is confirmation that you need to combine your inputs into larger chunks. On 7/15/07 7:07 PM, "Nguyen Kien Trung" <[EMAIL PROTECTED]> wrote: > Hi Pe

Re: How much RAMs needed...

2007-07-15 Thread Ted Dunning
HDFS can't really do the combination into larger files, but if you can do that, it will help quite a bit. You might need a custom InputFormat or split to make it all sing, but you should be much better off with fewer large input files. One of the biggest advantages will be that your disk reading
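
[Editor's note: for reference, a minimal sketch of one way to do the combining Ted describes, packing a directory of tiny HDFS files into a single SequenceFile (file name as key, raw bytes as value) so a job reads a few large files instead of millions of small ones. The class name and paths are illustrative and error handling is kept to a minimum:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Pack every small file under args[0] into one SequenceFile at args[1].
    public class PackSmallFiles {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path(args[1]), Text.class, BytesWritable.class);
        try {
          for (FileStatus stat : fs.listStatus(new Path(args[0]))) {
            if (stat.isDir()) continue;
            byte[] buf = new byte[(int) stat.getLen()];   // files are tiny, so reading whole file is fine
            FSDataInputStream in = fs.open(stat.getPath());
            try {
              in.readFully(0, buf);                       // read the entire file contents
            } finally {
              in.close();
            }
            // key = original file name, value = raw file contents
            writer.append(new Text(stat.getPath().getName()), new BytesWritable(buf));
          }
        } finally {
          writer.close();
        }
      }
    }

A job can then run SequenceFileInputFormat over the packed file, and each map task gets a contiguous run of records instead of one tiny file per task.]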

Re: How much RAMs needed...

2007-07-15 Thread Peter W.
Trung, Someone more knowledgeable will need to help. It's my very simple understanding that Hadoop DFS creates multiple blocks for every file being processed. The JVM heap being exceeded could possibly be a file handle issue instead of being due to overall block count. In other words, your nam

Re: How much RAMs needed...

2007-07-15 Thread Nguyen Kien Trung
Hi Peter, I appreciate the info. I'm afraid I'm not getting what you mean. The issue I've encountered is that I'm not able to start up the namenode due to an out of memory error, given that there is a huge number of tiny files in the datanodes. Cheers, Trung Peter W. wrote: Trung, Using one machin
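
[Editor's note: rough arithmetic showing why the file count alone can sink the namenode. The per-object figure is the commonly cited rule of thumb of roughly 150 bytes of namenode heap per file, directory, or block object, not a number from this thread:

    2,000,000 tiny files, one block each (every file is well under 64 MB)
      -> about 2,000,000 file objects + 2,000,000 block objects in namenode memory
      -> about 4,000,000 objects x ~150 bytes = roughly 600 MB of heap before anything else runs

    The same 20 GB packed into 64 MB files
      -> about 320 files and blocks, i.e. well under 1 MB of metadata

This is why the replies keep coming back to combining the inputs rather than just adding RAM.]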

Re: How much RAMs needed...

2007-07-15 Thread Peter W.
Trung, Using one machine (with 2GB RAM) and 300 input files I was able to successfully run: INFO mapred.JobClient: Map input records=10785802 Map output records=10785802 Map input bytes=1302175673 Map output bytes=1101864522 Reduce input groups=1244034 Reduce input records=10785802 Reduce outpu

Re: How much RAMs needed...

2007-07-15 Thread Nguyen Kien Trung
Thanks Ted, Unfortunately, those files really are tiny files. Would it be good practice for HDFS to combine those tiny files into a single block that fits the standard 64M size? Ted Dunning wrote: Are these really tiny files, or are you really storing 2M x 100MB = 200TB of data? Or do you have

Re: How much RAMs needed...

2007-07-15 Thread Ted Dunning
Are these really tiny files, or are you really storing 2M x 100MB = 200TB of data? Or do you have more like 2M x 10KB = 20GB of data? Map-reduce and HDFS will generally work much better if you can arrange to have relatively larger files. On 7/15/07 8:04 AM, "erolagnab" <[EMAIL PROTECTED]> wrote
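
[Editor's note: one quick way to answer Ted's question from the shell, using the stock DFS commands; the path below is illustrative:

    bin/hadoop dfs -du /user/trung/input            # space used per entry under the path
    bin/hadoop dfs -lsr /user/trung/input | wc -l   # rough count of files and directories

Total bytes divided by file count tells you immediately which of Ted's two scenarios applies.]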

How much RAMs needed...

2007-07-15 Thread erolagnab
http://www.nabble.com/How-much-RAMs-needed...-tf4082367.html#a11603027 Sent from the Hadoop Users mailing list archive at Nabble.com.