Re: dfs.block.size vs avg block size

2008-05-18 Thread Dhruba Borthakur
There isn't a way to change the block size of an existing file. The block size of a file can be specified only at the time of file creation and cannot be changed later. There isn't any wasted space in your system. If the block size is 128MB but you create an HDFS file of, say, 10MB, then that

can each mapper's key is a file name, and value is the content of corresponding file, not just a line?

2008-05-18 Thread Jeremy Chow
Hi list, I want to read a directory of text files using mappers. Can each mapper's key be a text file name, and the value be the content of the corresponding file, not just a line? It seems that MultiFileInputFormat may do this job; how can I use it? Thanks, Jeremy -- My research interests are

Re: Handling Large Number Of Files, Fastest Way

2008-05-18 Thread Brian Vargas
Hi, You can realize a huge improvement by sticking them into a sequence file. With lots of small files, name lookups against the name node will be a big bottleneck. One easy approach is making the key be a Text of the filename that was loaded in, and the value be a BytesWritable, which is
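
To illustrate the packing idea above (filename as the key, raw file bytes as the value), here is a minimal Python sketch. It does not write Hadoop's actual SequenceFile format; the length-prefixed record layout and the names `pack_files` and `unpack_files` are assumptions made up for this example. It only shows how many small files can be folded into one stream of key/value records so that a reader sees one large file instead of thousands of small ones.

```python
import io
import struct


def pack_files(files, out):
    """Append (name, content) records to a binary stream.

    Hypothetical record layout (NOT the SequenceFile format):
    4-byte big-endian key length, UTF-8 key bytes,
    4-byte big-endian value length, raw value bytes.
    """
    for name, content in files:
        key = name.encode("utf-8")
        out.write(struct.pack(">I", len(key)))
        out.write(key)
        out.write(struct.pack(">I", len(content)))
        out.write(content)


def unpack_files(inp):
    """Yield (name, content) pairs back from a stream written by pack_files."""
    while True:
        header = inp.read(4)
        if not header:  # clean end of stream
            return
        (key_len,) = struct.unpack(">I", header)
        name = inp.read(key_len).decode("utf-8")
        (val_len,) = struct.unpack(">I", inp.read(4))
        yield name, inp.read(val_len)


if __name__ == "__main__":
    # Two in-memory "files" stand in for a directory of small files.
    buf = io.BytesIO()
    pack_files([("a.txt", b"hello"), ("b.bin", b"\x00\x01")], buf)
    buf.seek(0)
    for name, content in unpack_files(buf):
        print(name, len(content))
```

In a real job the container would be written with Hadoop's own SequenceFile writer using Text keys and BytesWritable values, as the message suggests; the sketch only mirrors the key/value shape of that approach.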

Large(Thousands) of files -fast

2008-05-18 Thread Saptarshi Guha
Hello, I have a similar scenario to jkupferman's situation: thousands of files, mostly in the KB range, some in the MBs, and a few in the GBs. I am not too familiar with Java and am using Hadoop Streaming with Python. The mapper must work on individual files. I've placed the 1000's of

Re: Large(Thousands) of files -fast

2008-05-18 Thread Saptarshi Guha
Aah, use org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat as the input format. Thanks, Saptarshi On May 18, 2008, at 11:17 PM, Saptarshi Guha wrote: Hello, I have a similar scenario to jkupferman's situation: thousands of files, mostly in the KB range, some in the MBs, and a few in the GBs. I

Re: How does one learn to program in Hadoop?

2008-05-18 Thread Akshar
Hi Hadoop :) Please start with the Hadoop wiki at http://wiki.apache.org/hadoop/ Good luck! On Sun, May 18, 2008 at 4:02 PM, Hadoop [EMAIL PROTECTED] wrote: How does one learn to program in Hadoop? What do you suggest? Where can I start? -- View this message in context: