There isn't a way to change the block size of an existing file. The
block size of a file can be specified only at the time of file
creation and cannot be changed later.
There isn't any wasted space in your system. If the block size is
128MB but you create an HDFS file of, say, 10MB, then that file
occupies only 10MB on disk, not a full block.
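To make the point above concrete, here is a small arithmetic sketch (the 10MB/128MB figures come from the example; the variable names are mine, not Hadoop's):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB HDFS block size
FILE_SIZE = 10 * 1024 * 1024    # 10 MB file

# Number of block entries the namenode tracks for this file
# (ceiling division): a partial block still counts as one entry.
blocks = -(-FILE_SIZE // BLOCK_SIZE)

# Bytes actually stored on the datanodes (per replica): just the
# file's own size -- the unused remainder of the block is not allocated.
disk_used = FILE_SIZE

print(blocks, disk_used)
```

So a 10MB file costs one namenode block entry and 10MB of datanode storage per replica, which is why no space is wasted.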
Hi list,
I want to read a directory of text files using mappers. Can each mapper's
key be a text file name, and its value be the content of the corresponding
file, not just a single line?
It seems that the MultiFileInputFormat may do this job; how can I use it?
Thanks,
Jeremy
Hi,
You can realize a huge improvement by sticking them into a sequence
file. With lots of small files, name lookups against the name node will
be a big bottleneck.
One easy approach is making the key a Text of the filename that was
loaded in, and the value a BytesWritable containing the raw bytes of
the file's contents.
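Not from the thread itself, but to make the packing idea concrete: a plain-Python sketch that bundles many small files into one container of (filename, bytes) records, analogous to a sequence file with a Text key and a BytesWritable value. The byte layout here is invented for illustration; Hadoop's real SequenceFile uses its own binary format.

```python
import io
import struct

def pack(records):
    """Pack (filename, content_bytes) pairs into one length-prefixed blob."""
    out = io.BytesIO()
    for name, data in records:
        key = name.encode("utf-8")
        # 4-byte big-endian key length, 4-byte value length, then the bytes.
        out.write(struct.pack(">II", len(key), len(data)))
        out.write(key)
        out.write(data)
    return out.getvalue()

def unpack(blob):
    """Iterate (filename, content_bytes) pairs back out of the blob."""
    pos = 0
    while pos < len(blob):
        klen, vlen = struct.unpack_from(">II", blob, pos)
        pos += 8
        name = blob[pos:pos + klen].decode("utf-8")
        pos += klen
        value = blob[pos:pos + vlen]
        pos += vlen
        yield name, value

# One packed file replaces thousands of tiny ones, so the namenode
# tracks a single entry instead of one per small file.
blob = pack([("a.txt", b"hello"), ("b.txt", b"world")])
assert list(unpack(blob)) == [("a.txt", b"hello"), ("b.txt", b"world")]
```

The design point is the same one the reply makes: the namenode lookup cost scales with the number of files, so collapsing thousands of small files into one container removes the bottleneck.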
Hello,
I have a similar scenario to jkupferman's situation: thousands of files,
mostly ranging from KBs to some MBs, and a few of which are GBs. I am not
too familiar with Java and am using Hadoop Streaming with Python. The
mapper must work on individual files. I've placed the 1000's of
Aah, use org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat as
the inputformat.
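As a hedged sketch of what the Python side might look like under Streaming, assuming records reach the mapper as filename<TAB>content lines (exactly how Streaming renders each record depends on the input format chosen; the per-file word count here is just a placeholder for real per-file logic):

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming mapper sketch. Assumes each stdin line is
# "filename<TAB>file-content", e.g. produced by an input format that
# emits the filename as key and the file's content as value.
import sys

def process(name, content):
    # Hypothetical per-file logic: emit the file's word count.
    return "%s\t%d" % (name, len(content.split()))

def main(lines=sys.stdin, out=sys.stdout):
    for line in lines:
        # Streaming separates key and value with a tab by default.
        name, _, content = line.rstrip("\n").partition("\t")
        print(process(name, content), file=out)

if __name__ == "__main__":
    main()
```

Because the whole file arrives as one value, the mapper sees each file exactly once, which matches the "mapper must work on individual files" requirement above.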
Thanks
Saptarshi
On May 18, 2008, at 11:17 PM, Saptarshi Guha wrote:
Hi Hadoop :)
Please start off with the Hadoop wiki @ http://wiki.apache.org/hadoop/
Good luck!
On Sun, May 18, 2008 at 4:02 PM, Hadoop [EMAIL PROTECTED] wrote:
How does one learn to program in Hadoop?
What do you suggest?
Where I can start?