Hi all, good morning.

I have a set of 378 documents (text files, 200-500 MB each) loaded into HDFS. When I run a Hadoop Streaming map/reduce job on them from the command line, it takes 48 min 43 sec to process the text files. How can I make the map/reduce job as fast as possible, so that these text files finish processing in about 10-15 seconds? What changes do I need to make on Hadoop 2.0 with MapReduce2 and YARN?

My current setup:

- cores = 2
- memory allocated to YARN = 2 GB; HDFS capacity = 400 GB
- default virtual memory for a map task = 1024 MB
- default virtual memory for a reduce task = 512 MB
- mapreduce.map.java.opts = -Xmx512m
- mapreduce.reduce.java.opts = -Xmx256m
- map-side sort buffer memory = 256 MB
- only 75% of YARN capacity is used for this job
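For reference, this is roughly how those settings would appear in mapred-site.xml and yarn-site.xml (a sketch using the standard Hadoop 2.x property names; I am assuming the "map side sort buffer" corresponds to mapreduce.task.io.sort.mb):

```xml
<!-- mapred-site.xml: per-task memory settings (values from the setup above) -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx512m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx256m</value>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
</property>

<!-- yarn-site.xml: total memory YARN may allocate on the node -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
```

These can also be overridden per job on the streaming command line with generic `-D property=value` options.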
Thanks & regards,
Bodla Dharani Kumar
