Hi Hadoop users, I am running an M/R job with an input file of 23 million records. I can see that not all of our nodes are getting used.
What can I change to utilize all nodes? Here is the per-node view from the ResourceManager:

Containers | Mem Used | Mem Avail | VCores Used | VCores Avail
8          | 11.25 GB | 0 B       | 8           | 0
0          | 0 B      | 11.25 GB  | 0           | 8
0          | 0 B      | 11.25 GB  | 0           | 8
8          | 11.25 GB | 0 B       | 8           | 0
8          | 11.25 GB | 0 B       | 8           | 0
7          | 11.25 GB | 0 B       | 7           | 1
5          | 7.03 GB  | 4.22 GB   | 5           | 3
0          | 0 B      | 11.25 GB  | 0           | 8
0          | 0 B      | 11.25 GB  | 0           | 8

My command looks like:

hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation /user/pts/output/MeanChiSquareAndSimilarityInput /user/pts/output/MeanChiSquaredCalcOutput

The input directory */user/pts/output/MeanChiSquareAndSimilarityInput* has one input file of 23 million records. The file size is ~3 GB.

Code: https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135

--
Madhav Sharan
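In case it helps, here is my back-of-the-envelope estimate of how many map tasks the job should get. This assumes the HDFS default split size of 128 MB; our cluster may be configured differently, so treat the numbers as an assumption, not a measurement:

```java
// Rough estimate of map tasks for the ~3 GB input file.
// ASSUMPTION: split size equals the HDFS default block size of 128 MB;
// check dfs.blocksize / mapreduce.input.fileinputformat.split.* on the cluster.
public class SplitEstimate {
    public static void main(String[] args) {
        long fileBytes = 3L * 1024 * 1024 * 1024;    // ~3 GB input file
        long splitBytes = 128L * 1024 * 1024;        // assumed 128 MB split size
        long splits = (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
        System.out.println("Estimated map tasks: " + splits);    // prints 24
    }
}
```

If the estimate is right, ~24 map tasks would not fill all of the vcores shown in the table above, which might be part of what I am seeing.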