Hi Hadoop users,

I am running an M/R job with an input file of 23 million records. I can see
that not all of our nodes are getting used.

What can I change to utilize all nodes?


Containers  Mem Used  Mem Avail  VCores Used  VCores Avail
8           11.25 GB  0 B        8            0
0           0 B       11.25 GB   0            8
0           0 B       11.25 GB   0            8
8           11.25 GB  0 B        8            0
8           11.25 GB  0 B        8            0
7           11.25 GB  0 B        7            1
5           7.03 GB   4.22 GB    5            3
0           0 B       11.25 GB   0            8
0           0 B       11.25 GB   0            8


My command looks like -

hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
  gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
  /user/pts/output/MeanChiSquareAndSimilarityInput \
  /user/pts/output/MeanChiSquaredCalcOutput
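
One variant I am considering is capping the maximum split size so the input is divided into more map tasks. This is only a sketch: it assumes the driver parses generic options via ToolRunner/GenericOptionsParser so that -D properties are picked up, and 64 MB is just an example value.

```shell
# Hypothetical variant: cap splits at 64 MB (67108864 bytes) to roughly
# double the number of map tasks. Only works if the driver uses ToolRunner.
hadoop jar target/pooled-time-series-1.0-SNAPSHOT-jar-with-dependencies.jar \
  gov.nasa.jpl.memex.pooledtimeseries.MeanChiSquareDistanceCalculation \
  -Dmapreduce.input.fileinputformat.split.maxsize=67108864 \
  /user/pts/output/MeanChiSquareAndSimilarityInput \
  /user/pts/output/MeanChiSquaredCalcOutput
```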

The directory */user/pts/output/MeanChiSquareAndSimilarityInput* contains a
single input file of 23 million records; the file size is ~3 GB.
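
For context, here is my rough estimate of how many map tasks this input should produce (a back-of-the-envelope sketch, assuming the default 128 MB split size and a splittable, uncompressed file):

```shell
# map tasks ≈ ceil(input size / split size)
# 128 MB is the Hadoop 2.x default block/split size (an assumption here).
file_mb=3072   # ~3 GB input file
split_mb=128   # assumed split size
echo $(( (file_mb + split_mb - 1) / split_mb ))   # prints 24
```

If that estimate holds, only ~24 map tasks exist to spread across the cluster, which may be why several nodes stay idle.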

Code - https://github.com/smadha/pooled_time_series/blob/master/src/main/java/gov/nasa/jpl/memex/pooledtimeseries/MeanChiSquareDistanceCalculation.java#L135


--
Madhav Sharan
