I am able to normalize given data, say

100,1:2:3
101,2:3:4

into

100 1
100 2
100 3
101 2
101 3
101 4
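
A minimal sketch of that normalization in plain Python (not a Hadoop job). The function name `normalize_record` and the single-space separator in the output are assumptions made for illustration; in a MapReduce setting this logic would sit in a map-only step.

```python
def normalize_record(line):
    """Split 'key,v1:v2:v3' into one 'key value' string per value."""
    key, values = line.strip().split(",", 1)
    return [f"{key} {value}" for value in values.split(":")]


records = ["100,1:2:3", "101,2:3:4"]

# Flatten the per-record lists into one list of output rows.
rows = [row for line in records for row in normalize_record(line)]

for row in rows:
    print(row)
```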
How can I do binning for numerical data, say iris.csv? I have worked out the maths behind it.

Iris data set: http://archive.ics.uci.edu/ml/datasets/Iris

1. Find the minimum and maximum values of each attribute in the data set.

        Sepal Length   Sepal Width   Petal Length   Petal Width
   Min  4.3            2.0           1.0            0.1
   Max  7.9            4.4           6.9            2.5

2. Then divide the data values of each attribute into 'n' buckets, say n = 5.

   Bucket Width = (Max - Min) / n

   E.g. Sepal Length: (7.9 - 4.3) / 5 = 0.72

   So the intervals will be as follows:

   4.3  - 5.02
   5.02 - 5.74
   5.74 - 6.46
   6.46 - 7.18
   7.18 - 7.9

   Continue likewise for all attributes.

How can I do the same in MapReduce?

--
*Thanks & Regards*
Unmesha Sreeveni U.B
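
One possible way to map the steps above onto MapReduce is two passes: a first job whose mapper emits (column, value) pairs and whose reducer computes the min and max per column, followed by a map-only job that turns each value into a bucket index. The sketch below only imitates that flow locally in plain Python and does not use the Hadoop API. All function names, the `NUMERIC_COLUMNS` constant, the clamping of the maximum into the last bucket, and the toy rows are assumptions for illustration.

```python
from collections import defaultdict

NUMERIC_COLUMNS = 4  # assumption: the first four CSV fields are the numeric attributes


def minmax_map(line):
    """Pass 1 mapper: emit (column index, value) for every numeric field."""
    fields = line.strip().split(",")
    for col, field in enumerate(fields[:NUMERIC_COLUMNS]):
        yield col, float(field)


def minmax_reduce(values):
    """Pass 1 reducer: collapse all values of one column to (min, max)."""
    values = list(values)
    return min(values), max(values)


def bin_index(value, lo, hi, n):
    """Return the 0-based bucket of value when [lo, hi] is cut into n equal widths."""
    if hi == lo:
        return 0
    width = (hi - lo) / n
    # The maximum itself would land in bucket n, so clamp it into the last one.
    return min(int((value - lo) / width), n - 1)


def bin_map(line, stats, n):
    """Pass 2 mapper (map-only): replace each numeric field by its bucket index."""
    fields = line.strip().split(",")
    binned = [
        str(bin_index(float(field), stats[col][0], stats[col][1], n))
        for col, field in enumerate(fields[:NUMERIC_COLUMNS])
    ]
    return ",".join(binned + fields[NUMERIC_COLUMNS:])


def run(lines, n=5):
    """Local driver that imitates the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for col, value in minmax_map(line):
            grouped[col].append(value)
    stats = {col: minmax_reduce(values) for col, values in grouped.items()}
    return stats, [bin_map(line, stats, n) for line in lines]


# Toy rows in iris.csv layout, for illustration only (not the full data set).
sample = [
    "4.3,3.0,1.1,0.1,Iris-setosa",
    "6.0,2.2,4.0,1.0,Iris-versicolor",
    "7.9,3.8,6.4,2.0,Iris-virginica",
]
stats, binned = run(sample)
for row in binned:
    print(row)
```

On a real cluster the min/max table from the first pass would have to be made available to every mapper of the second pass, for example through the job configuration or a side file; how best to do that depends on the setup.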
