I am able to normalize given data, say:
100,1:2:3
101,2:3:4

into:
100 1
100 2
100 3
101 2
101 3
101 4
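For comparison with the binning question, the normalization above is naturally a map-only step: each input line becomes one (id, value) pair per colon-separated value. This is a minimal Python sketch that assumes comma and colon are the only delimiters. `normalize` is a hypothetical helper name.

```python
def normalize(lines):
    """Yield one (id, value) pair per colon-separated value in 'id,v1:v2:...' lines."""
    for line in lines:
        key, values = line.strip().split(",", 1)
        for value in values.split(":"):
            yield key, value


for key, value in normalize(["100,1:2:3", "101,2:3:4"]):
    print(key, value)
```

Run on the two sample lines, this prints the six "id value" lines shown above.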

How can I do binning for numerical data, say iris.csv?

I worked out the maths behind it:
Iris data set: http://archive.ics.uci.edu/ml/datasets/Iris
1. Find the minimum and maximum values of each attribute in the data set.

               Sepal Length  Sepal Width  Petal Length  Petal Width
Min (cm)       4.3           2.0          1.0           0.1
Max (cm)       7.9           4.4          6.9           2.5
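Step 1 maps directly onto MapReduce. This sketch assumes iris.csv uses the UCI iris.data layout: four numeric columns followed by the class label, with no header row. The mapper emits (attribute, (value, value)) and the reducer merges those pairs into (min, max). The shuffle is simulated in-process. `mapper`, `min_max_reducer` and `run_min_max` are hypothetical names, not Hadoop APIs, and the three rows serve only as sample input.

```python
from collections import defaultdict

ATTRIBUTES = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]


def mapper(line):
    """Emit (attribute, (value, value)) for each numeric column of one row."""
    fields = line.strip().split(",")
    for name, raw in zip(ATTRIBUTES, fields[:4]):
        v = float(raw)
        yield name, (v, v)


def min_max_reducer(name, pairs):
    """Merge (min, max) pairs for one attribute into a single (min, max)."""
    pairs = list(pairs)
    return name, (min(p[0] for p in pairs), max(p[1] for p in pairs))


def run_min_max(lines):
    """In-process stand-in for the shuffle: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines, e.g. a trailing newline in iris.data
            for name, pair in mapper(line):
                groups[name].append(pair)
    return dict(min_max_reducer(name, pairs) for name, pairs in groups.items())


sample = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.9,3.8,6.4,2.0,Iris-virginica",
    "4.3,3.0,1.1,0.1,Iris-setosa",
]
print(run_min_max(sample))
```

Merging (min, max) pairs is associative and commutative. In a real Hadoop job, the same reducer logic could therefore also serve as a combiner, so each map task ships far fewer records to the reducers.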

2. Divide the values of each attribute into n equal-width buckets, say n = 5:
Bucket width = (Max - Min) / n
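The post does not say which bucket a value lying exactly on a boundary belongs to. One common convention, assumed here, uses half-open intervals [Min + k·w, Min + (k+1)·w) and places Max itself in the last bucket. Under that convention, the 0-based bucket index of a value x of attribute a is:

```latex
w_a = \frac{\max_a - \min_a}{n}, \qquad
b_a(x) = \min\!\left( \left\lfloor \frac{x - \min_a}{w_a} \right\rfloor,\; n - 1 \right) \in \{0, \dots, n-1\}
```

For example, a Sepal Length of 6.0 gives floor((6.0 - 4.3) / 0.72) = floor(2.36) = 2. That is the third interval, 5.74 - 6.46.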


E.g., for Sepal Length:
Bucket width = (7.9 - 4.3) / 5 = 0.72
So the intervals are:
4.3  - 5.02
5.02 - 5.74
5.74 - 6.46
6.46 - 7.18
7.18 - 7.9

Continue likewise for all attributes.
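The following sketch applies the interval arithmetic above to every attribute in the Min/Max table. `intervals` is a hypothetical helper. The edges are rounded to two decimals only for display, to match the figures in the post.

```python
N_BUCKETS = 5

# Min/Max per attribute, copied from the table in step 1.
RANGES = {
    "Sepal Length": (4.3, 7.9),
    "Sepal Width": (2.0, 4.4),
    "Petal Length": (1.0, 6.9),
    "Petal Width": (0.1, 2.5),
}


def intervals(lo, hi, n):
    """Split [lo, hi] into n equal-width intervals and return (start, end) pairs."""
    width = (hi - lo) / n
    # Round the computed edges for display; use hi exactly as the last edge to avoid drift.
    edges = [round(lo + i * width, 2) for i in range(n)] + [hi]
    return list(zip(edges[:-1], edges[1:]))


for name, (lo, hi) in RANGES.items():
    print(name, intervals(lo, hi, N_BUCKETS))
```

For Sepal Length, this reproduces the five intervals listed above.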
How can I do the same in MapReduce?
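One possible approach, sketched under assumptions rather than offered as a definitive design, is to run two jobs. The first job computes Min/Max per attribute. The second is a map-only job that receives those ranges and rewrites each numeric value as its 0-based bucket index. In Hadoop, the ranges could be passed through the job Configuration or the distributed cache. An optional reduce step can then count rows per bucket. The Python below simulates only the second job in-process and takes the ranges from the table above. `binning_mapper`, `bucket` and `bucket_counts` are hypothetical names. The same logic could be ported to a Java Mapper/Reducer or run through Hadoop Streaming.

```python
from collections import Counter

ATTRIBUTES = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]
N_BUCKETS = 5

# Assumed output of the Min/Max job (values from the table in the post).
RANGES = {
    "Sepal Length": (4.3, 7.9),
    "Sepal Width": (2.0, 4.4),
    "Petal Length": (1.0, 6.9),
    "Petal Width": (0.1, 2.5),
}


def bucket(x, lo, hi, n):
    """0-based equal-width bucket of x; the maximum value goes in bucket n - 1."""
    width = (hi - lo) / n
    if width <= 0:
        return 0  # constant attribute: everything in one bucket
    return min(max(int((x - lo) / width), 0), n - 1)


def binning_mapper(line, ranges=RANGES, n=N_BUCKETS):
    """Map-only step: replace each numeric column with its bucket index, keep the label."""
    fields = line.strip().split(",")
    labels = [
        str(bucket(float(raw), *ranges[name], n))
        for name, raw in zip(ATTRIBUTES, fields[:4])
    ]
    return ",".join(labels + fields[4:])


def bucket_counts(binned_lines):
    """Optional reduce step: number of rows per (attribute, bucket)."""
    counts = Counter()
    for line in binned_lines:
        for name, b in zip(ATTRIBUTES, line.split(",")[:4]):
            counts[(name, int(b))] += 1
    return counts


sample = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.9,3.8,6.4,2.0,Iris-virginica",
    "4.3,3.0,1.1,0.1,Iris-setosa",
]
binned = [binning_mapper(row) for row in sample if row.strip()]
for row in binned:
    print(row)
print(bucket_counts(binned))
```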



-- 
*Thanks & Regards*

Unmesha Sreeveni U.B
