I am testing a decision tree using the iris.scale data set (http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#iris). The data set has three class labels: 1, 2, and 3. However, in the following code I have to set numClasses = 4; if I set numClasses = 3, I get an ArrayIndexOutOfBoundsException. Why?
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.util.MLUtils

val conf = new SparkConf().setAppName("DecisionTree")
val sc = new SparkContext(conf)
val data = MLUtils.loadLibSVMFile(sc, "data/iris.scale.txt")
val numClasses = 4
val categoricalFeaturesInfo = Map[Int, Int]()
val impurity = "gini"
val maxDepth = 5
val maxBins = 100
val model = DecisionTree.trainClassifier(data, numClasses, categoricalFeaturesInfo,
  impurity, maxDepth, maxBins)
val labelAndPreds = data.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / data.count
println("Training Error = " + trainErr)
println("Learned classification tree model:\n" + model)

-Yao
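My working guess (unconfirmed) is that trainClassifier expects labels in the range 0 until numClasses, so 1-based labels like these force numClasses = 4 unless they are shifted down to 0, 1, 2 first. Here is a minimal sketch of that shift on plain Scala collections, so it runs without Spark; LabelShift and the sample label values are hypothetical names I made up for illustration:

```scala
object LabelShift {
  // Shift 1-based class labels (1.0, 2.0, 3.0) down to 0-based (0.0, 1.0, 2.0),
  // on the assumption that the classifier indexes labels from 0.
  def shift(labels: Seq[Double]): Seq[Double] = labels.map(_ - 1.0)

  def main(args: Array[String]): Unit = {
    val raw = Seq(1.0, 2.0, 3.0, 2.0, 1.0)
    println(shift(raw)) // prints List(0.0, 1.0, 2.0, 1.0, 0.0)
  }
}
```

In the Spark code itself the same idea would presumably be `data.map(p => LabeledPoint(p.label - 1, p.features))` before training, which should then work with numClasses = 3 — but I have not verified that this is the intended fix.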