Hello, I am trying to run LinearRegression on a dummy dataset, given below. I have tried many different settings but still cannot reproduce the expected coefficients.
Please help me out, as I am facing the same problem with my actual dataset. Thank you.

This dataset is generated from the simple equation: Y = 4 + (2 * x1) + (3 * x2)

*Data:*

y,x1,x2
6.3,1,0.1
8.6,2,0.2
10.9,3,0.3
13.8,4,0.6
16.4,5,0.8
19.6,6,1.2
22.8,7,1.6
25.7,8,1.9
28.3,9,2.1
31.2,10,2.4
34.1,11,2.7

*Spark Code:*

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.SimpleUpdater
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val data = sc.textFile("Data/tempData_1.csv")
// Skip the header line, then build (label, features) with a constant 1.0 as the bias feature
val parsedData = data.mapPartitions(_.drop(1)).map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(Array(1.0, parts(1).toDouble, parts(2).toDouble)))
}.cache()

var numIterations = 400
val step = 0.01
val algorithm = new LinearRegressionWithSGD()
algorithm.setIntercept(false) // Also tried setIntercept(true) with only (x1, x2) as features
algorithm.optimizer.setStepSize(step)
algorithm.optimizer.setNumIterations(numIterations)
  .setUpdater(new SimpleUpdater())
  //.setRegParam(0.1)
  .setMiniBatchFraction(1.0)

val initialWeights = Vectors.dense(Array.fill(3)(scala.util.Random.nextDouble()))
val model = algorithm.run(parsedData, initialWeights)
println(s">>>> Model intercept: ${model.intercept}, weights: ${model.weights}")

Regards,
Arun
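
P.S. As a sanity check on the data itself, here is a minimal sketch in plain Scala (no Spark) that compares every row listed above against Y = 4 + (2 * x1) + (3 * x2). The names `CheckData`, `rows` and `maxResidual` are made up for this illustration and are not part of the Spark job. If the largest gap is close to zero, the data is noise-free and the coefficients I expect are intercept 4 and weights (2, 3).

```scala
// Sanity check (plain Scala, no Spark): do the listed rows satisfy
// y = 4 + 2*x1 + 3*x2? All names here are illustrative only.
object CheckData {
  // (y, x1, x2) rows copied from the data listing in this message
  val rows: Seq[(Double, Double, Double)] = Seq(
    (6.3, 1.0, 0.1), (8.6, 2.0, 0.2), (10.9, 3.0, 0.3), (13.8, 4.0, 0.6),
    (16.4, 5.0, 0.8), (19.6, 6.0, 1.2), (22.8, 7.0, 1.6), (25.7, 8.0, 1.9),
    (28.3, 9.0, 2.1), (31.2, 10.0, 2.4), (34.1, 11.0, 2.7)
  )

  // Largest absolute gap between the listed y and the value the equation gives
  val maxResidual: Double =
    rows.map { case (y, x1, x2) => math.abs(y - (4.0 + 2.0 * x1 + 3.0 * x2)) }.max

  def main(args: Array[String]): Unit =
    println(f"max |y - (4 + 2*x1 + 3*x2)| = $maxResidual%.2e")
}
```

This only confirms that the target coefficients are consistent with the data; it does not involve the SGD settings used in the Spark code.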