Thanks dbtsai for the info. Are you using a case class in: case (response, vec) => ?
Also, what library do I need to import to use .toBreeze?

Thanks,
tri

-----Original Message-----
From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
Sent: Friday, December 12, 2014 3:27 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?

You can do something like the following.

val rddVector = input.map({
  case (response, vec) => {
    val newVec = MLUtils.appendBias(vec)
    newVec.toBreeze(newVec.size - 1) = response
    newVec
  }
})

val scalerWithResponse = new StandardScaler(true, true).fit(rddVector)

val trainingData = scalerWithResponse.transform(rddVector).map(x => {
  (x(x.size - 1), Vectors.dense(x.toArray.slice(0, x.size - 1)))
})

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri <tri....@verizonwireless.com> wrote:
> Thanks for the info.
>
> How do I use StandardScaler() to scale the example data point (10246.0,[14111.0,1.0])?
>
> Thx
> tri
>
> -----Original Message-----
> From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
> Sent: Friday, December 12, 2014 1:26 PM
> To: Bui, Tri
> Cc: user@spark.apache.org
> Subject: Re: Do I need to applied feature scaling via StandardScaler for
> LBFGS for Linear Regression?
>
> It seems that your response is not scaled, which will cause issues in LBFGS.
> Typically, people train linear regression with zero-mean/unit-variance
> features and response, without training the intercept. Since the response is
> zero-mean, the intercept will always be zero. When you convert the
> coefficients from the scaled space back to the original space, the intercept
> can be computed as w0 = ȳ - \sum <x_n> w_n, where ȳ is the mean of the
> response and <x_n> is the mean of column n.
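The intercept formula above can be checked with a small plain-Scala sketch. The `InterceptRecovery.unscale` helper and its parameter names are hypothetical (not an MLlib API); it assumes both features and response were standardized with means `muX`/`muY` and standard deviations `sigmaX`/`sigmaY`, maps the scaled-space weights back to the original space, and then applies w0 = ȳ - \sum <x_n> w_n:

```scala
object InterceptRecovery {
  // Map coefficients learned on standardized data back to the original
  // space and recover the intercept via w0 = ybar - sum(<x_n> * w_n).
  // Hypothetical helper, not part of MLlib.
  def unscale(wScaled: Array[Double],
              muX: Array[Double], sigmaX: Array[Double],
              muY: Double, sigmaY: Double): (Array[Double], Double) = {
    // Undo the standardization of each coefficient: w_n = wScaled_n * sigmaY / sigmaX_n
    val w = wScaled.indices.map(i => wScaled(i) * sigmaY / sigmaX(i)).toArray
    // Intercept in the original space
    val w0 = muY - w.indices.map(i => muX(i) * w(i)).sum
    (w, w0)
  }
}
```

For example, fitting the line y = 2x + 3 to x = 0, 1, 2 (so ȳ = 5, σy = 2, x̄ = 1, σx = 1) gives a scaled-space slope of 1.0, and `unscale` recovers the original slope 2.0 and intercept 3.0.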
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Fri, Dec 12, 2014 at 10:49 AM, Bui, Tri <tri....@verizonwireless.com> wrote:
>> Thanks for the confirmation.
>>
>> FYI, the code below works for a similar dataset: with only the feature
>> magnitude changed, LBFGS converged to the right weights.
>>
>> For example, a time-sequential feature with values 1, 2, 3, 4, 5 would
>> generate the error, while the sequential feature 14111, 14112, 14113, 14115
>> would converge to the right weights. Why?
>>
>> Below is the code to implement StandardScaler() for the sample data
>> (10246.0,[14111.0,1.0]):
>>
>> val scaler1 = new StandardScaler().fit(train.map(x => x.features))
>> val train1 = train.map(x => (x.label, scaler1.transform(x.features)))
>>
>> But I keep getting the error: "value features is not a member of (Double,
>> org.apache.spark.mllib.linalg.Vector)"
>>
>> Should my feature vector be .toInt instead of Double?
>>
>> Also, the org.apache.spark.mllib.linalg.Vector in the error should have an
>> "s" to match the imported library org.apache.spark.mllib.linalg.Vectors
>>
>> Thanks
>> Tri
>>
>> -----Original Message-----
>> From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
>> Sent: Friday, December 12, 2014 12:16 PM
>> To: Bui, Tri
>> Cc: user@spark.apache.org
>> Subject: Re: Do I need to applied feature scaling via StandardScaler for
>> LBFGS for Linear Regression?
>>
>> You need to apply the StandardScaler to help the convergence yourself.
>> LBFGS just takes whatever objective function you provide without doing any
>> scaling. I would like to provide LinearRegressionWithLBFGS, which does the
>> scaling internally, in the near future.
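As an aside on the "value features is not a member of (Double, ...Vector)" error quoted above: it is a tuple-versus-LabeledPoint issue, not a Double-versus-Int one. Once the data is mapped to `(label, features)` tuples, `.features` no longer exists; fit the scaler while the elements still have a `features` field, or destructure the tuple. A minimal sketch, using a hypothetical `Point` case class as a stand-in for `LabeledPoint` so it runs without Spark (the RDD `map` calls look the same):

```scala
// Stand-in for org.apache.spark.mllib.regression.LabeledPoint,
// so this sketch runs without a Spark cluster.
case class Point(label: Double, features: Array[Double])

object FeaturesAccess {
  val train = Seq(Point(10246.0, Array(14111.0, 1.0)))

  // Fine: .features exists on Point (as it does on LabeledPoint).
  val feats = train.map(_.features)

  // After converting to (label, features) tuples, .features is gone;
  // train.map(x => x.features) here would not compile. Destructure instead:
  val pairs  = train.map(p => (p.label, p.features))
  val labels = pairs.map { case (label, _) => label }
}
```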
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Fri, Dec 12, 2014 at 8:49 AM, Bui, Tri
>> <tri....@verizonwireless.com.invalid> wrote:
>>> Hi,
>>>
>>> Trying to use LBFGS as the optimizer, do I need to implement feature
>>> scaling via StandardScaler or does LBFGS do it by default?
>>>
>>> The following code generated the error “Failure again! Giving up and
>>> returning. Maybe the objective is just poorly behaved?”.
>>>
>>> val data = sc.textFile("file:///data/Train/final2.train")
>>>
>>> val parsedata = data.map { line =>
>>>   val partsdata = line.split(',')
>>>   LabeledPoint(partsdata(0).toDouble,
>>>     Vectors.dense(partsdata(1).split(' ').map(_.toDouble)))
>>> }
>>>
>>> val train = parsedata.map(x => (x.label,
>>>   MLUtils.appendBias(x.features))).cache()
>>>
>>> val numCorrections = 10
>>> val convergenceTol = 1e-4
>>> val maxNumIterations = 50
>>> val regParam = 0.1
>>> val initialWeightsWithIntercept = Vectors.dense(new Array[Double](2))
>>>
>>> val (weightsWithIntercept, loss) = LBFGS.runLBFGS(train,
>>>   new LeastSquaresGradient(),
>>>   new SquaredL2Updater(),
>>>   numCorrections,
>>>   convergenceTol,
>>>   maxNumIterations,
>>>   regParam,
>>>   initialWeightsWithIntercept)
>>>
>>> Did I implement LBFGS for Linear Regression via LeastSquaresGradient()
>>> correctly?
>>>
>>> Thanks
>>>
>>> Tri
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
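On the final question in the thread: LeastSquaresGradient is indeed the objective to pair with LBFGS for linear regression. Up to MLlib's exact constant conventions (treat this as a sketch, not the library's source), it computes the per-example squared-error loss (w.x - y)^2 / 2 with gradient (w.x - y) * x, shown here in plain Scala:

```scala
object LeastSquares {
  // Per-example least-squares loss and gradient:
  //   loss = (w.x - y)^2 / 2,  gradient = (w.x - y) * x
  // Plain-Scala sketch of what LeastSquaresGradient computes; not MLlib code.
  def lossAndGradient(w: Array[Double], x: Array[Double], y: Double)
      : (Double, Array[Double]) = {
    val diff = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y  // w.x - y
    (diff * diff / 2.0, x.map(_ * diff))
  }
}
```

Because the gradient is proportional to the feature values themselves, wildly scaled features give LBFGS a badly conditioned problem, which is why standardizing helps convergence here.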