RE: Do I need to apply feature scaling via StandardScaler for LBFGS for Linear Regression?

2014-12-18 Thread Bui, Tri
Thanks dbtsai for the info.

Are you using the case class for:
case (response, vec) => ?

Also, what library do I need to import to use .toBreeze?

Thanks, 
tri

-Original Message-
From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com] 
Sent: Friday, December 12, 2014 3:27 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Do I need to apply feature scaling via StandardScaler for LBFGS 
for Linear Regression?

You can do something like the following.

val rddVector = input.map {
  case (response, vec) =>
    // Append a bias slot to the features, then overwrite that last slot with
    // the response so it is standardized together with the features.
    val newVec = MLUtils.appendBias(vec)
    newVec.toBreeze(newVec.size - 1) = response
    newVec
}

val scalerWithResponse = new StandardScaler(true, true).fit(rddVector)

val trainingData = scalerWithResponse.transform(rddVector).map { x =>
  // Split the standardized vector back into (response, features).
  (x(x.size - 1), Vectors.dense(x.toArray.slice(0, x.size - 1)))
}

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
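
For context, a minimal sketch of the input shape the snippet above assumes: an
RDD[(Double, Vector)] of (response, features) pairs. The first pair is the
sample from this thread; the second is made up for illustration.

import org.apache.spark.mllib.linalg.Vectors

// Assumed input: (response, features) pairs; the second row is hypothetical.
val input = sc.parallelize(Seq(
  (10246.0, Vectors.dense(14111.0, 1.0)),
  (10312.0, Vectors.dense(14112.0, 1.0))
))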


On Fri, Dec 12, 2014 at 12:23 PM, Bui, Tri tri@verizonwireless.com wrote:
 Thanks for the info.

 How do I use StandardScaler() to scale the example data (10246.0,[14111.0,1.0])?

 Thx
 tri

 -Original Message-
 From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
 Sent: Friday, December 12, 2014 1:26 PM
 To: Bui, Tri
 Cc: user@spark.apache.org
 Subject: Re: Do I need to apply feature scaling via StandardScaler for 
 LBFGS for Linear Regression?

 It seems that your response is not scaled, which will cause issues in LBFGS. 
 Typically, people train Linear Regression with zero-mean/unit-variance 
 features and response, without training the intercept. Since the response is 
 zero-mean, the intercept will always be zero. When you convert the 
 coefficients from the scaled space back to the original space, the intercept 
 can be computed by w_0 = \bar{y} - \sum_n \bar{x}_n w_n, where \bar{x}_n is 
 the average of column n and \bar{y} is the average response.

 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai
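
 A hedged sketch of that conversion (the names yMean, yStd, featureMeans,
 featureStds, and weightsScaled are illustrative, not from this thread):

 // Standardized model: (y - yMean)/yStd = sum_n w'_n * (x_n - mean_n)/std_n.
 // Original-space weights: w_n = w'_n * yStd / std_n, and the intercept is
 // w_0 = yMean - sum_n w_n * mean_n, matching the formula above.
 val yMean = 10246.0; val yStd = 120.0      // assumed values
 val featureMeans = Array(14113.0, 1.0)     // assumed values
 val featureStds  = Array(1.5, 1.0)         // assumed values
 val weightsScaled = Array(0.9, 0.0)        // assumed values
 val wOriginal = weightsScaled.zipWithIndex.map { case (w, n) => w * yStd / featureStds(n) }
 val intercept = yMean - wOriginal.zipWithIndex.map { case (w, n) => w * featureMeans(n) }.sum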


 On Fri, Dec 12, 2014 at 10:49 AM, Bui, Tri tri@verizonwireless.com 
 wrote:
 Thanks for the confirmation.

 FYI, the code below works for a similar dataset; with the feature magnitude 
 changed, LBFGS converged to the right weights.

 For example, a time-sequential feature with values 1, 2, 3, 4, 5 would 
 generate the error, while the sequential feature 14111, 14112, 14113, 14115 
 would converge to the right weights. Why?

 Below is the code to apply StandardScaler() to the sample data 
 (10246.0,[14111.0,1.0]):

 val scaler1 = new StandardScaler().fit(train.map(x => x.features))
 val train1 = train.map(x => (x.label, scaler1.transform(x.features)))

 But I keep getting the error: value features is not a member of (Double, 
 org.apache.spark.mllib.linalg.Vector)

 Should my feature vector be .toInt instead of Double?

 Also, the org.apache.spark.mllib.linalg.Vector in the error should have an 
 s to match the imported org.apache.spark.mllib.linalg.Vectors

 Thanks
 Tri
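
 A minimal sketch of a fix for that error, assuming train is the
 (label, appendBias(features)) tuple RDD built in the code below: its elements
 are (Double, Vector) pairs rather than LabeledPoints, so select the vector
 with a tuple accessor instead of .features:

 // train: RDD[(Double, Vector)], so _._2 (or a case pattern) picks the vector.
 val scaler1 = new StandardScaler().fit(train.map(_._2))
 val train1 = train.map { case (label, features) => (label, scaler1.transform(features)) }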





 -Original Message-
 From: dbt...@dbtsai.com [mailto:dbt...@dbtsai.com]
 Sent: Friday, December 12, 2014 12:16 PM
 To: Bui, Tri
 Cc: user@spark.apache.org
 Subject: Re: Do I need to apply feature scaling via StandardScaler for 
 LBFGS for Linear Regression?

 You need to apply the StandardScaler yourself to help convergence.
 LBFGS just takes whatever objective function you provide without doing any 
 scaling. I would like to provide a LinearRegressionWithLBFGS that does the 
 scaling internally in the near future.

 Sincerely,

 DB Tsai
 ---
 My Blog: https://www.dbtsai.com
 LinkedIn: https://www.linkedin.com/in/dbtsai
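
 A minimal sketch of that manual scaling step, assuming the parsedata RDD of
 LabeledPoints from the code below; StandardScaler(true, true) gives
 zero-mean/unit-variance features:

 import org.apache.spark.mllib.feature.StandardScaler

 // Fit on the features only, scale each point, then append the bias term
 // after scaling so the bias column stays constant at 1.0.
 val scaler = new StandardScaler(true, true).fit(parsedata.map(_.features))
 val scaledTrain = parsedata
   .map(p => (p.label, MLUtils.appendBias(scaler.transform(p.features))))
   .cache()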


 On Fri, Dec 12, 2014 at 8:49 AM, Bui, Tri 
 tri@verizonwireless.com.invalid wrote:
 Hi,



 Trying to use LBFGS as the optimizer, do I need to implement feature 
 scaling via StandardScaler or does LBFGS do it by default?



 The following code generated the error “Failure again! Giving up and 
 returning, Maybe the objective is just poorly behaved ?”.



 val data = sc.textFile("file:///data/Train/final2.train")

 val parsedata = data.map { line =>
   val partsdata = line.split(',')
   LabeledPoint(partsdata(0).toDouble,
     Vectors.dense(partsdata(1).split(' ').map(_.toDouble)))
 }

 val train = parsedata.map(x => (x.label, MLUtils.appendBias(x.features))).cache()

 val numCorrections = 10
 val convergenceTol = 1e-4
 val maxNumIterations = 50
 val regParam = 0.1
 val initialWeightsWithIntercept = Vectors.dense(new Array[Double](2))

 val (weightsWithIntercept, loss) = LBFGS.runLBFGS(train,
   new LeastSquaresGradient(),
   new SquaredL2Updater(),
   numCorrections,
   convergenceTol,
   maxNumIterations,
   regParam,
   initialWeightsWithIntercept)



 Did I implement LBFGS for Linear Regression via “LeastSquaresGradient()”
 correctly?



 Thanks

 Tri
