Re: Fwd: Model weights of linear regression become abnormal values

2015-05-29 Thread Petar Zecevic


You probably need to scale the feature values in the data set so that they are 
all in comparable ranges, and shift them so that their means are at 0.


You can use a pyspark.mllib.feature.StandardScaler(True, True) object for 
that (the two flags turn on mean-centering and scaling to unit standard deviation).
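
For example, a rough sketch along those lines, reusing the parsedData RDD of 
LabeledPoints from the attached script (the other names - labels, features, 
scaler, scaledData - are just illustrative, and the zip-based re-pairing of 
labels with scaled features is one way to wire it up):

from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD

# Separate the labels from the raw feature vectors.
labels = parsedData.map(lambda lp: lp.label)
features = parsedData.map(lambda lp: lp.features)

# Fit a scaler that centers each feature to mean 0 and scales it to unit
# standard deviation. withMean=True requires dense vectors, which these are.
scaler = StandardScaler(withMean=True, withStd=True).fit(features)

# Re-pair each label with its scaled feature vector.
scaledData = labels.zip(scaler.transform(features)) \
                   .map(lambda lf: LabeledPoint(lf[0], lf[1]))

# Retrain on the scaled data and check the weights again.
model = LinearRegressionWithSGD.train(scaledData)
print(model.weights)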


On 28.5.2015. 6:08, Maheshakya Wijewardena wrote:


Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with 
the attached dataset. The code is attached. When I check the model 
weight vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]
But for some data sets, this problem does not occur. What might be the reason 
for this?
Is this an issue with the data I'm using or a bug?
Best regards.
--
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855








Fwd: Model weights of linear regression become abnormal values

2015-05-27 Thread Maheshakya Wijewardena
Hi,

I'm trying to use Spark's *LinearRegressionWithSGD* in PySpark with the
attached dataset. The code is attached. When I check the model weight
vector after training, it contains `nan` values.

[nan,nan,nan,nan,nan,nan,nan,nan]

But for some data sets, this problem does not occur. What might be the
reason for this?
Is this an issue with the data I'm using or a bug?

Best regards.

-- 
Pruthuvi Maheshakya Wijewardena
Software Engineer
WSO2 Lanka (Pvt) Ltd
Email: mahesha...@wso2.com
Mobile: +94711228855
6,148,72,35,0,336,627,50,1
1,85,66,29,0,266,351,31,0
8,183,64,0,0,233,672,32,1
1,89,66,23,94,281,167,21,0
0,137,40,35,168,431,2288,33,1
5,116,74,0,0,256,201,30,0
3,78,50,32,88,310,248,26,1
10,115,0,0,0,353,134,29,0
2,197,70,45,543,305,158,53,1
8,125,96,0,0,0,232,54,1
4,110,92,0,0,376,191,30,0
10,168,74,0,0,380,537,34,1
10,139,80,0,0,271,1441,57,0
1,189,60,23,846,301,398,59,1
5,166,72,19,175,258,587,51,1
7,100,0,0,0,300,484,32,1
0,118,84,47,230,458,551,31,1
7,107,74,0,0,296,254,31,1
1,103,30,38,83,433,183,33,0
1,115,70,30,96,346,529,32,1
3,126,88,41,235,393,704,27,0
import sys
from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
from numpy import array

# Load and parse data
def parse_point(line):
    values = [float(x) for x in line.split(',')]
    return LabeledPoint(values[0], values[1:])

sc = SparkContext(appName='LinearRegression')
# Add path to your dataset.
data = sc.textFile('dummy_data_sest.csv')
parsedData = data.map(parse_point)

# Build the model
model = LinearRegressionWithSGD.train(parsedData)

# Check model weight vector
print(model.weights)
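
For reference, LinearRegressionWithSGD.train also accepts optional iterations 
and step arguments; a smaller step size is another common way to keep SGD from 
diverging to nan on unscaled features. A minimal sketch, with illustrative 
(untuned) values rather than anything from the thread:

# Smaller learning rate and more iterations than the defaults
# (step=1.0, iterations=100); parsedData is the RDD built above.
model = LinearRegressionWithSGD.train(parsedData, iterations=200, step=0.01)
print(model.weights)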