Hi Saif, Please show your complete code and data that can make others help you diagnose the issue more efficiency if your data is not private. >From the log you pasted, I suspect you set mistake columns for "feature" and "label".
2015-12-03 4:50 GMT+08:00 <saif.a.ell...@wellsfargo.com>: > *Data:* > > +-------------------+--------------------+ > | label| features| > +-------------------+--------------------+ > |0.13271745268556925|[-0.2006809895664...| > |0.23956421080605234|[-0.0938342314459...| > |0.47464690691431843|[0.14124846466227...| > | 0.0941426858669834|[-0.2392557563850...| > |0.18127172833957172|[-0.1521267139124...| > | 0.4279981695794981|[0.09459972732745...| > |0.04648603521554342|[-0.2869124070364...| > | 0.4164836719056925|[0.08308522965365...| > |0.15519130823516833|[-0.1782071340168...| > |0.34583751349139175|[0.01243907123934...| > | 0.5732358988284585|[0.2398374565764162]| > |0.12352025893247957|[-0.2098781833195...| > | 0.672220700788423|[0.3388222585363807]| > |0.11796247818430779|[-0.2154359640677...| > |0.32647852580932724|[-0.0069199164427...| > |0.09211654339348248|[-0.2412818988585...| > | 0.4907542977669017|[0.15735585551485...| > | 0.3255888257160203|[-0.0078096165360...| > | 0.8542890157811815|[0.5208905735291393]| > | 0.1132558594215048|[-0.2201425828305...| > +-------------------+--------------------+ > only showing top 20 rows > > val model = lr.fit(data) > val predict_data = model.transform(data) > > +--------------------+-------------------+-------------------+ > | features| label| predicted_label| > +--------------------+-------------------+-------------------+ > |[-0.2006809895664...|0.13271745268556925|0.13271745268556925| > |[-0.0938342314459...|0.23956421080605234|0.23956421080605234| > |[0.14124846466227...|0.47464690691431843|0.47464690691431843| > |[-0.2392557563850...| 0.0941426858669834| 0.0941426858669834| > |[-0.1521267139124...|0.18127172833957172|0.18127172833957172| > |[0.09459972732745...| 0.4279981695794981| 0.4279981695794981| > |[-0.2869124070364...|0.04648603521554342| 0.0464860352155434| > |[0.08308522965365...| 0.4164836719056925| 0.4164836719056925| > |[-0.1782071340168...|0.15519130823516833|0.15519130823516833| > |[0.01243907123934...|0.34583751349139175|0.34583751349139175| > |[0.2398374565764162]| 0.5732358988284585| 0.5732358988284585| > |[-0.2098781833195...|0.12352025893247957|0.12352025893247959| > |[0.3388222585363807]| 0.672220700788423| 0.672220700788423| > |[-0.2154359640677...|0.11796247818430779|0.11796247818430777| > |[-0.0069199164427...|0.32647852580932724|0.32647852580932724| > |[-0.2412818988585...|0.09211654339348248|0.09211654339348246| > |[0.15735585551485...| 0.4907542977669017| 0.4907542977669017| > |[-0.0078096165360...| 0.3255888257160203| 0.3255888257160203| > |[0.5208905735291393]| 0.8542890157811815| 0.8542890157811815| > |[-0.2201425828305...| 0.1132558594215048|0.11325585942150479| > +--------------------+-------------------+-------------------+ > only showing top 20 rows > > model.weights > res49: org.apache.spark.mllib.linalg.Vector = [1.0] > > *if instead, I remove the intercept:* > > val zz = lrr.setFitIntercept(*false*).fit(vnt_data) > zz.transform(vnt_data).select(scnd_feat_col, scnd_lab_col, > scnd_pred_col).show > > +--------------------+-------------------+--------------------+ > | features| label| predicted_label| > +--------------------+-------------------+--------------------+ > |[-0.2006809895664...|0.13271745268556925|-0.20472873432747501| > |[-0.0938342314459...|0.23956421080605234|-0.09572687219665929| > |[0.14124846466227...|0.47464690691431843| 0.1440974526709132| > |[-0.2392557563850...| 0.0941426858669834|-0.24408155596148765| > |[-0.1521267139124...|0.18127172833957172|-0.15519511670726388| > |[0.09459972732745...| 0.4279981695794981| 0.09650780816515314| > |[-0.2869124070364...|0.04648603521554342|-0.29269944344167753| > |[0.08308522965365...| 0.4164836719056925| 0.0847610625453144| > |[-0.1782071340168...|0.15519130823516833| -0.1818015800809893| > |[0.01243907123934...|0.34583751349139175|0.012689967876592361| > |[0.2398374565764162]| 0.5732358988284585| 0.24467498907237623| > |[-0.2098781833195...|0.12352025893247957|-0.21411143590026604| > |[0.3388222585363807]| 0.672220700788423| 0.3456563190264363| > |[-0.2154359640677...|0.11796247818430779| -0.2197813173409589| > |[-0.0069199164427...|0.32647852580932724|-0.00705949147465...| > |[-0.2412818988585...|0.09211654339348248|-0.24614856582157998| > |[0.15735585551485...| 0.4907542977669017| 0.16052973033553486| > |[-0.0078096165360...| 0.3255888257160203|-0.00796713685963...| > |[0.5208905735291393]| 0.8542890157811815| 0.5313969602806332| > |[-0.2201425828305...| 0.1132558594215048|-0.22458286882001133| > +--------------------+-------------------+--------------------+ > only showing top 20 rows > > *makes much more sense* > > Thanks for the help, > saif > >