[jira] [Updated] (SPARK-21003) Spark Java Configuration : spark.jars.packages not working properly

2017-06-07 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-21003:
--
Summary: Spark Java Configuration : spark.jars.packages not working 
properly  (was: Spark Packages not working properly)

> Spark Java Configuration : spark.jars.packages not working properly
> ---
>
> Key: SPARK-21003
> URL: https://issues.apache.org/jira/browse/SPARK-21003
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, Spark Core
>Affects Versions: 2.1.0
> Environment: Ubuntu 16 standalone cluster.
>Reporter: Dulaj Rajitha
>
> I am unable to load Maven dependencies for Spark executors using the Spark
> configuration property "spark.jars.packages".
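For context, "spark.jars.packages" is resolved via Ivy when the application is launched, so in many setups it is only honored if it reaches spark-submit before the driver JVM starts; setting it programmatically on an already-launched application may be ignored. A hedged sketch of the usual workaround (the package coordinate below is an example, not taken from this report):

```shell
# Pass Maven coordinates at submit time; spark-submit resolves them via Ivy
# and ships the resulting jars to driver and executors.
spark-submit \
  --packages com.databricks:spark-csv_2.10:1.5.0 \
  --class ml.test.MyApp \
  --master spark://master:7077 \
  myapp.jar

# Equivalent entry in conf/spark-defaults.conf:
# spark.jars.packages  com.databricks:spark-csv_2.10:1.5.0
```

Either form applies the packages before the SparkContext is created, which is what the property requires.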



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21003) Spark Java Configuration : spark.jars.packages not working properly

2017-06-07 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-21003:
--
Description: I am unable to load Maven dependencies for Spark executors 
using the Spark configuration property "spark.jars.packages".  (was: I am 
unable to load maven dependencies for spark executors using Spark 
Configuration : "spark.jars.packages".)







[jira] [Created] (SPARK-21003) Spark Packages not working properly

2017-06-07 Thread Dulaj Rajitha (JIRA)
Dulaj Rajitha created SPARK-21003:
-

 Summary: Spark Packages not working properly
 Key: SPARK-21003
 URL: https://issues.apache.org/jira/browse/SPARK-21003
 Project: Spark
  Issue Type: Bug
  Components: Java API, Spark Core
Affects Versions: 2.1.0
 Environment: Ubuntu 16 standalone cluster.
Reporter: Dulaj Rajitha


I am unable to load Maven dependencies for Spark executors using the Spark 
configuration property "spark.jars.packages".






[jira] [Closed] (SPARK-16993) model.transform without label column in random forest regression

2016-08-16 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha closed SPARK-16993.
-
Resolution: Not A Problem

Not a bug in Spark.

> model.transform without label column in random forest regression
> 
>
> Key: SPARK-16993
> URL: https://issues.apache.org/jira/browse/SPARK-16993
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> I need to use a separate data set for prediction (not the training-data 
> split shown in the examples).
> But that data set does not have the label column, since the label is what 
> needs to be predicted.
> Yet model.transform reports that the label column is missing:
> org.apache.spark.sql.AnalysisException: cannot resolve 'label' given input 
> columns: [id,features,prediction]






[jira] [Comment Edited] (SPARK-16993) model.transform without label column in random forest regression

2016-08-16 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422597#comment-15422597
 ] 

Dulaj Rajitha edited comment on SPARK-16993 at 8/16/16 11:13 AM:
-

The issue is solved, and it was not a bug.
Thank you.
There was an error in my withColumn statement: I had used a column from the 
wrong DataFrame.


was (Author: dulajrajitha):
The issue is solved and that was not a bug.
Thank you.
There was a error in with column statement and I had used a column form a wrong 
data-frame.
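The mistake described above is easy to reproduce: a Column resolved against one DataFrame cannot be used in withColumn on another, and the analyzer fails with exactly the kind of AnalysisException quoted in this issue. A minimal hedged sketch, using the Spark 1.x/2.0 Java API from this thread; the DataFrame names are hypothetical:

```java
import org.apache.spark.sql.DataFrame;

public class WithColumnPitfall
{
    // 'predictions' has columns [id, features, prediction];
    // 'labeled' is a different DataFrame that carries the 'label' column.
    static DataFrame addLabel( DataFrame predictions, DataFrame labeled )
    {
        // Wrong: labeled.col("label") is resolved against 'labeled', not
        // against 'predictions', so the analyzer throws
        // org.apache.spark.sql.AnalysisException: cannot resolve 'label' ...
        // DataFrame broken = predictions.withColumn( "label", labeled.col( "label" ) );

        // Right: join the two frames on a shared key so that 'label'
        // becomes resolvable in the resulting DataFrame.
        return predictions.join( labeled, "id" );
    }
}
```

Only columns belonging to the DataFrame being transformed (or brought in via a join) can be referenced in withColumn.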







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-16 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422597#comment-15422597
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

The issue is solved, and it was not a bug.
Thank you.
There was an error in my withColumn statement: I had used a column from the 
wrong DataFrame.







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-11 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417847#comment-15417847
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

The thing is, if I add a dummy column as the label column, the process goes 
fine.
I could not continue without adding a dummy label column to the data set 
that needs prediction.







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-11 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417144#comment-15417144
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

Here is the scenario.
My training data set has features and label columns.
Using it, I train and get a model. (I also run an evaluation on a split of 
the training data.)
With that model I need to predict for a data set that has only id and 
features columns.
But when I use this second data frame, I get the error.
So how do we use the same model on a different data frame for prediction 
after evaluation?
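As the thread's resolution later confirms, model.transform itself only reads the features column; the label column is needed only by downstream evaluators such as RegressionEvaluator. A hedged sketch of the intended flow, assuming hypothetical DataFrames trainData (id, features, label) and newData (id, features):

```java
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.ml.regression.RandomForestRegressor;
import org.apache.spark.sql.DataFrame;

public class PredictWithoutLabel
{
    static DataFrame predict( DataFrame trainData, DataFrame newData )
    {
        // Fit on the labeled training frame (columns: id, features, label).
        RandomForestRegressor regressor = new RandomForestRegressor()
                .setFeaturesCol( "features" ).setLabelCol( "label" );
        RandomForestRegressionModel model = regressor.fit( trainData );

        // transform() only reads the features column, so a frame with just
        // (id, features) works; no dummy label column is needed. It appends
        // a 'prediction' column to the input frame.
        return model.transform( newData );
    }
}
```

If an AnalysisException about the label column still appears here, it usually comes from user code (for example a withColumn or evaluator referencing 'label'), not from transform itself.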







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-10 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415102#comment-15415102
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

Is there a method to do the prediction for non-evaluation purposes (just 
predictions)?







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-10 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415081#comment-15415081
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

I do not want to evaluate. I just need to predict using the model I got from 
the regressor.fit(dataframe) method.







[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression

2016-08-10 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415063#comment-15415063
 ] 

Dulaj Rajitha commented on SPARK-16993:
---

When using the RandomForestRegressor,
I trained on a data frame with the label column and got a model
by: model = regressor.fit(trainData)

But my test data does not have a label column (since this is the column I 
need predicted).
Therefore I got an error when transforming:
model.transform(test)







[jira] [Updated] (SPARK-16993) model.transform without label column in random forest regression

2016-08-10 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-16993:
--
Description: 
I need to use a separate data set for prediction (not the training-data 
split shown in the examples).
But that data set does not have the label column, since the label is what 
needs to be predicted.
Yet model.transform reports that the label column is missing:

org.apache.spark.sql.AnalysisException: cannot resolve 'label' given input 
columns: [id,features,prediction]

  was:
I need to use a separate data set to prediction (Not as show in example's 
training data split).
But those data do not have the label column. (Since these data are the data 
that needs to be predict the label).
but model.transform is informing label column is missing.








[jira] [Updated] (SPARK-16993) model.transform without label column in random forest regression

2016-08-10 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-16993:
--
Summary: model.transform without label column in random forest regression  
(was: model.transform withot label column)







[jira] [Created] (SPARK-16993) model.transform withot label column

2016-08-10 Thread Dulaj Rajitha (JIRA)
Dulaj Rajitha created SPARK-16993:
-

 Summary: model.transform withot label column
 Key: SPARK-16993
 URL: https://issues.apache.org/jira/browse/SPARK-16993
 Project: Spark
  Issue Type: Question
  Components: Java API, ML
Reporter: Dulaj Rajitha


I need to use a separate data set for prediction (not the training-data 
split shown in the examples).
But that data set does not have the label column, since the label is what 
needs to be predicted.
Yet model.transform reports that the label column is missing.






[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-30 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219298#comment-15219298
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

Could you please suggest a solution? The training data set I used might 
have some problem, and I cannot work out what it is.

> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I used the data set from the GitHub example, I got proper 
> predictions, but when I used my own data set it does not predict well (it 
> has a large RMSE).
> I used a cross validator for ALS (in Spark ML), and here are the best 
> model parameters:
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I used the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-29 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 9:28 AM:


I changed only this line, which switches the data set from one to the 
other (line 26):
final static String trainDataFile = dataPathPrefix + "train.csv";


was (Author: dulajrajitha):
I changed only this line which will chane the data set form one to 
another..(Line 26)
final static String trainDataFile = dataPathPrefix + "train.csv";







[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-29 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 9:28 AM:


This is the Java code I used:
https://drive.google.com/file/d/0BzDPzVBAaXCYTkRFZHhJNEhpOFE/view?usp=sharing



was (Author: dulajrajitha):
This is the Java code I used:

package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest
{
    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

    // this is the only line I changed to run the two tests (different data sets)
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args )
    {
        final int folds = 2;
        final int[] ranks = { 10, 20 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

        prepareDataFrames();

        // Build the recommendation model using ALS on the training data
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS )
                .setEvaluator( evaluator ).setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // save model
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // load model
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // predict
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();

        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

        jsc.stop();
    }

    private static void prepareDataFrames()
    {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross validation Model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );

        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar( dataPathPrefix + "commons-csv-1.2.jar" );

        final SQLContext sqlContext = new SQLContext( jsc );

        DataFrame tst = sqlContext.read().format( "com.databricks.spark.csv" )
                .option( "inferSchema", "true" ).option( "header", "true" ).load( testDataFile );
        test = tst.withColumn( "confidence", tst.col( "confidence" ).cast( DataTypes.DoubleType ) 

[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-28 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215393#comment-15215393
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

The problem is: when I switch to my own data set, why does the 
best-parameter matrix have NaN fields?
PS: I am testing on a random split of the same training set (line 75).
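For reference, NaN cross-validation metrics with ALS typically mean that some validation folds contain users or items never seen during training, so ALS predicts NaN for them and the fold's RMSE becomes NaN. Spark 2.2 added a coldStartStrategy parameter for exactly this; a hedged sketch reusing the column names from the code in this thread (note this setter does not exist in the Spark 1.6/2.1 versions discussed here, where NaN rows must be filtered out manually):

```java
import org.apache.spark.ml.recommendation.ALS;

public class ColdStartExample
{
    static ALS build()
    {
        // "drop" removes rows whose prediction is NaN (users/items unseen
        // during training) before the evaluator computes RMSE, so the
        // cross-validation metrics stay finite.
        return new ALS()
                .setImplicitPrefs( true )
                .setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" )
                .setColdStartStrategy( "drop" );
    }
}
```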







[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-28 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 4:15 AM:


I changed only this line, which switches the data set from one to the 
other (line 26):
final static String trainDataFile = dataPathPrefix + "train.csv";


was (Author: dulajrajitha):
I changed only this line which will chane the data set form one to another..
final static String trainDataFile = dataPathPrefix + "train.csv";







[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-28 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

I changed only this line, which switches the data set from one to the other:
final static String trainDataFile = dataPathPrefix + "train.csv";

> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I used data-set in the git-hub example, I get proper predictions. But 
> when I used my data set It does not predict well. (I has a large RMSE). 
> I used cross validator for ALS  (in Spark ML) and here are the best model 
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I use the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-28 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215388#comment-15215388
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

This is the code:
https://drive.google.com/file/d/0BzDPzVBAaXCYTkRFZHhJNEhpOFE/view?usp=sharing


> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I use the data set from the GitHub example, I get proper predictions, but
> with my own data set it does not predict well (it has a large RMSE).
> I used the cross validator for ALS (in Spark ML) and here are the best model
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I use the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 4:23 AM:


This is the Java code I used:
package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest
{
    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

    // this is the only line I changed to run the two tests (different data sets)
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args )
    {
        final int folds = 2;
        final int[] ranks = { 10, 20 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

        prepareDataFrames();

        // Build the recommendation model using ALS on the training data
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS ).setEvaluator( evaluator )
                .setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // save model
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // load model
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // predict
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();

        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

        jsc.stop();
    }

    private static void prepareDataFrames()
    {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross-validation model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );

        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar( dataPathPrefix + "commons-csv-1.2.jar" );

        final SQLContext sqlContext = new SQLContext( jsc );

        DataFrame tst = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )
                .option( "header", "true" ).load( testDataFile );
        test = tst.withColumn( "confidence", tst.col( "confidence" ).cast( DataTypes.DoubleType ) ).cache();

        DataFrame trn = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )

[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 4:22 AM:


This is the Java code I used:
package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest
{
    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

    // this is the only line I changed to run the two tests (different data sets)
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args )
    {
        final int folds = 2;
        final int[] ranks = { 10, 20 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

        prepareDataFrames();

        // Build the recommendation model using ALS on the training data
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS ).setEvaluator( evaluator )
                .setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // save model
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // load model
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // predict
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();

        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

        jsc.stop();
    }

    private static void prepareDataFrames()
    {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross-validation model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );

        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar( dataPathPrefix + "commons-csv-1.2.jar" );

        final SQLContext sqlContext = new SQLContext( jsc );

        DataFrame tst = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )
                .option( "header", "true" ).load( testDataFile );
        test = tst.withColumn( "confidence", tst.col( "confidence" ).cast( DataTypes.DoubleType ) ).cache();

        DataFrame trn = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )

[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 4:20 AM:


This is the Java code I used:
package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest
{
    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

    // this is the only line I changed to run the two tests (different data sets)
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args )
    {
        final int folds = 2;
        final int[] ranks = { 120, 500 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

        prepareDataFrames();

        // Build the recommendation model using ALS on the training data
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS ).setEvaluator( evaluator )
                .setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // save model
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // load model
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // predict
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();

        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

        jsc.stop();
    }

    private static void prepareDataFrames()
    {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross-validation model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );

        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar( dataPathPrefix + "commons-csv-1.2.jar" );

        final SQLContext sqlContext = new SQLContext( jsc );

        DataFrame tst = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )
                .option( "header", "true" ).load( testDataFile );
        test = tst.withColumn( "confidence", tst.col( "confidence" ).cast( DataTypes.DoubleType ) ).cache();

        DataFrame trn = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )

[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213779#comment-15213779
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

The only thing I changed was the data set (train.csv to movies_data.csv).
Here are the data sets:

train.csv : 
https://drive.google.com/file/d/0BzDPzVBAaXCYb3hBVnh2bndMbFE/view?usp=sharing

movies_data.csv : 
https://drive.google.com/file/d/0BzDPzVBAaXCYT2xlWkdsNERKY1E/view?usp=sharing

> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I use the data set from the GitHub example, I get proper predictions, but
> with my own data set it does not predict well (it has a large RMSE).
> I used the cross validator for ALS (in Spark ML) and here are the best model
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I use the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213767#comment-15213767
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

When I used train.csv (my data set), the best param matrix was:

16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
{ als_c911c0e183a3-alpha: 0.02, als_c911c0e183a3-rank: 500, 
als_c911c0e183a3-regParam: 0.03 } 


-
But if I use the movie data set as the training set, the best param matrix is:

16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 
2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 
2.057853990227443)
16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
{
als_31a605e7717b-alpha: 0.02,
als_31a605e7717b-rank: 1,
als_31a605e7717b-regParam: 0.02
}
16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
1.9457234533860048.
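The contrast above (all-NaN metrics on one data set, finite metrics on the other) is the classic symptom of ALS cold start during cross-validation: if a hold-out split contains a user or item absent from that fold's training split, ALS predicts NaN for it, and a single NaN drives the fold's RMSE to NaN. Later Spark releases added a `coldStartStrategy = "drop"` option on ALS for this; on older versions the NaN rows can be filtered before evaluation. A minimal, Spark-free sketch (the class and method names here are hypothetical, not from the code in this thread) showing both behaviours:

```java
public class RmseNanDemo
{
    // RMSE over (label, prediction) pairs. A single NaN prediction
    // (e.g. an ALS cold-start user or item) makes the whole metric NaN
    // unless such rows are dropped first.
    static double rmse( double[] labels, double[] preds, boolean dropNaN )
    {
        double sum = 0.0;
        int n = 0;
        for ( int i = 0; i < labels.length; i++ )
        {
            if ( dropNaN && Double.isNaN( preds[i] ) )
                continue; // roughly what coldStartStrategy = "drop" does in later Spark releases
            double d = labels[i] - preds[i];
            sum += d * d;
            n++;
        }
        return Math.sqrt( sum / n );
    }

    public static void main( String[] args )
    {
        double[] labels = { 1.0, 2.0, 3.0 };
        double[] preds = { 1.1, Double.NaN, 2.8 }; // middle row: a cold-start NaN
        System.out.println( rmse( labels, preds, false ) ); // NaN
        System.out.println( rmse( labels, preds, true ) );  // finite
    }
}
```

If the movie data set has densely rated users and items while train.csv has many one-off users or items, this alone would explain NaN metrics on one data set but not the other.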


> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I use the data set from the GitHub example, I get proper predictions, but
> with my own data set it does not predict well (it has a large RMSE).
> I used the cross validator for ALS (in Spark ML) and here are the best model
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I use the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213767#comment-15213767
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 4:00 AM:


When I used train.csv (my data set), the best param matrix was:

16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
{ als_c911c0e183a3-alpha: 0.02, als_c911c0e183a3-rank: 500, 
als_c911c0e183a3-regParam: 0.03 } 


-
But if I use the movie data set as the training set, the best param matrix is:

16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 
2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 
2.057853990227443)
16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
{
als_31a605e7717b-alpha: 0.02,
als_31a605e7717b-rank: 1,
als_31a605e7717b-regParam: 0.02
}
16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
1.9457234533860048.



was (Author: dulajrajitha):
When I used train.csv (om data set)
The best param matrix is like

16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
{ als_c911c0e183a3-alpha: 0.02, als_c911c0e183a3-rank: 500, 
als_c911c0e183a3-regParam: 0.03 } 


-
But If i use movie data set as training set
Best param matrix will be like

16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 
2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 
2.057853990227443)
16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
{
als_31a605e7717b-alpha: 0.02,
als_31a605e7717b-rank: 1,
als_31a605e7717b-regParam: 0.02
}
16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
1.9457234533860048.


> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I use the data set from the GitHub example, I get proper predictions, but
> with my own data set it does not predict well (it has a large RMSE).
> I used the cross validator for ALS (in Spark ML) and here are the best model
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I use the movie data set, it gives proper parameter values, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.






[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 3:56 AM:


This is the Java code I used:
package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

/**
 * @author Dulaj Pathirana - Mar 14, 2016
 */
public class ALSImplicitTest
{
    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args )
    {
        final int folds = 2;
        final int[] ranks = { 120, 500 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

        prepareDataFrames();

        // Build the recommendation model using ALS on the training data

        // numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
        // rank is the number of latent factors in the model.
        // iterations is the number of iterations to run.
        // lambda specifies the regularization parameter in ALS.
        // implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data.
        // alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations.

        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS ).setEvaluator( evaluator )
                .setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // save model
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // load model
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // predict
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();

        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

        jsc.stop();
    }

    private static void prepareDataFrames()
    {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross-validation model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );

        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );

[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-27 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

This is the Java code I used:
package it.codegen.rnd.ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

/**
 * @author Dulaj Pathirana - Mar 14, 2016
 */
public class ALSImplicitTest
{
static JavaSparkContext jsc;

final static String sparkMaster = "spark://192.168.1.71:7077";
final static String dataPathPrefix = "hdfs://192.168.1.71/res/";

final static String trainDataFile = dataPathPrefix + "train.csv";
final static String testDataFile = dataPathPrefix + "test.csv";
final static String modelPath = dataPathPrefix + "als_implicit1.model";
final static String sparkLogDir = dataPathPrefix + "logs/";
static JavaSparkContext jsc;
static DataFrame test;
static DataFrame train;

public static void main( String[] args )
{
    final int folds = 2;
    final int[] ranks = { 120, 500 };
    double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;

    final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
    final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );

    prepareDataFrames();

    // Build the recommendation model using ALS on the training data.
    // numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
    // rank is the number of latent factors in the model.
    // iterations is the number of iterations to run.
    // lambda specifies the regularization parameter in ALS.
    // implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data.
    // alpha is a parameter applicable to the implicit feedback variant of ALS that governs the baseline confidence in preference observations.

    ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
            .setRatingCol( "confidence" ).setPredictionCol( "prediction" );

    ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
            .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();

    RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
            .setLabelCol( "confidence" ).setPredictionCol( "prediction" );

    CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS )
            .setEvaluator( evaluator ).setEstimatorParamMaps( paramMaps ).setNumFolds( folds );

    CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

    // Save the best model.
    ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
    alsModel.write().overwrite().saveImpl( modelPath );

    // Load the model back.
    ALSModel bestModel = ALSModel.read().load( modelPath );

    // Predict on a held-out split and evaluate.
    DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
    DataFrame predictions = predictDf
            .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
            .withColumn( "prediction", predictDf.col( "prediction" ) );
    predictDf.show();

    System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );

    jsc.stop();
}
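The prepareDoubleParams helper is not included in the fragment. From its call sites (a base value, the tuning interval, and a count), it presumably builds a small grid of candidate values; the following is a hypothetical reconstruction, not the reporter's actual code:

```java
// Hypothetical reconstruction of the reporter's prepareDoubleParams helper,
// inferred from its call sites: builds `count` grid values starting at
// `base`, stepped by `interval`.
static double[] prepareDoubleParams( double base, double interval, int count )
{
    double[] values = new double[count];
    for ( int i = 0; i < count; i++ )
    {
        values[i] = base + i * interval;
    }
    return values;
}
```

With count 1 (as in the listing above) this yields a single-element grid, which would make the alpha and regParam grids trivial.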

private static void prepareDataFrames()
{
    final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross validation Model" )
            .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
            .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
    jsc = new JavaSparkContext( conf );

    jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
    jsc.addJar( 
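The listing breaks off mid-call here. For the spark.jars.packages problem in the issue title, the manual addJar calls above have a declarative counterpart; the following is a configuration sketch only (the Maven coordinate is illustrative), not a confirmed fix:

```java
// Configuration sketch: resolving the CSV reader from Maven coordinates
// instead of shipping the jar by hand. Note that spark.jars.packages is
// normally processed by spark-submit at launch time, so setting it on a
// SparkConf inside an already-launched driver can silently do nothing --
// which may be exactly the behavior reported in SPARK-21003.
SparkConf conf = new SparkConf()
        .setAppName( "ALS-Implicit with cross validation Model" )
        .setMaster( sparkMaster )
        .set( "spark.jars.packages", "com.databricks:spark-csv_2.10:1.3.0" );
JavaSparkContext jsc = new JavaSparkContext( conf );
```

The launch-time equivalent is `spark-submit --packages com.databricks:spark-csv_2.10:1.3.0 ...`, which is the route the Spark documentation describes for Maven dependency resolution.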

[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211876#comment-15211876
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

No, I added the same headers for both data sets.
Is there any other way it can become NaN in this method?

> My dataset does not provide proper predictions in ALS
> -
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
>  Issue Type: Question
>  Components: Java API, ML
>Reporter: Dulaj Rajitha
>
> When I used the dataset from the GitHub example, I got proper predictions, but
> when I used my own dataset it did not predict well (I got a large RMSE).
> I used a cross validator for ALS (in Spark ML); here are the best model
> parameters.
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
> But when I used the movie dataset, it gave proper values for the parameters, as below:
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: 
> WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 
> 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 
> 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 
> 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 
> 1.9457234533860048.
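The all-NaN metrics quoted above have a common cause in ALS cross-validation: a random validation fold can contain users or items that never appear in the corresponding training fold, so ALS has no learned factors for them and predicts NaN, and a single NaN then poisons the averaged RMSE. A minimal sketch of that propagation:

```java
// A single NaN prediction propagates through the squared-error sum,
// so the whole RMSE becomes NaN.
static double rmse( double[] labels, double[] predictions )
{
    double sum = 0.0;
    for ( int i = 0; i < labels.length; i++ )
    {
        double err = labels[i] - predictions[i];
        sum += err * err;
    }
    return Math.sqrt( sum / labels.length );
}
```

In later Spark versions, ALS.setColdStartStrategy("drop") removes such rows before evaluation; on the 1.6-era API used here, filtering NaN predictions before calling the evaluator (e.g. with DataFrame.na().drop()) has a similar effect.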



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211794#comment-15211794
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

No, I think you've understood my problem incorrectly.
When I use the dataset provided with the example, the predictions are valid,
but when I change the dataset to mine, it gives invalid predictions
(not just once; in two tests).
PS: I fit and transformed random splits of the same dataset.




[jira] [Updated] (SPARK-14153) My dataset does not provide proper prdictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-14153:
--
Summary: My dataset does not provide proper prdictions in ALS  (was: My 
dataset does not provide proer prdictions in ALS)




[jira] [Updated] (SPARK-14153) My dataset does not provide proper predictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha updated SPARK-14153:
--
Summary: My dataset does not provide proper predictions in ALS  (was: My 
dataset does not provide proper prdictions in ALS)




[jira] [Commented] (SPARK-14153) My dataset does not provide proer prdictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211503#comment-15211503
 ] 

Dulaj Rajitha commented on SPARK-14153:
---

Here is my dataset URL:
https://drive.google.com/file/d/0BzDPzVBAaXCYb3hBVnh2bndMbFE/view?usp=sharing 




[jira] [Created] (SPARK-14153) My dataset does not provide proer prdictions in ALS

2016-03-25 Thread Dulaj Rajitha (JIRA)
Dulaj Rajitha created SPARK-14153:
-

 Summary: My dataset does not provide proer prdictions in ALS
 Key: SPARK-14153
 URL: https://issues.apache.org/jira/browse/SPARK-14153
 Project: Spark
  Issue Type: Question
  Components: Java API, ML
Reporter: Dulaj Rajitha





[jira] [Closed] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS

2016-03-23 Thread Dulaj Rajitha (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dulaj Rajitha closed SPARK-14093.
-
   Resolution: Fixed
Fix Version/s: 1.6.0

I was able to solve the issue by using the ALSModel.write().saveImpl( hdfsPath )
method to save to HDFS.
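A sketch of that workaround (it assumes the namenode at 192.168.1.71 is reachable, and that write().saveImpl(...) is callable from application code, as the reporter found on 1.6.0; this is not runnable without a Spark cluster):

```java
// Sketch of the reporter's workaround: save through the model's MLWriter,
// keeping the hdfs:// scheme on the path, then load the model back.
String hdfsModelPath = "hdfs://192.168.1.71/res/als.model";
alsModel.write().overwrite().saveImpl( hdfsModelPath );
ALSModel restored = ALSModel.read().load( hdfsModelPath );
```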

> org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with 
> HDFS
> 
>
> Key: SPARK-14093
> URL: https://issues.apache.org/jira/browse/SPARK-14093
> Project: Spark
>  Issue Type: Bug
>  Components: Java API, ML
>Affects Versions: 1.6.0
>Reporter: Dulaj Rajitha
> Fix For: 1.6.0
>
>
> ALSModel.save(path) is not working for HDFS paths and it gives  
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://192.168.1.71/res/als.model, expected: file:/// 






[jira] [Commented] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS

2016-03-23 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209724#comment-15209724
 ] 

Dulaj Rajitha commented on SPARK-14093:
---

I was able to solve the issue by using the ALSModel.write().saveImpl( hdfsPath )
method to save to HDFS.




[jira] [Commented] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS

2016-03-23 Thread Dulaj Rajitha (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208078#comment-15208078
 ] 

Dulaj Rajitha commented on SPARK-14093:
---

I'm running Spark in standalone mode, and I need to save the trained model to
HDFS and load it back. The ALSModel.save method gives this error:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.1.71/res/als.model, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:130)
at org.apache.spark.ml.recommendation.ALSModel.save(ALS.scala:182)
at it.codegen.rnd.ml.test.ALSImplicitTest.main(ALSImplicitTest.java:107)
.
When I instead used jsc.parallelize( alsModels ).saveAsObjectFile( modelPath ),
I could not load back the exact model that I had saved.
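The "Wrong FS" message comes from Hadoop's FileSystem.checkPath, visible in the stack trace above: MLWriter.save resolved the path against the default (local) filesystem, whose file:/// scheme does not match the hdfs:// scheme of the model path. The mismatch itself is a plain URI scheme comparison:

```java
// The default FileSystem here was the local one (file:///); Hadoop's
// checkPath rejects any path whose scheme differs, which surfaces as
// "Wrong FS: hdfs://... , expected: file:///".
java.net.URI modelUri = java.net.URI.create( "hdfs://192.168.1.71/res/als.model" );
java.net.URI defaultFs = java.net.URI.create( "file:///" );
boolean schemeMatches = defaultFs.getScheme().equals( modelUri.getScheme() );
// schemeMatches is false, hence the IllegalArgumentException above.
```

Configuring fs.defaultFS to point at the HDFS namenode, or going through a writer that keeps the full hdfs:// URI (as the reporter eventually did), avoids the mismatch.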




[jira] [Created] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS

2016-03-23 Thread Dulaj Rajitha (JIRA)
Dulaj Rajitha created SPARK-14093:
-

 Summary: org.apache.spark.ml.recommendation.ALSModel.save method 
cannot be used with HDFS
 Key: SPARK-14093
 URL: https://issues.apache.org/jira/browse/SPARK-14093
 Project: Spark
  Issue Type: Bug
  Components: Java API, ML
Affects Versions: 1.6.0
Reporter: Dulaj Rajitha

