[jira] [Updated] (SPARK-21003) Spark Java Configuration : spark.jars.packages not working properly
[ https://issues.apache.org/jira/browse/SPARK-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dulaj Rajitha updated SPARK-21003:
----------------------------------
Summary: Spark Java Configuration : spark.jars.packages not working properly (was: Spark Packages not working properly)

> Spark Java Configuration : spark.jars.packages not working properly
>
> Key: SPARK-21003
> URL: https://issues.apache.org/jira/browse/SPARK-21003
> Project: Spark
> Issue Type: Bug
> Components: Java API, Spark Core
> Affects Versions: 2.1.0
> Environment: Ubuntu 16 standalone cluster
> Reporter: Dulaj Rajitha
>
> I am unable to load Maven dependencies for Spark executors using the Spark configuration "spark.jars.packages".

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21003) Spark Java Configuration : spark.jars.packages not working properly
[ https://issues.apache.org/jira/browse/SPARK-21003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dulaj Rajitha updated SPARK-21003:
----------------------------------
Description: I am unable to load Maven dependencies for Spark executors using SparkConfiguration: "spark.jars.packages". (was: I am unable to load maven dependencies for spark executors using Spark Configuration : "spark.jars.packages".)
[jira] [Created] (SPARK-21003) Spark Packages not working properly
Dulaj Rajitha created SPARK-21003:
----------------------------------

Summary: Spark Packages not working properly
Key: SPARK-21003
URL: https://issues.apache.org/jira/browse/SPARK-21003
Project: Spark
Issue Type: Bug
Components: Java API, Spark Core
Affects Versions: 2.1.0
Environment: Ubuntu 16 standalone cluster
Reporter: Dulaj Rajitha

I am unable to load Maven dependencies for Spark executors using the Spark configuration "spark.jars.packages".
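For context on the report above: spark.jars.packages is resolved by spark-submit before the application's own code runs, so setting it programmatically on an already-launched driver typically has no effect, which matches the symptom described. A sketch of the launch-time equivalents; the package coordinate, class name, and jar path are placeholders, not taken from this issue:

```shell
# Resolve a Maven coordinate at launch time (coordinate is a placeholder):
spark-submit \
  --packages com.databricks:spark-csv_2.10:1.3.0 \
  --master spark://192.168.1.71:7077 \
  --class my.app.Main my-app.jar

# Or persistently, in conf/spark-defaults.conf:
# spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0
```

Both forms download the coordinate (and its transitive dependencies) from Maven Central and distribute the jars to the driver and executors.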
[jira] [Closed] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dulaj Rajitha closed SPARK-16993.
---------------------------------
Resolution: Not A Problem

Not a bug in Spark.

> model.transform without label column in random forest regression
>
> Key: SPARK-16993
> URL: https://issues.apache.org/jira/browse/SPARK-16993
> Project: Spark
> Issue Type: Question
> Components: Java API, ML
> Reporter: Dulaj Rajitha
>
> I need to use a separate data set for prediction (not the example's training-data split), but that data set does not have the label column, since the label is what needs to be predicted. However, model.transform reports that the label column is missing:
> org.apache.spark.sql.AnalysisException: cannot resolve 'label' given input columns: [id,features,prediction]
[jira] [Comment Edited] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422597#comment-15422597 ]

Dulaj Rajitha edited comment on SPARK-16993 at 8/16/16 11:13 AM:
-----------------------------------------------------------------
The issue is solved; it was not a bug. Thank you. There was an error in a withColumn statement: I had used a column from the wrong DataFrame.

was (Author: dulajrajitha):
The issue is solved and that was not a bug. Thank you. There was a error in with column statement and I had used a column form a wrong data-frame.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422597#comment-15422597 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
The issue is solved; it was not a bug. Thank you. There was an error in a withColumn statement: I had used a column from the wrong DataFrame.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417847#comment-15417847 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
But the thing is, if I add a dummy column as the label column, the process runs fine. I could not continue without adding a dummy label column to the data set that needs prediction.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417144#comment-15417144 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
Here is the scenario. My training data set has a features and a label column. Using that, I train and get a model. (I also do an evaluation using a split of the training data.) Using that model, I need to predict for a data set which has only an id and a features column, but when using this second DataFrame I get the error. So how do we use the same model on a different DataFrame for prediction after evaluation?
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415102#comment-15415102 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
Is there a method to do the prediction for non-evaluation purposes (just predictions)?
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415081#comment-15415081 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
I do not want to evaluate. I just need to predict using the model I got from the regressor.fit(dataframe) method.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415063#comment-15415063 ]

Dulaj Rajitha commented on SPARK-16993:
---------------------------------------
When using the RandomForestRegressor, I trained using a DataFrame with the label column and got a model:

model = regressor.fit(trainData)

But my test data does not have a label column (since this is the column I need predicted). Therefore, when transforming, I got an error:

model.transform(test)
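For reference, transform() on a fitted regression model only reads the features column; the label column set on the estimator is used during fit() only, so no label is needed at prediction time. A minimal sketch of the flow described in this comment, under the assumption of "features"/"label"/"id" column names and the 1.6-era DataFrame API used elsewhere in this thread:

```java
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.ml.regression.RandomForestRegressor;
import org.apache.spark.sql.DataFrame;

public class PredictWithoutLabel {

    // Train on a DataFrame with ["features", "label"], then predict on a
    // DataFrame with only ["id", "features"]: transform() appends a
    // "prediction" column and never reads "label".
    static DataFrame predict(DataFrame trainData, DataFrame unlabeledData) {
        RandomForestRegressor regressor = new RandomForestRegressor()
                .setFeaturesCol("features")
                .setLabelCol("label"); // consulted only by fit()
        RandomForestRegressionModel model = regressor.fit(trainData);
        return model.transform(unlabeledData); // no label column required here
    }
}
```

Per the closing comment on this issue, the AnalysisException reported here came from a withColumn statement referencing a column of the wrong DataFrame, not from transform() itself.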
[jira] [Updated] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dulaj Rajitha updated SPARK-16993:
----------------------------------
Description:
I need to use a separate data set for prediction (not the example's training-data split), but that data set does not have the label column, since the label is what needs to be predicted. However, model.transform reports that the label column is missing:
org.apache.spark.sql.AnalysisException: cannot resolve 'label' given input columns: [id,features,prediction]

was:
I need to use a separate data set to prediction (Not as show in example's training data split). But those data do not have the label column. (Since these data are the data that needs to be predict the label). but model.transform is informing label column is missing.
[jira] [Updated] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dulaj Rajitha updated SPARK-16993:
----------------------------------
Summary: model.transform without label column in random forest regression (was: model.transform withot label column)
[jira] [Created] (SPARK-16993) model.transform withot label column
Dulaj Rajitha created SPARK-16993:
----------------------------------

Summary: model.transform withot label column
Key: SPARK-16993
URL: https://issues.apache.org/jira/browse/SPARK-16993
Project: Spark
Issue Type: Question
Components: Java API, ML
Reporter: Dulaj Rajitha

I need to use a separate data set for prediction (not the example's training-data split), but that data set does not have the label column, since the label is what needs to be predicted. However, model.transform reports that the label column is missing.
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219298#comment-15219298 ]

Dulaj Rajitha commented on SPARK-14153:
---------------------------------------
Will you please give me a solution? The training data set I used might have some problem, and I cannot understand what it is.

> My dataset does not provide proper predictions in ALS
>
> Key: SPARK-14153
> URL: https://issues.apache.org/jira/browse/SPARK-14153
> Project: Spark
> Issue Type: Question
> Components: Java API, ML
> Reporter: Dulaj Rajitha
>
> When I used the data set in the GitHub example, I get proper predictions, but when I used my own data set it does not predict well (it has a large RMSE).
> I used a cross validator for ALS (in Spark ML) and here are the best model parameters:
>
> 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
> 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
> {
>   als_c911c0e183a3-alpha: 0.02,
>   als_c911c0e183a3-rank: 500,
>   als_c911c0e183a3-regParam: 0.03
> }
>
> But when I used the movie data set, it gives proper values for the parameters, as below:
>
> 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 2.057853990227443)
> 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
> {
>   als_31a605e7717b-alpha: 0.02,
>   als_31a605e7717b-rank: 1,
>   als_31a605e7717b-regParam: 0.02
> }
> 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 1.9457234533860048.
[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390 ]

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 9:28 AM:
----------------------------------------------------------------
I changed only this line, which will change the data set from one to another (line 26):

final static String trainDataFile = dataPathPrefix + "train.csv";

was (Author: dulajrajitha):
I changed only this line which will chane the data set form one to another..(Line 26) final static String trainDataFile = dataPathPrefix + "train.csv";
[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765 ]

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 9:28 AM:
----------------------------------------------------------------
This is the Java code I used:
https://drive.google.com/file/d/0BzDPzVBAaXCYTkRFZHhJNEhpOFE/view?usp=sharing

was (Author: dulajrajitha):
This is the Java code I used:

package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest {

    static JavaSparkContext jsc;

    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";
    // this is the only line I changed to run the two tests (different data sets)
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";

    static DataFrame test;
    static DataFrame train;

    public static void main(String[] args) {
        final int folds = 2;
        final int[] ranks = { 10, 20 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;
        final double[] alphas = prepareDoubleParams(alpha, tuningInterval, 1);
        final double[] regParams = prepareDoubleParams(regParam, tuningInterval, 1);
        prepareDataFrames();

        // Build the recommendation model using ALS on the training data
        ALS implicitALS = new ALS().setImplicitPrefs(true).setUserCol("user").setItemCol("item")
                .setRatingCol("confidence").setPredictionCol("prediction");
        ParamMap[] paramMaps = new ParamGridBuilder().addGrid(implicitALS.alpha(), alphas)
                .addGrid(implicitALS.regParam(), regParams).addGrid(implicitALS.rank(), ranks).build();
        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName("rmse").setLabelCol("confidence")
                .setPredictionCol("prediction");
        CrossValidator crossValidator = new CrossValidator().setEstimator(implicitALS).setEvaluator(evaluator)
                .setEstimatorParamMaps(paramMaps).setNumFolds(folds);
        CrossValidatorModel crossValidatorModel = crossValidator.fit(train);

        // save model
        ALSModel alsModel = (ALSModel) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl(modelPath);

        // load model
        ALSModel bestModel = ALSModel.read().load(modelPath);

        // predict
        DataFrame predictDf = bestModel.transform(train.randomSplit(new double[] { 0.8, 0.2 })[1]);
        DataFrame predictions = predictDf
                .withColumn("confidence", predictDf.col("confidence").cast(DataTypes.DoubleType))
                .withColumn("prediction", predictDf.col("prediction"));
        predictDf.show();
        System.out.println("Root-mean-square error = " + evaluator.evaluate(predictions));
        jsc.stop();
    }

    private static void prepareDataFrames() {
        final SparkConf conf = new SparkConf().setAppName("ALS-Implicit with cross validation Model")
                .setMaster(sparkMaster).set("spark.executor.memory", "4g").set("spark.eventLog.dir", sparkLogDir)
                .set("spark.eventLog.enabled", "false");
        jsc = new JavaSparkContext(conf);
        jsc.addJar(dataPathPrefix + "spark-csv_2.10-1.3.0.jar");
        jsc.addJar(dataPathPrefix + "commons-csv-1.2.jar");
        final SQLContext sqlContext = new SQLContext(jsc);
        DataFrame tst = sqlContext.read().format("com.databricks.spark.csv").option("inferSchema", "true")
                .option("header", "true").load(testDataFile);
        test = tst.withColumn("confidence", tst.col("confidence").cast(DataTypes.DoubleType)
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215393#comment-15215393 ]

Dulaj Rajitha commented on SPARK-14153:
---------------------------------------
The problem is: when I change to my data set, why does the best-parameter matrix have NaN fields? PS: I am testing on a random split of the same training set (line 75).
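An all-NaN average metric usually means every fold produced a NaN RMSE. With ALS this typically happens when a validation fold contains users or items that were absent from the training fold, so their predictions come back NaN and poison the mean. Spark 2.2 and later expose this as a setting; a hedged sketch (this API did not exist in the Spark version discussed here; column names are taken from the code linked in this thread):

```java
import org.apache.spark.ml.recommendation.ALS;

public class ColdStartExample {
    static ALS buildEstimator() {
        // In Spark 2.2+, coldStartStrategy("drop") filters out prediction rows
        // whose user or item was unseen during fitting, so the evaluator
        // computes RMSE over valid rows instead of collapsing to NaN.
        return new ALS()
                .setImplicitPrefs(true)
                .setUserCol("user")
                .setItemCol("item")
                .setRatingCol("confidence")
                .setColdStartStrategy("drop");
    }
}
```

On older versions, the equivalent workaround is to filter NaN predictions out of the transformed DataFrame before calling the evaluator.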
[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390 ]

Dulaj Rajitha edited comment on SPARK-14153 at 3/29/16 4:15 AM:
----------------------------------------------------------------
I changed only this line, which will change the data set from one to another (line 26):

final static String trainDataFile = dataPathPrefix + "train.csv";

was (Author: dulajrajitha):
I changed only this line which will chane the data set form one to another.. final static String trainDataFile = dataPathPrefix + "train.csv";
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215390#comment-15215390 ]

Dulaj Rajitha commented on SPARK-14153:
---------------------------------------
I changed only this line, which will change the data set from one to another:

final static String trainDataFile = dataPathPrefix + "train.csv";
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15215388#comment-15215388 ]

Dulaj Rajitha commented on SPARK-14153:
---------------------------------------
This is the code: https://drive.google.com/file/d/0BzDPzVBAaXCYTkRFZHhJNEhpOFE/view?usp=sharing
[jira] [Comment Edited] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765 ] Dulaj Rajitha edited comment on SPARK-14153 at 3/28/16 4:23 AM:

This is the Java code I used:

package ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

public class ALSImplicitTest {

    static JavaSparkContext jsc;
    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";
    // This is the only line I changed to run the two tests (different data sets).
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args ) {
        final int folds = 2;
        final int[] ranks = { 10, 20 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;
        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );
        prepareDataFrames();

        // Build the recommendation model using ALS on the training data.
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );
        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();
        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );
        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS )
                .setEvaluator( evaluator ).setEstimatorParamMaps( paramMaps ).setNumFolds( folds );
        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // Save the model. (Note: saveImpl is internal API; the public call is
        // alsModel.write().overwrite().save( modelPath ).)
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // Load the model.
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // Predict.
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();
        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );
        jsc.stop();
    }

    private static void prepareDataFrames() {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross validation Model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );
        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar( dataPathPrefix + "commons-csv-1.2.jar" );
        final SQLContext sqlContext = new SQLContext( jsc );
        DataFrame tst = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )
                .option( "header", "true" ).load( testDataFile );
        test = tst.withColumn( "confidence", tst.col( "confidence" ).cast( DataTypes.DoubleType ) ).cache();
        DataFrame trn = sqlContext.read().format( "com.databricks.spark.csv" ).option( "inferSchema", "true" )
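The helper prepareDoubleParams is referenced in the code above but was not included in the paste. A hypothetical reconstruction (my assumption, not the reporter's actual code) consistent with the log output, where a base alpha of 0.01 with a 0.01 tuning interval yields a best alpha of 0.02, is an arithmetic grid of steps + 1 values:

```java
// Hypothetical reconstruction of the missing prepareDoubleParams helper:
// builds { base, base + interval, ..., base + steps * interval }.
public class ParamGridHelper {
    static double[] prepareDoubleParams(double base, double interval, int steps) {
        double[] values = new double[steps + 1];
        for (int i = 0; i <= steps; i++) {
            values[i] = base + i * interval;
        }
        return values;
    }

    public static void main(String[] args) {
        // With the values used in the pasted code: alpha = 0.01, interval = 0.01, steps = 1.
        System.out.println(java.util.Arrays.toString(prepareDoubleParams(0.01, 0.01, 1)));
    }
}
```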
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213779#comment-15213779 ] Dulaj Rajitha commented on SPARK-14153:

The only thing I changed was the data set (train.csv to movies_data.csv). Here are the data sets:

train.csv: https://drive.google.com/file/d/0BzDPzVBAaXCYb3hBVnh2bndMbFE/view?usp=sharing
movies_data.csv: https://drive.google.com/file/d/0BzDPzVBAaXCYT2xlWkdsNERKY1E/view?usp=sharing
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213767#comment-15213767 ] Dulaj Rajitha commented on SPARK-14153:

When I use train.csv (my data set), the best-parameter output is:

16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN)
16/03/25 12:03:06 INFO CrossValidator: Best set of parameters:
{
  als_c911c0e183a3-alpha: 0.02,
  als_c911c0e183a3-rank: 500,
  als_c911c0e183a3-regParam: 0.03
}

But if I use the movie data set as the training set, the best-parameter output is:

16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 2.057853990227443)
16/03/24 14:07:07 INFO CrossValidator: Best set of parameters:
{
  als_31a605e7717b-alpha: 0.02,
  als_31a605e7717b-rank: 1,
  als_31a605e7717b-regParam: 0.02
}
16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 1.9457234533860048.
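One consequence of the all-NaN metrics worth noting: the reported "best" parameters (alpha 0.02, rank 500, regParam 0.03) carry no information, because every numeric comparison against NaN is false, so a comparison-based selection never moves off an arbitrary grid point. A plain-Java sketch of that effect (illustrative only; Spark's actual CrossValidator selection logic may differ in detail):

```java
// When every candidate metric is NaN, a comparison-based argmin never updates:
// the "best" index is simply the first grid point, regardless of the data.
public class NanArgminDemo {
    static int argmin(double[] metrics) {
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            if (metrics[i] < metrics[best]) { // always false when either side is NaN
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(argmin(new double[] { Double.NaN, Double.NaN, Double.NaN })); // 0
        System.out.println(argmin(new double[] { 2.05, 1.94, 2.06 })); // 1
    }
}
```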
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213765#comment-15213765 ] Dulaj Rajitha commented on SPARK-14153:

This is the Java code I used:

package it.codegen.rnd.ml.test;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;

/**
 * @author Dulaj Pathirana - Mar 14, 2016
 */
public class ALSImplicitTest {

    static JavaSparkContext jsc;
    final static String sparkMaster = "spark://192.168.1.71:7077";
    final static String dataPathPrefix = "hdfs://192.168.1.71/res/";
    final static String trainDataFile = dataPathPrefix + "train.csv";
    final static String testDataFile = dataPathPrefix + "test.csv";
    final static String modelPath = dataPathPrefix + "als_implicit1.model";
    final static String sparkLogDir = dataPathPrefix + "logs/";
    static DataFrame test;
    static DataFrame train;

    public static void main( String[] args ) {
        final int folds = 2;
        final int[] ranks = { 120, 500 };
        double alpha = 0.01, regParam = 0.02, tuningInterval = 0.01;
        final double[] alphas = prepareDoubleParams( alpha, tuningInterval, 1 );
        final double[] regParams = prepareDoubleParams( regParam, tuningInterval, 1 );
        prepareDataFrames();

        // Build the recommendation model using ALS on the training data.
        // numBlocks is the number of blocks used to parallelize computation (set to -1 to auto-configure).
        // rank is the number of latent factors in the model.
        // iterations is the number of iterations to run.
        // lambda specifies the regularization parameter in ALS.
        // implicitPrefs specifies whether to use the explicit feedback ALS variant or one adapted for implicit feedback data.
        // alpha is a parameter applicable to the implicit feedback variant of ALS that governs the
        // baseline confidence in preference observations.
        ALS implicitALS = new ALS().setImplicitPrefs( true ).setUserCol( "user" ).setItemCol( "item" )
                .setRatingCol( "confidence" ).setPredictionCol( "prediction" );
        ParamMap[] paramMaps = new ParamGridBuilder().addGrid( implicitALS.alpha(), alphas )
                .addGrid( implicitALS.regParam(), regParams ).addGrid( implicitALS.rank(), ranks ).build();
        RegressionEvaluator evaluator = new RegressionEvaluator().setMetricName( "rmse" )
                .setLabelCol( "confidence" ).setPredictionCol( "prediction" );
        CrossValidator crossValidator = new CrossValidator().setEstimator( implicitALS )
                .setEvaluator( evaluator ).setEstimatorParamMaps( paramMaps ).setNumFolds( folds );
        CrossValidatorModel crossValidatorModel = crossValidator.fit( train );

        // Save the model.
        ALSModel alsModel = ( ALSModel ) crossValidatorModel.bestModel();
        alsModel.write().overwrite().saveImpl( modelPath );

        // Load the model.
        ALSModel bestModel = ALSModel.read().load( modelPath );

        // Predict.
        DataFrame predictDf = bestModel.transform( train.randomSplit( new double[] { 0.8, 0.2 } )[1] );
        DataFrame predictions = predictDf
                .withColumn( "confidence", predictDf.col( "confidence" ).cast( DataTypes.DoubleType ) )
                .withColumn( "prediction", predictDf.col( "prediction" ) );
        predictDf.show();
        System.out.println( "Root-mean-square error = " + evaluator.evaluate( predictions ) );
        jsc.stop();
    }

    private static void prepareDataFrames() {
        final SparkConf conf = new SparkConf().setAppName( "ALS-Implicit with cross validation Model" )
                .setMaster( sparkMaster ).set( "spark.executor.memory", "4g" )
                .set( "spark.eventLog.dir", sparkLogDir ).set( "spark.eventLog.enabled", "false" );
        jsc = new JavaSparkContext( conf );
        jsc.addJar( dataPathPrefix + "spark-csv_2.10-1.3.0.jar" );
        jsc.addJar(
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211876#comment-15211876 ] Dulaj Rajitha commented on SPARK-14153:

No, I added the same headers to both data sets. Is there any other way that the metric can become NaN in this method?
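There is indeed another common way ALS metrics become NaN, independent of CSV headers: cold start. When a cross-validation fold holds out users or items that never appear in that fold's training split, ALS has no learned factors for them and predicts NaN, and a single NaN row makes the fold's RMSE NaN. Spark 2.2 later added coldStartStrategy = "drop" for exactly this situation. A plain-Java sketch of the "drop NaN rows before evaluating" idea (illustrative only, not Spark's implementation):

```java
// Drop NaN predictions (cold-start rows) before computing RMSE, so that a
// few unratable user/item pairs do not poison the whole metric.
public class DropNanRmseDemo {
    static double rmseDroppingNaN(double[] labels, double[] predictions) {
        double sumSquaredError = 0.0;
        int kept = 0;
        for (int i = 0; i < labels.length; i++) {
            if (Double.isNaN(predictions[i])) {
                continue; // skip cold-start rows instead of propagating NaN
            }
            double err = labels[i] - predictions[i];
            sumSquaredError += err * err;
            kept++;
        }
        return kept == 0 ? Double.NaN : Math.sqrt(sumSquaredError / kept);
    }

    public static void main(String[] args) {
        double[] labels = { 1.0, 2.0, 3.0 };
        double[] preds = { 1.0, Double.NaN, 3.0 }; // middle row is a cold-start pair
        System.out.println(rmseDroppingNaN(labels, preds)); // finite now
    }
}
```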
[jira] [Commented] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211794#comment-15211794 ] Dulaj Rajitha commented on SPARK-14153: --- No. I think you understand my problem incorrectly. I mean when I used example provided dataset the predictions are valid.. But if change the dataset to mine, it will give invalid predictions.. (Not at once, in two tests ) PS.. I fit and transformed same dataset's random splits.. > My dataset does not provide proper predictions in ALS > - > > Key: SPARK-14153 > URL: https://issues.apache.org/jira/browse/SPARK-14153 > Project: Spark > Issue Type: Question > Components: Java API, ML >Reporter: Dulaj Rajitha > > When I used data-set in the git-hub example, I get proper predictions. But > when I used my data set It does not predict well. (I has a large RMSE). > I used cross validator for ALS (in Spark ML) and here are the best model > parameters. > 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: > WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN) > 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters: > { > als_c911c0e183a3-alpha: 0.02, > als_c911c0e183a3-rank: 500, > als_c911c0e183a3-regParam: 0.03 > } > But when I used movie data set It gives proper values for parameters. as below > 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: > WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, > 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, > 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, > 2.048241271354197, 2.057853990227443) > 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters: > { > als_31a605e7717b-alpha: 0.02, > als_31a605e7717b-rank: 1, > als_31a605e7717b-regParam: 0.02 > } > 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: > 1.9457234533860048. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
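The all-NaN cross-validation metrics reported above are the usual symptom of cold-start users or items: a random split leaves some users/items in the evaluation fold that never appeared in the training fold, ALS has no latent factors for them and predicts NaN, and a single NaN prediction makes the entire RMSE NaN. A minimal Spark-free sketch (plain Python, hypothetical ratings) of how the metric gets poisoned:

```python
import math

# Hypothetical (rating, prediction) pairs from a test fold. The third
# user never appeared in the training fold, so the model has no factors
# for it and predicts NaN -- which is what Spark's ALS does for
# cold-start users.
predictions = [(3.0, 2.5), (4.0, 4.2), (5.0, float("nan"))]

def rmse(pairs):
    # A single NaN term poisons the sum, so the metric itself is NaN.
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

print(rmse(predictions))  # nan

# Dropping cold-start rows before scoring yields a finite metric.
finite = [(r, p) for r, p in predictions if not math.isnan(p)]
print(rmse(finite))
```

Later Spark versions (2.2+) added a `coldStartStrategy="drop"` parameter on ALS that performs exactly this filtering; on the versions discussed in this thread, filtering NaN rows out of the predictions before computing RMSE is the workaround.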
[jira] [Updated] (SPARK-14153) My dataset does not provide proper prdictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dulaj Rajitha updated SPARK-14153: -- Summary: My dataset does not provide proper prdictions in ALS (was: My dataset does not provide proer prdictions in ALS)
[jira] [Updated] (SPARK-14153) My dataset does not provide proper predictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dulaj Rajitha updated SPARK-14153: -- Summary: My dataset does not provide proper predictions in ALS (was: My dataset does not provide proper prdictions in ALS)
[jira] [Commented] (SPARK-14153) My dataset does not provide proer prdictions in ALS
[ https://issues.apache.org/jira/browse/SPARK-14153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211503#comment-15211503 ] Dulaj Rajitha commented on SPARK-14153: --- Here is my data set URL: https://drive.google.com/file/d/0BzDPzVBAaXCYb3hBVnh2bndMbFE/view?usp=sharing
[jira] [Created] (SPARK-14153) My dataset does not provide proer prdictions in ALS
Dulaj Rajitha created SPARK-14153: - Summary: My dataset does not provide proer prdictions in ALS Key: SPARK-14153 URL: https://issues.apache.org/jira/browse/SPARK-14153 Project: Spark Issue Type: Question Components: Java API, ML Reporter: Dulaj Rajitha When I used the data-set in the GitHub example, I got proper predictions, but when I used my own data set it did not predict well (it has a large RMSE). I used the cross validator for ALS (in Spark ML) and here are the best model parameters. 16/03/25 12:03:06 INFO CrossValidator: Average cross-validation metrics: WrappedArray(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN) 16/03/25 12:03:06 INFO CrossValidator: Best set of parameters: { als_c911c0e183a3-alpha: 0.02, als_c911c0e183a3-rank: 500, als_c911c0e183a3-regParam: 0.03 } But when I used the movie data set, it gave proper values for the parameters, as below: 16/03/24 14:07:07 INFO CrossValidator: Average cross-validation metrics: WrappedArray(1.9481584447713676, 2.0501457159728944, 2.0600857505406935, 1.9457234533860048, 2.0494498583414282, 2.0595306613827002, 1.9488322049918922, 2.0489573853226797, 2.0584252131752, 1.9464006741621391, 2.048241271354197, 2.057853990227443) 16/03/24 14:07:07 INFO CrossValidator: Best set of parameters: { als_31a605e7717b-alpha: 0.02, als_31a605e7717b-rank: 1, als_31a605e7717b-regParam: 0.02 } 16/03/24 14:07:07 INFO CrossValidator: Best cross-validation metric: 1.9457234533860048. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS
[ https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dulaj Rajitha closed SPARK-14093. - Resolution: Fixed Fix Version/s: 1.6.0 I was able to solve the issue by using the ALSModel.write().saveImpl(hdfsPath) method to save to HDFS. > org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with > HDFS > > > Key: SPARK-14093 > URL: https://issues.apache.org/jira/browse/SPARK-14093 > Project: Spark > Issue Type: Bug > Components: Java API, ML > Affects Versions: 1.6.0 > Reporter: Dulaj Rajitha > Fix For: 1.6.0 > > > ALSModel.save(path) is not working for HDFS paths and it gives > java.lang.IllegalArgumentException: Wrong FS: > hdfs://192.168.1.71/res/als.model, expected: file:/// -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS
[ https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209724#comment-15209724 ] Dulaj Rajitha commented on SPARK-14093: --- I was able to solve the issue by using the ALSModel.write().saveImpl(hdfsPath) method to save to HDFS.
[jira] [Commented] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS
[ https://issues.apache.org/jira/browse/SPARK-14093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208078#comment-15208078 ] Dulaj Rajitha commented on SPARK-14093: --- I'm running Spark in standalone mode, and I need to save the trained model to HDFS and load it back. Using the ALSModel.save method gives the following error: Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.1.71/res/als.model, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645) at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80) at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:529) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:130) at org.apache.spark.ml.recommendation.ALSModel.save(ALS.scala:182) at it.codegen.rnd.ml.test.ALSImplicitTest.main(ALSImplicitTest.java:107). When I used jsc.parallelize(alsModels).saveAsObjectFile(modelPath) instead, I could not load back the exact model that I had saved.
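The "Wrong FS" exception comes from Hadoop's FileSystem.checkPath: as the stack trace shows, MLWriter.save runs an existence check against the default filesystem, which here is the local RawLocalFileSystem, and a local filesystem rejects any path whose scheme is not file://. The scheme check can be sketched without Hadoop (hypothetical check_path helper, plain Python):

```python
from urllib.parse import urlparse

def check_path(path, expected_scheme):
    # Mimics Hadoop's FileSystem.checkPath (simplified sketch): a
    # filesystem instance rejects any path whose URI scheme differs
    # from its own scheme.
    scheme = urlparse(path).scheme or "file"
    if scheme != expected_scheme:
        raise ValueError(
            f"Wrong FS: {path}, expected: {expected_scheme}:///")
    return path

# The local filesystem (scheme "file") rejects an hdfs:// path --
# the reported IllegalArgumentException.
try:
    check_path("hdfs://192.168.1.71/res/als.model", "file")
except ValueError as e:
    print(e)

# Against a filesystem whose scheme is "hdfs", the same path is accepted.
check_path("hdfs://192.168.1.71/res/als.model", "hdfs")
```

This suggests why the reporter's ALSModel.write().saveImpl(hdfsPath) workaround succeeds: it skips the pre-save existence check that was resolved against the wrong (local) filesystem. Pointing fs.defaultFS at the HDFS namenode in the Hadoop configuration (e.g. via core-site.xml on the driver's classpath) is the other common fix, since the default filesystem's scheme then matches the hdfs:// path.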
[jira] [Created] (SPARK-14093) org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS
Dulaj Rajitha created SPARK-14093: - Summary: org.apache.spark.ml.recommendation.ALSModel.save method cannot be used with HDFS Key: SPARK-14093 URL: https://issues.apache.org/jira/browse/SPARK-14093 Project: Spark Issue Type: Bug Components: Java API, ML Affects Versions: 1.6.0 Reporter: Dulaj Rajitha ALSModel.save(path) is not working for HDFS paths and it gives java.lang.IllegalArgumentException: Wrong FS: hdfs://192.168.1.71/res/als.model, expected: file:/// -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org