[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422597#comment-15422597 ]

Dulaj Rajitha commented on SPARK-16993:

The issue is solved; it was not a bug. Thank you.
There was an error in my withColumn statement: I had used a column from the wrong DataFrame.

> model.transform without label column in random forest regression
>
> Key: SPARK-16993
> URL: https://issues.apache.org/jira/browse/SPARK-16993
> Project: Spark
> Issue Type: Question
> Components: Java API, ML
> Reporter: Dulaj Rajitha
>
> I need to run predictions on a separate data set (not a split of the
> training data, as shown in the example).
> That data does not have the label column, since the label is what
> needs to be predicted.
> But model.transform reports that the label column is missing:
> org.apache.spark.sql.AnalysisException: cannot resolve 'label' given input
> columns: [id,features,prediction]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
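The resolution above can be illustrated with a small sketch. The exception in the report lists prediction among the input columns, which suggests (the thread does not confirm this) that the failing expression was applied to the output of model.transform rather than raised inside it. The code assumes a local, classic SparkSession; the DataFrame contents and the names predictions, attempt, required and missing are hypothetical stand-ins, not the reporter's code.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.apache.spark.sql.functions.col
import scala.util.{Failure, Success, Try}

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("missing-label-column")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the output of model.transform on unlabeled data
// (the features column is left out for brevity).
val predictions = Seq((1L, 0.5), (2L, 1.5)).toDF("id", "prediction")

// Referring by name to a column that exists only in some other DataFrame
// fails when the plan is analyzed, before any rows are processed.
val attempt = Try(
  predictions.withColumn("residual", col("label") - col("prediction"))
)

attempt match {
  case Failure(e: AnalysisException) =>
    println(s"Rejected during analysis: ${e.getMessage}")
  case Failure(e) =>
    println(s"Unexpected failure: $e")
  case Success(df) =>
    df.show()
}

// A cheap guard: list the columns an expression needs that the DataFrame lacks.
val required = Seq("label", "prediction")
val missing = required.filterNot(c => predictions.columns.contains(c))
println(s"Missing columns: ${missing.mkString(", ")}")
```

Checking the column list first turns an analysis-time exception into an explicit message about which DataFrame is missing what, which is often enough to spot a mixed-up DataFrame reference.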
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15422384#comment-15422384 ]

Yanbo Liang commented on SPARK-16993:

[~dulajrajitha] I cannot reproduce the issue you reported; the following code works well.
{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorIndexer
import org.apache.spark.ml.regression.RandomForestRegressor

val data = spark.read.format("libsvm")
  .load("/Users/yliang/data/trunk0/spark/data/mllib/sample_libsvm_data.txt")

val featureIndexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)
  .fit(data)

val trainingData = data
// The test set deliberately has no label column.
val testData = data.drop("label")

val rf = new RandomForestRegressor()
  .setLabelCol("label")
  .setFeaturesCol("indexedFeatures")

val pipeline = new Pipeline()
  .setStages(Array(featureIndexer, rf))

val model = pipeline.fit(trainingData)
// transform runs on data without a label column.
val predictions = model.transform(testData)
predictions.select("prediction", "features").show(5)
{code}
Could you tell me whether this code snippet matches your case? If so, I think it's not a bug.
Thanks!
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417892#comment-15417892 ]

Sean Owen commented on SPARK-16993:

You would need to show some code, or more detail about the error.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417847#comment-15417847 ]

Dulaj Rajitha commented on SPARK-16993:

But if I add a dummy column as the label column, the process runs fine.
I cannot continue without adding a dummy label column to the data set that needs the prediction.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417192#comment-15417192 ]

Sean Owen commented on SPARK-16993:

Yes, that's clear. You haven't said what the error is, and I expect it's coming from some other misunderstanding, because the class in question does not use the label column in transform().
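One way to check the point about transform() without scoring any rows is to ask a fitted model what schema it would produce for label-free input. The sketch below is only an illustration under stated assumptions: a local SparkSession, a tiny made-up training set, and hypothetical names (training, unlabeledSchema, outputSchema). It relies on the public transformSchema method and on SQLDataTypes.VectorType for the features column type.

```scala
import org.apache.spark.ml.linalg.{SQLDataTypes, Vectors}
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructType}

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("transform-schema-check")
  .getOrCreate()

// Hypothetical training data with label = 3 * x.
val training = spark.createDataFrame(
  (1 to 20).map(i => (3.0 * i, Vectors.dense(i.toDouble)))
).toDF("label", "features")

val model = new RandomForestRegressor()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .fit(training)

// Schema of the data to be scored: id and features only, no label.
val unlabeledSchema = new StructType()
  .add("id", LongType)
  .add("features", SQLDataTypes.VectorType)

// transformSchema validates the input columns without touching any rows.
val outputSchema = model.transformSchema(unlabeledSchema)
println(outputSchema.fieldNames.mkString(", "))
```

If this call succeeds for a schema without a label column, an AnalysisException mentioning label is more likely to come from code around the transform call than from the model itself.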
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15417144#comment-15417144 ]

Dulaj Rajitha commented on SPARK-16993:

Here is the scenario.
My training data set has a features column and a label column.
Using that, I train and get a model. (I also run an evaluation on a split of the training data.)
Using that model, I need to predict for a data set that has only id and features columns.
But when I use the second DataFrame, I get the error.
So how do we use the same model on a different DataFrame for prediction after evaluation?
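The scenario described here can be sketched end to end as follows. This is an illustration under assumptions, not the reporter's code: the data is made up, the session is local, and the names labeled, train, heldOut, toScore and rmse are hypothetical. The evaluator is the only step that reads the label column; the final transform runs on a DataFrame that has only id and features.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.RandomForestRegressor
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell this returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("predict-without-label")
  .getOrCreate()

// Hypothetical labeled data with label = 2 * x.
val labeled = spark.createDataFrame(
  (1 to 40).map(i => (2.0 * i, Vectors.dense(i.toDouble)))
).toDF("label", "features")

// Train on one split, keep the other for evaluation.
val Array(train, heldOut) = labeled.randomSplit(Array(0.75, 0.25), seed = 42L)

val model = new RandomForestRegressor()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .fit(train)

// Evaluation compares predictions with labels, so heldOut must carry a label.
val rmse = new RegressionEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("rmse")
  .evaluate(model.transform(heldOut))
println(s"RMSE on held-out split: $rmse")

// Prediction only: the second DataFrame has id and features, no label.
val toScore = spark.createDataFrame(Seq(
  (101L, Vectors.dense(5.0)),
  (102L, Vectors.dense(17.0))
)).toDF("id", "features")

val predictions = model.transform(toScore)
predictions.select("id", "prediction").show()
```

Any later expression on predictions should refer only to the columns that DataFrame actually has; a by-name reference to label there would fail during analysis.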
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415222#comment-15415222 ]

Sean Owen commented on SPARK-16993:

You need a label for training and evaluation. You do not need one for prediction, of course. But I do not see any use of labelCol in the transform methods. That's why I'm asking for more detail, like where this exception occurs. I'm still not sure you're actually making predictions in your code.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415102#comment-15415102 ]

Dulaj Rajitha commented on SPARK-16993:

Is there a method for doing prediction for non-evaluation purposes (just predictions)?
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415081#comment-15415081 ]

Dulaj Rajitha commented on SPARK-16993:

I do not want to evaluate. I just need to predict using the model I got from the regressor.fit(dataframe) method.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415070#comment-15415070 ]

Sean Owen commented on SPARK-16993:

You certainly need labels in your held-out test set for evaluation. But you seem to be talking about model.transform, which is different. It is not clear what you are describing.
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415063#comment-15415063 ]

Dulaj Rajitha commented on SPARK-16993:

When using the RandomForestRegressor, I trained on a DataFrame with the label column and got a model:
model = regressor.fit(trainData)
But my test data does not have a label column, since this is the column I need to have predicted.
Therefore I got an error when transforming:
model.transform(test)
[jira] [Commented] (SPARK-16993) model.transform without label column in random forest regression
[ https://issues.apache.org/jira/browse/SPARK-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15415050#comment-15415050 ]

Sean Owen commented on SPARK-16993:

Questions should go to user@. Can you clarify where you get this exception? The transform method does not require a label, no.