[jira] [Commented] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type

2016-10-11 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566251#comment-15566251
 ] 

Joseph K. Bradley commented on SPARK-15153:
---

Note: I'm setting the target version to 2.1, not 2.0.x, since the fix requires 
a public API change made in the preceding PR.

> SparkR spark.naiveBayes throws error when label is numeric type
> ---
>
> Key: SPARK-15153
> URL: https://issues.apache.org/jira/browse/SPARK-15153
> Project: Spark
> Issue Type: Bug
> Components: ML, SparkR
> Reporter: Yanbo Liang
> Assignee: Yanbo Liang
>
> When the label of the dataset is of numeric type, SparkR spark.naiveBayes 
> throws an error. The bug is easy to reproduce:
> {code}
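> # Build a local data frame from the built-in Titanic table
> t <- as.data.frame(Titanic)
> # Keep rows with non-zero counts and drop column 5 (Freq)
> t1 <- t[t$Freq > 0, -5]
> # Add a numeric 0/1 label derived from the string column Survived
> t1$NumericSurvived <- ifelse(t1$Survived == "No", 0, 1)
> # Drop column 4 (the original string Survived column)
> t2 <- t1[-4]
> df <- suppressWarnings(createDataFrame(sqlContext, t2))
> # Fitting on the numeric label triggers the error below
> m <- spark.naiveBayes(df, NumericSurvived ~ .)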
> 16/05/05 03:26:17 ERROR RBackendHandler: fit on org.apache.spark.ml.r.NaiveBayesWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
>   java.lang.ClassCastException: org.apache.spark.ml.attribute.UnresolvedAttribute$ cannot be cast to org.apache.spark.ml.attribute.NominalAttribute
>   at org.apache.spark.ml.r.NaiveBayesWrapper$.fit(NaiveBayesWrapper.scala:66)
>   at org.apache.spark.ml.r.NaiveBayesWrapper.fit(NaiveBayesWrapper.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.api.r.RBackendHandler.handleMethodCall(RBackendHandler.scala:141)
>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:86)
>   at org.apache.spark.api.r.RBackendHandler.channelRead0(RBackendHandler.scala:38)
>   at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at io.netty.channel.AbstractChannelHandlerContext.invo
> {code}
> In RFormula, the response variable can be of string or numeric type. If it 
> is a string, RFormula transforms it into a label of DoubleType with 
> StringIndexer and sets the corresponding column metadata; otherwise, RFormula 
> uses it directly as the label when training the model (and assumes that it 
> is numbered 0, ..., maxLabelIndex). 
> When we extract labels in ml.r.NaiveBayesWrapper, we should handle them 
> according to the type of the response variable (string or numeric).
> cc [~mengxr] [~josephkb]
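
The type-dependent label handling proposed in the description can be illustrated with a small, Spark-free Python sketch. This is not the code of NaiveBayesWrapper or of the pull request; the function name `extract_labels` and its two arguments are hypothetical stand-ins. `nominal_values` plays the role of the values that StringIndexer records in the label column metadata for a string response, and is assumed to be absent (None) for a numeric response, which matches the failed cast to NominalAttribute in the trace above.

```python
from typing import List, Optional, Sequence


def extract_labels(label_values: Sequence[float],
                   nominal_values: Optional[Sequence[str]] = None) -> List[str]:
    """Return the class labels in label-index order (hypothetical helper)."""
    if nominal_values is not None:
        # String response: the indexer stored the original strings in the
        # column metadata, so index i maps to nominal_values[i].
        return list(nominal_values)

    # Numeric response: there is no nominal metadata to read. Follow the
    # assumption stated in the issue that labels are numbered
    # 0, ..., maxLabelIndex, and reject anything else.
    for value in label_values:
        if value < 0 or value != int(value):
            raise ValueError(f"label {value!r} is not a non-negative integer")
    max_label_index = int(max(label_values))
    return [str(i) for i in range(max_label_index + 1)]


# Numeric response such as NumericSurvived (0/1), no metadata available.
print(extract_labels([0.0, 1.0, 1.0]))            # ['0', '1']
# String response such as Survived, with indexer metadata.
print(extract_labels([1.0, 0.0], ["No", "Yes"]))  # ['No', 'Yes']
```

The sketch only shows the branch on the response type; how the wrapper actually obtains the metadata and the label values is outside its scope.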



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type

2016-10-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565004#comment-15565004
 ] 

Apache Spark commented on SPARK-15153:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/15431
