[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635738#comment-14635738 ]

Apache Spark commented on SPARK-9230:

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/7574

SparkR RFormula should support StringType features

Key: SPARK-9230
URL: https://issues.apache.org/jira/browse/SPARK-9230
Project: Spark
Issue Type: New Feature
Components: ML, SparkR
Reporter: Eric Liang

StringType features will need to be encoded using OneHotEncoder to be used for regression. See umbrella design doc https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
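The issue description says StringType features must be one-hot encoded before a regression can use them. As a rough, stdlib-only Python illustration of that idea (not Spark's own OneHotEncoder implementation; `index_labels` and `one_hot` are hypothetical names), a string column can first be mapped to category indices and then to 0/1 vectors. The most-frequent-first ordering and the `drop_last` option are assumptions, modeled loosely on how such encoders commonly behave.

```python
from collections import Counter


def index_labels(values):
    """Map each distinct string to an index, most frequent first.

    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {label: i for i, label in enumerate(ordered)}


def one_hot(values, drop_last=True):
    """Encode a list of strings as dense 0/1 feature vectors.

    With drop_last=True the last category is encoded as all zeros, so the
    remaining columns are not collinear with a regression intercept.
    """
    index = index_labels(values)
    width = len(index) - 1 if drop_last else len(index)
    rows = []
    for v in values:
        vec = [0.0] * width
        i = index[v]
        if i < width:  # the dropped (last) category stays all zeros
            vec[i] = 1.0
        rows.append(vec)
    return index, rows


species = ["setosa", "versicolor", "setosa", "virginica", "setosa", "versicolor"]
index, encoded = one_hot(species)
# index == {'setosa': 0, 'versicolor': 1, 'virginica': 2}
for name, vec in zip(species, encoded):
    print(name, vec)
```

Dropping one category turns it into a reference level, similar in spirit to R's treatment contrasts, which by default use the first level of a factor as the reference.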
[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635776#comment-14635776 ]

Shivaram Venkataraman commented on SPARK-9230:

[~ekhliang] [~mengxr] One more thing that would be good to do is to make these formulas also work with actual columns in R. For example, in DataFrames we access columns with df$col_name, so it would be great to support a formula of the form df$Sepal_Length ~ df$Sepal_Width.
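The suggestion is that df$Sepal_Length ~ df$Sepal_Width should mean the same thing as Sepal_Length ~ Sepal_Width. The most naive way to accept both spellings is purely textual: parse the formula and drop any `df$` prefix. The Python sketch below does only that. `parse_formula` is a hypothetical helper, it understands nothing beyond `+`-separated column names, and it does not check that the prefix actually names the data frame being fitted.

```python
import re

# An optional "<frame>$" prefix followed by a column name, e.g. "df$Sepal_Width".
_TERM = re.compile(r"^(?:[A-Za-z_][A-Za-z0-9_.]*\$)?([A-Za-z_][A-Za-z0-9_.]*)$")


def parse_formula(formula):
    """Split 'label ~ a + b' into ('label', ['a', 'b']).

    Each term may be a bare column name or df$name; the frame prefix is
    discarded without checking which data frame it refers to.
    """
    if formula.count("~") != 1:
        raise ValueError("formula must contain exactly one '~'")
    lhs, rhs = formula.split("~")
    names = []
    for raw in [lhs] + rhs.split("+"):
        match = _TERM.match(raw.strip())
        if match is None:
            raise ValueError(f"unsupported term: {raw.strip()!r}")
        names.append(match.group(1))
    return names[0], names[1:]


print(parse_formula("Sepal_Length ~ Sepal_Width"))
print(parse_formula("df$Sepal_Length ~ df$Sepal_Width"))
```

Both calls yield the same label and term list. Anything that is not a plain name, such as log(x), is rejected rather than guessed at.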
[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635790#comment-14635790 ]

Eric Liang commented on SPARK-9230:

Hmm, I think it would be hard to support that in a cross-language manner (e.g. since you would have to resolve variables on the R side). Though we'd probably have to do that anyway to support other R expressions, e.g. log(foo).
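The point about log(foo) is that such a term is an expression, not a column name, so something has to evaluate it. As a hedged illustration of that difference only, the Python sketch below evaluates a term against a row held in a dict. `eval_term` and its function whitelist are hypothetical; this sidesteps R entirely and says nothing about how SparkR or RFormula would resolve such expressions.

```python
import math
import re

# Hypothetical whitelist of unary functions a formula term may apply.
_FUNCS = {"log": math.log, "exp": math.exp, "sqrt": math.sqrt}
# A call of the form name(column), e.g. "log(foo)".
_CALL = re.compile(r"^([A-Za-z_]\w*)\(\s*([A-Za-z_][\w.]*)\s*\)$")


def eval_term(term, row):
    """Evaluate one formula term against a row given as a dict.

    A bare name is a plain column lookup; name(col) applies a
    whitelisted function to that column's value.
    """
    term = term.strip()
    call = _CALL.match(term)
    if call is None:
        return row[term]
    func, col = call.groups()
    if func not in _FUNCS:
        raise ValueError(f"unsupported function: {func}")
    return _FUNCS[func](row[col])


row = {"foo": 100.0, "bar": 9.0}
print(eval_term("foo", row), eval_term("sqrt(bar)", row))
```

A whitelist keeps the evaluator safe and predictable, at the cost of supporting only the functions someone has explicitly mapped, which is exactly the gap that resolving arbitrary R expressions would have to close.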
[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635796#comment-14635796 ]

Shivaram Venkataraman commented on SPARK-9230:

The thing to do there would be to capture them as SparkR DataFrame columns, so that df$Sepal_Width actually resolves to a Java column class, and then we can parse those in RFormula. So in some sense we'll have two constructors: one from strings and one from DataFrame columns.
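The two-constructor idea can be sketched as two entry points that converge on one internal representation. In the minimal Python sketch below, `Formula` and `ColumnRef` are hypothetical classes, not Spark APIs; `ColumnRef` merely stands in for a column such as df$Sepal_Width that has already been resolved on the JVM side.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for a resolved column such as df$Sepal_Width."""
    name: str


@dataclass(frozen=True)
class Formula:
    """Hypothetical formula: one label column and a tuple of feature terms."""
    label: str
    terms: Tuple[str, ...]

    @classmethod
    def from_string(cls, text):
        """Constructor 1: parse 'label ~ a + b' text."""
        if text.count("~") != 1:
            raise ValueError("formula must contain exactly one '~'")
        lhs, rhs = text.split("~")
        return cls(lhs.strip(), tuple(t.strip() for t in rhs.split("+")))

    @classmethod
    def from_columns(cls, label, *terms):
        """Constructor 2: take column objects whose names are already resolved."""
        return cls(label.name, tuple(t.name for t in terms))


a = Formula.from_string("Sepal_Length ~ Sepal_Width")
b = Formula.from_columns(ColumnRef("Sepal_Length"), ColumnRef("Sepal_Width"))
print(a == b)
```

Because both constructors produce the same `Formula` value, code that fits the model does not need to know which one was used.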