[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features

2015-07-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635738#comment-14635738
 ] 

Apache Spark commented on SPARK-9230:
-

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/7574

 SparkR RFormula should support StringType features
 --

         Key: SPARK-9230
         URL: https://issues.apache.org/jira/browse/SPARK-9230
     Project: Spark
  Issue Type: New Feature
  Components: ML, SparkR
    Reporter: Eric Liang

 StringType features will need to be encoded using OneHotEncoder to be used 
 for regression. See umbrella design doc 
 https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing
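
 To make the encoding step in the description concrete, here is a minimal
 pure-Python sketch of one-hot encoding a string feature. It is not Spark's
 OneHotEncoder and does not reflect how RFormula wires it in; the function
 name one_hot_encode, the sorted category order, and the sample values are
 assumptions made only for illustration.

```python
def one_hot_encode(values):
    """Map each string to a 0/1 vector with one slot per distinct category."""
    # Assumption: categories are indexed in sorted order. A real encoder
    # may order them differently (for example by frequency).
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # mark the slot that belongs to this category
        encoded.append(row)
    return categories, encoded


# Hypothetical string feature column.
species = ["setosa", "virginica", "setosa", "versicolor"]
categories, encoded = one_hot_encode(species)
```

 Each string value becomes a numeric vector, which is the form a regression
 model can consume.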



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features

2015-07-21 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635776#comment-14635776
 ] 

Shivaram Venkataraman commented on SPARK-9230:
--

[~ekhliang] [~mengxr] One more thing that would be good to do is to make these 
formulas also work with actual columns in R. For example, in DataFrames we 
refer to columns as df$col_name, so it would be great to support a formula of 
the form df$Sepal_Length ~ df$Sepal_Width
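
One way to picture the request is a parser that accepts both plain column 
names and df$col_name terms and reduces them to the same column names. The 
sketch below is plain Python, not SparkR or RFormula code; parse_formula and 
its frame_name parameter are hypothetical, and it only handles terms joined 
by "+".

```python
def parse_formula(formula, frame_name="df"):
    """Split 'label ~ term + term' and strip an optional '<frame_name>$' prefix."""
    label_part, separator, terms_part = formula.partition("~")
    if not separator:
        raise ValueError("formula must contain '~'")
    prefix = frame_name + "$"

    def column_name(term):
        # Accept both 'Sepal_Width' and 'df$Sepal_Width'.
        term = term.strip()
        return term[len(prefix):] if term.startswith(prefix) else term

    label = column_name(label_part)
    features = [column_name(term) for term in terms_part.split("+")]
    return label, features


with_prefix = parse_formula("df$Sepal_Length ~ df$Sepal_Width")
without_prefix = parse_formula("Sepal_Length ~ Sepal_Width")
```

Both calls yield the same label and feature names, so the rest of the 
pipeline would not need to know which spelling the user chose.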




[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features

2015-07-21 Thread Eric Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635790#comment-14635790
 ] 

Eric Liang commented on SPARK-9230:
---

Hmm, I think it would be hard to support that in a cross-language manner, 
since you would have to resolve variables on the R side. Though we'd probably 
have to do that anyway to support other R expressions, e.g. log(foo).




[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features

2015-07-21 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635796#comment-14635796
 ] 

Shivaram Venkataraman commented on SPARK-9230:
--

The thing to do there would be to capture them as SparkR DataFrame columns, so 
that df$Sepal_Width resolves to a Java column object which we can then parse 
in RFormula. In some sense we'll have two constructors: one from strings and 
one from DataFrame columns.
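
The two-constructor idea could look roughly like the following Python sketch. 
Nothing here is the actual RFormula API: the Column and Formula classes and 
the from_string and from_columns constructors are hypothetical names, and 
Column merely stands in for an already resolved DataFrame column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for an already resolved DataFrame column."""
    name: str


@dataclass(frozen=True)
class Formula:
    label: str
    features: List[str]

    @classmethod
    def from_string(cls, text):
        # Constructor 1: parse a textual formula such as 'y ~ a + b'.
        label, separator, terms = text.partition("~")
        if not separator:
            raise ValueError("formula must contain '~'")
        return cls(label.strip(), [term.strip() for term in terms.split("+")])

    @classmethod
    def from_columns(cls, label, features):
        # Constructor 2: build from column objects, so no name
        # resolution is needed on the parsing side.
        return cls(label.name, [column.name for column in features])


from_text = Formula.from_string("Sepal_Length ~ Sepal_Width")
from_cols = Formula.from_columns(Column("Sepal_Length"), [Column("Sepal_Width")])
```

Both constructors end in the same internal representation, which is what 
would let the string form and the column form share the rest of the code.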
