Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7280#discussion_r34277283
  
    --- Diff: R/pkg/inst/tests/test_sparkSQL.R ---
    @@ -108,6 +108,14 @@ test_that("create DataFrame from RDD", {
       expect_equal(count(df), 10)
       expect_equal(columns(df), c("a", "b"))
       expect_equal(dtypes(df), list(c("a", "int"), c("b", "string")))
    +
    +  localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18), height=c(164.10, 181.4, 173.7))
    +  schema <- structType(structField("name", "string"), structField("age", "integer"), structField("height", "float"))
    +  df <- createDataFrame(sqlContext, localDF, schema)
    --- End diff ---
    
    I think the main reason for supporting a user-defined schema was to allow column names that differ from the ones in the local R data frame. We could of course switch to picking up only the names from the given schema rather than the types -- but I also think specifying a schema is an advanced option, so expecting users to make it match their data types is fine.
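
    For context, a minimal SparkR sketch of that use case (assuming an initialized `sqlContext`; the data and column names here are illustrative, not from this PR):

    ```r
    # Local R data frame with R-side column names
    localDF <- data.frame(a = c(1L, 2L), b = c("x", "y"))

    # The schema supplies different column names (with matching types)
    schema <- structType(structField("id", "integer"),
                         structField("label", "string"))

    # The resulting DataFrame takes its names from the schema, not from localDF
    df <- createDataFrame(sqlContext, localDF, schema)
    columns(df)  # names come from the schema: "id", "label"
    ```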
    
    As a follow-up, we could file a new JIRA issue to warn or print an error if we find that the specified schema doesn't match the types of the values being serialized.

