Github user shivaram commented on a diff in the pull request:
https://github.com/apache/spark/pull/7280#discussion_r34277283
--- Diff: R/pkg/inst/tests/test_sparkSQL.R ---
@@ -108,6 +108,14 @@ test_that("create DataFrame from RDD", {
expect_equal(count(df), 10)
expect_equal(columns(df), c("a", "b"))
expect_equal(dtypes(df), list(c("a", "int"), c("b", "string")))
+
+ localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18), height=c(164.10, 181.4, 173.7))
+ schema <- structType(structField("name", "string"), structField("age", "integer"), structField("height", "float"))
+ df <- createDataFrame(sqlContext, localDF, schema)
--- End diff ---
I think the main reason for supporting a user-defined schema was to allow
column names that differ from the ones in the local R data frame. We could of
course switch to picking up only the names from the given schema rather than
the types -- but I also think specifying a schema is an advanced option, so
expecting users to match it to their data types is fine.
As a follow-up JIRA, we could file a new issue to warn or print an error if
we find that the specified schema doesn't match the types of the values being
serialized.
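To sketch what that follow-up check might look like: the helper names and the
R-class-to-Spark-type mapping below are assumptions for illustration, not
SparkR API. The idea is just to compare each schema field's declared type
against the class of the corresponding local data frame column before
serialization and warn on mismatch:

```r
# Hypothetical sketch only -- rTypeToSpark and checkSchemaTypes are
# illustrative names, not part of SparkR.
rTypeToSpark <- function(col) {
  # Map an R column's class to an assumed Spark SQL type name.
  switch(class(col)[1],
         integer   = "integer",
         numeric   = "double",
         character = "string",
         logical   = "boolean",
         "unknown")
}

checkSchemaTypes <- function(localDF, schemaFields) {
  # schemaFields is assumed here to be a list of c(name, type) pairs,
  # in the same column order as localDF.
  for (i in seq_along(schemaFields)) {
    expected <- schemaFields[[i]][2]
    actual   <- rTypeToSpark(localDF[[i]])
    if (actual != expected) {
      warning(sprintf("column '%s': schema declares type '%s' but data has '%s'",
                      schemaFields[[i]][1], expected, actual))
    }
  }
}
```

For the test case above, such a check would flag `height` (declared "float",
but an R `numeric` column serializes as a double), which is exactly the kind
of mismatch the proposed warning would surface.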
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]