Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17178#discussion_r104876261
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1342,28 +1342,52 @@ test_that("column functions", {
df <- read.json(mapTypeJsonPath)
j <- collect(select(df, alias(to_json(df$info), "json")))
expect_equal(j[order(j$json), ][1], "{\"age\":16,\"height\":176.5}")
- df <- as.DataFrame(j)
- schema <- structType(structField("age", "integer"),
- structField("height", "double"))
- s <- collect(select(df, alias(from_json(df$json, schema), "structcol")))
- expect_equal(ncol(s), 1)
- expect_equal(nrow(s), 3)
- expect_is(s[[1]][[1]], "struct")
- expect_true(any(apply(s, 1, function(x) { x[[1]]$age == 16 } )))
-
- # passing option
- df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
- schema2 <- structType(structField("date", "date"))
- expect_error(tryCatch(collect(select(df, from_json(df$col, schema2))),
- error = function(e) { stop(e) }),
- paste0(".*(java.lang.NumberFormatException: For input
string:).*"))
- s <- collect(select(df, from_json(df$col, schema2, dateFormat =
"dd/MM/yyyy")))
- expect_is(s[[1]][[1]]$date, "Date")
- expect_equal(as.character(s[[1]][[1]]$date), "2014-10-21")
-
- # check for unparseable
- df <- as.DataFrame(list(list("a" = "")))
- expect_equal(collect(select(df, from_json(df$a, schema)))[[1]][[1]], NA)
+
+ schemas <- list(structType(structField("age", "integer"),
structField("height", "double")),
+ "struct<age:integer,height:double>")
--- End diff --
Hm, @felixcheung, I think this resembles catalog string, maybe we could
reuse `CatalystSqlParser.parseDataType` to make this more formal and to do not
duplicate the efforts for defining a format or documentation. This is a big
change but if this is what we want in the future, I would like to argue that we
should keep this way.
For JSON string schema, there is an overloaded version of `from_json` that
takes that schema string. If we are going to expose it, it can be easily done.
However, I think you meant it is a bigger change because we need to provide
a way to produce this JSON string from types. Up to my knowledge, we can only
manually specify the schema via this calalog string. Is this true? If so, I
don't have a good idea for now to support this and I would rather close this if
you so as well.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]