[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18458


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18458#discussion_r124716253
  
--- Diff: R/pkg/R/functions.R ---
@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
 column(jc)
   })
 
-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.
 #'
-#' @param x Column containing the JSON string.
+#' @rdname column_collection_functions
 #' @param schema a structType object to use as the schema to use when parsing the JSON string.
 #' @param as.json.array indicating if input string is JSON array of objects or a single object.
-#' @param ... additional named properties to control how the json is parsed, accepts the same
-#'options as the JSON data source.
-#'
-#' @family non-aggregate functions
-#' @rdname from_json
-#' @name from_json
-#' @aliases from_json,Column,structType-method
+#' @aliases from_json from_json,Column,structType-method
 #' @export
 #' @examples
+#'
 #' \dontrun{
-#' schema <- structType(structField("name", "string"),
-#' select(df, from_json(df$value, schema, dateFormat = "dd/MM/"))
-#'}
+#' df2 <- sql("SELECT named_struct('name', 'Bob') as people")
+#' df2 <- mutate(df2, people_json = to_json(df2$people))
+#' schema <- structType(structField("name", "string"))
+#' head(select(df2, from_json(df2$people_json, schema)))}
--- End diff --

Thanks for catching this. Added an example. 





[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread actuaryzhang
Github user actuaryzhang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18458#discussion_r124715019
  
--- Diff: R/pkg/R/functions.R ---
@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
 column(jc)
   })
 
-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.
--- End diff --

Corrected the typo. Will consider updating `null` & `NA` in the future :)





[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18458#discussion_r124706483
  
--- Diff: R/pkg/R/functions.R ---
@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
 column(jc)
   })
 
-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.
 #'
-#' @param x Column containing the JSON string.
+#' @rdname column_collection_functions
 #' @param schema a structType object to use as the schema to use when parsing the JSON string.
 #' @param as.json.array indicating if input string is JSON array of objects or a single object.
-#' @param ... additional named properties to control how the json is parsed, accepts the same
-#'options as the JSON data source.
-#'
-#' @family non-aggregate functions
-#' @rdname from_json
-#' @name from_json
-#' @aliases from_json,Column,structType-method
+#' @aliases from_json from_json,Column,structType-method
 #' @export
 #' @examples
+#'
 #' \dontrun{
-#' schema <- structType(structField("name", "string"),
-#' select(df, from_json(df$value, schema, dateFormat = "dd/MM/"))
-#'}
+#' df2 <- sql("SELECT named_struct('name', 'Bob') as people")
+#' df2 <- mutate(df2, people_json = to_json(df2$people))
+#' schema <- structType(structField("name", "string"))
+#' head(select(df2, from_json(df2$people_json, schema)))}
--- End diff --

I think it's worthwhile to keep `dateFormat = "dd/MM/"` in the example.
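
A minimal sketch of an example that keeps a `dateFormat` option, assuming a running local Spark session. The pattern `"dd/MM/yyyy"` and the names `d` and `d_json` are assumptions chosen for illustration, since the pattern quoted above is truncated; the option is passed through `...` in both `to_json` and `from_json`.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# One-row DataFrame with a struct column that holds a date.
df2 <- sql("SELECT named_struct('date', cast('2000-01-01' as date)) as d")

# Serialize the struct to JSON; "dd/MM/yyyy" is an assumed date pattern.
df2 <- mutate(df2, d_json = to_json(df2$d, dateFormat = "dd/MM/yyyy"))

# Parse the JSON string back, passing the same option through `...`.
schema <- structType(structField("date", "string"))
head(select(df2, from_json(df2$d_json, schema, dateFormat = "dd/MM/yyyy")))
```

Using the same pattern on both sides keeps the round trip consistent; the schema reads the field as a string here to keep the sketch small.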





[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18458#discussion_r124706890
  
--- Diff: R/pkg/R/functions.R ---
@@ -132,6 +132,35 @@ NULL
 #' df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))}
 NULL
 
+#' Collection functions for Column operations
+#'
+#' Collection functions defined for \code{Column}.
+#'
+#' @param x Column to compute on. Note the difference in the following methods:
+#'  \itemize{
+#'  \item \code{to_json}: it is the column containing the struct or array of the structs.
+#'  \item \code{from_json}: it is the column containing the JSON string.
+#'  }
+#' @param ... additional argument(s). In \code{to_json} and \code{from_json}, this contains
+#'additional named properties to control how it is converted, accepts the same
+#'options as the JSON data source.
+#' @name column_collection_functions
+#' @rdname column_collection_functions
+#' @family collection functions
+#' @examples
+#' \dontrun{
+#' # Dataframe used throughout this doc
+#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
+#' df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
+#' tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp))
+#' head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1)))
+#' tmp2 <- mutate(tmp, v2 = explode(tmp$v1))
+#' head(tmp2)
+#' head(select(tmp, posexplode(tmp$v1)))
+#' head(select(tmp, sort_array(tmp$v1)))
+#' head(select(tmp, sort_array(tmp$v1, FALSE)))}
--- End diff --

Nit: could we improve this? In `sort_array` we could be clearer by naming the argument, e.g. `sort_array(tmp$v1, asc = FALSE)`.
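
For illustration, a short sketch of the two calls side by side, reusing the `mtcars` DataFrame from the example in the diff and assuming a running local Spark session:

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Same data frame as in the documentation example under review.
df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp))

# Default: ascending order (asc = TRUE).
head(select(tmp, sort_array(tmp$v1)))

# Naming the argument makes the descending sort explicit to the reader.
head(select(tmp, sort_array(tmp$v1, asc = FALSE)))
```

A bare positional `FALSE` works too, but the named form tells the reader what the flag controls without a trip to the help page.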





[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18458#discussion_r124706681
  
--- Diff: R/pkg/R/functions.R ---
@@ -2156,28 +2178,23 @@ setMethod("date_format", signature(y = "Column", x = "character"),
 column(jc)
   })
 
-#' from_json
-#'
-#' Parses a column containing a JSON string into a Column of \code{structType} with the specified
-#' \code{schema} or array of \code{structType} if \code{as.json.array} is set to \code{TRUE}.
-#' If the string is unparseable, the Column will contains the value NA.
+#' @details
+#' \code{from_json}: Parses a column containing a JSON string into a Column of \code{structType}
+#' with the specified \code{schema} or array of \code{structType} if \code{as.json.array} is set
+#' to \code{TRUE}. If the string is unparseable, the Column will contains the value NA.
--- End diff --

btw, `will contains the value NA.` is very inconsistently documented. In this
case it is right, but there are many others that say the value is `null`
(note lower case), which isn't quite correct on the R side.

another project? :)
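
For readers less familiar with the distinction, a small base R sketch (independent of SparkR) of why `NA` and `NULL` are not interchangeable; the vectors `x` and `y` are made up for illustration:

```r
# NA marks a missing value and still occupies a slot in a vector.
x <- c(1, NA, 3)
length(x)
is.na(x)

# NULL is the absence of an object and is dropped when combined into a vector.
y <- c(1, NULL, 3)
length(y)
is.null(NULL)
```

R has no lower-case `null` at all, which is presumably why documentation worded that way reads oddly to R users.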






[GitHub] spark pull request #18458: [SPARK-20889][SparkR] Grouped documentation for C...

2017-06-28 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/18458

[SPARK-20889][SparkR] Grouped documentation for COLLECTION column methods

## What changes were proposed in this pull request?

Grouped documentation for column collection methods.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark sparkRDocCollection

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18458.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18458


commit 9bdc739483ec1d0493eda1dbb0e4eef761c31929
Author: actuaryzhang 
Date:   2017-06-28T17:18:12Z

update doc for collection functions



