GitHub user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17851#discussion_r114724979
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -3715,3 +3715,34 @@ setMethod("rollup",
                 sgd <- callJMethod(x@sdf, "rollup", jcol)
                 groupedData(sgd)
               })
    +
    +#' hint
    +#'
    +#' Specifies execution plan hint on the current SparkDataFrame.
    +#'
    +#' @param x a SparkDataFrame.
    +#' @param name a name of the hint.
    +#' @param ... additional argument(s) passed to the method.
    +#'
    +#' @return A SparkDataFrame.
    +#' @family SparkDataFrame functions
    +#' @aliases hint,SparkDataFrame,character-method
    +#' @rdname hint
    +#' @name hint
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(mtcars)
    +#' avg_mpg <- mean(groupBy(createDataFrame(mtcars), "cyl"), "mpg")
    --- End diff --
    
    Right, I think the example makes sense now, but it might not be very
obvious. For example,
    ```
    createDataFrame(mtcars)
    createDataFrame(mtcars)
    ```
    vs
    ```
    df <- createDataFrame(mtcars)
    df
    ```
    is not an obvious distinction unless you know what Spark is doing differently here.
This is why I suggested pointing out the need to have distinct "copies" of the data.

