Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/9613#discussion_r45532294

--- Diff: R/pkg/R/DataFrame.R ---
@@ -2199,3 +2199,97 @@ setMethod("coltypes",
             rTypes
           })
+
+#' Display the structure of a DataFrame, including column names, column types, as well as a
+#' a small sample of rows.
+#' @name str
+#' @title Compactly display the structure of a dataset
+#' @rdname str
+#' @family DataFrame functions
+#' @param object a DataFrame
+#' @examples \dontrun{
+#' # Create a DataFrame from the Iris dataset
+#' irisDF <- createDataFrame(sqlContext, iris)
+#'
+#' # Show the structure of the DataFrame
+#' str(irisDF)
+#' }
+setMethod("str",
+          signature(object = "DataFrame"),
+          function(object) {
+
+            # TODO: These could be made global parameters, though in R it's not the case
+            MAX_CHAR_PER_ROW <- 120
+            MAX_COLS <- 100
+
+            # Get the column names and types of the DataFrame
+            names <- names(object)
+            types <- coltypes(object)
+
+            # Get the number of rows.
+            # TODO: Ideally, this should be cached
+            cachedCount <- nrow(object)
+
+            # Get the first elements of the dataset. Limit number of columns accordingly
+            dataFrame <- if (ncol(object) > MAX_COLS) {
+              head(object[, c(1:MAX_COLS)])
+            } else {
+              head(object)
+            }
+
+            # The number of observations will be displayed only if the number
+            # of rows of the dataset has already been cached.
+            if (!is.null(cachedCount)) {
+              cat(paste0("'", class(object), "': ", cachedCount, " obs. of ",
+                         length(names), " variables:\n"))
+            } else {
+              cat(paste0("'", class(object), "': ", length(names), " variables:\n"))
+            }
+
+            # Whether the ... should be printed at the end of each row
+            ellipsis <- FALSE
+
+            # Add ellipsis (i.e., "...") if there are more rows than shown
+            if (!is.null(cachedCount) && (cachedCount > 6)) {
+              ellipsis <- TRUE
+            }
+
+            if (nrow(dataFrame) > 0) {
--- End diff --

Just curious: Is there a base R function (like, say, `str` on a `data.frame`) that we can use here?
Or are we not able to do this because of the type mapping we need?
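
To make the question concrete, here is a minimal sketch of what reusing base R's `str` might look like. It is not taken from the patch. It assumes that the local `data.frame` returned by `head()` on a SparkR DataFrame can be passed to `str` directly, and it uses `head(iris)` as a stand-in so that no Spark session is needed. The names `localSample`, `totalRows`, and `strLines` are hypothetical.

```r
# Stand-in for the local data.frame that head() on a SparkR DataFrame
# returns; no Spark session is needed for this sketch.
localSample <- head(iris)

# Hypothetical full row count; in the patch this comes from nrow(object).
totalRows <- nrow(iris)

# Capture what base R's str() prints for the local sample.
strLines <- capture.output(str(localSample))

# The first captured line describes the sample (6 obs.), not the full
# dataset, so replace it with a header built from the full row count.
strLines[1] <- paste0("'DataFrame': ", totalRows, " obs. of ",
                      ncol(localSample), " variables:")

cat(strLines, sep = "\n")
```

In this sketch the per-column lines still carry R's own type labels (for example `num` and `Factor`) and have no trailing ellipsis for the rows that were not collected, which is where the type mapping mentioned above would come in.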