[GitHub] spark pull request: [SPARK-11031][SPARKR] Method str() on a DataFr...

olarayej Thu, 12 Nov 2015 17:23:48 -0800

Github user olarayej commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9613#discussion_r44740371
  
    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -2200,4 +2200,107 @@ setMethod("coltypes",
                 rTypes[naIndices] <- types[naIndices]
     
                 rTypes
    -          })
    \ No newline at end of file
    +          })
    +
    +#' Display the structure of a DataFrame, including column names, column types, as well as a
    +#' a small sample of rows.
    +#' @name str
    +#' @title Compactly display the structure of a dataset
    +#' @rdname str_data_frame
    +#' @family dataframe_funcs
    +#' @param x a DataFrame
    +#' @examples \dontrun{
    +#'
    +#' # Create a DataFrame from the Iris dataset
    +#' irisDF <- createDataFrame(sqlContext, iris)
    +#' 
    +#' # Show the structure of the DataFrame
    +#' str(irisDF)
    +#' 
    +#' }
    +setMethod("str", signature="DataFrame", definition=
    +            function(object) {
    +
    +              # A synonym for easily concatenating strings
    +              "%++%" <- function(x, y) {
    +                paste(x, y, sep = "")
    +              }
    +
    +              # TODO: These could be made global parameters, though in R it's not the case
    +              DEFAULT_HEAD_ROWS <- 6
    +              MAX_CHAR_PER_ROW <- 120
    +              MAX_COLS <- 100
    +
    +              # Get the column names and types of the DataFrame
    +              names <- names(object)
    +              types <- coltypes(object)
    +
    +              # Get the number of rows.
    +              # TODO: Ideally, this should be cached
    +              cachedCount <- nrow(object)
    +
    +              # Get the first elements of the dataset. Limit number of columns accordingly
    +              dataFrame <- if (ncol(object) > MAX_COLS) {
    +                             head(object[, c(1:MAX_COLS)], DEFAULT_HEAD_ROWS)
    +                           } else {
    +                             head(object, DEFAULT_HEAD_ROWS)
    +                           }
    +
    +              # The number of observations will be displayed only if the number
    +              # of rows of the dataset has already been cached.
    +              if (!is.null(cachedCount)) {
    +                cat("'" %++% class(object) %++% "': " %++% cachedCount %++% " obs. of " %++%
    +                      length(names) %++% " variables:\n")
    +              } else {
    +                cat("'" %++% class(object) %++% "': " %++% length(names) %++% " variables:\n")
    +              }
    +
    +              # Whether the ... should be printed at the end of each row
    +              ellipsis <- FALSE
    +
    +              # Add ellipsis (i.e., "...") if there are more rows than shown
    +              if (!is.null(cachedCount)) {
    +                if (nrow(object) > DEFAULT_HEAD_ROWS) {
    +                  ellipsis <- TRUE
    +                }
    +              }
    +
    +              if (nrow(dataFrame) > 0) {
    +                for (i in 1 : ncol(dataFrame)) {
    +                  firstElements <- ""
    +
    +                  # Get the first elements for each column
    +                  if (types[i] == "chr") {
    +                    firstElements <- paste("\"" %++% dataFrame[,i] %++% "\"", collapse = " ")
    +                  } else {
    +                    firstElements <- paste(dataFrame[,i], collapse = " ")
    +                  }
    +
    +                  # Add the corresponding number of spaces for alignment
    +                  spaces <- paste(rep(" ", max(nchar(names) - nchar(names[i]))), collapse="")
    +
    +                  # Get the short type. For 'character', it would be 'chr';
    +                  # 'for numeric', it's 'num', etc.
    +                  dataType <- SHORT_TYPES[[types[i]]]
    +                  if (is.null(dataType)) {
    +                    dataType <- substring(types[i], 1, 3)
    +                  }
    +
    +                  # Concatenate the colnames, coltypes, and first
    +                  # elements of each column
    +                  line <- " $ " %++% names[i] %++% spaces %++% ": " %++%
    +                    dataType %++% " " %++% firstElements
    +
    +                  # Chop off extra characters if this is too long
    +                  cat(substr(line, 1, MAX_CHAR_PER_ROW))
    --- End diff --
    
    I'm using a maximum of 120 characters per row. After that, I add "..." if necessary. Do you mean I should truncate at MAX_CHAR_PER_ROW - 4 instead? I think either approach would be okay. We could also change the value of MAX_CHAR_PER_ROW accordingly.
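
    For illustration, the MAX_CHAR_PER_ROW - 4 variant could look roughly like the sketch below. This is not code from the PR: truncateLine and its marker argument are hypothetical names, and the sketch assumes the trailing marker is " ..." (four characters), so that a truncated line, marker included, never exceeds the limit.

```r
# Hypothetical helper (not part of the PR): cut a line to at most maxChars
# characters, reserving room for the trailing marker when the line is cut.
truncateLine <- function(line, maxChars = 120, marker = " ...") {
  if (nchar(line) <= maxChars) {
    # Short lines are returned unchanged, without a marker
    return(line)
  }
  # Keep maxChars minus the marker width, then append the marker
  keep <- maxChars - nchar(marker)
  paste0(substr(line, 1, keep), marker)
}

# Made-up sample lines for the demonstration
short <- " $ Species: chr \"setosa\" \"setosa\""
long <- paste(rep("x", 200), collapse = "")

cat(truncateLine(short), "\n")
cat(nchar(truncateLine(long)), "\n")
```

    With this approach the limit applies to the whole printed line. The alternative is to keep substr(line, 1, MAX_CHAR_PER_ROW) and append the marker afterwards, in which case the output can run a few characters past the limit.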



