[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190554

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,7 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +master_is_local <- function(master) {
    +  grepl("^local(\\[[0-9\\*]*\\])?$", master, perl = TRUE)
    --- End diff --

    `"^local(\\[[0-9\\*]+\\])?$"` (`+` instead of `*`)

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
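The difference between the two quantifiers in the comment above can be checked directly. The following is a small sketch that compares the pattern from the diff with the suggested one on a handful of master strings; the sample strings are chosen for illustration and are not taken from the pull request.

```r
# Pattern from the diff (`*` allows empty brackets) and the suggested
# pattern (`+` requires at least one digit or `*` inside the brackets).
star_pattern <- "^local(\\[[0-9\\*]*\\])?$"
plus_pattern <- "^local(\\[[0-9\\*]+\\])?$"

# Illustrative inputs only.
masters <- c("local", "local[4]", "local[*]", "local[]", "yarn")

star_matches <- grepl(star_pattern, masters, perl = TRUE)
plus_matches <- grepl(plus_pattern, masters, perl = TRUE)

# Only "local[]" is classified differently by the two patterns.
print(data.frame(master = masters, star = star_matches, plus = plus_matches))
```

With `*`, the malformed string `local[]` is accepted as a local master; with `+` it is rejected, which appears to be the point of the suggestion.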
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190550

    --- Diff: R/pkg/R/utils.R ---
    @@ -689,3 +689,7 @@ getSparkContext <- function() {
       sc <- get(".sparkRjsc", envir = .sparkREnv)
       sc
     }
    +
    +master_is_local <- function(master) {
    --- End diff --

    `is_master_local`
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190498

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    +  packageName <- ifelse(hadoop_version == "without",
    +                        paste0(version, "-bin-without-hadoop"),
    +                        paste0(version, "-bin-hadoop", hadoop_version))
    +  if (is.null(local_dir)) {
    +    local_dir <- getOption("spark.install.dir", spark_cache_path())
    +  } else {
    +    local_dir <- normalizePath(local_dir)
    +  }
    +
    +  packageLocalDir <- file.path(local_dir, packageName)
    +
    +
    +  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
    +  if (!is.na(file.info(packageLocalDir)$isdir)) {
    +    fmt <- "Spark %s for Hadoop %s has been installed."
    +    msg <- sprintf(fmt, version, hadoop_version)
    +    message(msg)
    +    return(invisible(packageLocalDir))
    +  }
    +
    +  packageLocalPath <- paste0(packageLocalDir, ".tgz")
    +  tarExists <- file.exists(packageLocalPath)
    +
    +  if (tarExists) {
    +    message("Tar file found. Installing...")
    +  } else {
    +    if (is.null(mirror_url)) {
    +      message("Remote URL not provided. Use Apache default.")
    +      mirror_url <- mirror_url_default()
    +    }
    +
    +    version <- "spark-2.0.0-rc4-bin"
    +    # When 2.0 released, remove the above line and
    +    # change spark-releases to spark in the statement below
    +    packageRemotePath <- paste0(
    +      file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
    +    fmt <- paste("Installing Spark %s for Hadoop %s.",
    +                 "Downloading from:\n- %s",
    +                 "Installing to:\n- %s", sep = "\n")
    +    msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
    +                   packageLocalDir)
    +    message(msg)
    +
    +    fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
    +                          error = function(e) {
    +                            msg <- paste0("Fetch failed from ", mirror_url, ".")
    +                            message(msg)
    +                            TRUE
    +                          })
    +    if (fetchFail) {
    +      message("Try the backup option.")
    +      mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
    +                               error = function(e) stop("No csv file found."))
    +      mirror_url <- mirror_sites$url[1]
    +      packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
    +                                  ".tgz")
    +      message(sprintf("Downloading from:\n- %s", packageRemotePath))
    +      tryCatch(download.file(packageRemotePath, packageLocalPath),
    +               error = function(e)
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190493

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    +  packageName <- ifelse(hadoop_version == "without",
    +                        paste0(version, "-bin-without-hadoop"),
    +                        paste0(version, "-bin-hadoop", hadoop_version))
    +  if (is.null(local_dir)) {
    +    local_dir <- getOption("spark.install.dir", spark_cache_path())
    +  } else {
    +    local_dir <- normalizePath(local_dir)
    +  }
    +
    +  packageLocalDir <- file.path(local_dir, packageName)
    +
    +
    +  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
    +  if (!is.na(file.info(packageLocalDir)$isdir)) {
    +    fmt <- "Spark %s for Hadoop %s has been installed."
    +    msg <- sprintf(fmt, version, hadoop_version)
    +    message(msg)
    +    return(invisible(packageLocalDir))
    +  }
    +
    +  packageLocalPath <- paste0(packageLocalDir, ".tgz")
    +  tarExists <- file.exists(packageLocalPath)
    +
    +  if (tarExists) {
    +    message("Tar file found. Installing...")
    +  } else {
    +    if (is.null(mirror_url)) {
    +      message("Remote URL not provided. Use Apache default.")
    +      mirror_url <- mirror_url_default()
    +    }
    +
    +    version <- "spark-2.0.0-rc4-bin"
    +    # When 2.0 released, remove the above line and
    +    # change spark-releases to spark in the statement below
    +    packageRemotePath <- paste0(
    +      file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
    +    fmt <- paste("Installing Spark %s for Hadoop %s.",
    +                 "Downloading from:\n- %s",
    +                 "Installing to:\n- %s", sep = "\n")
    +    msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
    +                   packageLocalDir)
    +    message(msg)
    +
    +    fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
    +                          error = function(e) {
    +                            msg <- paste0("Fetch failed from ", mirror_url, ".")
    +                            message(msg)
    +                            TRUE
    +                          })
    +    if (fetchFail) {
    +      message("Try the backup option.")
    +      mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
    --- End diff --

    Let's skip the CSV solution in this PR and use Apache mirrors as the default.
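One possible reading of this suggestion is an ordered list of base URLs that are tried in turn, with no CSV lookup. The sketch below is an assumption about how that could look, not the code that was merged: `download_from_mirrors`, its parameters, and the example hosts in the usage note are hypothetical. Only `download.file` and `tryCatch` are real R functions, and `fetch` is injectable so the logic can be exercised without network access.

```r
# Hypothetical helper: try each base URL in order, stop at the first success.
download_from_mirrors <- function(mirror_urls, relative_path, dest_file,
                                  fetch = utils::download.file) {
  for (mirror_url in mirror_urls) {
    # Drop trailing slashes so the joined URL has exactly one separator.
    remote_path <- paste0(sub("/+$", "", mirror_url), "/", relative_path)
    ok <- tryCatch({
      status <- fetch(remote_path, dest_file)
      # download.file returns 0 on success and a non-zero code otherwise.
      identical(as.integer(status), 0L)
    }, error = function(e) {
      message("Fetch failed from ", mirror_url, ": ", conditionMessage(e))
      FALSE
    })
    if (ok) {
      return(invisible(remote_path))
    }
  }
  stop("Unable to download ", relative_path, " from any mirror.")
}
```

A caller could pass the user-supplied `mirror_url` first and a default Apache mirror second, for example `download_from_mirrors(c(user_url, default_url), "spark/x.tgz", dest)`, where all three arguments are placeholders.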
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190477

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    --- End diff --

    * `localDir` (`cacheDir` might be more accurate)
    * document default values for each OS
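To make "default values for each OS" concrete, here is a minimal sketch of a per-platform cache directory lookup. The helper name `default_cache_dir`, the `app` argument, and the chosen locations are assumptions based on common platform conventions; the body of `spark_cache_path()` is not shown in the diff, so this is not a description of what the pull request does.

```r
# Hypothetical helper: choose a per-user cache directory from the OS name and
# environment variables. `env` is injectable so the logic can be tested
# without changing the real environment.
default_cache_dir <- function(sysname = Sys.info()[["sysname"]],
                              env = Sys.getenv,
                              app = "spark") {
  if (sysname == "Windows") {
    base <- env("LOCALAPPDATA")
    if (!nzchar(base)) stop("LOCALAPPDATA is not set.")
    file.path(base, app, "cache")
  } else if (sysname == "Darwin") {
    file.path(env("HOME"), "Library", "Caches", app)
  } else {
    # Other Unix-alikes: honour XDG_CACHE_HOME when set, else ~/.cache
    xdg <- env("XDG_CACHE_HOME")
    base <- if (nzchar(xdg)) xdg else file.path(env("HOME"), ".cache")
    file.path(base, app)
  }
}
```

Whatever paths are finally chosen, listing them per OS in the `@param` documentation lets users locate and clean the cache without reading the source.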
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190472

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    --- End diff --

    * I would put `without` as the default to match the default download on
      http://spark.apache.org/downloads.html.
    * Do not mention other Hadoop versions as they might change. Use a
      `@seealso` link to redirect users to the download page for available
      Hadoop versions.
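As a rough illustration of the first point, the package-name logic from the diff can be written with `"without"` as the default argument. The function name `spark_package_name` and its signature are hypothetical; the two name formats are the ones used in the diff.

```r
# Hypothetical sketch: build the tarball base name as the diff does, but with
# "without" as the default Hadoop choice. `if`/`else` replaces `ifelse`
# because the input is a single string rather than a vector.
spark_package_name <- function(spark_version, hadoop_version = "without") {
  prefix <- paste0("spark-", spark_version)
  if (hadoop_version == "without") {
    paste0(prefix, "-bin-without-hadoop")
  } else {
    paste0(prefix, "-bin-hadoop", hadoop_version)
  }
}

spark_package_name("2.0.0")         # "spark-2.0.0-bin-without-hadoop"
spark_package_name("2.0.0", "2.7")  # "spark-2.0.0-bin-hadoop2.7"
```

Because the function accepts any version string, the documentation does not need to enumerate Hadoop versions and can point to the download page instead, as the second bullet suggests.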
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190503

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    [diff context identical to the first install.R message above; this message is cut off before the review comment]
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190487

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    +  packageName <- ifelse(hadoop_version == "without",
    +                        paste0(version, "-bin-without-hadoop"),
    +                        paste0(version, "-bin-hadoop", hadoop_version))
    +  if (is.null(local_dir)) {
    +    local_dir <- getOption("spark.install.dir", spark_cache_path())
    +  } else {
    +    local_dir <- normalizePath(local_dir)
    +  }
    +
    +  packageLocalDir <- file.path(local_dir, packageName)
    +
    +
    --- End diff --

    remove extra empty line
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190481

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    --- End diff --

    See my comment below on `supported_versions_hadoop`.
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190507

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    [diff context identical to the first install.R message above; this message is cut off before the review comment]
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190500

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    [diff context identical to the first install.R message above; this message is cut off before the review comment]
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190491

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    +  packageName <- ifelse(hadoop_version == "without",
    +                        paste0(version, "-bin-without-hadoop"),
    +                        paste0(version, "-bin-hadoop", hadoop_version))
    +  if (is.null(local_dir)) {
    +    local_dir <- getOption("spark.install.dir", spark_cache_path())
    +  } else {
    +    local_dir <- normalizePath(local_dir)
    +  }
    +
    +  packageLocalDir <- file.path(local_dir, packageName)
    +
    +
    +  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
    +  if (!is.na(file.info(packageLocalDir)$isdir)) {
    +    fmt <- "Spark %s for Hadoop %s has been installed."
    --- End diff --

    There should be an option to force re-install Spark in case the local directory or file is corrupted. And please mention that in the message.
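For reference, the force re-install suggestion in discussion_r72190491 could look roughly like the sketch below. This is only an illustration under stated assumptions: the helper name `can_reuse_install` and the `overwrite` parameter are hypothetical and do not appear in the PR; only `file.info`, `unlink`, `sprintf` and `message` are base R calls.

```r
# Hypothetical helper (not part of the PR): decide whether an existing
# installation can be reused. `overwrite` is an assumed parameter name.
can_reuse_install <- function(packageLocalDir, overwrite = FALSE) {
  # file.info()$isdir is NA when the path does not exist, so isTRUE() is
  # FALSE for a missing path. This also works before R 3.2.0, where
  # dir.exists() is not available.
  installed <- isTRUE(file.info(packageLocalDir)$isdir)
  if (installed && !overwrite) {
    # Tell the user how to force a re-install, as the review asks.
    message(sprintf(
      "Spark found in %s. Set overwrite = TRUE to force a re-install.",
      packageLocalDir))
    return(TRUE)
  }
  if (installed && overwrite) {
    # Remove the possibly corrupted directory so the caller can re-install.
    message(sprintf("Removing %s before re-installing.", packageLocalDir))
    unlink(packageLocalDir, recursive = TRUE)
  }
  FALSE
}
```

A caller would return early when the helper yields `TRUE` and otherwise continue with the download and untar steps.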
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190465

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    --- End diff --

    Is `install_spark` consistent with SparkR naming convention? We only used underscores in wrapped SQL functions. Shall we use `install.spark`?

    cc: @shivaram
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190467

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    --- End diff --

    * `hadoop_version` => `hadoopVersion` (camelCase for params)
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190484

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    +  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
    +  packageName <- ifelse(hadoop_version == "without",
    +                        paste0(version, "-bin-without-hadoop"),
    +                        paste0(version, "-bin-hadoop", hadoop_version))
    +  if (is.null(local_dir)) {
    +    local_dir <- getOption("spark.install.dir", spark_cache_path())
    --- End diff --

    Where is `spark.install.dir` documented? If this is only used by SparkR, we should have a more specific config name.
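To make the config-name question in discussion_r72190484 concrete, here is a minimal sketch of how the install directory might be resolved with a SparkR-specific option. The option name `sparkr.install.dir`, the helper `resolve_install_dir` and its `defaultDir` argument are all assumptions made for illustration; the PR itself reads `spark.install.dir` and falls back to `spark_cache_path()`.

```r
# Hypothetical sketch: resolve the install directory.
# "sparkr.install.dir" is an assumed option name, not one defined by the PR.
# defaultDir stands in for the PR's spark_cache_path().
resolve_install_dir <- function(localDir = NULL, defaultDir = tempdir()) {
  if (!is.null(localDir)) {
    # An explicit argument wins; mustWork = FALSE avoids an error when the
    # directory has not been created yet.
    return(normalizePath(localDir, mustWork = FALSE))
  }
  # Otherwise use the R option if set, else the default cache location.
  getOption("sparkr.install.dir", defaultDir)
}
```

A user could then set `options(sparkr.install.dir = "...")` once per session instead of passing the directory on every call.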
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190475

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    --- End diff --

    * `mirrorUrl`
    * mention `the directory layout should follow Apache mirrors (http://www.apache.org/dyn/closer.lua/spark/)`
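As an illustration of the directory-layout note in discussion_r72190475, the sketch below builds a download URL for a mirror that follows the usual Apache layout, `<mirror>/spark/spark-<version>/<package>.tgz`. The helper name `build_remote_path` and the host `mirror.example.org` are hypothetical, and whether the base URL passed by users should already include the `spark` path component is an assumption here, not something the PR settles.

```r
# Hypothetical sketch: build the tarball URL for a mirror that follows the
# Apache layout <mirrorUrl>/spark/spark-<version>/<packageName>.tgz.
build_remote_path <- function(mirrorUrl, sparkVersion, hadoopVersion) {
  version <- paste0("spark-", sparkVersion)
  # Same package naming as in the diff under review.
  packageName <- if (hadoopVersion == "without") {
    paste0(version, "-bin-without-hadoop")
  } else {
    paste0(version, "-bin-hadoop", hadoopVersion)
  }
  # Drop trailing slashes so file.path() does not produce "//".
  mirrorUrl <- sub("/+$", "", mirrorUrl)
  paste0(file.path(mirrorUrl, "spark", version, packageName), ".tgz")
}

# Example with a placeholder mirror host:
build_remote_path("http://mirror.example.org/apache/", "2.0.0", "2.7")
```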
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r72190505 --- Diff: R/pkg/R/install.R --- @@ -0,0 +1,155 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Functions to install Spark in case the user directly downloads SparkR +# from CRAN. + +#' Download and Install Spark Core to Local Directory +#' +#' \code{install_spark} downloads and installs Spark to local directory if +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can +#' specify a desired Hadoop version, the remote site, and the directory where +#' the package is installed locally. 
+#' +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6, +#'2.7 (default) and without +#' @param mirror_url the base URL of the repositories to use +#' @param local_dir local directory that Spark is installed to +#' @return \code{install_spark} returns the local directory +#' where Spark is found or installed +#' @rdname install_spark +#' @name install_spark +#' @export +#' @examples +#'\dontrun{ +#' install_spark() +#'} +#' @note install_spark since 2.1.0 +install_spark <- function(hadoop_version = NULL, mirror_url = NULL, + local_dir = NULL) { + version <- paste0("spark-", packageVersion("SparkR")) + hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop()) + packageName <- ifelse(hadoop_version == "without", +paste0(version, "-bin-without-hadoop"), +paste0(version, "-bin-hadoop", hadoop_version)) + if (is.null(local_dir)) { +local_dir <- getOption("spark.install.dir", spark_cache_path()) + } else { +local_dir <- normalizePath(local_dir) + } + + packageLocalDir <- file.path(local_dir, packageName) + + + # can use dir.exists(packageLocalDir) under R 3.2.0 or later + if (!is.na(file.info(packageLocalDir)$isdir)) { +fmt <- "Spark %s for Hadoop %s has been installed." +msg <- sprintf(fmt, version, hadoop_version) +message(msg) +return(invisible(packageLocalDir)) + } + + packageLocalPath <- paste0(packageLocalDir, ".tgz") + tarExists <- file.exists(packageLocalPath) + + if (tarExists) { +message("Tar file found. Installing...") + } else { +if (is.null(mirror_url)) { + message("Remote URL not provided. 
Use Apache default.") + mirror_url <- mirror_url_default() +} + +version <- "spark-2.0.0-rc4-bin" +# When 2.0 released, remove the above line and +# change spark-releases to spark in the statement below +packageRemotePath <- paste0( + file.path(mirror_url, "spark-releases", version, packageName), ".tgz") +fmt <- paste("Installing Spark %s for Hadoop %s.", + "Downloading from:\n- %s", + "Installing to:\n- %s", sep = "\n") +msg <- sprintf(fmt, version, hadoop_version, packageRemotePath, + packageLocalDir) +message(msg) + +fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath), + error = function(e) { +msg <- paste0("Fetch failed from ", mirror_url, ".") +message(msg) +TRUE + }) +if (fetchFail) { + message("Try the backup option.") + mirror_sites <- tryCatch(read.csv(mirror_url_csv()), + error = function(e) stop("No csv file found.")) + mirror_url <- mirror_sites$url[1] + packageRemotePath <- paste0(file.path(mirror_url, version, packageName), + ".tgz") + message(sprintf("Downloading from:\n- %s", packageRemotePath)) + tryCatch(download.file(packageRemotePath, packageLocalPath), + error = function(e)
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190480

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    +#' specify a desired Hadoop version, the remote site, and the directory where
    +#' the package is installed locally.
    +#'
    +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
    +#'        2.7 (default) and without
    +#' @param mirror_url the base URL of the repositories to use
    +#' @param local_dir local directory that Spark is installed to
    +#' @return \code{install_spark} returns the local directory
    +#'         where Spark is found or installed
    +#' @rdname install_spark
    +#' @name install_spark
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' install_spark()
    +#'}
    +#' @note install_spark since 2.1.0
    +install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
    +                          local_dir = NULL) {
    +  version <- paste0("spark-", packageVersion("SparkR"))
    --- End diff --

    Does it mean that we can only publish SparkR with released Spark versions? Then how to make patched releases, say "2.0.0-1"? Can we overwrite an existing release on CRAN?

    cc: @felixcheung
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r72190496 --- Diff: R/pkg/R/install.R --- @@ -0,0 +1,155 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# Functions to install Spark in case the user directly downloads SparkR +# from CRAN. + +#' Download and Install Spark Core to Local Directory +#' +#' \code{install_spark} downloads and installs Spark to local directory if +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can +#' specify a desired Hadoop version, the remote site, and the directory where +#' the package is installed locally. 
+#' +#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6, +#'2.7 (default) and without +#' @param mirror_url the base URL of the repositories to use +#' @param local_dir local directory that Spark is installed to +#' @return \code{install_spark} returns the local directory +#' where Spark is found or installed +#' @rdname install_spark +#' @name install_spark +#' @export +#' @examples +#'\dontrun{ +#' install_spark() +#'} +#' @note install_spark since 2.1.0 +install_spark <- function(hadoop_version = NULL, mirror_url = NULL, + local_dir = NULL) { + version <- paste0("spark-", packageVersion("SparkR")) + hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop()) + packageName <- ifelse(hadoop_version == "without", +paste0(version, "-bin-without-hadoop"), +paste0(version, "-bin-hadoop", hadoop_version)) + if (is.null(local_dir)) { +local_dir <- getOption("spark.install.dir", spark_cache_path()) + } else { +local_dir <- normalizePath(local_dir) + } + + packageLocalDir <- file.path(local_dir, packageName) + + + # can use dir.exists(packageLocalDir) under R 3.2.0 or later + if (!is.na(file.info(packageLocalDir)$isdir)) { +fmt <- "Spark %s for Hadoop %s has been installed." +msg <- sprintf(fmt, version, hadoop_version) +message(msg) +return(invisible(packageLocalDir)) + } + + packageLocalPath <- paste0(packageLocalDir, ".tgz") + tarExists <- file.exists(packageLocalPath) + + if (tarExists) { +message("Tar file found. Installing...") + } else { +if (is.null(mirror_url)) { + message("Remote URL not provided. 
Use Apache default.") + mirror_url <- mirror_url_default() +} + +version <- "spark-2.0.0-rc4-bin" +# When 2.0 released, remove the above line and +# change spark-releases to spark in the statement below +packageRemotePath <- paste0( + file.path(mirror_url, "spark-releases", version, packageName), ".tgz") +fmt <- paste("Installing Spark %s for Hadoop %s.", + "Downloading from:\n- %s", + "Installing to:\n- %s", sep = "\n") +msg <- sprintf(fmt, version, hadoop_version, packageRemotePath, + packageLocalDir) +message(msg) + +fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath), + error = function(e) { +msg <- paste0("Fetch failed from ", mirror_url, ".") +message(msg) +TRUE + }) +if (fetchFail) { + message("Try the backup option.") + mirror_sites <- tryCatch(read.csv(mirror_url_csv()), + error = function(e) stop("No csv file found.")) + mirror_url <- mirror_sites$url[1] + packageRemotePath <- paste0(file.path(mirror_url, version, packageName), + ".tgz") + message(sprintf("Downloading from:\n- %s", packageRemotePath)) + tryCatch(download.file(packageRemotePath, packageLocalPath), + error = function(e)
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190466

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    +#'
    +#' \code{install_spark} downloads and installs Spark to local directory if
    +#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
    --- End diff --

    * `2.0.0 (preview)` => The Spark version is the same as the SparkR version. (Otherwise we need to update this doc for each Spark release.)
[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14258#discussion_r72190463

    --- Diff: R/pkg/R/install.R ---
    @@ -0,0 +1,155 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements. See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License. You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# Functions to install Spark in case the user directly downloads SparkR
    +# from CRAN.
    +
    +#' Download and Install Spark Core to Local Directory
    --- End diff --

    * `Spark Core` -> `Apache Spark`. This downloads the full distribution.
    * `Local Directory` -> `a Local Directory`
[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14362#discussion_r72190201

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteFunctionAliases.scala ---
    @@ -0,0 +1,49 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.analysis
    +
    +import org.apache.spark.sql.catalyst.expressions.Cast
    +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.catalyst.rules.Rule
    +import org.apache.spark.sql.types._
    +
    +/**
    + * An analyzer rule that handles function aliases.
    + */
    +object SubstituteFunctionAliases extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
    +    // SPARK-16730: The following functions are aliases for cast in Hive.
    +    case u: UnresolvedFunction
    +        if u.name.database.isEmpty && u.children.size == 1 && !u.isDistinct =>
    +      u.name.funcName.toLowerCase match {
    +        case "boolean" => Cast(u.children.head, BooleanType)
    --- End diff --

    Can we use `FunctionRegistry` to handle these?
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14323

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62863/
    Test PASSed.
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14323

    Merged build finished. Test PASSed.
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14323

    **[Test build #62863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62863/consoleFull)** for PR 14323 at commit [`c33bb62`](https://github.com/apache/spark/commit/c33bb62a4565bbcf3b13ea5e232180a577d79455).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.
[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup
Github user databricks-jenkins commented on the issue: https://github.com/apache/spark/pull/36 **[Test build #69 has finished](https://jenkins.test.databricks.com/job/spark-pull-request-builder/69/consoleFull)** for PR 36 at commit [`306c0f8`](https://github.com/apache/spark/commit/306c0f8c10e04995d9a9cffd3bda5383b65e34ac). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class ByteCodeParserException(message: String) extends ClosureTranslationException(message, null)` * ` class UnsupportedOpcodeException(` * ` sealed trait Node ` * ` sealed trait BinaryNode extends Node ` * ` sealed trait UnaryNode extends Node ` * ` sealed trait NullaryNode extends Node ` * ` case class Constant[T: ClassTag](value: T) extends NullaryNode ` * ` case class Argument(dataType: Type) extends NullaryNode ` * ` case class This(dataType: Type) extends NullaryNode ` * ` case class If(condition: Node, left: Node, right: Node, dataType: Type) extends BinaryNode` * ` case class FunctionCall(` * ` case class Static(clazz: String, name: String, dataType: Type) extends NullaryNode` * ` case class Cast(node: Node, dataType: Type) extends UnaryNode` * ` case class Arithmetic(` * `class ByteCodeParser ` * `class ClosureTranslationException(` * `class ExpressionGenerator ` * ` throw new ClosureTranslationException(\"ExpressionGenerator only support case class or \" +` * ` case class Field(` * ` case class NPEOnNull(` * `case class TranslateClosureOptimizerRule(conf: CatalystConf) extends Rule[LogicalPlan] ` * ` case class Parent(child: LogicalPlan) extends UnaryNode ` * `class TypeOps(dataType: Type) ` * ` class UnSupportedTypeException(dataType: Type)`
[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14257 Merged build finished. Test PASSed.
[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62862/ Test PASSed.
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14361 Merged build finished. Test PASSed.
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14361 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62861/ Test PASSed.
[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14362#discussion_r72187892 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala --- @@ -200,24 +200,6 @@ class TypeCoercionSuite extends PlanTest { widenTest(ArrayType(IntegerType), StructType(Seq()), None) } - private def ruleTest(rule: Rule[LogicalPlan], initial: Expression, transformed: Expression) { --- End diff -- I moved this into PlanTest
[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14257 **[Test build #62862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62862/consoleFull)** for PR 14257 at commit [`16b0f49`](https://github.com/apache/spark/commit/16b0f4908fe18c04830e1a972bc1874c9a169d4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14361 **[Test build #62861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62861/consoleFull)** for PR 14361 at commit [`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14362: [SPARK-16730][SQL] Implement function aliases for type c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14362 **[Test build #62869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62869/consoleFull)** for PR 14362 at commit [`37b7127`](https://github.com/apache/spark/commit/37b7127cdbf93334daa05dc5f48715b78966d032).
[GitHub] spark issue #14362: [SPARK-16730][SQL] Implement function aliases for type c...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14362 @cloud-fan and @hvanhovell can you take a look?
[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/14362#discussion_r72187942 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteFunctionAliases.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.expressions.Cast +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.types._ + +/** + * An analyzer rule that handles function aliases. + */ +object SubstituteFunctionAliases extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions { +// SPARK-16730: The following functions are aliases for cast in Hive. +case u: UnresolvedFunction +if u.name.database.isEmpty && u.children.size == 1 && !u.isDistinct => + u.name.funcName.toLowerCase match { +case "boolean" => Cast(u.children.head, BooleanType) +case "tinyint" => Cast(u.children.head, ByteType) +case "smallint" => Cast(u.children.head, ShortType) +case "int" => Cast(u.children.head, IntegerType) +case "bigint" => Cast(u.children.head, LongType) +case "float" => Cast(u.children.head, FloatType) +case "double" => Cast(u.children.head, DoubleType) +case "decimal" => Cast(u.children.head, DecimalType.USER_DEFAULT) --- End diff -- I'm using whatever `cast as decimal` uses here, but I think it is a bug to cast to USER_DEFAULT by default, since it has scale = 0.
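To make the scale = 0 concern concrete, here is a minimal sketch that uses plain `java.math.BigDecimal` rather than Spark itself. It assumes that a cast to a scale-0 decimal rounds half-up, which is an assumption for illustration and not something shown in this thread; `castToScale` is a hypothetical helper, not a Spark API.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalScaleSketch {
  // Hypothetical helper: force a value to a fixed scale, roughly what a cast to
  // decimal(p, scale) would do. Half-up rounding is an assumption of this sketch.
  def castToScale(value: String, scale: Int): JBigDecimal =
    new JBigDecimal(value).setScale(scale, RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    // With scale = 0, the fractional digits are dropped entirely.
    println(castToScale("1.23", 0))
    // With scale = 2, they survive.
    println(castToScale("1.23", 2))
  }
}
```

Under these assumptions, a user writing `decimal(1.23)` would silently lose the fractional part, which is presumably why the default is being questioned.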
[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/14362

[SPARK-16730][SQL] Implement function aliases for type casts

## What changes were proposed in this pull request?

Spark 1.x supports using Hive type names as function names for casts, e.g.

```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive for unimplemented functions, and they break in Spark 2.0 because the fallback was removed.

This patch implements function aliases using an analyzer rule for the following cast functions:

- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string

## How was this patch tested?

Added unit tests for SubstituteFunctionAliases as well as end-to-end tests for SQLCompatibilityFunctionSuite.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/petermaxlee/spark SPARK-16730 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14362.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14362 commit 37b7127cdbf93334daa05dc5f48715b78966d032 Author: petermaxlee Date: 2016-07-26T04:49:06Z [SPARK-16730][SQL] Implement function aliases for type casts
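The alias rule described in this pull request can be sketched without any Spark dependency. The snippet below is only an illustration of the idea: `Call` and `rewrite` are hypothetical names, the call is rewritten to a SQL string instead of a Catalyst `Cast` expression, and the guard conditions (no database qualifier, exactly one argument, not DISTINCT) follow the conditions visible in the diff under review.

```scala
import java.util.Locale

object FunctionAliasSketch {
  // Hive type names that act as cast functions, per the list in the PR description.
  val castAliases: Set[String] = Set(
    "boolean", "tinyint", "smallint", "int", "bigint", "float",
    "double", "decimal", "date", "timestamp", "binary", "string")

  // Hypothetical, simplified stand-in for an unresolved function call.
  final case class Call(
      database: Option[String],
      name: String,
      args: Seq[String],
      isDistinct: Boolean = false)

  // Rewrite `int(x)` to `CAST(x AS INT)` when the call looks like an alias;
  // otherwise render the call unchanged.
  def rewrite(call: Call): String = {
    val lower = call.name.toLowerCase(Locale.ROOT)
    val isAlias =
      call.database.isEmpty && call.args.size == 1 && !call.isDistinct && castAliases(lower)
    if (isAlias) s"CAST(${call.args.head} AS ${lower.toUpperCase(Locale.ROOT)})"
    else s"${call.name}(${call.args.mkString(", ")})"
  }
}
```

For example, `FunctionAliasSketch.rewrite(Call(None, "int", Seq("1.0")))` gives `CAST(1.0 AS INT)`, while a call with two arguments or a database qualifier is left alone.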
[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup
Github user databricks-jenkins commented on the issue: https://github.com/apache/spark/pull/36 **[Test build #70 has started](https://jenkins.test.databricks.com/job/spark-pull-request-builder/70/consoleFull)** for PR 36 at commit [`54b968b`](https://github.com/apache/spark/commit/54b968bd158483ec771e912c32c3faf1e6c62476).
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14358 @cloud-fan can you take a look?
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14323 **[Test build #62868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62868/consoleFull)** for PR 14323 at commit [`fb0f9a8`](https://github.com/apache/spark/commit/fb0f9a8cd54cca299b2dac34d65848c2f86e7bb4).
[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup
Github user databricks-jenkins commented on the issue: https://github.com/apache/spark/pull/36 **[Test build #67 has finished](https://jenkins.test.databricks.com/job/spark-pull-request-builder/67/consoleFull)** for PR 36 at commit [`47f2be0`](https://github.com/apache/spark/commit/47f2be03097e0afebc6240e4b39ab55803a1e5d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class ByteCodeParserException(message: String) extends ClosureTranslationException(message, null)` * ` class UnsupportedOpcodeException(` * ` sealed trait Node ` * ` sealed trait BinaryNode extends Node ` * ` sealed trait UnaryNode extends Node ` * ` sealed trait NullaryNode extends Node ` * ` case class Constant[T: ClassTag](value: T) extends NullaryNode ` * ` case class Argument(dataType: Type) extends NullaryNode ` * ` case class This(dataType: Type) extends NullaryNode ` * ` case class If(condition: Node, left: Node, right: Node, dataType: Type) extends BinaryNode` * ` case class FunctionCall(` * ` case class Static(clazz: String, name: String, dataType: Type) extends NullaryNode` * ` case class Cast(node: Node, dataType: Type) extends UnaryNode` * ` case class Arithmetic(` * `class ByteCodeParser ` * `class ClosureTranslationException(` * `class ExpressionGenerator ` * ` throw new ClosureTranslationException(\"ExpressionGenerator only support case class or \" +` * ` case class Field(` * ` case class NPEOnNull(` * `case class TranslateClosureOptimizerRule(conf: CatalystConf) extends Rule[LogicalPlan] ` * ` case class Parent(child: LogicalPlan) extends UnaryNode ` * `class TypeOps(dataType: Type) ` * ` class UnSupportedTypeException(dataType: Type)`
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14323 **[Test build #62867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62867/consoleFull)** for PR 14323 at commit [`ab4a1cf`](https://github.com/apache/spark/commit/ab4a1cf7d9a806fbb9046010f619258f8eb77c8b).
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72186040 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff -- Is there anything more to check?
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72186016 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff -- In short, those are recognized correctly in the Analyzed Logical Plan. As a result, the codegen correctly writes it with the unified precision and scale.

```
== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
```
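One way to read "unified precision and scale" is as the smallest decimal type that can hold every element without losing digits. The sketch below is an illustration under stated assumptions, not the code from this pull request: `Dec` and `wider` are hypothetical names, the literals `0.001` and `0.02` are assumed to be typed as decimal(3,3) and decimal(2,2), and any upper bound on precision is ignored.

```scala
object DecimalWideningSketch {
  // (precision, scale) pair; a simplified stand-in for a decimal type.
  final case class Dec(precision: Int, scale: Int)

  // Keep the larger scale and the larger number of integer digits,
  // so that neither input loses digits on either side of the point.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(intDigits + scale, scale)
  }

  // Unify a non-empty sequence of element types pairwise.
  def unify(types: Seq[Dec]): Dec = types.reduce(wider)
}
```

With these assumptions, `unify(Seq(Dec(3, 3), Dec(2, 2)))` yields `Dec(3, 3)`, which matches the `decimal(3,3)` element type reported in the analyzed plan.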
[GitHub] spark pull request #14323: [SPARK-16675][SQL] Avoid per-record type dispatch...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14323#discussion_r72185977 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -138,6 +138,79 @@ object JdbcUtils extends Logging { throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}")) } + // A `JDBCValueSetter` is responsible for converting and setting a value from `Row` into + // a field for `PreparedStatement`. The last argument `Int` means the index for the + // value to be set in the SQL statement and also used for the value in `Row`. + private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit --- End diff -- Fixed!
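The idea behind a setter type like `JDBCValueSetter` is to resolve the column type once per column and then reuse the resulting function for every row. The following is a rough, dependency-free sketch of that pattern, not the code from the pull request: `ColType`, `Sink`, `makeSetter`, and `writeAll` are hypothetical names, and an in-memory buffer stands in for `PreparedStatement`.

```scala
import scala.collection.mutable.ArrayBuffer

object ValueSetterSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object StrCol extends ColType

  // Hypothetical stand-in for PreparedStatement: records (parameter index, rendered value).
  type Sink = ArrayBuffer[(Int, String)]
  // Same shape as the setter type in the diff: (target, row, position) => Unit.
  type ValueSetter = (Sink, Seq[Any], Int) => Unit

  // The type dispatch happens here, once per column, not once per record.
  // Parameter indices are written 1-based, as JDBC parameters are.
  def makeSetter(t: ColType): ValueSetter = t match {
    case IntCol =>
      (sink, row, pos) => sink += ((pos + 1, row(pos).asInstanceOf[Int].toString))
    case StrCol =>
      (sink, row, pos) => sink += ((pos + 1, "'" + row(pos).asInstanceOf[String] + "'"))
  }

  def writeAll(schema: Seq[ColType], rows: Seq[Seq[Any]]): Sink = {
    // Built once, then reused for every row.
    val setters = schema.map(makeSetter).toIndexedSeq
    val sink: Sink = ArrayBuffer.empty
    for (row <- rows; i <- setters.indices) setters(i)(sink, row, i)
    sink
  }
}
```

The per-row loop then contains only function calls; the `match` on the column type is not repeated for each record.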
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14358 Merged build finished. Test PASSed.
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72185905 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff -- And finally, the following is the codegen result. Please see line 29.

```scala
scala> sql("explain codegen select array(0.001, 0.02)[1]").collect().foreach(println)
[Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [0.02 AS array(0.001, 0.02)[1]#75]
+- Scan OneRowRelation[]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator inputadapter_input;
/* 008 */   private UnsafeRow project_result;
/* 009 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 011 */
/* 012 */   public GeneratedIterator(Object[] references) {
/* 013 */     this.references = references;
/* 014 */   }
/* 015 */
/* 016 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 017 */     partitionIndex = index;
/* 018 */     inputadapter_input = inputs[0];
/* 019 */     project_result = new UnsafeRow(1);
/* 020 */     this.project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 021 */     this.project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 022 */   }
/* 023 */
/* 024 */   protected void processNext() throws java.io.IOException {
/* 025 */     while (inputadapter_input.hasNext()) {
/* 026 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 027 */       Object project_obj = ((Expression) references[0]).eval(null);
/* 028 */       Decimal project_value = (Decimal) project_obj;
/* 029 */       project_rowWriter.write(0, project_value, 3, 3);
/* 030 */       append(project_result);
/* 031 */       if (shouldStop()) return;
/* 032 */     }
/* 033 */   }
```
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14358 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62857/ Test PASSed.
[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14358 **[Test build #62857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62857/consoleFull)** for PR 14358 at commit [`5419b85`](https://github.com/apache/spark/commit/5419b859a7c8669122e8c65e88029a47556c36ee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72185594 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff --

```scala
scala> sql("create table d1(a DECIMAL(3,2))")
scala> sql("create table d2(a DECIMAL(2,1))")
scala> sql("insert into d1 values(1.0)")
scala> sql("insert into d2 values(1.0)")
scala> sql("select * from d1, d2").show()
+----+---+
|   a|  a|
+----+---+
|1.00|1.0|
+----+---+

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2")
res5: org.apache.spark.sql.DataFrame = [array(a, a): array, array(a, a): array ... 2 more fields]

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2").show()
+------------+------------+----+---+
| array(a, a)| array(a, a)|   a|  a|
+------------+------------+----+---+
|[1.00, 1.00]|[1.00, 1.00]|1.00|1.0|
+------------+------------+----+---+
```
[GitHub] spark issue #12059: [SPARK-14265] Get attempId of stage and transfer it to w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12059 **[Test build #62866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62866/consoleFull)** for PR 12059 at commit [`71bebd6`](https://github.com/apache/spark/commit/71bebd693db9874a941bbb7b8f0fa528ddfd506b).
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14361 **[Test build #3192 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3192/consoleFull)** for PR 14361 at commit [`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53).
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72185372
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   override def foldable: Boolean = children.forall(_.foldable)

-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --
Hi, @yhuai . I checked the following.
```scala
scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T")
res4: org.apache.spark.sql.DataFrame = [a[0]: decimal(3,3), a[1]: decimal(3,3)]

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").show()
+-----+-----+
| a[0]| a[1]|
+-----+-----+
|0.001|0.020|
+-----+-----+

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias('a[0], None), unresolvedalias('a[1], None)]
+- 'SubqueryAlias T
   +- 'Project ['array(0.001, 0.02) AS a#54]
      +- OneRowRelation$

== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
Project [a#54[0] AS a[0]#61, a#54[1] AS a[1]#62]
+- SubqueryAlias T
   +- Project [array(0.001, 0.02) AS a#54]
      +- OneRowRelation$
```
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14361 **[Test build #3191 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3191/consoleFull)** for PR 14361 at commit [`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53).
[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test
Github user tdas commented on the issue: https://github.com/apache/spark/pull/14361 @koeninger Can you take a look?
[GitHub] spark pull request #14323: [SPARK-16675][SQL] Avoid per-record type dispatch...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14323#discussion_r72185133
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -138,6 +138,79 @@ object JdbcUtils extends Logging {
       throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}"))
   }

+  // A `JDBCValueSetter` is responsible for converting and setting a value from `Row` into
+  // a field for `PreparedStatement`. The last argument `Int` means the index for the
+  // value to be set in the SQL statement and also used for the value in `Row`.
+  private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit
--- End diff --
please rename the read path too.
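A setter type like the one in this diff lets the writer choose the conversion for each column once and then reuse the resulting functions for every record, instead of matching on the column type per record. The following is a minimal sketch of that pattern in plain Scala, not the PR's code: `FakeStatement` stands in for `PreparedStatement`, `Seq[Any]` stands in for `Row`, and `FieldType`, `SetterSketch`, `makeSetter`, and `bindRow` are hypothetical names.

```scala
import scala.collection.mutable

// Hypothetical column types for the sketch.
sealed trait FieldType
case object IntField extends FieldType
case object StringField extends FieldType

// Hypothetical stand-in for java.sql.PreparedStatement: records bound parameters.
final class FakeStatement {
  val bound: mutable.Map[Int, String] = mutable.Map.empty[Int, String]
  def setInt(pos: Int, v: Int): Unit = bound.update(pos, s"int:$v")
  def setString(pos: Int, v: String): Unit = bound.update(pos, s"string:$v")
}

object SetterSketch {
  // (statement, row, zero-based field index) => Unit, mirroring the shape in the diff.
  type ValueSetter = (FakeStatement, Seq[Any], Int) => Unit

  // The type dispatch happens here, once per column.
  def makeSetter(t: FieldType): ValueSetter = t match {
    case IntField =>
      (stmt, row, i) => stmt.setInt(i + 1, row(i).asInstanceOf[Int])
    case StringField =>
      (stmt, row, i) => stmt.setString(i + 1, row(i).asInstanceOf[String])
  }

  // Per-record work is only a loop over the prebuilt setters.
  def bindRow(stmt: FakeStatement, setters: Seq[ValueSetter], row: Seq[Any]): Unit = {
    var i = 0
    while (i < setters.length) {
      setters(i)(stmt, row, i)
      i += 1
    }
  }
}
```

The `i + 1` reflects that JDBC statement parameters are numbered from 1 while row fields in the sketch are numbered from 0.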
[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14327 Thanks for reviewing this.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14327 thanks, merging to master!
[GitHub] spark pull request #14327: [SPARK-16686][SQL] Remove PushProjectThroughSampl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14327
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 It's my pleasure. See you later around Apache Spark. :)
[GitHub] spark pull request #14357: [SPARK-16727][SparkR] Fix expected test output of...
Github user junyangq closed the pull request at: https://github.com/apache/spark/pull/14357
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14357 I see. Thank you for pointing that out :) I'll close the PR.
[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] F...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14284
[GitHub] spark issue #14344: [SPARK-16706][SQL] support java map in encoder
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14344 **[Test build #62864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62864/consoleFull)** for PR 14344 at commit [`b4dad74`](https://github.com/apache/spark/commit/b4dad7433266caba0b7e9d6b0a88c5a9c3e3afa7).
[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14302 **[Test build #62865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62865/consoleFull)** for PR 14302 at commit [`56338b4`](https://github.com/apache/spark/commit/56338b4a20c3cbdbacbe51e17ad492ea0d6f6ecf).
[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/14284 Thanks for review. I am merging this to master and branch 2.0.
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r72184495
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
         .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
     }
   }
+
+  /**
+   * Drop the non-partition key expression from the given expression, to optimize the
+   * partition pruning. For instances: (We assume part1 & part2 are the partition keys):
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5) ==> (part1 == 1 or part1 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2)
+   * @param predicate The given expression
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def extractPartitionKeyExpression(
+    predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = {
+    // drop the non-partition key expression in conjunction of the expression tree
+    val additionalPartPredicate = predicate transformUp {
--- End diff --
I can keep updating the code if we agree on the approach; otherwise, I think we'd better close this PR for now.
[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/13585#discussion_r72184424
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
         .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
     }
   }
+
+  /**
+   * Drop the non-partition key expression from the given expression, to optimize the
+   * partition pruning. For instances: (We assume part1 & part2 are the partition keys):
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5) ==> (part1 == 1 or part1 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2)
+   * @param predicate The given expression
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def extractPartitionKeyExpression(
+    predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = {
+    // drop the non-partition key expression in conjunction of the expression tree
+    val additionalPartPredicate = predicate transformUp {
--- End diff --
This PR may have a critical bug when a user implements a UDF that logically behaves like the `NOT` operator in the partition filter expression. We probably need a whitelist of built-in UDFs. @yhuai @liancheng @yangw1234 @clockfly any comments on this?
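The concern about `NOT`-like functions can be made concrete with a small model. Dropping a conjunct only weakens a predicate when it sits under `AND`/`OR`; under a negation, or under an opaque function that behaves like one, dropping a conjunct can wrongly prune partitions. For example, `NOT(part1 = 1 AND a > 3)` is satisfied by rows in partition 1 with `a <= 3`, but `NOT(part1 = 1)` would skip that partition. The following is a minimal sketch in plain Scala, not the PR's code: `Expr`, `Leaf`, `And`, `Or`, `Not`, and `PartitionPruningSketch.extract` are hypothetical names, and treating everything other than `And`/`Or` as all-or-nothing is an assumption about one safe policy.

```scala
// Tiny expression model; `refs` is the set of column names an expression reads.
sealed trait Expr { def refs: Set[String] }
final case class Leaf(text: String, refs: Set[String]) extends Expr
final case class And(l: Expr, r: Expr) extends Expr { def refs: Set[String] = l.refs ++ r.refs }
final case class Or(l: Expr, r: Expr) extends Expr { def refs: Set[String] = l.refs ++ r.refs }
final case class Not(c: Expr) extends Expr { def refs: Set[String] = c.refs }

object PartitionPruningSketch {
  // Returns a predicate over partition keys only that is implied by `e`, if any.
  def extract(e: Expr, partKeys: Set[String]): Option[Expr] = e match {
    case And(l, r) =>
      // Either side alone is still implied by the conjunction.
      (extract(l, partKeys), extract(r, partKeys)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        case (a, b)             => a.orElse(b)
      }
    case Or(l, r) =>
      // Both branches must contribute, otherwise nothing can be pruned.
      for (a <- extract(l, partKeys); b <- extract(r, partKeys)) yield Or(a, b)
    case other =>
      // Leaves, Not, and anything opaque: keep only if it reads partition keys alone.
      if (other.refs.subsetOf(partKeys)) Some(other) else None
  }
}
```

With `part1` and `part2` as partition keys, `(part1 = 1 and a > 3) or (part2 = 2 and a < 5)` reduces to `part1 = 1 or part2 = 2`, while `Not(And(part1 = 1, a > 3))` yields `None` rather than an unsound `Not(part1 = 1)`.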
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 If you use `-fdx`, it also removes IDE settings, such as IntelliJ's.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 Never. I did that just to make sure. I don't do it frequently.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14357 Yeah sure, but I'm just wondering whether the clean and the additional arguments are something we should normally do.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 Actually, you can see the Jenkins log, too. There is no problem with the current R testsuite. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull FYI, I'm using JDK 1.8.0_102 and Jenkins is using JDK 1.7.x.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 I think you did something wrong. :) Could you close this PR?
[GitHub] spark pull request #14356: [SPARK-16724] Expose DefinedByConstructorParams
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14356
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14357 Hmm... It doesn't show the name column either.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 On the most recent master build, I did the following, and all tests passed.
```
$ git clean -fdx
$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive -Psparkr package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
$ R/install-dev.sh
$ R/run-tests.sh
...
DONE ===
Tests passed.
```
[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14302 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62859/ Test FAILed.
[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14302 Merged build finished. Test FAILed.
[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14302 **[Test build #62859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62859/consoleFull)** for PR 14302 at commit [`5a5ddba`](https://github.com/apache/spark/commit/5a5ddbafe069d579670525e5d58f7676dcba1e28).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14356: [SPARK-16724] Expose DefinedByConstructorParams
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14356 Merging in master/2.0.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Merged build finished. Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14327 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62856/ Test PASSed.
[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14327 **[Test build #62856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62856/consoleFull)** for PR 14327 at commit [`3e134f1`](https://github.com/apache/spark/commit/3e134f18f5b3fc678d95e2bf10997185ab6dd6e9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 @junyangq . Currently, you are at the most recent master build, right? And SparkR does not show the `name` column in the result of the `describe` command. I'm wondering what `spark-shell` shows on your system. Could you run the following command in spark-shell?
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
```
[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14284 Merged build finished. Test PASSed.
[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14284 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62854/ Test PASSed.
[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14284 **[Test build #62854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62854/consoleFull)** for PR 14284 at commit [`ff3029e`](https://github.com/apache/spark/commit/ff3029e3db2214d7c51ea3ea866becf441fddac4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72182807
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   override def foldable: Boolean = children.forall(_.foldable)

-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --
Thank you for the review, @yhuai . I see. I'll check that more.
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14357 Oh, let me try this time. Thank you for double-checking, @junyangq .
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r72182414 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1774,6 +1775,49 @@ class Analyzer( } /** + * Substitute Hints. + * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters. + * + * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations` + * rule is applied. Here are two reasons. + * - To support `MetastoreRelation` in Hive module. + * - To reduce the effect of `Hint` on the other rules. + * + * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan. + * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here. + */ + object SubstituteHints extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case logical: LogicalPlan => logical transformDown { +case h @ Hint(name, parameters, child) +if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) => + var resolvedChild = child + + for (table <- parameters) { +var stop = false +resolvedChild = resolvedChild.transformDown { --- End diff -- Oh, I see. That's the reason why I can not find the clear logic. Thank you so much, @yhuai !
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72182390 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff -- For example, if we access a single element, its data type actually may not be the one shown as the array's datatype.
[GitHub] spark pull request #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBui...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14257#discussion_r72182312 --- Diff: sql/hive/src/test/resources/sqlgen/predicate_subquery.sql --- @@ -1,4 +1,4 @@ -- This file is automatically generated by LogicalPlanToSQLSuite. select * from t1 b where exists (select * from t1 a) -SELECT `gen_attr` AS `a` FROM (SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr` AS `a` FROM ((SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0) AS gen_subquery_1) AS gen_subquery_1)) AS b +SELECT `gen_attr_0` AS `a` FROM (SELECT `gen_attr_0` FROM (SELECT `a` AS `gen_attr_0` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr_1` AS `a` FROM ((SELECT `gen_attr_1` FROM (SELECT `a` AS `gen_attr_1` FROM `default`.`t1`) AS gen_subquery_2) AS gen_subquery_1) AS gen_subquery_1)) AS b --- End diff -- For this query, I reformatted like the following. Here, `gen_subquery_xxx`s are generated uniquely. But, `gen_subquery_1` is repeated due to the added nested subquery alias. Please note that it's not a repetition by duplicated ID and happens as a direct double nesting. I think it's okay.
```sql
SELECT `gen_attr_0` AS `a`
FROM (SELECT `gen_attr_0`
      FROM (SELECT `a` AS `gen_attr_0`
            FROM `default`.`t1`
           ) AS gen_subquery_0
      WHERE EXISTS(SELECT `gen_attr_1` AS `a`
                   FROM (
                         (SELECT `gen_attr_1`
                          FROM (SELECT `a` AS `gen_attr_1`
                                FROM `default`.`t1`
                               ) AS gen_subquery_2
                         ) AS gen_subquery_1
                        ) AS gen_subquery_1
                  )
     ) AS b
```
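The uniquely numbered names in the comment above (`gen_attr_0`, `gen_attr_1`, `gen_subquery_2`) could come from something like a per-query counter. The following is only a minimal sketch under that assumption; `AliasGenerator` is a hypothetical helper and not the code in the pull request.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical helper, not Spark's SQLBuilder: hands out aliases with an
// increasing numeric suffix, starting from 0 for every new generator.
final class AliasGenerator(prefix: String) {
  private val nextId = new AtomicLong(0L)

  // Returns prefix_0, prefix_1, ... on successive calls.
  def next(): String = s"${prefix}_${nextId.getAndIncrement()}"
}

object AliasGeneratorDemo {
  def main(args: Array[String]): Unit = {
    val attrs = new AliasGenerator("gen_attr")
    val subqueries = new AliasGenerator("gen_subquery")
    println(attrs.next())      // gen_attr_0
    println(attrs.next())      // gen_attr_1
    println(subqueries.next()) // gen_subquery_0
  }
}
```

Creating a fresh generator per query is what would keep the generated text identical between runs, because the suffixes then no longer depend on process-wide IDs.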
[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14353#discussion_r72182316 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression { override def foldable: Boolean = children.forall(_.foldable) - override def checkInputDataTypes(): TypeCheckResult = -TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array") + override def checkInputDataTypes(): TypeCheckResult = { +if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) { + TypeCheckResult.TypeCheckSuccess --- End diff -- I think we cannot just make the check pass. We need to actually cast those elements to the same precision and scale.
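To illustrate what "the same precision and scale" could mean for a set of decimal elements, here is a rough, Spark-free sketch. `Dec` and `DecimalWidening` are hypothetical names standing in for `DecimalType`, and the cap of 38 digits is assumed from Spark SQL's documented maximum decimal precision; this is not the fix adopted in the pull request.

```scala
// Hypothetical stand-in for a decimal type; only precision and scale matter here.
final case class Dec(precision: Int, scale: Int) {
  require(scale >= 0 && precision >= scale, "scale must be in [0, precision]")
}

object DecimalWidening {
  // Assumed cap on total digits (Spark SQL documents 38 as the maximum).
  val MaxPrecision = 38

  // Picks one type wide enough for all inputs: the largest scale plus the
  // largest number of integer digits. If that exceeds the cap, the precision
  // is truncated, so integer digits can be lost in that case.
  def common(types: Seq[Dec]): Dec = {
    require(types.nonEmpty, "need at least one decimal type")
    val scale = types.map(_.scale).max
    val intDigits = types.map(t => t.precision - t.scale).max
    val precision = math.min(intDigits + scale, MaxPrecision)
    Dec(precision, math.min(scale, precision))
  }
}

object DecimalWideningDemo {
  def main(args: Array[String]): Unit = {
    // decimal(10,2) and decimal(5,4) need 8 integer digits and 4 fractional digits.
    println(DecimalWidening.common(Seq(Dec(10, 2), Dec(5, 4)))) // Dec(12,4)
  }
}
```

With a common type computed this way, each element would then be cast to it before the array is built, so that reading a single element yields the type the array advertises.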
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Merged build finished. Test PASSed.
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62858/ Test PASSed.
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14359 **[Test build #62858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull)** for PR 14359 at commit [`6fcfb4b`](https://github.com/apache/spark/commit/6fcfb4b0e158ba86371ad4d0728490f3a8e7caeb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14323 **[Test build #62863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62863/consoleFull)** for PR 14323 at commit [`c33bb62`](https://github.com/apache/spark/commit/c33bb62a4565bbcf3b13ea5e232180a577d79455).
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r72181762 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1774,6 +1775,49 @@ class Analyzer( } /** + * Substitute Hints. + * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters. + * + * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations` + * rule is applied. Here are two reasons. + * - To support `MetastoreRelation` in Hive module. + * - To reduce the effect of `Hint` on the other rules. + * + * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan. + * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here. + */ + object SubstituteHints extends Rule[LogicalPlan] { +def apply(plan: LogicalPlan): LogicalPlan = plan transform { + case logical: LogicalPlan => logical transformDown { +case h @ Hint(name, parameters, child) +if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) => + var resolvedChild = child + + for (table <- parameters) { +var stop = false +resolvedChild = resolvedChild.transformDown { --- End diff -- You probably want to look at older versions of Hive. For example, 0.10. After https://issues.apache.org/jira/browse/HIVE-3784, Hive does not really use map join hint.
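As a rough illustration of the substitution the quoted Scaladoc describes, the sketch below rewrites a toy plan tree. Every type here (`Plan`, `Table`, `Join`, `Hint`, `Broadcast`) is hypothetical and much simpler than Catalyst's `LogicalPlan`; unlike the rule under review, it marks every table with a matching name rather than only the closest one.

```scala
// Toy plan nodes, hypothetical and far simpler than Catalyst's LogicalPlan.
sealed trait Plan
final case class Table(name: String) extends Plan
final case class Join(left: Plan, right: Plan) extends Plan
final case class Hint(name: String, tables: Seq[String], child: Plan) extends Plan
final case class Broadcast(child: Plan) extends Plan

object HintSketch {
  private val broadcastNames = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")

  // Wraps every table whose lower-cased name is in `targets` with Broadcast.
  private def mark(plan: Plan, targets: Set[String]): Plan = plan match {
    case t @ Table(n) if targets.contains(n.toLowerCase) => Broadcast(t)
    case Join(l, r) => Join(mark(l, targets), mark(r, targets))
    case other => other
  }

  // Removes all Hint nodes; recognised hints are compared case-insensitively
  // and turn into Broadcast markers, unknown hints are simply dropped.
  def substitute(plan: Plan): Plan = plan match {
    case Hint(n, ts, child) if broadcastNames.contains(n.toUpperCase) =>
      mark(substitute(child), ts.map(_.toLowerCase).toSet)
    case Hint(_, _, child) => substitute(child)
    case Join(l, r) => Join(substitute(l), substitute(r))
    case other => other
  }
}

object HintSketchDemo {
  def main(args: Array[String]): Unit = {
    val plan = Hint("mapjoin", Seq("T1"), Join(Table("t1"), Table("t2")))
    println(HintSketch.substitute(plan)) // Join(Broadcast(Table(t1)),Table(t2))
  }
}
```

Dropping unrecognised hints mirrors the guarantee stated in the Scaladoc that no unknown `Hint` remains after the rule runs, although how the real rule achieves that is not shown in the quoted diff.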
[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup
Github user databricks-jenkins commented on the issue: https://github.com/apache/spark/pull/36 **[Test build #69 has started](https://jenkins.test.databricks.com/job/spark-pull-request-builder/69/consoleFull)** for PR 36 at commit [`306c0f8`](https://github.com/apache/spark/commit/306c0f8c10e04995d9a9cffd3bda5383b65e34ac).
[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14357 I merged the most recent master branch, rebuilt and installed the package, but the test failed at the same place. @dongjoon-hyun