date:20160725

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190554
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,7 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+master_is_local <- function(master) {
+  grepl("^local(\\[[0-9\\*]*\\])?$", master, perl = TRUE)
--- End diff --

Use `"^local(\\[[0-9\\*]+\\])?$"` (`+` instead of `*`), so that empty brackets such as `local[]` are not accepted.
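
The difference between the two quantifiers only shows up on `local[]`. A minimal sketch that reuses the pattern from the diff; the helper names `is_local_star` and `is_local_plus` are made up for this comparison and are not part of the PR:

```r
# Hypothetical helpers, named only for this comparison.
is_local_star <- function(master) {
  grepl("^local(\\[[0-9\\*]*\\])?$", master, perl = TRUE)
}
is_local_plus <- function(master) {
  grepl("^local(\\[[0-9\\*]+\\])?$", master, perl = TRUE)
}

masters <- c("local", "local[4]", "local[*]", "local[]", "yarn")
star <- is_local_star(masters)
plus <- is_local_plus(masters)

# The two patterns disagree only on "local[]": `*` allows empty brackets.
print(data.frame(master = masters, star = star, plus = plus))
```

Both variants still accept mixed content such as `local[4*]`; tightening that is a separate question from the `+` vs `*` change.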


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190550
  
--- Diff: R/pkg/R/utils.R ---
@@ -689,3 +689,7 @@ getSparkContext <- function() {
   sc <- get(".sparkRjsc", envir = .sparkREnv)
   sc
 }
+
+master_is_local <- function(master) {
--- End diff --

Rename to `is_master_local`.



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190498
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190493
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
--- End diff --

Let's skip the CSV solution in this PR and use Apache mirrors as the default.
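
One way to read this suggestion is to drop the CSV lookup and walk a short list of base URLs in order, for example a user-supplied mirror first and the Apache default last. The following is only a sketch under assumptions: `download_first_available` and its `downloader` argument are hypothetical names and do not appear in the PR.

```r
# Hypothetical helper: try each base URL in order and stop at the first
# successful download. `downloader` defaults to utils::download.file and is a
# parameter only so the logic can be exercised without network access.
download_first_available <- function(base_urls, remote_file, dest,
                                     downloader = utils::download.file) {
  for (base in base_urls) {
    url <- paste(base, remote_file, sep = "/")
    message(sprintf("Downloading from:\n- %s", url))
    ok <- tryCatch({
      # download.file returns 0 (invisibly) on success
      identical(as.integer(downloader(url, dest)), 0L)
    }, error = function(e) {
      message(sprintf("Fetch failed from %s.", base))
      FALSE
    })
    if (ok) return(invisible(url))
  }
  stop("Unable to download ", remote_file, " from any of the given URLs.")
}
```

Returning the URL that worked keeps the later "Installing to" message accurate, and the final `stop()` replaces the "No csv file found." failure path.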



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190477
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
--- End diff --

* `localDir` (`cacheDir` might be more accurate)
* document default values for each OS
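
As an illustration of what documented per-OS defaults could look like, here is a rough sketch. The function name `default_cache_dir`, its arguments, and the concrete paths are assumptions based on common platform conventions; they are not the values used by the PR's `spark_cache_path()`, which is not shown in this thread.

```r
#' @param cacheDir directory where Spark is cached. Assumed defaults in this
#'   sketch: \%LOCALAPPDATA\%/spark on Windows, ~/Library/Caches/spark on
#'   macOS, and $XDG_CACHE_HOME/spark (or ~/.cache/spark) elsewhere.
# Hypothetical helper; `env` is injectable so the logic can be checked
# without touching real environment variables.
default_cache_dir <- function(sysname = Sys.info()[["sysname"]],
                              env = Sys.getenv) {
  if (identical(sysname, "Windows")) {
    file.path(env("LOCALAPPDATA"), "spark")
  } else if (identical(sysname, "Darwin")) {
    file.path(env("HOME"), "Library", "Caches", "spark")
  } else {
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset (empty string)
    xdg <- env("XDG_CACHE_HOME")
    base <- if (nzchar(xdg)) xdg else file.path(env("HOME"), ".cache")
    file.path(base, "spark")
  }
}
```

Keeping the defaults in one small function makes it straightforward to list them in the `@param` text and keep the two in sync.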



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190472
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
--- End diff --

* I would put `without` as the default to match the default download on http://spark.apache.org/downloads.html.
* Do not mention other Hadoop versions as they might change. Use a `@seealso` link to redirect users to the download page for available Hadoop versions.
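
Both points can be sketched together. Because `match.arg()` returns the first choice when its argument is `NULL`, listing `"without"` first is enough to make it the default. The names `supported_versions_sketch` and `pick_hadoop_version` below are hypothetical stand-ins; the PR's actual `supported_versions_hadoop()` is not shown in this thread.

```r
#' @param hadoop_version Version of Hadoop to install; defaults to
#'   \code{"without"}.
#' @seealso \href{http://spark.apache.org/downloads.html}{Spark downloads page}
#'   for the Hadoop versions that are currently available.
# Hypothetical stand-in for the PR's supported_versions_hadoop().
supported_versions_sketch <- function() c("without", "2.7", "2.6", "2.4")

pick_hadoop_version <- function(hadoop_version = NULL) {
  # match.arg() returns the first element of the choices when arg is NULL
  match.arg(hadoop_version, supported_versions_sketch())
}

pick_hadoop_version()       # falls back to the first choice
pick_hadoop_version("2.6")  # an explicit, supported version
```

An unsupported value such as `"9.9"` makes `match.arg()` raise an error, so no separate validation step is needed.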



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190503
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190487
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
--- End diff --

Remove the extra empty line.



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190481
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
--- End diff --

See my comment below on `supported_versions_hadoop`.



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190507
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190500
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190491
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
--- End diff --

There should be an option to force re-install Spark in case the local 
directory or file is corrupted. And please mention that in the message.
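
A minimal sketch of what such an option could look like, not the PR's actual implementation. The `overwrite` parameter and the helper name `check_existing_install` are hypothetical and only illustrate the idea of mentioning the option in the message:

```r
# Hypothetical helper: decide whether an existing install directory can be
# reused. `overwrite` is an assumed parameter name, not part of the PR.
check_existing_install <- function(packageLocalDir, overwrite = FALSE) {
  # file.info()$isdir is NA when the path does not exist
  installed <- isTRUE(file.info(packageLocalDir)$isdir)
  if (installed && !overwrite) {
    message(sprintf(
      "Spark found in %s. To force a re-install, set overwrite = TRUE.",
      packageLocalDir))
    return(TRUE)
  }
  if (installed && overwrite) {
    # Remove a possibly corrupted install so it can be fetched again
    message(sprintf("Removing existing install in %s.", packageLocalDir))
    unlink(packageLocalDir, recursive = TRUE)
  }
  FALSE
}
```

The caller would return early when the helper yields `TRUE` and otherwise continue with the download and untar steps.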



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190465
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
--- End diff --

Is `install_spark` consistent with the SparkR naming convention? We only used
underscores in wrapped SQL functions. Shall we use `install.spark`?
cc: @shivaram



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190467
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
--- End diff --

* `hadoop_version` => `hadoopVersion` (camelCase for params)



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190484
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
--- End diff --

Where is `spark.install.dir` documented? If this is only used by SparkR, we 
should have a more specific config name.
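
As a rough sketch of the more specific name suggested here: the option name `sparkr.install.dir` and the helper `resolve_install_dir` below are assumptions for illustration only, since the PR itself reads `spark.install.dir`.

```r
# Hypothetical helper: resolve the install directory from an R option.
# "sparkr.install.dir" is an illustrative, SparkR-specific option name and
# is an assumption; the PR reads "spark.install.dir".
resolve_install_dir <- function(localDir = NULL, default = tempdir()) {
  if (is.null(localDir)) {
    # getOption() falls back to `default` when the option is unset
    localDir <- getOption("sparkr.install.dir", default)
  }
  # mustWork = FALSE because the directory may not exist yet
  normalizePath(localDir, mustWork = FALSE)
}
```

Whatever name is chosen, it would need to be documented alongside the function so users know how to set it with `options()`.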



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190475
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
--- End diff --

* `mirror_url` => `mirrorUrl`
* Mention that the directory layout should follow Apache mirrors
(http://www.apache.org/dyn/closer.lua/spark/).
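
For illustration, the layout in question places each release under `spark/spark-<version>/` below the mirror base. A small sketch of a URL builder under that assumption follows; `spark_package_url` is a hypothetical helper name, and the parameter names use the camelCase style requested in this review.

```r
# Build the download URL for a Spark binary package, assuming the mirror
# follows the Apache layout: <mirror>/spark/spark-<version>/<package>.tgz
# `spark_package_url` is a hypothetical helper, not part of the PR.
spark_package_url <- function(mirrorUrl, sparkVersion, hadoopVersion) {
  version <- paste0("spark-", sparkVersion)
  packageName <- if (hadoopVersion == "without") {
    paste0(version, "-bin-without-hadoop")
  } else {
    paste0(version, "-bin-hadoop", hadoopVersion)
  }
  # Drop trailing slashes so the joined URL has no doubled separators
  mirrorUrl <- sub("/+$", "", mirrorUrl)
  paste0(paste(mirrorUrl, "spark", version, packageName, sep = "/"), ".tgz")
}
```

Documenting this layout in the `@param` text tells users which base URL to pass: the directory that contains `spark/`, not the release directory itself.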



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190505
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190480
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
--- End diff --

Does it mean that we can only publish SparkR with released Spark versions?
Then how do we make patched releases, say "2.0.0-1"? Can we overwrite an
existing release on CRAN?

cc: @felixcheung 
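
One way to picture the concern: if a patched SparkR release carried an extra version component, the Spark release to download could still be derived from the first three components. The sketch below assumes that versioning scheme, which this thread has not settled, and `spark_release_version` is a hypothetical helper.

```r
# Hypothetical helper: map a SparkR package version to the Spark release it
# should download, keeping only major.minor.patch. That a patched SparkR
# release would add a fourth component (e.g. "2.0.0-1") is an assumption.
spark_release_version <- function(pkgVersion) {
  # package_version() accepts both "." and "-" as component separators
  parts <- unclass(package_version(as.character(pkgVersion)))[[1]]
  stopifnot(length(parts) >= 3)
  paste(parts[1:3], collapse = ".")
}
```

Inside the package this could be called as `spark_release_version(packageVersion("SparkR"))` in place of using the package version directly.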



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190496
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
+#' specify a desired Hadoop version, the remote site, and the directory where
+#' the package is installed locally.
+#'
+#' @param hadoop_version Version of Hadoop to install, 2.4, 2.6,
+#'2.7 (default) and without
+#' @param mirror_url the base URL of the repositories to use
+#' @param local_dir local directory that Spark is installed to
+#' @return \code{install_spark} returns the local directory 
+#' where Spark is found or installed
+#' @rdname install_spark
+#' @name install_spark
+#' @export
+#' @examples
+#'\dontrun{
+#' install_spark()
+#'}
+#' @note install_spark since 2.1.0
+install_spark <- function(hadoop_version = NULL, mirror_url = NULL,
+  local_dir = NULL) {
+  version <- paste0("spark-", packageVersion("SparkR"))
+  hadoop_version <- match.arg(hadoop_version, supported_versions_hadoop())
+  packageName <- ifelse(hadoop_version == "without",
+paste0(version, "-bin-without-hadoop"),
+paste0(version, "-bin-hadoop", hadoop_version))
+  if (is.null(local_dir)) {
+local_dir <- getOption("spark.install.dir", spark_cache_path())
+  } else {
+local_dir <- normalizePath(local_dir)
+  }
+
+  packageLocalDir <- file.path(local_dir, packageName)
+
+
+  # can use dir.exists(packageLocalDir) under R 3.2.0 or later
+  if (!is.na(file.info(packageLocalDir)$isdir)) {
+fmt <- "Spark %s for Hadoop %s has been installed."
+msg <- sprintf(fmt, version, hadoop_version)
+message(msg)
+return(invisible(packageLocalDir))
+  }
+
+  packageLocalPath <- paste0(packageLocalDir, ".tgz")
+  tarExists <- file.exists(packageLocalPath)
+
+  if (tarExists) {
+message("Tar file found. Installing...")
+  } else {
+if (is.null(mirror_url)) {
+  message("Remote URL not provided. Use Apache default.")
+  mirror_url <- mirror_url_default()
+}
+
+version <- "spark-2.0.0-rc4-bin"
+# When 2.0 released, remove the above line and
+# change spark-releases to spark in the statement below
+packageRemotePath <- paste0(
+  file.path(mirror_url, "spark-releases", version, packageName), ".tgz")
+fmt <- paste("Installing Spark %s for Hadoop %s.",
+ "Downloading from:\n- %s",
+ "Installing to:\n- %s", sep = "\n")
+msg <- sprintf(fmt, version, hadoop_version, packageRemotePath,
+   packageLocalDir)
+message(msg)
+
+fetchFail <- tryCatch(download.file(packageRemotePath, packageLocalPath),
+  error = function(e) {
+msg <- paste0("Fetch failed from ", mirror_url, ".")
+message(msg)
+TRUE
+  })
+if (fetchFail) {
+  message("Try the backup option.")
+  mirror_sites <- tryCatch(read.csv(mirror_url_csv()),
+   error = function(e) stop("No csv file found."))
+  mirror_url <- mirror_sites$url[1]
+  packageRemotePath <- paste0(file.path(mirror_url, version, packageName),
+  ".tgz")
+  message(sprintf("Downloading from:\n- %s", packageRemotePath))
+  tryCatch(download.file(packageRemotePath, packageLocalPath),
+   error = function(e)

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190466
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
+#' 
+#' \code{install_spark} downloads and installs Spark to local directory if
+#' it is not found. The Spark version we use is 2.0.0 (preview). Users can
--- End diff --

* `2.0.0 (preview)` => The Spark version is the same as the SparkR version. 
(Otherwise we need to update this doc for each Spark release.)



[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-25 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/14258#discussion_r72190463
  
--- Diff: R/pkg/R/install.R ---
@@ -0,0 +1,155 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Functions to install Spark in case the user directly downloads SparkR
+# from CRAN.
+
+#' Download and Install Spark Core to Local Directory
--- End diff --

* `Spark Core` -> `Apache Spark`. This downloads the full distribution.
* `Local Directory` -> `a Local Directory`



[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14362#discussion_r72190201
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteFunctionAliases.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.Cast
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * An analyzer rule that handles function aliases.
+ */
+object SubstituteFunctionAliases extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
+// SPARK-16730: The following functions are aliases for cast in Hive.
+case u: UnresolvedFunction
+if u.name.database.isEmpty && u.children.size == 1 && !u.isDistinct =>
+  u.name.funcName.toLowerCase match {
+case "boolean" => Cast(u.children.head, BooleanType)
--- End diff --

Can we use `FunctionRegistry` to handle these?



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62863/
Test PASSed.



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14323
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62863/consoleFull)**
 for PR 14323 at commit 
[`c33bb62`](https://github.com/apache/spark/commit/c33bb62a4565bbcf3b13ea5e232180a577d79455).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup

2016-07-25 Thread databricks-jenkins

Github user databricks-jenkins commented on the issue:

https://github.com/apache/spark/pull/36
  
**[Test build #69 has 
finished](https://jenkins.test.databricks.com/job/spark-pull-request-builder/69/consoleFull)**
 for PR 36 at commit 
[`306c0f8`](https://github.com/apache/spark/commit/306c0f8c10e04995d9a9cffd3bda5383b65e34ac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class ByteCodeParserException(message: String) extends ClosureTranslationException(message, null)`
  * `  class UnsupportedOpcodeException(`
  * `  sealed trait Node `
  * `  sealed trait BinaryNode extends Node `
  * `  sealed trait UnaryNode extends Node `
  * `  sealed trait NullaryNode extends Node `
  * `  case class Constant[T: ClassTag](value: T) extends NullaryNode `
  * `  case class Argument(dataType: Type) extends NullaryNode `
  * `  case class This(dataType: Type) extends NullaryNode `
  * `  case class If(condition: Node, left: Node, right: Node, dataType: Type) extends BinaryNode`
  * `  case class FunctionCall(`
  * `  case class Static(clazz: String, name: String, dataType: Type) extends NullaryNode`
  * `  case class Cast(node: Node, dataType: Type) extends UnaryNode`
  * `  case class Arithmetic(`
  * `class ByteCodeParser `
  * `class ClosureTranslationException(`
  * `class ExpressionGenerator `
  * `  throw new ClosureTranslationException(\"ExpressionGenerator only support case class or \" +`
  * `  case class Field(`
  * `  case class NPEOnNull(`
  * `case class TranslateClosureOptimizerRule(conf: CatalystConf) extends Rule[LogicalPlan] `
  * `  case class Parent(child: LogicalPlan) extends UnaryNode `
  * `class TypeOps(dataType: Type) `
  * `  class UnSupportedTypeException(dataType: Type)`



[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14257
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14257
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62862/
Test PASSed.



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14361
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14361
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62861/
Test PASSed.



[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...

2016-07-25 Thread petermaxlee

Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/14362#discussion_r72187892
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
 ---
@@ -200,24 +200,6 @@ class TypeCoercionSuite extends PlanTest {
 widenTest(ArrayType(IntegerType), StructType(Seq()), None)
   }
 
-  private def ruleTest(rule: Rule[LogicalPlan], initial: Expression, transformed: Expression) {
--- End diff --

I moved this into PlanTest



[GitHub] spark issue #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14257
  
**[Test build #62862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62862/consoleFull)**
 for PR 14257 at commit 
[`16b0f49`](https://github.com/apache/spark/commit/16b0f4908fe18c04830e1a972bc1874c9a169d4a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14361
  
**[Test build #62861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62861/consoleFull)**
 for PR 14361 at commit 
[`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14362: [SPARK-16730][SQL] Implement function aliases for type c...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14362
  
**[Test build #62869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62869/consoleFull)**
 for PR 14362 at commit 
[`37b7127`](https://github.com/apache/spark/commit/37b7127cdbf93334daa05dc5f48715b78966d032).



[GitHub] spark issue #14362: [SPARK-16730][SQL] Implement function aliases for type c...

2016-07-25 Thread petermaxlee

Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/14362
  
@cloud-fan and @hvanhovell can you take a look?



[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...

2016-07-25 Thread petermaxlee

Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/14362#discussion_r72187942
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteFunctionAliases.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import org.apache.spark.sql.catalyst.expressions.Cast
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.types._
+
+/**
+ * An analyzer rule that handles function aliases.
+ */
+object SubstituteFunctionAliases extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan.resolveExpressions {
+    // SPARK-16730: The following functions are aliases for cast in Hive.
+    case u: UnresolvedFunction
+        if u.name.database.isEmpty && u.children.size == 1 && !u.isDistinct =>
+      u.name.funcName.toLowerCase match {
+        case "boolean" => Cast(u.children.head, BooleanType)
+        case "tinyint" => Cast(u.children.head, ByteType)
+        case "smallint" => Cast(u.children.head, ShortType)
+        case "int" => Cast(u.children.head, IntegerType)
+        case "bigint" => Cast(u.children.head, LongType)
+        case "float" => Cast(u.children.head, FloatType)
+        case "double" => Cast(u.children.head, DoubleType)
+        case "decimal" => Cast(u.children.head, DecimalType.USER_DEFAULT)
--- End diff --

I'm using whatever type a cast to decimal uses here, but I think it is a bug
to cast to USER_DEFAULT by default, because it has scale = 0.
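
To illustrate the concern above, here is a minimal sketch of what a target scale of 0 does to a fractional value. It uses `java.math.BigDecimal` directly rather than Spark's own decimal classes, and the `DecimalScaleSketch` object, the `toScale` helper, and the HALF_UP rounding mode are assumptions made for illustration only.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalScaleSketch {
  // Round a decimal string to the given scale (number of fractional digits).
  // HALF_UP is an assumed rounding mode, chosen only for this illustration.
  def toScale(value: String, scale: Int): JBigDecimal =
    new JBigDecimal(value).setScale(scale, RoundingMode.HALF_UP)
}
```

With a scale of 0, `DecimalScaleSketch.toScale("1.5", 0)` keeps no fractional digits, while `DecimalScaleSketch.toScale("1.5", 1)` preserves the value, which is why a scale-0 default can be surprising for `decimal(x)`.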




[GitHub] spark pull request #14362: [SPARK-16730][SQL] Implement function aliases for...

2016-07-25 Thread petermaxlee

GitHub user petermaxlee opened a pull request:

https://github.com/apache/spark/pull/14362

[SPARK-16730][SQL] Implement function aliases for type casts

## What changes were proposed in this pull request?
Spark 1.x supports using Hive type names as function names for doing
casts, e.g.
```sql
SELECT int(1.0);
SELECT string(2.0);
```

The above queries work in Spark 1.x because Spark 1.x falls back to Hive
for unimplemented functions, and break in Spark 2.0 because the fallback was
removed.

This patch implements function aliases using an analyzer rule for the 
following cast functions:
- boolean
- tinyint
- smallint
- int
- bigint
- float
- double
- decimal
- date
- timestamp
- binary
- string
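
The rewrite described above can be sketched without Spark as follows. This is a simplified model, not the code in the patch: `Expr`, `Literal`, `UnresolvedCall`, `CastTo`, and `AliasSketch` are hypothetical stand-ins for the Catalyst classes, and the target type is kept as a plain string.

```scala
// Hypothetical, simplified stand-ins for Catalyst's expression classes.
sealed trait Expr
case class Literal(value: Any) extends Expr
case class UnresolvedCall(name: String, args: Seq[Expr], isDistinct: Boolean = false) extends Expr
case class CastTo(child: Expr, typeName: String) extends Expr

object AliasSketch {
  // Type names from the list above that should behave as cast functions.
  val castAliases: Set[String] = Set(
    "boolean", "tinyint", "smallint", "int", "bigint", "float",
    "double", "decimal", "date", "timestamp", "binary", "string")

  // Rewrite a single-argument, non-DISTINCT call whose name is a type alias
  // into a cast; every other expression is left as it is.
  def substitute(e: Expr): Expr = e match {
    case UnresolvedCall(name, Seq(arg), false) if castAliases(name.toLowerCase) =>
      CastTo(substitute(arg), name.toLowerCase)
    case UnresolvedCall(name, args, distinct) =>
      UnresolvedCall(name, args.map(substitute), distinct)
    case CastTo(child, typeName) => CastTo(substitute(child), typeName)
    case other => other
  }
}
```

For example, `AliasSketch.substitute(UnresolvedCall("INT", Seq(Literal(1.0))))` yields `CastTo(Literal(1.0), "int")`, while a call with two arguments is left unresolved for normal function lookup.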

## How was this patch tested?
Added unit tests for SubstituteFunctionAliases as well as end-to-end tests
in SQLCompatibilityFunctionSuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/petermaxlee/spark SPARK-16730

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14362.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14362


commit 37b7127cdbf93334daa05dc5f48715b78966d032
Author: petermaxlee 
Date:   2016-07-26T04:49:06Z

[SPARK-16730][SQL] Implement function aliases for type casts





[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup

2016-07-25 Thread databricks-jenkins

Github user databricks-jenkins commented on the issue:

https://github.com/apache/spark/pull/36
  
**[Test build #70 has 
started](https://jenkins.test.databricks.com/job/spark-pull-request-builder/70/consoleFull)**
 for PR 36 at commit 
[`54b968b`](https://github.com/apache/spark/commit/54b968bd158483ec771e912c32c3faf1e6c62476).



[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...

2016-07-25 Thread petermaxlee

Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/14358
  
@cloud-fan can you take a look?



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62868/consoleFull)**
 for PR 14323 at commit 
[`fb0f9a8`](https://github.com/apache/spark/commit/fb0f9a8cd54cca299b2dac34d65848c2f86e7bb4).



[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup

2016-07-25 Thread databricks-jenkins

Github user databricks-jenkins commented on the issue:

https://github.com/apache/spark/pull/36
  
**[Test build #67 has 
finished](https://jenkins.test.databricks.com/job/spark-pull-request-builder/67/consoleFull)**
 for PR 36 at commit 
[`47f2be0`](https://github.com/apache/spark/commit/47f2be03097e0afebc6240e4b39ab55803a1e5d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class ByteCodeParserException(message: String) extends ClosureTranslationException(message, null)`
  * `  class UnsupportedOpcodeException(`
  * `  sealed trait Node `
  * `  sealed trait BinaryNode extends Node `
  * `  sealed trait UnaryNode extends Node `
  * `  sealed trait NullaryNode extends Node `
  * `  case class Constant[T: ClassTag](value: T) extends NullaryNode `
  * `  case class Argument(dataType: Type) extends NullaryNode `
  * `  case class This(dataType: Type) extends NullaryNode `
  * `  case class If(condition: Node, left: Node, right: Node, dataType: Type) extends BinaryNode`
  * `  case class FunctionCall(`
  * `  case class Static(clazz: String, name: String, dataType: Type) extends NullaryNode`
  * `  case class Cast(node: Node, dataType: Type) extends UnaryNode`
  * `  case class Arithmetic(`
  * `class ByteCodeParser `
  * `class ClosureTranslationException(`
  * `class ExpressionGenerator `
  * `  throw new ClosureTranslationException(\"ExpressionGenerator only support case class or \" +`
  * `  case class Field(`
  * `  case class NPEOnNull(`
  * `case class TranslateClosureOptimizerRule(conf: CatalystConf) extends Rule[LogicalPlan] `
  * `  case class Parent(child: LogicalPlan) extends UnaryNode `
  * `class TypeOps(dataType: Type) `
  * `  class UnSupportedTypeException(dataType: Type)`



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62867/consoleFull)**
 for PR 14323 at commit 
[`ab4a1cf`](https://github.com/apache/spark/commit/ab4a1cf7d9a806fbb9046010f619258f8eb77c8b).



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72186040
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

Is there anything more to check?



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72186016
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

In short, the element types are recognized correctly in the Analyzed Logical Plan. As a
result, the codegen correctly writes them with the unified precision and scale.
```
== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
```
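
As a rough sketch of how a unified precision and scale such as `decimal(3,3)` can be derived, the helper below keeps the larger scale and the larger number of integer digits of two decimal types. `Dec`, `DecimalWideningSketch`, and `wider` are hypothetical names, not Spark's API; the sketch assumes the two literals are typed as `decimal(3,3)` and `decimal(2,2)` and ignores any cap on the maximum precision.

```scala
object DecimalWideningSketch {
  // (precision, scale) pair; a hypothetical stand-in for a decimal type.
  case class Dec(precision: Int, scale: Int)

  // Smallest decimal type that can hold values of both inputs without loss:
  // keep the larger scale and the larger number of integer digits.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    Dec(intDigits + scale, scale)
  }
}
```

Under those assumptions, `wider(Dec(3, 3), Dec(2, 2))` gives `Dec(3, 3)`, matching the element type shown in the plan above.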



[GitHub] spark pull request #14323: [SPARK-16675][SQL] Avoid per-record type dispatch...

2016-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/14323#discussion_r72185977
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -138,6 +138,79 @@ object JdbcUtils extends Logging {
   throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}"))
   }
 
+  // A `JDBCValueSetter` is responsible for converting and setting a value from `Row` into
+  // a field for `PreparedStatement`. The last argument `Int` means the index for the
+  // value to be set in the SQL statement and also used for the value in `Row`.
+  private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit
--- End diff --

Fixed!
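
For readers following the thread, the idea behind a setter type like the one in the diff above can be sketched without JDBC: pick one setter function per column up front, then reuse those functions for every row, so the type match is not repeated per record. Everything here is a hypothetical stand-in (`SetterSketch`, `ColType`, and a string array in place of `PreparedStatement`), not the code in the pull request.

```scala
object SetterSketch {
  // Hypothetical stand-ins: the "statement" is an array of rendered parameters
  // and a "row" is a sequence of values. The Int is the column index in both.
  type Setter = (Array[String], Seq[Any], Int) => Unit

  sealed trait ColType
  case object IntCol extends ColType
  case object StrCol extends ColType

  // The type dispatch happens here, once per column rather than once per value.
  def makeSetter(t: ColType): Setter = t match {
    case IntCol => (stmt, row, i) => stmt(i) = row(i).asInstanceOf[Int].toString
    case StrCol => (stmt, row, i) => stmt(i) = "'" + row(i).asInstanceOf[String] + "'"
  }

  // Build the setters once, then apply them to every row.
  def render(schema: Seq[ColType], rows: Seq[Seq[Any]]): Seq[List[String]] = {
    val setters = schema.map(makeSetter).toArray
    rows.map { row =>
      val stmt = new Array[String](schema.length)
      var i = 0
      while (i < setters.length) {
        setters(i)(stmt, row, i)
        i += 1
      }
      stmt.toList
    }
  }
}
```

The per-row loop only calls functions that were already selected, which is the point of avoiding per-record type dispatch.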



[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14358
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72185905
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

And finally, the following is the codegen result. Please see line 29.
```scala
scala> sql("explain codegen select array(0.001, 0.02)[1]").collect().foreach(println)
[Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*Project [0.02 AS array(0.001, 0.02)[1]#75]
+- Scan OneRowRelation[]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator inputadapter_input;
/* 008 */   private UnsafeRow project_result;
/* 009 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 011 */
/* 012 */   public GeneratedIterator(Object[] references) {
/* 013 */     this.references = references;
/* 014 */   }
/* 015 */
/* 016 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 017 */     partitionIndex = index;
/* 018 */     inputadapter_input = inputs[0];
/* 019 */     project_result = new UnsafeRow(1);
/* 020 */     this.project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 021 */     this.project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 022 */   }
/* 023 */
/* 024 */   protected void processNext() throws java.io.IOException {
/* 025 */     while (inputadapter_input.hasNext()) {
/* 026 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 027 */       Object project_obj = ((Expression) references[0]).eval(null);
/* 028 */       Decimal project_value = (Decimal) project_obj;
/* 029 */       project_rowWriter.write(0, project_value, 3, 3);
/* 030 */       append(project_result);
/* 031 */       if (shouldStop()) return;
/* 032 */     }
/* 033 */   }
```



[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14358
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62857/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14358: [SPARK-16729][SQL] Throw analysis exception for invalid ...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14358
  
**[Test build #62857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62857/consoleFull)** for PR 14358 at commit [`5419b85`](https://github.com/apache/spark/commit/5419b859a7c8669122e8c65e88029a47556c36ee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72185594
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

```scala
scala> sql("create table d1(a DECIMAL(3,2))")
scala> sql("create table d2(a DECIMAL(2,1))")
scala> sql("insert into d1 values(1.0)")
scala> sql("insert into d2 values(1.0)")
scala> sql("select * from d1, d2").show()
+----+---+
|   a|  a|
+----+---+
|1.00|1.0|
+----+---+

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2")
res5: org.apache.spark.sql.DataFrame = [array(a, a): array, array(a, a): array ... 2 more fields]

scala> sql("select array(d1.a,d2.a),array(d2.a,d1.a),* from d1, d2").show()
+------------+------------+----+---+
| array(a, a)| array(a, a)|   a|  a|
+------------+------------+----+---+
|[1.00, 1.00]|[1.00, 1.00]|1.00|1.0|
+------------+------------+----+---+
```



[GitHub] spark issue #12059: [SPARK-14265] Get attempId of stage and transfer it to w...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12059
  
**[Test build #62866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62866/consoleFull)** for PR 12059 at commit [`71bebd6`](https://github.com/apache/spark/commit/71bebd693db9874a941bbb7b8f0fa528ddfd506b).



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14361
  
**[Test build #3192 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3192/consoleFull)** for PR 14361 at commit [`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53).



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72185372
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

Hi, @yhuai. I checked the following.

```scala
scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T")
res4: org.apache.spark.sql.DataFrame = [a[0]: decimal(3,3), a[1]: decimal(3,3)]

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").show()
+-----+-----+
| a[0]| a[1]|
+-----+-----+
|0.001|0.020|
+-----+-----+

scala> sql("select a[0], a[1] from (select array(0.001, 0.02) a) T").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias('a[0], None), unresolvedalias('a[1], None)]
+- 'SubqueryAlias T
   +- 'Project ['array(0.001, 0.02) AS a#54]
      +- OneRowRelation$

== Analyzed Logical Plan ==
a[0]: decimal(3,3), a[1]: decimal(3,3)
Project [a#54[0] AS a[0]#61, a#54[1] AS a[1]#62]
+- SubqueryAlias T
   +- Project [array(0.001, 0.02) AS a#54]
      +- OneRowRelation$
```
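
For readers following the output above, here is a minimal sketch of how a common decimal type for the array elements could be derived. It assumes the two literals are typed `decimal(3,3)` and `decimal(2,2)`, and that the common type keeps the larger scale and the larger number of integer digits. `Dec`, `WidenDecimals`, `wider`, and `commonType` are hypothetical names for illustration, not Spark APIs, and the sketch ignores any maximum-precision cap.

```scala
// Hypothetical stand-in for a decimal type: a (precision, scale) pair.
final case class Dec(precision: Int, scale: Int) {
  // Digits available to the left of the decimal point.
  def intDigits: Int = precision - scale
}

object WidenDecimals {
  // Smallest type that can hold values of both inputs without losing digits:
  // keep the larger scale and the larger count of integer digits.
  def wider(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    val intDigits = math.max(a.intDigits, b.intDigits)
    Dec(intDigits + scale, scale)
  }

  // Fold the pairwise rule over all element types (assumes a non-empty list).
  def commonType(types: Seq[Dec]): Dec = types.reduce(wider)
}
```

Under these assumptions, `WidenDecimals.commonType(Seq(Dec(3, 3), Dec(2, 2)))` gives `Dec(3, 3)`, which is consistent with both elements being reported as `decimal(3,3)` above.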



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14361
  
**[Test build #3191 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3191/consoleFull)** for PR 14361 at commit [`52b5a20`](https://github.com/apache/spark/commit/52b5a209eab2b11565e0e9403b35b5eae6429e53).



[GitHub] spark issue #14361: [TEST][STREAMING] Fix flaky Kafka rate controlling test

2016-07-25 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/14361
  
@koeninger Can you take a look?



[GitHub] spark pull request #14323: [SPARK-16675][SQL] Avoid per-record type dispatch...

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14323#discussion_r72185133
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -138,6 +138,79 @@ object JdbcUtils extends Logging {
   throw new IllegalArgumentException(s"Can't get JDBC type for ${dt.simpleString}"))
   }
 
+  // A `JDBCValueSetter` is responsible for converting and setting a value from `Row` into
+  // a field for `PreparedStatement`. The last argument `Int` means the index for the
+  // value to be set in the SQL statement and also used for the value in `Row`.
+  private type JDBCValueSetter = (PreparedStatement, Row, Int) => Unit
--- End diff --

please rename the read path too.
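
As context for the setter alias in the diff, here is a rough sketch of the general idea of resolving the type dispatch once per column rather than once per record. It is an illustration under stated assumptions, not the PR's code: `Params` (a plain array) stands in for `PreparedStatement`, `Seq[Any]` stands in for `Row`, and `SetterSketch`, `ColType`, `makeSetter`, and `bindAll` are hypothetical names.

```scala
object SetterSketch {
  // Stand-in for a prepared statement: a mutable array of bound parameters.
  type Params = Array[Any]
  // Same shape as the alias in the diff, with the stand-in types substituted.
  type ValueSetter = (Params, Seq[Any], Int) => Unit

  // Hypothetical column types, just enough for the illustration.
  sealed trait ColType
  case object IntCol extends ColType
  case object StringCol extends ColType

  // The pattern match on the column type runs here, once per column.
  def makeSetter(t: ColType): ValueSetter = t match {
    case IntCol    => (ps, row, i) => { ps(i) = row(i).asInstanceOf[Int] }
    case StringCol => (ps, row, i) => { ps(i) = row(i).asInstanceOf[String] }
  }

  // Build the setters once, then reuse them for every row.
  def bindAll(schema: Seq[ColType], rows: Seq[Seq[Any]]): Seq[Params] = {
    val setters = schema.map(makeSetter).toArray
    rows.map { row =>
      val ps = new Array[Any](schema.length)
      var i = 0
      while (i < setters.length) {
        setters(i)(ps, row, i) // no type dispatch inside the per-record loop
        i += 1
      }
      ps
    }
  }
}
```

A read path could presumably follow the same pattern with one getter function per column, which is why a consistent naming scheme for both directions is helpful.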



[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...

2016-07-25 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14327
  
Thanks for reviewing this.



[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14327
  
thanks, merging to master!



[GitHub] spark pull request #14327: [SPARK-16686][SQL] Remove PushProjectThroughSampl...

2016-07-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14327



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
It's my pleasure. See you later around Apache Spark. :)



[GitHub] spark pull request #14357: [SPARK-16727][SparkR] Fix expected test output of...

2016-07-25 Thread junyangq

Github user junyangq closed the pull request at:

https://github.com/apache/spark/pull/14357



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread junyangq

Github user junyangq commented on the issue:

https://github.com/apache/spark/pull/14357
  
I see. Thank you for pointing that out :) I'll close the PR.



[GitHub] spark pull request #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] F...

2016-07-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14284



[GitHub] spark issue #14344: [SPARK-16706][SQL] support java map in encoder

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14344
  
**[Test build #62864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62864/consoleFull)** for PR 14344 at commit [`b4dad74`](https://github.com/apache/spark/commit/b4dad7433266caba0b7e9d6b0a88c5a9c3e3afa7).



[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14302
  
**[Test build #62865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62865/consoleFull)** for PR 14302 at commit [`56338b4`](https://github.com/apache/spark/commit/56338b4a20c3cbdbacbe51e17ad492ea0d6f6ecf).



[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...

2016-07-25 Thread yhuai

Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14284
  
Thanks for the review. I am merging this to master and branch 2.0.



[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-07-25 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r72184495
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
   .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
 }
   }
+
+  /**
+   * Drop the non-partition key expression from the given expression, to optimize the
+   * partition pruning. For instances: (We assume part1 & part2 are the partition keys):
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5)  ==> (part1 == 1 or part1 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2)
+   * @param predicate The given expression
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def extractPartitionKeyExpression(
+      predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = {
+    // drop the non-partition key expression in conjunction of the expression tree
+    val additionalPartPredicate = predicate transformUp {
--- End diff --

I can keep updating the code if we agree on the approach; otherwise, I 
think we'd better close this PR for now.



[GitHub] spark pull request #13585: [SPARK-15859][SQL] Optimize the partition pruning...

2016-07-25 Thread chenghao-intel

Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/13585#discussion_r72184424
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -92,6 +92,36 @@ object PhysicalOperation extends PredicateHelper {
   .map(Alias(_, a.name)(a.exprId, a.qualifier, isGenerated = a.isGenerated)).getOrElse(a)
 }
   }
+
+  /**
+   * Drop the non-partition key expression from the given expression, to optimize the
+   * partition pruning. For instances: (We assume part1 & part2 are the partition keys):
+   * (part1 == 1 and a > 3) or (part2 == 2 and a < 5)  ==> (part1 == 1 or part1 == 2)
+   * (part1 == 1 and a > 3) or (a < 100) => None
+   * (a > 100 && b < 100) or (part1 = 10) => None
+   * (a > 100 && b < 100 and part1 = 10) or (part1 == 2) => (part1 = 10 or part1 == 2)
+   * @param predicate The given expression
+   * @param partitionKeyIds partition keys in attribute set
+   * @return
+   */
+  def extractPartitionKeyExpression(
+      predicate: Expression, partitionKeyIds: AttributeSet): Option[Expression] = {
+    // drop the non-partition key expression in conjunction of the expression tree
+    val additionalPartPredicate = predicate transformUp {
--- End diff --

This PR may have a critical bug when a user implements a UDF that logically 
behaves like the `NOT` operator in the partition filter expression. We probably 
need a whitelist of the built-in UDFs.

@yhuai @liancheng @yangw1234 @clockfly any comments on this?
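
To make the concern concrete, here is a minimal, self-contained sketch of the weakening idea on a toy expression tree. It is not the PR's implementation: `Expr`, `PartEq`, `DataPred`, and `PruneSketch` are hypothetical names, and the tree only models `And`, `Or`, and `Not`. Dropping a non-partition conjunct under `And` only weakens the predicate, which is safe for pruning, but doing the same underneath a negation strengthens it, so the sketch gives up unless the negated subtree mentions partition keys only. An opaque UDF that behaves like `NOT` would hide that negation from a generic bottom-up rewrite.

```scala
sealed trait Expr
case class PartEq(key: String, value: Int) extends Expr // predicate on a partition key
case class DataPred(desc: String) extends Expr          // predicate on a non-partition column
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case class Not(c: Expr) extends Expr

object PruneSketch {
  // True if the expression mentions partition keys only.
  def partitionOnly(e: Expr): Boolean = e match {
    case _: PartEq   => true
    case _: DataPred => false
    case And(l, r)   => partitionOnly(l) && partitionOnly(r)
    case Or(l, r)    => partitionOnly(l) && partitionOnly(r)
    case Not(c)      => partitionOnly(c)
  }

  // Returns a partition-key-only predicate implied by `e`, or None if none can be derived.
  def extract(e: Expr): Option[Expr] = e match {
    case p: PartEq   => Some(p)
    case _: DataPred => None
    case And(l, r) =>
      (extract(l), extract(r)) match {
        case (Some(a), Some(b)) => Some(And(a, b))
        case (a, b)             => a.orElse(b) // dropping a conjunct only weakens
      }
    case Or(l, r) =>
      // Both branches must yield a partition predicate, otherwise nothing can be pruned.
      for (a <- extract(l); b <- extract(r)) yield Or(a, b)
    case n @ Not(c) =>
      // Weakening under a negation would strengthen the result, so keep it only if untouched.
      if (partitionOnly(c)) Some(n) else None
  }
}
```

With this sketch, `Not(And(PartEq("part1", 1), DataPred("a > 3")))` yields `None` rather than `Not(PartEq("part1", 1))`, which would wrongly prune partitions where `part1 == 1` and `a <= 3`.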



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
If you use `-fdx`, it also removes IDE settings, such as IntelliJ's.



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
Never. I did that just to be sure. I don't do it frequently.



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread junyangq

Github user junyangq commented on the issue:

https://github.com/apache/spark/pull/14357
  
Yeah sure, but I'm just wondering whether the clean and the additional arguments 
are something we should normally do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
Actually, you can see the Jenkins log, too. There is no problem with the 
current R test suite.

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull

FYI, I'm using JDK 1.8.0_102 and Jenkins is using JDK 1.7.x.



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
I think you did something wrong. :)
Could you close this PR?



[GitHub] spark pull request #14356: [SPARK-16724] Expose DefinedByConstructorParams

2016-07-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14356



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread junyangq

Github user junyangq commented on the issue:

https://github.com/apache/spark/pull/14357
  
Hmm... It doesn't show the name column either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
On the most recent master build, I did the following and all tests 
passed.
```
$ git clean -fdx
$ ./build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive-thriftserver -Phive -Psparkr package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
$ R/install-dev.sh 
$ R/run-tests.sh 
...
DONE 
===
Tests passed.
```



[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14302
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62859/
Test FAILed.



[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14302
  
Merged build finished. Test FAILed.



[GitHub] spark issue #14302: [SPARK-16663][SQL] desc table should be consistent betwe...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14302
  
**[Test build #62859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62859/consoleFull)** for PR 14302 at commit [`5a5ddba`](https://github.com/apache/spark/commit/5a5ddbafe069d579670525e5d58f7676dcba1e28).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14356: [SPARK-16724] Expose DefinedByConstructorParams

2016-07-25 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14356
  
Merging in master/2.0.




[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14327
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62856/
Test PASSed.



[GitHub] spark issue #14327: [SPARK-16686][SQL] Remove PushProjectThroughSample since...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14327
  
**[Test build #62856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62856/consoleFull)** for PR 14327 at commit [`3e134f1`](https://github.com/apache/spark/commit/3e134f18f5b3fc678d95e2bf10997185ab6dd6e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
@junyangq, you are currently on the most recent master build, right?
And SparkR does not show the `name` column in the result of the `describe` command.
I'm wondering what `spark-shell` shows on your system. Could you run the following command in spark-shell?
```scala
scala> spark.read.json("examples/src/main/resources/people.json").describe().show()
```



[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14284
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14284
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62854/
Test PASSed.



[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14284
  
**[Test build #62854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62854/consoleFull)** for PR 14284 at commit [`ff3029e`](https://github.com/apache/spark/commit/ff3029e3db2214d7c51ea3ea866becf441fddac4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72182807
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

Thank you for the review, @yhuai.
I see. I'll look into that further.



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14357
  
Oh, let me try this time. Thank you for double-checking, @junyangq .



[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72182414
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = resolvedChild.transformDown {
--- End diff --

Oh, I see. That's why I could not find clear logic there. Thank you so much, @yhuai!
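
As a rough illustration of the "closest table" matching described in the quoted doc comment, here is a minimal sketch on a toy plan tree. The `Plan` case classes and `HintSketch` are hypothetical stand-ins, not Spark's classes. The first-match, top-down traversal is an assumption based on the `stop` flag in the truncated diff, and dropping unrecognized hints is an assumption based on the doc comment's guarantee; neither is confirmed as the PR's actual implementation.

```scala
// Toy plan tree; illustrative stand-ins only, not Spark's LogicalPlan classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Join(left: Plan, right: Plan) extends Plan
case class Hint(name: String, parameters: Seq[String], child: Plan) extends Plan
case class BroadcastHint(child: Plan) extends Plan

object HintSketch {
  private val broadcastNames = Set("BROADCAST", "BROADCASTJOIN", "MAPJOIN")

  // Wrap the first relation named `table` found in a top-down, left-to-right walk.
  // Returns the rewritten plan and whether a match was made.
  private def wrapFirst(plan: Plan, table: String): (Plan, Boolean) = plan match {
    case r @ Relation(n) if n.equalsIgnoreCase(table) => (BroadcastHint(r), true)
    case Join(l, r) =>
      val (newLeft, foundLeft) = wrapFirst(l, table)
      if (foundLeft) (Join(newLeft, r), true)
      else {
        val (newRight, foundRight) = wrapFirst(r, table)
        (Join(l, newRight), foundRight)
      }
    case Hint(n, ps, c) =>
      val (newChild, found) = wrapFirst(c, table)
      (Hint(n, ps, newChild), found)
    // Non-matching relations and already-wrapped relations are left alone.
    case other => (other, false)
  }

  // Replace each recognized hint by its child with the named tables wrapped.
  // Assumption: unrecognized hints are simply dropped.
  def substitute(plan: Plan): Plan = plan match {
    case Hint(name, params, child) if broadcastNames.contains(name.toUpperCase) =>
      params.foldLeft(substitute(child)) { (p, table) => wrapFirst(p, table)._1 }
    case Hint(_, _, child) => substitute(child)
    case Join(l, r) => Join(substitute(l), substitute(r))
    case BroadcastHint(c) => BroadcastHint(substitute(c))
    case r: Relation => r
  }
}
```

With this sketch, `HintSketch.substitute(Hint("MAPJOIN", Seq("t1"), Join(Relation("t1"), Join(Relation("t2"), Relation("t1")))))` wraps only the first `t1` and leaves the second occurrence untouched.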



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72182390
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

For example, if we access a single element, its actual data type may not 
be the element type reported by the array's data type.
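
To make the concern concrete, here is a small sketch in plain Scala. `DecimalT` and `ArrayTypeSketch` are hypothetical stand-ins rather than Spark classes, and the sketch assumes the array reports its first child's type as the element type. Under that assumption, any other child can carry a different precision and scale than the one the array advertises.

```scala
// Plain-Scala stand-in for a decimal type; not Spark's DecimalType.
case class DecimalT(precision: Int, scale: Int)

object ArrayTypeSketch {
  // Assumption: the array's element type is simply the first child's type.
  def declaredElementType(children: Seq[DecimalT]): Option[DecimalT] =
    children.headOption

  // Children whose own type disagrees with the declared element type.
  def mismatched(children: Seq[DecimalT]): Seq[DecimalT] =
    declaredElementType(children).toSeq.flatMap(d => children.filter(_ != d))
}
```

For children typed `DecimalT(3, 1)` and `DecimalT(5, 2)`, the declared element type is `DecimalT(3, 1)`, while the second element is really a `DecimalT(5, 2)`.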



[GitHub] spark pull request #14257: [SPARK-16621][SQL] Generate stable SQLs in SQLBui...

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14257#discussion_r72182312
  
--- Diff: sql/hive/src/test/resources/sqlgen/predicate_subquery.sql ---
@@ -1,4 +1,4 @@
 -- This file is automatically generated by LogicalPlanToSQLSuite.
 select * from t1 b where exists (select * from t1 a)
 

-SELECT `gen_attr` AS `a` FROM (SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr` AS `a` FROM ((SELECT `gen_attr` FROM (SELECT `a` AS `gen_attr` FROM `default`.`t1`) AS gen_subquery_0) AS gen_subquery_1) AS gen_subquery_1)) AS b
+SELECT `gen_attr_0` AS `a` FROM (SELECT `gen_attr_0` FROM (SELECT `a` AS `gen_attr_0` FROM `default`.`t1`) AS gen_subquery_0 WHERE EXISTS(SELECT `gen_attr_1` AS `a` FROM ((SELECT `gen_attr_1` FROM (SELECT `a` AS `gen_attr_1` FROM `default`.`t1`) AS gen_subquery_2) AS gen_subquery_1) AS gen_subquery_1)) AS b
--- End diff --

I reformatted this query as shown below. The `gen_subquery_xxx` aliases 
are generated uniquely, but `gen_subquery_1` is repeated because of the added 
nested subquery alias. Note that this repetition is not caused by a duplicated ID; 
it comes from a direct double nesting. I think it's okay.
```sql
SELECT `gen_attr_0` AS `a`
FROM (SELECT `gen_attr_0`
      FROM (SELECT `a` AS `gen_attr_0`
            FROM `default`.`t1`
           ) AS gen_subquery_0
      WHERE EXISTS(SELECT `gen_attr_1` AS `a`
                   FROM ((SELECT `gen_attr_1`
                          FROM (SELECT `a` AS `gen_attr_1`
                                FROM `default`.`t1`
                               ) AS gen_subquery_2
                         ) AS gen_subquery_1
                        ) AS gen_subquery_1
                  )
     ) AS b
```
```
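
One possible way to get stable numbered names such as `gen_attr_0` and `gen_attr_1` is to map each internal ID to a small sequential number in order of first use. The sketch below illustrates that idea only. `StableNamer` and its methods are hypothetical names, and nothing here is confirmed to be how the PR implements it.

```scala
import scala.collection.mutable

// Hypothetical helper: maps arbitrary internal ids to 0, 1, 2, ... in order of first use,
// so the generated names do not depend on the raw id values.
class StableNamer(prefix: String) {
  private val assigned = mutable.LinkedHashMap.empty[Long, Int]

  def nameFor(id: Long): String = {
    // The default is evaluated before insertion, so a new id gets the current entry count.
    val n = assigned.getOrElseUpdate(id, assigned.size)
    s"${prefix}_$n"
  }
}
```

With `val namer = new StableNamer("gen_attr")`, the calls `namer.nameFor(1234L)`, `namer.nameFor(77L)`, and `namer.nameFor(1234L)` yield `gen_attr_0`, `gen_attr_1`, and `gen_attr_0` again, regardless of the raw ID values.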



[GitHub] spark pull request #14353: [SPARK-16714][SQL] `array` should create a decima...

2016-07-25 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14353#discussion_r72182316
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -33,13 +33,24 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-    TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), "function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.map(_.dataType).forall(_.isInstanceOf[DecimalType])) {
+      TypeCheckResult.TypeCheckSuccess
--- End diff --

I think we cannot just make the check pass. We need to actually 
cast those elements to the same precision and scale.
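
One way to picture the cast being asked for is to compute a single decimal type wide enough for every element and then cast each child to it. The sketch below covers only the type computation, in plain Scala. `DecimalT`, `DecimalWidening`, and the cap of 38 digits are assumptions made for illustration; this is not Spark's implementation.

```scala
// Plain-Scala stand-in for a decimal type; not Spark's DecimalType.
case class DecimalT(precision: Int, scale: Int)

object DecimalWidening {
  // Assumed cap on precision, chosen to mirror Spark's maximum decimal precision.
  val MaxPrecision = 38

  // Before capping, the result keeps the larger scale and the larger number of
  // integral digits (precision - scale) of the two inputs.
  def wider(a: DecimalT, b: DecimalT): DecimalT = {
    val scale = math.max(a.scale, b.scale)
    val integralDigits = math.max(a.precision - a.scale, b.precision - b.scale)
    DecimalT(math.min(integralDigits + scale, MaxPrecision), math.min(scale, MaxPrecision))
  }

  // Common target type for all elements of an array; None for an empty array.
  def commonType(elements: Seq[DecimalT]): Option[DecimalT] =
    elements.reduceOption(wider)
}
```

For elements typed `DecimalT(3, 1)` and `DecimalT(5, 4)`, `commonType` returns `Some(DecimalT(6, 4))`: two integral digits from the first plus four fractional digits from the second.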



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14359
  
Merged build finished. Test PASSed.



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14359
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62858/
Test PASSed.



[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14359
  
**[Test build #62858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62858/consoleFull)** for PR 14359 at commit [`6fcfb4b`](https://github.com/apache/spark/commit/6fcfb4b0e158ba86371ad4d0728490f3a8e7caeb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62863/consoleFull)** for PR 14323 at commit [`c33bb62`](https://github.com/apache/spark/commit/c33bb62a4565bbcf3b13ea5e232180a577d79455).



[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72181762
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes `BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case logical: LogicalPlan => logical transformDown {
+        case h @ Hint(name, parameters, child)
+            if Seq("BROADCAST", "BROADCASTJOIN", "MAPJOIN").contains(name.toUpperCase) =>
+          var resolvedChild = child
+
+          for (table <- parameters) {
+            var stop = false
+            resolvedChild = resolvedChild.transformDown {
--- End diff --

You probably want to look at older versions of Hive, for example 0.10. 
After https://issues.apache.org/jira/browse/HIVE-3784, Hive does not really use 
the map join hint.



[GitHub] spark issue #36: Added a unit test for PairRDDFunctions.lookup

2016-07-25 Thread databricks-jenkins

Github user databricks-jenkins commented on the issue:

https://github.com/apache/spark/pull/36
  
**[Test build #69 has started](https://jenkins.test.databricks.com/job/spark-pull-request-builder/69/consoleFull)** for PR 36 at commit [`306c0f8`](https://github.com/apache/spark/commit/306c0f8c10e04995d9a9cffd3bda5383b65e34ac).



[GitHub] spark issue #14357: [SPARK-16727][SparkR] Fix expected test output of descri...

2016-07-25 Thread junyangq

Github user junyangq commented on the issue:

https://github.com/apache/spark/pull/14357
  
I merged the most recent master branch, rebuilt and installed the package, 
but the test failed at the same place. @dongjoon-hyun 


