[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90436/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21230 **[Test build #90436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21186: [SPARK-22279][SQL] Enable `convertMetastoreOrc` b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21186
[GitHub] spark issue #21186: [SPARK-22279][SQL] Enable `convertMetastoreOrc` by defau...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21186 thanks, merging to master!
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90432/ Test PASSed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Merged build finished. Test PASSed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21238 **[Test build #90432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90432/testReport)** for PR 21238 at commit [`fa095cd`](https://github.com/apache/spark/commit/fa095cd9faceb1247f3704a1a4949be834b05746). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3096/ Test PASSed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test PASSed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21278 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90439/ Test PASSed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21278 Merged build finished. Test PASSed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21278 **[Test build #90439 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90439/testReport)** for PR 21278 at commit [`04c3a2d`](https://github.com/apache/spark/commit/04c3a2d864d980e10bc55518d86e6307b637c6c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #90441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90441/testReport)** for PR 21288 at commit [`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).
[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21276 @fangshil Can you update?
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90433/ Test PASSed.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21231 Merged build finished. Test PASSed.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21231 **[Test build #90433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90433/testReport)** for PR 21231 at commit [`590ba26`](https://github.com/apache/spark/commit/590ba26c54b22de670cc699dcd0e1e48aaf71ab2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21282 Merged build finished. Test PASSed.
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21282 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90434/ Test PASSed.
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21282 **[Test build #90434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90434/testReport)** for PR 21282 at commit [`8c6039c`](https://github.com/apache/spark/commit/8c6039c7b7f31f0343c4b0098a4e12dfff125128). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class MapFromEntries(child: Expression) extends UnaryExpression`
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3095/ Test FAILed.
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21288 Merged build finished. Test FAILed.
[GitHub] spark pull request #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to I...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21145
[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21288 **[Test build #90440 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90440/testReport)** for PR 21288 at commit [`223bf20`](https://github.com/apache/spark/commit/223bf2008abfe5fd41c3b5e741dc525ab3864977).
[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to InputPar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21145 Thanks! Merged to master.
[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21288 [SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code ## What changes were proposed in this pull request? This pr added benchmark code `FilterPushdownBenchmark` for string pushdown and updated performance results. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark UpdateParquetBenchmark Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21288.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21288 commit 223bf2008abfe5fd41c3b5e741dc525ab3864977 Author: Takeshi Yamamuro Date: 2018-05-03T00:17:21Z Fix
[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21278#discussion_r187238820 --- Diff: R/pkg/DESCRIPTION --- @@ -13,6 +13,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"), License: Apache License (== 2.0) URL: http://www.apache.org/ http://spark.apache.org/ BugReports: http://spark.apache.org/contributing.html +SystemRequirements: Java (== 8) Depends: R (>= 3.0), --- End diff -- btw, I saw this the other day, and thought we should update this to `>= 3.11` to reflect what we test with? what do you think?
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90438/ Test FAILed.
[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21255#discussion_r187238518 --- Diff: R/pkg/R/functions.R --- @@ -219,7 +219,8 @@ NULL #' head(select(tmp3, map_values(tmp3$v3))) #' head(select(tmp3, element_at(tmp3$v3, "Valiant"))) #' tmp4 <- mutate(df, v4 = create_array(df$mpg, df$cyl), v5 = create_array(df$hp)) -#' head(select(tmp4, concat(tmp4$v4, tmp4$v5)))} +#' head(select(tmp4, concat(tmp4$v4, tmp4$v5))) +#' concat(df$mpg, df$cyl, df$hp)} --- End diff -- I'd perhaps do this as ``` tmp5 <- mutate(df, s1 = concat(df$mpg, df$cyl, df$hp)) head(tmp5) ``` or ``` head(mutate(df, s1 = concat(df$mpg, df$cyl, df$hp))) ``` btw, aren't these numeric columns? does that work with concat?
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90438/testReport)** for PR 21266 at commit [`1d93d99`](https://github.com/apache/spark/commit/1d93d99e4f01bc7b65152c630d7bf144366f6cda). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21278 Merged build finished. Test PASSed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21278 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3094/ Test PASSed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21278 **[Test build #90439 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90439/testReport)** for PR 21278 at commit [`04c3a2d`](https://github.com/apache/spark/commit/04c3a2d864d980e10bc55518d86e6307b637c6c2).
[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21278#discussion_r187237615 --- Diff: R/pkg/R/client.R --- @@ -60,13 +60,48 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack combinedArgs } +checkJavaVersion <- function() { + javaBin <- "java" + javaHome <- Sys.getenv("JAVA_HOME") + javaReqs <- packageDescription("SparkR", fields=c("SystemRequirements")) + sparkJavaVersion <- as.numeric(tail(strsplit(javaReqs, "[(=)]")[[1]], n = 1L)) + if (javaHome != "") { +javaBin <- file.path(javaHome, javaBin) + } + + # If java is missing from PATH, we get an error in Unix and a warning in Windows + javaVersionOut <- tryCatch( + launchScript(javaBin, "-version", wait = TRUE, stdout = TRUE, stderr = TRUE), + error = function(e) { + stop("Java version check failed. Please make sure Java is installed", + " and set JAVA_HOME to point to the installation directory.") + }, + warning = function(w) { + stop("Java version check failed. Please make sure Java is installed", + " and set JAVA_HOME to point to the installation directory.") + }) + javaVersionFilter <- Filter( + function(x) { +grepl("java version", x) + }, javaVersionOut) + + javaVersionStr <- strsplit(javaVersionFilter[[1]], "[\"]")[[1L]][2] + # javaVersionStr is of the form 1.8.0_92. + # Extract 8 from it to compare to sparkJavaVersion + javaVersionNum <- as.numeric(paste0(strsplit(javaVersionStr, "[.]")[[1L]][2], collapse = ".")) --- End diff -- isn't `as.numeric(strsplit(javaVersionStr, "[.]")[[1L]][2])` sufficient? or `as.integer(strsplit(javaVersionStr, "[.]")[[1L]][2])`
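As an aside, the extraction step under review can be sketched outside R. The following Python translation is purely illustrative (the function name and structure are not the PR's code); it shows how the major version is pulled out of a legacy `1.8.0_92`-style string, which is what the `strsplit(javaVersionStr, "[.]")[[1L]][2]` expression does:

```python
def java_major_version(version_str):
    """Extract the Java major version from a version string.

    Legacy scheme ("1.8.0_92") keeps the major version in the second
    dot-separated component; the modern scheme ("9", "10.0.1") keeps
    it in the first.
    """
    parts = version_str.split(".")
    if parts[0] == "1":
        # "1.8.0_92" -> 8
        return int(parts[1])
    # "10.0.1" -> 10; also handles a bare "9"
    return int(parts[0].split("_")[0])
```

The legacy branch mirrors the suggested `as.integer(strsplit(javaVersionStr, "[.]")[[1L]][2])`; the other branch is a hypothetical extension for post-9 version strings, which drop the leading `1.` as noted elsewhere in this thread.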
[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21278#discussion_r187237024 --- Diff: R/pkg/R/client.R --- @@ -60,13 +60,48 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack combinedArgs } +checkJavaVersion <- function() { + javaBin <- "java" + javaHome <- Sys.getenv("JAVA_HOME") + javaReqs <- packageDescription("SparkR", fields=c("SystemRequirements")) --- End diff -- nit: use `packageName()`?
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3092/ Test PASSed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3093/ Test PASSed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test PASSed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test PASSed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Merged build finished. Test FAILed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90431/ Test FAILed.
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21278 nice I like it... they also say ``` When specifying a minimum Java version please use the official version names, which are (confusingly) 1.1 1.2 1.3 1.4 5.0 6 7 8 9 10 and supposedly will in 2018 move to a year.month scheme such as "18.9". ``` so it might still break in the future though..
[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21278 it fails with ``` Quitting from lines 65-67 (sparkr-vignettes.Rmd) Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics: Java version check failed. Please make sure Java is installed and set JAVA_HOME to point to the installation directory. Execution halted ```
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21238 **[Test build #90431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90431/testReport)** for PR 21238 at commit [`aebdb68`](https://github.com/apache/spark/commit/aebdb6885237163b55a90fb739bcbbdcb00d7890). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90438/testReport)** for PR 21266 at commit [`1d93d99`](https://github.com/apache/spark/commit/1d93d99e4f01bc7b65152c630d7bf144366f6cda).
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3091/ Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Merged build finished. Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21266 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90437/ Test FAILed.
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90437 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90437/testReport)** for PR 21266 at commit [`8aedbf0`](https://github.com/apache/spark/commit/8aedbf0a04a92231242ee77222b76201c92fb9f2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187233292 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -529,6 +564,272 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } +/** + * Checks if the two arrays contain at least one common element. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an element present also in a2. If the arrays have no common element and either of them contains a null element null is returned, false otherwise.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5)); + true + """, since = "2.4.0") +// scalastyle:off line.size.limit +case class ArraysOverlap(left: Expression, right: Expression) + extends BinaryArrayExpressionWithImplicitCast { + + override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match { +case TypeCheckResult.TypeCheckSuccess => + if (RowOrdering.isOrderable(elementType)) { +TypeCheckResult.TypeCheckSuccess + } else { +TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.") + } +case failure => failure + } + + @transient private lazy val ordering: Ordering[Any] = +TypeUtils.getInterpretedOrdering(elementType) + + @transient private lazy val elementTypeSupportEquals = elementType match { +case BinaryType => false +case _: AtomicType => true +case _ => false + } + + @transient private lazy val doEvaluation = if (elementTypeSupportEquals) { + fastEval _ +} else { + bruteForceEval _ +} --- End diff -- nit: indent
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187236142 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala --- @@ -136,6 +136,59 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), null) } + test("ArraysOverlap") { +val a0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType)) +val a1 = Literal.create(Seq(4, 5, 3), ArrayType(IntegerType)) +val a2 = Literal.create(Seq(null, 5, 6), ArrayType(IntegerType)) +val a3 = Literal.create(Seq(7, 8), ArrayType(IntegerType)) +val a4 = Literal.create(Seq.empty[Int], ArrayType(IntegerType)) + +val a5 = Literal.create(Seq[String](null, ""), ArrayType(StringType)) +val a6 = Literal.create(Seq[String]("", "abc"), ArrayType(StringType)) +val a7 = Literal.create(Seq[String]("def", "ghi"), ArrayType(StringType)) + +checkEvaluation(ArraysOverlap(a0, a1), true) +checkEvaluation(ArraysOverlap(a0, a2), null) +checkEvaluation(ArraysOverlap(a1, a2), true) +checkEvaluation(ArraysOverlap(a1, a3), false) +checkEvaluation(ArraysOverlap(a0, a4), false) +checkEvaluation(ArraysOverlap(a2, a4), null) +checkEvaluation(ArraysOverlap(a4, a2), null) + +checkEvaluation(ArraysOverlap(a5, a6), true) +checkEvaluation(ArraysOverlap(a5, a7), null) +checkEvaluation(ArraysOverlap(a6, a7), false) + +// null handling +checkEvaluation(ArraysOverlap(Literal.create(null, ArrayType(IntegerType)), a0), null) +checkEvaluation(ArraysOverlap(a0, Literal.create(null, ArrayType(IntegerType))), null) +checkEvaluation(ArraysOverlap( + Literal.create(Seq(null), ArrayType(IntegerType)), + Literal.create(Seq(null), ArrayType(IntegerType))), null) --- End diff -- What if `arrays_overlap(array(), array(null))`? Seems like Presto returns `false` for the case. [TestArrayOperators.java#L1041](https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java#L1041) Also can you add the test case?
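For readers following the null-handling discussion, the contract in the proposed usage string (true if a common element exists; null if there is none and either side contains null; false otherwise) can be sketched in Python. This is an illustration of the documented contract, not the PR's Scala implementation; note that the `array()` vs `array(null)` corner case questioned above is exactly where a literal reading of the doc string (null) and Presto's behavior (false) diverge:

```python
def arrays_overlap(a1, a2):
    # Illustrative reading of the proposed usage string:
    # - True if the arrays share at least one non-null element
    # - None (SQL NULL) if they do not, and either input is null or
    #   contains a null element
    # - False otherwise
    if a1 is None or a2 is None:
        return None
    smaller, bigger = sorted((a1, a2), key=len)
    # Hash the smaller side, then probe with the bigger side, as the
    # PR's fastEval path does for types with well-behaved equality.
    seen = {v for v in smaller if v is not None}
    if any(v is not None and v in seen for v in bigger):
        return True
    return None if (None in smaller or None in bigger) else False
```

Under this literal reading, `arrays_overlap([], [None])` returns `None` rather than Presto's `false`, which is the open question in the review.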
[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21266 **[Test build #90437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90437/testReport)** for PR 21266 at commit [`8aedbf0`](https://github.com/apache/spark/commit/8aedbf0a04a92231242ee77222b76201c92fb9f2).
[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21028#discussion_r187234226

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +564,272 @@ case class ArrayContains(left: Expression, right: Expression)

   override def prettyName: String = "array_contains"
 }

/**
 * Checks if the two arrays contain at least one common element.
 */
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an element present also in a2. If the arrays have no common element and either of them contains a null element, null is returned, false otherwise.",
  examples = """
    Examples:
      > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
       true
  """, since = "2.4.0")
// scalastyle:on line.size.limit
case class ArraysOverlap(left: Expression, right: Expression)
  extends BinaryArrayExpressionWithImplicitCast {

  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
    case TypeCheckResult.TypeCheckSuccess =>
      if (RowOrdering.isOrderable(elementType)) {
        TypeCheckResult.TypeCheckSuccess
      } else {
        TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
      }
    case failure => failure
  }

  @transient private lazy val ordering: Ordering[Any] =
    TypeUtils.getInterpretedOrdering(elementType)

  @transient private lazy val elementTypeSupportEquals = elementType match {
    case BinaryType => false
    case _: AtomicType => true
    case _ => false
  }

  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) {
    fastEval _
  } else {
    bruteForceEval _
  }

  override def dataType: DataType = BooleanType

  override def nullable: Boolean = {
    left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull ||
      right.dataType.asInstanceOf[ArrayType].containsNull
  }

  override def nullSafeEval(a1: Any, a2: Any): Any = {
    doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData])
  }

  /**
   * A fast implementation which puts all the elements from the smaller array in a set
   * and then performs a lookup on it for each element of the bigger one.
   * This eval mode works only for data types which properly implement the equals method.
   */
  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
    var hasNull = false
    val (bigger, smaller, biggerDt) = if (arr1.numElements() > arr2.numElements()) {
      (arr1, arr2, left.dataType.asInstanceOf[ArrayType])
    } else {
      (arr2, arr1, right.dataType.asInstanceOf[ArrayType])
    }
    if (smaller.numElements() > 0) {
      val smallestSet = new mutable.HashSet[Any]
      smaller.foreach(elementType, (_, v) =>
        if (v == null) {
          hasNull = true
        } else {
          smallestSet += v
        })
      bigger.foreach(elementType, (_, v1) =>
        if (v1 == null) {
          hasNull = true
        } else if (smallestSet.contains(v1)) {
          return true
        }
      )
    } else if (containsNull(bigger, biggerDt)) {
      hasNull = true
    }
    if (hasNull) {
      null
    } else {
      false
    }
  }

  /**
   * A slower evaluation which performs a nested loop and supports all the data types.
   */
  private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = {
    var hasNull = false
    if (arr1.numElements() > 0) {
      arr1.foreach(elementType, (_, v1) =>
        if (v1 == null) {
          hasNull = true
        } else {
          arr2.foreach(elementType, (_, v2) =>
            if (v2 == null) {
              hasNull = true
            } else if (ordering.equiv(v1, v2)) {
              return true
            }
          )
        })
    } else if (containsNull(arr2, right.dataType.asInstanceOf[ArrayType])) {
      hasNull = true
    }
    if (hasNull) {
      null
    } else {
      false
    }
  }

  def containsNull(arr: ArrayData, dt: ArrayType): Boolean = {
    if (dt.containsNull) {
      var i = 0
      var hasNull = false
      while (i < arr.numElements && !hasNull) {
        hasNull = arr.isNullAt(i)
        i += 1
      }
      hasNull
    } else {
      false
    }
  }
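The null semantics of the set-based fast path above can be summarized with a small sketch (Python here purely as an illustration of the Scala logic; the function name is hypothetical, not Spark API): the smaller array's elements go into a hash set, the bigger array probes it, and null is returned only when no common element is found but a null was seen.

```python
def arrays_overlap(arr1, arr2):
    """Illustrative sketch of the fast-path semantics discussed above.

    Returns True if the arrays share an element, None (SQL NULL) if they
    share none but either side contains a null, and False otherwise.
    """
    bigger, smaller = (arr1, arr2) if len(arr1) > len(arr2) else (arr2, arr1)
    has_null = False
    if smaller:
        lookup = set()
        for v in smaller:          # build the set from the smaller array
            if v is None:
                has_null = True
            else:
                lookup.add(v)
        for v in bigger:           # probe it with the bigger array
            if v is None:
                has_null = True
            elif v in lookup:
                return True
    elif None in bigger:           # smaller array is empty
        has_null = True
    return None if has_null else False
```

A usage example: `arrays_overlap([1, 2, 3], [3, 4, 5])` yields `True`, while `arrays_overlap([1, None], [3, 4])` yields `None` because no match was found and a null is present.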
[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21282#discussion_r187234431

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +120,229 @@ case class MapValues(child: Expression)

   override def prettyName: String = "map_values"
 }

/**
 * Returns a map created from the given array of entries.
 */
@ExpressionDescription(
  usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.",
  examples = """
    Examples:
      > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
       {1:"a",2:"b"}
  """,
  since = "2.4.0")
case class MapFromEntries(child: Expression) extends UnaryExpression {

  private lazy val resolvedDataType: Option[MapType] = child.dataType match {
    case ArrayType(
      StructType(Array(
        StructField(_, keyType, false, _),
        StructField(_, valueType, valueNullable, _))),
      false) => Some(MapType(keyType, valueType, valueNullable))
    case _ => None
  }

  override def dataType: MapType = resolvedDataType.get

  override def checkInputDataTypes(): TypeCheckResult = resolvedDataType match {
    case Some(_) => TypeCheckResult.TypeCheckSuccess
    case None => TypeCheckResult.TypeCheckFailure(s"'${child.sql}' is of " +
      s"${child.dataType.simpleString} type. $prettyName accepts only null-free arrays " +
      "of pair structs. Values of the first struct field can't contain nulls and produce " +
      "duplicates.")
  }

  override protected def nullSafeEval(input: Any): Any = {
    val arrayData = input.asInstanceOf[ArrayData]
    val length = arrayData.numElements()
    val keyArray = new Array[AnyRef](length)
    val keySet = new OpenHashSet[AnyRef]()
    val valueArray = new Array[AnyRef](length)
    var i = 0
    while (i < length) {
      val entry = arrayData.getStruct(i, 2)
      val key = entry.get(0, dataType.keyType)
      if (key == null) {
        throw new RuntimeException("The first field from a struct (key) can't be null.")
      }
      if (keySet.contains(key)) {
--- End diff --

Is this check necessary for now? Other operations (e.g. `CreateMap`) allow us to create a map with duplicated keys. Would it be better to be consistent across Spark? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
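To make the duplicate-key question concrete, here is a hedged sketch (Python illustration only, not Spark code; the flag name is hypothetical): with the check enabled a duplicate key raises an error, and with it disabled the last entry wins, which mirrors what `CreateMap` currently permits.

```python
def map_from_entries(entries, reject_duplicates=True):
    """Illustrative sketch of the two duplicate-key policies under discussion."""
    result = {}
    for key, value in entries:
        if key is None:
            # Both policies agree that a null key is an error.
            raise ValueError("The first field from a struct (key) can't be null.")
        if reject_duplicates and key in result:
            raise ValueError("Duplicate key: %r" % (key,))
        result[key] = value  # without the check, the last entry wins
    return result
```

For example, `map_from_entries([(1, 'a'), (1, 'b')], reject_duplicates=False)` yields `{1: 'b'}`, while the default policy raises on the repeated key.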
[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...
Github user shahidki31 commented on a diff in the pull request: https://github.com/apache/spark/pull/21274#discussion_r187234165

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
@@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
       dataset.schema($(idCol)).dataType match {
         case _: LongType =>
           uncastPredictions
+        case _: IntegerType =>
+          uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType))
--- End diff --

Shouldn't it be

    case _: IntegerType =>
      uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType))

Otherwise the cast is unnecessary, right? The predictions already have id as Long type while the dataset has id as IntegerType, so we need to cast prediction.id to IntegerType. Correct me if I am wrong.
[GitHub] spark issue #21287: [SPARK-1849][Core]Add encoding customization support in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21287 Can one of the admins verify this patch?
[GitHub] spark pull request #21287: [SPARK-1849][Core]Add encoding customization supp...
GitHub user cqzlxl opened a pull request: https://github.com/apache/spark/pull/21287 [SPARK-1849][Core] Add encoding customization support in SparkContext.textFile

## What changes were proposed in this pull request?

In a non-English locale, we often need to load text files that are not UTF-8 encoded. So I added a `charsetName = "UTF-8"` parameter to the `SparkContext.textFile` method, letting the caller specify the actual character encoding of the file.

## How was this patch tested?

I manually tested the changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cqzlxl/spark encoding

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21287.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21287

commit 7b2eb1834dae28388f8a225537d2322fac3b6656
Author: Liu,Xiaolin
Date: 2018-05-10T03:21:47Z

    Add encoding customization support in SparkContext.textFile
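The proposed behavior can be sketched in plain Python rather than Spark (the helper name and signature below are assumptions for illustration, not the actual API change): the caller passes a charset name, and the file is decoded with it instead of a hard-coded UTF-8.

```python
import codecs

def read_text(path, charset_name="UTF-8"):
    # Decode the file with the caller-specified charset instead of
    # assuming UTF-8, mirroring the proposed `charsetName` parameter.
    with codecs.open(path, "r", encoding=charset_name) as f:
        return f.read().splitlines()
```

A GBK-encoded file, for instance, would be read as `read_text(path, "GBK")`, where the default argument preserves today's UTF-8 behavior for existing callers.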
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Merged build finished. Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90430/ Test PASSed.
[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21193 **[Test build #90430 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90430/testReport)** for PR 21193 at commit [`72faac3`](https://github.com/apache/spark/commit/72faac3209beb8bc38938f8788de6338e9b2ffae). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19183: [SPARK-21960][Streaming] Spark Streaming Dynamic Allocat...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19183 I don't have personal experience with streaming dynamic allocation, but this patch makes sense to me and I don't see anything obviously wrong. I agree with Holden regarding tests.
[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...
Github user zheh12 commented on the issue: https://github.com/apache/spark/pull/21286 relates to #21257
[GitHub] spark pull request #21284: [SPARK-23852][SQL] Add test that fails if PARQUET...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21284
[GitHub] spark issue #21285: [SPARK-24176][SQL] LOAD DATA can't identify wildcard in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21285 cc @wzhfy and @sujith71955
[GitHub] spark issue #21285: [SPARK-24176][SQL] LOAD DATA can't identify wildcard in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21285 is it a duplicate of https://github.com/apache/spark/pull/20611?
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21279 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3003/
[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3090/ Test PASSed.
[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21230 Merged build finished. Test PASSed.
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21279 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3003/
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21279 Merged build finished. Test PASSed.
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21279 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3089/ Test PASSed.
[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21230 **[Test build #90436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)** for PR 21230 at commit [`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).
[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21230 retest this please
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21279 **[Test build #90435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90435/testReport)** for PR 21279 at commit [`e9ea7e5`](https://github.com/apache/spark/commit/e9ea7e5dc0cd2c3456112ad46c754571ac6e555b).
[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...
Github user zheh12 commented on the issue: https://github.com/apache/spark/pull/21286 cc @cloud-fan @jiangxb1987 Are there any drawbacks to this idea? Please give some advice when you have time.
[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21286 Can one of the admins verify this patch?
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21279 @foxish would you please help to review this, thanks a lot!
[GitHub] spark pull request #21286: [SPARK-24194] HadoopFsRelation cannot overwrite a...
GitHub user zheh12 opened a pull request: https://github.com/apache/spark/pull/21286 [SPARK-24194] HadoopFsRelation cannot overwrite a path that is also being read from

## What changes were proposed in this pull request?

When multiple jobs append to the same `HadoopFsRelation` at the same time, errors occur. There are two failure modes:

1. One job succeeds, but its output is wrong: more data than expected appears.
2. The other jobs fail with `java.io.FileNotFoundException: Failed to get file status skip_dir/_temporary/0`.

The main reason for this problem is that multiple jobs use the same `_temporary` directory. So the core idea of this PR is to create a different temporary directory, keyed by jobId, for each job under the output folder, so that conflicts are avoided.

## How was this patch tested?

I tested manually, but I don't know how to write a unit test for this situation. Please help me.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zheh12/spark SPARK-24238

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21286.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21286

commit b676a36af110b0b7d7dfc47ab292d09c441f6a0f
Author: yangz
Date: 2018-05-10T01:46:49Z

    [SPARK-24194] HadoopFsRelation cannot overwrite a path that is also being read from
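The core idea above (one staging directory per job under the output folder, so concurrent jobs never share `_temporary/0`) can be sketched as follows; this is an illustration with hypothetical names, not the actual committer change:

```python
import os

def job_staging_dir(output_path, job_id):
    # Each concurrent job writes under its own staging directory keyed by
    # its job id, so two appending jobs never race on a shared _temporary.
    return os.path.join(output_path, "_temporary-" + str(job_id))
```

With this scheme, jobs `a` and `b` appending to the same output path stage their task files under `_temporary-a` and `_temporary-b` respectively, and each commit only moves files from its own staging directory.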
[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21279 jenkins, retest this please.
[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21155 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90428/ Test PASSed.
[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21155 Merged build finished. Test PASSed.
[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21155 **[Test build #90428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90428/testReport)** for PR 21155 at commit [`22bde31`](https://github.com/apache/spark/commit/22bde31feab95e548351a6057f5286c6faf75695). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21282 **[Test build #90434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90434/testReport)** for PR 21282 at commit [`8c6039c`](https://github.com/apache/spark/commit/8c6039c7b7f31f0343c4b0098a4e12dfff125128).
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21282 ok to test
[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21282 add to whitelist
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21238 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/
[GitHub] spark issue #21209: [SPARK-24141][CORE] Fix bug in CoarseGrainedSchedulerBac...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/21209 Thanks!
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3088/ Test PASSed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21238 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Merged build finished. Test PASSed.
[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r187217859

--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -622,11 +623,11 @@ object LocalLDAModel extends MLReadable[LocalLDAModel] {
       val vectorConverted = MLUtils.convertVectorColumnsToML(data, "docConcentration")
       val matrixConverted = MLUtils.convertMatrixColumnsToML(vectorConverted, "topicsMatrix")
       val Row(vocabSize: Int, topicsMatrix: Matrix, docConcentration: Vector,
-        topicConcentration: Double, gammaShape: Double) =
+        topicConcentration: Double, gammaShape: Double, seed: Long) =
--- End diff --

This will break backwards compatibility of ML persistence (when users try to load LDAModels saved using past versions of Spark). Could you please test this manually by saving a LocalLDAModel using Spark 2.3 and loading it with a build of your PR? You can fix this by checking for the Spark version (in the `metadata`) and loading the seed for Spark >= 2.4.
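The suggested compatibility check could look roughly like this (a Python sketch with assumed metadata field names; the real fix would live in the Scala model loader): only read the seed for metadata written by Spark 2.4 or later, and fall back to a default for older saves.

```python
def load_seed(metadata, default_seed=0):
    # Models saved before Spark 2.4 have no seed field in their metadata;
    # only read it when the saving version is >= 2.4, else use a default.
    # Field names ("sparkVersion", "seed") are assumptions for illustration.
    major, minor = (int(x) for x in metadata["sparkVersion"].split(".")[:2])
    if (major, minor) >= (2, 4):
        return metadata["seed"]
    return default_seed
```

Comparing `(major, minor)` tuples rather than the raw version string avoids the classic pitfall where `"2.10" < "2.4"` under lexicographic comparison.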
[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/21183#discussion_r187216371 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -252,6 +252,15 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead val lda = new LDA() testEstimatorAndModelReadWrite(lda, dataset, LDASuite.allParamSettings, LDASuite.allParamSettings, checkModelData) + +def checkModelDataWithDataset(model: LDAModel, model2: LDAModel, --- End diff -- style: Please fix this to match other multi-line method headers.
[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21231 **[Test build #90433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90433/testReport)** for PR 21231 at commit [`590ba26`](https://github.com/apache/spark/commit/590ba26c54b22de670cc699dcd0e1e48aaf71ab2).
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21238 **[Test build #90432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90432/testReport)** for PR 21238 at commit [`fa095cd`](https://github.com/apache/spark/commit/fa095cd9faceb1247f3704a1a4949be834b05746).
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3087/ Test FAILed.
[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21238 Merged build finished. Test FAILed.