[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90436/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21230
  
**[Test build #90436 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)**
 for PR 21230 at commit 
[`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21186: [SPARK-22279][SQL] Enable `convertMetastoreOrc` b...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21186


---




[GitHub] spark issue #21186: [SPARK-22279][SQL] Enable `convertMetastoreOrc` by defau...

2018-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21186
  
thanks, merging to master!


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90432/
Test PASSed.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21238
  
**[Test build #90432 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90432/testReport)**
 for PR 21238 at commit 
[`fa095cd`](https://github.com/apache/spark/commit/fa095cd9faceb1247f3704a1a4949be834b05746).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3096/
Test PASSed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21278
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90439/
Test PASSed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21278
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21278
  
**[Test build #90439 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90439/testReport)**
 for PR 21278 at commit 
[`04c3a2d`](https://github.com/apache/spark/commit/04c3a2d864d980e10bc55518d86e6307b637c6c2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #90441 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90441/testReport)**
 for PR 21288 at commit 
[`8f60902`](https://github.com/apache/spark/commit/8f609023174c9f97bddc46bebe98f4ce3caf08c5).


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21276
  
@fangshil Can you update?


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21231
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90433/
Test PASSed.


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21231
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21231
  
**[Test build #90433 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90433/testReport)**
 for PR 21231 at commit 
[`590ba26`](https://github.com/apache/spark/commit/590ba26c54b22de670cc699dcd0e1e48aaf71ab2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21282
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21282
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90434/
Test PASSed.


---




[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21282
  
**[Test build #90434 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90434/testReport)**
 for PR 21282 at commit 
[`8c6039c`](https://github.com/apache/spark/commit/8c6039c7b7f31f0343c4b0098a4e12dfff125128).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class MapFromEntries(child: Expression) extends UnaryExpression`


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3095/
Test FAILed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21288
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to I...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21145


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21288
  
**[Test build #90440 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90440/testReport)**
 for PR 21288 at commit 
[`223bf20`](https://github.com/apache/spark/commit/223bf2008abfe5fd41c3b5e741dc525ab3864977).


---




[GitHub] spark issue #21145: [SPARK-24073][SQL]: Rename DataReaderFactory to InputPar...

2018-05-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21145
  
Thanks! Merged to master.


---




[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-09 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21288

[SPARK-24206][SQL] Improve FilterPushdownBenchmark benchmark code

## What changes were proposed in this pull request?
This PR adds benchmark code for string pushdown to `FilterPushdownBenchmark` 
and updates the performance results.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark UpdateParquetBenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21288


commit 223bf2008abfe5fd41c3b5e741dc525ab3864977
Author: Takeshi Yamamuro 
Date:   2018-05-03T00:17:21Z

Fix




---




[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21278#discussion_r187238820
  
--- Diff: R/pkg/DESCRIPTION ---
@@ -13,6 +13,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
 License: Apache License (== 2.0)
 URL: http://www.apache.org/ http://spark.apache.org/
 BugReports: http://spark.apache.org/contributing.html
+SystemRequirements: Java (== 8)
 Depends:
 R (>= 3.0),
--- End diff --

btw, I saw this the other day, and thought we should update this to `>= 3.11` 
to reflect what we test with?
what do you think?


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90438/
Test FAILed.


---




[GitHub] spark pull request #21255: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21255#discussion_r187238518
  
--- Diff: R/pkg/R/functions.R ---
@@ -219,7 +219,8 @@ NULL
 #' head(select(tmp3, map_values(tmp3$v3)))
 #' head(select(tmp3, element_at(tmp3$v3, "Valiant")))
 #' tmp4 <- mutate(df, v4 = create_array(df$mpg, df$cyl), v5 = create_array(df$hp))
-#' head(select(tmp4, concat(tmp4$v4, tmp4$v5)))}
+#' head(select(tmp4, concat(tmp4$v4, tmp4$v5)))
+#' concat(df$mpg, df$cyl, df$hp)}
--- End diff --

I'd perhaps do this as
```
tmp5 <- mutate(df, s1 = concat(df$mpg, df$cyl, df$hp))
head(tmp5)
```

or

```
head(mutate(df, s1 = concat(df$mpg, df$cyl, df$hp)))
```

btw, aren't these numeric columns? does that work with concat?



---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90438 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90438/testReport)**
 for PR 21266 at commit 
[`1d93d99`](https://github.com/apache/spark/commit/1d93d99e4f01bc7b65152c630d7bf144366f6cda).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21278
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21278
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3094/
Test PASSed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21278
  
**[Test build #90439 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90439/testReport)**
 for PR 21278 at commit 
[`04c3a2d`](https://github.com/apache/spark/commit/04c3a2d864d980e10bc55518d86e6307b637c6c2).


---




[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21278#discussion_r187237615
  
--- Diff: R/pkg/R/client.R ---
@@ -60,13 +60,48 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack
   combinedArgs
 }
 
+checkJavaVersion <- function() {
+  javaBin <- "java"
+  javaHome <- Sys.getenv("JAVA_HOME")
+  javaReqs <- packageDescription("SparkR", fields=c("SystemRequirements"))
+  sparkJavaVersion <- as.numeric(tail(strsplit(javaReqs, "[(=)]")[[1]], n = 1L))
+  if (javaHome != "") {
+javaBin <- file.path(javaHome, javaBin)
+  }
+
+  # If java is missing from PATH, we get an error in Unix and a warning in Windows
+  javaVersionOut <- tryCatch(
+  launchScript(javaBin, "-version", wait = TRUE, stdout = TRUE, stderr = TRUE),
+   error = function(e) {
+ stop("Java version check failed. Please make sure Java is installed",
+  " and set JAVA_HOME to point to the installation directory.")
+   },
+   warning = function(w) {
+ stop("Java version check failed. Please make sure Java is installed",
+  " and set JAVA_HOME to point to the installation directory.")
+   })
+  javaVersionFilter <- Filter(
+  function(x) {
+grepl("java version", x)
+  }, javaVersionOut)
+
+  javaVersionStr <- strsplit(javaVersionFilter[[1]], "[\"]")[[1L]][2]
+  # javaVersionStr is of the form 1.8.0_92.
+  # Extract 8 from it to compare to sparkJavaVersion
+  javaVersionNum <- as.numeric(paste0(strsplit(javaVersionStr, "[.]")[[1L]][2], collapse = "."))
--- End diff --

isn't `as.numeric(strsplit(javaVersionStr, "[.]")[[1L]][2])` sufficient?
or `as.integer(strsplit(javaVersionStr, "[.]")[[1L]][2])`
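For illustration only, here is the same parsing idea sketched in Scala rather than R: find the line that mentions "java version", take the text between the quotes, then convert the second dot-separated field directly. `JavaVersionCheck`, `versionString` and `legacyMajor` are hypothetical names, not part of SparkR, and the sketch assumes the legacy `1.8.0_92` style of version string.

```scala
object JavaVersionCheck {
  // Hypothetical helper: find the line mentioning "java version" and
  // return the text between the first pair of double quotes, if any.
  def versionString(output: Seq[String]): Option[String] =
    output.find(_.contains("java version")).flatMap { line =>
      line.split('"') match {
        case Array(_, quoted, _*) => Some(quoted)
        case _                    => None
      }
    }

  // For a legacy string such as "1.8.0_92", take the second dot-separated
  // field and convert it directly; no re-joining of fields is needed.
  def legacyMajor(version: String): Option[Int] =
    version.split('.') match {
      case Array(_, second, _*) => scala.util.Try(second.toInt).toOption
      case _                    => None
    }
}
```

Both helpers return `Option`, so a missing line or an unparsable field shows up as `None` instead of an exception.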


---




[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21278#discussion_r187237024
  
--- Diff: R/pkg/R/client.R ---
@@ -60,13 +60,48 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack
   combinedArgs
 }
 
+checkJavaVersion <- function() {
+  javaBin <- "java"
+  javaHome <- Sys.getenv("JAVA_HOME")
+  javaReqs <- packageDescription("SparkR", fields=c("SystemRequirements"))
--- End diff --

nit: use `packageName()`?


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3092/
Test PASSed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3093/
Test PASSed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90431/
Test FAILed.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21278
  
nice I like it... they also say
```
When specifying a minimum Java version please use the official version 
names, which are (confusingly)

1.1 1.2 1.3 1.4 5.0 6 7 8 9 10
and supposedly will in 2018 move to a year.month scheme such as ‘18.9’.
```

so it might still break in the future though..
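To make the concern concrete, a rough Scala sketch (not SparkR code) of a major-version lookup that tolerates both naming styles quoted above. `JavaMajorVersion.major` is a hypothetical name, and the rule it applies is an assumption: a leading `1` means the old `1.x` naming, anything else is read as the major version itself.

```scala
object JavaMajorVersion {
  // Assumption: inputs look like "1.8.0_92" (old naming) or "9", "10.0.1", "18.9".
  def major(version: String): Option[Int] = {
    val fields = version.split("[._]").toList
    val picked = fields match {
      case "1" :: second :: _ => Some(second) // 1.x naming: major is the second field
      case first :: _         => Some(first)  // otherwise the first field is the major
      case Nil                => None
    }
    // Non-numeric fields (including the empty string) yield None.
    picked.flatMap(s => scala.util.Try(s.toInt).toOption)
  }
}
```

With this rule `"1.8.0_92"` and `"8"` both map to 8, while a year.month string such as `"18.9"` maps to 18, so a fixed `== 8` requirement would still need revisiting if the scheme changes.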



---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-09 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21278
  
it fails with
```
Quitting from lines 65-67 (sparkr-vignettes.Rmd) 
Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
Java version check failed. Please make sure Java is installed and set 
JAVA_HOME to point to the installation directory.
Execution halted
```


---




[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21238
  
**[Test build #90431 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90431/testReport)**
 for PR 21238 at commit 
[`aebdb68`](https://github.com/apache/spark/commit/aebdb6885237163b55a90fb739bcbbdcb00d7890).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90438 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90438/testReport)**
 for PR 21266 at commit 
[`1d93d99`](https://github.com/apache/spark/commit/1d93d99e4f01bc7b65152c630d7bf144366f6cda).


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3091/
Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90437/
Test FAILed.


---




[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90437 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90437/testReport)**
 for PR 21266 at commit 
[`8aedbf0`](https://github.com/apache/spark/commit/8aedbf0a04a92231242ee77222b76201c92fb9f2).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r187233292
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +564,272 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an element present also in a2. If the arrays have no common element and either of them contains a null element null is returned, false otherwise.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
+case TypeCheckResult.TypeCheckSuccess =>
+  if (RowOrdering.isOrderable(elementType)) {
+TypeCheckResult.TypeCheckSuccess
+  } else {
+TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
+  }
+case failure => failure
+  }
+
+  @transient private lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient private lazy val elementTypeSupportEquals = elementType match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+
+  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) {
+  fastEval _
+} else {
+  bruteForceEval _
+}
--- End diff --

nit: indent


---




[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r187236142
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala ---
@@ -136,6 +136,59 @@ class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper
 checkEvaluation(ArrayContains(a3, Literal.create(null, StringType)), null)
   }
 
+  test("ArraysOverlap") {
+val a0 = Literal.create(Seq(1, 2, 3), ArrayType(IntegerType))
+val a1 = Literal.create(Seq(4, 5, 3), ArrayType(IntegerType))
+val a2 = Literal.create(Seq(null, 5, 6), ArrayType(IntegerType))
+val a3 = Literal.create(Seq(7, 8), ArrayType(IntegerType))
+val a4 = Literal.create(Seq.empty[Int], ArrayType(IntegerType))
+
+val a5 = Literal.create(Seq[String](null, ""), ArrayType(StringType))
+val a6 = Literal.create(Seq[String]("", "abc"), ArrayType(StringType))
+val a7 = Literal.create(Seq[String]("def", "ghi"), ArrayType(StringType))
+
+checkEvaluation(ArraysOverlap(a0, a1), true)
+checkEvaluation(ArraysOverlap(a0, a2), null)
+checkEvaluation(ArraysOverlap(a1, a2), true)
+checkEvaluation(ArraysOverlap(a1, a3), false)
+checkEvaluation(ArraysOverlap(a0, a4), false)
+checkEvaluation(ArraysOverlap(a2, a4), null)
+checkEvaluation(ArraysOverlap(a4, a2), null)
+
+checkEvaluation(ArraysOverlap(a5, a6), true)
+checkEvaluation(ArraysOverlap(a5, a7), null)
+checkEvaluation(ArraysOverlap(a6, a7), false)
+
+// null handling
+checkEvaluation(ArraysOverlap(Literal.create(null, ArrayType(IntegerType)), a0), null)
+checkEvaluation(ArraysOverlap(a0, Literal.create(null, ArrayType(IntegerType))), null)
+checkEvaluation(ArraysOverlap(
+  Literal.create(Seq(null), ArrayType(IntegerType)),
+  Literal.create(Seq(null), ArrayType(IntegerType))), null)
--- End diff --

What about `arrays_overlap(array(), array(null))`?
Presto seems to return `false` in that case: 
[TestArrayOperators.java#L1041](https://github.com/prestodb/presto/blob/master/presto-main/src/test/java/com/facebook/presto/type/TestArrayOperators.java#L1041)
Could you also add a test case for it?
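
For reference, the three-valued result being discussed can be modeled in plain Scala. This is only a sketch under assumptions, not the code in this PR: `overlap` is a hypothetical helper, `Option[Boolean]` stands in for SQL's nullable boolean, and the empty-array branch follows the Presto behavior mentioned above rather than anything Spark has settled on.

```scala
object ArraysOverlapSketch {
  // None models SQL NULL; Some(true) / Some(false) model a definite answer.
  def overlap(a: Seq[Option[Int]], b: Seq[Option[Int]]): Option[Boolean] = {
    // Only non-null elements can produce a definite match.
    val common = a.flatten.toSet.intersect(b.flatten.toSet)
    if (common.nonEmpty) {
      Some(true)
    } else if (a.isEmpty || b.isEmpty) {
      // Assumed Presto-like rule: an empty array never overlaps with anything.
      Some(false)
    } else if (a.contains(None) || b.contains(None)) {
      // A null element might have matched, so the answer is unknown.
      None
    } else {
      Some(false)
    }
  }
}
```

With this rule, `overlap(Seq(), Seq(None))` gives `Some(false)`, which is the case the question is about.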


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21266: [SPARK-24206][SQL] Improve DataSource read benchmark cod...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21266
  
**[Test build #90437 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90437/testReport)**
 for PR 21266 at commit 
[`8aedbf0`](https://github.com/apache/spark/commit/8aedbf0a04a92231242ee77222b76201c92fb9f2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r187234226
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -529,6 +564,272 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least one element that is also present in a2. If the arrays have no common element and either of them contains a null element, null is returned; otherwise false.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+// scalastyle:off line.size.limit
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryArrayExpressionWithImplicitCast {
+
+  override def checkInputDataTypes(): TypeCheckResult = super.checkInputDataTypes() match {
+case TypeCheckResult.TypeCheckSuccess =>
+  if (RowOrdering.isOrderable(elementType)) {
+TypeCheckResult.TypeCheckSuccess
+  } else {
+TypeCheckResult.TypeCheckFailure(s"${elementType.simpleString} cannot be used in comparison.")
+  }
+case failure => failure
+  }
+
+  @transient private lazy val ordering: Ordering[Any] =
+TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient private lazy val elementTypeSupportEquals = elementType match {
+case BinaryType => false
+case _: AtomicType => true
+case _ => false
+  }
+
+  @transient private lazy val doEvaluation = if (elementTypeSupportEquals) {
+  fastEval _
+} else {
+  bruteForceEval _
+}
+
+  override def dataType: DataType = BooleanType
+
+  override def nullable: Boolean = {
+left.nullable || right.nullable || left.dataType.asInstanceOf[ArrayType].containsNull ||
+  right.dataType.asInstanceOf[ArrayType].containsNull
+  }
+
+  override def nullSafeEval(a1: Any, a2: Any): Any = {
+doEvaluation(a1.asInstanceOf[ArrayData], a2.asInstanceOf[ArrayData])
+  }
+
+  /**
+   * A fast implementation which puts all the elements from the smaller array in a set
+   * and then performs a lookup on it for each element of the bigger one.
+   * This eval mode works only for data types which properly implement the equals method.
+   */
+  private def fastEval(arr1: ArrayData, arr2: ArrayData): Any = {
+var hasNull = false
+val (bigger, smaller, biggerDt) = if (arr1.numElements() > arr2.numElements()) {
+  (arr1, arr2, left.dataType.asInstanceOf[ArrayType])
+} else {
+  (arr2, arr1, right.dataType.asInstanceOf[ArrayType])
+}
+if (smaller.numElements() > 0) {
+  val smallestSet = new mutable.HashSet[Any]
+  smaller.foreach(elementType, (_, v) =>
+if (v == null) {
+  hasNull = true
+} else {
+  smallestSet += v
+})
+  bigger.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else if (smallestSet.contains(v1)) {
+  return true
+}
+  )
+} else if (containsNull(bigger, biggerDt)) {
+  hasNull = true
+}
+if (hasNull) {
+  null
+} else {
+  false
+}
+  }
+
+  /**
+   * A slower evaluation which performs a nested loop and supports all the data types.
+   */
+  private def bruteForceEval(arr1: ArrayData, arr2: ArrayData): Any = {
+var hasNull = false
+if (arr1.numElements() > 0) {
+  arr1.foreach(elementType, (_, v1) =>
+if (v1 == null) {
+  hasNull = true
+} else {
+  arr2.foreach(elementType, (_, v2) =>
+if (v2 == null) {
+  hasNull = true
+} else if (ordering.equiv(v1, v2)) {
+  return true
+}
+  )
+})
+} else if (containsNull(arr2, right.dataType.asInstanceOf[ArrayType])) {
+  hasNull = true
+}
+if (hasNull) {
+  null
+} else {
+  false
+}
+  }
+
+  def containsNull(arr: ArrayData, dt: ArrayType): Boolean = {
+if (dt.containsNull) {
+  var i = 0
+  var hasNull = false
+  while (i < arr.numElements && !hasNull) {
+hasNull = arr.isNullAt(i)
+i += 1
+  }
+  hasNull
+} else {
+  false
+}
+  }
 

[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...

2018-05-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21282#discussion_r187234431
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +120,229 @@ case class MapValues(child: Expression)
   override def prettyName: String = "map_values"
 }
 
+/**
+ * Returns a map created from the given array of entries.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
+   {1:"a",2:"b"}
+  """,
+  since = "2.4.0")
+case class MapFromEntries(child: Expression) extends UnaryExpression
+{
+  private lazy val resolvedDataType: Option[MapType] = child.dataType match {
+case ArrayType(
+  StructType(Array(
+StructField(_, keyType, false, _),
+StructField(_, valueType, valueNullable, _))),
+  false) => Some(MapType(keyType, valueType, valueNullable))
+case _ => None
+  }
+
+  override def dataType: MapType = resolvedDataType.get
+
+  override def checkInputDataTypes(): TypeCheckResult = resolvedDataType match {
+case Some(_) => TypeCheckResult.TypeCheckSuccess
+case None => TypeCheckResult.TypeCheckFailure(s"'${child.sql}' is of " +
+  s"${child.dataType.simpleString} type. $prettyName accepts only null-free arrays " +
+  "of pair structs. Values of the first struct field can't contain nulls and produce " +
+  "duplicates.")
+  }
+
+  override protected def nullSafeEval(input: Any): Any = {
+val arrayData = input.asInstanceOf[ArrayData]
+val length = arrayData.numElements()
+val keyArray = new Array[AnyRef](length)
+val keySet = new OpenHashSet[AnyRef]()
+val valueArray = new Array[AnyRef](length)
+var i = 0;
+while (i < length) {
+  val entry = arrayData.getStruct(i, 2)
+  val key = entry.get(0, dataType.keyType)
+  if (key == null) {
+throw new RuntimeException("The first field from a struct (key) can't be null.")
+  }
+  if (keySet.contains(key)) {
--- End diff --

Is this check necessary for now? Other operations (e.g. `CreateMap`) allow us 
to create a map with duplicated keys. Wouldn't it be better to stay consistent 
within Spark?
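
The two policies in question can be contrasted with a small plain-Scala sketch. It is an illustration under assumptions, not Spark code: `strict` and `lenient` are hypothetical names, and the entries are modeled as a sequence of pairs rather than `ArrayData`.

```scala
import scala.collection.mutable

object MapFromEntriesSketch {
  // Strict policy: fail as soon as a key repeats
  // (presumably what the keySet check in the diff is for).
  def strict[K, V](entries: Seq[(K, V)]): Seq[(K, V)] = {
    val seen = mutable.HashSet.empty[K]
    entries.foreach { case (k, _) =>
      // HashSet.add returns false when the key was already present.
      if (!seen.add(k)) {
        throw new RuntimeException(s"Duplicate key: $k")
      }
    }
    entries
  }

  // Lenient policy: keep every entry, duplicates included,
  // which is the behavior the comment attributes to `CreateMap`.
  def lenient[K, V](entries: Seq[(K, V)]): Seq[(K, V)] = entries
}
```

The strict variant costs an extra hash set per row, so consistency is not the only trade-off; the check also has a runtime price.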


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21274: [SPARK-24213][ML] Fix for Int id type for PowerIt...

2018-05-09 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21274#discussion_r187234165
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala ---
@@ -231,8 +231,12 @@ class PowerIterationClustering private[clustering] (
   dataset.schema($(idCol)).dataType match {
 case _: LongType =>
   uncastPredictions
+case _: IntegerType =>
+  uncastPredictions.withColumn($(idCol), col($(idCol)).cast(LongType))
--- End diff --

Shouldn't it be 
` case _: IntegerType =>
  uncastPredictions.withColumn($(idCol), col($(idCol)).cast(IntegerType))
`
Otherwise the cast is unnecessary, right? The prediction already has the id as 
Long type, while the dataset has the id as IntegerType, so we need to cast 
prediction.id to IntegerType.
Correct me if I am wrong.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21287: [SPARK-1849][Core]Add encoding customization support in ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21287
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21287: [SPARK-1849][Core]Add encoding customization supp...

2018-05-09 Thread cqzlxl
GitHub user cqzlxl opened a pull request:

https://github.com/apache/spark/pull/21287

[SPARK-1849][Core]Add encoding customization support in SparkContext.textFile

## What changes were proposed in this pull request?

In non-English locales, we often need to load text files that are not UTF-8 
encoded. So I added a `charsetName = "UTF-8"` parameter to the 
`SparkContext.textFile` method, letting the caller specify the file's actual 
character encoding.
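
The effect of such a parameter can be sketched in plain Scala with the JDK charset API. This is an illustration only: `decode` is a hypothetical helper, not the `textFile` change itself, and it works on an in-memory byte array rather than a file split.

```scala
import java.nio.charset.Charset

object CharsetSketch {
  // Decode raw bytes with a caller-chosen charset name, defaulting to UTF-8.
  // Charset.forName throws if the name is unknown or unsupported.
  def decode(bytes: Array[Byte], charsetName: String = "UTF-8"): String =
    new String(bytes, Charset.forName(charsetName))
}
```

For example, bytes written as ISO-8859-1 round-trip correctly with `decode(raw, "ISO-8859-1")`, while the default UTF-8 decoding replaces any byte sequence that is not valid UTF-8 with the replacement character.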

## How was this patch tested?

I manually tested the changes.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cqzlxl/spark encoding

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21287


commit 7b2eb1834dae28388f8a225537d2322fac3b6656
Author: Liu,Xiaolin 
Date:   2018-05-10T03:21:47Z

Add encoding customization support in SparkContext.textFile




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21193
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90430/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21193: [SPARK-24121][SQL] Add API for handling expression code ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21193
  
**[Test build #90430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90430/testReport)**
 for PR 21193 at commit 
[`72faac3`](https://github.com/apache/spark/commit/72faac3209beb8bc38938f8788de6338e9b2ffae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19183: [SPARK-21960][Streaming] Spark Streaming Dynamic Allocat...

2018-05-09 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/19183
  
I don't have personal experience with streaming dynamic allocation, but 
this patch makes sense to me and I don't see anything obviously wrong.

I agree with Holden regarding tests.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...

2018-05-09 Thread zheh12
Github user zheh12 commented on the issue:

https://github.com/apache/spark/pull/21286
  
relates to #21257 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21284: [SPARK-23852][SQL] Add test that fails if PARQUET...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21284


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21285: [SPARK-24176][SQL] LOAD DATA can't identify wildcard in ...

2018-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21285
  
cc @wzhfy and @sujith71955


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21285: [SPARK-24176][SQL] LOAD DATA can't identify wildcard in ...

2018-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21285
  
is it a duplicate of https://github.com/apache/spark/pull/20611?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21279
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3003/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3090/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21230
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21279
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3003/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21279
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3089/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21230
  
**[Test build #90436 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90436/testReport)**
 for PR 21230 at commit 
[`e224f8a`](https://github.com/apache/spark/commit/e224f8a798ed30319efab386720c997227e1b421).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21230: [SPARK-24172][SQL] we should not apply operator pushdown...

2018-05-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21230
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21279
  
**[Test build #90435 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90435/testReport)**
 for PR 21279 at commit 
[`e9ea7e5`](https://github.com/apache/spark/commit/e9ea7e5dc0cd2c3456112ad46c754571ac6e555b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...

2018-05-09 Thread zheh12
Github user zheh12 commented on the issue:

https://github.com/apache/spark/pull/21286
  
cc @cloud-fan @jiangxb1987
Are there any drawbacks to this idea? Please give some advice when you 
have time.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21286: [SPARK-24238][SQL] HadoopFsRelation can't append the sam...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21286
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21279
  
@foxish could you please help review this? Thanks a lot!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21286: [SPARK-24194] HadoopFsRelation cannot overwrite a...

2018-05-09 Thread zheh12
GitHub user zheh12 opened a pull request:

https://github.com/apache/spark/pull/21286

[SPARK-24194] HadoopFsRelation cannot overwrite a path that is also b…

## What changes were proposed in this pull request?

When multiple tasks append to a `HadoopFsRelation` at the same time, 
errors occur with the following two symptoms: 

1. A task succeeds, but the data is wrong: more data than expected appears.
2. Other tasks fail with `java.io.FileNotFoundException: Failed to get 
file status skip_dir/_temporary/0`

The main cause of this problem is that multiple jobs use the same 
`_temporary` directory.

So the core idea of this PR is to create a separate temporary directory, 
named with the jobId, for each job in the `output` folder, so that conflicts 
are avoided.
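
The difference between the two layouts can be sketched in plain Scala. This is only an illustration: the helper names and the `_temporary-<jobId>` naming are assumptions, not necessarily what the patch does.

```scala
import java.nio.file.{Path, Paths}

object JobTempDirSketch {
  // Shared layout: every job writing to `output` stages files in the same directory,
  // so one job's commit or cleanup can touch another job's files.
  def sharedTempDir(output: String): Path =
    Paths.get(output, "_temporary")

  // Per-job layout (hypothetical naming): the job ID is part of the staging path,
  // so concurrent jobs stage their files in disjoint directories.
  def perJobTempDir(output: String, jobId: String): Path =
    Paths.get(output, s"_temporary-$jobId")
}
```

Two jobs appending to `out` get the same path from `sharedTempDir` but different paths from `perJobTempDir`, which is the property the PR relies on.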

## How was this patch tested?

I tested this manually, but I don't know how to write a unit test for this 
situation. Any help would be appreciated.


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zheh12/spark SPARK-24238

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21286


commit b676a36af110b0b7d7dfc47ab292d09c441f6a0f
Author: yangz 
Date:   2018-05-10T01:46:49Z

[SPARK-24194] HadoopFsRelation cannot overwrite a path that is also being read from




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21279: [SPARK-24219][k8s] Improve the docker building script to...

2018-05-09 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21279
  
jenkins, retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90428/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21155
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21155: [SPARK-23927][SQL] Add "sequence" expression

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21155
  
**[Test build #90428 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90428/testReport)**
 for PR 21155 at commit 
[`22bde31`](https://github.com/apache/spark/commit/22bde31feab95e548351a6057f5286c6faf75695).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21282
  
**[Test build #90434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90434/testReport)**
 for PR 21282 at commit 
[`8c6039c`](https://github.com/apache/spark/commit/8c6039c7b7f31f0343c4b0098a4e12dfff125128).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21282
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21282: [SPARK-23934][SQL] Adding map_from_entries function

2018-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21282
  
add to whitelist


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21238
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21209: [SPARK-24141][CORE] Fix bug in CoarseGrainedSchedulerBac...

2018-05-09 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/21209
  
Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3088/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21238
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/3002/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21183#discussion_r187217859
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -622,11 +623,11 @@ object LocalLDAModel extends MLReadable[LocalLDAModel] {
   val vectorConverted = MLUtils.convertVectorColumnsToML(data, "docConcentration")
   val matrixConverted = MLUtils.convertMatrixColumnsToML(vectorConverted, "topicsMatrix")
   val Row(vocabSize: Int, topicsMatrix: Matrix, docConcentration: Vector,
-  topicConcentration: Double, gammaShape: Double) =
+  topicConcentration: Double, gammaShape: Double, seed: Long) =
--- End diff --

This will break backwards compatibility of ML persistence (when users try 
to load LDAModels saved using past versions of Spark).  Could you please test 
this manually by saving a LocalLDAModel using Spark 2.3 and loading it with a 
build of your PR?  You can fix this by checking for the Spark version (in the 
`metadata`) and loading the seed for Spark >= 2.4.
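
A version gate of the kind suggested above could look roughly like this in plain Scala. It is a sketch under assumptions: `majorMinor` and `seedFor` are hypothetical helpers, and the stored seed is modeled as an `Option[Long]` instead of a column read from the saved data.

```scala
object LdaLoadSketch {
  // Parse "major.minor[.patch...]" into (major, minor);
  // a suffix such as "-SNAPSHOT" on the minor part is ignored.
  def majorMinor(version: String): (Int, Int) = {
    val parts = version.split("\\.")
    (parts(0).toInt, parts(1).takeWhile(_.isDigit).toInt)
  }

  // Models saved before 2.4 have no seed field, so fall back to a default for them.
  def seedFor(savedVersion: String, storedSeed: Option[Long], default: Long): Long = {
    val (major, minor) = majorMinor(savedVersion)
    val hasSeed = major > 2 || (major == 2 && minor >= 4)
    if (hasSeed) storedSeed.getOrElse(default) else default
  }
}
```

The point of the gate is that the loader never tries to read the seed from data written by an older version, so models saved with Spark 2.3 keep loading.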


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

2018-05-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/21183#discussion_r187216371
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala ---
@@ -252,6 +252,15 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead
 val lda = new LDA()
 testEstimatorAndModelReadWrite(lda, dataset, LDASuite.allParamSettings,
   LDASuite.allParamSettings, checkModelData)
+
+def checkModelDataWithDataset(model: LDAModel, model2: LDAModel,
--- End diff --

style: Please fix this to match other multi-line method headers.
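
As far as I understand the Spark Scala style, a method header that does not fit on one line puts each parameter on its own line with a four-space indent, while the body keeps the usual two-space indent. A minimal sketch, with a hypothetical method and parameters that are not taken from this PR:

```scala
object MethodHeaderStyleSketch {
  // Each parameter sits on its own line, indented four spaces;
  // the return type follows the last parameter and the body is indented two spaces.
  def checkWithTolerance(
      expected: Seq[Double],
      actual: Seq[Double],
      tolerance: Double): Boolean = {
    expected.length == actual.length &&
      expected.zip(actual).forall { case (e, a) => math.abs(e - a) <= tolerance }
  }
}
```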


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21231: [SPARK-24119][SQL]Add interpreted execution to SortPrefi...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21231
  
**[Test build #90433 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90433/testReport)**
 for PR 21231 at commit 
[`590ba26`](https://github.com/apache/spark/commit/590ba26c54b22de670cc699dcd0e1e48aaf71ab2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21238
  
**[Test build #90432 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90432/testReport)**
 for PR 21238 at commit 
[`fa095cd`](https://github.com/apache/spark/commit/fa095cd9faceb1247f3704a1a4949be834b05746).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3087/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21238: [SPARK-24137][K8s] Mount local directories as empty dir ...

2018-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21238
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


