date:20180816

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22125


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94868/
Test FAILed.


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22127
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22127
  
**[Test build #94868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94868/testReport)**
 for PR 22127 at commit 
[`8255336`](https://github.com/apache/spark/commit/825533682c98598409e537fa866dcdab915e3948).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22125
  
Merged to master


---


[GitHub] spark issue #22116: [DOCS]Update configuration.md

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22116
  
Merged to master. For the future, a better title and bundling these in one 
PR would be preferable.


---


[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22090#discussion_r210772634
  
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 
   Normalized Discounted Cumulative Gain
   
-$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
--- End diff --

We do need to fix this, but, this makes the subscripts incorrect for 
R_i(j). I think the expression should change to ln(j+2) in the next line; this 
is what the code does. For consistency I'd do the same below too.
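
To make the indexing point concrete, here is a minimal pure-Python sketch of the
formula with 0-based positions and a ln(j+2) discount. It is an illustration
only, not Spark's implementation; the function name `ndcg_at_k` and the input
shape (a list of `(predicted_ranking, relevant_set)` pairs) are assumptions made
for this example. With a 0-based `j`, a discount of ln(j+1) would divide by
ln(1) = 0 at the first position, which is why the shift to j+2 is needed.

```python
import math


def ndcg_at_k(queries, k):
    """Mean NDCG at k over (predicted_ranking, relevant_set) pairs.

    Positions are 0-based, so position j is discounted by ln(j + 2).
    Relevance is binary: 1 if the predicted item is in the relevant set.
    """
    if not queries:
        return 0.0
    total = 0.0
    for predicted, relevant in queries:
        relevant = set(relevant)
        # n = min(max(|R_i|, |D_i|), k), as in the documented formula
        n = min(max(len(predicted), len(relevant)), k)
        dcg = 0.0
        for j in range(n):
            # Positions past the end of the prediction contribute nothing
            if j < len(predicted) and predicted[j] in relevant:
                dcg += 1.0 / math.log(j + 2)
        # Ideal DCG: every one of the first min(|D_i|, k) positions is a hit
        idcg = sum(1.0 / math.log(j + 2) for j in range(min(len(relevant), k)))
        # Assumption for this sketch: a query with no relevant items scores 0
        total += dcg / idcg if idcg > 0 else 0.0
    return total / len(queries)
```

A perfect ranking scores 1.0 and a ranking with no relevant items scores 0.0,
which is a quick sanity check that the summation bounds and the discount agree.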


---


[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22090#discussion_r210772686
  
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 
   Normalized Discounted Cumulative Gain
   
-$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
   \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
 \text{Where} \\
 \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
-\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
+\hspace{5 mm} IDCG(D, k) = \sum_{j=1}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+1)}$
   
   
 <a href="https://en.wikipedia.org/wiki/Information_retrieval#Discounted_cumulative_gain">NDCG at k</a> is a
--- End diff --

We can update the link here to 
https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22125
  
**[Test build #4279 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4279/testReport)**
 for PR 22125 at commit 
[`6031f70`](https://github.com/apache/spark/commit/6031f70b8f57f9b64335db33d8e219814a7bba9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94862/
Test PASSed.


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #94862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94862/testReport)**
 for PR 21584 at commit 
[`6584029`](https://github.com/apache/spark/commit/658402919c080ae4d878d355a4b3a14a4d4d0aad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22125
  
OK, we should bundle these, but w/e


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22125
  
**[Test build #4279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4279/testReport)**
 for PR 22125 at commit 
[`6031f70`](https://github.com/apache/spark/commit/6031f70b8f57f9b64335db33d8e219814a7bba9c).


---


[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...

2018-08-16 Thread shaneknapp

Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/20725
  
still having problems getting this to pass:
```
[error] (sql-kafka-0-10/test:test) sbt.TestsFailedException: Tests 
unsuccessful
[error] (core/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 14786 s, completed Aug 16, 2018 4:05:09 PM
```

i rebased upstream changes from the main spark repo on my fork and launched 
another build.  in ~5 hours we'll know how it went.  :\

if this fails tonite, i'll figure out a hacky way to test this tomorrow 
morning.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22126
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94863/
Test PASSed.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22126
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22126
  
**[Test build #94863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94863/testReport)**
 for PR 22126 at commit 
[`c005109`](https://github.com/apache/spark/commit/c005109ac517bc8db687318f5e93a35a1ae785c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22128
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22128
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-16 Thread bomeng

Github user bomeng commented on the issue:

https://github.com/apache/spark/pull/22127
  
Good points. I will leave it open for any suggestions for improving the 
user experience.


---


[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22128
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #22128: Add test_slice() to streaming BasicOperations

2018-08-16 Thread cclauss

GitHub user cclauss opened a pull request:

https://github.com/apache/spark/pull/22128

Add test_slice() to streaming BasicOperations

As suggested in 
https://github.com/apache/spark/pull/20838#pullrequestreview-139118618

## What changes were proposed in this pull request?
Add a test for slice operations on streams.

(Please fill in changes proposed in this fix)

## How was this patch tested?
It is a new test being added to the automated test suite.

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22128


commit 4094422d58077aa95129a7ec9fddf75c2e3af7a7
Author: cclauss 
Date:   2018-08-16T23:06:59Z

Add test_slice() to streaming BasicOperations

As suggested in 
https://github.com/apache/spark/pull/20838#pullrequestreview-139118618




---


[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r210767018
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2223,21 +2223,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 checkAnswer(jsonDF, Seq(Row("Chris", "Baird")))
   }
 
-
   test("SPARK-23723: specified encoding is not matched to actual encoding") {
-val fileName = "test-data/utf16LE.json"
-val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
-val exception = intercept[SparkException] {
-  spark.read.schema(schema)
-.option("mode", "FAILFAST")
-.option("multiline", "true")
-.options(Map("encoding" -> "UTF-16BE"))
-.json(testFile(fileName))
-.count()
+def doCount(bypassParser: Boolean, multiLine: Boolean): Long = {
+  var result: Long = -1
+  withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> bypassParser.toString) {
+val fileName = "test-data/utf16LE.json"
+val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
+result = spark.read.schema(schema)
+  .option("mode", "FAILFAST")
--- End diff --

This sounds good! Let us enable it only when PERMISSIVE is on. You know, 
our default mode is PERMISSIVE. This should benefit most users. 
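
One way to see why restricting the shortcut to PERMISSIVE is safe: in that mode
a malformed record still yields a row, so the row count equals the line count
whether or not the parser runs. The sketch below is a toy model in plain Python,
not Spark code; `count_rows`, its parameters, and the one-record-per-line input
are assumptions made for illustration.

```python
import json


def count_rows(lines, required_columns, mode="PERMISSIVE"):
    """Count rows of a JSON-lines input, skipping the parser when that is safe."""
    # Hypothetical fast path: no columns are requested and malformed
    # records would be kept anyway, so parsing cannot change the count.
    if not required_columns and mode == "PERMISSIVE":
        return len(lines)
    count = 0
    for line in lines:
        try:
            json.loads(line)
        except ValueError:
            if mode == "FAILFAST":
                raise  # surface the malformed record to the caller
            if mode == "DROPMALFORMED":
                continue  # malformed records are not counted
            # PERMISSIVE: fall through and count the record as a row
        count += 1
    return count


lines = ['{"a": 1}', '{"a": ', '{"a": 3}']
```

With these three lines, the PERMISSIVE count is 3 with or without the fast
path, DROPMALFORMED gives 2, and FAILFAST raises on the second line, so the
other two modes still need the parser even when the schema is empty.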


---


[GitHub] spark pull request #22108: [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelate...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22108


---


[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r210765672
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1894,6 +1894,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - In version 2.3 and earlier, CSV rows are considered as malformed if at least one column value in the row is malformed. CSV parser dropped such rows in the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 2.4, CSV row is considered as malformed only when it contains malformed column values requested from CSV datasource, other values can be ignored. As an example, CSV file contains the "id,name" header and one row "1234". In Spark 2.4, selection of the id column consists of a row with one column value 1234 but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`.
   - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
   - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
+  - Since Spark 2.4, text-based datasources like CSV and JSON don't parse input lines if the required schema pushed down to the datasources is empty. The schema can be empty in the case of the count() action. For example, Spark 2.3 and earlier versions failed on JSON files with invalid encoding but Spark 2.4 returns total number of lines in the file. To restore the previous behavior when the underlying parser is always invoked even for the empty schema, set `true` to `spark.sql.legacy.bypassParserForEmptySchema`. This option will be removed in Spark 3.0.
--- End diff --

Is it right based on what you said 
https://github.com/apache/spark/pull/21909#discussion_r210704902?


---


[GitHub] spark issue #22108: [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelatedScalar...

2018-08-16 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22108
  
Thanks! Merged to master.


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-16 Thread dilipbiswal

Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22127
  
@bomeng There seems to be a bit of history to this :-). Please check 
https://github.com/apache/spark/pull/15011
where we decided against silently switching to the "default" database.


---


[GitHub] spark pull request #22117: [SPARK-23654][BUILD] remove jets3t as a dependenc...

2018-08-16 Thread steveloughran

Github user steveloughran closed the pull request at:

https://github.com/apache/spark/pull/22117


---


[GitHub] spark issue #22081: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...

2018-08-16 Thread steveloughran

Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22081
  
Thanks. Two less JARs on the CP to keep up to date - what more can anyone 
want?


---


[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #94869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94869/testReport)**
 for PR 21950 at commit 
[`3a65edf`](https://github.com/apache/spark/commit/3a65edf0e07f3beb6d6dd4dcb16e76ea7210c5e9).


---


[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21135
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21561


---


[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21561
  
Merged to master


---


[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-16 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22112
  
Thanks for the clarification, but I guess my point is with your last 
statement:

>  - with assumption that we will expand solution to cover all later.

If we document this and say we support unordered operations with the caveat 
that failures could produce different results, my assumption is that we don't 
necessarily have to do anything else, ever (this is what I am proposing). We 
could decide to, for instance, add an option to sort, or, if it's not a result 
stage, fail more tasks to try to handle the situation, but strictly speaking 
we wouldn't have to.

If you think we have to fix those operations that can produce unordered 
output, then I think it comes back to saying that we just don't support 
unordered operations at all. We should say so, and probably force the sort on 
all these operations, and possibly on all operations where the user could 
cause a different order on rerun.



---


[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94860/
Test PASSed.


---


[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21909
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21909
  
**[Test build #94860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94860/testReport)**
 for PR 21909 at commit 
[`6b34018`](https://github.com/apache/spark/commit/6b34018fcedffa0033cb281d619af79e15d99585).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22127
  
**[Test build #94868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94868/testReport)**
 for PR 22127 at commit 
[`8255336`](https://github.com/apache/spark/commit/825533682c98598409e537fa866dcdab915e3948).


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22127
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22127
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2258/
Test PASSed.


---


[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue

2018-08-16 Thread bomeng

GitHub user bomeng opened a pull request:

https://github.com/apache/spark/pull/22127

[SPARK-25032][SQL] fix drop database issue

## What changes were proposed in this pull request?
When a user drops the current database (other than the default database), 
we should set the current database to default after the deletion.

## How was this patch tested?
A new test case is added to cover this scenario.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark 25032

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22127


commit 825533682c98598409e537fa866dcdab915e3948
Author: Bo Meng 
Date:   2018-08-16T21:58:17Z

fix drop database issue




---


[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22114
  
**[Test build #94867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94867/testReport)**
 for PR 22114 at commit 
[`c2f9ed1`](https://github.com/apache/spark/commit/c2f9ed10776842ffe0746fcc89b157675fa6c455).


---


[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-16 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22112
  
@tgravescs I was specifically in agreement with
> Personally I don't want to talk about implementation until we decide what 
we want our semantics to be around the unordered operations because that 
affects any implementation.

and

> I would propose we fix the things that are using the round robin type 
partitioning (repartition) but then unordered things like zip/MapPartitions 
(via user code) we document or perhaps give the user the option to sort.

IMO a fix in spark core for repartition should work for most (if not all) 
order dependent closures - we might choose not to implement for others due to 
time constraints; but basic idea should be fairly similar.
Given this, I am fine with documenting the potential issue for others and 
fix for a core subset - with assumption that we will expand solution to cover 
all later.
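
For readers following along, here is a toy illustration in plain Python of why
round-robin partitioning is order dependent and why a sort removes the problem.
It is a simplified model, not Spark's repartition logic (which, among other
things, does not start every partition's assignment at position 0); the
function `round_robin` and the sample data are assumptions for this sketch.

```python
def round_robin(records, num_partitions):
    """Assign records to partitions by arrival position, ignoring their content."""
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        partitions[position % num_partitions].append(record)
    return partitions


# The same records, produced in a different order by a retried upstream task
first_attempt = [3, 1, 2, 4]
retry = [1, 2, 3, 4]

# Without a sort, the retry sends records to different partitions
unsorted_match = round_robin(first_attempt, 2) == round_robin(retry, 2)

# Sorting first makes the assignment depend only on the record values
sorted_match = round_robin(sorted(first_attempt), 2) == round_robin(sorted(retry), 2)

print(unsorted_match)  # False
print(sorted_match)    # True
```

If only some downstream tasks are rerun against the retried output, the
unsorted case can drop or duplicate records, which is the correctness issue
under discussion. The sort trades extra work for a deterministic assignment.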



---


[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...

2018-08-16 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22114
  
ok to test


---


[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...

2018-08-16 Thread NiharS

Github user NiharS commented on the issue:

https://github.com/apache/spark/pull/22114
  
Tried with a significantly larger input, both with and without the change. 
They ran in just about the same time.


---


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21990
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94866/
Test FAILed.


---


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21990
  
**[Test build #94866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)**
 for PR 21990 at commit 
[`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`


---


[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21990
  
**[Test build #94866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)**
 for PR 21990 at commit 
[`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).


---


[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94853/
Test PASSed.


---


[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Build finished. Test PASSed.


---


[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #94853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94853/testReport)**
 for PR 21950 at commit 
[`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---


[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22124
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22124
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94861/
Test FAILed.


---


[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22124
  
**[Test build #94861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94861/testReport)**
 for PR 22124 at commit 
[`276879c`](https://github.com/apache/spark/commit/276879ca2bd8d2966b829b7e41e140362c4e4160).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #94865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94865/testReport)**
 for PR 21221 at commit 
[`2897281`](https://github.com/apache/spark/commit/2897281a384d25556609a17be21f926cb5d68dd6).


---


[GitHub] spark issue #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL support...

2018-08-16 Thread dilipbiswal

Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/22107
  
@felixcheung I have incorporated the comments. 


---


[GitHub] spark pull request #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities...

2018-08-16 Thread mn-mikke

Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/22126#discussion_r210724650
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala ---
@@ -363,9 +363,9 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper
 left: Expression,
 right: Expression,
 f: (Expression, Expression, Expression) => Expression): Expression = {
-  val MapType(kt, vt1, vcn1) = left.dataType.asInstanceOf[MapType]
-  val MapType(_, vt2, vcn2) = right.dataType.asInstanceOf[MapType]
-  MapZipWith(left, right, createLambda(kt, false, vt1, vcn1, vt2, vcn2, f))
+  val MapType(kt, vt1, _) = left.dataType.asInstanceOf[MapType]
--- End diff --

Optional suggestion: Maybe we could remove `asInstanceOf[MapType]` here?
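
To illustrate why the cast may be redundant, here is a minimal plain-Scala sketch. The `DataType`, `MapType`, `StringType` and `IntegerType` definitions are simplified stand-ins written for this example, not Spark's real classes. A constructor pattern already performs the type test and binds the fields; a `val MapType(kt, vt, _) = dt` definition relies on the same mechanism.

```scala
object ExtractorSketch {
  // Simplified stand-ins for Spark's type hierarchy (assumptions for illustration).
  trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
    extends DataType

  // The constructor pattern checks that `dt` is a MapType and binds its
  // fields in one step, so no asInstanceOf[MapType] is needed beforehand.
  def keyAndValueTypes(dt: DataType): (DataType, DataType) = dt match {
    case MapType(kt, vt, _) => (kt, vt)
    case other => throw new IllegalArgumentException("Not a map type: " + other)
  }

  def main(args: Array[String]): Unit = {
    println(keyAndValueTypes(MapType(StringType, IntegerType, valueContainsNull = true)))
  }
}
```

In both forms a non-map input fails at runtime, so dropping the cast would change only the kind of error raised, not whether one is raised.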


---


[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22045
  
**[Test build #94864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94864/testReport)**
 for PR 22045 at commit 
[`3382e1a`](https://github.com/apache/spark/commit/3382e1a5396c8e5a94802d92a7106eacf627617c).


---


[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-16 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/22112
  
@mridulm so, just to clarify: are you agreeing that we need to decide 
what to do with zip and the others, or are you agreeing that we should document 
these as unordered actions, so retries might produce different results, and only fix 
repartition?

We can certainly add other options later, but I don't want to change what we 
say the core zip behavior is.


---


[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-16 Thread ajacques

Github user ajacques closed the pull request at:

https://github.com/apache/spark/pull/21889


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2256/



---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22126
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2256/
Test PASSed.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22126
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2257/
Test PASSed.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22126
  
LGTM


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22081: [SPARK-23654][BUILD] remove jets3t as a dependenc...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22081


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread ajacques

Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
Thanks for the response, all. @mallman, if it's really your preference, I 
will create a PR against that branch and close this one. My intention was never 
to take away from your efforts, and I still consider my work here to be just 
minor stylistic tweaks on top of your work. I did this as a service to help 
bridge the divide and hopefully alleviate frustrations. But it has been a bit 
frustrating to be stuck between the two sides of this while merge strategies 
keep changing, and I don't wish to continue being in between like this. 

As such, I will create a PR, but I hope it does not get dragged out to settle any 
differences of opinion between maintainers and submitters. My goal is to make 
sure this valuable feature gets merged so many can benefit.


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-08-16 Thread mccheah

Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/21584
  
Let's merge at the end of the day Pacific time (~5PM-ish) on Friday, August 
17, pending any additional feedback on the mailing list thread discussing the 
subject of including this in 2.4.


---


[GitHub] spark issue #22081: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22081
  
Merged to master


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2256/



---


[GitHub] spark issue #22119: [WIP][SPARK-25129][SQL] Revert mapping com.databricks.sp...

2018-08-16 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22119
  
+1 for @tgravescs's comments. In terms of usability, the mapping and 
configuration will be easier for most customers.

Regarding the following comment from @gengliangwang: technically, there are no 
published Databricks Avro artifacts for Spark 2.4 (master branch) as 
of today. I assume that @gengliangwang will release them on the same day as 
Apache Spark 2.4, but it would be great if we didn't have to rely on that kind of 
assumption, which is outside the Apache community's control.
> For hive tables that used Databricks spark-avro, the tables can still use 
> the Databricks repo (since the built-in spark-avro is not loaded by default)

Additionally, the 3rd-party `spark-avro` will go into maintenance mode. Spark 3.0 
may want to read tables generated by the old `spark-avro`.


---


[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-16 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/22112
  
I agree, @tgravescs. I was looking at the implementation to understand what 
the expectations are wrt the newly introduced methods/fields and whether they make 
sense; I did not see any details furnished.
I don't think we can hack our way out of this.

I would expect a solution for repartition to also be applicable to other 
order dependent closures as well - though we might choose to fix them later, 
the basic approach ideally should be transferable.
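
As a rough illustration of what "order dependent" means here, the following plain-Scala sketch models a round-robin partitioner that assigns elements by input position alone. `RoundRobinSketch` and `roundRobin` are hypothetical names for this example, and the sketch is a simplification, not Spark's actual `repartition` code. If a retried upstream task yields the same elements in a different order, the partition contents change.

```scala
object RoundRobinSketch {
  // Assign each element to a partition based only on its position in the
  // input; this positional dependence is what makes the result order-sensitive.
  def roundRobin[T](input: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    input.zipWithIndex
      .groupBy { case (_, index) => index % numPartitions }
      .map { case (partition, elems) => partition -> elems.map(_._1) }

  def main(args: Array[String]): Unit = {
    val firstAttempt = roundRobin(Seq("a", "b", "c", "d"), 2)
    // Same elements, different arrival order, as a retry might produce.
    val retry = roundRobin(Seq("b", "a", "c", "d"), 2)
    println(firstAttempt)
    println(retry)
    // Mixing partition 0 from the first attempt with partition 1 from the
    // retry duplicates "a" and loses "b".
    println(firstAttempt(0) ++ retry(1))
  }
}
```

Any fix that makes the position-to-partition mapping independent of arrival order, for example by sorting first, would remove this sensitivity in the sketch; whether the same approach transfers to `zip` and similar operations is the open question in the thread.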


---


[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22123
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94854/
Test PASSed.


---


[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22123
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22123
  
**[Test build #94854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94854/testReport)**
 for PR 22123 at commit 
[`c4179a9`](https://github.com/apache/spark/commit/c4179a9f0a85b412178323e6cb881385fa644051).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22126
  
**[Test build #94863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94863/testReport)**
 for PR 22126 at commit 
[`c005109`](https://github.com/apache/spark/commit/c005109ac517bc8db687318f5e93a35a1ae785c3).


---


[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

2018-08-16 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22126
  
cc @mn-mikke 


---


[GitHub] spark pull request #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities...

2018-08-16 Thread ueshin

GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/22126

[SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of value arguments should 
be true.

## What changes were proposed in this pull request?

This is a follow-up PR to #22017, which added the `map_zip_with` function.
In the test, when creating a lambda function, we use the 
`valueContainsNull` values for the nullabilities of the value arguments, but we 
should have used `true`, the same as the `bind` method does, because the values might be 
`null` if the keys don't match.
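
The reasoning can be sketched in plain Scala, with `Option` standing in for SQL nullability. `MapZipWithSketch.mapZipWith` is a hypothetical helper written for this illustration, not the Spark implementation. Even when neither input map contains a null value, the lambda still receives a missing value for every key present on only one side.

```scala
object MapZipWithSketch {
  // The lambda is called once per key in the union of both key sets; a side
  // that lacks the key contributes None (the analogue of a SQL null).
  def mapZipWith[K, A, B, C](left: Map[K, A], right: Map[K, B])(
      f: (K, Option[A], Option[B]) => C): Map[K, C] = {
    val keys = left.keySet ++ right.keySet
    keys.iterator.map(k => k -> f(k, left.get(k), right.get(k))).toMap
  }

  def main(args: Array[String]): Unit = {
    // No null values in either input, yet "a" has no right value and "c" has
    // no left value, so the value arguments must be treated as nullable.
    val zipped = mapZipWith(Map("a" -> 1, "b" -> 2), Map("b" -> 20, "c" -> 30)) {
      (_, l, r) => (l, r)
    }
    println(zipped)
  }
}
```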

## How was this patch tested?

Added small tests and existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark 
issues/SPARK-23938/fix_tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22126


commit c005109ac517bc8db687318f5e93a35a1ae785c3
Author: Takuya UESHIN 
Date:   2018-08-16T19:14:34Z

Fix a test.




---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #94862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94862/testReport)**
 for PR 21584 at commit 
[`6584029`](https://github.com/apache/spark/commit/658402919c080ae4d878d355a4b3a14a4d4d0aad).


---


[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-08-16 Thread ifilonenko

Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/21584
  
This PR has been updated to pass Jenkins by removing the `with RTestsSuite` 
line in `KubernetesSuite`. As such, this feature may be merged, and the `with 
RTestsSuite` line will be re-included in a separate PR once Jenkins is 
updated with the new Ubuntu OS. 


---


[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320
  
> I see no point of leaving this PR open.

I don't agree with you on that point, and I've expressed my view in 
https://github.com/apache/spark/pull/21889#issuecomment-413655304.


---


[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889

Essentially, this PR was created to take the management of #21320 out of my
hands, with a view towards facilitating its incorporation into Spark 2.4. It
was my suggestion, one based in frustration. In hindsight, I no longer believe
this strategy is the best, or most expedient, approach towards progress.
Indeed, I believe the direction of this PR has become orthogonal to its
motivating goal, becoming a dispute between myself and @HyukjinKwon rather than
a means to move things along.

I believe I can shepherd #21320 in a way that will promote greater
progress. @ajacques, I mean no disrespect, and I thank you for volunteering
your time, patience and effort for the sake of all that are interested in
seeing this patch become a part of Spark. And I apologize for letting you down,
letting everyone down. In my conduct leading up to the creation of this PR I
did not act with the greatest maturity or patience. And I did not act in the
best interests of the community.

No one has spent more time or more effort, taken more responsibility or
exhibited more patience with this 2+ year patch-set-in-the-making than myself.
I respectfully submit it is mine to present and manage, and no one else's.
Insofar as I have expressed otherwise in the past, I admit my error, one made
in frustration, and recant in hindsight.

@ajacques, at this point I respectfully assert that managing the patch set
I submitted in #21320 is not your responsibility, nor is it anyone else's but
mine. I ask you to close this PR so that we can resume the review in #21320. As
I stated there, you are welcome to open a PR on
https://github.com/VideoAmp/spark-public/tree/spark-4502-parquet_column_pruning-foundation
to submit the changes you've made for review.

Thank you.

---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

2018-08-16 Thread KraFusion

Github user KraFusion commented on the issue:

https://github.com/apache/spark/pull/22125
  
@kiszk PR created yesterday for `configurations.md`


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22125
  
Thanks, would it be possible to address similar issues? For example, in 
`configurations.md`.


---


[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-16 Thread vackosar

Github user vackosar commented on a diff in the pull request:

https://github.com/apache/spark/pull/21919#discussion_r210708107
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -254,3 +259,10 @@ class SinkProgress protected[sql](
 }
   }
 }
+
+private[sql] object SinkProgress {
+  val DEFAULT_NUM_OUTPUT_ROWS: Long = -1L
--- End diff --

I will implement this for continuous streaming, and then only legacy sinks 
would output -1. I didn't want to change the API too often.


---


[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22125
  
Can one of the admins verify this patch?


---


[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-16 Thread arunmahadevan

Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21919
  
LGTM overall except one minor comment.


---


[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-16 Thread arunmahadevan

Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21919#discussion_r210707152
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -254,3 +259,10 @@ class SinkProgress protected[sql](
 }
   }
 }
+
+private[sql] object SinkProgress {
+  val DEFAULT_NUM_OUTPUT_ROWS: Long = -1L
--- End diff --

Does it result in sink progress output with "numOutputRows = -1"? Maybe 
add numOutputRows to the output only if the value is not the default.
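
A minimal sketch of that suggestion, assuming a simplified stand-in for the real class: `SinkProgressSketch`, `Progress` and `render` are hypothetical names, and the hand-built JSON-like string is only for illustration, not the serialization the real `SinkProgress` uses. The field is emitted only when the sink actually reported a count.

```scala
object SinkProgressSketch {
  // Sentinel meaning "this sink does not report a row count"; it mirrors the
  // DEFAULT_NUM_OUTPUT_ROWS value in the diff above.
  val DefaultNumOutputRows: Long = -1L

  // Hypothetical, simplified stand-in for the real SinkProgress class.
  final case class Progress(description: String, numOutputRows: Long = DefaultNumOutputRows) {
    // Render a JSON-like string, emitting numOutputRows only when it was set.
    def render: String = {
      val base = Seq("\"description\" : \"" + description + "\"")
      val rows =
        if (numOutputRows != DefaultNumOutputRows) Seq("\"numOutputRows\" : " + numOutputRows)
        else Seq.empty[String]
      (base ++ rows).mkString("{ ", ", ", " }")
    }
  }

  def main(args: Array[String]): Unit = {
    println(Progress("legacy sink").render)
    println(Progress("v2 sink", 42L).render)
  }
}
```

The trade-off is that consumers of the output then have to handle an absent field rather than a `-1` sentinel.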


---


[GitHub] spark pull request #22125: [DOCS] Fix cloud-integration.md Typo

2018-08-16 Thread KraFusion

GitHub user KraFusion opened a pull request:

https://github.com/apache/spark/pull/22125

[DOCS] Fix cloud-integration.md Typo

Corrected typo; changed spark-default.conf to spark-defaults.conf


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KraFusion/spark-1 patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22125


commit 6031f70b8f57f9b64335db33d8e219814a7bba9c
Author: Joey Krabacher 
Date:   2018-08-16T18:58:54Z

[DOCS] Fix cloud-integration.md Typo

changed spark-default.conf to spark-defaults.conf




---


[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...

2018-08-16 Thread steveloughran

Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22117
  
Test failure in `org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)`: 

```
Caused by: sbt.ForkMain$ForkError: java.lang.NoClassDefFoundError: javax/jdo/JDOException
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5501)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:184)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:73)
... 41 more
Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: javax.jdo.JDOException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:227)
at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:216)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 44 more
```

Somehow the DataNucleus JARs aren't on the classpath for the Hive test. I can't see 
how this patch is causing this; can anyone else? But if not, why is it 
surfacing here?


---


[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

2018-08-16 Thread MaxGekk

Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21909#discussion_r210704902
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2223,21 +2223,31 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
 checkAnswer(jsonDF, Seq(Row("Chris", "Baird")))
   }
 
-
   test("SPARK-23723: specified encoding is not matched to actual 
encoding") {
-val fileName = "test-data/utf16LE.json"
-val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
-val exception = intercept[SparkException] {
-  spark.read.schema(schema)
-.option("mode", "FAILFAST")
-.option("multiline", "true")
-.options(Map("encoding" -> "UTF-16BE"))
-.json(testFile(fileName))
-.count()
+def doCount(bypassParser: Boolean, multiLine: Boolean): Long = {
+  var result: Long = -1
+  withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> 
bypassParser.toString) {
+val fileName = "test-data/utf16LE.json"
+val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
+result = spark.read.schema(schema)
+  .option("mode", "FAILFAST")
--- End diff --

> Does the mode matter?

I just want an explicit error in the test, rather than `0` from 
`count()` (`DROPMALFORMED`) or a full table of nulls or an exception 
(`PERMISSIVE`), since an exception is the expected result.

> What happened if users use DROPMALFORMED before this PR?

It depends on `multiLine`. If it is `true`, the behaviour before and after the PR 
is the same, since the optimization doesn't affect the `multiLine` mode. When 
`multiLine` is `false`, the result in the `DROPMALFORMED` mode is `5` (the total 
number of lines) after the PR and `0` before it.

We can enable this optimization for the `PERMISSIVE` mode only to exclude 
any deviation in outputs.
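
The deviation described above can be modelled with a toy sketch. Everything here is an assumption for illustration: `ParseModeCountSketch`, its mode objects, and the trivial well-formedness check are hypothetical and far simpler than Spark's JSON reader. It only shows why skipping the parser changes `count()` when every line is malformed.

```scala
object ParseModeCountSketch {
  sealed trait Mode
  case object Permissive extends Mode
  case object DropMalformed extends Mode
  case object FailFast extends Mode

  // Toy check: a line counts as well formed only if it looks like a JSON object.
  private def isWellFormed(line: String): Boolean =
    line.startsWith("{") && line.endsWith("}")

  // Count with parsing: each mode treats malformed lines differently.
  def countWithParsing(lines: Seq[String], mode: Mode): Long = mode match {
    case Permissive => lines.size.toLong // malformed rows are kept (as nulls) in this toy model
    case DropMalformed => lines.count(isWellFormed).toLong // malformed rows are dropped
    case FailFast =>
      lines.find(line => !isWellFormed(line)).foreach { line =>
        throw new IllegalArgumentException("Malformed record: " + line)
      }
      lines.size.toLong
  }

  // Count with the parser bypassed: every line is counted, whatever the mode.
  def countBypassingParser(lines: Seq[String]): Long = lines.size.toLong

  def main(args: Array[String]): Unit = {
    // Five lines that the toy parser cannot read, e.g. because of a wrong encoding.
    val lines = Seq.fill(5)("<undecodable>")
    println(countWithParsing(lines, DropMalformed))
    println(countBypassingParser(lines))
  }
}
```

In this model the bypass agrees with the parsing count only in the permissive case, which is in line with the suggestion to enable the optimization for `PERMISSIVE` only.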


---


[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...

2018-08-16 Thread arunmahadevan

Github user arunmahadevan commented on the issue:

https://github.com/apache/spark/pull/21819
  
@HyukjinKwon , can you take it forward? Appreciate your effort and thanks 
in advance. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...

2018-08-16 Thread wangyum

Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22124
  
cc @gengliangwang


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22117
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94851/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...