[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3539/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21295
  
**[Test build #91086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91086/testReport)**
 for PR 21295 at commit 
[`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...

2018-05-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r190472138
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,7 +219,11 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.isEmpty =>
+        // When v is not nullable, the following expression will be optimized
+        // to FalseLiteral which is tested in OptimizeInSuite.scala
+        If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+      case In(v, list) if list.length == 1 => EqualTo(v, list.head)
--- End diff --

Why does it have any implication on typecasting? With this PR, it seems I 
get the correct result.

```scala
== Analyzed Logical Plan ==
(CAST(1.1 AS STRING) IN (CAST(1 AS STRING))): boolean, (CAST(1.1 AS INT) = 1): boolean
Project [cast(1.1 as string) IN (cast(1 as string)) AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, (cast(1.1 as int) = 1) AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation

== Optimized Logical Plan ==
Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation

== Physical Plan ==
*(1) Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- Scan OneRowRelation[]
```
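
The two rewrites in the diff above can be modeled outside Spark. The following is a minimal plain-Java sketch, not Catalyst code: `InSemanticsSketch`, `in`, and `equalTo` are hypothetical names, and a Java `null` stands in for SQL NULL. It assumes the semantics the diff encodes, namely that `v IN ()` is NULL when `v` is NULL and false otherwise, and that a one-element `IN` behaves like `=`.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of three-valued logic; null stands for SQL NULL. Not Spark code. */
final class InSemanticsSketch {

    /** Models `v IN (list)` under three-valued logic. */
    static Boolean in(Integer v, List<Integer> list) {
        if (list.isEmpty()) {
            // Mirrors If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)) from the diff.
            return v != null ? Boolean.FALSE : null;
        }
        if (v == null) {
            return null;
        }
        boolean sawNull = false;
        for (Integer item : list) {
            if (item == null) {
                sawNull = true;          // an unknown element can only make the result unknown
            } else if (item.equals(v)) {
                return Boolean.TRUE;     // a definite match wins
            }
        }
        return sawNull ? null : Boolean.FALSE;
    }

    /** Models `v = x`: NULL if either side is NULL. */
    static Boolean equalTo(Integer v, Integer x) {
        if (v == null || x == null) {
            return null;
        }
        return v.equals(x);
    }

    public static void main(String[] args) {
        System.out.println(in(null, List.of()));               // null
        System.out.println(in(1, List.of()));                  // false
        System.out.println(in(1, Arrays.asList(1)));           // true
        System.out.println(equalTo(1, 1));                     // true
    }
}
```

In this model, `in(v, List.of(x))` and `equalTo(v, x)` agree for every combination of null and non-null inputs, which is the property the `list.length == 1` rewrite relies on. Type coercion, the subject of the question above, is outside what the sketch covers.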


---




[GitHub] spark issue #21418: Branch 2.2

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21418
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21295
  
retest this please


---




[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91079/
Test PASSed.


---




[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...

2018-05-23 Thread fangshil
Github user fangshil commented on the issue:

https://github.com/apache/spark/pull/21310
  
@viirya thanks for the feedback. We internally customized the AvroEncoder 
based on the open source PR, since it was never merged into spark-avro. We 
propose this feature because it should apply to every user-defined Encoder, 
not only to AvroEncoder.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91080/
Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3538/
Test PASSed.


---




[GitHub] spark pull request #21418: Branch 2.2

2018-05-23 Thread gentlewangyu
GitHub user gentlewangyu opened a pull request:

https://github.com/apache/spark/pull/21418

Branch 2.2

## What changes were proposed in this pull request?

Compiling Spark with scala-2.10 should use the -p parameter instead of -d.

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21418


commit 9949fed1c45865b6e5e8ebe610789c5fb9546052
Author: Corey Woodfield 
Date:   2017-07-19T22:21:38Z

[SPARK-21333][DOCS] Removed invalid joinTypes from javadoc of 
Dataset#joinWith

## What changes were proposed in this pull request?

Two invalid join types were mistakenly listed in the javadoc for joinWith, 
in the Dataset class. I presume these were copied from the javadoc of join, but 
since joinWith returns a Dataset<Tuple2>, left_semi and left_anti are 
invalid, as they only return values from one of the datasets, instead of from 
both.

## How was this patch tested?

I ran the following code :
```
public static void main(String[] args) {
    SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
    Dataset<Row> one = spark.createDataFrame(
        Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)),
        Bean.class);
    Dataset<Row> two = spark.createDataFrame(
        Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)),
        Bean.class);

    // One call per join type; a failure is printed and the next type is still tried.
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
```
which tests all the different join types, and the last two (left_semi and 
left_anti) threw exceptions. The same code using join instead of joinWith did 
fine. The Bean class was just a java bean with a single int field, x.
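
The reason given above, that a semi or anti join yields rows from only one side, can be illustrated without Spark. This is a rough plain-Java sketch over integer lists, using the same values as the two datasets in the test; `JoinShapeSketch` and its methods are hypothetical names and do not reflect how Spark implements joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of join result shapes; not Spark code. */
final class JoinShapeSketch {

    /** Inner join in the joinWith style: one (left, right) pair per match. */
    static List<Map.Entry<Integer, Integer>> innerPairs(List<Integer> left, List<Integer> right) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (l.equals(r)) {
                    out.add(Map.entry(l, r));
                }
            }
        }
        return out;
    }

    /** left_semi: left values that have a match. There is no right value to pair with. */
    static List<Integer> leftSemi(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (right.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    /** left_anti: left values that have no match. Again, only one side is returned. */
    static List<Integer> leftAnti(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (!right.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> two = List.of(4, 5, 6, 7, 8, 9);
        List<Integer> one = List.of(1, 2, 3, 4, 5);
        System.out.println(innerPairs(two, one)); // [4=4, 5=5]
        System.out.println(leftSemi(two, one));   // [4, 5]
        System.out.println(leftAnti(two, one));   // [6, 7, 8, 9]
    }
}
```

The return types make the point: the inner join produces pairs, while the semi and anti variants can only produce single values, so they do not fit a result type made of pairs.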

Author: Corey Woodfield 

Closes #18462 from coreywoodfield/master.

(cherry picked from commit 8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982)
Signed-off-by: gatorsmile 

commit 88dccda393bc79dc6032f71b6acf8eb2b4b152be
Author: Dhruve Ashar 
Date:   2017-07-21T19:03:46Z

[SPARK-21243][CORE] Limit no. of map outputs in a shuffle fetch

For configurations with external shuffle enabled, we have observed that if 
a very large no. of blocks are being fetched from a remote host, it puts the NM 
under extra pressure and can crash it. This change introduces a configuration 
`spark.reducer.maxBlocksInFlightPerAddress` , to limit the no. of map outputs 
being fetched from a given remote address. The changes applied here are 
applicable for both the scenarios - when external shuffle is enabled as well as 
disabled.
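
The idea of capping blocks in flight per remote address can be sketched as follows. This is a simplified plain-Java model under stated assumptions, not the Spark shuffle fetcher: `PerAddressThrottleSketch` and its methods are hypothetical, each request is assumed to fit within the limit on its own, and deferred requests are simply retried when earlier blocks complete.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Plain-Java model of a per-address cap on blocks being fetched; not Spark code. */
final class PerAddressThrottleSketch {
    private final int maxBlocksPerAddress;
    private final Map<String, Integer> inFlight = new HashMap<>();
    private final Map<String, Queue<Integer>> deferred = new HashMap<>();

    PerAddressThrottleSketch(int maxBlocksPerAddress) {
        this.maxBlocksPerAddress = maxBlocksPerAddress;
    }

    /** Returns true if the request is sent now, false if it is deferred. */
    boolean request(String address, int blocks) {
        if (blocks > maxBlocksPerAddress) {
            // Assumption of this sketch: callers split requests so each one fits the limit.
            throw new IllegalArgumentException("request exceeds the per-address limit");
        }
        int current = blocksInFlight(address);
        if (current + blocks > maxBlocksPerAddress) {
            deferred.computeIfAbsent(address, a -> new ArrayDeque<>()).add(blocks);
            return false;
        }
        inFlight.put(address, current + blocks);
        return true;
    }

    /** Records completed blocks, then sends deferred requests that now fit. Returns how many were sent. */
    int completed(String address, int blocks) {
        inFlight.put(address, Math.max(0, blocksInFlight(address) - blocks));
        Queue<Integer> waiting = deferred.get(address);
        int released = 0;
        while (waiting != null && !waiting.isEmpty()
                && blocksInFlight(address) + waiting.peek() <= maxBlocksPerAddress) {
            inFlight.merge(address, waiting.poll(), Integer::sum);
            released++;
        }
        return released;
    }

    int blocksInFlight(String address) {
        return inFlight.getOrDefault(address, 0);
    }
}
```

With a limit of 100, a 60-block request to one host is sent, a following 50-block request to the same host is deferred, and a 50-block request to a different host is unaffected, since the cap is tracked per address.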

Ran the job with the default configuration, which does not change the 
existing behavior, and ran it with a few configurations of lower values: 
10, 20, 50, 100. The job ran fine and there is no change in the output. (I will 
update the metrics related to NM in some time.)

Author: Dhruve Ashar 

Closes #18487 from dhruve/impr/SPARK-21243.

Author: Dhruve Ashar 

Closes #18691 from dhruve/branch-2.2.

commit da403b95353f064c24da25236fa7f905fa8ddca1
Author: Holden Karau 
Date:   2017-07-21T23:50:47Z


[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21415
  
**[Test build #91079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91079/testReport)**
 for PR 21415 at commit 
[`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21295
  
**[Test build #91080 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91080/testReport)**
 for PR 21295 at commit 
[`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91084/
Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21399
  
**[Test build #91084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91084/testReport)**
 for PR 21399 at commit 
[`294e189`](https://github.com/apache/spark/commit/294e18925a6d4d0d216a6173fb3d7930da6985fe).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...

2018-05-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21416#discussion_r190470525
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -397,6 +399,68 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("isinSet: Scala Set") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+    checkAnswer(df.filter($"a".isinSet(Set(1, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, 1))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    // Auto casting should work with mixture of different types in Set
+    checkAnswer(df.filter($"a".isinSet(Set(1.toShort, "2"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set("3", 2.toLong))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, "1"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    checkAnswer(df.filter($"b".isinSet(Set("y", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isinSet(Set("z", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isinSet(Set("z", "y"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+    val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+    intercept[AnalysisException] {
+      df2.filter($"a".isinSet(Set($"b")))
+    }
--- End diff --

Addressed


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3537/
Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21417: Branch 2.0

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21417
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21399
  
**[Test build #91085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91085/testReport)**
 for PR 21399 at commit 
[`6943ff8`](https://github.com/apache/spark/commit/6943ff81e5b63314ffc78591dec289a73fc2dcd5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21417: Branch 2.0

2018-05-23 Thread gentlewangyu
GitHub user gentlewangyu opened a pull request:

https://github.com/apache/spark/pull/21417

Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21417


commit 050b8177e27df06d33a6f6f2b3b6a952b0d03ba6
Author: cody koeninger 
Date:   2016-10-12T22:22:06Z

[SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of 
poll twice

## What changes were proposed in this pull request?

Alternative approach to https://github.com/apache/spark/pull/15387

Author: cody koeninger 

Closes #15401 from koeninger/SPARK-17782-alt.

(cherry picked from commit f9a56a153e0579283160519065c7f3620d12da3e)
Signed-off-by: Shixiong Zhu 

commit 5903dabc57c07310573babe94e4f205bdea6455f
Author: Brian Cho 
Date:   2016-10-13T03:43:18Z

[SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics

## What changes were proposed in this pull request?

Fix a bug where spill metrics were being reported as shuffle metrics. 
Eventually these spill metrics should be reported (SPARK-3577), but separate 
from shuffle metrics. The fix itself basically reverts the line to what it was 
in 1.6.

## How was this patch tested?

Cherry-picked from master (#15347)

Author: Brian Cho 

Closes #15455 from dafrista/shuffle-metrics-2.0.

commit ab00e410c6b1d7dafdfabcea1f249c78459b94f0
Author: Burak Yavuz 
Date:   2016-10-13T04:40:45Z

[SPARK-17876] Write StructuredStreaming WAL to a stream instead of 
materializing all at once

## What changes were proposed in this pull request?

The CompactibleFileStreamLog materializes the whole metadata log in memory 
as a String. This can cause issues when there are lots of files that are being 
committed, especially during a compaction batch.
You may come across stacktraces that look like:
```
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.lang.StringCoding.encode(StringCoding.java:350)
    at java.lang.String.getBytes(String.java:941)
    at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.serialize(FileStreamSinkLog.scala:127)
```
The safer way is to write to an output stream so that we don't have to 
materialize a huge string.
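
The contrast between the two approaches can be sketched in plain Java. This is an illustration under assumptions, not the `CompactibleFileStreamLog` code: `StreamingLogWriterSketch` and its methods are hypothetical, and log entries are modeled as newline-separated strings.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Plain-Java model of the two serialization strategies; not Spark code. */
final class StreamingLogWriterSketch {

    /** Builds the whole log as one String and then one byte[]; peak memory grows with the full log. */
    static byte[] serializeAllAtOnce(List<String> entries) {
        return String.join("\n", entries).getBytes(StandardCharsets.UTF_8);
    }

    /** Writes one entry at a time, so only a single entry is encoded in memory at once. */
    static void serializeToStream(List<String> entries, OutputStream out) throws IOException {
        boolean first = true;
        for (String entry : entries) {
            if (!first) {
                out.write('\n');
            }
            out.write(entry.getBytes(StandardCharsets.UTF_8));
            first = false;
        }
    }

    /** Demo helper: streams into an in-memory buffer so the two results can be compared. */
    static byte[] streamToBytes(List<String> entries) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            serializeToStream(entries, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Both paths produce the same bytes. The difference is that `serializeToStream` never holds the concatenated log in memory, so when the target is a file stream rather than the demo buffer, no single array has to be as large as the whole log.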

## How was this patch tested?

Existing unit tests

Author: Burak Yavuz 

Closes #15437 from brkyvz/ser-to-stream.

(cherry picked from commit edeb51a39d76d64196d7635f52be1b42c7ec4341)
Signed-off-by: Shixiong Zhu 

commit d38f38a093b4dff32c686675d93ab03e7a8f4908
Author: buzhihuojie 
Date:   2016-10-13T05:51:54Z

minor doc fix for Row.scala

## What changes were proposed in this pull request?

minor doc fix for "getAnyValAs" in class Row

## How was this patch tested?

None.

(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Author: buzhihuojie 

Closes #15452 from david-weiluo-ren/minorDocFixForRow.

(cherry picked from commit 7222a25a11790fa9d9d1428c84b6f827a785c9e8)
Signed-off-by: Reynold Xin 

commit d7fa3e32421c73adfa522adfeeb970edd4c22eb3
Author: Shixiong Zhu 
Date:   2016-10-13T20:31:50Z

[SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource 
instead of counting on KafkaConsumer

## What changes were proposed in this pull request?

Because `KafkaConsumer.poll(0)` may update the partition offsets, this PR 
just calls `seekToBeginning` to manually set the earliest offsets for the 
KafkaSource initial offsets.

## How was this patch tested?

Existing tests.

Author: Shixiong Zhu 

Closes #15397 from zsxwing/SPARK-17834.

(cherry picked from commit 08eac356095c7faa2b19d52f2fb0cbc47eb7d1d1)
Signed-off-by: Shixiong Zhu 

commit c53b8374911e801ed98c1436c384f0aef076eaab
Author: Davies Liu 

[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21399
  
**[Test build #91084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91084/testReport)**
 for PR 21399 at commit 
[`294e189`](https://github.com/apache/spark/commit/294e18925a6d4d0d216a6173fb3d7930da6985fe).


---




[GitHub] spark pull request #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from ...

2018-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21408


---




[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21408
  
Merged to master and branch-2.3.

Thanks @cloud-fan.


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3536/
Test PASSed.


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21390
  
**[Test build #91083 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91083/testReport)**
 for PR 21390 at commit 
[`4a4ab59`](https://github.com/apache/spark/commit/4a4ab595a32537bd5ad022ec77f3e598a252a8ed).


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21390
  
retest this please


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91074/
Test PASSed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91074 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91074/testReport)**
 for PR 21366 at commit 
[`4f58393`](https://github.com/apache/spark/commit/4f583939f9e3d6d1df7a0d44ec0c5acf6ae82ef1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91078/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21390
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21390
  
**[Test build #91078 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91078/testReport)**
 for PR 21390 at commit 
[`4a4ab59`](https://github.com/apache/spark/commit/4a4ab595a32537bd5ad022ec77f3e598a252a8ed).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21408
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21408
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91077/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21408
  
**[Test build #91077 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91077/testReport)**
 for PR 21408 at commit 
[`a5614f8`](https://github.com/apache/spark/commit/a5614f8fc1346fca321a413d107fddd70d8197c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91076/
Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21399
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21399
  
**[Test build #91076 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91076/testReport)**
 for PR 21399 at commit 
[`7bb0eb3`](https://github.com/apache/spark/commit/7bb0eb3be6619ea9d0c7a023da5b665fecbc799e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21411
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API

2018-05-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21405
  
@cloud-fan Thanks. I gave a use case of splitting code into methods in the 
PR description. I think it shows the basic idea.


---




[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21411
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91073/
Test PASSed.


---




[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21411
  
**[Test build #91073 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91073/testReport)**
 for PR 21411 at commit 
[`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r190461717
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
+      case NullType if format == "JSON" =>
+
+      case _ =>
+        throw new UnsupportedOperationException(
--- End diff --

ok


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r190461628
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --

OK, I will.
Also, do we need to merge this function with `CSVUtils.verifySchema` in this 
PR?


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r190461517
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
+      case NullType if format == "JSON" =>
+
+      case _ =>
+        throw new UnsupportedOperationException(
--- End diff --

Basically, for such a PR, we need to check all the data types that we block 
and ensure no behavior change is introduced by this PR.
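
One way to make that check concrete is to enumerate candidate types per format and compare the blocked set before and after the change. The sketch below is a plain-Python analog of the recursive check in the diff, not Spark code: the tuple encoding of types, the names `verify_type` and `blocked_types`, and the error text are all assumptions for illustration, and the UDT case is left out.

```python
# Atomic types accepted by the first case in the quoted diff.
ATOMIC = {"boolean", "byte", "short", "int", "long", "float", "double",
          "string", "binary", "date", "timestamp", "decimal"}


def verify_type(fmt, data_type):
    """Raise ValueError if data_type, or anything nested in it, is unsupported."""
    if isinstance(data_type, str):
        if data_type in ATOMIC:
            return
        # Analog of the backward-compatibility case: null is tolerated for JSON only.
        if data_type == "null" and fmt == "JSON":
            return
        raise ValueError("unsupported type %r for %s" % (data_type, fmt))
    kind = data_type[0]
    if kind == "struct":
        # ("struct", [(field_name, field_type), ...])
        for _name, field_type in data_type[1]:
            verify_type(fmt, field_type)
    elif kind == "array":
        # ("array", element_type)
        verify_type(fmt, data_type[1])
    elif kind == "map":
        # ("map", key_type, value_type)
        verify_type(fmt, data_type[1])
        verify_type(fmt, data_type[2])
    else:
        raise ValueError("unsupported type %r for %s" % (data_type, fmt))


def blocked_types(fmt, candidates):
    """Return the candidates that fmt rejects, for a before/after comparison."""
    blocked = []
    for candidate in candidates:
        try:
            verify_type(fmt, candidate)
        except ValueError:
            blocked.append(candidate)
    return blocked


candidates = ["int", "null", "interval",
              ("array", "null"), ("map", "string", "interval")]
json_blocked = blocked_types("JSON", candidates)
orc_blocked = blocked_types("ORC", candidates)
```

Diffing such lists between the old per-format checks and the unified one would show whether any type changes from allowed to blocked, or the reverse.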


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-05-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r190461402
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --

Do we have any test case for this?




---




[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20708
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91075/
Test FAILed.


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21366
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21319
  
The problem is that a plan visitor can only visit the plan, not change it, 
and pushing operators down to the data source requires removing filters from the plan...
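
The distinction can be sketched with a toy plan tree. This is plain Python with hypothetical `Scan` and `Filter` node classes, not Spark's `LogicalPlan` API: the read-only visit can report which filters exist, but only a rewrite that returns a new tree can drop them from the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Hypothetical leaf node: a table scan plus any filters pushed into it."""
    table: str
    pushed: tuple = ()


@dataclass(frozen=True)
class Filter:
    """Hypothetical unary node: a filter condition over a child plan."""
    condition: str
    child: object


def collect_filters(plan):
    """Read-only visit: report the filter conditions, leave the plan untouched."""
    found = []
    node = plan
    while isinstance(node, Filter):
        found.append(node.condition)
        node = node.child
    return found


def push_down_filters(plan):
    """Rewrite: return a new plan with the Filter nodes folded into the Scan."""
    conditions = collect_filters(plan)
    node = plan
    while isinstance(node, Filter):
        node = node.child
    return Scan(node.table, node.pushed + tuple(conditions))


plan = Filter("a > 1", Filter("b = 2", Scan("t")))
visited = collect_filters(plan)      # the plan still contains both filters
rewritten = push_down_filters(plan)  # a new plan with no Filter nodes
```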


---




[GitHub] spark pull request #21406: [Minor][Core] Cleanup unused vals in `DAGSchedule...

2018-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21406


---




[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21366
  
**[Test build #91075 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91075/testReport)**
 for PR 21366 at commit 
[`2a2374c`](https://github.com/apache/spark/commit/2a2374c915aafa1b5a53c8e02581cea0c2c176df).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21406: [Minor][Core] Cleanup unused vals in `DAGScheduler.handl...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21406
  
thanks, merging to master!


---




[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21372
  
Thank you, @cloud-fan , @gatorsmile , @HyukjinKwon .


---




[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

2018-05-23 Thread cxzl25
Github user cxzl25 commented on the issue:

https://github.com/apache/spark/pull/21311
  
@cloud-fan Thank you very much for your help.


---




[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21372


---




[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21372
  
LGTM

Thanks! Merged to master/2.3


---




[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21372
  
thanks, merging to master/2.3!


---




[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17583
  
**[Test build #91082 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91082/testReport)**
 for PR 17583 at commit 
[`47aa749`](https://github.com/apache/spark/commit/47aa7492e0f3edf3549e5e7b1eeb6074fb5d6f8b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91082/
Test FAILed.


---




[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17583
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17583
  
**[Test build #91082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91082/testReport)**
 for PR 17583 at commit 
[`47aa749`](https://github.com/apache/spark/commit/47aa7492e0f3edf3549e5e7b1eeb6074fb5d6f8b).


---




[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21311
  
thanks, merging to master/2.3/2.2/2.1/2.0! There is no conflict so I 
backported all the way to 2.0. I'll watch the Jenkins builds over the next few days.


---




[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...

2018-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21311


---




[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21069
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3535/
Test PASSed.


---




[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21069
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21069
  
**[Test build #91081 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91081/testReport)**
 for PR 21069 at commit 
[`9281ae2`](https://github.com/apache/spark/commit/9281ae233dc54dd961e99e345be559929232c148).


---




[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...

2018-05-23 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21069
  
retest this please


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21383
  
cc @icexelloss too


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21295
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3534/
Test PASSed.


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190453705
  
--- Diff: python/pyspark/rdd.py ---
@@ -791,9 +792,11 @@ def foreach(self, f):
         >>> def f(x): print(x)
         >>> sc.parallelize([1, 2, 3, 4, 5]).foreach(f)
         """
+        safe_f = fail_on_StopIteration(f)
--- End diff --

I'm okay with `safe` as is too, if you feel strongly.


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21295
  
Congratulations, @rdblue ! :)


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21295
  
**[Test build #91080 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91080/testReport)**
 for PR 21295 at commit 
[`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).


---




[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21372
  
Finally! Could you review this again, @HyukjinKwon , @gatorsmile , 
@cloud-fan ?


---




[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21295
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...

2018-05-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21295
  
@rdblue congrats!

All my concerns have been addressed. I think it's ready to merge. Also cc 
@michal-databricks 


---




[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21383
  
Seems good otherwise to me too


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190450900
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,17 @@ def __call__(self, x):
         self.assertEqual(f, f_.func)
         self.assertEqual(return_type, f_.returnType)
 
+    def test_stopiteration_in_udf(self):
+        # test for SPARK-23754
+        from pyspark.sql.functions import udf
+        from py4j.protocol import Py4JJavaError
+
+        def foo(x):
+            raise StopIteration()
+
+        with self.assertRaises(Py4JJavaError) as cm:
--- End diff --

ditto for `cm`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190450846
  
--- Diff: python/pyspark/tests.py ---
@@ -161,6 +161,37 @@ def gen_gs(N, step=1):
 self.assertEqual(k, len(vs))
 self.assertEqual(list(range(k)), list(vs))
 
+    def test_stopiteration_is_raised(self):
+
+        def stopit(*args, **kwargs):
+            raise StopIteration()
+
+        def legit_create_combiner(x):
+            return [x]
+
+        def legit_merge_value(x, y):
+            return x.append(y) or x
+
+        def legit_merge_combiners(x, y):
+            return x.extend(y) or x
+
+        data = [(x % 2, x) for x in range(100)]
+
+        # wrong create combiner
+        m = ExternalMerger(Aggregator(stopit, legit_merge_value, legit_merge_combiners), 20)
+        with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

Let's pick one explicit exception here too.
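
A small standalone illustration of the point, using only `unittest` (the function `raises_runtime_error` and the test class are hypothetical, not part of the PR): a tuple passed to `assertRaises` accepts any of the listed types, so the test no longer documents which error the caller actually sees, while a single type pins it down.

```python
import unittest


def raises_runtime_error():
    # Hypothetical stand-in for a call whose failure mode is known in advance.
    raise RuntimeError("StopIteration caught in user code")


class ExplicitExceptionTest(unittest.TestCase):
    def test_tuple_is_permissive(self):
        # Passes if either type is raised, so it cannot tell the two apart.
        with self.assertRaises((ValueError, RuntimeError)):
            raises_runtime_error()

    def test_single_type_is_precise(self):
        # Passes only for RuntimeError (or a subclass of it).
        with self.assertRaises(RuntimeError):
            raises_runtime_error()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExplicitExceptionTest)
result = unittest.TestResult()
suite.run(result)
```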


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190450814
  
--- Diff: python/pyspark/tests.py ---
@@ -161,6 +161,37 @@ def gen_gs(N, step=1):
 self.assertEqual(k, len(vs))
 self.assertEqual(list(range(k)), list(vs))
 
+    def test_stopiteration_is_raised(self):
+
+        def stopit(*args, **kwargs):
+            raise StopIteration()
+
+        def legit_create_combiner(x):
+            return [x]
+
+        def legit_merge_value(x, y):
+            return x.append(y) or x
+
+        def legit_merge_combiners(x, y):
+            return x.extend(y) or x
+
+        data = [(x % 2, x) for x in range(100)]
+
+        # wrong create combiner
+        m = ExternalMerger(Aggregator(stopit, legit_merge_value, legit_merge_combiners), 20)
+        with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

`cm` looks unused.
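
For reference, a minimal `unittest` sketch of when the `as cm` binding earns its place (the test class here is hypothetical): it is only needed when the test goes on to inspect `cm.exception`; otherwise the bare `with self.assertRaises(...)` form is enough.

```python
import unittest


class ContextManagerUsageTest(unittest.TestCase):
    def test_without_binding(self):
        # Only the exception type matters, so no name is bound.
        with self.assertRaises(ZeroDivisionError):
            1 / 0

    def test_with_binding(self):
        # The binding is justified because the exception is inspected afterwards.
        with self.assertRaises(ZeroDivisionError) as cm:
            1 / 0
        self.assertIn("division", str(cm.exception))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContextManagerUsageTest)
result = unittest.TestResult()
suite.run(result)
```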


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21379: [SPARK-24327][SQL] Add an option to quote a parti...

2018-05-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21379#discussion_r190450621
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -78,7 +79,12 @@ private[sql] object JDBCRelation extends Logging {
     // Overflow and silliness can happen if you subtract then divide.
     // Here we get a little roundoff, but that's (hopefully) OK.
     val stride: Long = upperBound / numPartitions - lowerBound / numPartitions
-    val column = partitioning.column
+    val column = if (jdbcOptions.quotePartitionColumnName) {
+      val dialect = JdbcDialects.get(jdbcOptions.url)
+      dialect.quoteIdentifier(partitioning.column)
--- End diff --

ok, I will
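
To show why quoting matters for the partition column, here is a rough Python sketch. Only the stride formula comes from the diff above; the helper names, the double-quote default, and the exact shape of the per-partition predicates are assumptions for illustration, not Spark's implementation. It assumes non-negative bounds and at least two partitions.

```python
def quote_identifier(name, quote='"'):
    """Wrap a column name in quote characters (embedded quotes are not escaped)."""
    return quote + name + quote


def partition_predicates(column, lower, upper, num_partitions):
    """Build one WHERE-clause predicate per partition over a numeric column."""
    # Same stride formula as the diff: divide first, then subtract.
    stride = upper // num_partitions - lower // num_partitions
    predicates = []
    current = lower
    for i in range(num_partitions):
        lower_clause = "%s >= %d" % (column, current) if i != 0 else None
        current += stride
        is_last = i == num_partitions - 1
        upper_clause = "%s < %d" % (column, current) if not is_last else None
        if upper_clause is None:
            # Last partition: open-ended above.
            predicates.append(lower_clause)
        elif lower_clause is None:
            # First partition: open-ended below, and it also collects NULLs.
            predicates.append("%s or %s is null" % (upper_clause, column))
        else:
            predicates.append("%s AND %s" % (lower_clause, upper_clause))
    return predicates


# A column name containing a space only yields valid SQL once it is quoted.
preds = partition_predicates(quote_identifier("order id"), 0, 100, 4)
```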


---




[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190450475
  
--- Diff: python/pyspark/util.py ---
@@ -89,6 +89,19 @@ def majorMinorVersion(sparkVersion):
  " version numbers.")
 
 
+def fail_on_StopIteration(f):
+    """ wraps f to make it safe (= does not lead to data loss) to use inside a for loop
--- End diff --

How about something like `Wraps the input function to fail on StopIteration by 
raising RuntimeError, to prevent silent data loss.`?
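
For readers unfamiliar with the failure mode, a minimal sketch in plain Python with no Spark involved. The names `fail_on_stop_iteration` and `buggy` are illustrative assumptions, not the code in this PR. A `StopIteration` that escapes a function called by `map` is read by the consumer as "iterator exhausted", so the remaining elements are dropped without any error; wrapping the function turns that into a visible `RuntimeError`.

```python
import functools


def fail_on_stop_iteration(f):
    """Wrap f so a StopIteration raised inside it surfaces as RuntimeError."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised inside user function") from exc
    return wrapper


def buggy(x):
    # Simulates a user function that accidentally lets StopIteration escape.
    if x == 3:
        raise StopIteration()
    return x * 10


# Unwrapped: map() forwards the StopIteration and list() treats it as the end
# of the input, so elements 3, 4 and 5 vanish without any error.
truncated = list(map(buggy, [1, 2, 3, 4, 5]))

# Wrapped: the same bug now fails loudly.
try:
    list(map(fail_on_stop_iteration(buggy), [1, 2, 3, 4, 5]))
    failed_loudly = False
except RuntimeError:
    failed_loudly = True
```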





[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21372
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91070/
Test PASSed.





[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21372
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4

2018-05-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21372
  
**[Test build #91070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91070/testReport)** for PR 21372 at commit [`954d1d9`](https://github.com/apache/spark/commit/954d1d92ade183d8774b75e03cb02e16635cde48).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190449941
  
--- Diff: python/pyspark/util.py ---
@@ -89,6 +89,19 @@ def majorMinorVersion(sparkVersion):
  " version numbers.")
 
 
+def fail_on_StopIteration(f):
+""" wraps f to make it safe (= does not lead to data loss) to use inside a for loop
--- End diff --

Not a big deal at all, but `wraps` -> `Wraps` while we are here.





[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190449695
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def a_rdd(keyed=False):
+return self.sc.parallelize(
+((x % 2, x) if keyed else x)
--- End diff --

I would just create two RDDs and reuse them.





[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190449424
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def a_rdd(keyed=False):
+return self.sc.parallelize(
+((x % 2, x) if keyed else x)
+for x in range(10)
+)
+
+def stopit(*x):
+raise StopIteration()
+
+def do_test(action, *args, **kwargs):
+with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
+action(*args, **kwargs)
+
+do_test(a_rdd().map(stopit).collect)
--- End diff --

Maybe we could do:

```
self.assertRaises(RuntimeError, rdd.map(stopit).collect)
```
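
For comparison, a sketch of the two `assertRaises` forms side by side. A plain function stands in for the RDD action (an assumption, so that the example runs without Spark), and the class and helper names are hypothetical.

```python
import unittest


def stopit(*args):
    # Stand-in for an action that fails; a real test would call an RDD action.
    raise RuntimeError("stand-in for the failure surfaced by the action")


class AssertRaisesForms(unittest.TestCase):
    def test_context_manager_form(self):
        with self.assertRaises(RuntimeError):
            stopit(1)

    def test_callable_form(self):
        # Pass the callable and its arguments; assertRaises makes the call.
        self.assertRaises(RuntimeError, stopit, 1)


def run_case(case):
    # Hypothetical helper: run one TestCase class and return its TestResult.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(case)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The callable form removes the need for a `do_test`-style helper when nothing about the exception is inspected afterwards.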






[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190449208
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def a_rdd(keyed=False):
+return self.sc.parallelize(
+((x % 2, x) if keyed else x)
+for x in range(10)
+)
+
+def stopit(*x):
+raise StopIteration()
+
+def do_test(action, *args, **kwargs):
+with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

Shall we pick one explicit exception for each?





[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190448950
  
--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
 result = rdd.pipe('cat').collect()
 self.assertEqual(data, result)
 
+def test_stopiteration_in_client_code(self):
+
+def a_rdd(keyed=False):
+return self.sc.parallelize(
+((x % 2, x) if keyed else x)
+for x in range(10)
+)
--- End diff --

Shall we inline this?





[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21383#discussion_r190448642
  
--- Diff: python/pyspark/rdd.py ---
@@ -791,9 +792,11 @@ def foreach(self, f):
 >>> def f(x): print(x)
 >>> sc.parallelize([1, 2, 3, 4, 5]).foreach(f)
 """
+safe_f = fail_on_StopIteration(f)
--- End diff --

The `safe` prefix doesn't say why it's safe, though. I would just name it 
something like `fail_on_stopiteration_f`, or feel free to pick another name if 
you have a good one.




