[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3539/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Merged build finished. Test PASSed.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21295 **[Test build #91086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91086/testReport)** for PR 21295 at commit [`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r190472138

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -219,7 +219,11 @@ object ReorderAssociativeOperator extends Rule[LogicalPlan] {
 object OptimizeIn extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case q: LogicalPlan => q transformExpressionsDown {
-      case In(v, list) if list.isEmpty && !v.nullable => FalseLiteral
+      case In(v, list) if list.isEmpty =>
+        // When v is not nullable, the following expression will be optimized
+        // to FalseLiteral, which is tested in OptimizeInSuite.scala
+        If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType))
+      case In(v, list) if list.length == 1 => EqualTo(v, list.head)
--- End diff --

Why would this have any implication for typecasting? With this PR, I seem to get the correct result.

```scala
== Analyzed Logical Plan ==
(CAST(1.1 AS STRING) IN (CAST(1 AS STRING))): boolean, (CAST(1.1 AS INT) = 1): boolean
Project [cast(1.1 as string) IN (cast(1 as string)) AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, (cast(1.1 as int) = 1) AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation

== Optimized Logical Plan ==
Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- OneRowRelation

== Physical Plan ==
*(1) Project [false AS (CAST(1.1 AS STRING) IN (CAST(1 AS STRING)))#484, true AS (CAST(1.1 AS INT) = 1)#485]
+- Scan OneRowRelation[]
```
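The SQL three-valued semantics the new rule preserves can be sketched outside Spark. This is an illustrative stand-in, not the optimizer rule itself; `inEmptyList` is a made-up name, and `Option[Int]` models a nullable SQL value (`None` for SQL NULL):

```scala
// Sketch: `v IN ()` must evaluate to FALSE for a non-null v, but to NULL
// when v itself is NULL -- the same split the rewritten rule expresses as
// If(IsNotNull(v), FalseLiteral, Literal(null, BooleanType)).
def inEmptyList(v: Option[Int]): Option[Boolean] = v match {
  case Some(_) => Some(false) // the IsNotNull branch: FalseLiteral
  case None    => None        // the null branch: Literal(null, BooleanType)
}
```

The old rule rewrote `In(v, Nil)` only when `!v.nullable`; the `If(IsNotNull(v), ...)` form extends the rewrite to nullable inputs without changing the NULL case.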
[GitHub] spark issue #21418: Branch 2.2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21418 Can one of the admins verify this patch?
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Merged build finished. Test PASSed.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21295 retest this please
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91079/ Test PASSed.
[GitHub] spark issue #21310: [SPARK-24256][SQL] SPARK-24256: ExpressionEncoder should...
Github user fangshil commented on the issue: https://github.com/apache/spark/pull/21310 @viirya Thanks for the feedback. We internally customized the AvroEncoder based on the open-source PR, since it was never merged into spark-avro. We propose this feature because it should apply to every user-defined Encoder, not just the AvroEncoder.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91080/ Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3538/ Test PASSed.
[GitHub] spark pull request #21418: Branch 2.2
GitHub user gentlewangyu opened a pull request: https://github.com/apache/spark/pull/21418 Branch 2.2

## What changes were proposed in this pull request?
Compiling Spark with Scala 2.10 should use the -p parameter instead of -d.

## How was this patch tested?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21418

commit 9949fed1c45865b6e5e8ebe610789c5fb9546052
Author: Corey Woodfield
Date: 2017-07-19T22:21:38Z

[SPARK-21333][DOCS] Removed invalid joinTypes from javadoc of Dataset#joinWith ## What changes were proposed in this pull request? Two invalid join types were mistakenly listed in the javadoc for joinWith, in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset\<(T, U)\>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both ## How was this patch tested?
I ran the following code:
```
public static void main(String[] args) {
    SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
    Dataset one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
    Dataset two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
    try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
which tests all the different join types; the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith worked fine. The Bean class was just a Java bean with a single int field, x.
Author: Corey Woodfield Closes #18462 from coreywoodfield/master. (cherry picked from commit 8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982) Signed-off-by: gatorsmile commit 88dccda393bc79dc6032f71b6acf8eb2b4b152be Author: Dhruve Ashar Date: 2017-07-21T19:03:46Z [SPARK-21243][CORE] Limit no. of map outputs in a shuffle fetch For configurations with external shuffle enabled, we have observed that if a very large no. of blocks are being fetched from a remote host, it puts the NM under extra pressure and can crash it. This change introduces a configuration `spark.reducer.maxBlocksInFlightPerAddress` , to limit the no. of map outputs being fetched from a given remote address. The changes applied here are applicable for both the scenarios - when external shuffle is enabled as well as disabled. Ran the job with the default configuration which does not change the existing behavior and ran it with few configurations of lower values -10,20,50,100. The job ran fine and there is no change in the output. (I will update the metrics related to NM in some time.) Author: Dhruve Ashar Closes #18487 from dhruve/impr/SPARK-21243. Author: Dhruve Ashar Closes #18691 from dhruve/branch-2.2. commit da403b95353f064c24da25236fa7f905fa8ddca1 Author: Holden Karau Date: 2017-07-21T23:50:47Z
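The idea behind `spark.reducer.maxBlocksInFlightPerAddress` can be sketched as simple batching. This is a toy illustration under stated assumptions, not Spark's fetch-request logic; `batchRequests` and `maxPerAddress` are made-up names:

```scala
// Toy sketch: split the block ids wanted from each remote address into
// batches of at most `maxPerAddress`, so no single burst of requests to
// one address can overwhelm its NodeManager.
def batchRequests(
    blocksByAddress: Map[String, Seq[String]],
    maxPerAddress: Int): Seq[(String, Seq[String])] =
  blocksByAddress.toSeq.flatMap { case (addr, blocks) =>
    blocks.grouped(maxPerAddress).map(addr -> _).toSeq
  }
```

Every block is still fetched; the cap only bounds how many are requested from one address at a time, which is why the default (effectively unlimited) preserves the old behavior.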
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Merged build finished. Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test PASSed.
[GitHub] spark issue #21415: [SPARK-24244][SPARK-24368][SQL] Passing only required co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21415 **[Test build #91079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91079/testReport)** for PR 21415 at commit [`0aef16b`](https://github.com/apache/spark/commit/0aef16b5e9017fb398e0df2f3694a1db1f4d7cb8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21295 **[Test build #91080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91080/testReport)** for PR 21295 at commit [`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91084/ Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #91084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91084/testReport)** for PR 21399 at commit [`294e189`](https://github.com/apache/spark/commit/294e18925a6d4d0d216a6173fb3d7930da6985fe). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed.
[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isinSet in DataFrame AP...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r190470525

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -397,6 +399,68 @@ class ColumnExpressionSuite extends QueryTest with SharedSQLContext {
     }
   }

+  test("isinSet: Scala Set") {
+    val df = Seq((1, "x"), (2, "y"), (3, "z")).toDF("a", "b")
+    checkAnswer(df.filter($"a".isinSet(Set(1, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, 2))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, 1))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    // Auto casting should work with mixture of different types in Set
+    checkAnswer(df.filter($"a".isinSet(Set(1.toShort, "2"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 1 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set("3", 2.toLong))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 2))
+    checkAnswer(df.filter($"a".isinSet(Set(3, "1"))),
+      df.collect().toSeq.filter(r => r.getInt(0) == 3 || r.getInt(0) == 1))
+
+    checkAnswer(df.filter($"b".isinSet(Set("y", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "y" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isinSet(Set("z", "x"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "x"))
+    checkAnswer(df.filter($"b".isinSet(Set("z", "y"))),
+      df.collect().toSeq.filter(r => r.getString(1) == "z" || r.getString(1) == "y"))
+
+    val df2 = Seq((1, Seq(1)), (2, Seq(2)), (3, Seq(3))).toDF("a", "b")
+
+    intercept[AnalysisException] {
+      df2.filter($"a".isinSet(Set($"b")))
+    }
--- End diff --

Addressed
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3537/ Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed.
[GitHub] spark issue #21417: Branch 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21417 Can one of the admins verify this patch?
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #91085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91085/testReport)** for PR 21399 at commit [`6943ff8`](https://github.com/apache/spark/commit/6943ff81e5b63314ffc78591dec289a73fc2dcd5).
[GitHub] spark pull request #21417: Branch 2.0
GitHub user gentlewangyu opened a pull request: https://github.com/apache/spark/pull/21417 Branch 2.0

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21417

commit 050b8177e27df06d33a6f6f2b3b6a952b0d03ba6
Author: cody koeninger
Date: 2016-10-12T22:22:06Z

[SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author: cody koeninger Closes #15401 from koeninger/SPARK-17782-alt. (cherry picked from commit f9a56a153e0579283160519065c7f3620d12da3e) Signed-off-by: Shixiong Zhu

commit 5903dabc57c07310573babe94e4f205bdea6455f
Author: Brian Cho
Date: 2016-10-13T03:43:18Z

[SPARK-16827][BRANCH-2.0] Avoid reporting spill metrics as shuffle metrics ## What changes were proposed in this pull request? Fix a bug where spill metrics were being reported as shuffle metrics. Eventually these spill metrics should be reported (SPARK-3577), but separate from shuffle metrics. The fix itself basically reverts the line to what it was in 1.6. ## How was this patch tested? Cherry-picked from master (#15347) Author: Brian Cho Closes #15455 from dafrista/shuffle-metrics-2.0.
commit ab00e410c6b1d7dafdfabcea1f249c78459b94f0 Author: Burak Yavuz Date: 2016-10-13T04:40:45Z [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log in memory as a String. This can cause issues when there are lots of files that are being committed, especially during a compaction batch. You may come across stacktraces that look like: ``` java.lang.OutOfMemoryError: Requested array size exceeds VM limit at java.lang.StringCoding.encode(StringCoding.java:350) at java.lang.String.getBytes(String.java:941) at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.serialize(FileStreamSinkLog.scala:127) ``` The safer way is to write to an output stream so that we don't have to materialize a huge string. ## How was this patch tested? Existing unit tests Author: Burak Yavuz Closes #15437 from brkyvz/ser-to-stream. (cherry picked from commit edeb51a39d76d64196d7635f52be1b42c7ec4341) Signed-off-by: Shixiong Zhu commit d38f38a093b4dff32c686675d93ab03e7a8f4908 Author: buzhihuojie Date: 2016-10-13T05:51:54Z minor doc fix for Row.scala ## What changes were proposed in this pull request? minor doc fix for "getAnyValAs" in class Row ## How was this patch tested? None. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: buzhihuojie Closes #15452 from david-weiluo-ren/minorDocFixForRow. (cherry picked from commit 7222a25a11790fa9d9d1428c84b6f827a785c9e8) Signed-off-by: Reynold Xin commit d7fa3e32421c73adfa522adfeeb970edd4c22eb3 Author: Shixiong Zhu Date: 2016-10-13T20:31:50Z [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer ## What changes were proposed in this pull request? 
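The core idea of the SPARK-17876 fix above, serializing entries incrementally to an output stream instead of materializing one huge String, can be sketched in a few lines. This is illustrative only; `writeBatch` is a made-up name, not the CompactibleFileStreamLog API:

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets

// Write each metadata entry to the stream as it is serialized, so peak
// memory stays at roughly one entry rather than the whole compacted log
// (building a single giant String risked OutOfMemoryError).
def writeBatch(out: OutputStream, entries: Iterator[String]): Unit =
  entries.foreach { entry =>
    out.write(entry.getBytes(StandardCharsets.UTF_8))
    out.write('\n')
  }
```

The caller hands in an `Iterator`, so nothing forces all entries into memory at once; the same bytes end up in the file either way.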
Because `KafkaConsumer.poll(0)` may update the partition offsets, this PR just calls `seekToBeginning` to manually set the earliest offsets for the KafkaSource initial offsets. ## How was this patch tested? Existing tests. Author: Shixiong Zhu Closes #15397 from zsxwing/SPARK-17834. (cherry picked from commit 08eac356095c7faa2b19d52f2fb0cbc47eb7d1d1) Signed-off-by: Shixiong Zhu commit c53b8374911e801ed98c1436c384f0aef076eaab Author: Davies Liu
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #91084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91084/testReport)** for PR 21399 at commit [`294e189`](https://github.com/apache/spark/commit/294e18925a6d4d0d216a6173fb3d7930da6985fe).
[GitHub] spark pull request #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21408
[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21408 Merged to master and branch-2.3. Thanks @cloud-fan.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Merged build finished. Test PASSed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3536/ Test PASSed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21390 **[Test build #91083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91083/testReport)** for PR 21390 at commit [`4a4ab59`](https://github.com/apache/spark/commit/4a4ab595a32537bd5ad022ec77f3e598a252a8ed).
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test PASSed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/21390 retest this please
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91074/ Test PASSed.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91074/testReport)** for PR 21366 at commit [`4f58393`](https://github.com/apache/spark/commit/4f583939f9e3d6d1df7a0d44ec0c5acf6ae82ef1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91078/ Test FAILed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21390 Merged build finished. Test FAILed.
[GitHub] spark issue #21390: [SPARK-24340][Core] Clean up non-shuffle disk block mana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21390 **[Test build #91078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91078/testReport)** for PR 21390 at commit [`4a4ab59`](https://github.com/apache/spark/commit/4a4ab595a32537bd5ad022ec77f3e598a252a8ed). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21408 Merged build finished. Test PASSed.
[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21408 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91077/ Test PASSed.
[GitHub] spark issue #21408: [SPARK-24364][SS] Prevent InMemoryFileIndex from failing...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21408 **[Test build #91077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91077/testReport)** for PR 21408 at commit [`a5614f8`](https://github.com/apache/spark/commit/a5614f8fc1346fca321a413d107fddd70d8197c8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91076/ Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21399 Merged build finished. Test FAILed.
[GitHub] spark issue #21399: [SPARK-22269][BUILD][test-maven] Run Java linter via SBT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21399 **[Test build #91076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91076/testReport)** for PR 21399 at commit [`7bb0eb3`](https://github.com/apache/spark/commit/7bb0eb3be6619ea9d0c7a023da5b665fecbc799e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21411 Merged build finished. Test PASSed.
[GitHub] spark issue #21405: [SPARK-24361][SQL] Polish code block manipulation API
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21405 @cloud-fan Thanks. I give a use case of splitting code into method in the PR description. I think it can show the basic idea. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21411 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91073/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21411: [SPARK-24367][SQL]Parquet: use JOB_SUMMARY_LEVEL instead...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21411 **[Test build #91073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91073/testReport)** for PR 21411 at commit [`3a6a87b`](https://github.com/apache/spark/commit/3a6a87ba0e0bcb36a7a023edbd35fe411ed2fd6d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r190461717

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
+      case NullType if format == "JSON" =>
+
+      case _ =>
+        throw new UnsupportedOperationException(
--- End diff --

ok
[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r190461628

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --

ok, I will. Also, we need to merge this function with `CSVUtils.verifySchema` in this pr?
[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r190461517

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
+      case NullType if format == "JSON" =>
+
+      case _ =>
+        throw new UnsupportedOperationException(
--- End diff --

Basically, for such a PR, we need to check all the data types that we block and ensure no behavior change is introduced by this PR.
[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21389#discussion_r190461402

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource.
+   */
+  def verifySchema(format: String, schema: StructType): Unit = {
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For backward-compatibility
--- End diff --

Do we have any test case for this?
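The recursive type check being reviewed in the diffs above can be illustrated with a small, self-contained sketch. This is hypothetical Python, not Spark's Scala code or API: data types are modeled as plain tuples (e.g. `("array", ("string",))`), and the "JSON allows NullType" backward-compatibility case mirrors the quoted diff.

```python
# Hypothetical sketch of the recursive schema check in the PR's
# DataSourceUtils.verifySchema. Types are plain tuples: ("integer",),
# ("array", <elem>), ("map", <key>, <value>), ("struct", [<field>, ...]).
ATOMIC_TYPES = {"boolean", "byte", "short", "integer", "long", "float",
                "double", "string", "binary", "date", "timestamp", "decimal"}

def verify_type(data_type, fmt):
    """Raise ValueError if `data_type` is not supported by data source `fmt`."""
    kind = data_type[0]
    if kind in ATOMIC_TYPES:
        return
    if kind == "struct":                 # check every field recursively
        for field_type in data_type[1]:
            verify_type(field_type, fmt)
        return
    if kind == "array":                  # check the element type
        verify_type(data_type[1], fmt)
        return
    if kind == "map":                    # check key and value types
        verify_type(data_type[1], fmt)
        verify_type(data_type[2], fmt)
        return
    if kind == "null" and fmt == "JSON":  # backward-compatibility case
        return
    raise ValueError("%s data source does not support %s type" % (fmt, kind))
```

The point of the recursion is that an unsupported type nested anywhere in a struct, array, or map is rejected, not only at the top level.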
[GitHub] spark issue #20708: [SPARK-21209][MLLLIB] Implement Incremental PCA algorith...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20708 Can one of the admins verify this patch?
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91075/ Test FAILed.
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21366 Merged build finished. Test FAILed.
[GitHub] spark issue #21319: [SPARK-24267][SQL] explicitly keep DataSourceReader in D...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21319 The problem is that plan visitor can only visit plan but not changing it, and pushing down operators to data source needs to remove filters from plan...
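The distinction cloud-fan draws above — a read-only visitor that can only inspect a plan versus a transformation that rebuilds the tree with the Filter node removed once its predicate is pushed into the scan — can be sketched with toy classes. These node classes are illustrative stand-ins, not Catalyst's API.

```python
# Toy plan nodes (hypothetical, not Catalyst). A pushdown rewrite must
# return a *new* tree in which Filter-over-Scan is folded into the scan;
# a visitor that merely walks the tree cannot express this removal.
class Scan:
    def __init__(self, pushed=()):
        self.pushed = tuple(pushed)   # predicates pushed into the source

class Filter:
    def __init__(self, condition, child):
        self.condition, self.child = condition, child

def push_down_filters(plan):
    """Transform: rebuild the tree bottom-up, folding filters into scans."""
    if isinstance(plan, Filter):
        child = push_down_filters(plan.child)
        if isinstance(child, Scan):
            # The Filter node disappears; its condition moves into the scan.
            return Scan(child.pushed + (plan.condition,))
        return Filter(plan.condition, child)
    return plan
```

Because the rewrite removes nodes, it has to be a tree transformation (like Catalyst's `transform`) rather than a side-effect-free visitor.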
[GitHub] spark pull request #21406: [Minor][Core] Cleanup unused vals in `DAGSchedule...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21406
[GitHub] spark issue #21366: [SPARK-24248][K8S][WIP] Use the Kubernetes API to popula...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21366 **[Test build #91075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91075/testReport)** for PR 21366 at commit [`2a2374c`](https://github.com/apache/spark/commit/2a2374c915aafa1b5a53c8e02581cea0c2c176df).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21406: [Minor][Core] Cleanup unused vals in `DAGScheduler.handl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21406 thanks, merging to master!
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 Thank you, @cloud-fan , @gatorsmile , @HyukjinKwon .
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user cxzl25 commented on the issue: https://github.com/apache/spark/pull/21311 @cloud-fan Thank you very much for your help.
[GitHub] spark pull request #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21372
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21372 LGTM Thanks! Merged to master/2.3
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21372 thanks, merging to master/2.3!
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17583 **[Test build #91082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91082/testReport)** for PR 17583 at commit [`47aa749`](https://github.com/apache/spark/commit/47aa7492e0f3edf3549e5e7b1eeb6074fb5d6f8b).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17583 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91082/ Test FAILed.
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17583 Merged build finished. Test FAILed.
[GitHub] spark issue #17583: [SPARK-20271]Add FuncTransformer to simplify custom tran...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17583 **[Test build #91082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91082/testReport)** for PR 17583 at commit [`47aa749`](https://github.com/apache/spark/commit/47aa7492e0f3edf3549e5e7b1eeb6074fb5d6f8b).
[GitHub] spark issue #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate the new s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21311 thanks, merging to master/2.3/2.2/2.1/2.0! There is no conflict so I backported all the way to 2.0. I'll watch the jenkins build in the next few days.
[GitHub] spark pull request #21311: [SPARK-24257][SQL]LongToUnsafeRowMap calculate th...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21311
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3535/ Test PASSed.
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21069 Merged build finished. Test PASSed.
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21069 **[Test build #91081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91081/testReport)** for PR 21069 at commit [`9281ae2`](https://github.com/apache/spark/commit/9281ae233dc54dd961e99e345be559929232c148).
[GitHub] spark issue #21069: [SPARK-23920][SQL]add array_remove to remove all element...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21069 retest this please
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21383 cc @icexelloss too
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Merged build finished. Test PASSed.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3534/ Test PASSed.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190453705

--- Diff: python/pyspark/rdd.py ---
@@ -791,9 +792,11 @@ def foreach(self, f):
         >>> def f(x): print(x)
         >>> sc.parallelize([1, 2, 3, 4, 5]).foreach(f)
         """
+        safe_f = fail_on_StopIteration(f)
--- End diff --

Im okay with `safe` as is too if you feel strongly.
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21295 Congratulations, @rdblue ! :)
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21295 **[Test build #91080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91080/testReport)** for PR 21295 at commit [`497bdd8`](https://github.com/apache/spark/commit/497bdd8fc581f3c40ae97eb56d0a5f65e7d42405).
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/21372 Finally! Could you review this again, @HyukjinKwon , @gatorsmile , @cloud-fan ?
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21295 retest this please
[GitHub] spark issue #21295: [SPARK-24230][SQL] Fix SpecificParquetRecordReaderBase w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21295 @rdblue congrats! All my concerns have been addressed, I think it's ready to merge, also cc @michal-databricks
[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21383 Seems good otherwise to me too
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190450900

--- Diff: python/pyspark/sql/tests.py ---
@@ -900,6 +900,17 @@ def __call__(self, x):
         self.assertEqual(f, f_.func)
         self.assertEqual(return_type, f_.returnType)
 
+    def test_stopiteration_in_udf(self):
+        # test for SPARK-23754
+        from pyspark.sql.functions import udf
+        from py4j.protocol import Py4JJavaError
+
+        def foo(x):
+            raise StopIteration()
+
+        with self.assertRaises(Py4JJavaError) as cm:
--- End diff --

ditto for `cm`
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190450846

--- Diff: python/pyspark/tests.py ---
@@ -161,6 +161,37 @@ def gen_gs(N, step=1):
         self.assertEqual(k, len(vs))
         self.assertEqual(list(range(k)), list(vs))
 
+    def test_stopiteration_is_raised(self):
+
+        def stopit(*args, **kwargs):
+            raise StopIteration()
+
+        def legit_create_combiner(x):
+            return [x]
+
+        def legit_merge_value(x, y):
+            return x.append(y) or x
+
+        def legit_merge_combiners(x, y):
+            return x.extend(y) or x
+
+        data = [(x % 2, x) for x in range(100)]
+
+        # wrong create combiner
+        m = ExternalMerger(Aggregator(stopit, legit_merge_value, legit_merge_combiners), 20)
+        with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

Let's pick up one explicit exception here too.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190450814

--- Diff: python/pyspark/tests.py ---
@@ -161,6 +161,37 @@ def gen_gs(N, step=1):
         self.assertEqual(k, len(vs))
         self.assertEqual(list(range(k)), list(vs))
 
+    def test_stopiteration_is_raised(self):
+
+        def stopit(*args, **kwargs):
+            raise StopIteration()
+
+        def legit_create_combiner(x):
+            return [x]
+
+        def legit_merge_value(x, y):
+            return x.append(y) or x
+
+        def legit_merge_combiners(x, y):
+            return x.extend(y) or x
+
+        data = [(x % 2, x) for x in range(100)]
+
+        # wrong create combiner
+        m = ExternalMerger(Aggregator(stopit, legit_merge_value, legit_merge_combiners), 20)
+        with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

`cm` looks unused.
[GitHub] spark pull request #21379: [SPARK-24327][SQL] Add an option to quote a parti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21379#discussion_r190450621

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala ---
@@ -78,7 +79,12 @@ private[sql] object JDBCRelation extends Logging {
     // Overflow and silliness can happen if you subtract then divide.
     // Here we get a little roundoff, but that's (hopefully) OK.
     val stride: Long = upperBound / numPartitions - lowerBound / numPartitions
-    val column = partitioning.column
+    val column = if (jdbcOptions.quotePartitionColumnName) {
+      val dialect = JdbcDialects.get(jdbcOptions.url)
+      dialect.quoteIdentifier(partitioning.column)
--- End diff --

ok, I will
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190450475

--- Diff: python/pyspark/util.py ---
@@ -89,6 +89,19 @@ def majorMinorVersion(sparkVersion):
                          " version numbers.")
 
 
+def fail_on_StopIteration(f):
+    """ wraps f to make it safe (= does not lead to data loss) to use inside a for loop
--- End diff --

How about something like `Wraps the input function to fail on StopIteration by RuntimeError to prevent data loss silently.`?
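The helper under review wraps a user function so that a stray `StopIteration` fails the task loudly instead of silently ending the surrounding for-loop and dropping the rest of the data. A minimal sketch of the idea follows; the real PySpark helper may differ in name and message.

```python
import functools

def fail_on_stopiteration(f):
    """Wrap f so a StopIteration from user code surfaces as RuntimeError.

    A StopIteration escaping into an iterator pipeline (e.g. map over a
    partition) would be swallowed by the consuming loop and silently
    truncate the data; re-raising as RuntimeError makes the failure visible.
    """
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as e:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task", e)
    return wrapper
```

For example, `list(map(bad, data))` with an unwrapped `bad` that raises `StopIteration` mid-stream simply returns a truncated list, while the wrapped version raises `RuntimeError` at the same point.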
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21372 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91070/ Test PASSed.
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21372 Merged build finished. Test PASSed.
[GitHub] spark issue #21372: [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21372 **[Test build #91070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91070/testReport)** for PR 21372 at commit [`954d1d9`](https://github.com/apache/spark/commit/954d1d92ade183d8774b75e03cb02e16635cde48).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190449941

--- Diff: python/pyspark/util.py ---
@@ -89,6 +89,19 @@ def majorMinorVersion(sparkVersion):
                          " version numbers.")
 
 
+def fail_on_StopIteration(f):
+    """ wraps f to make it safe (= does not lead to data loss) to use inside a for loop
--- End diff --

not a big deal at all but `wraps` -> `Wraps` while we are here.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190449695

--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
         result = rdd.pipe('cat').collect()
         self.assertEqual(data, result)
 
+    def test_stopiteration_in_client_code(self):
+
+        def a_rdd(keyed=False):
+            return self.sc.parallelize(
+                ((x % 2, x) if keyed else x)
--- End diff --

I would just create two RDDs and reuse it.
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190449424

--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
         result = rdd.pipe('cat').collect()
         self.assertEqual(data, result)
 
+    def test_stopiteration_in_client_code(self):
+
+        def a_rdd(keyed=False):
+            return self.sc.parallelize(
+                ((x % 2, x) if keyed else x)
+                for x in range(10)
+            )
+
+        def stopit(*x):
+            raise StopIteration()
+
+        def do_test(action, *args, **kwargs):
+            with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
+                action(*args, **kwargs)
+
+        do_test(a_rdd().map(stopit).collect)
--- End diff --

Maybe we could do:

```
self.assertRaises(RuntimeError, rdd.map(stopit).collect)
```
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190449208

--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
         result = rdd.pipe('cat').collect()
         self.assertEqual(data, result)
 
+    def test_stopiteration_in_client_code(self):
+
+        def a_rdd(keyed=False):
+            return self.sc.parallelize(
+                ((x % 2, x) if keyed else x)
+                for x in range(10)
+            )
+
+        def stopit(*x):
+            raise StopIteration()
+
+        def do_test(action, *args, **kwargs):
+            with self.assertRaises((Py4JJavaError, RuntimeError)) as cm:
--- End diff --

Shall we pick up one explicit exception for each?
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190448950

--- Diff: python/pyspark/tests.py ---
@@ -1246,6 +1277,31 @@ def test_pipe_unicode(self):
         result = rdd.pipe('cat').collect()
         self.assertEqual(data, result)
 
+    def test_stopiteration_in_client_code(self):
+
+        def a_rdd(keyed=False):
+            return self.sc.parallelize(
+                ((x % 2, x) if keyed else x)
+                for x in range(10)
+            )
--- End diff --

Shall we make this inlined?
[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190448642

--- Diff: python/pyspark/rdd.py ---
@@ -791,9 +792,11 @@ def foreach(self, f):
         >>> def f(x): print(x)
         >>> sc.parallelize([1, 2, 3, 4, 5]).foreach(f)
         """
+        safe_f = fail_on_StopIteration(f)
--- End diff --

`safe` prefix doesn't imply why it's safe though .. I would just name it like `fail_on_stopiteration_f` or feel free to another name if you have a good one.