[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88387/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Merged build finished. Test PASSed.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20847 **[Test build #88387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88387/testReport)** for PR 20847 at commit [`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88395/testReport)** for PR 20829 at commit [`ab91545`](https://github.com/apache/spark/commit/ab91545bebc6e1d0c5c3cb7c15156d546ad48f81).
[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20807
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test FAILed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88388/ Test FAILed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20807 Merging to master / 2.3.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Merged build finished. Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Merged build finished. Test PASSed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88392/testReport)** for PR 20829 at commit [`9624061`](https://github.com/apache/spark/commit/9624061920bfe0a2cf06af054d9988c95cd0b322). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175629947 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,11 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = if (Utils.isClusterMode(conf)) { --- End diff -- There's a big difference between the RPC endpoint and the web UI. For the RPC endpoint, your change is fixing a potential issue. For the web UI, this change is potentially introducing a new issue in non-YARN mode. So please, revert this one change and all the rest is good to go. Otherwise I can't merge this.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88392/ Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88385/ Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88385/testReport)** for PR 20853 at commit [`9f391de`](https://github.com/apache/spark/commit/9f391de914e3be5ed61e00d1c651349db3d52cb6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20657 I'm really sorry about the delay @vanzin @squito. I will take another look today and get back to you.
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20796#discussion_r175626064 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -187,8 +218,9 @@ public void writeTo(OutputStream out) throws IOException { * @param b The first byte of a code point */ private static int numBytesForFirstByte(final byte b) { -final int offset = (b & 0xFF) - 192; -return (offset >= 0) ? bytesOfCodePointInUTF8[offset] : 1; +final int offset = b & 0xFF; +byte numBytes = bytesOfCodePointInUTF8[offset]; +return (numBytes == 0) ? 1: numBytes; // Skip the first byte disallowed in UTF-8 --- End diff -- Is the comment valid? Do we skip it? Don't we still count the disallowed byte as one code point in `numChars`?
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r175625855 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -616,18 +616,22 @@ booleanValue ; interval -: INTERVAL intervalField* +: INTERVAL? intervalField+ ; intervalField -: value=intervalValue unit=identifier (TO to=identifier)? +: value=intervalValue unit=intervalUnit (TO to=identifier)? --- End diff -- right, I checked the Hive parser and it seems to handle `TO` as units. But this is a refactoring issue, so is it okay to include this fix in this PR?
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88390/ Test PASSed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20829 Merged build finished. Test PASSed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88390/testReport)** for PR 20829 at commit [`2b1fd4e`](https://github.com/apache/spark/commit/2b1fd4e36fffefb1e1679221353426cb05320104). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20796#discussion_r175624764 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -57,12 +57,39 @@ public Object getBaseObject() { return base; } public long getBaseOffset() { return offset; } - private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, -4, 4, 4, 4, 4, 4, 4, 4, -5, 5, 5, 5, -6, 6}; + /** + * A char in UTF-8 encoding can take 1-4 bytes depending on the first byte which + * indicates the size of the char. See Unicode standard in page 126: + * http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf + * + * Binary Hex Comments + * 0xxx 0x00..0x7F Only byte of a 1-byte character encoding + * 10xx 0x80..0xBF Continuation bytes (1-3 continuation bytes) + * 110x 0xC0..0xDF First byte of a 2-byte character encoding --- End diff -- Yes, `0xC2..0xDF` should be the first byte for 2-byte character encoding. But here you said `0xC0..0xDF`?
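The first-byte ranges viirya is pointing out can be sketched as a small classifier. This is a hedged, standalone illustration — the class and method names below are hypothetical, not Spark's actual `UTF8String` code — that follows the "treat a disallowed first byte as one byte" behavior debated in the review, and excludes the overlong leads `0xC0`/`0xC1` as viirya suggests:

```java
// Hypothetical standalone sketch of the first-byte classification discussed
// above; not the actual Spark UTF8String implementation.
public class Utf8FirstByte {

    /**
     * Returns how many bytes the code point starting with byte b occupies.
     * Continuation bytes (0x80..0xBF) and bytes disallowed as a first byte
     * (0xC0, 0xC1, 0xF5..0xFF) are treated as length 1, matching the
     * skip-as-one behavior discussed in the review.
     */
    public static int numBytesForFirstByte(byte b) {
        int v = b & 0xFF;
        if (v <= 0x7F) return 1;              // 0xxxxxxx: ASCII
        if (v >= 0xC2 && v <= 0xDF) return 2; // 110xxxxx, minus overlong 0xC0/0xC1
        if (v >= 0xE0 && v <= 0xEF) return 3; // 1110xxxx
        if (v >= 0xF0 && v <= 0xF4) return 4; // 11110xxx, capped at U+10FFFF
        return 1;                             // continuation or disallowed first byte
    }

    public static void main(String[] args) {
        System.out.println(numBytesForFirstByte((byte) 0x41)); // 'A'
        System.out.println(numBytesForFirstByte((byte) 0xC3)); // lead byte of e.g. 'é'
        System.out.println(numBytesForFirstByte((byte) 0xC0)); // overlong lead, skipped as 1
    }
}
```

Note that under this scheme an invalid first byte still advances the scan by one byte, which is why it would also be counted as one code point by a `numChars`-style loop — the question raised in the previous comment.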
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88394/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r175624411 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -1479,15 +1479,13 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { e.message.contains("Cannot save interval data type into external storage") }) -val e1 = intercept[AnalysisException] { - sql("select interval") --- End diff -- yea, I'll revert this.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20796 retest this please.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88384/ Test FAILed.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20796 Merged build finished. Test FAILed.
[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/20856 BTW, I've just realized that even without the issue it's clear that creating a new `FileSourceScanExec` will end up with an NPE from the `supportsBatch` field.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88384/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20327: [SPARK-12963][CORE] NM host for driver end points
Github user gerashegalov commented on a diff in the pull request: https://github.com/apache/spark/pull/20327#discussion_r175623955 --- Diff: core/src/main/scala/org/apache/spark/ui/WebUI.scala --- @@ -126,7 +126,11 @@ private[spark] abstract class WebUI( def bind(): Unit = { assert(serverInfo.isEmpty, s"Attempted to bind $className more than once!") try { - val host = Option(conf.getenv("SPARK_LOCAL_IP")).getOrElse("0.0.0.0") + val host = if (Utils.isClusterMode(conf)) { --- End diff -- I formulated the problem more broadly in the title of the PR: "NM host for driver end points". It's not an intuitive default behavior to bind to `0.0.0.0` if the backend (YARN) is explicitly configured not to, and we need a mechanism to allow Spark to inherit the YARN-determined bind address on the NM. You convinced me that the client mode is less critical, and it's easy to override via the environment of spark-submit (after the bug fix). Although I'd prefer using bindAddress everywhere for consistency and as it's documented.
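The bind-address precedence being debated in this thread can be sketched as a plain function. The names below (`resolveBindHost`, `nmHost`) are hypothetical illustrations of the precedence gerashegalov describes — explicit `SPARK_LOCAL_IP` first, then the YARN NodeManager-provided host in cluster mode, then the wildcard fallback — not Spark's actual API:

```java
import java.util.Map;

public class BindHost {

    /**
     * Hypothetical sketch of the precedence under discussion:
     * 1. an explicit SPARK_LOCAL_IP always wins (pre-existing behavior);
     * 2. in cluster mode, the proposed change would inherit the
     *    YARN NM-provided host (the contested branch);
     * 3. otherwise fall back to the wildcard address.
     */
    public static String resolveBindHost(Map<String, String> env,
                                         boolean clusterMode,
                                         String nmHost) {
        String local = env.get("SPARK_LOCAL_IP");
        if (local != null) return local;
        if (clusterMode && nmHost != null) return nmHost; // the contested branch
        return "0.0.0.0";
    }
}
```

vanzin's objection, in these terms, is that step 2 should apply to the RPC endpoint but not to the web UI, where binding to the NM host instead of the wildcard could break non-YARN deployments.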
[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/20856 I spent over 2 days applying different modifications to the query hoping I could cut the number of `CASE WHEN`s and other projections, but noticed no correlation with either their number or their "types". I'll see if renaming the columns leads to the issue and submit a test case. Thanks for your support!
[GitHub] spark pull request #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20433#discussion_r175622969 --- Diff: sql/core/src/test/resources/sql-tests/inputs/interval.sql --- @@ -0,0 +1,184 @@ +-- Basic tests for intervals + +select + '1' second, + 2 seconds, --- End diff -- yea. Since `days`, `seconds`, ... seem to be non-SQL-standard and out of scope for this PR, I'll drop these here.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1620/ Test PASSed.
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20827 **[Test build #88393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88393/testReport)** for PR 20827 at commit [`bee3711`](https://github.com/apache/spark/commit/bee3711074a7d34cf19e8794f837b70eddaffbe0).
[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20827 Merged build finished. Test PASSed.
[GitHub] spark pull request #20827: [SPARK-23666][SQL] Do not display exprIds of Alia...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20827#discussion_r175621271 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -324,31 +324,28 @@ case class AttributeReference( * A place holder used when printing expressions without debugging information such as the * expression id or the unresolved indicator. */ -case class PrettyAttribute( +case class PrettyNamedExpression( name: String, dataType: DataType = NullType) - extends Attribute with Unevaluable { --- End diff -- You mean the latest commit? I just renamed this because `Alias` is not `Attribute`.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88392/testReport)** for PR 20829 at commit [`9624061`](https://github.com/apache/spark/commit/9624061920bfe0a2cf06af054d9988c95cd0b322).
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1619/ Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20687 **[Test build #88391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)** for PR 20687 at commit [`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2).
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20687 Merged build finished. Test PASSed.
[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...
Github user henryr commented on the issue: https://github.com/apache/spark/pull/20687 @gatorsmile ok, I think the coverage right now is a reasonable start - the other test cases I can think of would act more like they're exercising the expression-walking code, not the actual simplification. Look forward to collaborating on the follow-up PR.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Merged build finished. Test FAILed.
[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20829 **[Test build #88390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88390/testReport)** for PR 20829 at commit [`2b1fd4e`](https://github.com/apache/spark/commit/2b1fd4e36fffefb1e1679221353426cb05320104).
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20795 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88386/ Test FAILed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88389/testReport)** for PR 20853 at commit [`8a12452`](https://github.com/apache/spark/commit/8a124522519ed4f8fb750555f1a596c9f97b6947).
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88386/testReport)** for PR 20795 at commit [`93b115e`](https://github.com/apache/spark/commit/93b115eb398ce69afd63c3f11915d1bbbef15ce1). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858 Thanks for this work! One question: why do you think we need to support this API natively in Spark? Do other libraries support this as first-class?
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20850 @hvanhovell btw (this is not related to this PR, though...), most of the code in `UTF8StringBuffer` and `BufferHolder` overlaps. So, we could clean up there, too? https://github.com/apache/spark/compare/master...maropu:CleanupBufferImpl
[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r175609377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -166,20 +168,28 @@ private[sql] object Dataset { class Dataset[T] private[sql]( @transient val sparkSession: SparkSession, @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution, -encoder: Encoder[T]) +encoder: Encoder[T], +val sqlText: String = "") --- End diff -- And how does the SQL shell execute commands? like `SELECT * FROM ...`, does it display all the rows or add a LIMIT before displaying? Generally we should not propagate sql text, as a new DataFrame usually means the plan is changed, the SQL text is not accurate anymore.
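The concern here — that a derived DataFrame's plan no longer matches any SQL text captured at creation — can be illustrated with a toy stand-in. `ToyDataset` below is purely illustrative (no Spark classes involved); the plan is modeled as a string for brevity:

```java
public class ToyDataset {
    final String plan;     // stands in for the logical plan
    final String sqlText;  // text captured at creation, if any

    ToyDataset(String plan, String sqlText) {
        this.plan = plan;
        this.sqlText = sqlText;
    }

    // A transformation produces a new plan, but the captured SQL text does
    // not follow along: it now describes a different computation.
    ToyDataset limit(int n) {
        return new ToyDataset("Limit(" + n + ", " + plan + ")", sqlText);
    }
}
```

After `limit(10)` the plan has changed while `sqlText` still reads `SELECT * FROM t` — which is exactly why propagating the text through transformations, as the reviewed patch's default parameter would, yields a stale description.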
[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20850#discussion_r175609234 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -40,29 +37,45 @@ */ public final class UnsafeRowWriter extends UnsafeWriter { - private final BufferHolder holder; - // The offset of the global buffer where we start to write this row. - private int startingOffset; + private final UnsafeRow row; + private final int nullBitsSize; private final int fixedSize; - public UnsafeRowWriter(BufferHolder holder, int numFields) { -this.holder = holder; + public UnsafeRowWriter(UnsafeRow row, int initialBufferSize) { +this(row, new BufferHolder(row, initialBufferSize), row.numFields()); + } + + public UnsafeRowWriter(UnsafeRow row) { +this(row, new BufferHolder(row), row.numFields()); + } + + public UnsafeRowWriter(UnsafeWriter writer, int numFields) { +this(null, writer.getBufferHolder(), numFields); + } + + private UnsafeRowWriter(UnsafeRow row, BufferHolder holder, int numFields) { +super(holder); +this.row = row; this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields); this.fixedSize = nullBitsSize + 8 * numFields; -this.startingOffset = holder.cursor; +this.startingOffset = cursor(); + } + + public void setTotalSize() { --- End diff -- If we'd like to make generated code blocks easy to read, we should depend on generated comments instead of API names, I think. Anyway, this decision depends on other devs' thoughts.
[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r175608866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -166,20 +168,28 @@ private[sql] object Dataset { class Dataset[T] private[sql]( @transient val sparkSession: SparkSession, @DeveloperApi @InterfaceStability.Unstable @transient val queryExecution: QueryExecution, -encoder: Encoder[T], +encoder: Encoder[T], +val sqlText: String = "") --- End diff -- what's the exact rule you defined to decide whether or not we should propagate the sql text?
[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20850#discussion_r175608496 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -86,11 +88,39 @@ public void grow(int neededSize) { } } - public void reset() { + byte[] buffer() { +return buffer; + } + + int getCursor() { +return cursor; + } + + void incrementCursor(int val) { +cursor += val; + } + + int pushCursor() { --- End diff -- Can we make this a little bit less complex? I think just storing the cursor in the UnsafeWriter is enough.
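hvanhovell's suggestion — have each writer capture the cursor once at construction rather than push/pop it through `BufferHolder` — can be sketched like this. The classes below are simplified stand-ins for illustration, not the actual Spark codegen classes:

```java
// Simplified stand-ins for the real classes: illustrates storing the
// starting cursor in each writer instead of a push/pop stack in BufferHolder.
public class CursorSketch {

    static class BufferHolder {
        int cursor = 0;
        void advance(int bytes) { cursor += bytes; }
    }

    static class Writer {
        final BufferHolder holder;
        final int startingOffset; // captured once; no pushCursor/popCursor needed

        Writer(BufferHolder holder) {
            this.holder = holder;
            this.startingOffset = holder.cursor;
        }

        // Bytes written since this writer started, even with nested writers
        // sharing the same holder.
        int totalSize() { return holder.cursor - startingOffset; }
    }
}
```

Because a nested writer (say, for a struct field) records its own `startingOffset`, each level can compute its size independently, which is what makes the explicit cursor stack in `BufferHolder` unnecessary.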
[GitHub] spark issue #20806: [SPARK-23661][SQL] Implement treeAggregate on Dataset AP...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20806 great! maybe we can hold this PR for a real SQL tree aggregate in the future, with some proper design and discussion.
[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20850#discussion_r175608099 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java --- @@ -17,17 +17,71 @@ package org.apache.spark.sql.catalyst.expressions.codegen; import org.apache.spark.sql.types.Decimal; +import org.apache.spark.unsafe.Platform; +import org.apache.spark.unsafe.array.ByteArrayMethods; import org.apache.spark.unsafe.types.CalendarInterval; import org.apache.spark.unsafe.types.UTF8String; /** * Base class for writing Unsafe* structures. */ public abstract class UnsafeWriter { + // Keep internal buffer holder + protected final BufferHolder holder; + + // The offset of the global buffer where we start to write this structure. + protected int startingOffset; + + protected UnsafeWriter(BufferHolder holder) { +this.holder = holder; + } + + /** + * Accessor methods are delegated from BufferHolder class + */ + public final BufferHolder getBufferHolder() { +return holder; + } + + public final byte[] buffer() { return holder.buffer(); } --- End diff -- No performance impact?
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Merged build finished. Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20579 **[Test build #88388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88388/testReport)** for PR 20579 at commit [`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88381/ Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1618/ Test PASSed.
[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20579 Merged build finished. Test PASSed.
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20701 **[Test build #88381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88381/testReport)** for PR 20701 at commit [`f6ee4a2`](https://github.com/apache/spark/commit/f6ee4a2b4bb2444d65ab0e26a141304b327bd998). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KMeansModel (@Since(\"1.0.0\") val clusterCenters: Array[Vector],`
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19222 Looks pretty good! My major concern is that the semantics of the offset in `MemoryBlock.getXXX` should be implementation independent. Users should always assume the offset starts at 0. Since it's about API/semantics, it's better to do it right in the first place. Thanks for working on it!
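A generic sketch of the 0-based offset contract being requested here; the names and structure are illustrative only, not Spark's actual `MemoryBlock`. The caller passes an offset relative to the start of the block, and the implementation translates it to its internal absolute position:

```java
// Sketch: callers use 0-based, implementation-independent offsets;
// the block translates them to its internal absolute position.
final class ByteArrayBlock {
    private final byte[] base;
    private final int start;   // internal absolute offset into `base`
    private final int length;

    ByteArrayBlock(byte[] base, int start, int length) {
        this.base = base;
        this.start = start;
        this.length = length;
    }

    // `offset` is relative to the start of the block, always beginning at 0.
    byte getByte(int offset) {
        if (offset < 0 || offset >= length) {
            throw new ArrayIndexOutOfBoundsException("offset " + offset);
        }
        return base[start + offset];
    }
}
```

With this contract, callers never need to know where the block sits inside its backing storage.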
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175605407 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -119,15 +115,24 @@ public static UTF8String blankString(int length) { return fromBytes(spaces); } - protected UTF8String(Object base, long offset, int numBytes) { + protected UTF8String(byte[] bytes, long offset, int numBytes) { +this(new ByteArrayMemoryBlock(bytes, offset, numBytes)); + } + + public UTF8String(MemoryBlock base) { this.base = base; -this.offset = offset; -this.numBytes = numBytes; +if (base != null) { + if ((long) Integer.MAX_VALUE < base.size()) { +throw new ArrayIndexOutOfBoundsException( + "MemoryBlock.size " + base.size() + " should be less than " + Integer.MAX_VALUE); + } + this.numBytes = (int) base.size(); --- End diff -- why not use `Ints.checkedCast`?
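For reference, the JDK's `Math.toIntExact` (Java 8+) gives the same overflow-checked narrowing as Guava's `Ints.checkedCast`: it throws on values outside the `int` range instead of silently truncating. A small sketch:

```java
long small = 42L;
long big = (long) Integer.MAX_VALUE + 1;   // out of int range

// Checked narrowing: returns the value when it fits in an int.
int ok = Math.toIntExact(small);

// Throws ArithmeticException when the value does not fit.
boolean overflowed = false;
try {
    Math.toIntExact(big);
} catch (ArithmeticException e) {
    overflowed = true;
}
```

Either helper replaces the hand-written range check with a single call.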
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175605220 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -119,15 +115,24 @@ public static UTF8String blankString(int length) { return fromBytes(spaces); } - protected UTF8String(Object base, long offset, int numBytes) { + protected UTF8String(byte[] bytes, long offset, int numBytes) { +this(new ByteArrayMemoryBlock(bytes, offset, numBytes)); + } + + public UTF8String(MemoryBlock base) { this.base = base; -this.offset = offset; -this.numBytes = numBytes; +if (base != null) { --- End diff -- I think we can actually add an assert to make sure `base` is not null.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175605033 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -119,15 +115,24 @@ public static UTF8String blankString(int length) { return fromBytes(spaces); } - protected UTF8String(Object base, long offset, int numBytes) { + protected UTF8String(byte[] bytes, long offset, int numBytes) { +this(new ByteArrayMemoryBlock(bytes, offset, numBytes)); + } + + public UTF8String(MemoryBlock base) { this.base = base; -this.offset = offset; -this.numBytes = numBytes; +if (base != null) { + if ((long) Integer.MAX_VALUE < base.size()) { --- End diff -- do we need this cast? When comparing an int and a long, Java casts the int to long automatically.
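Indeed, Java's binary numeric promotion widens an `int` operand automatically when it is compared against a `long`, so the result is identical with or without the explicit cast:

```java
long size = 5_000_000_000L;   // larger than Integer.MAX_VALUE

// No explicit cast needed: the int operand is promoted to long
// before the comparison, so no truncation can occur.
boolean tooBig = Integer.MAX_VALUE < size;

// Writing the cast explicitly gives exactly the same result.
boolean sameWithCast = ((long) Integer.MAX_VALUE) < size;
```

The cast only matters in the other direction, i.e. narrowing a `long` down to `int`.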
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175604820 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -36,22 +42,34 @@ public MemoryBlock allocate(long size) throws OutOfMemoryError { @Override public void free(MemoryBlock memory) { -assert (memory.obj == null) : - "baseObject not null; are you trying to use the off-heap allocator to free on-heap memory?"; -assert (memory.pageNumber != MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER) : +assert(memory instanceof OffHeapMemoryBlock) : + "UnsafeMemoryAllocator can only free OffHeapMemoryBlock."; +if (memory == OffHeapMemoryBlock.NULL) return; +assert (memory.getPageNumber() != MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER) : "page has already been freed"; -assert ((memory.pageNumber == MemoryBlock.NO_PAGE_NUMBER) -|| (memory.pageNumber == MemoryBlock.FREED_IN_TMM_PAGE_NUMBER)) : +assert ((memory.getPageNumber() == MemoryBlock.NO_PAGE_NUMBER) +|| (memory.getPageNumber() == MemoryBlock.FREED_IN_TMM_PAGE_NUMBER)) : "TMM-allocated pages must be freed via TMM.freePage(), not directly in allocator free()"; if (MemoryAllocator.MEMORY_DEBUG_FILL_ENABLED) { memory.fill(MemoryAllocator.MEMORY_DEBUG_FILL_FREED_VALUE); } + Platform.freeMemory(memory.offset); + // As an additional layer of defense against use-after-free bugs, we mutate the // MemoryBlock to reset its pointer. -memory.offset = 0; +memory.resetObjAndOffset(); // Mark the page as freed (so we can detect double-frees). -memory.pageNumber = MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER; +memory.setPageNumber(MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER); + } + + public OffHeapMemoryBlock reallocate(OffHeapMemoryBlock block, long oldSize, long newSize) { --- End diff -- no one is calling it.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175604535 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,159 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected final long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; - public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); + protected MemoryBlock(@Nullable Object obj, long offset, long length) { +if (offset < 0 || length < 0) { + throw new ArrayIndexOutOfBoundsException( +"Length " + length + " and offset " + offset + "must be non-negative"); +} +this.obj = obj; +this.offset = offset; this.length = length; } + protected MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array.
- */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { + public final void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException( +"Instantiate MemoryBlock for type " + obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Just instantiate the same type of MemoryBlock with new offset and size. The data is not + * copied. If parameters are invalid, an exception is thrown + */ + public abstract MemoryBlock subBlock(long offset, long size); + + protected void checkSubBlockRange(long offset, long size) { +if (this.offset + offset < 0 || size < 0) { + throw new ArrayIndexOutOfBoundsException( +"Size " + size + " and offset " + (this.offset + offset) + " must be non-negative"); +} +if (offset + size > length) { + throw new ArrayIndexOutOfBoundsException("The sum of size " + size + " and offset " +offset + " should not be larger than the length " + length + " in the MemoryBlock"); +} + } + + /** + * getXXX/putXXX does not ensure guarantee behavior if the offset is invalid.
e.g cause illegal + * memory access, throw an exception, or etc. + */ + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175604062 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -94,12 +95,12 @@ public void free(MemoryBlock memory) { } // Mark the page as freed (so we can detect double-frees). -memory.pageNumber = MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER; +memory.setPageNumber(MemoryBlock.FREED_IN_ALLOCATOR_PAGE_NUMBER); // As an additional layer of defense against use-after-free bugs, we mutate the // MemoryBlock to null out its reference to the long[] array. -long[] array = (long[]) memory.obj; -memory.setObjAndOffset(null, 0); +long[] array = ((OnHeapMemoryBlock)memory).getLongArray(); +memory.resetObjAndOffset(); long alignedSize = ((size + 7) / 8) * 8; if (shouldPool(alignedSize)) { --- End diff -- I think in the future we should cache `MemoryBlock` directly, so that we can have a unified pool for both the on-heap and off-heap memory managers.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20796 LGTM, pending jenkins
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20847 **[Test build #88387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88387/testReport)** for PR 20847 at commit [`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f).
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1617/ Test PASSed.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Merged build finished. Test PASSed.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20847 Retest this please.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88382/ Test FAILed.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20847 Merged build finished. Test FAILed.
[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20847 **[Test build #88382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88382/testReport)** for PR 20847 at commit [`f130a36`](https://github.com/apache/spark/commit/f130a3603386acee670788d545a9d38969fcf39f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20795 **[Test build #88386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88386/testReport)** for PR 20795 at commit [`93b115e`](https://github.com/apache/spark/commit/93b115eb398ce69afd63c3f11915d1bbbef15ce1).
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175592083 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java --- @@ -49,55 +51,81 @@ public static int hashInt(int input, int seed) { } public int hashUnsafeWords(Object base, long offset, int lengthInBytes) { -return hashUnsafeWords(base, offset, lengthInBytes, seed); +return hashUnsafeWordsBlock(MemoryBlock.allocateFromObject(base, offset, lengthInBytes), seed); + } + + public static int hashUnsafeWordsBlock(MemoryBlock base, int seed) { +// This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. +int lengthInBytes = Ints.checkedCast(base.size()); +assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; +int h1 = hashBytesByIntBlock(base, seed); +return fmix(h1, lengthInBytes); } public static int hashUnsafeWords(Object base, long offset, int lengthInBytes, int seed) { // This is based on Guava's `Murmur32_Hasher.processRemaining(ByteBuffer)` method. assert (lengthInBytes % 8 == 0): "lengthInBytes must be a multiple of 8 (word-aligned)"; -int h1 = hashBytesByInt(base, offset, lengthInBytes, seed); +int h1 = hashBytesByIntBlock(MemoryBlock.allocateFromObject(base, offset, lengthInBytes), seed); --- End diff -- It's more consistent to call `hashUnsafeWordsBlock` here.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Which document do you have in mind? A section in `docs/sql-programming-guide.md`?
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20796#discussion_r175590297 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -57,12 +57,39 @@ public Object getBaseObject() { return base; } public long getBaseOffset() { return offset; } - private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, -3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, -4, 4, 4, 4, 4, 4, 4, 4, -5, 5, 5, 5, -6, 6}; + /** + * A char in UTF-8 encoding can take 1-4 bytes depending on the first byte which + * indicates the size of the char. See Unicode standard in page 126: + * http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf + * + * Binary Hex Comments + * 0xxx 0x00..0x7F Only byte of a 1-byte character encoding + * 10xx 0x80..0xBF Continuation bytes (1-3 continuation bytes) + * 110x 0xC0..0xDF First byte of a 2-byte character encoding --- End diff -- Here is the table of allowed first bytes: https://user-images.githubusercontent.com/1580697/37623134-eca5756a-2bc3-11e8-99a0-3b538697d15c.png
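The ranges quoted above can be expressed as a small first-byte classifier. This is an illustrative sketch following the table, not the actual `UTF8String` implementation; per the UTF-8 definition, `0xC0`, `0xC1`, and `0xF5..0xFF` are never valid first bytes:

```java
final class Utf8 {
    // Map a UTF-8 first byte to its sequence length (1-4),
    // or -1 for continuation bytes and invalid first bytes.
    static int sequenceLength(int firstByte) {
        int b = firstByte & 0xFF;
        if (b <= 0x7F) return 1;    // 0xxxxxxx: ASCII
        if (b <= 0xBF) return -1;   // 10xxxxxx: continuation byte
        if (b <= 0xC1) return -1;   // 0xC0/0xC1: overlong encodings
        if (b <= 0xDF) return 2;    // 110xxxxx
        if (b <= 0xEF) return 3;    // 1110xxxx
        if (b <= 0xF4) return 4;    // 11110xxx, up to U+10FFFF
        return -1;                  // 0xF5..0xFF: never valid
    }
}
```

A lookup built this way rejects invalid first bytes instead of indexing past the end of a table, which is the class of crash this PR addresses.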
[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/20796#discussion_r175589638 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -791,4 +795,21 @@ public void trimRightWithTrimString() { assertEquals(fromString("头"), fromString("头a???/").trimRight(fromString("æ°?/*&^%a"))); assertEquals(fromString("头"), fromString("头æ°bæ°æ° [").trimRight(fromString(" []æ°b"))); } + + @Test + public void skipWrongFirstByte() { +int[] wrongFirstBytes = { --- End diff -- The bytes are not filtered by UTF8String methods. For instance, in the case of the csv datasource the invalid bytes are just passed through to the final result. See https://issues.apache.org/jira/browse/SPARK-23649 I have created a separate ticket to fix the issue: https://issues.apache.org/jira/browse/SPARK-23741 . I am not sure that the issue of outputting wrong UTF-8 chars should be addressed by this PR (this PR just fixes crashes on wrong input) because it could impact users and other Spark components. We need to discuss it and test it carefully.
[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20859 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88383/ Test FAILed.
[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20859 Merged build finished. Test FAILed.
[GitHub] spark issue #20859: [SPARK-23702][SS] Forbid watermarks on both sides of sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20859 **[Test build #88383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88383/testReport)** for PR 20859 at commit [`051c85a`](https://github.com/apache/spark/commit/051c85afe727f39ba9d505e00e162620f69c808f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20818: [SPARK-23675][WEB-UI]Title add spark logo, use spark log...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20818 **[Test build #4141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4141/testReport)** for PR 20818 at commit [`964439b`](https://github.com/apache/spark/commit/964439be7a592b2a94f93008dabc45a18f97c3c6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/20859#discussion_r175585873 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -160,6 +160,19 @@ object UnsupportedOperationChecker { case _: InsertIntoDir => throwError("InsertIntoDir is not supported with streaming DataFrames/Datasets") +case e: EventTimeWatermark => + val statefulChildren = e.collect { +case a: Aggregate if a.isStreaming => a +case d: Deduplicate if d.isStreaming => d +case f: FlatMapGroupsWithState if f.isStreaming => f + } + statefulChildren.foreach { statefulNode => +if (statefulNode.collectFirst{ case e: EventTimeWatermark => e }.isDefined) { + throwError("Watermarks both before and after a stateful operator in a streaming " + --- End diff -- WDYT of something like "watermarks may not be present...". Talking about "well-defined" seems a bit confusing to me.
[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585634 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils { def resolveGlobPaths(paths: String, hadoopConf: Configuration): String = { require(paths != null, "paths cannot be null.") Utils.stringToSeq(paths).flatMap { path => - val uri = Utils.resolveURI(path) - uri.getScheme match { -case "local" | "http" | "https" | "ftp" => Array(path) -case _ => - val fs = FileSystem.get(uri, hadoopConf) - Option(fs.globStatus(new Path(uri))).map { status => -status.filter(_.isFile).map(_.getPath.toUri.toString) - }.getOrElse(Array(path)) + val (base, fragment) = splitOnFragment(Utils.resolveURI(path)) + (resolveGlobPath(base, hadoopConf), fragment) match { +case (resolved: Array[String], Some(_)) if resolved.length > 1 => throw new SparkException( +s"${base.toString} resolves ambiguously to multiple files: ${resolved.mkString(",")}") +case (resolved: Array[String], Some(namedAs)) => resolved.map( _ + "#" + namedAs) --- End diff -- Same here.
[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585581 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils { def resolveGlobPaths(paths: String, hadoopConf: Configuration): String = { require(paths != null, "paths cannot be null.") Utils.stringToSeq(paths).flatMap { path => - val uri = Utils.resolveURI(path) - uri.getScheme match { -case "local" | "http" | "https" | "ftp" => Array(path) -case _ => - val fs = FileSystem.get(uri, hadoopConf) - Option(fs.globStatus(new Path(uri))).map { status => -status.filter(_.isFile).map(_.getPath.toUri.toString) - }.getOrElse(Array(path)) + val (base, fragment) = splitOnFragment(Utils.resolveURI(path)) + (resolveGlobPath(base, hadoopConf), fragment) match { +case (resolved: Array[String], Some(_)) if resolved.length > 1 => throw new SparkException( --- End diff -- Type inference is not working here?
[GitHub] spark issue #20809: [SPARK-23667][CORE] Better scala version check
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20809 > This is on travis and no SPARK_HOME is set. That sounds a little odd. If that is true, then your proposed code wouldn't work either, since it requires SPARK_HOME to be known. In any case, there are two calls to `getScalaVersion()`. First is: ``` boolean prependClasses = !isEmpty(getenv("SPARK_PREPEND_CLASSES")); boolean isTesting = "1".equals(getenv("SPARK_TESTING")); if (prependClasses || isTesting) { String scala = getScalaVersion(); ``` And your code shouldn't be triggering that, since both env variables are for Spark development and other applications shouldn't be using them. Second call is a little later: ``` String jarsDir = findJarsDir(getSparkHome(), getScalaVersion(), !isTesting && !isTestingSql); ``` Here `getScalaVersion()` is only needed when running Spark from a git clone, not from the distribution package. So the right thing would be to move `getScalaVersion()` to `CommandBuilderUtils`, and call it from `findJarsDir` only if needed.
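The suggested restructuring amounts to deferring the version lookup until a code path actually needs it. A generic memoized-supplier sketch of that pattern in Java (names hypothetical; this is not the actual launcher code):

```java
import java.util.function.Supplier;

// Sketch: wrap an expensive computation so it runs at most once,
// and only if some caller actually asks for the value.
final class LazyValue<T> implements Supplier<T> {
    private final Supplier<T> compute;
    private T value;
    private boolean computed;

    LazyValue(Supplier<T> compute) {
        this.compute = compute;
    }

    @Override
    public synchronized T get() {
        if (!computed) {
            value = compute.get();
            computed = true;
        }
        return value;
    }
}
```

Passing such a supplier into a helper like `findJarsDir` would let the distribution-package path skip the version probe entirely, while the git-clone path pays for it exactly once.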
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20850 Merged build finished. Test PASSed.
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88380/ Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88385/testReport)** for PR 20853 at commit [`9f391de`](https://github.com/apache/spark/commit/9f391de914e3be5ed61e00d1c651349db3d52cb6).
[GitHub] spark issue #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and BufferHolder...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20850 **[Test build #88380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88380/testReport)** for PR 20850 at commit [`c342f0d`](https://github.com/apache/spark/commit/c342f0da350f53ad0683bda115174d4debbcf1e0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20208 Shall we write the document in this PR? We can document our current behaviors along with the related test cases.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Merged build finished. Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20853 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88379/ Test PASSed.
[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20853 **[Test build #88379 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88379/testReport)** for PR 20853 at commit [`50b5ad1`](https://github.com/apache/spark/commit/50b5ad1289fabffe4f661d0b44fcc9ef3ecd2367).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r175581710

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java ---
```
@@ -48,6 +49,16 @@ public static int roundNumberOfBytesToNearestWord(int numBytes) {
   public static int MAX_ROUNDED_ARRAY_LENGTH = Integer.MAX_VALUE - 15;
   private static final boolean unaligned = Platform.unaligned();

+  /**
+   * MemoryBlock equality check for MemoryBlocks.
+   * @return true if the arrays are equal, false otherwise
+   */
+  public static boolean arrayEqualsBlock(
```
--- End diff --

no one is calling it.
[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20796 **[Test build #88384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88384/testReport)** for PR 20796 at commit [`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).
[GitHub] spark pull request #20816: [SPARK-21479][SQL] Outer join filter pushdown in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20816#discussion_r175580746

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
```
@@ -669,11 +672,42 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelpe
       val newConditionOpt = conditionOpt match {
         case Some(condition) =>
           val newFilters = additionalConstraints -- splitConjunctivePredicates(condition)
-          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else None
+          if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), condition)) else conditionOpt
         case None => additionalConstraints.reduceOption(And)
       }
-      if (newConditionOpt.isDefined) Join(left, right, joinType, newConditionOpt) else join
+      // Infer filter for left/right outer joins
+      val newLeftOpt = joinType match {
+        case RightOuter if newConditionOpt.isDefined =>
+          val rightConstraints = right.constraints.union(
+            splitConjunctivePredicates(newConditionOpt.get).toSet)
+          val inferredConstraints = ExpressionSet(
+            QueryPlanConstraints.inferAdditionalConstraints(rightConstraints))
+          val leftConditions = inferredConstraints
```
--- End diff --

Let us leave `allConstraints` untouched; we should avoid the extra code changes in this PR. We also need to use `conf.constraintPropagationEnabled` for the extra constraints introduced by this PR.
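The diff above relies on `splitConjunctivePredicates` (from Catalyst's `PredicateHelper`) to break a join condition into its individual conjuncts before doing set operations on constraints. A rough standalone analogue of that helper, using a toy expression tree in place of Catalyst's `Expression` (all names here are illustrative, not the actual Catalyst API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for Catalyst's Expression tree: either an AND node or a leaf predicate.
class Expr {
    final String leaf;      // non-null for leaf predicates, e.g. "a = b"
    final Expr left, right; // non-null children for AND nodes

    Expr(String leaf) { this.leaf = leaf; this.left = null; this.right = null; }
    Expr(Expr left, Expr right) { this.leaf = null; this.left = left; this.right = right; }

    // Analogue of PredicateHelper.splitConjunctivePredicates: recursively
    // flatten nested ANDs into a list of individual predicates.
    static List<String> splitConjunctivePredicates(Expr e) {
        List<String> out = new ArrayList<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, List<String> out) {
        if (e.leaf != null) {
            out.add(e.leaf);
        } else {
            collect(e.left, out);
            collect(e.right, out);
        }
    }

    public static void main(String[] args) {
        // (a = b) AND ((b = c) AND (c > 0))
        Expr cond = new Expr(new Expr("a = b"),
                             new Expr(new Expr("b = c"), new Expr("c > 0")));
        System.out.println(splitConjunctivePredicates(cond)); // [a = b, b = c, c > 0]
    }
}
```

Once the condition is split this way, the optimizer can treat the conjuncts as a set — union them with a child's constraints, subtract already-present filters, and reassemble the remainder with `And`, which is exactly the shape of the code under review.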