[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22141 @maropu Very sorry, I haven't had the time to come back to it; I have some stuff on my plate and will get to this after I am done. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23243: [SPARK-26288][ExternalShuffleService]add initRegisteredE...
Github user weixiuli commented on the issue: https://github.com/apache/spark/pull/23243 @HyukjinKwon OK, thank you!
[GitHub] spark issue #23243: [branch-2.4][ExternalShuffleService]add initRegisteredEx...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23243 Backport from which JIRA @weixiuli? Usually the fix should go to master first and is backported to other branches when needed. If it should be fixed in the master branch as well, let's file a JIRA and switch the branch to master.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99758/testReport)** for PR 23241 at commit [`29f618e`](https://github.com/apache/spark/commit/29f618e682282f38ff56369eadab0baff3895180).
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5797/ Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Jenkins, retest this please
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 retest this please
[GitHub] spark issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregati...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22305 I can help if this looks good to @ueshin
[GitHub] spark issue #23243: add initRegisteredExecutorsDB
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23243 Can one of the admins verify this patch?
[GitHub] spark pull request #23243: add initRegisteredExecutorsDB
GitHub user weixiuli opened a pull request: https://github.com/apache/spark/pull/23243 add initRegisteredExecutorsDB

## What changes were proposed in this pull request?

Spark on YARN uses a DB to record RegisteredExecutors information, so that when the ExternalShuffleService restarts, the information can be reloaded and reused. Neither Spark standalone nor Spark on Kubernetes records its RegisteredExecutors information in a DB or anywhere else, so that information is lost whenever the ExternalShuffleService restarts, which is not what we want. This commit adds initRegisteredExecutorsDB, which both Spark standalone and Spark on Kubernetes can use to record RegisteredExecutors information, so that it can be reloaded and reused after an ExternalShuffleService restart.

## How was this patch tested?

test("test initRegisteredExecutorsDB") {
  val sparkConf = new SparkConf()
  Utils.loadDefaultSparkProperties(sparkConf)
  val securityManager = new SecurityManager(sparkConf)
  sparkConf.set(config.SHUFFLE_SERVICE_DB_ENABLED.key, "true")
  sparkConf.set(config.SHUFFLE_SERVICE_ENABLED.key, "true")
  sparkConf.set("spark.local.dir", "/tmp")
  val externalShuffleService = new ExternalShuffleService(sparkConf, securityManager)
}

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/weixiuli/spark branch-2.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23243.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23243 commit 3591c16dc758b762ead253be490a67748c33078a Author: éç§å© Date: 2018-12-06T06:20:43Z add initRegisteredExecutorsDB
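The core idea of the PR above — persisting executor registrations so a restarted shuffle service can reload them — can be sketched as follows. This is a minimal stand-in, not Spark's actual implementation: the class and method names are invented for illustration, and a properties file substitutes for the shuffle service's real DB support.

```scala
import java.io.{File, FileInputStream, FileOutputStream}
import java.util.Properties

// Hypothetical stand-in for the registry the PR describes: registrations are
// written through to a properties file so a fresh instance (simulating a
// restarted service) can reload them from disk.
class ExecutorRegistry(dbFile: File) {
  private val props = new Properties()
  if (dbFile.exists()) {
    val in = new FileInputStream(dbFile)
    try props.load(in) finally in.close()
  }

  def register(execId: String, shuffleInfo: String): Unit = {
    props.setProperty(execId, shuffleInfo)
    val out = new FileOutputStream(dbFile)
    try props.store(out, "registered executors") finally out.close()
  }

  def lookup(execId: String): Option[String] = Option(props.getProperty(execId))
}
```

Constructing a second `ExecutorRegistry` over the same file plays the role of the restarted ExternalShuffleService: it sees the registrations the first instance recorded.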
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99752/ Test FAILed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test FAILed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99752/testReport)** for PR 23207 at commit [`9966c2a`](https://github.com/apache/spark/commit/9966c2abc821492d5f5c6c74034407879c764573). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23223 **[Test build #99757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99757/testReport)** for PR 23223 at commit [`2d1c27a`](https://github.com/apache/spark/commit/2d1c27aa1cf94a9a4a524ddc16670a25c0c3b41d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99757/ Test PASSed.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Merged build finished. Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99751/ Test FAILed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test FAILed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading from open frames of z...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99751/testReport)** for PR 23241 at commit [`29f618e`](https://github.com/apache/spark/commit/29f618e682282f38ff56369eadab0baff3895180). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5796/ Test PASSed.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Merged build finished. Test PASSed.
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239341126 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -612,11 +612,14 @@ private[yarn] class YarnAllocator( val message = "Container killed by YARN for exceeding physical memory limits. " + s"$diag Consider boosting ${EXECUTOR_MEMORY_OVERHEAD.key}." (true, message) + case exit_status if NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exit_status) => --- End diff -- Updated. But I'm not sure about: > That way values like ContainerExitStatus.SUCCESS from the set would be really used. this part. @attilapiros
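The pattern the diff above introduces — checking a container's exit status against a fixed set of "not the application's fault" codes before blacklisting — can be sketched as follows. This is a hedged illustration: the set name mirrors the diff, but the numeric codes are placeholders, not YARN's or Spark's actual constant values.

```scala
// Sketch of the classification pattern under discussion: exits whose status
// is in the set are treated as system/framework faults and should not count
// toward executor blacklisting. Codes below are illustrative placeholders.
object ExitStatusSketch {
  val NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS: Set[Int] =
    Set(0 /* e.g. SUCCESS */, -102 /* e.g. preempted */, -106 /* e.g. killed by RM */)

  // A guard like the diff's `case exit_status if ...contains(exit_status)`.
  def isAppFault(exitStatus: Int): Boolean =
    !NOT_APP_AND_SYSTEM_FAULT_EXIT_STATUS.contains(exitStatus)
}
```

The review question about `ContainerExitStatus.SUCCESS` is visible here: because 0 is in the set, a successful exit is also classified as "not the application's fault".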
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23223 **[Test build #99757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99757/testReport)** for PR 23223 at commit [`2d1c27a`](https://github.com/apache/spark/commit/2d1c27aa1cf94a9a4a524ddc16670a25c0c3b41d).
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99756/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534).
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99748/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5795/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Merged build finished. Test PASSed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22612 **[Test build #99748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99748/testReport)** for PR 22612 at commit [`3d65b35`](https://github.com/apache/spark/commit/3d65b35eb17e69147d30edd1ebdc73e762c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 retest this please
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5794/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99755/testReport)** for PR 23215 at commit [`ce2db28`](https://github.com/apache/spark/commit/ce2db2824d0179d63f7234688784f78ddb04e4e5).
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99750/ Test FAILed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test FAILed.
[GitHub] spark issue #23229: [MINOR][CORE] Modify some field name because it may be c...
Github user wangjiaochun commented on the issue: https://github.com/apache/spark/pull/23229 OK, closing this PR.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99750/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23229: [MINOR][CORE] Modify some field name because it m...
Github user wangjiaochun closed the pull request at: https://github.com/apache/spark/pull/23229
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user httfighter commented on the issue: https://github.com/apache/spark/pull/22683 retest this please
[GitHub] spark pull request #23141: [SPARK-26021][SQL][followup] add test for special...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23141#discussion_r239333919 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java --- @@ -165,10 +165,14 @@ public void writeMinusZeroIsReplacedWithZero() { byte[] floatBytes = new byte[Float.BYTES]; Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, -0.0d); Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, -0.0f); -double doubleFromPlatform = Platform.getDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET); -float floatFromPlatform = Platform.getFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET); -Assert.assertEquals(Double.doubleToLongBits(0.0d), Double.doubleToLongBits(doubleFromPlatform)); -Assert.assertEquals(Float.floatToIntBits(0.0f), Float.floatToIntBits(floatFromPlatform)); +byte[] doubleBytes2 = new byte[Double.BYTES]; +byte[] floatBytes2 = new byte[Float.BYTES]; +Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, 0.0d); +Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, 0.0f); --- End diff -- ah good catch! I'm surprised this test passed before... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
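The subtlety behind the test above is that -0.0 and +0.0 compare equal with `==` but have different IEEE-754 bit patterns (the sign bit differs), so a "minus zero is replaced with zero" check is only meaningful when it compares raw bits, and when the -0.0 write and the 0.0 write land in separate buffers. A small sketch of the bit-level comparison (not Spark's `Platform` API):

```scala
// -0.0d == 0.0d is true, but their raw long bit patterns differ in the sign
// bit; comparing raw bits is what distinguishes the two zeros.
object MinusZeroBits {
  def sameBits(a: Double, b: Double): Boolean =
    java.lang.Double.doubleToRawLongBits(a) == java.lang.Double.doubleToRawLongBits(b)
}
```

This is also why reusing one byte array for both writes (as the pre-fix test effectively did) makes the assertion vacuous: the second write clobbers the first, and the test compares a value with itself.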
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99754/testReport)** for PR 23215 at commit [`b24134a`](https://github.com/apache/spark/commit/b24134a574bb3b2098bdc51bc96a49c2412585e3).
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5793/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5792/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99753/testReport)** for PR 23215 at commit [`8e1653b`](https://github.com/apache/spark/commit/8e1653b978679a675a41eaddf40f87f7de26028c).
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user skambha commented on the issue: https://github.com/apache/spark/pull/23206

Today in Spark, the extension points API's `injectOptimizerRule` method only allows rules to be injected at the end of `extendedOperatorOptimizationRules`, and these become two batches split around the InferFiltersFromConstraints rule, i.e. "Operator Optimization before Inferring Filters" and "Operator Optimization after Inferring Filters". So even here we have a use case for ordering, where we want rules to kick in before the InferFiltersFromConstraints rule.

What this PR proposes is a method to inject a rule at a specific position in a batch. For our use case, we have optimization rules that need to kick in after a certain rule, just like the case above. At the position where injected rules currently kick in by default, the plan has already been altered and our optimization rule never fires.

The other proposed method adds a batch. This is similar to what already exists today in postHocOptimizationBatches and preHocOptimizationBatches, but it is not exposed in the SparkSessionExtensions API. The proposed `injectOptimizerBatch` method simply exposes this as part of the extension points so we can make use of it.

I agree this adds logic to the Optimizer to compute batches. The new code is structured so that if the new inject methods are not used, none of the new computation runs: there is a single check for whether any new rules or batches need to be injected, and if not, the code behaves as before.

Hope this helps. The SparkSessionExtensions API is experimental and intended for third-party developers who want to extend Spark without getting their code merged into Spark. If there are other ways to achieve this without changing Spark code, please share your thoughts. Thanks.
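The positional-injection idea in the comment above can be sketched as a simple sequence operation: insert a new rule immediately after a named anchor rule, falling back to appending when the anchor is absent. This is a hedged illustration only; `injectAfter` and the string-based rule names are placeholders, not Spark's Optimizer or SparkSessionExtensions API.

```scala
// Minimal sketch of "inject a rule at a specific position in a batch":
// rules are modeled as names in an ordered Seq.
object InjectSketch {
  def injectAfter(batch: Seq[String], anchor: String, newRule: String): Seq[String] = {
    val idx = batch.indexOf(anchor)
    if (idx < 0) batch :+ newRule // anchor missing: keep today's append-at-end behavior
    else (batch.take(idx + 1) :+ newRule) ++ batch.drop(idx + 1)
  }
}
```

For example, injecting a rule right after "InferFiltersFromConstraints" places it before the rest of the batch, which is exactly the ordering guarantee the plain append cannot give.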
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 What's a concrete example? IMHO the currently proposed API is somewhat complicated/cumbersome for users, and I feel it's error-prone.
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user skambha commented on the issue: https://github.com/apache/spark/pull/23206 @maropu, thanks for your question. Yes, that's correct.
[GitHub] spark issue #23242: SPARK-26285: accumulator metrics sources for LongAccumul...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23242 Can one of the admins verify this patch?
[GitHub] spark pull request #23242: SPARK-26285: accumulator metrics sources for Long...
GitHub user abellina opened a pull request: https://github.com/apache/spark/pull/23242 SPARK-26285: accumulator metrics sources for LongAccumulator and DoubleAccumulator ## What changes were proposed in this pull request? This PR implements metric sources for LongAccumulator and DoubleAccumulator, such that a user can register these accumulators easily and have their values be reported by the driver's metric namespace. ## How was this patch tested? Unit tests, and manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/abellina/spark SPARK-26285_accumulator_source Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23242.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23242 commit 45cfada1079838de90e39e488e593886b2bc33b7 Author: Alessandro Bellina Date: 2018-11-19T14:13:23Z SPARK-26285: accumulator metrics sources for LongAccumulator and DoubleAccumulator
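The mechanism the PR above describes — reporting an accumulator's value through the driver's metrics system — amounts to wrapping the accumulator in a gauge that is sampled at poll time. The sketch below is self-contained and hedged: the `Gauge` trait stands in for a metrics library's gauge interface, and `LongAccumulatorStub` stands in for Spark's `LongAccumulator`; neither is the PR's actual code.

```scala
// Minimal gauge abstraction standing in for a metrics library's Gauge.
trait Gauge[T] { def getValue: T }

// Stand-in for org.apache.spark.util.LongAccumulator: a mutable running sum.
class LongAccumulatorStub {
  private var sum = 0L
  def add(v: Long): Unit = sum += v
  def value: Long = sum
}

// The gauge holds a reference to the accumulator and samples it lazily, so a
// metrics reporter always sees the accumulator's current value at poll time.
class AccumulatorGauge(acc: LongAccumulatorStub) extends Gauge[Long] {
  override def getValue: Long = acc.value
}
```

Registering such a gauge under a name in a metrics registry is what makes the accumulator's value appear in the driver's metric namespace; the registration step itself depends on the metrics system in use and is omitted here.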
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r239323943 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,116 @@ case class CreateHiveTableAsSelectCommand( Seq.empty[Row] } + // Returns `DataWritingCommand` used to write data when the table exists. + def writingCommandForExistingTable( +catalog: SessionCatalog, +tableDesc: CatalogTable): DataWritingCommand + + // Returns `DataWritingCommand` used to write data when the table doesn't exist. + def writingCommandForNewTable( +catalog: SessionCatalog, +tableDesc: CatalogTable): DataWritingCommand + override def argString: String = { s"[Database:${tableDesc.database}, " + s"TableName: ${tableDesc.identifier.table}, " + s"InsertIntoHiveTable]" } } + +/** + * Create table and insert the query result into it. + * + * @param tableDesc the table description, which may contain serde, storage handler etc. + * @param query the query whose result will be insert into the new relation + * @param mode SaveMode + */ +case class CreateHiveTableAsSelectCommand( +tableDesc: CatalogTable, +query: LogicalPlan, +outputColumnNames: Seq[String], +mode: SaveMode) + extends CreateHiveTableAsSelectBase { + + override def writingCommandForExistingTable( + catalog: SessionCatalog, + tableDesc: CatalogTable): DataWritingCommand = { +InsertIntoHiveTable( + tableDesc, + Map.empty, --- End diff -- https://github.com/apache/spark/blob/8534d753ecb21ea64ffbaefb5eaca38ba0464c6d/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala#L686-L697
[GitHub] spark issue #22141: [SPARK-25154][SQL] Support NOT IN sub-queries inside nes...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22141 Any update?
[GitHub] spark issue #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runtime cod...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21777 @kiszk Can you close this?
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99747/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99747/testReport)** for PR 23108 at commit [`51d1d78`](https://github.com/apache/spark/commit/51d1d78d1e1c4f56f5f07dc18bc9fcbe9a00fbbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r239319889

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---
@@ -95,9 +77,116 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }

+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    InsertIntoHiveTable(
+      tableDesc,
+      Map.empty,
--- End diff --

Could you point it out? I want to ensure it is covered.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/23237 @HyukjinKwon Close this PR. Thank you!
[GitHub] spark pull request #23237: [SPARK-26279][CORE] Remove unused method in Loggi...
Github user seancxmao closed the pull request at: https://github.com/apache/spark/pull/23237
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5791/ Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99752/testReport)** for PR 23207 at commit [`9966c2a`](https://github.com/apache/spark/commit/9966c2abc821492d5f5c6c74034407879c764573).
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239316608

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala ---
@@ -417,4 +426,59 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter
     clock.advance(50 * 1000L)
     handler.getNumExecutorsFailed should be (0)
   }
+
+  test("SPARK-26296: YarnAllocator should have same blacklist behaviour with YARN") {
+    val rmClientSpy = spy(rmClient)
+    val maxExecutors = 11
+
+    val handler = createAllocator(
+      maxExecutors,
+      rmClientSpy,
+      Map(
+        "spark.yarn.blacklist.executor.launch.blacklisting.enabled" -> "true",
+        "spark.blacklist.application.maxFailedExecutorsPerNode" -> "0"))
+    handler.updateResourceRequests()
+
+    val hosts = (0 until maxExecutors).map(i => s"host$i")
+    val ids = (0 to maxExecutors).map(i => ContainerId.newContainerId(appAttemptId, i))
+    val containers = createContainers(hosts, ids)
+    handler.handleAllocatedContainers(containers.slice(0, 9))
+    val cs0 = ContainerStatus.newInstance(containers(0).getId, ContainerState.COMPLETE,
+      "success", ContainerExitStatus.SUCCESS)
+    val cs1 = ContainerStatus.newInstance(containers(1).getId, ContainerState.COMPLETE,
+      "preempted", ContainerExitStatus.PREEMPTED)
+    val cs2 = ContainerStatus.newInstance(containers(2).getId, ContainerState.COMPLETE,
+      "killed_exceeded_vmem", ContainerExitStatus.KILLED_EXCEEDED_VMEM)
+    val cs3 = ContainerStatus.newInstance(containers(3).getId, ContainerState.COMPLETE,
+      "killed_exceeded_pmem", ContainerExitStatus.KILLED_EXCEEDED_PMEM)
+    val cs4 = ContainerStatus.newInstance(containers(4).getId, ContainerState.COMPLETE,
+      "killed_by_resourcemanager", ContainerExitStatus.KILLED_BY_RESOURCEMANAGER)
+    val cs5 = ContainerStatus.newInstance(containers(5).getId, ContainerState.COMPLETE,
+      "killed_by_appmaster", ContainerExitStatus.KILLED_BY_APPMASTER)
+    val cs6 = ContainerStatus.newInstance(containers(6).getId, ContainerState.COMPLETE,
+      "killed_after_app_completion", ContainerExitStatus.KILLED_AFTER_APP_COMPLETION)
+    val cs7 = ContainerStatus.newInstance(containers(7).getId, ContainerState.COMPLETE,
+      "aborted", ContainerExitStatus.ABORTED)
+    val cs8 = ContainerStatus.newInstance(containers(8).getId, ContainerState.COMPLETE,
+      "disk_failed", ContainerExitStatus.DISKS_FAILED)
--- End diff --

Nice suggestion!
[GitHub] spark pull request #23223: [SPARK-26269][YARN]Yarnallocator should have same...
Github user Ngone51 commented on a diff in the pull request: https://github.com/apache/spark/pull/23223#discussion_r239316424

--- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala ---
@@ -114,13 +116,20 @@ class YarnAllocatorSuite extends SparkFunSuite with Matchers with BeforeAndAfter
       clock)
   }

-  def createContainer(host: String, resource: Resource = containerResource): Container = {
-    val containerId = ContainerId.newContainerId(appAttemptId, containerNum)
+  def createContainer(
+      host: String,
+      containerId: ContainerId = ContainerId.newContainerId(appAttemptId, containerNum),
--- End diff --

Good idea.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/23223

> it looks like its only going to blacklist the node for the AM, not other nodes for general containers.

@squito YARN has a blacklist for the AM when the config `am-scheduling.node-blacklisting-enabled` is true, and a `ContainerFailureTracker` for general containers (I haven't found a config for it).
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading incompleted frames of...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23241 **[Test build #99751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99751/testReport)** for PR 23241 at commit [`29f618e`](https://github.com/apache/spark/commit/29f618e682282f38ff56369eadab0baff3895180).
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading incompleted frames of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5790/ Test PASSed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading incompleted frames of...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23241 Merged build finished. Test PASSed.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/23223

> Are you seeing actual issues with this blacklisting when it shouldn't?

Unfortunately, no. @tgravescs @squito
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/23223

> I mean if node blacklisting in Spark would be perfectly aligned to YARN then it would be just redundant to have it in Spark in the first place.

This change does seem to make node blacklisting in Spark *perfectly* aligned with YARN, but my original thought was that some exit statuses (e.g. KILLED_BY_RESOURCEMANAGER) currently should not lead to node blacklisting. So *perfect* alignment with YARN is not actually the real target of this change, and we can also adopt a custom strategy for Spark.

> Take for example disk failure.

For Spark task-level blacklisting, should it be delegated to **schedulerBlacklist** in YarnAllocatorBlacklistTracker? And ContainerExitStatus.DISKS_FAILED in YARN does not seem to be the same as a Spark task's disk failure.
[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239312302

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala ---
@@ -144,24 +282,107 @@ case class WindowInPandasExec(
         queue.close()
       }

-      val inputProj = UnsafeProjection.create(allInputs, child.output)
-      val pythonInput = grouped.map { case (_, rows) =>
-        rows.map { row =>
-          queue.add(row.asInstanceOf[UnsafeRow])
-          inputProj(row)
+      val stream = iter.map { row =>
+        queue.add(row.asInstanceOf[UnsafeRow])
+        row
+      }
+
+      val pythonInput = new Iterator[Iterator[UnsafeRow]] {
+
+        // Manage the stream and the grouping.
+        var nextRow: UnsafeRow = null
+        var nextGroup: UnsafeRow = null
+        var nextRowAvailable: Boolean = false
+        private[this] def fetchNextRow() {
+          nextRowAvailable = stream.hasNext
+          if (nextRowAvailable) {
+            nextRow = stream.next().asInstanceOf[UnsafeRow]
+            nextGroup = grouping(nextRow)
+          } else {
+            nextRow = null
+            nextGroup = null
+          }
+        }
+        fetchNextRow()
+
+        // Manage the current partition.
+        val buffer: ExternalAppendOnlyUnsafeRowArray =
+          new ExternalAppendOnlyUnsafeRowArray(inMemoryThreshold, spillThreshold)
+        var bufferIterator: Iterator[UnsafeRow] = _
+
+        val indexRow = new SpecificInternalRow(Array.fill(numBoundIndices)(IntegerType))
+
+        val frames = factories.map(_(indexRow))
+
+        private[this] def fetchNextPartition() {
+          // Collect all the rows in the current partition.
+          // Before we start to fetch new input rows, make a copy of nextGroup.
+          val currentGroup = nextGroup.copy()
+
+          // clear last partition
+          buffer.clear()
+
+          while (nextRowAvailable && nextGroup == currentGroup) {
--- End diff --

I guess we have to use `GenerateOrdering.compare()` to support complex types as well as `GroupedIterator` does.
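The review point above is that grouping consecutive rows with plain equality (`nextGroup == currentGroup`) can misbehave for complex key types, where group membership should be decided by a comparator instead. A language-agnostic sketch of comparator-based consecutive grouping (plain Python with illustrative data, not the Spark internals):

```python
def group_consecutive(rows, key, compare):
    """Yield lists of consecutive rows whose keys compare equal.

    `compare(a, b)` returns 0 when two keys belong to the same group.
    Using a comparator instead of `==` lets the caller supply
    ordering-based equality for keys where plain equality is
    unreliable (e.g. complex nested types).
    """
    group, group_key = [], None
    for row in rows:
        k = key(row)
        if group and compare(k, group_key) == 0:
            group.append(row)
        else:
            if group:
                yield group
            group, group_key = [row], k
    if group:
        yield group

# Example: keys are unordered collections, so equality should hold on a
# sorted normal form; expressed as a comparator this groups correctly
# even when element order differs between rows.
def compare_unordered(a, b):
    sa, sb = sorted(a), sorted(b)
    return (sa > sb) - (sa < sb)

rows = [([2, 1], "x"), ([1, 2], "y"), ([3], "z")]
groups = list(group_consecutive(rows, key=lambda r: r[0], compare=compare_unordered))
```

Here `[2, 1]` and `[1, 2]` land in the same group because the comparator, not raw `==`, decides group boundaries.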
[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239307965

--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -231,12 +266,10 @@ def test_array_type(self):
         self.assertEquals(result1.first()['v2'], [1.0, 2.0])

     def test_invalid_args(self):
-        from pyspark.sql.functions import pandas_udf, PandasUDFType
+        from pyspark.sql.functions import mean, pandas_udf, PandasUDFType
--- End diff --

ditto.
[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239307779

--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -87,8 +96,34 @@ def ordered_window(self):
     def unpartitioned_window(self):
         return Window.partitionBy()

+    @property
+    def sliding_row_window(self):
+        return Window.partitionBy('id').orderBy('v').rowsBetween(-2, 1)
+
+    @property
+    def sliding_range_window(self):
+        return Window.partitionBy('id').orderBy('v').rangeBetween(-2, 4)
+
+    @property
+    def growing_row_window(self):
+        return Window.partitionBy('id').orderBy('v').rowsBetween(Window.unboundedPreceding, 3)
+
+    @property
+    def growing_range_window(self):
+        return Window.partitionBy('id').orderBy('v') \
+            .rangeBetween(Window.unboundedPreceding, 4)
+
+    @property
+    def shrinking_row_window(self):
+        return Window.partitionBy('id').orderBy('v').rowsBetween(-2, Window.unboundedFollowing)
+
+    @property
+    def shrinking_range_window(self):
+        return Window.partitionBy('id').orderBy('v') \
+            .rangeBetween(-3, Window.unboundedFollowing)
+
     def test_simple(self):
-        from pyspark.sql.functions import mean
+        from pyspark.sql.functions import pandas_udf, PandasUDFType, percent_rank, mean, max
--- End diff --

ditto.
[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239308506

--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -245,11 +278,101 @@ def test_invalid_args(self):
             foo_udf = pandas_udf(lambda x: x, 'v double', PandasUDFType.GROUPED_MAP)
             df.withColumn('v2', foo_udf(df['v']).over(w))

-        with QuietTest(self.sc):
-            with self.assertRaisesRegexp(
-                    AnalysisException,
-                    '.*Only unbounded window frame is supported.*'):
-                df.withColumn('mean_v', mean_udf(df['v']).over(ow))
+    def test_bounded_simple(self):
+        from pyspark.sql.functions import mean, max, min, count
+
+        df = self.data
+        w1 = self.sliding_row_window
+        w2 = self.shrinking_range_window
+
+        plus_one = self.python_plus_one
+        count_udf = self.pandas_agg_count_udf
+        mean_udf = self.pandas_agg_mean_udf
+        max_udf = self.pandas_agg_max_udf
+        min_udf = self.pandas_agg_min_udf
+
+        result1 = df.withColumn('mean_v', mean_udf(plus_one(df['v'])).over(w1)) \
+            .withColumn('count_v', count_udf(df['v']).over(w2)) \
+            .withColumn('max_v', max_udf(df['v']).over(w2)) \
+            .withColumn('min_v', min_udf(df['v']).over(w1))
+
+        expected1 = df.withColumn('mean_v', mean(plus_one(df['v'])).over(w1)) \
+            .withColumn('count_v', count(df['v']).over(w2)) \
+            .withColumn('max_v', max(df['v']).over(w2)) \
+            .withColumn('min_v', min(df['v']).over(w1))
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_growing_window(self):
+        from pyspark.sql.functions import mean
+
+        df = self.data
+        w1 = self.growing_row_window
+        w2 = self.growing_range_window
+
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.withColumn('m1', mean_udf(df['v']).over(w1)) \
+            .withColumn('m2', mean_udf(df['v']).over(w2))
+
+        expected1 = df.withColumn('m1', mean(df['v']).over(w1)) \
+            .withColumn('m2', mean(df['v']).over(w2))
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_sliding_window(self):
+        from pyspark.sql.functions import mean
+
+        df = self.data
+        w1 = self.sliding_row_window
+        w2 = self.sliding_range_window
+
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.withColumn('m1', mean_udf(df['v']).over(w1)) \
+            .withColumn('m2', mean_udf(df['v']).over(w2))
+
+        expected1 = df.withColumn('m1', mean(df['v']).over(w1)) \
+            .withColumn('m2', mean(df['v']).over(w2))
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_shrinking_window(self):
+        from pyspark.sql.functions import mean
+
+        df = self.data
+        w1 = self.shrinking_row_window
+        w2 = self.shrinking_range_window
+
+        mean_udf = self.pandas_agg_mean_udf
+
+        result1 = df.withColumn('m1', mean_udf(df['v']).over(w1)) \
+            .withColumn('m2', mean_udf(df['v']).over(w2))
+
+        expected1 = df.withColumn('m1', mean(df['v']).over(w1)) \
+            .withColumn('m2', mean(df['v']).over(w2))
+
+        self.assertPandasEqual(expected1.toPandas(), result1.toPandas())
+
+    def test_bounded_mixed(self):
+        from pyspark.sql.functions import mean, max, min, count
--- End diff --

We don't need min and count?
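The tests in the diff above compare a bounded-window pandas UDF against Spark's built-in aggregates. The frame semantics of `rowsBetween(-2, 1)` (two rows preceding through one row following, clipped at partition bounds) can be sketched in plain Python, independent of the PySpark API:

```python
def sliding_row_mean(values, preceding=2, following=1):
    """Mean over a row-based frame [i - preceding, i + following] per row,
    clipped to the partition bounds -- the semantics of
    Window.rowsBetween(-preceding, following) for a single partition."""
    out = []
    n = len(values)
    for i in range(n):
        lo = max(0, i - preceding)
        hi = min(n, i + following + 1)  # frame is inclusive of `following`
        frame = values[lo:hi]
        out.append(sum(frame) / len(frame))
    return out
```

For `[1, 2, 3, 4]` this yields `[1.5, 2.0, 2.5, 3.0]`: the first row's frame is clipped to rows 0..1, the last row's to rows 1..3, which is exactly what the UDF result is checked against.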
[GitHub] spark pull request #22305: [SPARK-24561][SQL][Python] User-defined window ag...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22305#discussion_r239307483

--- Diff: python/pyspark/sql/tests/test_pandas_udf_window.py ---
@@ -44,9 +44,18 @@ def python_plus_one(self):
     @property
     def pandas_scalar_time_two(self):
-        from pyspark.sql.functions import pandas_udf
+        from pyspark.sql.functions import pandas_udf, PandasUDFType
--- End diff --

nit: we can revert this change?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 Anyway, if we can accept the additional test time, I think it is best to run the tests on all 4 patterns above for strict checks.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239312090

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
     val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions
     testSparkPlanMetrics(df, 1, Map(
       2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))),
+      1L -> (("Exchange", Map(
+        "shuffle records written" -> 2L,
+        "records read" -> 2L,
+        "local blocks fetched" -> 2L,
--- End diff --

Copy, the display text will be done in another PR.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23207

> can you separate the prs to rename read side metric and the write side change?

No problem, the next commit will revert the read-side rename changes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239311564

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala ---
@@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter {
     FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"),
     RECORDS_READ -> SQLMetrics.createMetric(sc, "records read"))
 }
+
+/**
+ * A shuffle write metrics reporter for SQL exchange operators. Different with
+ * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in
+ * shuffle dependency, so the local SQLMetric should transient and create on executor.
+ * @param metrics Shuffle write metrics in current SparkPlan.
+ * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter.
+ */
+private[spark] case class SQLShuffleWriteMetricsReporter(
+    metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter)
--- End diff --

As per our discussion here https://github.com/apache/spark/pull/23207#discussion_r238909822, the latest approach carries a function of (reporter => reporter) in the shuffle dependency to create SQLShuffleWriteMetrics in ShuffleMapTask.
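The comment above describes carrying a `reporter => reporter` factory in the shuffle dependency so the SQL-side reporter is created on the executor, wrapping the task-level reporter it receives. That decorator shape can be sketched in Python (class and method names here are illustrative stand-ins, not the actual Spark API):

```python
class TaskWriteMetrics:
    """Stands in for the task-level write metrics reporter owned by the executor."""
    def __init__(self):
        self.records = 0

    def inc_records_written(self, n):
        self.records += n

class SQLWriteMetrics:
    """SQL-side reporter: forwards every update to the wrapped task reporter
    while also accumulating its own per-operator counter."""
    def __init__(self, inner):
        self.inner = inner
        self.records = 0

    def inc_records_written(self, n):
        self.records += n            # per-operator SQL metric
        self.inner.inc_records_written(n)  # task-level metric stays accurate

# The dependency carries only a factory (reporter -> reporter); the executor
# applies it to the task reporter it owns, so the SQL metric is created there.
make_sql_reporter = lambda task_reporter: SQLWriteMetrics(task_reporter)

task = TaskWriteMetrics()
reporter = make_sql_reporter(task)
reporter.inc_records_written(5)
```

Shipping a factory instead of a constructed reporter avoids serializing the SQL metric itself with the dependency, which is why the local metric can be created lazily on the executor.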
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23237 Looks like some classes, for instance `KafkaUtils`, expose this (I guess mistakenly?). Let's not bother with this and close this PR.
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 cc: @gatorsmile
[GitHub] spark issue #23206: [SPARK-26249][SQL] Add ability to inject a rule in order...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23206 The current post hook is not enough for the use case you assume?
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239311141

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
   private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
+  private val writeMetrics = SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
--- End diff --

Both should be private lazy vals (also the newly added readMetrics); I'll change them.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239311018

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -38,12 +38,18 @@ case class CollectLimitExec(limit: Int, child: SparkPlan) extends UnaryExecNode
   override def outputPartitioning: Partitioning = SinglePartition
   override def executeCollect(): Array[InternalRow] = child.executeTake(limit)
   private val serializer: Serializer = new UnsafeRowSerializer(child.output.size)
-  override lazy val metrics = SQLShuffleMetricsReporter.createShuffleReadMetrics(sparkContext)
+  private val writeMetrics = SQLShuffleWriteMetricsReporter.createShuffleWriteMetrics(sparkContext)
+  override lazy val metrics =
--- End diff --

Thanks, makes sense. I'll change it to keep the read/write metrics separate and pass them both.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23213 I think so; I don't know if @cloud-fan or @mgaido91 have other opinions?
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23237 **[Test build #99749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99749/testReport)** for PR 23237 at commit [`90b111f`](https://github.com/apache/spark/commit/90b111f900d8f11e4d730e0cfbe56a1683f96faa).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99749/ Test FAILed.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Merged build finished. Test FAILed.
[GitHub] spark issue #23241: [SPARK-26283][CORE] Enable reading open frames of zstd, ...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23241 Thanks @vanzin, I updated the title.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Merged build finished. Test PASSed.
[GitHub] spark issue #23237: [SPARK-26279][CORE] Remove unused method in Logging
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23237 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5788/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5789/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99750/testReport)** for PR 23213 at commit [`a9c108f`](https://github.com/apache/spark/commit/a9c108fa090b847d48848cf6d679aa6747dcc534).
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed.