[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20057 **[Test build #93327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93327/testReport)** for PR 20057 at commit [`a365f79`](https://github.com/apache/spark/commit/a365f79b2f29326621a4cd0177780e66c56eaceb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21828: Update regression.py
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21828#discussion_r204012751
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all"):
+                  impurity="variance", featureSubsetStrategy="all"):
--- End diff --
We could. I would rather use a `_NoValue` instance for that purpose, though. Also, I would issue a warning via the `warnings` package, as we do elsewhere in the code base. Can we add a simple test for that as well while we are here?
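Editor's sketch of the approach suggested in this comment: accept the misspelled keyword for a while, detect whether it was passed by comparing against a sentinel, and emit a deprecation warning. This is only an illustration under stated assumptions. `set_params` and the local `_NoValue` class are hypothetical stand-ins written for this example; they are not the actual PySpark `setParams` method or the sentinel used in the Spark code base.

```python
import warnings


class _NoValueType:
    """Hypothetical sentinel type; lets us tell 'not passed' apart from None."""

    def __repr__(self):
        return "<no value>"


# Local stand-in for the `_NoValue` instance mentioned in the review comment.
_NoValue = _NoValueType()


def set_params(impurity="variance", featureSubsetStrategy="all", impuriy=_NoValue):
    """Toy stand-in for setParams that still accepts the misspelled keyword."""
    if impuriy is not _NoValue:
        # The caller used the old spelling: warn, then honour the value.
        warnings.warn(
            "'impuriy' is a deprecated misspelling; use 'impurity' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        impurity = impuriy
    return {"impurity": impurity, "featureSubsetStrategy": featureSubsetStrategy}
```

A sentinel rather than `None` as the default means an explicit `impuriy=None` would still be detected as "the caller used the old name". A simple test of the kind requested could call the function once with each spelling and check both the returned value and whether a `DeprecationWarning` was recorded.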
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Merged build finished. Test FAILed.
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93334/ Test FAILed.
[GitHub] spark issue #21805: [SPARK-24850][SQL] fix str representation of CachedRDDBu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21805 **[Test build #93341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93341/testReport)** for PR 21805 at commit [`cf2eae2`](https://github.com/apache/spark/commit/cf2eae2b93df12e8418897c9bb770abb416cbe1e).
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21815 **[Test build #93334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93334/testReport)** for PR 21815 at commit [`7f531bd`](https://github.com/apache/spark/commit/7f531bd3962685ff2bd271af8721653319f618bf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93340/ Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Merged build finished. Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93340/testReport)** for PR 20761 at commit [`ad96372`](https://github.com/apache/spark/commit/ad96372f51fc1920da7b0173e2bf0dcc5ef626fe). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93340/testReport)** for PR 20761 at commit [`ad96372`](https://github.com/apache/spark/commit/ad96372f51fc1920da7b0173e2bf0dcc5ef626fe).
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Merged build finished. Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93339/ Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93339/testReport)** for PR 20761 at commit [`b58df80`](https://github.com/apache/spark/commit/b58df80d3e20ceea7e08a6394804e02847addb05). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93339/testReport)** for PR 20761 at commit [`b58df80`](https://github.com/apache/spark/commit/b58df80d3e20ceea7e08a6394804e02847addb05).
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user onursatici commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r204001527
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --
@gatorsmile I tried to keep this close to its default value. Maybe we can do something like `CachedRDDBuilder(useCompression = true, batchSize = 1000, ...)`? But that would break the consistency across logging case classes.
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user onursatici commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r204001546
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --
@maropu wouldn't that be testing the same thing, as explain calls `plan.treeString`, which calls `elem.simpleString` for every child? I think testing for `InMemoryRelation.simpleString` covers other possible places where a `plan.treeString` is logged. Happy to change if you have concerns.
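Editor's sketch of the argument in this comment: if a tree-level string is assembled by calling each node's one-line string, then asserting on the one-line string of a node also constrains every tree-level rendering that includes it. The classes below are hypothetical toy stand-ins written in Python for illustration; they are not Spark's `TreeNode` or `InMemoryRelation`, and the method names only mirror `simpleString` and `treeString` loosely.

```python
class PlanNode:
    """Toy plan node: a name plus child nodes."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def simple_string(self):
        # One-line description of this node only.
        return self.name

    def tree_string(self, depth=0):
        # The tree rendering is built from simple_string() of every node.
        lines = ["  " * depth + self.simple_string()]
        for child in self.children:
            lines.append(child.tree_string(depth + 1))
        return "\n".join(lines)


class CachedRelation(PlanNode):
    """Toy leaf that holds a cached plan but does not print it."""

    def __init__(self, cached_plan):
        super().__init__("CachedRelation")
        self.cached_plan = cached_plan

    def simple_string(self):
        # Deliberately omits self.cached_plan from the description.
        return f"{self.name}(cached plan hidden)"
```

With this structure, a check that `simple_string()` leaves out the cached plan also holds for any `tree_string()` output, because the cached plan is stored as a field rather than as a child and is never rendered.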
[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21103 cc @ueshin
[GitHub] spark pull request #21828: Update regression.py
Github user woodthom2 commented on a diff in the pull request: https://github.com/apache/spark/pull/21828#discussion_r203997163
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all"):
+                  impurity="variance", featureSubsetStrategy="all"):
--- End diff --
What about this until the next major release?
```
def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction",
              maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
              maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
              checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
              impuriy=None, impurity="variance", featureSubsetStrategy="all"):
    if impuriy is not None:
        # for backward compatibility
        impurity = impuriy
```
[GitHub] spark pull request #21828: Update regression.py
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21828#discussion_r203995996
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all"):
+                  impurity="variance", featureSubsetStrategy="all"):
--- End diff --
Of course that's possible: a user upgrades Spark and suddenly gets a no-such-keyword exception.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test FAILed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93325/ Test FAILed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93325/testReport)** for PR 21822 at commit [`83ffa51`](https://github.com/apache/spark/commit/83ffa51f4b165152dea214be4d73dd518d742a56). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21820: [SPARK-24868][PYTHON]add sequence function in Pyt...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21820
[GitHub] spark issue #21828: Update regression.py
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21828 Can one of the admins verify this patch?
[GitHub] spark pull request #21828: Update regression.py
Github user woodthom2 commented on a diff in the pull request: https://github.com/apache/spark/pull/21828#discussion_r203994440
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all"):
+                  impurity="variance", featureSubsetStrategy="all"):
--- End diff --
Is anyone depending on the typoed version?
[GitHub] spark pull request #21828: Update regression.py
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21828#discussion_r203994108
--- Diff: python/pyspark/ml/regression.py ---
@@ -1116,7 +1116,7 @@ def setParams(self, featuresCol="features", labelCol="label", predictionCol="pre
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all"):
+                  impurity="variance", featureSubsetStrategy="all"):
--- End diff --
I think this can't just be changed like this, since it's going to break other users' code.
[GitHub] spark issue #21820: [SPARK-24868][PYTHON]add sequence function in Python
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21820 Merged to master
[GitHub] spark pull request #21828: Update regression.py
GitHub user woodthom2 opened a pull request: https://github.com/apache/spark/pull/21828 Update regression.py Correct typo impuriy -> impurity (this would have stopped GBT working for some hyperparameter configurations) ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/woodthom2/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21828.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21828 commit d3973d0149ce313210c1f1930a9450bb33d70960 Author: woodthom2 Date: 2018-07-20T09:51:59Z Update regression.py Correct typo impuriy -> impurity (this would have stopped GBT working for some hyperparameter configurations)
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21823 **[Test build #93338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93338/testReport)** for PR 21823 at commit [`86c7ed6`](https://github.com/apache/spark/commit/86c7ed6dd4e2790e64148ff2dc6e856b2b2fd80a).
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93323/ Test FAILed.
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21821 Merged build finished. Test FAILed.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1169/ Test PASSed.
[GitHub] spark issue #21827: [SPARK-24873]Increase switch to shielding frequent inter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21827 Can one of the admins verify this patch?
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Merged build finished. Test PASSed.
[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21821 **[Test build #93323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93323/testReport)** for PR 21821 at commit [`9edc28f`](https://github.com/apache/spark/commit/9edc28fdcb7261f01db716f65e723668a493327e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21823 **[Test build #93337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93337/testReport)** for PR 21823 at commit [`b5b2a1b`](https://github.com/apache/spark/commit/b5b2a1b5c1c2e8b04fe40c165c5827f3380a472b).
[GitHub] spark pull request #21827: [SPARK-24873]Increase switch to shielding frequen...
GitHub user hejiefang opened a pull request: https://github.com/apache/spark/pull/21827 [SPARK-24873]Increase switch to shielding frequent interaction report… [https://issues.apache.org/jira/browse/SPARK-24873](url) [SPARK-24873]Increase switch to shielding frequent interaction report… You can merge this pull request into a Git repository by running: $ git pull https://github.com/hejiefang/spark spark-24873 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21827.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21827 commit 7f9ef388679e8b9a282befc3c5a031a2199d0eb0 Author: hejiefang Date: 2018-07-20T09:39:11Z [SPARK-24873]Increase switch to shielding frequent interaction reports with yarn
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1168/ Test PASSed.
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Merged build finished. Test PASSed.
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203990617
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala ---
@@ -50,4 +52,30 @@ class CanonicalizeSuite extends SparkFunSuite {
     assert(range.where(arrays1).sameResult(range.where(arrays2)))
     assert(!range.where(arrays1).sameResult(range.where(arrays3)))
   }
+
+  test("Canonicalized result is not case-insensitive") {
--- End diff --
OK, modified, thanks.
[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21826 Can one of the admins verify this patch?
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Merged build finished. Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20761 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93336/ Test FAILed.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93336/testReport)** for PR 20761 at commit [`0ff9dee`](https://github.com/apache/spark/commit/0ff9dee17e720fd448ad3c3939e5a2937a13b711). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21826: [SPARK-24872] Remove the symbol “||” of the ...
GitHub user httfighter opened a pull request: https://github.com/apache/spark/pull/21826 [SPARK-24872] Remove the symbol “||” of the “OR” operation
## What changes were proposed in this pull request?
“||” performs STRING concat, and it is also the symbol of the "OR" operation. When I want to use "||" as the "OR" operation, I find that it performs STRING concat:
spark-sql> explain extended select * from aa where id==1 || id==2;
== Parsed Logical Plan ==
'Project [*]
+- 'Filter (('id = concat(1, 'id)) = 2)
   +- 'UnresolvedRelation `aa`
spark-sql> select "abc" || "DFF" ;
And the result is "abcDFF". In predicates.scala, "||" is the symbol of the "Or" operation. Could we remove it?
## How was this patch tested?
We can test this patch with unit tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/httfighter/spark SPARK-24872 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21826.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21826 commit fb98029c451023789a2c7fa0e758c6c8790bbaea Author: é©ç°ç°00222924 Date: 2018-07-20T09:19:54Z SPARK-24872 Remove the symbol “||” of the “OR” operation
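Editor's sketch of why the query in this pull request parses the way it does. The parsed plan quoted above, `(('id = concat(1, 'id)) = 2)`, is what one would get if `||` is a concatenation operator that binds more tightly than `==`. The toy precedence-climbing parser below reproduces that shape. It is a hypothetical illustration written for this example, not Spark's SQL parser, and the precedence table is an assumption chosen to match the quoted plan.

```python
import re

# Tokens: "==", "||", or a run of word characters (identifiers and numbers).
TOKEN = re.compile(r"\s*(==|\|\||\w+)")

# Assumption for illustration: a higher number binds tighter,
# and "||" (concat) binds tighter than "==" (comparison).
PRECEDENCE = {"==": 1, "||": 2}


def tokenize(text):
    """Split the input into operator and operand tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_prec=1):
    """Precedence climbing; consumes tokens from the front of the list."""
    left = tokens.pop(0)
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Parsing the right side one level tighter makes operators left-associative.
        right = parse(tokens, PRECEDENCE[op] + 1)
        if op == "||":
            left = f"concat({left}, {right})"
        else:
            left = f"({left} = {right})"
    return left
```

Calling `parse(tokenize("id==1 || id==2"))` under these assumptions yields `((id = concat(1, id)) = 2)`, the same nesting as the filter in the parsed plan above, which is why the predicate does not behave as a logical OR.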
[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/21789 @mgaido91 yes, updated it. Thanks.
[GitHub] spark issue #20761: [SPARK-20327][CORE][YARN] Add CLI support for YARN custo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20761 **[Test build #93336 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93336/testReport)** for PR 20761 at commit [`0ff9dee`](https://github.com/apache/spark/commit/0ff9dee17e720fd448ad3c3939e5a2937a13b711).
[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21652 @felixcheung can I have a merge pls?
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21815 **[Test build #93335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93335/testReport)** for PR 21815 at commit [`be6e594`](https://github.com/apache/spark/commit/be6e5941991ca045100456e11a59a9b2eb77a1ea).
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1167/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21815 **[Test build #93334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93334/testReport)** for PR 21815 at commit [`7f531bd`](https://github.com/apache/spark/commit/7f531bd3962685ff2bd271af8721653319f618bf). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1166/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21815 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21815: [SPARK-23731][SQL] Make FileSourceScanExec canonicalizab...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21815 Let me update it soon. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21789: [SPARK-24829][SQL]In Spark Thrift Server, CAST AS FLOAT ...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21789 @zuotingbing I think what @dongjoon-hyun was suggesting was to put `[STS]` instead of `[SQL]` in the title of the PR. Could you please update it accordingly? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21815: [SPARK-23731][SQL] Make FileSourceScanExec canoni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21815#discussion_r203980465 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSourceScanExecSuite.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.test.SharedSQLContext + +class FileSourceScanExecSuite extends SharedSQLContext { + test("FileSourceScanExec should be canonicalizable on executor side") { --- End diff -- `SparkPlanSuite` SGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203980186 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala --- @@ -50,4 +52,30 @@ class CanonicalizeSuite extends SparkFunSuite { assert(range.where(arrays1).sameResult(range.where(arrays2))) assert(!range.where(arrays1).sameResult(range.where(arrays3))) } + + test("Canonicalized result is not case-insensitive") { --- End diff -- let's move it to `SameResultSuite`, also let's pick a simpler test, like using a `Project` with one column instead of `Aggregate`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21815: [SPARK-23731][SQL] Make FileSourceScanExec canoni...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21815#discussion_r203979619 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSourceScanExecSuite.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.test.SharedSQLContext + +class FileSourceScanExecSuite extends SharedSQLContext { + test("FileSourceScanExec should be canonicalizable on executor side") { --- End diff -- I found that `SparkPlanSuite` could be another place to add this to address your comment. Let me stick to `FileSourceScanExec` for now, but please let me know if you prefer this. I don't mind changing it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203979598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -282,9 +282,9 @@ object QueryPlan extends PredicateHelper { case ar: AttributeReference => val ordinal = input.indexOf(ar.exprId) if (ordinal == -1) { - ar + ar.withName("") } else { - ar.withExprId(ExprId(ordinal)) + ar.withExprId(ExprId(ordinal)).withName("") --- End diff -- I think we just need to add a `.canonicalized` at the end. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
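The normalization being reviewed in the diff above can be illustrated with a small standalone sketch. It is not Catalyst code: `SketchAttr`, `SketchExprId` and `normalizeExprId` are hypothetical stand-ins, and the assumption that `canonicalized` erases the attribute name is taken from the review comments rather than from the Spark source.

```scala
// Toy stand-ins for Catalyst's attribute classes; NOT the real Spark API.
final case class SketchExprId(id: Long)

final case class SketchAttr(name: String, exprId: SketchExprId) {
  def withExprId(newId: SketchExprId): SketchAttr = copy(exprId = newId)
  // Assumed behaviour: canonicalization erases the cosmetic attribute name.
  def canonicalized: SketchAttr = copy(name = "")
}

// Replace the exprId by its position in `input`, then canonicalize.
// Attributes that are not found in `input` are returned untouched,
// mirroring the `ordinal == -1` branch of the diff.
def normalizeExprId(ar: SketchAttr, input: Seq[SketchExprId]): SketchAttr = {
  val ordinal = input.indexOf(ar.exprId)
  if (ordinal == -1) ar
  else ar.withExprId(SketchExprId(ordinal)).canonicalized
}
```

Under these assumptions, two attributes that differ only in the letter case of their names normalize to the same value, which is what a cache lookup comparing canonicalized plans would need.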
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203979413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -237,7 +237,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT // Top level `AttributeReference` may also be used for output like `Alias`, we should // normalize the epxrId too. id += 1 -ar.withExprId(ExprId(id)).canonicalized +ar.withExprId(ExprId(id)).withName("").canonicalized --- End diff -- oh wait. I think we've already erased the name, in `Expression#canonicalized` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21815: [SPARK-23731][SQL] Make FileSourceScanExec canoni...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21815#discussion_r203979151 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSourceScanExecSuite.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.test.SharedSQLContext + +class FileSourceScanExecSuite extends SharedSQLContext { + test("FileSourceScanExec should be canonicalizable on executor side") { --- End diff -- I think I can actually put this under `SparkPlanSuite`. Let me put it in there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203978990 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -282,9 +282,9 @@ object QueryPlan extends PredicateHelper { case ar: AttributeReference => val ordinal = input.indexOf(ar.exprId) if (ordinal == -1) { - ar + ar.withName("") --- End diff -- let's leave it. We don't even normalize the exprId here. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21823 **[Test build #9 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/9/testReport)** for PR 21823 at commit [`1aefcb3`](https://github.com/apache/spark/commit/1aefcb370ad972cfc17315d000569da1f11c61ef). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1165/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21815: [SPARK-23731][SQL] Make FileSourceScanExec canoni...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21815#discussion_r203976429 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSourceScanExecSuite.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.test.SharedSQLContext + +class FileSourceScanExecSuite extends SharedSQLContext { + test("FileSourceScanExec should be canonicalizable on executor side") { --- End diff -- There are a few things that bother me about that, actually: it's kind of messy to create `FileSourceScanExec` without `SparkSession` (and also without other utils from `SharedSQLContext`), and `QueryPlanSuite` is under `catalyst` whereas this plan itself is under `execution` in SQL core. Also, I believe this PR is more about making the plan canonicalizable after it's de/serialized, since the plan itself is already serializable and deserializable but is not canonicalizable after that. Let me try to clean up based on your comment anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21823 **[Test build #93332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93332/testReport)** for PR 21823 at commit [`c01cf89`](https://github.com/apache/spark/commit/c01cf897daea314c43c96253c0b41aace72637ac). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21823: [SPARK-24870][SQL]Cache can't work normally if there are...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21823 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1164/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21802#discussion_r203974038 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1184,6 +1186,137 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi override def prettyName: String = "array_sort" } +/** + * Returns a random permutation of the given array. + * + * This implementation uses the modern version of Fisher-Yates algorithm. + * Reference: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modern_method --- End diff -- Oh, I see. Let me try. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user eatoncys commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203972375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -237,7 +239,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT // Top level `AttributeReference` may also be used for output like `Alias`, we should // normalize the epxrId too. id += 1 -ar.withExprId(ExprId(id)).canonicalized + ar.withExprId(ExprId(id)).withName(ar.name.toLowerCase(Locale.ROOT)).canonicalized --- End diff -- I think it is OK; the attribute name is erased in Spark version 2.0.2. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21774: [SPARK-24811][SQL]Avro: add new function from_avro and t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21774 **[Test build #93331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93331/testReport)** for PR 21774 at commit [`7179e85`](https://github.com/apache/spark/commit/7179e85f49fbd2f6f1a6a0d27dae474d6df12cea). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21774: [SPARK-24811][SQL]Avro: add new function from_avro and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21774 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r203972003 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -85,14 +85,20 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan */ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] { + private def supportReuseExchange(exchange: Exchange): Boolean = exchange match { +// If a coordinator defined in an exchange operator, the exchange cannot be reused --- End diff -- Ah, ok. I'll check if we can. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21774: [SPARK-24811][SQL]Avro: add new function from_avro and t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21774 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1163/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21774: [SPARK-24811][SQL]Avro: add new function from_avro and t...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/21774 I will create another separate PR to totally remove SerializableSchema. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21823: [SPARK-24870][SQL]Cache can't work normally if th...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21823#discussion_r203969926 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -237,7 +239,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT // Top level `AttributeReference` may also be used for output like `Alias`, we should // normalize the epxrId too. id += 1 -ar.withExprId(ExprId(id)).canonicalized + ar.withExprId(ExprId(id)).withName(ar.name.toLowerCase(Locale.ROOT)).canonicalized --- End diff -- shall we just erase the attribute name like alias? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21802#discussion_r203968939 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2086,6 +2087,20 @@ class Analyzer( } } + /** + * Set the seed for random number generation in Shuffle expressions. + */ + object ResolvedShuffleExpressions extends Rule[LogicalPlan] { +private lazy val random = new Random() + +override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { + case p if p.resolved => p + case p => p transformExpressionsUp { +case Shuffle(child, None) => Shuffle(child, Some(random.nextLong())) --- End diff -- then can we use a single rule to assign seed to these randomized functions? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
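The idea of one rule that seeds every randomized expression can be sketched outside of Catalyst as follows. The `Sketch*` types and `assignSeeds` are hypothetical and only mimic the shape of the `Uuid(None)` and `Shuffle(child, None)` cases quoted in the diffs; this is not the rule that was eventually merged.

```scala
import scala.util.Random

// A tiny expression tree standing in for Catalyst expressions (hypothetical).
sealed trait SketchExpr
final case class SketchCol(name: String) extends SketchExpr
final case class SketchUuid(seed: Option[Long]) extends SketchExpr
final case class SketchShuffle(child: SketchExpr, seed: Option[Long]) extends SketchExpr

// One bottom-up pass that fills in every missing seed, whatever the
// expression type, and leaves explicitly provided seeds alone.
def assignSeeds(expr: SketchExpr, random: Random): SketchExpr = expr match {
  case SketchUuid(None) =>
    SketchUuid(Some(random.nextLong()))
  case SketchShuffle(child, seed) =>
    // The child is rewritten first; a new seed is drawn only when none is set.
    SketchShuffle(assignSeeds(child, random), seed.orElse(Some(random.nextLong())))
  case other =>
    other
}
```

Handling all seedable expressions in one function keeps the "draw a seed once, at analysis time" policy in a single place, which seems to be the point of the reviewer's question.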
[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20861#discussion_r203968743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1994,6 +1996,20 @@ class Analyzer( } } + /** + * Set the seed for random number generation in Uuid expressions. + */ + object ResolvedUuidExpressions extends Rule[LogicalPlan] { +private lazy val random = new Random() + +override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { + case p if p.resolved => p + case p => p transformExpressionsUp { +case Uuid(None) => Uuid(Some(random.nextLong())) --- End diff -- shall we do the same thing for `Rand`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21802#discussion_r203968608 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1184,6 +1186,137 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLi override def prettyName: String = "array_sort" } +/** + * Returns a random permutation of the given array. + * + * This implementation uses the modern version of Fisher-Yates algorithm. + * Reference: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#Modern_method --- End diff -- if we create a new array, I guess there should be some simpler algorithms without swapping... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
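One candidate for such a swap-free approach is the "inside-out" variant of Fisher-Yates, which fills a fresh output array while reading the input sequentially. The sketch below is only an illustration of that variant on a plain `Array[Int]`; the function name `shuffledCopy` is made up here, and nothing in this thread confirms that the PR adopted it.

```scala
import scala.util.Random

// Inside-out Fisher-Yates: builds a shuffled copy and never mutates `input`.
def shuffledCopy(input: Array[Int], seed: Long): Array[Int] = {
  val rnd = new Random(seed)
  val out = new Array[Int](input.length)
  var i = 0
  while (i < input.length) {
    val j = rnd.nextInt(i + 1) // uniform index in [0, i]
    if (j != i) out(i) = out(j) // move the previous occupant of slot j to slot i
    out(j) = input(i)           // place the new element at slot j
    i += 1
  }
  out
}
```

Because the seed is an explicit argument, calling the function twice with the same seed gives the same permutation, which matches the seeded `Shuffle(child, Some(seed))` shape discussed for the analyzer rule.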
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r203968311 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -85,14 +85,20 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan */ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] { + private def supportReuseExchange(exchange: Exchange): Boolean = exchange match { +// If a coordinator defined in an exchange operator, the exchange cannot be reused --- End diff -- I think object reference also works, since currently if it's same coordinator, it's the same object. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
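Comparing coordinators by object reference, as suggested above, could look roughly like this in plain Scala. `SketchCoordinator`, `SketchExchange` and `sameCoordinator` are hypothetical names, and treating two exchanges without any coordinator as compatible is an assumption made only for this sketch.

```scala
// A plain (non-case) class, so instances have no structural equality.
final class SketchCoordinator(val numExchanges: Int)

final case class SketchExchange(id: Int, coordinator: Option[SketchCoordinator])

// `eq` tests reference identity: true only when both sides are the same object.
def sameCoordinator(a: SketchExchange, b: SketchExchange): Boolean =
  (a.coordinator, b.coordinator) match {
    case (Some(x), Some(y)) => x eq y
    case (None, None)       => true // assumption: no coordinator on either side
    case _                  => false
  }
```

This only works as long as "same coordinator" really means "same instance", which is the condition the comment above states for the current code.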
[GitHub] spark issue #21825: [SPARK-18188][DOC][FOLLOW-UP]Add `spark.broadcast.checks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21825 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21825: [SPARK-18188][DOC][FOLLOW-UP]Add `spark.broadcast.checks...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21825 **[Test build #93329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93329/testReport)** for PR 21825 at commit [`6a85aad`](https://github.com/apache/spark/commit/6a85aadc33a6d1ba18d028eeafce3167e5b7aaf7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21825: [SPARK-18188][DOC][FOLLOW-UP]Add `spark.broadcast.checks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21825 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93329/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r203968016 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -85,14 +85,20 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan */ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] { + private def supportReuseExchange(exchange: Exchange): Boolean = exchange match { +// If a coordinator defined in an exchange operator, the exchange cannot be reused --- End diff -- can we assign an id to the `ExchangeCoordinator` so that we can correctly tell if they are same coordinators? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21754: [SPARK-24705][SQL] Cannot reuse an exchange opera...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21754#discussion_r203966689 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala --- @@ -85,14 +85,20 @@ case class ReusedExchangeExec(override val output: Seq[Attribute], child: Exchan */ case class ReuseExchange(conf: SQLConf) extends Rule[SparkPlan] { + private def supportReuseExchange(exchange: Exchange): Boolean = exchange match { +// If a coordinator defined in an exchange operator, the exchange cannot be reused --- End diff -- We might be able to logically reuse the same coordinator, but I think it would be difficult to implement on top of the current master. In the current adaptive query execution, the exchanges (between stages) registered in a coordinator and their partition sizes are decided at runtime (inside `SparkPlan.execute()`). Since `ReuseExchange` runs in the final phase of planning, it is difficult to tell which coordinator can be reused at that time. So, to achieve the reuse, we might need some refactoring of this logic... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21824 **[Test build #93326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93326/testReport)** for PR 21824 at commit [`c254523`](https://github.com/apache/spark/commit/c2545232d2157311ab3ea3ccf6dd45f1a5024f02). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21824 **[Test build #93330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93330/testReport)** for PR 21824 at commit [`c254523`](https://github.com/apache/spark/commit/c2545232d2157311ab3ea3ccf6dd45f1a5024f02). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21824 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1162/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21824: [SPARK-24871][SQL] Refactor Concat and MapConcat to avoi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21824 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/21103#discussion_r203964623 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -3805,3 +3799,332 @@ object ArrayUnion { new GenericArrayData(arrayBuffer) } } + +/** + * Returns an array of the elements in the intersect of x and y, without duplicates + */ +@ExpressionDescription( + usage = """ + _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2, +without duplicates. + """, + examples = """ +Examples:Fun + > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5)); + array(2) + """, + since = "2.4.0") +case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike { + override def dataType: DataType = ArrayType(elementType, --- End diff -- Yeah, this case is valid. But the ```containsNull``` flag is defined for the whole column (across multiple rows). Since this flag could cause removal of the null-safe check in expressions that use ```ArrayExcept``` as a child, it could lead to failures with ```NullPointerException``` for cases like the second row of the example dataset. WDYT? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21825: [SPARK-18188][DOC][FOLLOW-UP]Add `spark.broadcast.checks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21825 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1161/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21825: [SPARK-18188][DOC][FOLLOW-UP]Add `spark.broadcast.checks...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21825 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org