[GitHub] AmplabJenkins commented on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
AmplabJenkins commented on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions URL: https://github.com/apache/spark/pull/23581#issuecomment-455458800 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins removed a comment on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
AmplabJenkins removed a comment on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions URL: https://github.com/apache/spark/pull/23581#issuecomment-455457883 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
AmplabJenkins commented on issue #23581: [SPARK-22465][CORE][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions URL: https://github.com/apache/spark/pull/23581#issuecomment-455458718 Can one of the admins verify this patch?
[GitHub] Ngone51 commented on issue #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
Ngone51 commented on issue #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions URL: https://github.com/apache/spark/pull/23581#issuecomment-455457927 ping @cloud-fan @jiangxb1987 @mridulm @sujithjay . Please take a look, thanks :)
[GitHub] AmplabJenkins commented on issue #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
AmplabJenkins commented on issue #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions URL: https://github.com/apache/spark/pull/23581#issuecomment-455457883 Can one of the admins verify this patch?
[GitHub] Ngone51 opened a new pull request #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
Ngone51 opened a new pull request #23581: [SPARK-22465][FOLLOWUP]Use existing partitioner when defaultNumPartitions is equal to maxPartitioner.numPartitions
URL: https://github.com/apache/spark/pull/23581

## What changes were proposed in this pull request?

Followup of #20091. We could also use the existing partitioner when defaultNumPartitions is equal to the maxPartitioner's numPartitions.

## How was this patch tested?

Existing tests.
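The rule described in this pull request can be modelled as a single comparison. The sketch below is an illustration only, not Spark's code: the function name `reuses_existing_partitioner` and the `inclusive` flag are hypothetical, the strict `<` used for the pre-change behaviour is inferred from the PR title, and any other eligibility checks the real implementation may apply are deliberately left out.

```python
def reuses_existing_partitioner(default_num_partitions,
                                max_partitioner_num_partitions,
                                inclusive=True):
    """Model only the partition-count comparison discussed in the PR.

    default_num_partitions: the number of partitions that would be used
        if a fresh partitioner were created.
    max_partitioner_num_partitions: numPartitions of the existing
        partitioner with the most partitions, or None if there is none.
    inclusive: True models the proposed rule (<=); False models the
        assumed earlier rule (<).
    """
    # Without an existing partitioner there is nothing to reuse.
    if max_partitioner_num_partitions is None:
        return False
    if inclusive:
        return default_num_partitions <= max_partitioner_num_partitions
    return default_num_partitions < max_partitioner_num_partitions


if __name__ == "__main__":
    # Compare the assumed old rule and the proposed rule side by side.
    for default_n in (4, 8, 16):
        old = reuses_existing_partitioner(default_n, 8, inclusive=False)
        new = reuses_existing_partitioner(default_n, 8, inclusive=True)
        print(default_n, old, new)
```

The only case where the two rules differ is the equal one: with 8 default partitions and an existing 8-partition partitioner, the inclusive rule keeps the existing partitioner instead of building a new one of the same size.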
[GitHub] AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455454939 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101389/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455454935 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455454935 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455454939 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101389/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
SparkQA removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419461 **[Test build #101389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101389/testReport)** for PR 23430 at commit [`5a4047e`](https://github.com/apache/spark/commit/5a4047e84a1e46a247a962137e77cf83390200aa).
[GitHub] SparkQA commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
SparkQA commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455454537 **[Test build #101389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101389/testReport)** for PR 23430 at commit [`5a4047e`](https://github.com/apache/spark/commit/5a4047e84a1e46a247a962137e77cf83390200aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] jerryshao commented on a change in pull request #23578: Fix typo in README.md
jerryshao commented on a change in pull request #23578: Fix typo in README.md
URL: https://github.com/apache/spark/pull/23578#discussion_r248949000

## File path: README.md ##
@@ -1,6 +1,6 @@
 # Apache Spark
-Spark is a fast and general cluster computing system for Big Data. It provides
+Spark is a fast and general cluster computing system for big data. It provides

Review comment:
Is this really a typo?
[GitHub] dongjoon-hyun commented on a change in pull request #23542: [WIP] [SPARK-25603][SQL] Pushing Down Nested Field projections
dongjoon-hyun commented on a change in pull request #23542: [WIP] [SPARK-25603][SQL] Pushing Down Nested Field projections
URL: https://github.com/apache/spark/pull/23542#discussion_r248944103

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala ##
@@ -85,10 +90,95 @@ class ParquetSchemaPruningSuite
       briefContacts.map { case BriefContact(id, name, address) =>
         BriefContactWithDataPartitionColumn(id, name, address, 2) }
+  testSchemaPruning("testing bug") {
+    val data = sql("select * from contacts")
+    import data.sqlContext.implicits._
+
+    val firstAndLastName = udf((first: String, last: String) => first + " " + last)
+
+    data.show(10)
+
+    val query = data.as[Contact]
+      .map(c => c.copy(id = 2))
+      .select("id", "name", "address", "friends")
+      .select(col("id"), firstAndLastName(col("name.first"), col("name.last")))
+
+    query.explain(true)
+    query.show(true)
+
+    val a =10

Review comment:
Could you remove this line (which is failing the ScalaStyle test)? Then, we can see UT test result.
[GitHub] dongjoon-hyun commented on a change in pull request #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
dongjoon-hyun commented on a change in pull request #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
URL: https://github.com/apache/spark/pull/23412#discussion_r248941983

## File path: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java ##
@@ -59,28 +57,23 @@ public UnsafeSorterSpillReader(
       File file,
       BlockId blockId) throws IOException {
     assert (file.length() > 0);
-    long bufferSizeBytes =
+    final ConfigEntry bufferSizeConfigEntry =
+        package$.MODULE$.UNSAFE_SORTER_SPILL_READER_BUFFER_SIZE();
+    // This value must be less than or equal to MAX_BUFFER_SIZE_BYTES. Cast to int is always safe.
+    final int DEFAULT_BUFFER_SIZE_BYTES = (int)(long)bufferSizeConfigEntry.defaultValue().get();

Review comment:
My bad. Please change like this, @kiszk .

```scala
-final int DEFAULT_BUFFER_SIZE_BYTES = (int)(long)bufferSizeConfigEntry.defaultValue().get();
+final int DEFAULT_BUFFER_SIZE_BYTES = ((Long)bufferSizeConfigEntry.defaultValue().get()).intValue();
 int bufferSizeBytes = SparkEnv.get() == null ?
-DEFAULT_BUFFER_SIZE_BYTES : (int)SparkEnv.get().conf().get(bufferSizeConfigEntry);
+DEFAULT_BUFFER_SIZE_BYTES : ((Long)SparkEnv.get().conf().get(bufferSizeConfigEntry)).intValue();
```
[GitHub] asfgit closed pull request #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
asfgit closed pull request #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455444383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7203/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455444383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7203/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455444380 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-45576 **[Test build #101392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101392/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd).
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455444380 Merged build finished. Test PASSed.
[GitHub] inpefess commented on a change in pull request #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB objects when possible (Python 3.4+)
inpefess commented on a change in pull request #20691: [SPARK-18161] [Python] Allow pickle to serialize >4 GB objects when possible (Python 3.4+)
URL: https://github.com/apache/spark/pull/20691#discussion_r248941059

## File path: python/pyspark/cloudpickle.py ##
@@ -852,11 +853,11 @@ def _rebuild_tornado_coroutine(func):
 # Shorthands for legacy support
-def dump(obj, file, protocol=2):
+def dump(obj, file, protocol=protocol):

Review comment:
Ok, I'll copy the newest cloudpickle version then.
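For context on the `protocol` default touched in this diff: pickle protocol 4 was added in Python 3.4 and supports very large objects, which is what the PR title refers to. The sketch below is a minimal illustration using only the standard `pickle` module, not the cloudpickle code. The name `DEFAULT_PROTOCOL` is hypothetical, since the diff refers to a module-level `protocol` whose definition is not shown in this message.

```python
import io
import pickle
import sys

# Hypothetical module-level default (not taken from the PR): use protocol 4
# on Python 3.4+, where it is available, and fall back to protocol 2 on
# older interpreters.
DEFAULT_PROTOCOL = 4 if sys.version_info >= (3, 4) else 2


def dump(obj, file, protocol=DEFAULT_PROTOCOL):
    """Serialize obj to an open binary file using the chosen protocol."""
    pickle.dump(obj, file, protocol=protocol)


def dumps(obj, protocol=DEFAULT_PROTOCOL):
    """Serialize obj to bytes by writing through dump()."""
    buffer = io.BytesIO()
    dump(obj, buffer, protocol=protocol)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = dumps({"rows": list(range(5))})
    # For protocols >= 2 the stream begins with the PROTO opcode (0x80)
    # followed by the protocol number.
    print(payload[:2])
    print(pickle.loads(payload))
```

Callers that need the legacy behaviour can still pass `protocol=2` explicitly, so replacing the hard-coded default does not remove that option.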
[GitHub] gatorsmile edited a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
gatorsmile edited a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455443946 LGTM Thanks! Merged to master
[GitHub] gatorsmile commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
gatorsmile commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455443946 LGTM Thanks! Merged to master/2.4
[GitHub] HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455443520 retest this please
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455443259 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455443263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7202/ Test PASSed.
[GitHub] HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
HyukjinKwon commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455443498 I filed for the flaky test (SPARK-26646).
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455443259 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455443263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7202/ Test PASSed.
[GitHub] SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455442253 **[Test build #101391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101391/testReport)** for PR 23412 at commit [`f04a6f7`](https://github.com/apache/spark/commit/f04a6f7454f6bd38c3fd02761f237916d87a5c2d).
[GitHub] dongjoon-hyun commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
dongjoon-hyun commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455442016 Retest this please.
[GitHub] AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455438973 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455438978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101387/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455438973 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455438978 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101387/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
SparkQA removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455406101 **[Test build #101387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101387/testReport)** for PR 23579 at commit [`176b4d0`](https://github.com/apache/spark/commit/176b4d02f2b40a28b3f9f0773ff0d532deffe000).
[GitHub] SparkQA commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
SparkQA commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455438687 **[Test build #101387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101387/testReport)** for PR 23579 at commit [`176b4d0`](https://github.com/apache/spark/commit/176b4d02f2b40a28b3f9f0773ff0d532deffe000). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions URL: https://github.com/apache/spark/pull/23556#issuecomment-455438556 @maropu No, I don't think this is related to `CleanupAliases`. We can't clean up the aliases, because they still need to be used in the expression; we just can't substitute them (if they're large), or we risk OOMing.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455436592 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101386/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455436588 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455436592 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101386/ Test FAILed.
[GitHub] AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
AmplabJenkins commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455436588 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA removed a comment on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455399364 **[Test build #101386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101386/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd).
[GitHub] SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins
SparkQA commented on issue #23117: [WIP][SPARK-7721][INFRA] Run and generate test coverage report from Python via Jenkins URL: https://github.com/apache/spark/pull/23117#issuecomment-455436351 **[Test build #101386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101386/testReport)** for PR 23117 at commit [`18c40d9`](https://github.com/apache/spark/commit/18c40d925d7e93cc9221f3358d5028bf5ba007bd). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455435536 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455435536 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455435539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101385/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455435539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101385/ Test PASSed.
[GitHub] SparkQA removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
SparkQA removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455399348 **[Test build #101385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101385/testReport)** for PR 23579 at commit [`b5f7089`](https://github.com/apache/spark/commit/b5f7089a66b95fd79ec9b0606bd3316e03c1808a).
[GitHub] liupc commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
liupc commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455435280 @maropu Yes, they are different. `TaskSetManager.TASK_SIZE_TO_WARN_KB` is for warning about a task whose size includes addedJars, addedFiles, taskProperties, etc. Here, however, we are warning about broadcasting a large task binary, mainly to check for unexpected data brought in by the closure. For instance, a user may mistakenly introduce a large in-memory object into a closure.
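To illustrate the closure point in the comment above, here is a minimal sketch of how capturing a large object inflates the serialized size of a function. It uses plain Java serialization as a stand-in; it is not Spark's closure serializer, and `ClosureSize`, `serializedSize`, `lean`, and `heavy` are hypothetical names introduced only for this example.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object ClosureSize {
  // Size in bytes after plain Java serialization; a stand-in, not Spark's closure serializer.
  def serializedSize(obj: AnyRef): Int = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.size()
  }

  // Captures nothing from its environment.
  val lean: Int => Long = x => x + 1L

  // Captures `table`, so the whole array is serialized along with the function.
  def heavy(table: Array[Long]): Int => Long = x => x + table(x % table.length)

  def main(args: Array[String]): Unit = {
    val table = Array.fill(100000)(1L) // roughly 800 KB of payload
    println(s"lean closure:  ${serializedSize(lean)} bytes")
    println(s"heavy closure: ${serializedSize(heavy(table))} bytes")
  }
}
```

The second function looks just as small in source form, which is why a size-based warning at serialization time can catch what a code review might miss.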
[GitHub] SparkQA commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
SparkQA commented on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455435245 **[Test build #101385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101385/testReport)** for PR 23579 at commit [`b5f7089`](https://github.com/apache/spark/commit/b5f7089a66b95fd79ec9b0606bd3316e03c1808a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] j-esse commented on a change in pull request #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
j-esse commented on a change in pull request #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#discussion_r248933024

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##########
 @@ -686,6 +687,28 @@ object CollapseProject extends Rule[LogicalPlan] {
      }.exists(!_.deterministic))
    }
 
 +  private def hasOversizedRepeatedAliases(
 +      upper: Seq[NamedExpression], lower: Seq[NamedExpression]): Boolean = {
 +    val aliases = collectAliases(lower)
 +
 +    // Count how many times each alias is used in the upper Project.
 +    // If an alias is only used once, we can safely substitute it without increasing the overall
 +    // tree size
 +    val referenceCounts = AttributeMap(
 +      upper
 +        .flatMap(_.collect { case a: Attribute => a })
 +        .groupBy(identity)
 +        .mapValues(_.size).toSeq
 +    )
 +
 +    // Check for any aliases that are used more than once, and are larger than the configured
 +    // maximum size
 +    aliases.exists({ case (attribute, expression) =>
 +      referenceCounts.getOrElse(attribute, 0) > 1 &&
 +        expression.treeSize > SQLConf.get.maxRepeatedAliasSize

 Review comment:
 This isn't trying to determine the cost of the expression; the cost is irrelevant here. We're just trying to determine the size of the expression itself, using tree size as a proxy for memory size. That way, if the expression is too large (takes up too much memory), we can prevent OOMs by not de-aliasing it multiple times, which would greatly increase the amount of heap the expression tree takes up.
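A minimal sketch of the growth described in the review comment above, assuming a toy expression tree rather than Catalyst's classes: `AliasGrowth`, `Expr`, `Ref`, `Add`, `substitute`, and `inlineLayers` are all hypothetical names for this illustration. Each layer references the alias below it twice, so inlining the alias duplicates the whole subtree.

```scala
object AliasGrowth {
  // Toy expression tree; these types are hypothetical and are not Catalyst classes.
  sealed trait Expr {
    def children: Seq[Expr]
    // Number of nodes in the tree rooted at this expression.
    def treeSize: Int = 1 + children.map(_.treeSize).sum
  }
  case class Ref(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Replace every reference to an alias with the expression it stands for.
  def substitute(e: Expr, aliases: Map[String, Expr]): Expr = e match {
    case Ref(n)    => aliases.getOrElse(n, e)
    case Add(l, r) => Add(substitute(l, aliases), substitute(r, aliases))
  }

  // Stack `depth` projections of the form `a + a`, inlining the alias `a` at each layer.
  def inlineLayers(depth: Int): Expr =
    (1 to depth).foldLeft(Ref("col"): Expr) { (inner, _) =>
      substitute(Add(Ref("a"), Ref("a")), Map("a" -> inner))
    }

  def main(args: Array[String]): Unit =
    for (depth <- Seq(1, 5, 10, 20))
      println(s"depth=$depth treeSize=${inlineLayers(depth).treeSize}")
}
```

With two references per layer the node count roughly doubles at every layer, while keeping the alias un-substituted would leave each layer at a constant three nodes. An alias referenced only once adds no such growth, which matches the reference-count check in the diff.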
[GitHub] HeartSaVioR commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
HeartSaVioR commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455430715 I'll work on fixing the UTs; I'm not sure why `@JsonIgnore` doesn't work. I'm also testing manually on a YARN cluster.
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455428001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101384/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455427997 Merged build finished. Test FAILed.
[GitHub] cloud-fan commented on a change in pull request #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
cloud-fan commented on a change in pull request #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#discussion_r248928260

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ##########
 @@ -686,6 +687,28 @@ object CollapseProject extends Rule[LogicalPlan] {
      }.exists(!_.deterministic))
    }
 
 +  private def hasOversizedRepeatedAliases(
 +      upper: Seq[NamedExpression], lower: Seq[NamedExpression]): Boolean = {
 +    val aliases = collectAliases(lower)
 +
 +    // Count how many times each alias is used in the upper Project.
 +    // If an alias is only used once, we can safely substitute it without increasing the overall
 +    // tree size
 +    val referenceCounts = AttributeMap(
 +      upper
 +        .flatMap(_.collect { case a: Attribute => a })
 +        .groupBy(identity)
 +        .mapValues(_.size).toSeq
 +    )
 +
 +    // Check for any aliases that are used more than once, and are larger than the configured
 +    // maximum size
 +    aliases.exists({ case (attribute, expression) =>
 +      referenceCounts.getOrElse(attribute, 0) > 1 &&
 +        expression.treeSize > SQLConf.get.maxRepeatedAliasSize

 Review comment:
 I'm not sure about using `treeSize` as the cost of an expression. A UDF can be very expensive even if its `treeSize` is 1. How about we simplify it with a blacklist? E.g., a UDF is expensive, and we shouldn't collapse projects if a UDF is repeated.
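One way to read the blacklist suggestion above, sketched on a toy expression tree rather than Catalyst's classes. Everything here (`RepeatedUdfCheck`, `Expr`, `Ref`, `Udf`, `Add`, `hasRepeatedUdfAlias`) is a hypothetical name for illustration and is not code from the pull request; it only reuses the reference-counting shape of the diff and swaps the size threshold for a "contains a UDF" test.

```scala
object RepeatedUdfCheck {
  // Toy expression tree; these types are hypothetical and are not Catalyst classes.
  sealed trait Expr { def children: Seq[Expr] }
  case class Ref(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Udf(name: String, args: Seq[Expr]) extends Expr { val children: Seq[Expr] = args }
  case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // All alias names referenced anywhere in the expression, with repeats.
  def refs(e: Expr): Seq[String] = e match {
    case Ref(n) => Seq(n)
    case other  => other.children.flatMap(refs)
  }

  // True if the expression contains a blacklisted node (here: any Udf).
  def containsUdf(e: Expr): Boolean = e match {
    case _: Udf => true
    case other  => other.children.exists(containsUdf)
  }

  // True when an alias that wraps a UDF is referenced more than once by the upper projection.
  def hasRepeatedUdfAlias(upper: Seq[Expr], aliases: Map[String, Expr]): Boolean = {
    val counts = upper.flatMap(refs).groupBy(identity).map { case (n, uses) => n -> uses.size }
    aliases.exists { case (name, expr) => counts.getOrElse(name, 0) > 1 && containsUdf(expr) }
  }
}
```

Under this reading, `Add(Ref("a"), Ref("a"))` over an alias `a` bound to a UDF call would block the collapse, while a single reference to `a` would not. Note that this addresses repeated evaluation cost, which is a different concern from the memory-size concern the size threshold targets.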
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455428001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101384/ Test FAILed.
[GitHub] SparkQA removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA removed a comment on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455391223 **[Test build #101384 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101384/testReport)** for PR 23412 at commit [`f04a6f7`](https://github.com/apache/spark/commit/f04a6f7454f6bd38c3fd02761f237916d87a5c2d).
[GitHub] AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
AmplabJenkins commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455427997 Merged build finished. Test FAILed.
[GitHub] maropu commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
maropu commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455427901 Is this different from `TaskSetManager.TASK_SIZE_TO_WARN_KB`?
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455427587 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101383/ Test FAILed.
[GitHub] SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category
SparkQA commented on issue #23412: [SPARK-26477][CORE] Use ConfigEntry for hardcoded configs for unsafe category URL: https://github.com/apache/spark/pull/23412#issuecomment-455427855 **[Test build #101384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101384/testReport)** for PR 23412 at commit [`f04a6f7`](https://github.com/apache/spark/commit/f04a6f7454f6bd38c3fd02761f237916d87a5c2d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455427585 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455427585 Merged build finished. Test FAILed.
[GitHub] SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455387860 **[Test build #101383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101383/testReport)** for PR 23260 at commit [`37e6ee5`](https://github.com/apache/spark/commit/37e6ee5ab0c77ecbd69fb578784be70c5a4d15a2).
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455427587 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101383/ Test FAILed.
[GitHub] SparkQA commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455427454 **[Test build #101383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101383/testReport)** for PR 23260 at commit [`37e6ee5`](https://github.com/apache/spark/commit/37e6ee5ab0c77ecbd69fb578784be70c5a4d15a2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] maropu commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
maropu commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions URL: https://github.com/apache/spark/pull/23556#issuecomment-455426167 Isn't it enough to keep an `Alias` only at the top, as `CleanupAliases` does in the analyzer?
[GitHub] AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455423525 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins removed a comment on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455423527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7201/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455423527 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7201/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
AmplabJenkins commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455423525 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
SparkQA commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455423359 **[Test build #101390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101390/testReport)** for PR 23545 at commit [`1f78144`](https://github.com/apache/spark/commit/1f7814478779ac71d72a112da85b39368ef03a30).
[GitHub] beliefer commented on a change in pull request #23574: [SPARK-26643][SQL] Fix incorrect analysis exception about set table properties.
beliefer commented on a change in pull request #23574: [SPARK-26643][SQL] Fix incorrect analysis exception about set table properties. URL: https://github.com/apache/spark/pull/23574#discussion_r248924093
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ##
@@ -129,7 +129,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 val invalidKeys = table.properties.keys.filter(_.startsWith(SPARK_SQL_PREFIX))
 if (invalidKeys.nonEmpty) {
 throw new AnalysisException(s"Cannot persistent ${table.qualifiedName} into hive metastore " +
-s"as table property keys may not start with '$SPARK_SQL_PREFIX': " +
+s"as table property keys may start with '$SPARK_SQL_PREFIX': " +
Review comment: > `User-specified table property keys should not start with '$SPARK_SQL_PREFIX'. Invalid table properties: `. One detail: the analysis exception message contains the word 'as', which here means 'because', so I still think `as table property keys may start with '$SPARK_SQL_PREFIX'` is better.
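To compare how the two candidate wordings in this thread read for a concrete table, here is a minimal, self-contained Scala sketch. It is not the code in `HiveExternalCatalog`: the object name, the helper names, the value `"spark.sql."` used for `SPARK_SQL_PREFIX`, and the `mkString` tail that lists the offending keys are all assumptions made for illustration.

```scala
object TablePropertyMessages {
  // Assumption: stand-in for the constant referenced in the quoted diff.
  val SPARK_SQL_PREFIX = "spark.sql."

  // Same filter as in the quoted diff: keys that collide with the reserved prefix.
  def invalidKeys(props: Map[String, String]): Seq[String] =
    props.keys.filter(_.startsWith(SPARK_SQL_PREFIX)).toSeq.sorted

  // Wording proposed in the pull request (table name passed in as a plain string).
  def prWording(table: String, keys: Seq[String]): String =
    s"Cannot persistent $table into hive metastore " +
      s"as table property keys may start with '$SPARK_SQL_PREFIX': " +
      keys.mkString("[", ", ", "]") // assumed tail listing the offending keys

  // Wording quoted from the reviewer's suggestion.
  def reviewerWording(keys: Seq[String]): String =
    s"User-specified table property keys should not start with '$SPARK_SQL_PREFIX'. " +
      "Invalid table properties: " + keys.mkString("[", ", ", "]")
}
```

Calling `TablePropertyMessages.invalidKeys(Map("spark.sql.foo" -> "1", "owner" -> "me"))` and passing the result to each function renders both messages side by side for the same offending key.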
[GitHub] maropu commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels
maropu commented on issue #23551: [SPARK-26622][SQL] Revise SQL Metrics labels URL: https://github.com/apache/spark/pull/23551#issuecomment-455422759 NVM, I'll do it after the v2.3.3 release is finished if nobody takes it on.
[GitHub] maropu commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables
maropu commented on issue #23545: [SPARK-25196][SQL][WIP] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/23545#issuecomment-455422632 retest this please
[GitHub] AmplabJenkins removed a comment on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
AmplabJenkins removed a comment on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455420751 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455420997 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455420751 Can one of the admins verify this patch?
[GitHub] AmplabJenkins removed a comment on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
AmplabJenkins removed a comment on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455420702 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size
AmplabJenkins commented on issue #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580#issuecomment-455420702 Can one of the admins verify this patch?
[GitHub] liupc opened a new pull request #23580: [SPARK-26660]Add warning logs for large taskBinary size
liupc opened a new pull request #23580: [SPARK-26660]Add warning logs for large taskBinary size URL: https://github.com/apache/spark/pull/23580 ## What changes were proposed in this pull request? Currently, some ML libraries may generate large ML models that executors may be unable to deserialize, resulting in OOM failures. User-specified closures that refer to large data may have the same problem. To facilitate debugging of memory problems caused by a large taskBinary broadcast, we should add some warning logs for it. ## How was this patch tested? N/A, just log changes. Please review http://spark.apache.org/contributing.html before opening a pull request.
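The kind of check this pull request describes can be sketched in a few lines of Scala. This is only an illustration under stated assumptions, not the patch itself: the object name, the function name, the message text, and the 100 MiB default threshold are hypothetical, and the sketch returns the message instead of calling Spark's logging so that it stays self-contained.

```scala
object TaskBinaryWarning {
  // Assumption: hypothetical threshold; the PR description does not state one.
  val WarnThresholdBytes: Long = 100L * 1024 * 1024

  // Returns a warning message when the serialized task binary exceeds the
  // threshold, and None otherwise. A real implementation would log the message.
  def warningFor(
      stageId: Int,
      taskBinary: Array[Byte],
      threshold: Long = WarnThresholdBytes): Option[String] = {
    val size = taskBinary.length.toLong
    if (size > threshold) {
      Some(s"Stage $stageId: serialized task binary is $size bytes, above the " +
        s"warning threshold of $threshold bytes; executors may run out of memory " +
        "when deserializing it.")
    } else {
      None
    }
  }
}
```

For example, `TaskBinaryWarning.warningFor(3, new Array[Byte](10), threshold = 5L)` yields a message, while the same call with `threshold = 10L` yields `None` because the comparison is strict.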
[GitHub] AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419542 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins removed a comment on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7200/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/7200/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
AmplabJenkins commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419542 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read)
SparkQA commented on issue #23430: [SPARK-26520][SQL] data source v2 API refactor (micro-batch read) URL: https://github.com/apache/spark/pull/23430#issuecomment-455419461 **[Test build #101389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101389/testReport)** for PR 23430 at commit [`5a4047e`](https://github.com/apache/spark/commit/5a4047e84a1e46a247a962137e77cf83390200aa).
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455416623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101382/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455416619 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455416623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/101382/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
AmplabJenkins commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455416619 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA removed a comment on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455364736 **[Test build #101382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101382/testReport)** for PR 23260 at commit [`3037797`](https://github.com/apache/spark/commit/3037797e5809105b77a3868d1e401a0e2c578426).
[GitHub] SparkQA commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
SparkQA commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr URL: https://github.com/apache/spark/pull/23260#issuecomment-455416328 **[Test build #101382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101382/testReport)** for PR 23260 at commit [`3037797`](https://github.com/apache/spark/commit/3037797e5809105b77a3868d1e401a0e2c578426). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] SparkQA commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs
SparkQA commented on issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous partition IDs URL: https://github.com/apache/spark/pull/19788#issuecomment-455416162 **[Test build #101388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/101388/testReport)** for PR 19788 at commit [`bd9f70e`](https://github.com/apache/spark/commit/bd9f70ef95158c1de05eaf288134c601698fcfc0).
[GitHub] ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray
ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray URL: https://github.com/apache/spark/pull/23569#discussion_r248885211 ## File path: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java ## @@ -46,7 +47,33 @@ public int numElements() { @Override public ArrayData copy() { Review comment: It's an overridden method, and I think it's not a good idea to change the signature of the method.
[GitHub] ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray
ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray URL: https://github.com/apache/spark/pull/23569#discussion_r248885211 ## File path: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java ## @@ -46,7 +47,33 @@ public int numElements() { @Override public ArrayData copy() { Review comment: It's an overridden method, and I'm not sure that it's a good idea to change the signature of the method.
[GitHub] ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray
ayudovin commented on a change in pull request #23569: [SPARK-25713][Core] - adding copy for ColumnArray URL: https://github.com/apache/spark/pull/23569#discussion_r248885211 ## File path: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java ## @@ -46,7 +47,33 @@ public int numElements() { @Override public ArrayData copy() { Review comment: It's an overridden method, and I'm not sure that I can change the signature of the method.
[GitHub] AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec
AmplabJenkins removed a comment on issue #23579: [SPARK-26659][SQL] Fix duplicate cmd.nodeName in the explain output of DataWritingCommandExec URL: https://github.com/apache/spark/pull/23579#issuecomment-455407337 Merged build finished. Test PASSed.