[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20545 **[Test build #87215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87215/testReport)** for PR 20545 at commit [`664a62c`](https://github.com/apache/spark/commit/664a62c7da9ba5da2007d40ef9c157f7e82938c5). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Merged build finished. Test PASSed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/714/ Test PASSed.
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20545 cc @rxin, could you take a look when you are available, please?
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20545

[SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames' in Scala

## What changes were proposed in this pull request?

This PR proposes to add an alias 'names' of 'fieldNames' in Scala. Please see the discussion in [SPARK-20090](https://issues.apache.org/jira/browse/SPARK-20090).

## How was this patch tested?

Unit tests added in `DataTypeSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-23359

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20545

commit 664a62c7da9ba5da2007d40ef9c157f7e82938c5
Author: hyukjinkwon
Date: 2018-02-08T12:38:25Z

    Adds an alias 'names' for 'fieldNames' in Scala
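The PR above adds `names` purely as an alias for the existing `fieldNames` accessor. A minimal Java sketch of the delegation pattern involved; `StructTypeSketch` is a hypothetical stand-in for Spark's Scala `StructType`, not the real class:

```java
// Hypothetical stand-in for Spark's StructType: an alias method is
// nothing more than a second name delegating to the original accessor.
class StructTypeSketch {
    private final String[] fields;

    StructTypeSketch(String... fields) {
        this.fields = fields.clone();
    }

    /** The existing accessor. */
    String[] fieldNames() {
        return fields.clone();
    }

    /** The proposed alias: delegates straight to fieldNames(). */
    String[] names() {
        return fieldNames();
    }
}
```

The motivation in SPARK-20090 is API parity: PySpark's `StructType` already exposes `names`, so the Scala side gains the same spelling.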
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Merged build finished. Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87210/ Test PASSed.
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)** for PR 20541 at commit [`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87211/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87211/testReport)** for PR 20525 at commit [`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20531 **[Test build #87214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87214/testReport)** for PR 20531 at commit [`36617e4`](https://github.com/apache/spark/commit/36617e4bd864e0fbca5c617d009de45a8231a5d6).
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/713/ Test PASSed.
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20531 Merged build finished. Test PASSed.
[GitHub] spark issue #20531: [SPARK-23352][PYTHON] Explicitly specify supported types...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20531 @ueshin and @icexelloss, thanks for your review. I tried my best to address the comments.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87207/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #87207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87207/testReport)** for PR 20535 at commit [`e92b6b2`](https://github.com/apache/spark/commit/e92b6b2083c4dbf31c27c961096a45cd8d84f16e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87208/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87208/testReport)** for PR 20477 at commit [`0efd5d3`](https://github.com/apache/spark/commit/0efd5d3919e24a480a9771ddd1d81bef11341e94). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/712/ Test PASSed.
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20544 Merged build finished. Test PASSed.
[GitHub] spark issue #20544: [SPARK-23358][CORE]When the number of partitions is grea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20544 **[Test build #87213 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87213/testReport)** for PR 20544 at commit [`9ad0bfd`](https://github.com/apache/spark/commit/9ad0bfdc00373a9732338e17ca2dfa05b0c28cfb).
[GitHub] spark pull request #20544: [SPARK-23358][CORE]When the number of partitions ...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/20544

[SPARK-23358][CORE] When the number of partitions is greater than 2^28, it will produce an incorrect result

## What changes were proposed in this pull request?

In `checkIndexAndDataFile`, `blocks` is an `Int`. When it is greater than 2^28, `blocks * 8` overflows, which produces an incorrect result. In fact, `blocks` is the number of partitions.

## How was this patch tested?

Manual test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/10110346/spark overflow

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20544.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20544

commit 9ad0bfdc00373a9732338e17ca2dfa05b0c28cfb
Author: liuxian
Date: 2018-02-08T11:13:41Z

    fix
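The overflow described in this PR can be reproduced in isolation: with an `int` partition count of 2^28 or more, `blocks * 8` is computed in 32-bit arithmetic and wraps around before being widened to `long`. A minimal sketch; the method names are illustrative, not Spark's:

```java
// Illustration of the SPARK-23358 bug class: int multiplication
// overflows before the result is widened to long.
public class OffsetOverflow {
    static long offsetBuggy(int blocks) {
        return blocks * 8;   // int * int overflows first, then widens
    }

    static long offsetFixed(int blocks) {
        return blocks * 8L;  // promote to long before multiplying
    }

    public static void main(String[] args) {
        int blocks = 1 << 28;                    // 2^28 partitions
        System.out.println(offsetBuggy(blocks)); // -2147483648 (wrapped)
        System.out.println(offsetFixed(blocks)); // 2147483648
    }
}
```

The fix is the one-character change from `8` to `8L`, which forces the multiplication itself into 64-bit arithmetic.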
[GitHub] spark pull request #20509: [SPARK-23268][SQL][followup] Reorganize packages ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20509
[GitHub] spark pull request #20310: revert [SPARK-10030] Use tags to control which te...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/20310
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20509 thanks, merging to master/2.3!
[GitHub] spark pull request #20465: [SPARK-23292][TEST] always run python tests
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/20465
[GitHub] spark issue #20465: [SPARK-23292][TEST] always run python tests
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20465 The logging approach has been merged and I'm closing this one.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87203/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87203/testReport)** for PR 20382 at commit [`874c91c`](https://github.com/apache/spark/commit/874c91c41942972cabb85be175f929fc62e74af7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20490: [SPARK-23323][SQL]: Support commit coordinator for DataS...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20490 Overall LGTM, I don't have a better idea. Also cc @rxin.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166899259

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala ---

@@ -117,20 +118,43 @@ object DataWritingSparkTask extends Logging {
       writeTask: DataWriterFactory[InternalRow],
       context: TaskContext,
       iter: Iterator[InternalRow]): WriterCommitMessage = {
-    val dataWriter = writeTask.createDataWriter(context.partitionId(), context.attemptNumber())
+    val stageId = context.stageId()
+    val partId = context.partitionId()
+    val attemptId = context.attemptNumber()
+    val dataWriter = writeTask.createDataWriter(partId, attemptId)

     // write the data and commit this writer.
     Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
       iter.foreach(dataWriter.write)
-      logInfo(s"Writer for partition ${context.partitionId()} is committing.")
-      val msg = dataWriter.commit()
-      logInfo(s"Writer for partition ${context.partitionId()} committed.")
+
+      val msg = if (writeTask.useCommitCoordinator) {

--- End diff --

I think we also need to handle it at the streaming side. Please check all the callers of `DataWriter.commit`.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166898992

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriterFactory.java ---

@@ -32,6 +32,16 @@
 @InterfaceStability.Evolving
 public interface DataWriterFactory extends Serializable {

+  /**
+   * Returns whether Spark should use the OutputCommitCoordinator to ensure that only one attempt
+   * for each task commits.
+   *
+   * @return true if commit coordinator should be used, false otherwise.
+   */
+  default boolean useCommitCoordinator() {

--- End diff --

It's weird to put this method in `DataWriterFactory`, as it's not related to the factory. How about we put it in `DataSourceWriter`?
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166898718

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---

@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
  * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
  * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
  *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.

--- End diff --

Don't say `OutputCommitCoordinator`, as it's an internal class. We can just say `a commit coordinator`.
[GitHub] spark pull request #20490: [SPARK-23323][SQL]: Support commit coordinator fo...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20490#discussion_r166898212

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceWriter.java ---

@@ -78,10 +78,11 @@ default void onDataWriterCommit(WriterCommitMessage message) {}
  * failed, and {@link #abort(WriterCommitMessage[])} would be called. The state of the destination
  * is undefined and @{@link #abort(WriterCommitMessage[])} may not be able to deal with it.
  *
- * Note that, one partition may have multiple committed data writers because of speculative tasks.
- * Spark will pick the first successful one and get its commit message. Implementations should be
- * aware of this and handle it correctly, e.g., have a coordinator to make sure only one data
- * writer can commit, or have a way to clean up the data of already-committed writers.
+ * Note that speculative execution may cause multiple tasks to run for a partition. By default,
+ * Spark uses the OutputCommitCoordinator to allow only one attempt to commit.
+ * {@link DataWriterFactory} implementations can disable this behavior. If disabled, multiple

--- End diff --

We should mention that users can disable this and use their own custom commit coordinator.
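The pattern under discussion is a default interface method that opts in to commit coordination unless a source overrides it. A minimal sketch of that shape, using simplified stand-ins rather than Spark's actual `org.apache.spark.sql.sources.v2.writer` interfaces:

```java
// Simplified stand-in for the v2 writer interface being discussed; the
// real API lives in org.apache.spark.sql.sources.v2.writer.
interface DataSourceWriterSketch {
    /**
     * Whether Spark should use a commit coordinator so that only one task
     * attempt per partition commits. Defaults to true; a source with its
     * own commit coordination can opt out by overriding this.
     */
    default boolean useCommitCoordinator() {
        return true;
    }
}

// A source that brings its own coordination overrides the default.
class SelfCoordinatedWriter implements DataSourceWriterSketch {
    @Override
    public boolean useCommitCoordinator() {
        return false;
    }
}

public class CommitFlagDemo {
    public static void main(String[] args) {
        DataSourceWriterSketch defaults = new DataSourceWriterSketch() {};
        System.out.println(defaults.useCommitCoordinator());                    // true
        System.out.println(new SelfCoordinatedWriter().useCommitCoordinator()); // false
    }
}
```

A default method keeps the change source-compatible: existing implementations inherit coordinated commits without recompiling, which is why the review only argues about *where* the method lives, not its shape.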
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Merged build finished. Test FAILed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87209/ Test FAILed.
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166886232 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { +private static final Logger logger = LoggerFactory.getLogger(TextFormatWithTimestamp.class); + +/** + * Content-type for text version 0.0.4. + */ +public static final String CONTENT_TYPE_004 = "text/plain; version=0.0.4; charset=utf-8"; + +private static StringBuilder jsonMessageLogBuilder = new StringBuilder(); + +public static void write004(Writer writer, +Enumeration mfs)throws IOException { +write004(writer, mfs, null); +} + +/** + * Write out the text version 0.0.4 of the given MetricFamilySamples. + */ +public static void write004(Writer writer,Enumeration mfs, +String timestamp) throws IOException { +/* See http://prometheus.io/docs/instrumenting/exposition_formats/ + * for the output format specification. 
*/ +while(mfs.hasMoreElements()) { +Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement(); --- End diff -- I think `for(Collector.MetricFamilySamples s: Collections.list(mfs)) {` would be nicer.
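The reviewer's suggestion replaces the manual `while (mfs.hasMoreElements())` loop with `Collections.list`, which drains the `Enumeration` into an `ArrayList` so an enhanced for-loop can be used. A standalone sketch; plain `String`s stand in for `Collector.MetricFamilySamples` here:

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

public class EnumerationLoop {
    public static void main(String[] args) {
        Enumeration<String> mfs =
            new Vector<>(List.of("gauge_a", "counter_b")).elements();

        // Collections.list eagerly copies the Enumeration into an ArrayList,
        // replacing: while (mfs.hasMoreElements()) { ... mfs.nextElement() ... }
        for (String sample : Collections.list(mfs)) {
            System.out.println(sample);
        }
    }
}
```

One trade-off worth noting: `Collections.list` materializes every element up front, which is harmless for a bounded metrics snapshot but would matter for a very large or lazy `Enumeration`.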
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166883609 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { +private static final Logger logger = LoggerFactory.getLogger(TextFormatWithTimestamp.class); + +/** + * Content-type for text version 0.0.4. + */ +public static final String CONTENT_TYPE_004 = "text/plain; version=0.0.4; charset=utf-8"; + +private static StringBuilder jsonMessageLogBuilder = new StringBuilder(); + +public static void write004(Writer writer, --- End diff -- No doc
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20521 **[Test build #87209 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87209/testReport)** for PR 20521 at commit [`326a8dd`](https://github.com/apache/spark/commit/326a8dd65a4315f7fa678fb4024d0cf2f19e252e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166883372 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java --- @@ -0,0 +1,320 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import io.prometheus.client.Collector; +import io.prometheus.client.CollectorRegistry; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.net.HttpURLConnection; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.net.URL; +import java.net.URLEncoder; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; + +import java.io.BufferedWriter; +import java.io.IOException; +import java.io.OutputStreamWriter; + +/** + * Export metrics via the Prometheus Pushgateway. + * + * The Prometheus Pushgateway exists to allow ephemeral and + * batch jobs to expose their metrics to Prometheus. + * Since these kinds of jobs may not exist long enough to be scraped, + * they can instead push their metrics to a Pushgateway. + * This class allows pushing the contents of a {@link CollectorRegistry} to + * a Pushgateway. 
+ * + * Example usage: + * + * {@code + * void executeBatchJob() throws Exception { + * CollectorRegistry registry = new CollectorRegistry(); + * Gauge duration = Gauge.build() + * .name("my_batch_job_duration_seconds") + * .help("Duration of my batch job in seconds.") + * .register(registry); + * Gauge.Timer durationTimer = duration.startTimer(); + * try { + * // Your code here. + * + * // This is only added to the registry after success, + * // so that a previous success in the Pushgateway isn't overwritten on failure. + * Gauge lastSuccess = Gauge.build() + * .name("my_batch_job_last_success") + * .help("Last time my batch job succeeded, in unixtime.") + * .register(registry); + * lastSuccess.setToCurrentTime(); + * } finally { + * durationTimer.setDuration(); + * PushGatewayWithTimestamp pg = new PushGatewayWithTimestamp("127.0.0.1:9091"); + * pg.pushAdd(registry, "my_batch_job"); + * } + * } + * } + * + * + * See https://github.com/prometheus/pushgateway;> + * https://github.com/prometheus/pushgateway + */ +public class PushGatewayWithTimestamp { + +private static final Logger logger = LoggerFactory.getLogger(PushGatewayWithTimestamp.class); +private final String address; +private static final int SECONDS_PER_MILLISECOND = 1000; +/** + * Construct a Pushgateway, with the given address. + * + * @param address host:port or ip:port of the Pushgateway. + */ +public PushGatewayWithTimestamp(String address) { +this.address = address; +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and no grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, String job) throws IOException { +doRequest(registry, job, null, "PUT", null); +} + +/** + * Pushes all metrics in a Collector, + * replacing all those with the same job and no grouping key. + * + * This is useful for pushing a single Gauge. + * + * This uses the PUT HTTP method. 
+ */ +public void push(Collector collector, String job) throws IOException { +CollectorRegistry registry = new CollectorRegistry(); +collector.register(registry); +push(registry, job); +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, + String job, Map
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166892802 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.metrics.sink + +import java.net.URI +import java.util +import java.util.Properties +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ +import scala.util.Try + +import com.codahale.metrics._ +import io.prometheus.client.CollectorRegistry +import io.prometheus.client.dropwizard.DropwizardExports + +import org.apache.spark.{SecurityManager, SparkConf, SparkContext, SparkEnv} +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.METRICS_NAMESPACE +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp + + +private[spark] class PrometheusSink( + val property: Properties, + val registry: MetricRegistry, + securityMgr: SecurityManager) + extends Sink with Logging { + + protected class Reporter(registry: MetricRegistry) +extends ScheduledReporter( + registry, + "prometheus-reporter", + MetricFilter.ALL, + TimeUnit.SECONDS, + TimeUnit.MILLISECONDS) { + +val defaultSparkConf: SparkConf = new SparkConf(true) + +override def report( + gauges: util.SortedMap[String, Gauge[_]], + counters: util.SortedMap[String, Counter], + histograms: util.SortedMap[String, Histogram], + meters: util.SortedMap[String, Meter], + timers: util.SortedMap[String, Timer]): Unit = { + + // SparkEnv may become available only after metrics sink creation thus retrieving + // SparkConf from spark env here and not during the creation/initialisation of PrometheusSink. 
+ val sparkConf: SparkConf = Option(SparkEnv.get).map(_.conf).getOrElse(defaultSparkConf) + + val metricsNamespace: Option[String] = sparkConf.get(METRICS_NAMESPACE) + val sparkAppId: Option[String] = sparkConf.getOption("spark.app.id") + val executorId: Option[String] = sparkConf.getOption("spark.executor.id") + + logInfo(s"metricsNamespace=$metricsNamespace, sparkAppId=$sparkAppId, " + +s"executorId=$executorId") + + val role: String = (sparkAppId, executorId) match { +case (Some(_), Some(SparkContext.DRIVER_IDENTIFIER)) => "driver" +case (Some(_), Some(_)) => "executor" +case _ => "shuffle" + } + + val job: String = role match { +case "driver" => metricsNamespace.getOrElse(sparkAppId.get) +case "executor" => metricsNamespace.getOrElse(sparkAppId.get) +case _ => metricsNamespace.getOrElse("shuffle") + } + logInfo(s"role=$role, job=$job") + + val groupingKey: Map[String, String] = (role, executorId) match { +case ("driver", _) => Map("role" -> role) +case ("executor", Some(id)) => Map ("role" -> role, "number" -> id) +case _ => Map("role" -> role) + } + + --- End diff -- Nit: extra line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166889139 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { +private static final Logger logger = LoggerFactory.getLogger(TextFormatWithTimestamp.class); + +/** + * Content-type for text version 0.0.4. + */ +public static final String CONTENT_TYPE_004 = "text/plain; version=0.0.4; charset=utf-8"; + +private static StringBuilder jsonMessageLogBuilder = new StringBuilder(); --- End diff -- Usage of this variable is questionable for a couple of reasons: - it just keeps growing, it's never cleared or re-initialized. As a consequence from the second call of write it will have invalid content + it acts as a memory leak. - its usage pattern (`writer.write(blah);appendToJsonMessageLogBuilder("blah")`) is pretty verbose, it should be factored out. - it's not thread safe (and it's not documented) - I don't think accessing it as a static member everywhere is a good design. 
It should either - be passed around as a method parameter - or be changed to an instance method. The static write004 could instantiate a new `TextFormatWithTimestamp` and call write on that.
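A minimal sketch of the second option the reviewer describes (hypothetical names like `TextFormatSketch` and `loggedText` are invented here, this is not the PR's actual code): the log buffer becomes an instance field, and the static entry point creates a fresh formatter per call, so state never accumulates across invocations and each call is thread-confined.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class TextFormatSketch {
    // Instance field instead of a static one: each formatter gets a
    // fresh buffer, so nothing leaks between calls.
    private final StringBuilder jsonMessageLog = new StringBuilder();

    // Writes to the real output and mirrors the text into the debug log.
    public void write(Writer writer, String s) throws IOException {
        writer.write(s);
        jsonMessageLog.append(s);
    }

    public String loggedText() {
        return jsonMessageLog.toString();
    }

    // The static entry point instantiates a new formatter per call,
    // as the reviewer suggests, so concurrent calls don't interfere.
    public static String write004(Writer writer, String payload) throws IOException {
        TextFormatSketch formatter = new TextFormatSketch();
        formatter.write(writer, payload);
        return formatter.loggedText();
    }
}
```

With this shape the thread-safety and memory-growth concerns disappear without changing the public static `write004` signature.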
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166896270 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java --- @@ -0,0 +1,320 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import io.prometheus.client.Collector; +import io.prometheus.client.CollectorRegistry; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.net.HttpURLConnection; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.net.URL; +import java.net.URLEncoder; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; + +import java.io.BufferedWriter; +import java.io.IOException; +import java.io.OutputStreamWriter; + +/** + * Export metrics via the Prometheus Pushgateway. + * + * The Prometheus Pushgateway exists to allow ephemeral and + * batch jobs to expose their metrics to Prometheus. + * Since these kinds of jobs may not exist long enough to be scraped, + * they can instead push their metrics to a Pushgateway. + * This class allows pushing the contents of a {@link CollectorRegistry} to + * a Pushgateway. 
+ * + * Example usage: + * + * {@code + * void executeBatchJob() throws Exception { + * CollectorRegistry registry = new CollectorRegistry(); + * Gauge duration = Gauge.build() + * .name("my_batch_job_duration_seconds") + * .help("Duration of my batch job in seconds.") + * .register(registry); + * Gauge.Timer durationTimer = duration.startTimer(); + * try { + * // Your code here. + * + * // This is only added to the registry after success, + * // so that a previous success in the Pushgateway isn't overwritten on failure. + * Gauge lastSuccess = Gauge.build() + * .name("my_batch_job_last_success") + * .help("Last time my batch job succeeded, in unixtime.") + * .register(registry); + * lastSuccess.setToCurrentTime(); + * } finally { + * durationTimer.setDuration(); + * PushGatewayWithTimestamp pg = new PushGatewayWithTimestamp("127.0.0.1:9091"); + * pg.pushAdd(registry, "my_batch_job"); + * } + * } + * } + * + * + * See https://github.com/prometheus/pushgateway;> + * https://github.com/prometheus/pushgateway + */ +public class PushGatewayWithTimestamp { + +private static final Logger logger = LoggerFactory.getLogger(PushGatewayWithTimestamp.class); +private final String address; +private static final int SECONDS_PER_MILLISECOND = 1000; +/** + * Construct a Pushgateway, with the given address. + * + * @param address host:port or ip:port of the Pushgateway. + */ +public PushGatewayWithTimestamp(String address) { +this.address = address; +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and no grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, String job) throws IOException { +doRequest(registry, job, null, "PUT", null); +} + +/** + * Pushes all metrics in a Collector, + * replacing all those with the same job and no grouping key. + * + * This is useful for pushing a single Gauge. + * + * This uses the PUT HTTP method. 
+ */ +public void push(Collector collector, String job) throws IOException { +CollectorRegistry registry = new CollectorRegistry(); +collector.register(registry); +push(registry, job); +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, + String job, Map
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166883368 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { --- End diff -- No doc. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166881873 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java --- @@ -0,0 +1,320 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import io.prometheus.client.Collector; +import io.prometheus.client.CollectorRegistry; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.net.HttpURLConnection; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.net.URL; +import java.net.URLEncoder; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; + +import java.io.BufferedWriter; +import java.io.IOException; +import java.io.OutputStreamWriter; + +/** + * Export metrics via the Prometheus Pushgateway. + * + * The Prometheus Pushgateway exists to allow ephemeral and + * batch jobs to expose their metrics to Prometheus. + * Since these kinds of jobs may not exist long enough to be scraped, + * they can instead push their metrics to a Pushgateway. + * This class allows pushing the contents of a {@link CollectorRegistry} to + * a Pushgateway. 
+ * + * Example usage: + * + * {@code + * void executeBatchJob() throws Exception { + * CollectorRegistry registry = new CollectorRegistry(); + * Gauge duration = Gauge.build() + * .name("my_batch_job_duration_seconds") + * .help("Duration of my batch job in seconds.") + * .register(registry); + * Gauge.Timer durationTimer = duration.startTimer(); + * try { + * // Your code here. + * + * // This is only added to the registry after success, + * // so that a previous success in the Pushgateway isn't overwritten on failure. + * Gauge lastSuccess = Gauge.build() + * .name("my_batch_job_last_success") + * .help("Last time my batch job succeeded, in unixtime.") + * .register(registry); + * lastSuccess.setToCurrentTime(); + * } finally { + * durationTimer.setDuration(); + * PushGatewayWithTimestamp pg = new PushGatewayWithTimestamp("127.0.0.1:9091"); + * pg.pushAdd(registry, "my_batch_job"); + * } + * } + * } + * + * + * See https://github.com/prometheus/pushgateway;> + * https://github.com/prometheus/pushgateway + */ +public class PushGatewayWithTimestamp { + +private static final Logger logger = LoggerFactory.getLogger(PushGatewayWithTimestamp.class); +private final String address; +private static final int SECONDS_PER_MILLISECOND = 1000; +/** + * Construct a Pushgateway, with the given address. + * + * @param address host:port or ip:port of the Pushgateway. + */ +public PushGatewayWithTimestamp(String address) { +this.address = address; +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and no grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, String job) throws IOException { +doRequest(registry, job, null, "PUT", null); +} + +/** + * Pushes all metrics in a Collector, + * replacing all those with the same job and no grouping key. + * + * This is useful for pushing a single Gauge. + * + * This uses the PUT HTTP method. 
+ */ +public void push(Collector collector, String job) throws IOException { +CollectorRegistry registry = new CollectorRegistry(); +collector.register(registry); +push(registry, job); +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, + String job, Map
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166886527 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { +private static final Logger logger = LoggerFactory.getLogger(TextFormatWithTimestamp.class); + +/** + * Content-type for text version 0.0.4. + */ +public static final String CONTENT_TYPE_004 = "text/plain; version=0.0.4; charset=utf-8"; + +private static StringBuilder jsonMessageLogBuilder = new StringBuilder(); + +public static void write004(Writer writer, +Enumeration mfs)throws IOException { +write004(writer, mfs, null); +} + +/** + * Write out the text version 0.0.4 of the given MetricFamilySamples. + */ +public static void write004(Writer writer,Enumeration mfs, +String timestamp) throws IOException { +/* See http://prometheus.io/docs/instrumenting/exposition_formats/ + * for the output format specification. 
*/ +while(mfs.hasMoreElements()) { +Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement(); --- End diff -- Also, method body is not indented well.
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166896081 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/PushGatewayWithTimestamp.java --- @@ -0,0 +1,320 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import io.prometheus.client.Collector; +import io.prometheus.client.CollectorRegistry; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.net.HttpURLConnection; +import java.net.InetAddress; +import java.net.UnknownHostException; +import java.net.URL; +import java.net.URLEncoder; +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; + +import java.io.BufferedWriter; +import java.io.IOException; +import java.io.OutputStreamWriter; + +/** + * Export metrics via the Prometheus Pushgateway. + * + * The Prometheus Pushgateway exists to allow ephemeral and + * batch jobs to expose their metrics to Prometheus. + * Since these kinds of jobs may not exist long enough to be scraped, + * they can instead push their metrics to a Pushgateway. + * This class allows pushing the contents of a {@link CollectorRegistry} to + * a Pushgateway. 
+ * + * Example usage: + * + * {@code + * void executeBatchJob() throws Exception { + * CollectorRegistry registry = new CollectorRegistry(); + * Gauge duration = Gauge.build() + * .name("my_batch_job_duration_seconds") + * .help("Duration of my batch job in seconds.") + * .register(registry); + * Gauge.Timer durationTimer = duration.startTimer(); + * try { + * // Your code here. + * + * // This is only added to the registry after success, + * // so that a previous success in the Pushgateway isn't overwritten on failure. + * Gauge lastSuccess = Gauge.build() + * .name("my_batch_job_last_success") + * .help("Last time my batch job succeeded, in unixtime.") + * .register(registry); + * lastSuccess.setToCurrentTime(); + * } finally { + * durationTimer.setDuration(); + * PushGatewayWithTimestamp pg = new PushGatewayWithTimestamp("127.0.0.1:9091"); + * pg.pushAdd(registry, "my_batch_job"); + * } + * } + * } + * + * + * See https://github.com/prometheus/pushgateway;> + * https://github.com/prometheus/pushgateway + */ +public class PushGatewayWithTimestamp { + +private static final Logger logger = LoggerFactory.getLogger(PushGatewayWithTimestamp.class); +private final String address; +private static final int SECONDS_PER_MILLISECOND = 1000; +/** + * Construct a Pushgateway, with the given address. + * + * @param address host:port or ip:port of the Pushgateway. + */ +public PushGatewayWithTimestamp(String address) { +this.address = address; +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and no grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, String job) throws IOException { +doRequest(registry, job, null, "PUT", null); +} + +/** + * Pushes all metrics in a Collector, + * replacing all those with the same job and no grouping key. + * + * This is useful for pushing a single Gauge. + * + * This uses the PUT HTTP method. 
+ */ +public void push(Collector collector, String job) throws IOException { +CollectorRegistry registry = new CollectorRegistry(); +collector.register(registry); +push(registry, job); +} + +/** + * Pushes all metrics in a registry, + * replacing all those with the same job and grouping key. + * + * This uses the PUT HTTP method. + */ +public void push(CollectorRegistry registry, + String job, Map
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166886839 --- Diff: core/src/main/java/org/apache/spark/metrics/prometheus/client/exporter/TextFormatWithTimestamp.java --- @@ -0,0 +1,178 @@ +/* + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.metrics.prometheus.client.exporter; + +import java.io.IOException; +import java.io.Writer; +import java.util.Enumeration; + +import io.prometheus.client.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class TextFormatWithTimestamp { +private static final Logger logger = LoggerFactory.getLogger(TextFormatWithTimestamp.class); + +/** + * Content-type for text version 0.0.4. + */ +public static final String CONTENT_TYPE_004 = "text/plain; version=0.0.4; charset=utf-8"; + +private static StringBuilder jsonMessageLogBuilder = new StringBuilder(); + +public static void write004(Writer writer, +Enumeration mfs)throws IOException { +write004(writer, mfs, null); +} + +/** + * Write out the text version 0.0.4 of the given MetricFamilySamples. + */ +public static void write004(Writer writer,Enumeration mfs, +String timestamp) throws IOException { +/* See http://prometheus.io/docs/instrumenting/exposition_formats/ + * for the output format specification. 
*/ +while(mfs.hasMoreElements()) { +Collector.MetricFamilySamples metricFamilySamples = mfs.nextElement(); + +logger.debug("Metrics data"); +logger.debug(metricFamilySamples.toString()); +logger.debug("Logging metrics as a json format:"); + + +writer.write("# HELP "); +appendToJsonMessageLogBuilder("# HELP "); +writer.write(metricFamilySamples.name); +appendToJsonMessageLogBuilder(metricFamilySamples.name); +writer.write(' '); +appendToJsonMessageLogBuilder(' '); +writeEscapedHelp(writer, metricFamilySamples.help); +writer.write('\n'); +appendToJsonMessageLogBuilder('\n'); + +writer.write("# TYPE "); +appendToJsonMessageLogBuilder("# TYPE "); +writer.write(metricFamilySamples.name); +appendToJsonMessageLogBuilder(metricFamilySamples.name); +writer.write(' '); +appendToJsonMessageLogBuilder(' '); +writer.write(typeString(metricFamilySamples.type)); + appendToJsonMessageLogBuilder(typeString(metricFamilySamples.type)); +writer.write('\n'); +appendToJsonMessageLogBuilder('\n'); + +for (Collector.MetricFamilySamples.Sample sample: metricFamilySamples.samples) { +writer.write(sample.name); +appendToJsonMessageLogBuilder(sample.name); +if (sample.labelNames.size() > 0) { +writer.write('{'); +appendToJsonMessageLogBuilder('{'); +for (int i = 0; i < sample.labelNames.size(); ++i) { +writer.write(sample.labelNames.get(i)); + appendToJsonMessageLogBuilder(sample.labelNames.get(i)); +writer.write("=\""); +appendToJsonMessageLogBuilder("=\""); +writeEscapedLabelValue(writer, sample.labelValues.get(i)); +writer.write("\","); +appendToJsonMessageLogBuilder("\","); +} +writer.write('}'); +appendToJsonMessageLogBuilder('}'); +} +writer.write(' '); +appendToJsonMessageLogBuilder(' '); +writer.write(Collector.doubleToGoString(sample.value)); + appendToJsonMessageLogBuilder(Collector.doubleToGoString(sample.value)); +if(timestamp != null && !timestamp.isEmpty()) { +writer.write(" " + timestamp); +appendToJsonMessageLogBuilder(" " + timestamp); +} +writer.write('\n'); +
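The repeated `writer.write(x); appendToJsonMessageLogBuilder(x)` pairs in the quoted loop above are the verbosity the reviewer flagged; one way to factor them out (the helper name `writeAndLog` is invented for illustration) is a single method that feeds both sinks:

```java
import java.io.IOException;
import java.io.Writer;

public class DualWriteSketch {
    // One call replaces each writer.write/append pair, halving the
    // line count and keeping the output and the log trivially in sync.
    public static void writeAndLog(Writer writer, StringBuilder log, String s)
            throws IOException {
        writer.write(s);
        log.append(s);
    }
}
```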
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166894472 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.metrics.sink + +import java.net.URI +import java.util +import java.util.Properties +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ +import scala.util.Try + +import com.codahale.metrics._ +import io.prometheus.client.CollectorRegistry +import io.prometheus.client.dropwizard.DropwizardExports + +import org.apache.spark.{SecurityManager, SparkConf, SparkContext, SparkEnv} +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.METRICS_NAMESPACE +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp + + +private[spark] class PrometheusSink( + val property: Properties, + val registry: MetricRegistry, + securityMgr: SecurityManager) + extends Sink with Logging { + + protected class Reporter(registry: MetricRegistry) +extends ScheduledReporter( + registry, + "prometheus-reporter", + MetricFilter.ALL, + TimeUnit.SECONDS, + TimeUnit.MILLISECONDS) { + +val defaultSparkConf: SparkConf = new SparkConf(true) + +override def report( + gauges: util.SortedMap[String, Gauge[_]], + counters: util.SortedMap[String, Counter], + histograms: util.SortedMap[String, Histogram], + meters: util.SortedMap[String, Meter], + timers: util.SortedMap[String, Timer]): Unit = { + + // SparkEnv may become available only after metrics sink creation thus retrieving + // SparkConf from spark env here and not during the creation/initialisation of PrometheusSink. 
+ val sparkConf: SparkConf = Option(SparkEnv.get).map(_.conf).getOrElse(defaultSparkConf) + + val metricsNamespace: Option[String] = sparkConf.get(METRICS_NAMESPACE) + val sparkAppId: Option[String] = sparkConf.getOption("spark.app.id") + val executorId: Option[String] = sparkConf.getOption("spark.executor.id") + + logInfo(s"metricsNamespace=$metricsNamespace, sparkAppId=$sparkAppId, " + +s"executorId=$executorId") + + val role: String = (sparkAppId, executorId) match { +case (Some(_), Some(SparkContext.DRIVER_IDENTIFIER)) => "driver" +case (Some(_), Some(_)) => "executor" +case _ => "shuffle" + } + + val job: String = role match { +case "driver" => metricsNamespace.getOrElse(sparkAppId.get) +case "executor" => metricsNamespace.getOrElse(sparkAppId.get) +case _ => metricsNamespace.getOrElse("shuffle") + } + logInfo(s"role=$role, job=$job") + + val groupingKey: Map[String, String] = (role, executorId) match { +case ("driver", _) => Map("role" -> role) +case ("executor", Some(id)) => Map ("role" -> role, "number" -> id) +case _ => Map("role" -> role) + } + + + pushGateway.pushAdd(pushRegistry, job, groupingKey.asJava, +s"${System.currentTimeMillis}") + +} + + } + + val DEFAULT_PUSH_PERIOD: Int = 10 + val DEFAULT_PUSH_PERIOD_UNIT: TimeUnit = TimeUnit.SECONDS + val DEFAULT_PUSHGATEWAY_ADDRESS: String = "127.0.0.1:9091" + val DEFAULT_PUSHGATEWAY_ADDRESS_PROTOCOL: String = "http" + + val KEY_PUSH_PERIOD = "period" + val KEY_PUSH_PERIOD_UNIT = "unit" + val KEY_PUSHGATEWAY_ADDRESS = "pushgateway-address" + val
[GitHub] spark pull request #19775: [SPARK-22343][core] Add support for publishing Sp...
Github user smurakozi commented on a diff in the pull request: https://github.com/apache/spark/pull/19775#discussion_r166894977 --- Diff: core/src/main/scala/org/apache/spark/metrics/sink/PrometheusSink.scala --- @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.metrics.sink + +import java.net.URI +import java.util +import java.util.Properties +import java.util.concurrent.TimeUnit + +import scala.collection.JavaConverters._ +import scala.util.Try + +import com.codahale.metrics._ +import io.prometheus.client.CollectorRegistry +import io.prometheus.client.dropwizard.DropwizardExports + +import org.apache.spark.{SecurityManager, SparkConf, SparkContext, SparkEnv} +import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.METRICS_NAMESPACE +import org.apache.spark.metrics.MetricsSystem +import org.apache.spark.metrics.prometheus.client.exporter.PushGatewayWithTimestamp + + +private[spark] class PrometheusSink( + val property: Properties, + val registry: MetricRegistry, + securityMgr: SecurityManager) --- End diff -- `securityMgr` is never used
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Github user peshopetrov commented on the issue: https://github.com/apache/spark/pull/20512 For completeness it should be possible to enable OS-level TCP keepalives. The client already enables TCP keepalive on its side, and it should be possible on the server too. Independent of that, it perhaps makes sense to also have application-level heartbeats, because the JVM does not allow tuning the timeouts of TCP keepalive.
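The server-side half of this suggestion can be sketched as follows (hypothetical code, not the PR's actual change). Note that Java exposes only the on/off switch via `Socket.setKeepAlive`; the idle time, probe interval, and probe count are tuned at the OS level (e.g. `net.ipv4.tcp_keepalive_time` on Linux), which is why application-level heartbeats are still useful for tighter timeouts.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveSketch {
    // Enables SO_KEEPALIVE on each accepted connection. The JVM can
    // only toggle the option; probe timing comes from OS settings.
    public static Socket acceptWithKeepAlive(ServerSocket server) throws IOException {
        Socket client = server.accept();
        client.setKeepAlive(true);
        return client;
    }
}
```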
[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20396 kindly ping @jkbradley @MLnick @viirya
[GitHub] spark issue #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=ST...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20543 Can one of the admins verify this patch?
[GitHub] spark pull request #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pat...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/20543 [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=STRING' adds a 'Partitioned' display similar to Hive; when the partition is empty, it also shows an empty partition field [] ## What changes were proposed in this pull request? 'SHOW TABLE EXTENDED LIKE pattern=STRING' adds a 'Partitioned' display similar to Hive; when the partition is empty, it also shows an empty partition field [] hive: ![3](https://user-images.githubusercontent.com/26266482/35967523-12a15c70-0cfc-11e8-88ce-36b2595c1512.png) sparkSQL Non-partitioned table fix before: ![1](https://user-images.githubusercontent.com/26266482/35967561-32098ede-0cfc-11e8-8382-57ae4857556b.png) sparkSQL partitioned table fix before: ![2](https://user-images.githubusercontent.com/26266482/35967572-3e4f1150-0cfc-11e8-9956-5007ccb50761.png) sparkSQL Non-partitioned table fix after: ![4](https://user-images.githubusercontent.com/26266482/35967586-493376b0-0cfc-11e8-8652-0618912bd63f.png) sparkSQL partitioned table fix after: ![5](https://user-images.githubusercontent.com/26266482/35967602-52474588-0cfc-11e8-9b34-7532c6abae00.png) ## How was this patch tested? manual tests Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-23357 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20543.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20543 commit 073542d7199acddfbb122d28ab5110f638c2ec82 Author: guoxiaolong Date: 2018-02-08T10:12:22Z [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pattern=STRING' adds a 'Partitioned' display similar to Hive; when the partition is empty, it also shows an empty partition field []
[GitHub] spark pull request #20518: [SPARK-22119][FOLLOWUP][ML] Use spherical KMeans ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20518#discussion_r166884184 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala --- @@ -745,4 +763,27 @@ private[spark] class CosineDistanceMeasure extends DistanceMeasure { override def distance(v1: VectorWithNorm, v2: VectorWithNorm): Double = { 1 - dot(v1.vector, v2.vector) / v1.norm / v2.norm } + + /** + * Updates the value of `sum` adding the `point` vector. + * @param point a `VectorWithNorm` to be added to `sum` of a cluster + * @param sum the `sum` for a cluster to be updated + */ + override def updateClusterSum(point: VectorWithNorm, sum: Vector): Unit = { +axpy(1.0 / point.norm, point.vector, sum) --- End diff -- the cosine similarity/distance is not defined for zero vectors: if there were zero vectors, we would have hit earlier failures while computing any cosine distance involving them.
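To make the discussion concrete, here is a minimal, self-contained sketch of the cosine distance and the normalized cluster-sum update from the diff, using plain arrays instead of Spark's `Vector` (the function names mirror the diff but are illustrative):

```scala
// Cosine distance: 1 - dot(v1, v2) / (||v1|| * ||v2||).
def dot(a: Array[Double], b: Array[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

def norm(v: Array[Double]): Double = math.sqrt(dot(v, v))

def cosineDistance(v1: Array[Double], v2: Array[Double]): Double = {
  // Undefined for zero vectors, which is why zero points would
  // already have failed earlier, as the comment above notes.
  require(norm(v1) > 0 && norm(v2) > 0, "cosine distance is undefined for zero vectors")
  1.0 - dot(v1, v2) / (norm(v1) * norm(v2))
}

// The diff's updateClusterSum adds the *normalized* point,
// i.e. axpy(1.0 / ||point||, point, sum):
def updateClusterSum(point: Array[Double], sum: Array[Double]): Unit = {
  val n = norm(point)
  for (i <- point.indices) sum(i) += point(i) / n
}
```

For example, parallel vectors such as (3, 4) and (6, 8) have cosine distance 0, and adding (3, 4) to an empty sum contributes its unit vector (0.6, 0.8).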
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r166877282 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -320,6 +321,41 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft f2.get() } + test("Interruptible iterator of shuffle reader") { +import JobCancellationSuite._ +sc = new SparkContext("local[2]", "test") + +val f = sc.parallelize(1 to 1000, 2).map { i => (i, i) } + .repartitionAndSortWithinPartitions(new HashPartitioner(2)) + .mapPartitions { iter => +taskStartedSemaphore.release() +// Small delay to ensure that foreach is cancelled if task is killed +Thread.sleep(1000) --- End diff -- +1
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r166875113 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -320,6 +321,41 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft f2.get() } + test("Interruptible iterator of shuffle reader") { +import JobCancellationSuite._ +sc = new SparkContext("local[2]", "test") + +val f = sc.parallelize(1 to 1000, 2).map { i => (i, i) } + .repartitionAndSortWithinPartitions(new HashPartitioner(2)) + .mapPartitions { iter => +taskStartedSemaphore.release() +// Small delay to ensure that foreach is cancelled if task is killed +Thread.sleep(1000) --- End diff -- I think using `sleep` will make the UT flaky; you should switch to a deterministic approach.
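One deterministic alternative, sketched here with plain threads rather than Spark tasks (names are illustrative): hand off control with semaphores instead of a fixed `Thread.sleep`, so the test never races the task.

```scala
import java.util.concurrent.Semaphore

// Semaphores make the ordering explicit: the test waits for the task to
// start, then triggers cancellation, with no timing assumptions.
val taskStarted = new Semaphore(0)
val cancelIssued = new Semaphore(0)

val task = new Thread(() => {
  taskStarted.release()   // signal: the task body is running
  cancelIssued.acquire()  // block until the test has issued the cancel
  // ... cancellation-sensitive work would go here ...
})
task.start()
taskStarted.acquire()     // deterministic wait, replacing Thread.sleep(1000)
cancelIssued.release()    // now it is safe to exercise the kill path
task.join()
```

No sleep duration is ever guessed, so the test cannot become flaky on a slow machine.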
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/711/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...
Github user matyix commented on the issue: https://github.com/apache/spark/pull/19775 Hello @erikerlandson @felixcheung @jerryshao - any feedback on this PR? Shall I close it and not worry about this being merged upstream anymore? We've been using this in production for the last 3 months, and it's a bit awkward that our CI/CD system needs to `patch` the upstream version all the time, but we can live with that (since it's automated). Please advise. Happy to help get it merged, or eventually just close it.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87211/testReport)** for PR 20525 at commit [`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4).
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19077 **[Test build #87212 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87212/testReport)** for PR 19077 at commit [`78ede3f`](https://github.com/apache/spark/commit/78ede3fae243c5379eaea1c86584e200ce697c19).
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20541 **[Test build #87210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87210/testReport)** for PR 20541 at commit [`36dbc9c`](https://github.com/apache/spark/commit/36dbc9c543f36dc5952a89c354bd70067ddd6883).
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Merged build finished. Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/710/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/20525 retest this please
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20449 I see. Thanks.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/19077 retest this please
[GitHub] spark issue #20541: [SPARK-23356][SQL]Pushes Project to both sides of Union ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20541 ok to test
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Merged build finished. Test PASSed.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87206/ Test PASSed.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20499 **[Test build #87206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87206/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 > I was wondering do we actually meet this issue in production envs, @jerryshao I met this issue in our production environment while debugging a Spark job. I noticed the aborted stage's tasks continued running until they finished. I cannot give minimal reproduction code since the failure is related to our mixed (online and offline services) hosts. But you can look at the test case I added; it essentially captures the transformation I used, except for the async part.
[GitHub] spark pull request #20541: [SPARK-23356][SQL]Pushes Project to both sides of...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20541#discussion_r166870474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -400,13 +400,24 @@ object PushProjectionThroughUnion extends Rule[LogicalPlan] with PredicateHelper // Push down deterministic projection through UNION ALL case p @ Project(projectList, Union(children)) => assert(children.nonEmpty) - if (projectList.forall(_.deterministic)) { -val newFirstChild = Project(projectList, children.head) + val (deterministicList, nonDeterministic) = projectList.partition(_.deterministic) + + if (deterministicList.nonEmpty) { +val newFirstChild = Project(deterministicList, children.head) val newOtherChildren = children.tail.map { child => val rewrites = buildRewrites(children.head, child) - Project(projectList.map(pushToRight(_, rewrites)), child) + Project(deterministicList.map(pushToRight(_, rewrites)), child) --- End diff -- do we push `a + 1` to union children? or just `a`?
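The mechanics of the diff's `partition` call can be sketched independently of Catalyst (the `Expr` case class below is a stand-in for Catalyst expressions, not Spark's actual API):

```scala
// Stand-in for a Catalyst named expression: just a name and a determinism flag.
case class Expr(name: String, deterministic: Boolean)

val projectList = Seq(
  Expr("a", deterministic = true),
  Expr("a + 1", deterministic = true),  // a whole deterministic expression, not just `a`
  Expr("rand()", deterministic = false))

// As written in the diff, partition keeps whole expressions such as `a + 1`
// on the pushed-down side -- the point the reviewer's question probes --
// while non-deterministic ones would stay above the Union.
val (deterministicList, nonDeterministic) = projectList.partition(_.deterministic)
```

So under this reading, `a + 1` itself is pushed to the union children, not only the attribute `a`.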
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Merged build finished. Test PASSed.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20521 **[Test build #87209 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87209/testReport)** for PR 20521 at commit [`326a8dd`](https://github.com/apache/spark/commit/326a8dd65a4315f7fa678fb4024d0cf2f19e252e).
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/709/ Test PASSed.
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20449 I understood your intention. I was wondering whether we actually hit this issue in production environments, or whether you have minimal reproduction code?
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20521 retest this please
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 Hi, @jerryshao I didn't see an exception. But the issue is: when the stage is aborted and all the remaining tasks are killed, those tasks are not actually cancelled but continue running, which is a waste of executor resources.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87204/ Test FAILed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/708/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87208/testReport)** for PR 20477 at commit [`0efd5d3`](https://github.com/apache/spark/commit/0efd5d3919e24a480a9771ddd1d81bef11341e94).
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/707/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87204/testReport)** for PR 20525 at commit [`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #87207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87207/testReport)** for PR 20535 at commit [`e92b6b2`](https://github.com/apache/spark/commit/e92b6b2083c4dbf31c27c961096a45cd8d84f16e).
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20477 retest this please
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20535 retest this please
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20449 @advancedxy did you see any issue or exception regarding this?
[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20527#discussion_r166865177 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil) } } + + // TODO: This test is copied from HiveDDLSuite, unify it later. + test("SPARK-23348: append data to data source table with saveAsTable") { --- End diff -- maybe we can add this when unifying the test cases?
[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20527#discussion_r166865065 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit """.stripMargin) } - castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns) + insert.copy(query = newQuery, partition = normalizedPartSpec) --- End diff -- it's also ok to always copy it, and the code is neater.
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r166864792 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -104,9 +104,18 @@ private[spark] class BlockStoreShuffleReader[K, C]( context.taskMetrics().incMemoryBytesSpilled(sorter.memoryBytesSpilled) context.taskMetrics().incDiskBytesSpilled(sorter.diskBytesSpilled) context.taskMetrics().incPeakExecutionMemory(sorter.peakMemoryUsedBytes) +// Use completion callback to stop sorter if task was cancelled. +context.addTaskCompletionListener(tc => { + // Note: we only stop sorter if cancelled as sorter.stop wouldn't be called in + // CompletionIterator. Another way would be making sorter.stop idempotent. + if (tc.isInterrupted()) { sorter.stop() } --- End diff -- > I may be missing something obvious, but seems ExternalSorter.stop() is already idempotent? Ah, yes. After another look, it's indeed idempotent. Will update the code.
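The idempotence pattern being discussed can be sketched with a guard flag (this is an illustrative toy, not `ExternalSorter` itself):

```scala
// A stop() made idempotent by a guard flag: the first call releases
// resources, later calls are no-ops, so a completion listener and a
// CompletionIterator may both call it safely.
class ToySorter {
  private var stopped = false
  var releaseCount = 0  // stand-in for freeing memory / deleting spill files

  def stop(): Unit = {
    if (!stopped) {
      stopped = true
      releaseCount += 1
    }
  }
}
```

With this shape, double-stopping from two cleanup paths releases resources exactly once.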