[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99726/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi @steveloughran OK. I'll change the approach to just check against the final path for each move. As @steveloughran stated, this may bring a performance hit for each check when dealing with object stores, so we may also need to provide a way to disable the check, to be used with caution. (Btw, if moving a file in an object store costs far more than globbing, slow globbing may not be a big deal.)
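One possible reading of "check against the final path for each move" is: before archiving a completed file, confirm that its destination does not itself match the source glob, so the archived file is never picked up as new input. The sketch below assumes that reading. The `ArchiveCheck` object, `archivePath`, `safeToMove`, and the `checkEnabled` opt-out are hypothetical names, not Spark's implementation, and the check here is pure in-memory glob matching on local paths.

```scala
import java.nio.file.{FileSystems, Path, Paths}

// Hypothetical helper; not Spark's FileStreamSource code.
object ArchiveCheck {
  // Assumed layout: keep the source's full path underneath the archive directory.
  def archivePath(archiveDir: Path, src: Path): Path =
    archiveDir.resolve(src.getRoot.relativize(src))

  // True when moving to `dest` is safe, i.e. `dest` does not match the source glob.
  // `checkEnabled = false` models the "disable the check, with caution" option.
  def safeToMove(sourceGlob: String, dest: Path, checkEnabled: Boolean = true): Boolean =
    !checkEnabled || !FileSystems.getDefault.getPathMatcher("glob:" + sourceGlob).matches(dest)
}
```

With the source glob `/data/**`, archiving into `/data/archive/...` would be rejected because `**` crosses directory boundaries, while an archive directory outside `/data` passes.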
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239060606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLShuffleMetricsReporter.scala --- @@ -95,3 +96,59 @@ private[spark] object SQLShuffleMetricsReporter { FETCH_WAIT_TIME -> SQLMetrics.createTimingMetric(sc, "fetch wait time"), RECORDS_READ -> SQLMetrics.createMetric(sc, "records read")) } + +/** + * A shuffle write metrics reporter for SQL exchange operators. Different with + * [[SQLShuffleReadMetricsReporter]], we need a function of (reporter => reporter) set in + * shuffle dependency, so the local SQLMetric should transient and create on executor. + * @param metrics Shuffle write metrics in current SparkPlan. + * @param metricsReporter Other reporter need to be updated in this SQLShuffleWriteMetricsReporter. + */ +private[spark] case class SQLShuffleWriteMetricsReporter( +metrics: Map[String, SQLMetric])(metricsReporter: ShuffleWriteMetricsReporter) + extends ShuffleWriteMetricsReporter with Serializable { + @transient private[this] lazy val _bytesWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_BYTES_WRITTEN) + @transient private[this] lazy val _recordsWritten = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_RECORDS_WRITTEN) + @transient private[this] lazy val _writeTime = +metrics(SQLShuffleWriteMetricsReporter.SHUFFLE_WRITE_TIME) + + override private[spark] def incBytesWritten(v: Long): Unit = { +metricsReporter.incBytesWritten(v) +_bytesWritten.add(v) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_recordsWritten.set(_recordsWritten.value - v) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +metricsReporter.incRecordsWritten(v) +_recordsWritten.add(v) + } + override private[spark] def incWriteTime(v: Long): Unit = { +metricsReporter.incWriteTime(v) +_writeTime.add(v) + } + override private[spark] def decBytesWritten(v: Long): Unit = { +metricsReporter.decBytesWritten(v) +_bytesWritten.set(_bytesWritten.value - v) + } +} + +private[spark] object SQLShuffleWriteMetricsReporter { + val SHUFFLE_BYTES_WRITTEN = "shuffleBytesWritten" + val SHUFFLE_RECORDS_WRITTEN = "shuffleRecordsWritten" + val SHUFFLE_WRITE_TIME = "shuffleWriteTime" --- End diff -- do we have other time metrics using nanoseconds?
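The scaladoc in the diff above says the reporter is shipped to executors, so its local metrics are `@transient lazy val`s that are resolved again after deserialization. A minimal sketch of that pattern, using hypothetical simplified types (`WriteReporter`, `SimpleCounter`, `ForwardingReporter`) in place of Spark's `private[spark]` classes; a Java-serialization round trip stands in for shipping the object to an executor. In this sketch each `dec*` call forwards to the matching `dec*` method on the delegate.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

// Hypothetical, simplified stand-ins; not Spark's reporter or SQLMetric types.
trait WriteReporter {
  def incBytesWritten(v: Long): Unit
  def decBytesWritten(v: Long): Unit
}

final class SimpleCounter extends Serializable {
  var value: Long = 0L
  def add(v: Long): Unit = value += v
  def set(v: Long): Unit = value = v
}

final class CountingReporter extends WriteReporter with Serializable {
  val bytes = new SimpleCounter
  def incBytesWritten(v: Long): Unit = bytes.add(v)
  def decBytesWritten(v: Long): Unit = bytes.set(bytes.value - v)
}

// Forwards each update to `delegate` and mirrors it into a local metric looked up by name.
// Case classes are Serializable by default.
final case class ForwardingReporter(metrics: Map[String, SimpleCounter])(delegate: WriteReporter)
    extends WriteReporter {
  // @transient: skipped during serialization; lazy: re-resolved from `metrics` on first use.
  @transient private lazy val localBytes: SimpleCounter = metrics("bytesWritten")

  def incBytesWritten(v: Long): Unit = { delegate.incBytesWritten(v); localBytes.add(v) }
  def decBytesWritten(v: Long): Unit = {
    delegate.decBytesWritten(v) // forward to the matching dec* method
    localBytes.set(localBytes.value - v)
  }
  def localValue: Long = localBytes.value
}

object Ser {
  // Serialize and deserialize `obj`, standing in for driver-to-executor shipping.
  def roundTrip[T](obj: T): T = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(obj)
    oos.close()
    val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray)) {
      // Resolve classes with this code's loader so locally defined classes are found.
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        try Class.forName(desc.getName, false, getClass.getClassLoader)
        catch { case _: ClassNotFoundException => super.resolveClass(desc) }
    }
    try ois.readObject().asInstanceOf[T] finally ois.close()
  }
}
```

The deserialized copy updates its own copy of the metric map, leaving the original untouched.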
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99728/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99724/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Merged build finished. Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23235 **[Test build #99728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99728/testReport)** for PR 23235 at commit [`463f9e1`](https://github.com/apache/spark/commit/463f9e16ead2291a7f0f3893e485a56b77da2f06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239059162 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics { Utils.bytesToString } else if (metricsType == TIMING_METRIC) { Utils.msDurationToString + } else if (metricsType == NS_TIMING_METRIC) { +duration => Utils.msDurationToString(duration / 1000 / 1000) --- End diff -- will this string lose the nanosecond precision?
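On the precision question: `duration / 1000 / 1000` is `Long` division, which truncates, so any sub-millisecond remainder is gone before the value reaches a millisecond formatter. The sketch below shows that arithmetic, plus a hypothetical alternative (`nsToMsString`, not part of Spark) that keeps three decimal places of milliseconds.

```scala
import java.util.Locale

object NanoTiming {
  // Same arithmetic as the reviewed line: Long division truncates toward zero.
  def nsToMsTruncated(durationNs: Long): Long = durationNs / 1000 / 1000

  // Hypothetical alternative that keeps fractional milliseconds for display.
  def nsToMsString(durationNs: Long): String =
    "%.3f ms".formatLocal(Locale.ROOT, durationNs / 1e6)
}
```

For example, 1,999,999 ns truncates to 1 ms, and 999,999 ns truncates to 0 ms.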
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99724/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99729/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431).
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5771/ Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5770/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 retest this please
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23235 **[Test build #99728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99728/testReport)** for PR 23235 at commit [`463f9e1`](https://github.com/apache/spark/commit/463f9e16ead2291a7f0f3893e485a56b77da2f06).
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99727/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431).
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 retest this please
[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...
Github user seancxmao commented on the issue: https://github.com/apache/spark/pull/23224 @HyukjinKwon Thank you for your comments! I have filed a JIRA and updated the PR title accordingly.
[GitHub] spark pull request #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial resul...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23235#discussion_r239055208 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -35,6 +35,8 @@ displayTitle: Spark SQL Upgrading Guide - Since Spark 3.0, CSV datasource uses java.time API for parsing and generating CSV content. New formatting implementation supports date/timestamp patterns conformed to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`. + - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode if specified schema is `StructType`. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully. --- End diff -- you are right. I will remove the part about `StructType`
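The migration note describes two behaviors for a malformed record in PERMISSIVE mode: all fields `null` (2.4 and earlier) versus keeping the fields that did convert (3.0). The sketch below models both in plain Scala. It is a simplified illustration, not Spark's CSV parser: `None` stands in for `null`, splitting on commas ignores quoting, and `PermissiveCsv`, its column types, and both function names are hypothetical.

```scala
import scala.util.Try

// Hypothetical model of PERMISSIVE-mode row conversion; not Spark's implementation.
object PermissiveCsv {
  sealed trait ColType
  case object IntCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType

  // None stands in for a SQL null.
  private def convert(raw: String, t: ColType): Option[Any] = t match {
    case IntCol    => Try(raw.trim.toInt).toOption
    case DoubleCol => Try(raw.trim.toDouble).toOption
    case StringCol => Some(raw)
  }

  // 3.0-style: keep every field that converted successfully.
  def partialRow(line: String, schema: Seq[ColType]): Seq[Option[Any]] = {
    val tokens = line.split(",", -1).toSeq
    schema.indices.map(i => tokens.lift(i).flatMap(convert(_, schema(i))))
  }

  // 2.4-style: any failed field turns the whole row into nulls.
  def allNullRow(line: String, schema: Seq[ColType]): Seq[Option[Any]] = {
    val row = partialRow(line, schema)
    if (row.forall(_.isDefined)) row else Seq.fill(schema.size)(None)
  }
}
```

With schema `(Int, Int, String)` and input `1,abc,x`, the 3.0-style row keeps `1` and `"x"` and nulls only the middle field; the 2.4-style row is all nulls.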
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 test this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5769/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 test this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239054315 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -299,12 +312,25 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") val df2 = (1 to 10).map(i => (i, i.toString)).toSeq.toDF("key", "value") // Assume the execution plan is - // ... -> ShuffledHashJoin(nodeId = 1) -> Project(nodeId = 0) + // Project(nodeId = 0) + // +- ShuffledHashJoin(nodeId = 1) + // :- Exchange(nodeId = 2) + // : +- Project(nodeId = 3) + // : +- LocalTableScan(nodeId = 4) + // +- Exchange(nodeId = 5) + // +- Project(nodeId = 6) + // +- LocalTableScan(nodeId = 7) val df = df1.join(df2, "key") testSparkPlanMetrics(df, 1, Map( 1L -> (("ShuffledHashJoin", Map( "number of output rows" -> 2L, - "avg hash probe (min, med, max)" -> "\n(1, 1, 1)" + "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"))), +2L -> (("Exchange", Map( + "shuffle records written" -> 2L, + "records read" -> 2L))), --- End diff -- For most scenarios the answer is yes, but in cases like sort merge join, where two Sort nodes reuse the same child, shuffle records written and records read will differ. I also added test cases for this here: https://github.com/xuanyuanking/spark/blob/SPARK-26193/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala#L217-L222
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99726/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431).
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23195 Merged build finished. Test PASSed.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99723/ Test PASSed.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23195 **[Test build #99723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99723/testReport)** for PR 23195 at commit [`3ad9cb7`](https://github.com/apache/spark/commit/3ad9cb704c7d3daa181aeba0be78dc025dde24e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5768/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 test this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 test this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 test this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 retest this please
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99725/ Test FAILed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test FAILed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99724/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431).
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 retest this please
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23120 a late LGTM as well
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99721/ Test PASSed.
[GitHub] spark issue #23230: [SPARK-26133][ML][Followup] Fix doc for OneHotEncoder
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23230 Thanks @HyukjinKwon
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99702/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99721 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99721/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23195 **[Test build #99723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99723/testReport)** for PR 23195 at commit [`3ad9cb7`](https://github.com/apache/spark/commit/3ad9cb704c7d3daa181aeba0be78dc025dde24e2).
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23227 **[Test build #99702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99702/testReport)** for PR 23227 at commit [`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239050549 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions testSparkPlanMetrics(df, 1, Map( 2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))), + 1L -> (("Exchange", Map( +"shuffle records written" -> 2L, +"records read" -> 2L, +"local blocks fetched" -> 2L, --- End diff -- I agree "fetch" is more of a code-level name from `ShuffleBlockFetcherIterator`, but do you mean just changing the display name in the UI? Many places, even api.scala, use the name `localBlocksFetched`, so changing them all may not be a good choice for backporting code. WDYT?
[GitHub] spark pull request #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial resul...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23235#discussion_r239049825 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -35,6 +35,8 @@ displayTitle: Spark SQL Upgrading Guide - Since Spark 3.0, CSV datasource uses java.time API for parsing and generating CSV content. New formatting implementation supports date/timestamp patterns conformed to ISO 8601. To switch back to the implementation used in Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`. + - In Spark version 2.4 and earlier, CSV datasource converts a malformed CSV string to a row with all `null`s in the PERMISSIVE mode if specified schema is `StructType`. Since Spark 3.0, returned row can contain non-`null` fields if some of CSV column values were parsed and converted to desired types successfully. --- End diff -- Ah, `from_csv` and `to_csv` were added in 3.0, so they are intentionally not mentioned. BTW, I think CSV functionality can only use `StructType`, so maybe we don't have to mention it.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239049398 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -163,6 +171,8 @@ object SQLMetrics { Utils.bytesToString } else if (metricsType == TIMING_METRIC) { Utils.msDurationToString + } else if (metricsType == NANO_TIMING_METRIC) { +duration => Utils.msDurationToString(duration / 10) --- End diff -- Sorry... sorry for this. Changed it to `1000 / 1000`, as other places do, for safety.
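Writing the conversion as `/ 1000 / 1000` avoids counting zeros in a single long literal. The JDK's `java.util.concurrent.TimeUnit` names the same conversion explicitly. A small sketch (the `NsToMs` object and its method names are illustrative only) showing that the chained division, a single `1000000L` divisor, and `TimeUnit.NANOSECONDS.toMillis` agree:

```scala
import java.util.concurrent.TimeUnit

object NsToMs {
  // Chained form used in the diff: each step is a truncating Long division.
  def viaChainedDivision(ns: Long): Long = ns / 1000 / 1000
  // Single-constant form: same result, but the literal is easier to mistype.
  def viaSingleConstant(ns: Long): Long = ns / 1000000L
  // JDK form: the unit conversion is spelled out by name.
  def viaTimeUnit(ns: Long): Long = TimeUnit.NANOSECONDS.toMillis(ns)
}
```

For instance, 1,234,567,890 ns converts to 1,234 ms under all three forms.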
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239049121 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +78,7 @@ object SQLMetrics { private val SUM_METRIC = "sum" private val SIZE_METRIC = "size" private val TIMING_METRIC = "timing" + private val NANO_TIMING_METRIC = "nanosecond" --- End diff -- Done in cf35b9f.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5767/ Test PASSed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239049030 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private[spark] def decBytesWritten(v: Long): Unit private[spark] def decRecordsWritten(v: Long): Unit } + + +/** + * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleWriteMetricsReporter( +reporters: Seq[ShuffleWriteMetricsReporter]) extends ShuffleWriteMetricsReporter { + override private[spark] def incBytesWritten(v: Long): Unit = { +reporters.foreach(_.incBytesWritten(v)) + } + override private[spark] def decRecordsWritten(v: Long): Unit = { +reporters.foreach(_.decRecordsWritten(v)) + } + override private[spark] def incRecordsWritten(v: Long): Unit = { +reporters.foreach(_.incRecordsWritten(v)) + } + override private[spark] def incWriteTime(v: Long): Unit = { +reporters.foreach(_.incWriteTime(v)) + } + override private[spark] def decBytesWritten(v: Long): Unit = { +reporters.foreach(_.decBytesWritten(v)) + } +} + + +/** + * A proxy class of ShuffleReadMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleReadMetricsReporter( --- End diff -- Got it, thanks for your guidance. I reverted to the old approach, with just a few small changes to `SQLShuffleReadMetricsReporter` following https://github.com/apache/spark/pull/23147.
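The `GroupedShuffleWriteMetricsReporter` in the diff above (an approach the author then reverted) is a composite: every update is fanned out to each wrapped reporter. A minimal sketch of that pattern with hypothetical, trimmed-down names (`WriteMetrics`, `CountingMetrics`, `GroupedWriteMetrics`) rather than Spark's `private[spark]` traits, which have more methods:

```scala
// Hypothetical, simplified reporter trait; not Spark's ShuffleWriteMetricsReporter.
trait WriteMetrics {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

// A concrete reporter that just accumulates counts.
final class CountingMetrics extends WriteMetrics {
  var bytes: Long = 0L
  var records: Long = 0L
  def incBytesWritten(v: Long): Unit = bytes += v
  def incRecordsWritten(v: Long): Unit = records += v
}

// Composite: forwards every update to each wrapped reporter, in order.
final class GroupedWriteMetrics(reporters: Seq[WriteMetrics]) extends WriteMetrics {
  def incBytesWritten(v: Long): Unit = reporters.foreach(_.incBytesWritten(v))
  def incRecordsWritten(v: Long): Unit = reporters.foreach(_.incRecordsWritten(v))
}
```

Compared with a wrapper that forwards to one reporter and also updates its own metrics, a grouped proxy treats all of its targets uniformly and holds no metrics itself.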
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239048356
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
   private[spark] def decBytesWritten(v: Long): Unit
   private[spark] def decRecordsWritten(v: Long): Unit
 }
+
+
+/**
+ * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleWriteMetricsReporter(
--- End diff --
Thanks for your guidance, Reynold and Wenchen. I chose the second implementation because it is the lighter-weight option and follows a usage pattern similar to `SQLShuffleReadMetricsReporter`. Done in cf35b9f.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99722/testReport)** for PR 23207 at commit [`cf35b9f`](https://github.com/apache/spark/commit/cf35b9f948f174a5726a7feba611224c4ac495e7).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Merged build finished. Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99700/ Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99700/testReport)** for PR 22683 at commit [`8cc05a5`](https://github.com/apache/spark/commit/8cc05a57e8ecaa3e2a2f67d125b12645bb4eb3a2).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Merged build finished. Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23235 **[Test build #99720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99720/testReport)** for PR 23235 at commit [`8c115f7`](https://github.com/apache/spark/commit/8c115f7871d4db66b13ee21ea3a1231f7153791e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99720/ Test PASSed.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99701/ Test PASSed.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23222 Merged build finished. Test PASSed.
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23228 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99703/ Test FAILed.
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23228 Merged build finished. Test FAILed.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23222 **[Test build #99701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99701/testReport)** for PR 23222 at commit [`1270e89`](https://github.com/apache/spark/commit/1270e89026d80c862137c03edbeee53e56f3ed6d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23228 **[Test build #99703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99703/testReport)** for PR 23228 at commit [`d5dadbf`](https://github.com/apache/spark/commit/d5dadbf30d5429c36ec3d5c2845a71c2717fd6f3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5765/ Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23236 **[Test build #99721 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99721/testReport)** for PR 23236 at commit [`3c4ee75`](https://github.com/apache/spark/commit/3c4ee75c4d0585702cd87cc4df9af74e235bb431).
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Merged build finished. Test PASSed.
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23236 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5766/ Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23235 @cloud-fan Please have a look at the PR.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23235 **[Test build #99720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99720/testReport)** for PR 23235 at commit [`8c115f7`](https://github.com/apache/spark/commit/8c115f7871d4db66b13ee21ea3a1231f7153791e).
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 cc @BryanCutler and @viirya
[GitHub] spark pull request #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial resul...
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/23235
[SPARK-26151][SQL][FOLLOWUP] Return partial results for bad CSV records
## What changes were proposed in this pull request?
Updated the SQL migration guide according to the changes in https://github.com/apache/spark/pull/23120
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 failuresafe-partial-result-followup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23235.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #23235
commit 8c115f7871d4db66b13ee21ea3a1231f7153791e
Author: Maxim Gekk
Date: 2018-12-05T12:13:26Z
Updating the migration guide
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 (cc @squito as well since it's from #23111)
[GitHub] spark issue #23236: [SPARK-26275][PYTHON][ML] Increases timeout for Streamin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23236 retest this please
[GitHub] spark pull request #23236: [SPARK-26275][PYTHON][ML] Increases timeout for S...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/23236
[SPARK-26275][PYTHON][ML] Increases timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction test
## What changes were proposed in this pull request?
This test appears to be flaky:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99569/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99548/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99454/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99609/console
```
==
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
--
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 30 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74
--
Ran 13 tests in 185.051s

FAILED (failures=1, skipped=1)
```
This seems to have started after the parallelism in Jenkins was increased to speed up builds in https://github.com/apache/spark/pull/23111. I am able to reproduce it manually under heavy resource usage (with the timeout manually decreased).
## How was this patch tested?
Manually tested with:
```
cd python
./run-tests --testnames 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction' --python-executables=python
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-26275
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23236.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #23236
commit 3c4ee75c4d0585702cd87cc4df9af74e235bb431
Author: Hyukjin Kwon
Date: 2018-12-05T12:17:21Z
Increases timeout for StreamingLogisticRegressionWithSGDTests.test_training_and_prediction test
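The failure above is raised by an `_eventually(condition)` helper that re-evaluates a condition until it holds or a timeout (30 sec in the log) expires, and this PR raises that timeout. The sketch below is a minimal standalone version of such a polling helper, assuming the condition returns `True` on success and any other value as a diagnostic. The function name `eventually` and its `timeout`/`poll_interval` parameters are hypothetical; this is not the exact code in `test_streaming_algorithms.py`.

```python
import time


def eventually(condition, timeout=30.0, poll_interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    `condition` returns True on success, or any other value describing the
    current state; the last such value is reported if the timeout is hit.
    """
    deadline = time.monotonic() + timeout
    last_value = None
    while time.monotonic() < deadline:
        last_value = condition()
        if last_value is True:
            return
        time.sleep(poll_interval)
    raise AssertionError(
        "Test failed due to timeout after %g sec, with last condition returning: %s"
        % (timeout, last_value))


# Usage 1: a condition that only becomes true on its third evaluation.
calls = {"n": 0}


def condition():
    calls["n"] += 1
    return True if calls["n"] >= 3 else "calls so far: %d" % calls["n"]


eventually(condition, timeout=5.0, poll_interval=0.01)

# Usage 2: a condition that never holds, so the helper times out.
try:
    eventually(lambda: "still waiting", timeout=0.05, poll_interval=0.01)
    timeout_message = None
except AssertionError as e:
    timeout_message = str(e)
print(timeout_message)
```

A larger `timeout` makes a genuinely broken test fail more slowly, but on a loaded CI machine it reduces false failures from a model that simply converges late, which is the trade-off described in the PR.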
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Merged build finished. Test PASSed.
[GitHub] spark issue #23235: [SPARK-26151][SQL][FOLLOWUP] Return partial results for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23235 Can one of the admins verify this patch?
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23120 The PR https://github.com/apache/spark/pull/23235 updates the SQL migration guide.
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Merged build finished. Test PASSed.
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5764/ Test PASSed.
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23227 **[Test build #99719 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99719/testReport)** for PR 23227 at commit [`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735).
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99704/ Test FAILed.
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23227 retest this please
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23227 Merged build finished. Test FAILed.
[GitHub] spark issue #23227: [SPARK-26271][FOLLOW-UP][SQL] remove unuse object SparkP...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23227 **[Test build #99704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/testReport)** for PR 23227 at commit [`5cb416d`](https://github.com/apache/spark/commit/5cb416df5f03b0d750c83e1a8a344b8ea44b1735).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23232 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5763/ Test PASSed.
[GitHub] spark issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23232 Merged build finished. Test PASSed.
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23233 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5762/ Test PASSed.
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23233 Merged build finished. Test PASSed.
[GitHub] spark issue #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow when enco...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23234 **[Test build #99718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99718/testReport)** for PR 23234 at commit [`930c510`](https://github.com/apache/spark/commit/930c51029b845c74357305e7ec30a4f2e6ea748a).
[GitHub] spark issue #23233: [SPARK-26233][SQL][BACKPORT-2.3] CheckOverflow when enco...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23233 **[Test build #99717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99717/testReport)** for PR 23233 at commit [`a1e7744`](https://github.com/apache/spark/commit/a1e77445c2675137fbcddf73181c47469f159dbf).
[GitHub] spark issue #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5761/ Test PASSed.
[GitHub] spark issue #23234: [SPARK-26233][SQL][BACKPORT-2.2] CheckOverflow when enco...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23234 Merged build finished. Test PASSed.
[GitHub] spark issue #23210: [SPARK-26233][SQL] CheckOverflow when encoding a decimal...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23210 thanks @cloud-fan @dongjoon-hyun, I created the PRs for the backports.