[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218 **[Test build #4452 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4452/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5743/ Test PASSed.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23221 **[Test build #99691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99691/testReport)** for PR 23221 at commit [`e58fc91`](https://github.com/apache/spark/commit/e58fc919355c48d2d3b1cacb4d0ee18036cacbc6).
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Merged build finished. Test PASSed.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238895286

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---

```
@@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
     val (comments, code) = input.split("\n").partition(_.startsWith("--"))

     // Runs all the tests on both codegen-only and interpreter modes
-    val codegenConfigSets = Array(CODEGEN_ONLY, NO_CODEGEN).map {
-      case codegenFactoryMode =>
-        Array(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenFactoryMode.toString)
+    val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map {
```

End diff --

I will check the time later, too.
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238894837

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala ---

```
@@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB
     assert(unsafeRow.getSizeInBytes ==
       8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes))
   }
+
+  testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") {
+    def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = {
+      val unsafeProj = UnsafeProjection.create(fields)
+      val unsafeRow = unsafeProj(inputRow)
+      val safeProj = SafeProjection.create(fields)
+      safeProj(unsafeRow)
+    }
+
+    // Simple tests
+    val inputRow = InternalRow.fromSeq(Seq(
+      false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"),
+      Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2)
+    ))
+    val fields1 = Array(
+      BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType,
+      DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType,
+      BinaryType)
+
+    assert(convertBackToInternalRow(inputRow, fields1) === inputRow)
+
+    // Array tests
+    val arrayRow = InternalRow.fromSeq(Seq(
+      createArray(1, 2, 3),
+      createArray(
+        createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*),
+        createArray(Seq("d").map(UTF8String.fromString): _*))
+    ))
+    val fields2 = Array[DataType](
+      ArrayType(IntegerType),
+      ArrayType(ArrayType(StringType)))
+
+    assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow)
+
+    // Struct tests
+    val structRow = InternalRow.fromSeq(Seq(
+      InternalRow.fromSeq(Seq[Any](1, 4.0)),
+      InternalRow.fromSeq(Seq(
+        UTF8String.fromString("test"),
+        InternalRow.fromSeq(Seq(
+          1,
+          createArray(Seq("2", "3").map(UTF8String.fromString): _*)
+        ))
+      ))
+    ))
+    val fields3 = Array[DataType](
+      StructType(
+        StructField("c0", IntegerType) ::
+        StructField("c1", DoubleType) ::
+        Nil),
+      StructType(
+        StructField("c2", StringType) ::
+        StructField("c3", StructType(
+          StructField("c4", IntegerType) ::
+          StructField("c5", ArrayType(StringType)) ::
+          Nil)) ::
+        Nil))
+
+    assert(convertBackToInternalRow(structRow, fields3) === structRow)
+
+    // Map tests
+    val mapRow = InternalRow.fromSeq(Seq(
+      createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2),
+      createMap(
+        createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*),
+        createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*)
+      )(
+        createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort),
+        createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort)
+      )))
+    val fields4 = Array[DataType](
+      MapType(StringType, IntegerType),
+      MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType)))
+
+    val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4)
+    val mapExpectedRow = mapRow.toSeq(fields4)
+    // Since `ArrayBasedMapData` does not override `equals` and `hashCode`,
```

End diff --

Aha, thanks. I remember it's related to SPARK-18134.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23221 I applied my own feedback to the original PR and will merge pending tests (since it was already reviewed), unless someone comments first.
[GitHub] spark pull request #23221: [SPARK-24243][CORE] Expose exceptions from InProc...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/23221 [SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

## What changes were proposed in this pull request?

Adds a new method to SparkAppHandle called getError which returns the exception (if present) that caused the underlying Spark app to fail.

## How was this patch tested?

New tests added to SparkLauncherSuite for the new method.

Closes #21849

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-24243

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23221.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23221

commit 9240b77078936dceaaa4a68f6a54c5c0c16aab73
Author: Sahil Takiar
Date: 2018-07-23T17:31:24Z

    [SPARK-24243][CORE] Expose exceptions from InProcessAppHandle

    Adds a new method to `SparkAppHandle` called `getError` which returns the exception (if present) that caused the underlying Spark app to fail.

    New tests added to `SparkLauncherSuite` for the new method.

commit 29f1436e14b453b41b055be6b4e124c5eae7d8ff
Author: Marcelo Vanzin
Date: 2018-12-05T00:37:58Z

    Merge branch 'master' into SPARK-24243

commit e58fc919355c48d2d3b1cacb4d0ee18036cacbc6
Author: Marcelo Vanzin
Date: 2018-12-05T00:41:44Z

    Feedback.
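The PR above exposes the failure cause through the app handle instead of forcing callers to scrape logs. The general pattern — run the app on a background thread, capture any exception, and surface it through a getter — can be sketched as below. This is a minimal Python illustration, not Spark's actual launcher API; `AppHandle`, `wait`, and `get_error` are hypothetical names chosen for the sketch.

```python
import threading


class AppHandle:
    """Illustrative stand-in for an app handle: runs an app function on a
    background thread and records the exception (if any) that made it fail."""

    def __init__(self, app_fn):
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(app_fn,))
        self._thread.start()

    def _run(self, app_fn):
        try:
            app_fn()
        except Exception as exc:
            # Capture the failure cause instead of letting it vanish
            # with the background thread.
            self._error = exc

    def wait(self):
        """Block until the app finishes."""
        self._thread.join()

    def get_error(self):
        """Return the exception that caused the app to fail, or None."""
        return self._error


def failing_app():
    raise RuntimeError("boom")


handle = AppHandle(failing_app)
handle.wait()
print(type(handle.get_error()).__name__)  # RuntimeError
```

The key design point, mirrored in the PR, is that the handle owns the error: the caller polls or waits, then asks the handle why the app failed.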
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23220 Merged build finished. Test PASSed.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5739/ Test PASSed.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23220 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5739/
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test PASSed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #99690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99690/testReport)** for PR 22721 at commit [`c91c154`](https://github.com/apache/spark/commit/c91c15493b30e49e81fbf9097b37bf0b4bdafc79).
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5742/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99685/ Test FAILed.
[GitHub] spark pull request #23216: [SPARK-26264][CORE]It is better to add @transient...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23216#discussion_r238892679

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala ---

```
@@ -56,7 +56,7 @@ private[spark] class ResultTask[T, U](
     stageAttemptId: Int,
     taskBinary: Broadcast[Array[Byte]],
     partition: Partition,
-    locs: Seq[TaskLocation],
+    @transient private var locs: Seq[TaskLocation],
```

End diff --

Why is it `var`, BTW?
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5740/ Test PASSed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test FAILed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99688/testReport)** for PR 23108 at commit [`fe472c8`](https://github.com/apache/spark/commit/fe472c81a21700ff52c84808437b85d02d6871ed).
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99687/ Test PASSed.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23220 Merged build finished. Test PASSed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #99689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99689/testReport)** for PR 22721 at commit [`c601b67`](https://github.com/apache/spark/commit/c601b674ec1c0e288c0b3852dcdb511c64bfa6a5).
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99685/testReport)** for PR 23196 at commit [`57600e2`](https://github.com/apache/spark/commit/57600e2e41d8caa99afab161b16afe02ef640375). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `logError(s\"Failed to load class $childMainClass.\")` * `class CSVInferSchema(val options: CSVOptions) extends Serializable ` * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends Projection `
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5741/ Test PASSed.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23220 **[Test build #99687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99687/testReport)** for PR 23220 at commit [`a13bafd`](https://github.com/apache/spark/commit/a13bafd8e48d8a03fa35c6ff6817f03908f17e2d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test PASSed.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23220 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5739/
[GitHub] spark pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r238891454

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---

```
@@ -45,6 +45,8 @@ object CommandUtils extends Logging {
       } else {
         catalog.alterTableStats(table.identifier, None)
       }
+    } else {
+      catalog.refreshTable(table.identifier)
```

End diff --

Sure, moving it to the DDL commands is better.
[GitHub] spark issue #23220: [SPARK-25877][k8s] Move all feature logic to feature cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23220 **[Test build #99687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99687/testReport)** for PR 23220 at commit [`a13bafd`](https://github.com/apache/spark/commit/a13bafd8e48d8a03fa35c6ff6817f03908f17e2d).
[GitHub] spark pull request #23220: [SPARK-25877][k8s] Move all feature logic to feat...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/23220 [SPARK-25877][k8s] Move all feature logic to feature classes.

This change makes the driver and executor builders a lot simpler by encapsulating almost all feature logic into the respective feature classes. The only logic that remains is the creation of the initial pod, which needs to happen before anything else so is better to be left in the builder class.

Most feature classes already behave fine when the config has nothing they should handle, but a few minor tweaks had to be added. Unit tests were also updated or added to account for these. The builder suites were simplified a lot and just test the remaining pod-related code in the builders themselves.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-25877

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23220.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23220

commit a13bafd8e48d8a03fa35c6ff6817f03908f17e2d
Author: Marcelo Vanzin
Date: 2018-12-04T19:42:31Z

    [SPARK-25877][k8s] Move all feature logic to feature classes.

    This change makes the driver and executor builders a lot simpler by encapsulating almost all feature logic into the respective feature classes. The only logic that remains is the creation of the initial pod, which needs to happen before anything else so is better to be left in the builder class.

    Most feature classes already behave fine when the config has nothing they should handle, but a few minor tweaks had to be added. Unit tests were also updated or added to account for these. The builder suites were simplified a lot and just test the remaining pod-related code in the builders themselves.
[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23203#discussion_r238887812

--- Diff: python/run-tests.py ---

```
@@ -93,17 +93,18 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
         "pyspark-shell"
     ]
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
-
-    LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
+    str_test_name = " ".join(test_name)
+    LOGGER.info("Starting test(%s): %s", pyspark_python, str_test_name)
     start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
-            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
```

End diff --

Oh, yeah. Looks like that's going to reduce the diff. Let me try.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Merged build finished. Test FAILed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22612 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99684/ Test FAILed.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22612 **[Test build #99684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99684/testReport)** for PR 22612 at commit [`0a7402e`](https://github.com/apache/spark/commit/0a7402e92e87aef13d2f91043083b92abfa80233). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5738/ Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99686/testReport)** for PR 23169 at commit [`22fe117`](https://github.com/apache/spark/commit/22fe117656ea004757efaffd847f81dc01df8433).
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23216

Are you sure it's even a field in the class? It looks like it's only used to define this:

```
@transient private[this] val preferredLocs: Seq[TaskLocation] = {
  if (locs == null) Nil else locs.toSet.toSeq
}
```

I'd expect Scala not to generate a field for it. Indeed, the thing it is used to build is already transient.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218 **[Test build #4452 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4452/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238871523

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---

```
@@ -181,62 +180,39 @@ case class RelationConversions(
     conf: SQLConf,
     sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] {
   private def isConvertible(relation: HiveTableRelation): Boolean = {
-    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-    serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
-      serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC)
+    isConvertible(relation.tableMeta)
   }

-  // Return true for Apache ORC and Hive ORC-related configuration names.
-  // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`.
-  private def isOrcProperty(key: String) =
-    key.startsWith("orc.") || key.contains(".orc.")
-
-  private def isParquetProperty(key: String) =
-    key.startsWith("parquet.") || key.contains(".parquet.")
-
-  private def convert(relation: HiveTableRelation): LogicalRelation = {
-    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-
-    // Consider table and storage properties. For properties existing in both sides, storage
-    // properties will supersede table properties.
-    if (serde.contains("parquet")) {
-      val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++
-        relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
-          conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString)
-      sessionCatalog.metastoreCatalog
-        .convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
-    } else {
-      val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++
-        relation.tableMeta.storage.properties
-      if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") {
-        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
-          relation,
-          options,
-          classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat],
-          "orc")
-      } else {
-        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
-          relation,
-          options,
-          classOf[org.apache.spark.sql.hive.orc.OrcFileFormat],
-          "orc")
-      }
-    }
+  private def isConvertible(tableMeta: CatalogTable): Boolean = {
+    val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
+    serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
+      serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC)
   }

+  private val metastoreCatalog = sessionCatalog.metastoreCatalog
+
   override def apply(plan: LogicalPlan): LogicalPlan = {
     plan resolveOperators {
       // Write path
       case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists)
         // Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
         if query.resolved && DDLUtils.isHiveTable(r.tableMeta) &&
           !r.isPartitioned && isConvertible(r) =>
-        InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists)
+        InsertIntoTable(metastoreCatalog.convert(r), partition,
+          query, overwrite, ifPartitionNotExists)

       // Read path
       case relation: HiveTableRelation
           if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) =>
-        convert(relation)
+        metastoreCatalog.convert(relation)
+
+      // CTAS
+      case CreateTable(tableDesc, mode, Some(query))
+          if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty &&
+            isConvertible(tableDesc) =>
```

End diff --

Add an internal SQL conf here? The perf impact is huge. It could be better or worse. Also add it to the migration guide and explain the behavior changes.
[GitHub] spark pull request #23159: [SPARK-26191][SQL] Control truncation of Spark pl...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/23159#discussion_r238869530

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---

```
@@ -1777,7 +1777,7 @@ class Analyzer(
     case p if p.expressions.exists(hasGenerator) =>
       throw new AnalysisException("Generators are not supported outside the SELECT clause, but " +
-        "got: " + p.simpleString)
+        "got: " + p.simpleString((SQLConf.get.maxToStringFields)))
```

End diff --

Nit: are there extra parens here?
[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/23203#discussion_r238868565

--- Diff: python/run-tests.py ---

```
@@ -93,17 +93,18 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
         "pyspark-shell"
     ]
     env["PYSPARK_SUBMIT_ARGS"] = " ".join(spark_args)
-
-    LOGGER.info("Starting test(%s): %s", pyspark_python, test_name)
+    str_test_name = " ".join(test_name)
+    LOGGER.info("Starting test(%s): %s", pyspark_python, str_test_name)
    start_time = time.time()
     try:
         per_test_output = tempfile.TemporaryFile()
         retcode = subprocess.Popen(
-            [os.path.join(SPARK_HOME, "bin/pyspark"), test_name],
```

End diff --

Just a thought: could you leave `test_name` as a string and then change this line to `[os.path.join(SPARK_HOME, "bin/pyspark")] + test_name.split(),`? I think it would be a little simpler and wouldn't need `str_test_name`, wdyt?
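The two variants being discussed build the same command line. A minimal sketch (with a placeholder `SPARK_HOME` path, and test names invented for illustration) shows why splitting the string at the call site makes the extra `str_test_name` variable unnecessary:

```python
import os

SPARK_HOME = "/opt/spark"  # placeholder path, for illustration only

# Variant in the PR: test_name is a list, joined separately for logging.
test_name_list = ["pyspark.sql.tests", "ArrowTests"]
cmd_from_list = [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name_list

# Suggested variant: keep test_name a string and split where the command
# is built, so the string itself can be logged directly.
test_name_str = "pyspark.sql.tests ArrowTests"
cmd_from_str = [os.path.join(SPARK_HOME, "bin/pyspark")] + test_name_str.split()

# Both produce the identical argv list for subprocess.Popen.
assert cmd_from_list == cmd_from_str
print(cmd_from_str)
```

Since `str.split()` with no arguments splits on any run of whitespace, the string form also tolerates extra spaces between the test names.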
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218
**[Test build #4451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4451/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23092: [SPARK-26094][CORE][STREAMING] createNonEcFile cr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23092
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23159
> Rather than change every single call to this method, if this should generally be the value of the argument, then why not make it the default value or something?
The new parameter aims to solve the problem of multiple callers each needing a different maximum number of fields. So a feasible approach is to propagate `maxFields` from the callers down to `truncatedString`. Changing a global SQL config does not solve that problem.
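MaxGekk's point — each call site passes its own limit rather than consulting one global setting — can be sketched as follows. This is a Python illustration only; the names, the default of 25, and the exact truncation format are assumptions, not Spark's actual `truncatedString`:

```python
DEFAULT_MAX_FIELDS = 25  # stand-in for a global SQL-conf default (assumed value)

def truncated_string(fields, max_fields=DEFAULT_MAX_FIELDS):
    # Each caller propagates its own max_fields, so two call sites can
    # truncate differently without touching any global configuration.
    if len(fields) <= max_fields:
        return ", ".join(fields)
    omitted = len(fields) - (max_fields - 1)
    return ", ".join(fields[:max_fields - 1]) + f", ... {omitted} more fields"

print(truncated_string(["a", "b", "c"], max_fields=2))
```

A global config would force every caller to share one limit; the extra parameter keeps the config as a default while letting hot paths (e.g. plan explain vs. error messages) choose their own.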
[GitHub] spark issue #23092: [SPARK-26094][CORE][STREAMING] createNonEcFile creates p...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23092 Merging to master.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Merged build finished. Test PASSed.
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23196 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5737/ Test PASSed.
[GitHub] spark pull request #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22721#discussion_r238864486
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
@@ -45,6 +45,8 @@ object CommandUtils extends Logging {
      } else {
        catalog.alterTableStats(table.identifier, None)
      }
+    } else {
+      catalog.refreshTable(table.identifier)
--- End diff --
Could we move this into the DDL commands that need to refresh the table?
[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23196 **[Test build #99685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99685/testReport)** for PR 23196 at commit [`57600e2`](https://github.com/apache/spark/commit/57600e2e41d8caa99afab161b16afe02ef640375).
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952
@gaborgsomogyi @steveloughran `GlobExpander` appears to handle only the `{}` pattern; we still need to deal with `*` and `?`, which can't be expanded this way. It would only work if we were OK with restricting descendants of multiple paths (for now we restrict descendants of one path), so while it would help fix the bug in the current patch, it might still be too restrictive. I think changing the Hadoop version because of this costs too much. If we really want to go that route, the only viable solution is copying the code. (Actually, we could also just reimplement it, since its requirements read like a kind of assignment, though we may end up with similar code.)
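The distinction drawn here — `{}` alternation can be expanded into a finite list of patterns up front, while `*` and `?` cannot — can be illustrated with a small brace expander. This is a sketch of the idea, not Hadoop's `GlobExpander`:

```python
import re

def expand_braces(pattern):
    """Expand only {a,b} alternation into concrete patterns.

    Wildcards * and ? are left alone: they can match unboundedly many
    paths, so no finite up-front expansion exists for them.
    """
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    return [expanded
            for alt in m.group(1).split(",")
            for expanded in expand_braces(head + alt + tail)]

print(expand_braces("/data/{2018,2019}/part-*"))
```

The residual `*` in each expanded pattern is exactly what still has to be matched against the file system at runtime, which is why expansion alone can't replace glob matching.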
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99681/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215
**[Test build #99681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99681/testReport)** for PR 23215 at commit [`4060c30`](https://github.com/apache/spark/commit/4060c30be00f0026c5c8e7304244bab2b70537f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23196: [SPARK-26243][SQL] Use java.time API for parsing ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/23196#discussion_r238853314
--- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala ---
@@ -49,8 +49,8 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
  override def beforeAll() {
    super.beforeAll()
    TestHive.setCacheTables(true)
-    // Timezone is fixed to America/Los_Angeles for those timezone sensitive tests (timestamp_*)
-    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
+    // Timezone is fixed to GMT for those timezone sensitive tests (timestamp_*)
--- End diff --
While porting to the new parser/formatter, I faced at least two problems:
- The time zone from the SQL config is not taken into account during parsing at all; the functions used take the default time zone from the JVM settings. This could be fixed by `TimeZone.setDefault` or by using absolute values.
- A round trip of parsing a date to `DateType` and back to a string can produce a different string, because `DateType` stores only days (as an `Int`) since the epoch (in `UTC`), and that representation loses the time zone offset. So exact matching is impossible due to the lack of information. (The round-trip conversion for `TimestampType`, by contrast, works perfectly.)

The second problem is the case for these changes. Previously it worked because the specified time zone was not used at all (it did not affect the number of days when converting a string to `DateType`). With the new parser/formatter it matters, and I had to change the time zone to `GMT` to eliminate the problem of losing time zone offsets (the offset is zero for `GMT`).
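The asymmetry described above can be sketched in plain Python: a timestamp keeps enough information to round-trip exactly, while a value reduced to whole days since the epoch in UTC can land on a different calendar day than the zoned original. The offset and dates below are illustrative, not taken from the test suite:

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=-8))  # e.g. a UTC-8 offset like Los Angeles in winter
ts = datetime(2018, 12, 5, 23, 30, tzinfo=tz)

# Timestamp round trip: seconds since the epoch identify the instant exactly.
restored_ts = datetime.fromtimestamp(ts.timestamp(), tz)
assert restored_ts == ts

# Date round trip via "days since epoch in UTC": the offset is discarded,
# so 2018-12-05 23:30 at UTC-8 (07:30 next day in UTC) comes back as 2018-12-06.
days = int(ts.timestamp() // 86400)
restored_date = (datetime(1970, 1, 1, tzinfo=timezone.utc)
                 + timedelta(days=days)).date()
print(restored_date)  # 2018-12-06, not 2018-12-05
```

With a zero offset (GMT) the day never shifts, which is why fixing the suite's time zone to GMT sidesteps the mismatch.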
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99680/ Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Merged build finished. Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217
**[Test build #99680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99680/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845399
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -299,12 +312,25 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
    val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value")
    val df2 = (1 to 10).map(i => (i, i.toString)).toSeq.toDF("key", "value")
    // Assume the execution plan is
-    // ... -> ShuffledHashJoin(nodeId = 1) -> Project(nodeId = 0)
+    // Project(nodeId = 0)
+    // +- ShuffledHashJoin(nodeId = 1)
+    //    :- Exchange(nodeId = 2)
+    //    :  +- Project(nodeId = 3)
+    //    :     +- LocalTableScan(nodeId = 4)
+    //    +- Exchange(nodeId = 5)
+    //       +- Project(nodeId = 6)
+    //          +- LocalTableScan(nodeId = 7)
    val df = df1.join(df2, "key")
    testSparkPlanMetrics(df, 1, Map(
      1L -> (("ShuffledHashJoin", Map(
        "number of output rows" -> 2L,
-        "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"
+        "avg hash probe (min, med, max)" -> "\n(1, 1, 1)"))),
+      2L -> (("Exchange", Map(
+        "shuffle records written" -> 2L,
+        "records read" -> 2L))),
--- End diff --
Is this always going to be the same as "shuffle records written"?
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99682/ Test FAILed.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Merged build finished. Test FAILed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238845029
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala ---
@@ -170,13 +172,23 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared
    val df = testData2.groupBy().agg(collect_set('a)) // 2 partitions
    testSparkPlanMetrics(df, 1, Map(
      2L -> (("ObjectHashAggregate", Map("number of output rows" -> 2L))),
+      1L -> (("Exchange", Map(
+        "shuffle records written" -> 2L,
+        "records read" -> 2L,
+        "local blocks fetched" -> 2L,
--- End diff --
I think we should be consistent and name these "read", rather than "fetch".
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22957
**[Test build #99682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99682/testReport)** for PR 22957 at commit [`bf1d04a`](https://github.com/apache/spark/commit/bf1d04a819855737d1096b61b1c3d46010f50dee).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] `
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Merged build finished. Test PASSed.
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99678/ Test PASSed.
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23159
**[Test build #99678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99678/testReport)** for PR 23159 at commit [`b6fa959`](https://github.com/apache/spark/commit/b6fa95981970788a09657b0a29712b53c01831db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238843017
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -163,6 +171,8 @@ object SQLMetrics {
      Utils.bytesToString
    } else if (metricsType == TIMING_METRIC) {
      Utils.msDurationToString
+    } else if (metricsType == NANO_TIMING_METRIC) {
+      duration => Utils.msDurationToString(duration / 10)
--- End diff --
Is this the right conversion from nanoseconds to milliseconds?
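For reference on the conversion being questioned: one millisecond is 1,000,000 nanoseconds, so dividing by 10 (as the diff does) would overstate durations by five orders of magnitude. A sketch of the intended conversion:

```python
NANOS_PER_MILLI = 1_000_000

def nanos_to_millis(duration_ns):
    # 1 ms = 1_000_000 ns; integer division matches a Long-based metric value.
    return duration_ns // NANOS_PER_MILLI

print(nanos_to_millis(2_500_000_000))  # 2.5 s of write time -> 2500 ms
```

Sub-millisecond durations truncate to 0, which is usually acceptable for a display-only metric but worth noting if values are accumulated after conversion.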
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238842276
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -78,6 +78,7 @@ object SQLMetrics {
  private val SUM_METRIC = "sum"
  private val SIZE_METRIC = "size"
  private val TIMING_METRIC = "timing"
+  private val NANO_TIMING_METRIC = "nanosecond"
--- End diff --
ns
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test FAILed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99676/ Test FAILed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207
**[Test build #99676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99676/testReport)** for PR 23207 at commit [`ca6c407`](https://github.com/apache/spark/commit/ca6c407929e62492a2c5233504efaeaf731f8cc9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238837000
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
  private[spark] def decBytesWritten(v: Long): Unit
  private[spark] def decRecordsWritten(v: Long): Unit
}
+
+
+/**
+ * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleWriteMetricsReporter(
+    reporters: Seq[ShuffleWriteMetricsReporter]) extends ShuffleWriteMetricsReporter {
+  override private[spark] def incBytesWritten(v: Long): Unit = {
+    reporters.foreach(_.incBytesWritten(v))
+  }
+  override private[spark] def decRecordsWritten(v: Long): Unit = {
+    reporters.foreach(_.decRecordsWritten(v))
+  }
+  override private[spark] def incRecordsWritten(v: Long): Unit = {
+    reporters.foreach(_.incRecordsWritten(v))
+  }
+  override private[spark] def incWriteTime(v: Long): Unit = {
+    reporters.foreach(_.incWriteTime(v))
+  }
+  override private[spark] def decBytesWritten(v: Long): Unit = {
+    reporters.foreach(_.decBytesWritten(v))
+  }
+}
+
+
+/**
+ * A proxy class of ShuffleReadMetricsReporter which proxy all metrics updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleReadMetricsReporter(
--- End diff --
Again, I think your old approach is much better. No point creating a general util when there is only one implementation and no known future needs.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238836448
--- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala ---
@@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter {
  private[spark] def decBytesWritten(v: Long): Unit
  private[spark] def decRecordsWritten(v: Long): Unit
}
+
+
+/**
+ * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input
+ * reporters.
+ */
+private[spark] class GroupedShuffleWriteMetricsReporter(
--- End diff --
I'd not create a general API here. Just put one in SQL similar to the read side that also calls the default one. It can be expensive to go through a seq for each record and bytes.
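The objection here is the per-record cost of traversing a `Seq` of reporters; a delegate with a fixed pair of targets avoids that traversal. A minimal sketch of the suggested shape — class and method names are illustrative, not Spark's:

```python
class RecordingReporter:
    """Stand-in for one ShuffleWriteMetricsReporter implementation."""
    def __init__(self):
        self.bytes_written = 0
        self.records_written = 0

    def inc_bytes_written(self, v):
        self.bytes_written += v

    def inc_records_written(self, v):
        self.records_written += v


class PairWriteMetricsReporter:
    """Forwards every update to exactly two reporters (e.g. the default
    task-level one plus a SQL metrics one), with no per-record loop
    over a general sequence of reporters."""
    def __init__(self, first, second):
        self.first, self.second = first, second

    def inc_bytes_written(self, v):
        self.first.inc_bytes_written(v)
        self.second.inc_bytes_written(v)

    def inc_records_written(self, v):
        self.first.inc_records_written(v)
        self.second.inc_records_written(v)


task, sql = RecordingReporter(), RecordingReporter()
reporter = PairWriteMetricsReporter(task, sql)
reporter.inc_records_written(1)
reporter.inc_bytes_written(128)
```

Both reporters observe every update, but the hot path is two direct calls instead of an iterator allocation per metric update.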
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99677/ Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207
**[Test build #99677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99677/testReport)** for PR 23207 at commit [`fcd62b3`](https://github.com/apache/spark/commit/fcd62b390ba4b5e2b1b9c6138026ac6da1b78d1f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99683/ Test FAILed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test FAILed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169
**[Test build #99683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99683/testReport)** for PR 23169 at commit [`1b692a0`](https://github.com/apache/spark/commit/1b692a0444a1c0f1fc24a08241f24dd35e4c428b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class TreeNodeSuite extends SparkFunSuite with SQLHelper `
[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22535 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99673/ Test FAILed.
[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22535 Build finished. Test FAILed.
[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22535
**[Test build #99673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99673/testReport)** for PR 22535 at commit [`c95706f`](https://github.com/apache/spark/commit/c95706f60e4d576caca78a32000d4a7bbb12c141).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Merged build finished. Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99674/ Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23216
**[Test build #99674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99674/testReport)** for PR 23216 at commit [`b3ede8b`](https://github.com/apache/spark/commit/b3ede8be1a9073f057cc46fb82eacd7fa3ec36c6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218 **[Test build #4451 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4451/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r238801573
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
        return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
 
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+    """
+    Private class to track supported GBTClassifier params.
+
+    .. versionadded:: 3.0.0
+    """
+
+    supportedLossTypes = ["logistic"]
+
+    lossType = Param(Params._dummy(), "lossType",
+                     "Loss function which GBT tries to minimize (case-insensitive). " +
+                     "Supported options: " + ", ".join(supportedLossTypes),
+                     typeConverter=TypeConverters.toString)
+
+    @since("3.0.0")
+    def setLossType(self, value):
--- End diff --
`setLossType` should be in the estimators, `getLossType` should be here
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r238808440
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
        return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
 
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
--- End diff --
this should extend `TreeClassifierParams`
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r238809338
--- Diff: python/pyspark/ml/classification.py ---
@@ -1242,40 +1255,36 @@ class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol
    [0.25..., 0.23..., 0.21..., 0.19..., 0.18...]
    >>> model.numClasses
    2
+    >>> gbt = gbt.setValidationIndicatorCol("validationIndicator")
+    >>> gbt.getValidationIndicatorCol()
+    'validationIndicator'
+    >>> gbt.getValidationTol()
+    0.01
 
    .. versionadded:: 1.4.0
    """
 
-    lossType = Param(Params._dummy(), "lossType",
-                     "Loss function which GBT tries to minimize (case-insensitive). " +
-                     "Supported options: " + ", ".join(GBTParams.supportedLossTypes),
-                     typeConverter=TypeConverters.toString)
-
-    stepSize = Param(Params._dummy(), "stepSize",
-                     "Step size (a.k.a. learning rate) in interval (0, 1] for shrinking " +
-                     "the contribution of each estimator.",
-                     typeConverter=TypeConverters.toFloat)
-
    @keyword_only
    def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
                 maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                 maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, lossType="logistic",
-                 maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0,
-                 featureSubsetStrategy="all"):
+                 maxIter=20, stepSize=0.1, seed=None, subsamplingRate=1.0, impurity="variance",
--- End diff --
this is not the correct default impurity
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r238801256
--- Diff: python/pyspark/ml/classification.py ---
@@ -1174,9 +1165,31 @@ def trees(self):
        return [DecisionTreeClassificationModel(m) for m in list(self._call_java("trees"))]
 
+class GBTClassifierParams(GBTParams, HasVarianceImpurity):
+    """
+    Private class to track supported GBTClassifier params.
+
+    .. versionadded:: 3.0.0
+    """
+
+    supportedLossTypes = ["logistic"]
+
+    lossType = Param(Params._dummy(), "lossType",
+                     "Loss function which GBT tries to minimize (case-insensitive). " +
+                     "Supported options: " + ", ".join(supportedLossTypes),
+                     typeConverter=TypeConverters.toString)
+
+    @since("3.0.0")
--- End diff --
don't change the version, since we are just refactoring the base classes
[GitHub] spark pull request #21465: [SPARK-24333][ML][PYTHON]Add fit with validation ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/21465#discussion_r238809091
--- Diff: python/pyspark/ml/regression.py ---
@@ -650,19 +650,20 @@ def getFeatureSubsetStrategy(self):
        return self.getOrDefault(self.featureSubsetStrategy)
 
-class TreeRegressorParams(Params):
+class HasVarianceImpurity(Params):
--- End diff --
This shouldn't be changed. Impurity is different for regression and classification, so the param needs to be defined in `TreeRegressorParams` and `TreeClassifierParams`, as it was already.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Build finished. Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99672/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99671/ Test PASSed.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99679/ Test FAILed.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Merged build finished. Test FAILed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99672/testReport)** for PR 23215 at commit [`272bb1d`](https://github.com/apache/spark/commit/272bb1da8317883c8256e0484738b029bea9f9bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22612: [SPARK-24958] Add executors' process tree total memory i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22612 **[Test build #99684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99684/testReport)** for PR 22612 at commit [`0a7402e`](https://github.com/apache/spark/commit/0a7402e92e87aef13d2f91043083b92abfa80233).