[GitHub] spark pull request #20477: [SPARK-23303][SQL] improve the explain result for...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20477
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r166175748

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExec.scala

```diff
@@ -36,11 +38,14 @@ import org.apache.spark.sql.types.StructType
  */
 case class DataSourceV2ScanExec(
     fullOutput: Seq[AttributeReference],
-    @transient reader: DataSourceReader)
+    @transient reader: DataSourceReader,
+    @transient sourceClass: Class[_ <: DataSourceV2])
   extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {

   override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]
+
+  override def simpleString: String = s"Scan $metadataString"
```

I've replied on that PR. I don't think overriding `nodeName` is the right way to fix the UI issue, as we would need to override more methods. We can discuss that problem further on that PR, but it should not block this one.
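For context, a plan node's `nodeName` (which defaults to the class name and is what the UI shows) and its `simpleString` (the one-line explain rendering) are separate hooks, which is what the `simpleString`-vs-`nodeName` debate above is about. A minimal sketch of that split, using hypothetical stand-in classes rather than the real Spark `TreeNode` API:

```scala
// Hypothetical stand-ins illustrating the nodeName vs. simpleString split;
// the real hooks live on org.apache.spark.sql.catalyst.trees.TreeNode.
abstract class PlanNode {
  // Mirrors the TreeNode defaults: nodeName is the class name,
  // and simpleString builds on it.
  def nodeName: String = getClass.getSimpleName
  def simpleString: String = s"$nodeName ..."
}

// Overriding only simpleString changes the explain line,
// while the UI label (nodeName) keeps showing the class name.
class ScanExec(source: String, columns: Seq[String]) extends PlanNode {
  override def simpleString: String = s"Scan $source[${columns.mkString(", ")}]"
}

object Demo extends App {
  val scan = new ScanExec("AdvancedDataSourceV2", Seq("i#0", "j#1"))
  println(scan.simpleString) // Scan AdvancedDataSourceV2[i#0, j#1]
  println(scan.nodeName)     // ScanExec
}
```

This is why overriding only one of the two methods fixes explain output but not the UI (or vice versa), and why cloud-fan notes that more methods would need to be overridden to address the UI.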
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r165728696 (on the same `simpleString` hunk in DataSourceV2ScanExec.scala quoted above)

+1 for overriding `nodeName`.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r165726915

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceReaderHolder.scala

```diff
@@ -65,4 +73,23 @@ trait DataSourceReaderHolder {
   lazy val output: Seq[Attribute] = reader.readSchema().map(_.name).map { name =>
     fullOutput.find(_.name == name).get
   }
+
+  def metadataString: String = {
+    val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
+    if (filters.nonEmpty) entries += "PushedFilter" -> filters.mkString("[", ", ", "]")
+
+    val outputStr = Utils.truncatedString(output, "[", ", ", "]")
+
+    val entriesStr = if (entries.nonEmpty) {
+      Utils.truncatedString(entries.map {
+        case (key, value) => key + ": " + StringUtils.abbreviate(redact(value), 100)
+      }, " (", ", ", ")")
+    } else ""
```

Nit: style.
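The `metadataString` helper above depends on Spark's `Utils.truncatedString` and Commons Lang's `StringUtils.abbreviate`. A self-contained sketch of the same formatting logic, with a simplified stand-in for the abbreviation helper (the object and method names here are illustrative, not the real Spark API):

```scala
import scala.collection.mutable.ArrayBuffer

object MetadataStringSketch {
  // Simplified stand-in for org.apache.commons.lang3.StringUtils.abbreviate:
  // truncate long values to `max` characters, ending with "...".
  private def abbreviate(s: String, max: Int): String =
    if (s.length <= max) s else s.take(max - 3) + "..."

  // Build an explain suffix like "[i#0, j#1] (PushedFilter: [...])"
  // from the output columns and any pushed filters.
  def metadataString(output: Seq[String], filters: Seq[String]): String = {
    val entries = ArrayBuffer.empty[(String, String)]
    if (filters.nonEmpty) entries += "PushedFilter" -> filters.mkString("[", ", ", "]")

    val outputStr = output.mkString("[", ", ", "]")
    val entriesStr =
      if (entries.nonEmpty) {
        entries.map { case (key, value) => key + ": " + abbreviate(value, 100) }
          .mkString(" (", ", ", ")")
      } else {
        ""
      }

    outputStr + entriesStr
  }
}
```

For example, `MetadataStringSketch.metadataString(Seq("i#0", "j#1"), Seq("IsNotNull(i)", "GreaterThan(i,3)"))` produces `[i#0, j#1] (PushedFilter: [IsNotNull(i), GreaterThan(i,3)])`, matching the shape of the physical plan line shown in the PR description.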
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20477#discussion_r165726645 (on the same `simpleString` hunk in DataSourceV2ScanExec.scala quoted above)

For your information, https://github.com/apache/spark/pull/20226/files#diff-3e1258979e16f72a829abb8a1cd88bda also updates the explain output. Overriding `nodeName` looks better for the UI.
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20477

[SPARK-23303][SQL] improve the explain result for data source v2 relations

## What changes were proposed in this pull request?

The current explain result for a data source v2 relation is unreadable:

```
== Parsed Logical Plan ==
'Filter ('i > 6)
+- AnalysisBarrier
      +- Project [j#1]
         +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Filter (i#0 > 6)
   +- Project [j#1, i#0]
      +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Optimized Logical Plan ==
Project [j#1]
+- Filter isnotnull(i#0)
   +- DataSourceV2Relation [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Filter isnotnull(i#0)
   +- *(1) DataSourceV2Scan [i#0, j#1], org.apache.spark.sql.sources.v2.AdvancedDataSourceV2$Reader@3b415940
```

After this PR:

```
== Parsed Logical Plan ==
'Project [unresolvedalias('j, None)]
+- AnalysisBarrier
      +- Relation SimpleDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
j: int
Project [j#1]
+- Relation SimpleDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Project [j#1]
+- Relation SimpleDataSourceV2[i#0, j#1]

== Physical Plan ==
*(1) Project [j#1]
+- *(1) Scan SimpleDataSourceV2[i#0, j#1]
```

and, with a pushed-down filter:

```
== Parsed Logical Plan ==
'Filter ('i > 3)
+- AnalysisBarrier
      +- Relation AdvancedDataSourceV2[i#0, j#1]

== Analyzed Logical Plan ==
i: int, j: int
Filter (i#0 > 3)
+- Relation AdvancedDataSourceV2[i#0, j#1]

== Optimized Logical Plan ==
Relation AdvancedDataSourceV2[i#0, j#1]

== Physical Plan ==
*(1) Scan AdvancedDataSourceV2[i#0, j#1] (PushedFilter: [IsNotNull(i), GreaterThan(i,3)])
```

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark explain

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20477.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20477