[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272949 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +import org.apache.spark.annotation.Evolving Review comment: @dongjoon-hyun Will do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272871 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala ## @@ -94,22 +96,27 @@ trait FileScan extends Scan with Batch with SupportsReportStatistics with Loggin override def hashCode(): Int = getClass.hashCode() override def description(): String = { +val maxMetadataValueLength = 100 +val metadataStr = getMetaData().toSeq.sorted.map { + case (key, value) => +val redactedValue = + Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, value) +key + ": " + StringUtils.abbreviate(redactedValue, maxMetadataValueLength) +}.mkString(", ") +s"${this.getClass.getSimpleName} $metadataStr" + } + + override def getMetaData(): Map[String, String] = { val maxMetadataValueLength = 100 val locationDesc = fileIndex.getClass.getSimpleName + Utils.buildLocationMetadata(fileIndex.rootPaths, maxMetadataValueLength) -val metadata: Map[String, String] = Map( +Map( + "Format" -> s"${this.getClass.getSimpleName.replace("Scan", "").toLowerCase(Locale.ROOT)}", Review comment: @dongjoon-hyun Previously we used to print the scan node class name and in the new format, we print it in its own line. Please see the old output in the pr description. We have it printed as `ParquetScan`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272923 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala ## @@ -94,22 +96,27 @@ trait FileScan extends Scan with Batch with SupportsReportStatistics with Loggin override def hashCode(): Int = getClass.hashCode() override def description(): String = { +val maxMetadataValueLength = 100 Review comment: @dongjoon-hyun Sure. will make the change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272935 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala ## @@ -43,7 +44,32 @@ trait DataSourceV2ScanExecBase extends LeafExecNode { override def simpleString(maxFields: Int): String = { val result = s"$nodeName${truncatedString(output, "[", ", ", "]", maxFields)} ${scan.description()}" -Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, result) +redact(result) + } + + /** + * Redact the sensitive information in the given string. Review comment: @dongjoon-hyun Sure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272871 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala ## @@ -94,22 +96,27 @@ trait FileScan extends Scan with Batch with SupportsReportStatistics with Loggin override def hashCode(): Int = getClass.hashCode() override def description(): String = { +val maxMetadataValueLength = 100 +val metadataStr = getMetaData().toSeq.sorted.map { + case (key, value) => +val redactedValue = + Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, value) +key + ": " + StringUtils.abbreviate(redactedValue, maxMetadataValueLength) +}.mkString(", ") +s"${this.getClass.getSimpleName} $metadataStr" + } + + override def getMetaData(): Map[String, String] = { val maxMetadataValueLength = 100 val locationDesc = fileIndex.getClass.getSimpleName + Utils.buildLocationMetadata(fileIndex.rootPaths, maxMetadataValueLength) -val metadata: Map[String, String] = Map( +Map( + "Format" -> s"${this.getClass.getSimpleName.replace("Scan", "").toLowerCase(Locale.ROOT)}", Review comment: @dongjoon-hyun Previously we used to print the scan node class name and in the new format, we print it in its own line. Please see the old output in the pr description. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272707 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ## @@ -360,6 +360,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite } } } + + test("Explain formatted output for scan operator for datasource V2") { +withTempDir { dir => + Seq("parquet", "orc", "csv", "json").foreach { fmt => +val basePath = dir.getCanonicalPath + "/" + fmt +val pushFilterMaps = Map ( + "parquet" -> +"|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]", + "orc" -> +"|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]", + "csv" -> +"|PushedFilers: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]", + "json" -> +"|remove_marker" +) +val expected_plan_fragment1 = + s""" + |\\(1\\) BatchScan + |Output \\[2\\]: \\[value#x, id#x\\] + |DataFilters: \\[isnotnull\\(value#x\\), \\(value#x > 2\\)\\] + |Format: $fmt + |Location: InMemoryFileIndex\\[.*\\] + |PartitionFilters: \\[isnotnull\\(id#x\\), \\(id#x > 1\\)\\] + ${pushFilterMaps.get(fmt).get} + |ReadSchema: struct\\ + |""".stripMargin.replaceAll("\nremove_marker", "").trim Review comment: @dongjoon-hyun Please see my response above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r453272686 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ## @@ -360,6 +360,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite } } } + + test("Explain formatted output for scan operator for datasource V2") { +withTempDir { dir => + Seq("parquet", "orc", "csv", "json").foreach { fmt => +val basePath = dir.getCanonicalPath + "/" + fmt +val pushFilterMaps = Map ( + "parquet" -> +"|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]", + "orc" -> +"|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]", + "csv" -> +"|PushedFilers: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]", + "json" -> +"|remove_marker" Review comment: @dongjoon-hyun I had tried and it didn't work for me. Perhaps there is a better way to do this. Basically, for JSON, i don't want a line printed for pushedFilters. Putting a `""` results in the following as expected output. Here i wanted to get rid of the empty line between `PartitionFilters` and `ReadSchema` ``` \(1\) BatchScan Output \[2\]: \[value#x, id#x\] DataFilters: \[isnotnull\(value#x\), \(value#x > 2\)\] Format: json Location: InMemoryFileIndex\[.*\] PartitionFilters: \[isnotnull\(id#x\), \(id#x > 1\)\] ReadSchema: struct\ ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r420550918 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ## @@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode)) } } + + test("Explain formatted output for scan operator for datasource V2") { Review comment: +1. Thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r420496754 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ## @@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode)) } } + + test("Explain formatted output for scan operator for datasource V2") { Review comment: @cloud-fan Agree. Actually i had tried but could not get the V2 scan set up through SQL. Could you please tell me how to do it ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r420495962 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +import org.apache.spark.annotation.Evolving + +/** + * A mix in interface for {@link FileScan}. This can be used to report metadata + * for a file based scan operator. This is currently used for supporting formatted + * explain. + */ +@Evolving Review comment: @cloud-fan Thanks. will remove. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418880068 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,8 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = { +super.metaData ++ Map("Format" -> "avro") Review comment: @maropu OK.. added a test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418820695 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +/** + * A mix in interface for {@link FileScan}. This can be used to report metadata + * for a file based scan operator. This is currently used for supporting formatted + * explain. + */ Review comment: @maropu On second thought, this is not an external interface, right ? So don't think we need any annotations here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418791564 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,8 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = { +super.metaData ++ Map("Format" -> "avro") Review comment: @maropu Did you want a explain suite created in the avro external module ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418791564 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,8 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = { +super.metaData ++ Map("Format" -> "avro") Review comment: @maropu Did you want a explain suite created in the avro module ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418790252 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,6 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = super.metaData Review comment: @maropu Thanks a lot :-) I will make a change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418763357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +/** + * A mix in interface for {@link FileScan}. This can be used to report metadata + * for a file based scan operator. This is currently used for supporting formatted + * explain. + */ Review comment: @maropu Will Add. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418763290 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,6 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = super.metaData Review comment: @maropu I get a compile error that forces me to implement it here ? Let me know if you have any suggestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418450926 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +trait SupportsMetadata { + def getMetaData(): Map[String, String] +} Review comment: @maropu Don't see the need to make it a part of external V2 contract. We are using it for explain now.. so thought of keeping it internal just like we use the Logging trait. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418450311 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +trait SupportsMetadata { + def getMetaData(): Map[String, String] +} Review comment: @maropu OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418450404 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala ## @@ -46,6 +47,35 @@ trait DataSourceV2ScanExecBase extends LeafExecNode { Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, result) } + /** + * Shorthand for calling redactString() without specifying redacting rules + */ + protected def redact(text: String): String = { +Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, text) + } + + + override def verboseStringWithOperatorId(): String = { +val metaDataStr = if (scan.isInstanceOf[SupportsMetadata]) { + val scanWithExplainSupport: SupportsMetadata = scan.asInstanceOf[SupportsMetadata] + val metadataStr = scanWithExplainSupport.getMetaData.toSeq.sorted.filterNot { +case (_, value) if (value.isEmpty || value.equals("[]")) => true +case (_, _) => false + }.map { +case (key, value) => s"$key: ${redact(value)}" + } Review comment: @maropu Thanks. Looks much better :-) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418450253 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ## @@ -343,6 +343,74 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode)) } } + + test("Explain formatted output for scan operator for datasource V2") { +withTempDir { dir => + Seq("parquet", "orc", "csv", "json").foreach { format => +val basePath = dir.getCanonicalPath + "/" + format +val expected_plan_fragment = + s""" + |\\(1\\) BatchScan + |Output \\[2\\]: \\[value#x, id#x\\] + |DataFilters: \\[isnotnull\\(value#x\\), \\(value#x > 2\\)\\] + |Format: $format + |Location: InMemoryFileIndex\\[.*\\] + |PartitionFilters: \\[isnotnull\\(id#x\\), \\(id#x > 1\\)\\] + |PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\] Review comment: @beliefer Thanks.. I have updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418450311 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala ## @@ -0,0 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.internal.connector + +trait SupportsMetadata { + def getMetaData(): Map[String, String] +} Review comment: @maropu OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node
dilipbiswal commented on a change in pull request #28425: URL: https://github.com/apache/spark/pull/28425#discussion_r418449530 ## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala ## @@ -65,4 +65,8 @@ case class AvroScan( } override def hashCode(): Int = super.hashCode() + + override def getMetaData(): Map[String, String] = { +super.metaData ++ Map("Format" -> "avro") Review comment: @maropu Actually i had tried to test out avro. But i get the following error: ``` "Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide" ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org