[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272949



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+import org.apache.spark.annotation.Evolving

Review comment:
   @dongjoon-hyun Will do.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272871



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
##
@@ -94,22 +96,27 @@ trait FileScan extends Scan with Batch with SupportsReportStatistics with Loggin
   override def hashCode(): Int = getClass.hashCode()
 
   override def description(): String = {
+    val maxMetadataValueLength = 100
+    val metadataStr = getMetaData().toSeq.sorted.map {
+      case (key, value) =>
+        val redactedValue =
+          Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, value)
+        key + ": " + StringUtils.abbreviate(redactedValue, maxMetadataValueLength)
+    }.mkString(", ")
+    s"${this.getClass.getSimpleName} $metadataStr"
+  }
+
+  override def getMetaData(): Map[String, String] = {
     val maxMetadataValueLength = 100
     val locationDesc =
       fileIndex.getClass.getSimpleName +
         Utils.buildLocationMetadata(fileIndex.rootPaths, maxMetadataValueLength)
-    val metadata: Map[String, String] = Map(
+    Map(
+      "Format" -> s"${this.getClass.getSimpleName.replace("Scan", "").toLowerCase(Locale.ROOT)}",

Review comment:
   @dongjoon-hyun Previously we printed the scan node's class name inline; in the 
new format it is printed on its own line. Please see the old output in the PR 
description, where it appears as `ParquetScan`.
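
   For context, a minimal sketch of how to compare the two forms (the path and 
column names are illustrative, `spark` is assumed to be a spark-shell session, and 
clearing `spark.sql.sources.useV1SourceList` is assumed to be what routes parquet 
through the DSv2 scan):
   ```
   // Route file sources through DSv2 so the scan shows up as a BatchScan node.
   spark.conf.set("spark.sql.sources.useV1SourceList", "")

   // Write a small partitioned parquet dataset (illustrative location).
   val path = "/tmp/explain_demo"
   spark.range(10).withColumn("part", $"id" % 2)
     .write.mode("overwrite").partitionBy("part").parquet(path)

   val df = spark.read.parquet(path).where("id > 1")
   df.explain()              // old style: scan details on a single line
   df.explain("formatted")   // new style: multi-line (N) BatchScan block, one key per line
   ```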








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272923



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala
##
@@ -94,22 +96,27 @@ trait FileScan extends Scan with Batch with SupportsReportStatistics with Loggin
   override def hashCode(): Int = getClass.hashCode()
 
   override def description(): String = {
+    val maxMetadataValueLength = 100

Review comment:
   @dongjoon-hyun Sure, will make the change.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272935



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala
##
@@ -43,7 +44,32 @@ trait DataSourceV2ScanExecBase extends LeafExecNode {
   override def simpleString(maxFields: Int): String = {
     val result =
       s"$nodeName${truncatedString(output, "[", ", ", "]", maxFields)} ${scan.description()}"
-    Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, result)
+    redact(result)
+  }
+
+  /**
+   * Redact the sensitive information in the given string.

Review comment:
   @dongjoon-hyun Sure.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272707



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -360,6 +360,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       }
     }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {
+    withTempDir { dir =>
+      Seq("parquet", "orc", "csv", "json").foreach { fmt =>
+        val basePath = dir.getCanonicalPath + "/" + fmt
+        val pushFilterMaps = Map (
+          "parquet" ->
+            "|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+          "orc" ->
+            "|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+          "csv" ->
+            "|PushedFilers: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
+          "json" ->
+            "|remove_marker"
+        )
+        val expected_plan_fragment1 =
+          s"""
+             |\\(1\\) BatchScan
+             |Output \\[2\\]: \\[value#x, id#x\\]
+             |DataFilters: \\[isnotnull\\(value#x\\), \\(value#x > 2\\)\\]
+             |Format: $fmt
+             |Location: InMemoryFileIndex\\[.*\\]
+             |PartitionFilters: \\[isnotnull\\(id#x\\), \\(id#x > 1\\)\\]
+             ${pushFilterMaps.get(fmt).get}
+             |ReadSchema: struct\\
+             |""".stripMargin.replaceAll("\nremove_marker", "").trim

Review comment:
   @dongjoon-hyun Please see my response above.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-07-12 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r453272686



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -360,6 +360,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       }
     }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {
+    withTempDir { dir =>
+      Seq("parquet", "orc", "csv", "json").foreach { fmt =>
+        val basePath = dir.getCanonicalPath + "/" + fmt
+        val pushFilterMaps = Map (
+          "parquet" ->
+            "|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+          "orc" ->
+            "|PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]",
+          "csv" ->
+            "|PushedFilers: \\[IsNotNull\\(value\\), GreaterThan\\(value,2\\)\\]",
+          "json" ->
+            "|remove_marker"

Review comment:
   @dongjoon-hyun I had tried that and it didn't work for me; perhaps there is a 
better way to do this. Basically, for JSON I don't want a line printed for 
PushedFilters. Putting a `""` there results in the following expected output, and 
I wanted to get rid of the empty line between `PartitionFilters` and 
`ReadSchema`:
   ```
   \(1\) BatchScan
   Output \[2\]: \[value#x, id#x\]
   DataFilters: \[isnotnull\(value#x\), \(value#x > 2\)\]
   Format: json
   Location: InMemoryFileIndex\[.*\]
   PartitionFilters: \[isnotnull\(id#x\), \(id#x > 1\)\]
   
   ReadSchema: struct\
   ```
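
   A self-contained sketch of that marker trick (paste into a Scala REPL; the 
fragment below is made up and skips the regex escaping used in the real test): 
the `remove_marker` entry becomes a line of its own via `stripMargin`, and 
`replaceAll("\nremove_marker", "")` drops the marker together with its preceding 
newline, so no blank line is left behind.
   ```
   val pushedFiltersLine = "remove_marker"   // what the json entry contributes
   val expected =
     s"""|(1) BatchScan
         |PartitionFilters: [isnotnull(id#x)]
         |$pushedFiltersLine
         |ReadSchema: struct<...>
         |""".stripMargin.replaceAll("\nremove_marker", "").trim
   println(expected)
   // (1) BatchScan
   // PartitionFilters: [isnotnull(id#x)]
   // ReadSchema: struct<...>
   ```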








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r420550918



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode))
     }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {

Review comment:
   +1. Thank you.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r420496754



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -343,6 +343,54 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode))
     }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {

Review comment:
   @cloud-fan Agree. Actually I had tried but could not get the V2 scan set up 
through SQL. Could you please tell me how to do it?








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-05 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r420495962



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,29 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+import org.apache.spark.annotation.Evolving
+
+/**
+ * A mix in interface for {@link FileScan}. This can be used to report metadata
+ * for a file based scan operator. This is currently used for supporting formatted
+ * explain.
+ */
+@Evolving

Review comment:
   @cloud-fan Thanks, will remove.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418880068



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala
##
@@ -65,4 +65,8 @@ case class AvroScan(
   }
 
   override def hashCode(): Int = super.hashCode()
+
+  override def getMetaData(): Map[String, String] = {
+    super.metaData ++ Map("Format" -> "avro")

Review comment:
   @maropu OK, added a test.
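
   A hedged sketch of what such a check can look like (the path is illustrative; 
the external avro module is assumed to be on the classpath, and the config and 
`explainString` usage are assumptions, not necessarily how the test added in the 
PR is written):
   ```
   import org.apache.spark.sql.execution.FormattedMode

   // Route avro through the DSv2 AvroScan and check the metadata it reports.
   spark.conf.set("spark.sql.sources.useV1SourceList", "")
   val avroPath = "/tmp/avro_explain_demo"
   spark.range(10).write.mode("overwrite").format("avro").save(avroPath)

   val df = spark.read.format("avro").load(avroPath).where("id > 1")
   val plan = df.queryExecution.explainString(FormattedMode)
   assert(plan.contains("Format: avro"))
   ```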








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418820695



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+/**
+ * A mix in interface for {@link FileScan}. This can be used to report metadata
+ * for a file based scan operator. This is currently used for supporting formatted
+ * explain.
+ */

Review comment:
   @maropu On second thought, this is not an external interface, right? So I 
don't think we need any annotations here.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418791564



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala
##
@@ -65,4 +65,8 @@ case class AvroScan(
   }
 
   override def hashCode(): Int = super.hashCode()
+
+  override def getMetaData(): Map[String, String] = {
+    super.metaData ++ Map("Format" -> "avro")

Review comment:
   @maropu Did you want an explain suite created in the avro external module?








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418790252



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala
##
@@ -65,4 +65,6 @@ case class AvroScan(
   }
 
   override def hashCode(): Int = super.hashCode()
+
+  override def getMetaData(): Map[String, String] = super.metaData

Review comment:
   @maropu Thanks a lot :-) I will make the change.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418763357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+/**
+ * A mix in interface for {@link FileScan}. This can be used to report metadata
+ * for a file based scan operator. This is currently used for supporting formatted
+ * explain.
+ */

Review comment:
   @maropu Will add.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418763290



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala
##
@@ -65,4 +65,6 @@ case class AvroScan(
   }
 
   override def hashCode(): Int = super.hashCode()
+
+  override def getMetaData(): Map[String, String] = super.metaData

Review comment:
   @maropu I get a compile error that forces me to implement it here. Let me 
know if you have any suggestions.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418450926



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+trait SupportsMetadata {
+  def getMetaData(): Map[String, String]
+}

Review comment:
   @maropu I don't see the need to make it part of the external V2 contract. We 
are using it only for explain now, so I thought of keeping it internal, just like 
we use the Logging trait.
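
   For context, a minimal illustration of the mix-in (the `MyScan` class and its 
values are hypothetical, not part of this PR):
   ```
   import org.apache.spark.sql.connector.read.Scan
   import org.apache.spark.sql.internal.connector.SupportsMetadata
   import org.apache.spark.sql.types.StructType

   // A scan that mixes in the internal trait; the DSv2 exec node can then render
   // each entry as its own "key: value" line under EXPLAIN FORMATTED.
   class MyScan(schema: StructType, location: String) extends Scan with SupportsMetadata {
     override def readSchema(): StructType = schema

     override def getMetaData(): Map[String, String] = Map(
       "Format" -> "myformat",
       "Location" -> location,
       "ReadSchema" -> schema.catalogString)
   }
   ```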








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418450311



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
##
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.internal.connector
+
+trait SupportsMetadata {
+  def getMetaData(): Map[String, String]
+}

Review comment:
   @maropu OK.








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418450404



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala
##
@@ -46,6 +47,35 @@ trait DataSourceV2ScanExecBase extends LeafExecNode {
     Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, result)
   }
 
+  /**
+   * Shorthand for calling redactString() without specifying redacting rules
+   */
+  protected def redact(text: String): String = {
+    Utils.redact(sqlContext.sessionState.conf.stringRedactionPattern, text)
+  }
+
+
+  override def verboseStringWithOperatorId(): String = {
+    val metaDataStr = if (scan.isInstanceOf[SupportsMetadata]) {
+      val scanWithExplainSupport: SupportsMetadata = scan.asInstanceOf[SupportsMetadata]
+      val metadataStr = scanWithExplainSupport.getMetaData.toSeq.sorted.filterNot {
+        case (_, value) if (value.isEmpty || value.equals("[]")) => true
+        case (_, _) => false
+      }.map {
+        case (key, value) => s"$key: ${redact(value)}"
+      }

Review comment:
   @maropu Thanks. Looks much better :-)
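
   A standalone sketch of the filtering and rendering idea above (paste into a 
Scala REPL; redaction is omitted and the map contents are made up):
   ```
   val metadata = Map(
     "Format" -> "parquet",
     "PushedFilers" -> "[]",          // empty-looking values like this are dropped
     "Location" -> "InMemoryFileIndex[/tmp/data]")

   val rendered = metadata.toSeq.sorted
     .filterNot { case (_, value) => value.isEmpty || value == "[]" }
     .map { case (key, value) => s"$key: $value" }
     .mkString("\n")

   println(rendered)
   // Format: parquet
   // Location: InMemoryFileIndex[/tmp/data]
   ```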








[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418450253



##
File path: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
##
@@ -343,6 +343,74 @@ class ExplainSuite extends ExplainSuiteHelper with DisableAdaptiveExecutionSuite
       assert(getNormalizedExplain(df1, FormattedMode) === getNormalizedExplain(df2, FormattedMode))
     }
   }
+
+  test("Explain formatted output for scan operator for datasource V2") {
+    withTempDir { dir =>
+      Seq("parquet", "orc", "csv", "json").foreach { format =>
+        val basePath = dir.getCanonicalPath + "/" + format
+        val expected_plan_fragment =
+          s"""
+             |\\(1\\) BatchScan
+             |Output \\[2\\]: \\[value#x, id#x\\]
+             |DataFilters: \\[isnotnull\\(value#x\\), \\(value#x > 2\\)\\]
+             |Format: $format
+             |Location: InMemoryFileIndex\\[.*\\]
+             |PartitionFilters: \\[isnotnull\\(id#x\\), \\(id#x > 1\\)\\]
+             |PushedFilers: \\[.*\\(id\\), .*\\(value\\), .*\\(id,1\\), .*\\(value,2\\)\\]

Review comment:
   @beliefer Thanks, I have updated.









[GitHub] [spark] dilipbiswal commented on a change in pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-05-01 Thread GitBox


dilipbiswal commented on a change in pull request #28425:
URL: https://github.com/apache/spark/pull/28425#discussion_r418449530



##
File path: 
external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroScan.scala
##
@@ -65,4 +65,8 @@ case class AvroScan(
   }
 
   override def hashCode(): Int = super.hashCode()
+
+  override def getMetaData(): Map[String, String] = {
+    super.metaData ++ Map("Format" -> "avro")

Review comment:
   @maropu Actually I had tried to test avro, but I get the following error:
   ```
   "Failed to find data source: avro. Avro is built-in but external data source 
module since Spark 2.4. Please deploy the application as per the deployment 
section of "Apache Avro Data Source Guide"
   ```




