AngersZhuuuu commented on pull request #29739:
URL: https://github.com/apache/spark/pull/29739#issuecomment-692410347
For a native file source scan such as `FileSourceScanExec`, the node collects
the information it needs into `metadata`:
```
override lazy val metadata: Map[String, String] = {
  def seqToString(seq: Seq[Any]) = seq.mkString("[", ", ", "]")
  val location = relation.location
  val locationDesc =
    location.getClass.getSimpleName +
      Utils.buildLocationMetadata(location.rootPaths, maxMetadataValueLength)
  val metadata =
    Map(
      "Format" -> relation.fileFormat.toString,
      "ReadSchema" -> requiredSchema.catalogString,
      "Batched" -> supportsColumnar.toString,
      "PartitionFilters" -> seqToString(partitionFilters),
      "PushedFilters" -> seqToString(pushedDownFilters),
      "DataFilters" -> seqToString(dataFilters),
      "Location" -> locationDesc)
  val withSelectedBucketsCount = relation.bucketSpec.map { spec =>
    val numSelectedBuckets = optionalBucketSet.map { b =>
      b.cardinality()
    } getOrElse {
      spec.numBuckets
    }
    metadata + ("SelectedBucketsCount" ->
      (s"$numSelectedBuckets out of ${spec.numBuckets}" +
        optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)"}.getOrElse("")))
  } getOrElse {
    metadata
  }
  withSelectedBucketsCount
}
```
Then its parent class, `DataSourceScanExec`, builds `simpleString` from
`metadata`:
```
override def simpleString(maxFields: Int): String = {
  val metadataEntries = metadata.toSeq.sorted.map {
    case (key, value) =>
      key + ": " + StringUtils.abbreviate(redact(value), maxMetadataValueLength)
  }
  val metadataStr = truncatedString(metadataEntries, " ", ", ", "", maxFields)
  redact(
    s"$nodeNamePrefix$nodeName${truncatedString(output, "[", ",", "]", maxFields)}$metadataStr")
}
```
Each metadata value is abbreviated to a length of 100, so the FileIndex
information of a FileSourceScan is cut off (hidden) once it exceeds 100
characters.
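
To make the truncation concrete, here is a minimal standalone sketch. It does not use Spark or commons-lang: `abbreviate` is a hand-written stand-in that I assume matches `StringUtils.abbreviate(str, maxWidth)` for the two-argument case, and the root paths and the `InMemoryFileIndex` prefix are made-up illustrative values.

```scala
object AbbreviateDemo {
  // Stand-in for commons-lang3 StringUtils.abbreviate(str, maxWidth) (assumed
  // behaviour): strings longer than maxWidth are cut and end with "...".
  def abbreviate(value: String, maxWidth: Int): String = {
    require(maxWidth >= 4, "maxWidth must be at least 4")
    if (value.length <= maxWidth) value
    else value.take(maxWidth - 3) + "..."
  }

  // Hypothetical root paths; real ones come from relation.location.rootPaths.
  val rootPaths: Seq[String] =
    (1 to 5).map(i => s"hdfs://namenode/warehouse/db/table/part=$i")

  // Mirrors the shape of locationDesc: class name followed by "[p1, p2, ...]".
  val locationDesc: String =
    "InMemoryFileIndex" + rootPaths.mkString("[", ", ", "]")

  def main(args: Array[String]): Unit = {
    val shown = abbreviate(locationDesc, 100)
    println(s"full length: ${locationDesc.length}")
    println(s"Location: $shown")
  }
}
```

With more than a couple of root paths the description is well over 100 characters, so only the first 97 characters plus `...` survive in the plan string and the remaining paths are not shown.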