tagatac opened a new pull request, #54467:
URL: https://github.com/apache/spark/pull/54467
### What changes were proposed in this pull request?
- Add `serdeName` to `org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat`.
- Include this field when responding to `DESCRIBE EXTENDED` queries.
- Handle this field when parsing table details from the Hive Metastore API and when writing back to it.
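For reference, the proposed field would sit alongside the existing serde-related members of `CatalogStorageFormat`. A minimal sketch (the existing member names are as in current Spark; the placement and default of `serdeName` here are assumptions, not the exact diff):

```
// Sketch only: CatalogStorageFormat with the proposed optional field,
// defaulting to None so non-Hive tables are unaffected.
case class CatalogStorageFormat(
    locationUri: Option[java.net.URI],
    inputFormat: Option[String],
    outputFormat: Option[String],
    serde: Option[String],
    compressed: Boolean,
    properties: Map[String, String],
    serdeName: Option[String] = None) // new: mirrors Hive's SerDeInfo.name
```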
### Why are the changes needed?
- This field is included in [`SerDeInfo`](https://github.com/apache/hive/blob/5160d3af392248255f68e41e1e0557eae4d95273/metastore/if/hive_metastore.thrift#L260) returned by the Hive Metastore API.
- Its omission from Spark's internal representation of Hive tables makes this field cumbersome to consume.
Before this change:
```
private def hasExampleSerdeName(h: HiveTableRelation): Boolean = {
  val key = (h.tableMeta.database, h.tableMeta.identifier.table)
  serdeNameCache.computeIfAbsent(key, _ => {
    val catalog = session.sharedState.externalCatalog.unwrapped
      .asInstanceOf[HiveExternalCatalog]
    catalog.client.getRawHiveTableOption(key._1, key._2).exists { rawHiveTable =>
      // Use reflection to access SerDeInfo.name across classloader boundaries,
      // so that this works even when spark.sql.hive.metastore.jars is configured.
      val rawTable = rawHiveTable.rawTable
      val tTable = rawTable.getClass.getMethod("getTTable").invoke(rawTable)
      val sd = tTable.getClass.getMethod("getSd").invoke(tTable)
      val serdeInfo = sd.getClass.getMethod("getSerdeInfo").invoke(sd)
      val name = serdeInfo.getClass.getMethod("getName").invoke(serdeInfo)
      name == ExampleSerdeInfoName
    }
  })
}
```
After this change:
```
private def hasExampleSerdeName(h: HiveTableRelation): Boolean = {
  h.tableMeta.storage.serdeName.contains(ExampleSerdeInfoName)
}
```
### Does this PR introduce _any_ user-facing change?
Yes, developers can now access `CatalogStorageFormat.serdeName`,
representing the Hive Metastore API field `SerDeInfo.name`, when interacting
with Spark representations of Hive tables.
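As a sketch of the resulting access pattern (the `SparkSession` named `spark`, the table `t`, and the `default` database are illustrative):

```
import org.apache.spark.sql.catalyst.TableIdentifier

// Fetch the catalog entry for the table and read the new field; it is
// None when the metastore did not report a SerDeInfo.name.
val meta = spark.sessionState.catalog
  .getTableMetadata(TableIdentifier("t", Some("default")))
val serdeName: Option[String] = meta.storage.serdeName
```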
### How was this patch tested?
- A unit test was added.
- `DESCRIBE EXTENDED`, run via `spark-shell`, correctly reports "Serde Name" for a Hive table whose serde name is set:
```
scala> spark.sql("CREATE TABLE t (d1 DECIMAL(10,3), d2 STRING) STORED AS TEXTFILE;").show()
++
||
++
++
scala> spark.sql("DESCRIBE EXTENDED t;").show()
+--------------------+--------------------+-------+
| col_name| data_type|comment|
+--------------------+--------------------+-------+
| d1| decimal(10,3)| NULL|
| d2| string| NULL|
| | | |
|# Detailed Table ...| | |
...
| Location|file:/local/home/...| |
| Serde Library|org.apache.hadoop...| |
...
+--------------------+--------------------+-------+
scala> import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.TableIdentifier
scala> val hiveTable = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t", Some("default")))
val hiveTable: org.apache.spark.sql.catalyst.catalog.CatalogTable = ...
scala> val updated = hiveTable.copy(storage = hiveTable.storage.copy(serdeName = Some("testSerdeName")))
val updated: org.apache.spark.sql.catalyst.catalog.CatalogTable = ...
scala> spark.sessionState.catalog.alterTable(updated)
scala> spark.sql("DESCRIBE EXTENDED t;").show()
+--------------------+--------------------+-------+
| col_name| data_type|comment|
+--------------------+--------------------+-------+
| d1| decimal(10,3)| NULL|
| d2| string| NULL|
| | | |
|# Detailed Table ...| | |
...
| Location|file:/local/home/...| |
| Serde Name| testSerdeName| |
| Serde Library|org.apache.hadoop...| |
...
+--------------------+--------------------+-------+
```
### Was this patch authored or co-authored using generative AI tooling?
No.
*This contribution is my original work, and I license the work to the Spark
project under the project’s open source license.*
Cc: @[email protected] @asl3
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]