wypoon commented on a change in pull request #26895: [SPARK-17398][SQL] Fix
ClassCastException when querying partitioned JSON table
URL: https://github.com/apache/spark/pull/26895#discussion_r358004041
##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -132,7 +132,9 @@ class HadoopTableReader(
val deserializedHadoopRDD = hadoopRDD.mapPartitions { iter =>
val hconf = broadcastedHadoopConf.value.value
val deserializer = deserializerClass.getConstructor().newInstance()
- deserializer.initialize(hconf, localTableDesc.getProperties)
+ DeserializerLock.synchronized {
Review comment:
On the question of other callers needing to use this global lock:
AFAIK, the only reported problem in Spark with the
`HCatRecordObjectInspectorFactory` bug is in reading partitioned `JsonSerDe`
tables. In `HadoopTableReader#makeRDDForTable` we could even do without the
lock; it is only really needed in
`HadoopTableReader#makeRDDForPartitionedTable` (see the sketch below).
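For concreteness, this is roughly the pattern I have in mind. It is only a
minimal sketch, not the code in this PR: the `initializeDeserializer` helper
and its parameters are made up for illustration, while `DeserializerLock`
mirrors the object this change introduces.
```scala
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.Deserializer

// JVM-wide lock, mirroring the DeserializerLock object added in this PR.
object DeserializerLock

object DeserializerInitSketch {
  // Hypothetical helper (not in the PR) showing where the lock belongs: only
  // Deserializer#initialize is guarded, because the non-thread-safe caching
  // happens during initialization. Per-row deserialization afterwards uses
  // task-local state and needs no synchronization.
  def initializeDeserializer(
      deserializerClass: Class[_ <: Deserializer],
      hconf: Configuration,
      props: Properties): Deserializer = {
    val deserializer = deserializerClass.getConstructor().newInstance()
    DeserializerLock.synchronized {
      deserializer.initialize(hconf, props)
    }
    deserializer
  }
}
```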
From my search, there are two other places in Spark where we call
`Deserializer#initialize`:
1. `HiveTableScanExec`, in initializing a `HadoopTableReader` instance,
before `HadoopTableReader#makeRDDForTable` or
`HadoopTableReader#makeRDDForPartitionedTable` is even called in `doExecute`.
2. `HiveScriptIOSchema`.
I don't think we need the lock for 1.
I'm not sure about 2., but if no problem has been reported there due to the
race, we can leave it alone as well.
In other words, I'm not proposing we guard against the race in the Hive bug
everywhere, just in this known case.
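For background on why a single JVM-wide lock is enough here: as I understand
it, the Hive bug is that `HCatRecordObjectInspectorFactory` memoizes
ObjectInspectors in static caches that are not thread-safe. The sketch below
only illustrates that access pattern under that assumption; it is not the
actual Hive code, and the object, field, and method names are made up.
```scala
import scala.collection.mutable

// Illustrative stand-in for a static, unsynchronized cache like the ones in
// HCatRecordObjectInspectorFactory; names here are hypothetical, not the
// actual Hive fields.
object InspectorCacheSketch {
  private val cache = mutable.HashMap.empty[String, AnyRef]

  // Unsynchronized get-or-compute: two executor threads initializing a
  // JsonSerDe at the same time can race on the HashMap's internal state or
  // observe a partially built entry. Serializing Deserializer#initialize
  // calls behind one JVM-wide lock (as this PR does) keeps this path
  // effectively single-threaded.
  def getOrBuild(key: String)(build: => AnyRef): AnyRef =
    cache.getOrElseUpdate(key, build)
}
```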