gaogaotiantian commented on code in PR #56600:
URL: https://github.com/apache/spark/pull/56600#discussion_r3444567296
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlGenerator.scala:
##########
@@ -72,7 +72,13 @@ class StaxXmlGenerator(
private val binaryFormatter = ToStringBase.getBinaryFormatter
private val gen = {
- val factory = XMLOutputFactory.newInstance()
+ // Instantiate the Woodstox factory directly from the shaded Hadoop classes instead of
+ // using XMLOutputFactory.newInstance(). The latter resolves an implementation via the
+ // service-loader mechanism, which could pick up a different (unshaded) StAX provider on the
+ // classpath. Such a provider would not understand the shaded WstxOutputProperties keys set
+ // below and would throw IllegalArgumentException. Constructing the shaded factory directly
+ // guarantees the properties and the implementation always match.
+ val factory = new WstxOutputFactory()
// to_xml disables structure validation to allow multiple root tags
factory.setProperty(WstxOutputProperties.P_OUTPUT_VALIDATE_STRUCTURE, validateStructure)
factory.setProperty(WstxOutputProperties.P_OUTPUT_VALIDATE_NAMES, options.validateName)
Review Comment:
That's a behavior change, and it will not solve the racing issue. If another
StAX library is on the classpath, we could still end up using either factory
depending on incidental environment factors, so behavior would switch
unpredictably between the two libraries. We need a consistent result.
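
To see which StAX implementation the service-loader lookup resolves in a
given environment, a minimal sketch like the following may help. It uses only
JDK classes; `StaxProviderCheck`, `describe`, and the property key
`com.example.stax.hypotheticalProperty` are hypothetical names chosen for
illustration and are not part of Spark or Woodstox.

```scala
import javax.xml.stream.XMLOutputFactory

object StaxProviderCheck {
  // Hypothetical vendor-specific key, used only for illustration.
  val vendorKey: String = "com.example.stax.hypotheticalProperty"

  // Report the concrete factory class and whether it recognizes the key.
  def describe(factory: XMLOutputFactory): String = {
    val impl = factory.getClass.getName
    val supported = factory.isPropertySupported(vendorKey)
    s"$impl supportsVendorKey=$supported"
  }

  def main(args: Array[String]): Unit = {
    // newInstance() goes through the provider lookup, so the class printed
    // here depends on what is on the classpath at run time.
    println(describe(XMLOutputFactory.newInstance()))
  }
}
```

Running this with different classpaths should show whether the resolved
factory class changes between environments, which is the inconsistency the
comment above is concerned about.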