MaxGekk commented on code in PR #56469:
URL: https://github.com/apache/spark/pull/56469#discussion_r3452435821
##########
docs/sql-data-sources-xml.md:
##########
@@ -105,6 +105,13 @@ Data source options of XML can be set via:
<td>read</td>
</tr>
+ <tr>
+ <td><code>singleVariantColumn</code></td>
+ <td>(none)</td>
+ <td>If specified, the entire XML record is parsed and stored as a single column of <code>VariantType</code> with the given column name, instead of being split into individual fields.</td>
+ <td>read</td>
Review Comment:
For XML this is arguably `read/write`, not `read`. `StaxXmlGenerator.write`
also consumes the option (StaxXmlGenerator.scala:138): when the schema is a
single field matching `singleVariantColumn`, it unwraps the
single-variant-column layer and writes the Variant value directly under
`rowTag`. The XML table already marks options consumed by both the parser and
the generator as `read/write` (e.g. the `rowTag` row), so `read/write` would be
consistent here.
The CSV and JSON rows are correctly `read` -- their generators
(`UnivocityGenerator`, `JacksonGenerator`) don't reference the option.
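
To make the write-side condition concrete, here is a minimal standalone sketch of the check described above: unwrap only when the schema is a single field whose name matches `singleVariantColumn`. It is not the actual `StaxXmlGenerator` code. `Field`, `SingleVariantColumnSketch`, and `shouldUnwrap` are hypothetical names, and the exact-name comparison is an assumption on my part.

```scala
// Hypothetical, simplified stand-in for a schema field; not Spark's StructField.
final case class Field(name: String, dataType: String)

object SingleVariantColumnSketch {
  // True when the write schema is exactly one field whose name equals the
  // configured singleVariantColumn value. Exact, case-sensitive matching is an
  // assumption here; the real generator may compare differently.
  def shouldUnwrap(schema: Seq[Field], singleVariantColumn: Option[String]): Boolean =
    singleVariantColumn.exists { col =>
      schema.length == 1 && schema.head.name == col
    }

  def main(args: Array[String]): Unit = {
    val variantOnly = Seq(Field("v", "variant"))
    val twoFields = Seq(Field("v", "variant"), Field("id", "int"))

    println(shouldUnwrap(variantOnly, Some("v"))) // true: single matching field
    println(shouldUnwrap(twoFields, Some("v")))   // false: more than one field
    println(shouldUnwrap(variantOnly, None))      // false: option not set
  }
}
```

Because the option changes what the generator emits in the first case, it affects the write path as well as the read path, which is the basis for suggesting `read/write`.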
Non-blocking.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]