Re: [PR] [SPARK-57418][DOCS] Add singleVariantColumn option to CSV, JSON, and XML data source options tables [spark]

via GitHub Mon, 22 Jun 2026 06:08:57 -0700


MaxGekk commented on code in PR #56469:
URL: https://github.com/apache/spark/pull/56469#discussion_r3452435821



##########
docs/sql-data-sources-xml.md:
##########
@@ -105,6 +105,13 @@ Data source options of XML can be set via:
       <td>read</td>
   </tr>
 
+  <tr>
+      <td><code>singleVariantColumn</code></td>
+      <td>(none)</td>
+      <td>If specified, the entire XML record is parsed and stored as a single column of <code>VariantType</code> with the given column name, instead of being split into individual fields.</td>
+      <td>read</td>

Review Comment:
   For XML this is arguably `read/write`, not `read`. `StaxXmlGenerator.write` 
also consumes the option (StaxXmlGenerator.scala:138): when the schema is a 
single field matching `singleVariantColumn`, it unwraps the 
single-variant-column layer and writes the Variant value directly under 
`rowTag`. The XML table already marks options consumed by both the parser and 
the generator as `read/write` (e.g. the `rowTag` row), so `read/write` would be 
consistent here.
   
   The CSV and JSON rows are correctly `read` -- their generators 
(`UnivocityGenerator`, `JacksonGenerator`) don't reference the option. 
Non-blocking.
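
To make the condition concrete, here is a rough plain-Python model of the check described in this comment. It is only a sketch, not Spark's code: `Field`, `unwraps_single_variant`, and `option_scope` are hypothetical names invented for illustration, and the real generator may check more than the field name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one field of a write schema."""
    name: str


def unwraps_single_variant(schema: Sequence[Field],
                           single_variant_column: Optional[str]) -> bool:
    """Model of the write-side condition: the schema has exactly one
    field and its name matches the singleVariantColumn option."""
    if single_variant_column is None:
        # Option not set, so nothing to unwrap.
        return False
    return len(schema) == 1 and schema[0].name == single_variant_column


def option_scope(used_by_parser: bool, used_by_generator: bool) -> str:
    """Scope label for an options-table row, following the convention
    that options consumed on both paths are marked read/write."""
    if used_by_parser and used_by_generator:
        return "read/write"
    if used_by_generator:
        return "write"
    if used_by_parser:
        return "read"
    raise ValueError("option is not consumed by the parser or the generator")
```

Under this model, XML (parser and generator both consume the option) maps to `read/write`, while CSV and JSON (parser only) map to `read`.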


