Re: [PR] [SPARK-46630][SQL] XML: Validate XML element name on write [spark]

via GitHub Mon, 08 Jan 2024 21:20:58 -0800


shujingyang-db commented on code in PR #44634:
URL: https://github.com/apache/spark/pull/44634#discussion_r1445632275



##########
docs/sql-data-sources-xml.md:
##########
@@ -94,7 +94,7 @@ Data source options of XML can be set via:
   <tr>
       <td><code>inferSchema</code></td>
       <td><code>true</code></td>
-      <td>If true, attempts to infer an appropriate type for each resulting 
DataFrame column. If false, all resulting columns are of string type. Default 
is true. XML built-in functions ignore this option.</td>
+      <td>If true, attempts to infer an appropriate type for each resulting 
DataFrame column. If false, all resulting columns are of string type. XML 
built-in functions ignore this option.</td>

Review Comment:
   It seems that we are not ignoring this option: 
https://src.dev.databricks.com/databricks/runtime/-/blob/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlInferSchema.scala?L136



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/XmlSuite.scala:
##########
@@ -2828,4 +2826,59 @@ class XmlSuite
       }
     }
   }
+
+  test("XML Validate Name") {
+    val data = Seq(Row("Random String"))
+
+    def checkValidation(fieldName: String,
+                        errorMsg: String,
+                        validateName: Boolean = true): Unit = {
+      val schema = StructType(Seq(StructField(fieldName, StringType)))
+      val df = spark.createDataFrame(data.asJava, schema)
+
+      withTempDir { dir =>
+        val path = dir.getCanonicalPath
+        validateName match {
+          case false =>
+            df.write
+              .option("rowTag", "ROW")
+              .option("validateName", false)
+              .option("declaration", "")
+              .option("indent", "")
+              .mode(SaveMode.Overwrite)
+              .xml(path)
+            // read file back and check its content
+            val xmlFile = new File(path).listFiles()
+              .filter(_.isFile)
+              .filter(_.getName.endsWith("xml")).head
+            val actualContent = 
Files.readString(xmlFile.toPath).replaceAll("\\n", "")
+            assert(actualContent ===
+              s"<${XmlOptions.DEFAULT_ROOT_TAG}><ROW>" +
+                s"<$fieldName>${data.head.getString(0)}</$fieldName>" +
+                s"</ROW></${XmlOptions.DEFAULT_ROOT_TAG}>")
+
+          case true =>
+            val e = intercept[SparkException] {
+              df.write

Review Comment:
   nit: Is the error expected to be thrown in the write path? If so, shall we 
remove the read path in this case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-46630][SQL] XML: Validate XML element name on write [spark]

Reply via email to