HyukjinKwon commented on code in PR #43789:
URL: https://github.com/apache/spark/pull/43789#discussion_r1393556057
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala:
##########
@@ -35,34 +38,32 @@ import org.apache.spark.sql.types._
object XSDToSchema {
/**
- * Reads a schema from an XSD file.
+ * Reads a schema from an XSD path.
 * Note that if the schema consists of one complex parent type which you want to use as
 * the row tag schema, then you will need to extract the schema of the single resulting
 * struct in the resulting StructType, and use its StructType as your schema.
*
- * @param xsdFile XSD file
+ * @param xsdPath XSD path
* @return Spark-compatible schema
*/
- def read(xsdFile: File): StructType = {
+ def read(xsdPath: Path): StructType = {
+ val in = try {
+ // Handle case where file exists as specified
+ val fs = xsdPath.getFileSystem(SparkHadoopUtil.get.conf)
+ fs.open(xsdPath)
+ } catch {
+ case _: Throwable =>
+ // Handle case where it was added with sc.addFile
+ val addFileUrl = SparkFiles.get(xsdPath.toString)
Review Comment:
When files are added with a different scheme, e.g., `hdfs://my_xsd_file`, they
are downloaded to the local machine, so `SparkFiles.get(xsdPath.toString)` should
always return the local file path. So you won't need to call the Hadoop FS here.
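A minimal sketch of the reviewer's point, with no Spark dependency: `localizedPath` below is a hypothetical stand-in for `SparkFiles.get` (the root directory and the example paths are made up), illustrating that a file registered via `sc.addFile`, even under a remote scheme such as `hdfs://`, is downloaded locally and keyed by its base name, so plain local I/O suffices afterwards.

```scala
import java.nio.file.Paths

// Hypothetical stand-in for SparkFiles.get: Spark downloads files added via
// sc.addFile into a local root directory and resolves them by base name.
def localizedPath(rootDir: String, addedPath: String): String = {
  // Only the file's base name matters; the remote scheme is irrelevant locally.
  val baseName = Paths.get(addedPath).getFileName.toString
  Paths.get(rootDir, baseName).toString
}

// A file added as hdfs://namenode/schemas/person.xsd resolves to a local path.
println(localizedPath("/tmp/spark-files", "hdfs://namenode/schemas/person.xsd"))
```

With the real API, the equivalent call would be `SparkFiles.get("person.xsd")`, after which the file can be opened with ordinary `java.io` streams rather than through a Hadoop `FileSystem`.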
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]