Hello Experts,

I am using the Spark XML package to parse XML. The exception below is thrown when I try to select a tag that sits at array-of-array depth, i.e. in this case manager.subordinates.subordinate_clerk.duties.duty.name.
The issue is reproducible with the sample XML below:

<emplist>
  <emp>
    <manager>
      <id>1</id>
      <name>mgr1</name>
      <dateOfJoin>2005-07-31</dateOfJoin>
      <subordinates>
        <subordinate_clerk>
          <cid>2</cid>
          <cname>clerk2</cname>
          <dateOfJoin>2005-07-31</dateOfJoin>
        </subordinate_clerk>
        <subordinate_clerk>
          <cid>3</cid>
          <cname>clerk3</cname>
          <dateOfJoin>2005-07-31</dateOfJoin>
        </subordinate_clerk>
      </subordinates>
    </manager>
  </emp>
  <emp>
    <manager>
      <id>11</id>
      <name>mgr11</name>
      <subordinates>
        <subordinate_clerk>
          <cid>12</cid>
          <cname>clerk12</cname>
          <duties>
            <duty>
              <name>first duty</name>
            </duty>
            <duty>
              <name>second duty</name>
            </duty>
          </duties>
        </subordinate_clerk>
      </subordinates>
    </manager>
  </emp>
</emplist>

scala> df.select("manager.subordinates.subordinate_clerk.duties.duty.name").show

The exception is:

org.apache.spark.sql.AnalysisException: cannot resolve 'manager.subordinates.subordinate_clerk.duties.duty[name]' due to data type mismatch: argument 2 requires integral type, however, 'name' is of string type.;
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:65)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:332)
    ... more

scala> df.printSchema
root
 |-- manager: struct (nullable = true)
 |    |-- dateOfJoin: string (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- subordinates: struct (nullable = true)
 |    |    |-- subordinate_clerk: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- cid: long (nullable = true)
 |    |    |    |    |-- cname: string (nullable = true)
 |    |    |    |    |-- dateOfJoin: string (nullable = true)
 |    |    |    |    |-- duties: struct (nullable = true)
 |    |    |    |    |    |-- duty: array (nullable = true)
 |    |    |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |    |    |-- name: string (nullable = true)

Version info:
Spark - 1.6.0
Scala - 2.10.5
Spark XML - com.databricks:spark-xml_2.10:0.3.3

Please let me know if there is a solution or workaround for this.

Thanks,
Sreekanth
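One workaround I am considering is to explode the outer subordinate_clerk array first, so that only one array level (duty) remains when extracting the field. This is a sketch only, not verified against my exact setup; df and the column paths are as in the schema above:

```scala
// Untested sketch: flatten the outer array with explode so that
// field extraction only has to distribute over the remaining
// duty array instead of an array of arrays.
import org.apache.spark.sql.functions.explode

// One row per subordinate_clerk struct.
val clerks = df.select(
  explode(df("manager.subordinates.subordinate_clerk")).as("clerk"))

// clerk.duties.duty is array<struct<name:string>>, so selecting
// .name should yield array<string> per clerk.
clerks.select("clerk.duties.duty.name").show(false)
```

I would still like to know whether the direct path manager.subordinates.subordinate_clerk.duties.duty.name can be made to resolve without the explode step.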