Just ran that code and it works fine. What version are you using?
Here is the output:
val ctx = SQLContext.getOrCreate(sc)
val df = ctx.read.format("com.databricks.spark.xml").option("rowTag", "book").load("file:///tmp/sample.xml")
df.printSchema()
root
|-- name: long (nullable = true)
|-- orderId: long (nullable = true)
|-- price: long (nullable = true)
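For comparison, here is a rough Spark-free sketch (plain Scala plus the JDK's built-in DOM parser, no spark-xml involved) of what option("rowTag", "book") is expected to produce: each <book> element becomes one row, and its child element names become the columns. RowTagSketch, rows and columns are made-up names for illustration only. This is not how spark-xml is implemented, and it ignores type inference (the "long" types above). The names are sorted because the printSchema output above lists columns alphabetically rather than in document order.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: approximates how rowTag splits a file into rows.
// Not spark-xml's code; it only lists the element names each row would contain.
object RowTagSketch {

  /** One Seq of child-element names per element named `rowTag`. */
  def rows(xml: String, rowTag: String): Seq[Seq[String]] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val rowNodes = doc.getElementsByTagName(rowTag)
    (0 until rowNodes.getLength).map { i =>
      val children = rowNodes.item(i).getChildNodes
      (0 until children.getLength)
        .map(j => children.item(j))
        // Keep element nodes only; whitespace text between tags is skipped
        .collect { case e: Element => e.getTagName }
    }
  }

  /** Union of column names over all rows, sorted alphabetically. */
  def columns(xml: String, rowTag: String): Seq[String] =
    rows(xml, rowTag).flatten.distinct.sorted

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data shaped like the structure in the question
    val sample =
      """<books>
        |  <book><name>A</name><price>10</price><orderId>1</orderId></book>
        |  <book><name>B</name><price>20</price><orderId>2</orderId></book>
        |</books>""".stripMargin
    println(columns(sample, "book").mkString(", "))
  }
}
```

If a file shaped like yours gives a non-empty column list here but printSchema still shows only root, the problem is more likely in the Spark/spark-xml setup than in the XML itself.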
On Sun, Feb 21, 2016 at 2:14 PM Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
> This is the code I am using to parse the XML file:
>
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql.{DataFrame, SQLContext}
>
> object XmlProcessing {
>
>   def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("XmlProcessing")
>       .setMaster("local")
>
>     val sc = new SparkContext(conf)
>     val sqlContext: SQLContext = new SQLContext(sc)
>
>     loadXMLdata(sqlContext)
>
>     // Release the local Spark resources once the job has finished
>     sc.stop()
>   }
>
>   def loadXMLdata(sqlContext: SQLContext): Unit = {
>     // spark-xml is resolved from the format name, so no extra import is needed;
>     // each <book> element becomes one row of the DataFrame
>     val df: DataFrame = sqlContext.read
>       .format("com.databricks.spark.xml")
>       .option("rowTag", "book")
>       .load("/home/prathamsh/Workspace/Xml/datafiles/sample.xml")
>
>     df.printSchema()
>   }
> }
>
> On Sun, Feb 21, 2016 at 7:10 PM, Sebastian Piu <sebastian....@gmail.com> wrote:
>
>> Can you paste the code you are using?
>>
>> On Sun, 21 Feb 2016, 13:19 Prathamesh Dharangutte <pratham.d...@gmail.com> wrote:
>>
>>> I am trying to parse an XML file using spark-xml, but for some reason when
>>> I print the schema it only shows root instead of the hierarchy. I am using
>>> SQLContext to read the data, following this video:
>>> https://www.youtube.com/watch?v=NemEp53yGbI
>>>
>>> The structure of the XML file is roughly like this:
>>>
>>> <books>
>>>   <book>
>>>     <name></name>
>>>     <price></price>
>>>     <orderId></orderId>
>>>   </book>
>>>   <book>
>>>     <!-- some more data -->
>>>   </book>
>>> </books>
>>>
>>> Some books have multiple orders (a large number of orderId entries), while
>>> for others the order element occurs only once and is empty. I set the
>>> "rowTag" option to book. How do I proceed, or is there another way to
>>> tackle this problem? Help would be much appreciated. Thank you.
>>>
>>
>
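On the multiple-orders part of the question: as far as I understand spark-xml's schema inference, an element that repeats inside a single row is typed as an array, while one that appears at most once per row stays a single value. The sketch below approximates only that one rule without Spark, using the JDK's DOM parser; RepeatedElementSketch and arrayColumns are hypothetical names, and the library's real inference (types, attributes, nesting, empty values) is more involved.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

// Hypothetical helper: flags child tags that repeat within at least one row.
// A simplification for illustration, not spark-xml's actual inference code.
object RepeatedElementSketch {

  /** Maps each child tag under `rowTag` to true if it occurs more than once in some row. */
  def arrayColumns(xml: String, rowTag: String): Map[String, Boolean] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val rowNodes = doc.getElementsByTagName(rowTag)

    // For every row, count how many times each child tag appears
    val perRowCounts: Seq[Map[String, Int]] = (0 until rowNodes.getLength).map { i =>
      val children = rowNodes.item(i).getChildNodes
      (0 until children.getLength)
        .map(j => children.item(j))
        .collect { case e: Element => e.getTagName }
        .groupBy(identity)
        .map { case (tag, occurrences) => tag -> occurrences.size }
    }

    // A tag is array-like if its largest per-row count exceeds 1
    perRowCounts.flatten
      .groupBy(_._1)
      .map { case (tag, counts) => tag -> (counts.map(_._2).max > 1) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical data: the first book has two orders, the second a single empty one
    val sample =
      """<books>
        |  <book><name>A</name><orderId>1</orderId><orderId>2</orderId></book>
        |  <book><name>B</name><orderId></orderId></book>
        |</books>""".stripMargin
    arrayColumns(sample, "book").toSeq.sortBy(_._1).foreach { case (tag, isArray) =>
      println(s"$tag -> ${if (isArray) "array" else "single value"}")
    }
  }
}
```

Running main on that sample prints name as a single value and orderId as an array, which is roughly the shape you would hope printSchema reports for books with a varying number of orders.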