Hi, I used XMLInputFormat, and in it I used the RecordReader class, the same as you have given.
The whole XML has been split into parts. For example, consider the XML below:

<Comp><Emp><id></id><name></name></Emp><Emp><id></id><name></name></Emp></Comp>

After using the RecordReader class, the XML output is:

<Emp><id></id><name></name></Emp><Emp><id></id><name></name></Emp>

The start and end tag is Emp. It does not convert into text. Please suggest and help.

Thanks in advance,
Ranjini

On Fri, Jan 3, 2014 at 11:22 AM, Azuryy Yu <[email protected]> wrote:
> Hi,
>
> You can use org.apache.hadoop.streaming.StreamInputFormat in MapReduce
> to convert XML to text.
>
> For example, if your XML looks like this:
> <xml>
>   <name>lll</name>
> </xml>
>
> you need to specify stream.recordreader.begin and stream.recordreader.end
> in the Configuration:
> Configuration conf = new Configuration();
> conf.set("stream.recordreader.begin", "<xml>");
> conf.set("stream.recordreader.end", "</xml>");
>
> On Fri, Jan 3, 2014 at 1:16 PM, Ranjini Rathinam
> <[email protected]> wrote:
>
>> Hi,
>>
>> I need to convert XML into text using MapReduce.
>>
>> I have used DOM and SAX parsers.
>>
>> After using SAXBuilder in the mapper class, the child node acts as the
>> root element.
>>
>> While checking the System.out output, I found that the root element is
>> taking the child element and printing it.
>>
>> For example:
>>
>> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>>
>> When this XML is passed to the mapper, System.out prints the root
>> element. I am getting the root element as:
>>
>> <id>
>> <name>
>>
>> Please suggest how to fix this.
>>
>> I need to convert the XML into text using MapReduce code. Please
>> provide an example.
>>
>> Required output is:
>>
>> id,name
>> 100,RR
>>
>> Please help.
>>
>> Thanks in advance,
>> Ranjini R
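A record reader configured with an `<Emp>` begin tag and an `</Emp>` end tag is expected to hand each map() call one `<Emp>...</Emp>` fragment, not converted text. The conversion to `id,name` still has to happen inside the mapper. Within each fragment `<Emp>` is the document (root) element, so `<id>` and `<name>` are its children, which matches what the SAX output above shows. The sketch below uses only the JDK (no Hadoop classes) so it can run standalone. The class and method names (`EmpXmlToCsv`, `splitRecords`, `toCsvLine`) are hypothetical. `splitRecords` only imitates the begin/end-tag splitting and does not handle HDFS split boundaries the way a real record reader must. In a real job, `toCsvLine` would be called on each record value inside map().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class EmpXmlToCsv {

    // Hypothetical helper: cuts the input into begin...end fragments, roughly what a
    // begin/end-tag record reader passes to each map() call (no split handling here).
    public static List<String> splitRecords(String xml, String begin, String end) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(begin, from);
            if (start < 0) break;
            int stop = xml.indexOf(end, start);
            if (stop < 0) break;
            stop += end.length();
            records.add(xml.substring(start, stop));
            from = stop;
        }
        return records;
    }

    // Hypothetical per-record conversion: parses one <Emp> fragment with DOM.
    // <Emp> is the root of the fragment, so <id> and <name> are looked up below it.
    public static String toCsvLine(String record) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
            Element emp = doc.getDocumentElement();
            return textOf(emp, "id") + "," + textOf(emp, "name");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a well-formed <Emp> record: " + record, e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if it is missing.
    static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    public static void main(String[] args) {
        String xml = "<Comp><Emp><id>100</id><name>RR</name></Emp></Comp>";
        System.out.println("id,name"); // header line from the required output
        for (String record : splitRecords(xml, "<Emp>", "</Emp>")) {
            System.out.println(toCsvLine(record));
        }
        // prints:
        // id,name
        // 100,RR
    }
}
```

This assumes plain `<Emp>` tags without attributes, since the begin tag is matched as a literal string. DOM is reasonable here because each record is a small fragment; the header line would normally be written once (for example by a single reducer) rather than by every mapper.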
