Hi Gutierrez,
As suggested, I tried the code, but result.txt contains only the
header. Nothing else was printed.
After debugging, I found that the parse step produces no value.
The problem is in the line marked in bold below. When I added a System.out
call there, no value was printed for this line.
String xmlContent = value.toString();
InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try {
    builder = factory.newDocumentBuilder();
    *Document doc = builder.parse(is);*
    String ed = doc.getDocumentElement().getNodeName();
    out.write(ed.getBytes());
    DTMNodeList list = (DTMNodeList) getNode("/Company/Employee",
            doc, XPathConstants.NODESET);
When I print
out.write(xmlContent.getBytes()): the whole XML is printed.
Then I added a System.out call for list: nothing is printed.
out.write(ed.getBytes()): nothing is printed.
Please suggest where I am going wrong, and help me fix this.
Thanks in advance.
I have attached my code. Please review.
Mapper class:
public class XmlTextMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final XPathFactory xpathFactory = XPathFactory.newInstance();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String resultFileName = "/user/task/Sales/result.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
        FSDataOutputStream out = fs.create(new Path(resultFileName));
        InputStream resultIS = new ByteArrayInputStream(new byte[0]);
        String header = "id,name\n";
        out.write(header.getBytes());
        String xmlContent = value.toString();
        InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
            Document doc = builder.parse(is);
            String ed = doc.getDocumentElement().getNodeName();
            out.write(ed.getBytes());
            DTMNodeList list = (DTMNodeList) getNode("/Company/Employee",
                    doc, XPathConstants.NODESET);
            int size = list.getLength();
            for (int i = 0; i < size; i++) {
                Node node = list.item(i);
                String line = "";
                NodeList nodeList = node.getChildNodes();
                int childNumber = nodeList.getLength();
                for (int j = 0; j < childNumber; j++) {
                    line += nodeList.item(j).getTextContent() + ",";
                }
                if (line.endsWith(","))
                    line = line.substring(0, line.length() - 1);
                line += "\n";
                out.write(line.getBytes());
            }
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            e.printStackTrace();
        }
        IOUtils.copyBytes(resultIS, out, 4096, true);
        out.close();
    }

    public static Object getNode(String xpathStr, Node node, QName retunType)
            throws XPathExpressionException {
        XPath xpath = xpathFactory.newXPath();
        return xpath.evaluate(xpathStr, node, retunType);
    }
}
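A side note on the inner loop of the mapper, offered as a sketch rather than a diagnosis: when the XML is indented, getChildNodes() also returns the whitespace-only text nodes that sit between elements, so joining the text of every child can produce stray commas and newlines. The plain-JDK example below (no Hadoop involved; the class ChildNodeDemo and its helper names are hypothetical) counts the children of a small Employee element and then keeps only the element children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ChildNodeDemo {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the trimmed text of element children only, skipping text nodes. */
    static List<String> elementTexts(Node parent) {
        List<String> values = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.add(child.getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical two-field record, one element per line as in the sample file.
        String xml = "<Employee>\n<id>100</id>\n<ename>ranjini</ename>\n</Employee>";
        Node employee = parse(xml).getDocumentElement();
        // All children, including the line breaks between the elements.
        System.out.println("all children: " + employee.getChildNodes().getLength());
        // Element children only, joined as one CSV row.
        System.out.println(String.join(",", elementTexts(employee)));
    }
}
```

The raw child count is larger than the number of fields because each line break between elements is its own text node; filtering on Node.ELEMENT_NODE leaves only the id and ename values.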
Main class:
public class MainXml {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        if (args.length != 2) {
            System.err.println("Usage: XMLtoText <input path> <output path>");
            System.exit(-1);
        }
        String output = "/user/task/Sales/";
        Job job = new Job(conf, "XML to Text");
        job.setJarByClass(MainXml.class);
        // job.setJobName("XML to Text");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        Path outPath = new Path(output);
        FileOutputFormat.setOutputPath(job, outPath);
        FileSystem dfs = FileSystem.get(outPath.toUri(), conf);
        if (dfs.exists(outPath)) {
            dfs.delete(outPath, true);
        }
        job.setMapperClass(XmlTextMapper.class);
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
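One possible explanation, stated as a hypothesis only: the driver never calls job.setInputFormatClass, so Hadoop uses its default TextInputFormat and map() receives a single line of the file per call instead of the whole document. The plain-JDK sketch below (no Hadoop; the class LineVsDocumentParse and the method rootNameOrNull are hypothetical names) shows how a DOM parser reacts to single lines compared with a complete document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LineVsDocumentParse {

    /** Returns the root element name, or null if the text is not well-formed XML. */
    static String rootNameOrNull(String text)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            return builder
                    .parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getNodeName();
        } catch (SAXException e) {
            // In the mapper this is the branch that only calls printStackTrace().
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // A lone opening tag, as a line-oriented input format would deliver it.
        System.out.println(rootNameOrNull("<Company>"));
        // A single-line element is well-formed on its own and becomes the "root".
        System.out.println(rootNameOrNull("<id>100</id>"));
        // The complete document parses with the expected root.
        System.out.println(rootNameOrNull(
                "<Company><Employee><id>100</id></Employee></Company>"));
    }
}
```

If this is what is happening, a line such as `<Company>` fails inside the try block and nothing after the header is written, while a line such as `<id>100</id>` parses with id as its root, which would match the earlier report of child elements showing up as the root element. The mapper would then need an input format that hands it one complete Employee record, or the whole file, per call.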
My XML file:
<Company>
  <Employee>
    <id>100</id>
    <ename>ranjini</ename>
    <dept>IT1</dept>
    <sal>123456</sal>
    <location>nextlevel1</location>
    <Address>
      <Home>Chennai1</Home>
      <Office>Navallur1</Office>
    </Address>
  </Employee>
  <Employee>
    <id>1001</id>
    <ename>ranjinikumar</ename>
    <dept>IT</dept>
    <sal>1234516</sal>
    <location>nextlevel</location>
    <Address>
      <Home>Chennai</Home>
      <Office>Navallur</Office>
    </Address>
  </Employee>
</Company>
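For reference, a minimal standalone sketch of the XPath step on a document shaped like this one, assuming the whole document is available as a single string. It uses only the standard org.w3c.dom.NodeList rather than the internal DTMNodeList class; the class EmployeeCsv and the method toCsv are hypothetical names, and only the id and ename fields are selected to match the "id,name" header.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class EmployeeCsv {

    /** Converts a complete Company document into "id,name" CSV text. */
    static String toCsv(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select every Employee element under the Company root.
        NodeList employees = (NodeList) xpath.evaluate(
                "/Company/Employee", doc, XPathConstants.NODESET);
        StringBuilder csv = new StringBuilder("id,name\n");
        for (int i = 0; i < employees.getLength(); i++) {
            // Relative paths are evaluated against the current Employee node.
            String id = xpath.evaluate("id", employees.item(i));
            String name = xpath.evaluate("ename", employees.item(i));
            csv.append(id).append(',').append(name).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened copy of the sample file, kept on several lines on purpose.
        String xml = "<Company>\n"
                + "<Employee>\n<id>100</id>\n<ename>ranjini</ename>\n</Employee>\n"
                + "<Employee>\n<id>1001</id>\n<ename>ranjinikumar</ename>\n</Employee>\n"
                + "</Company>";
        System.out.print(toCsv(xml));
    }
}
```

Selecting fields by name avoids depending on child order and is unaffected by the whitespace between elements or by nested elements such as Address.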
Thanks in advance.
Ranjini
> On Mon, Jan 6, 2014 at 2:44 PM, Ranjini Rathinam
> <[email protected]> wrote:
>
>> Hi,
>>
>> Thanks a lot.
>>
>> Ranjini
>>
>> On Fri, Jan 3, 2014 at 10:40 PM, Diego Gutierrez <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I suggest using XPath, which Java supports natively for querying XML
>>> documents.
>>>
>>> For the main problem: as with the distcp command (
>>> http://hadoop.apache.org/docs/r0.19.0/distcp.pdf ), there is no need for
>>> a reduce function, because you can parse the XML input file and create the
>>> file you need in the map function. For example, the following code reads an
>>> XML file in HDFS, parses it, and creates a new file ("/result.txt") in the
>>> expected format:
>>> id,name
>>> 100,RR
>>>
>>>
>>> Mapper function:
>>>
>>> import java.io.ByteArrayInputStream;
>>> import java.io.IOException;
>>> import java.io.InputStream;
>>> import java.net.URI;
>>>
>>> import javax.xml.namespace.QName;
>>> import javax.xml.parsers.DocumentBuilder;
>>> import javax.xml.parsers.DocumentBuilderFactory;
>>> import javax.xml.parsers.ParserConfigurationException;
>>> import javax.xml.xpath.XPath;
>>> import javax.xml.xpath.XPathConstants;
>>> import javax.xml.xpath.XPathExpressionException;
>>> import javax.xml.xpath.XPathFactory;
>>>
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IOUtils;
>>> import org.apache.hadoop.io.LongWritable;
>>> import org.apache.hadoop.io.Text;
>>> import org.apache.hadoop.mapreduce.Mapper;
>>> import org.w3c.dom.Document;
>>> import org.w3c.dom.Node;
>>> import org.w3c.dom.NodeList;
>>> import org.xml.sax.SAXException;
>>>
>>> import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;
>>>
>>> public class XmlToTextMapper extends Mapper<LongWritable, Text, Text, Text> {
>>>
>>>     private static final XPathFactory xpathFactory = XPathFactory.newInstance();
>>>
>>>     @Override
>>>     public void map(LongWritable key, Text value, Context context)
>>>             throws IOException, InterruptedException {
>>>
>>>         String resultFileName = "/result.txt";
>>>
>>>         Configuration conf = new Configuration();
>>>         FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
>>>         FSDataOutputStream out = fs.create(new Path(resultFileName));
>>>
>>>         InputStream resultIS = new ByteArrayInputStream(new byte[0]);
>>>
>>>         String header = "id,name\n";
>>>         out.write(header.getBytes());
>>>
>>>         String xmlContent = value.toString();
>>>         InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
>>>         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
>>>         DocumentBuilder builder;
>>>         try {
>>>             builder = factory.newDocumentBuilder();
>>>             Document doc = builder.parse(is);
>>>             DTMNodeList list = (DTMNodeList) getNode("/main/data", doc,
>>>                     XPathConstants.NODESET);
>>>
>>>             int size = list.getLength();
>>>             for (int i = 0; i < size; i++) {
>>>                 Node node = list.item(i);
>>>                 String line = "";
>>>                 NodeList nodeList = node.getChildNodes();
>>>                 int childNumber = nodeList.getLength();
>>>                 for (int j = 0; j < childNumber; j++) {
>>>                     line += nodeList.item(j).getTextContent() + ",";
>>>                 }
>>>                 if (line.endsWith(","))
>>>                     line = line.substring(0, line.length() - 1);
>>>                 line += "\n";
>>>                 out.write(line.getBytes());
>>>             }
>>>
>>>         } catch (ParserConfigurationException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         } catch (SAXException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         } catch (XPathExpressionException e) {
>>>             MyLogguer.log("error: " + e.getMessage());
>>>             e.printStackTrace();
>>>         }
>>>
>>>         IOUtils.copyBytes(resultIS, out, 4096, true);
>>>         out.close();
>>>     }
>>>
>>>     public static Object getNode(String xpathStr, Node node, QName retunType)
>>>             throws XPathExpressionException {
>>>         XPath xpath = xpathFactory.newXPath();
>>>         return xpath.evaluate(xpathStr, node, retunType);
>>>     }
>>> }
>>>
>>>
>>>
>>> --------------------------------------
>>> Main class:
>>>
>>>
>>> public class Main {
>>>
>>>     public static void main(String[] args) throws Exception {
>>>
>>>         if (args.length != 2) {
>>>             System.err.println("Usage: XMLtoText <input path> <output path>");
>>>             System.exit(-1);
>>>         }
>>>
>>>         Job job = new Job();
>>>         job.setJarByClass(Main.class);
>>>         job.setJobName("XML to Text");
>>>         FileInputFormat.addInputPath(job, new Path(args[0]));
>>>         FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>>
>>>         job.setMapperClass(XmlToTextMapper.class);
>>>         job.setNumReduceTasks(0);
>>>         job.setMapOutputKeyClass(Text.class);
>>>         job.setMapOutputValueClass(Text.class);
>>>         System.exit(job.waitForCompletion(true) ? 0 : 1);
>>>
>>>     }
>>> }
>>>
>>> To execute the job, you can use:
>>>
>>> bin/hadoop Main /data.xml /output
>>>
>>>
>>> Then you can use this to see the result.txt file:
>>>
>>> hadoop fs -cat /result.txt
>>>
>>>
>>> I'm using this xml as input:
>>>
>>>
>>> <main><data><id>1</id><name>NameA</name></data><data><id>2</id><name>NameB</name></data></main>
>>>
>>> and the content in result.txt is like this:
>>>
>>> id,name
>>> 1,NameA
>>> 2,NameB
>>>
>>>
>>> Hope this helps.
>>>
>>>
>>> 2014/1/3 Ranjini Rathinam <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> I need to convert XML into text using MapReduce.
>>>>
>>>> I have used the DOM and SAX parsers.
>>>>
>>>> After using SAX Builder in the mapper class, the child node acts as the
>>>> root element.
>>>>
>>>> In the System.out output I found that the root element is taken from the
>>>> child element and printed.
>>>>
>>>> For example,
>>>>
>>>> <Comp><Emp><id>100</id><name>RR</name></Emp></Comp>
>>>> When this XML is passed to the mapper and the root element is printed to
>>>> System.out, I am getting the root element as
>>>>
>>>> <id>
>>>> <name>
>>>>
>>>> Please suggest and help to fix this.
>>>>
>>>> I need to convert the XML into text using MapReduce code. Please
>>>> provide an example.
>>>>
>>>> Required output is
>>>>
>>>> id,name
>>>> 100,RR
>>>>
>>>> Please help.
>>>>
>>>> Thanks in advance,
>>>> Ranjini R
>>>>