Hello Enric,

depending on the needs of your application, an alternative approach may be
to combine a StAX-compliant (JSR-173) parser like Woodstox (see
http://woodstox.codehaus.org/) with XPath:

1.   Stream through the document with Woodstox.

2.   For small subtrees build a (J)Dom-Tree.

3.   Use XPath to select nodes from the subtree.
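A minimal sketch of the three steps above, using only the standard
javax.xml.stream, javax.xml.transform and javax.xml.xpath APIs (Woodstox is
picked up automatically by XMLInputFactory when it is on the classpath). The
element names ("catalog", "item", "name") and the helper method are made up
for illustration; in practice you would stream from a FileInputStream rather
than a String:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stax.StAXSource;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

public class StreamingXPathSketch {

    /** Streams the document, builds a DOM only for each small <item>
     *  subtree, and evaluates XPath against that fragment alone. */
    public static List<String> extractNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        // Step 1: a StAX cursor over the (potentially huge) input.
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        XPathExpression nameExpr = XPathFactory.newInstance().newXPath().compile("item/name");

        int event = reader.next();
        while (reader.hasNext()) {
            if (event == XMLStreamConstants.START_ELEMENT
                    && "item".equals(reader.getLocalName())) {
                // Step 2: materialize just this subtree as a DOM fragment.
                DOMResult subtree = new DOMResult();
                copy.transform(new StAXSource(reader), subtree);
                // Step 3: run XPath against the fragment, never the whole file.
                names.add((String) nameExpr.evaluate(subtree.getNode(), XPathConstants.STRING));
                event = reader.getEventType(); // transform already advanced the cursor
            } else {
                event = reader.next();
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a huge document.
        String xml = "<catalog>"
                + "<item><name>alpha</name></item>"
                + "<item><name>beta</name></item>"
                + "</catalog>";
        System.out.println(extractNames(xml)); // prints [alpha, beta]
    }
}
```

Memory use stays bounded by the size of the largest subtree you materialize,
not by the size of the whole file.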

We used this approach to semantically compare two BMEcat messages
(see http://www.bmecat.org). It has been tested on a comparison of
two 900 MByte files.

Regards,

     Andreas


>On Fri, 13 Jan 2006, Karr, David wrote:

>> How many nodes is your Xpath expression returning?  If you're
>> essentially returning the vast majority of the nodes in the file, then
>> you're probably using the wrong tool for this job.  That is, don't use
>> Xpath for this.

>The curious thing is that my XPath expression doesn't return any node. So
>I guess XPath needs to build a DOM tree to do its job, even if it returns
>nothing (could someone confirm this?)

>Thanks to those who provided pointers to other tools.  After googling a
>bit, I found a commercial product
>(http://www.eweek.com/article2/0,1759,1780265,00.asp) which they say can
>process a 1 TB file by doing streaming instead of DOM. I also found
>eXist, an open-source native XML database (http://exist.sourceforge.net/),
>which they say can work with documents with up to 2^63 nodes.

>Regards,

>-Enric

>>
>> > -----Original Message-----
>> > From: Enric Jaen [mailto:[EMAIL PROTECTED]
>> >
>> > >If you think there is bugs in the impl of XPath, please open a bug
>> > >report at https://issues.apache.org/jira/secure/Dashboard.jspa
>> > >and attach a valid test case that can demonstrate the problem.
>> >
>> > I don't think it is a bug. I rather think that XPath builds a
>> > DOM tree when it returns a NodeSet (please correct me if I am
>> > wrong). When the file is about 6 MB, the JVM runs out of memory.
>> > Two workarounds I have tried are increasing the heap and splitting
>> > the XML file. Both push the evaluation limit
>> > farther, but there is still a limit.
>> >
>> > I think an XPath implementation for SAX would be possible,
>> > such as Sequential XPath, but I haven't gone deeply into this.
>> >
>> > -Enric
>> >
>> > On Fri, 13 Jan 2006, Enric Jaen wrote:
>> >
>> > > Hello, I get an OutOfMemoryError when I evaluate an XPath
>> > > expression on a large XML file.
>> > >
>> > > I am using this code:
>> > >
>> > >         XPathFactory factory = XPathFactory.newInstance();
>> > >         XPath xpath = factory.newXPath();
>> > >         InputSource entities_is = new InputSource("file.xml");
>> > >         XPathExpression xpathExpr = xpath.compile(expr);
>> > >         return (NodeList) xpathExpr.evaluate(entities_is,
>> > >                 XPathConstants.NODESET);
>> > >
>> > > I am not an expert in XPath development, so I'd appreciate it
>> > > if someone could explain why this error is happening.
>> > > Is it because XPath uses DOM internally? If so, is there any
>> > > XPath implementation for SAX? Is there any other
>> > > explanation/solution?
>> > >
>> > > Thanks in advance for your help.
>> > > -Enric

