RE: XPATH outOfMemory while evaluating large XML files.

Enric Jaen Mon, 16 Jan 2006 03:11:33 -0800


On Mon, 16 Jan 2006 [EMAIL PROTECTED] wrote:


> Hello Enric,
>
> depending on the needs of your application, an alternative approach may be
> to combine a StAX-compliant (JSR-173) parser like Woodstox (see
> http://woodstox.codehaus.org/) with XPath:
>
> 1.   Parse the tree with woodstox.
>
> 2.   For small subtrees build a (J)Dom-Tree.
>
> 3.   Use XPath to select nodes from the subtree.
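
The three steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code Andreas used: the names
StreamingXPath, select, buildSubtree and recordName are made up for the
example, and it uses the standard javax.xml.stream (StAX) API, which Woodstox
implements, plus the JDK's DOM and XPath classes rather than JDOM. The point
relevant to the question below is that javax.xml.xpath can evaluate an
expression against an existing DOM node, so no InputSource (and no full-file
DOM) is needed. Namespace prefixes, comments and processing instructions are
dropped for brevity.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: streams a document and runs XPath on one small subtree at a time.
final class StreamingXPath {

    /** Evaluates expr against every element named recordName; returns the text of the matched nodes. */
    static List<String> select(Reader in, String recordName, String expr) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression xp = XPathFactory.newInstance().newXPath().compile(expr);
        List<String> hits = new ArrayList<>();
        try {
            // Step 1: pull events from the StAX parser; nothing is kept in memory here.
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(recordName)) {
                    // Step 2: build a DOM tree for this one record only.
                    Element record = buildSubtree(r, db.newDocument());
                    // Step 3: evaluate XPath with the record element as the context node.
                    NodeList nodes = (NodeList) xp.evaluate(record, XPathConstants.NODESET);
                    for (int i = 0; i < nodes.getLength(); i++) {
                        hits.add(nodes.item(i).getTextContent());
                    }
                }
            }
        } finally {
            r.close();
        }
        return hits;
    }

    /** Copies the element the reader is positioned on, with its descendants, into doc. */
    static Element buildSubtree(XMLStreamReader r, Document doc) throws XMLStreamException {
        Element root = startElement(r, doc);
        doc.appendChild(root);
        Node current = root;
        // When the record's own END_ELEMENT is consumed, current climbs to doc and the loop ends.
        while (current != doc) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    current = current.appendChild(startElement(r, doc));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    break;
                default:
                    break; // comments and processing instructions are ignored in this sketch
            }
        }
        return root;
    }

    private static Element startElement(XMLStreamReader r, Document doc) {
        Element e = doc.createElementNS(ns(r.getNamespaceURI()), r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            e.setAttributeNS(ns(r.getAttributeNamespace(i)),
                    r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        return e;
    }

    // Parsers differ on whether "no namespace" is null or ""; DOM expects null.
    private static String ns(String uri) {
        return (uri == null || uri.isEmpty()) ? null : uri;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><item><name>bolt</name><price>5</price></item>"
                + "<item><name>gear</name><price>20</price></item></catalog>";
        System.out.println(select(new StringReader(xml), "item", "name[../price > 10]"));
    }
}
```

Memory use is bounded by the largest single record rather than by the file
size, because each record's Document becomes garbage once its matches have
been collected. The trade-off is that the expression can only see one record
at a time, so expressions that relate nodes in different records will not
work with this approach.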

 One thing is not clear to me: how can I tell Xalan-XPATH to use the tree 
generated by Woodstox? I thought that Xalan-XPATH creates its own DOM tree 
from an InputSource to select the nodes.

Thanks for any clarification on this.

-Enric



>
> We used this approach to semantically compare two BMEcat messages
> (see http://www.bmecat.org). It has been tested by comparing
> two 900 MByte files.
>
> Regards,
>
>      Andreas
>
>
> >>On Fri, 13 Jan 2006, Karr, David wrote:
>
> >> How many nodes is your Xpath expression returning?  If you're
> >> essentially returning the vast majority of the nodes in the file, then
> >> you're probably using the wrong tool for this job.  That is, don't use
> >> Xpath for this.
>
> >The curious thing is that my XPath expression doesn't return any node. So
> >I guess XPath needs to build a DOM tree to do its job, even if it returns
> >nothing (could someone confirm this?)
>
> >Thanks to those who provided pointers to other tools. After googling a
> >bit, I found a commercial product
> >(http://www.eweek.com/article2/0,1759,1780265,00.asp) that they say can
> >process a 1TB file by streaming instead of building a DOM. I also found
> >'eXist', an open source native XML database (http://exist.sourceforge.net/),
> >which they say can work with documents of up to 2^63 nodes.
>
> >>Regards,
>
> >>-Enric
>
> >>
> >> > -----Original Message-----
> >> > From: Enric Jaen [mailto:[EMAIL PROTECTED]
> >> >
> >> > >If you think there are bugs in the impl of XPath, please open a bug
> >> > >report at https://issues.apache.org/jira/secure/Dashboard.jspa
> >> > >and attach a valid test case that can demonstrate the problem.
> >> >
> >> > I don't think it is a bug. Rather, I think that XPATH builds a
> >> > DOM tree when it returns a NodeSet (please correct me if I am
> >> > wrong). When the file is about 6MB, Java runs out of memory.
> >> > Two workarounds I have tried are increasing the heap and splitting
> >> > the XML file. Both push the evaluation limit
> >> > farther out, but there is still a limit.
> >> >
> >> > I think an XPATH implementation for SAX would be possible,
> >> > such as Sequential XPATH, but I haven't gone deeply into this.
> >> >
> >> > -Enric
> >> >
> >> > On Fri, 13 Jan 2006, Enric Jaen wrote:
> >> >
> >> > > Hello, I got an OutOfMemory when I evaluate an XPATH expression on a
> >> > > large XML file.
> >> > >
> >> > > I am using this code:
> >> > >
> >> > >         XPathFactory factory = XPathFactory.newInstance();
> >> > >         XPath xpath = factory.newXPath();
> >> > >         InputSource entities_is = new InputSource("file.xml");
> >> > >         XPathExpression xpathExpr = xpath.compile(expr);
> >> > >         return (NodeList) xpathExpr.evaluate(entities_is,
> >> > >                 XPathConstants.NODESET);
> >> > >
> >> > > I am not an expert in XPATH development, so I'd appreciate it if
> >> > > someone could explain why this error is happening.
> >> > > Is this because XPATH uses DOM internally? If so, is there any
> >> > > implementation of XPATH for SAX? Is there any other
> >> > > explanation/solution?
> >> > >
> >> > > Thanks in advance for your help.
> >> > > -Enric
> >>
> >>
>
>
>
>
