RE: XPATH outOfMemory while evaluating large XML files.

Enric Jaen Mon, 16 Jan 2006 07:08:44 -0800


On Mon, 16 Jan 2006 [EMAIL PROTECTED] wrote:


Hello Andreas,
>
> for a brief introduction to StAX you could look at "An Introduction to
> StAX"
> by Harold, E. R., see "http://www.xml.com/pub/a/2003/09/17/stax.html".
>
> StAX does not build a tree, but you can build the Dom-tree from the StAX
> events.
>
> A very simple, but time-consuming approach would be to
>
> * add the events from the StAX parser into an XMLEventWriter for each
> subtree,
> * produce an XML string s_subtree for each subtree and
> * build a Dom-tree t_subtree from this string s_subtree.
>
> This Dom-tree t_subtree can be the input for Xalan-XPATH.
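
The steps above can be sketched roughly as follows. This is only an
illustration under assumptions, not code from the thread: the class name
SubtreeXPath, the method evaluatePerSubtree and the recordName parameter are
hypothetical, only the standard javax.xml.stream, DOM and XPath APIs are used,
and the subtrees are assumed not to rely on namespace declarations made on
their ancestors.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: one small DOM tree per subtree, never one for the whole file.
final class SubtreeXPath {

    // Evaluates expr against every <recordName> subtree and collects the
    // text content of the selected nodes.
    static List<String> evaluatePerSubtree(Reader xml, String recordName, String expr)
            throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(xml);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression xpathExpr = XPathFactory.newInstance().newXPath().compile(expr);
        List<String> results = new ArrayList<>();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(recordName)) {
                continue; // not the start of a subtree we care about
            }
            // Step 1: copy the events of this subtree into an XMLEventWriter.
            StringWriter sSubtree = new StringWriter();
            XMLEventWriter writer = outFactory.createXMLEventWriter(sSubtree);
            int depth = 0;
            while (true) {
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    break; // matching end tag of the subtree reached
                }
                event = reader.nextEvent();
            }
            writer.flush();
            writer.close();

            // Steps 2 and 3: XML string s_subtree -> DOM tree t_subtree.
            Document tSubtree =
                    builder.parse(new InputSource(new StringReader(sSubtree.toString())));

            // The small DOM tree is the XPath context; it becomes garbage afterwards.
            NodeList hits = (NodeList) xpathExpr.evaluate(tSubtree, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                results.add(hits.item(i).getTextContent());
            }
        }
        reader.close();
        return results;
    }
}
```

Note that the XPath expression is evaluated relative to each subtree, so it
has to be written against the subtree's root element rather than against the
root of the original document.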

I see, divide and conquer. The large XML file is divided into smaller DOM
subtrees using woodstox (or any other pull parser), and then each subtree is
evaluated with XPATH individually. I guess this will work provided that each
subtree fits inside the Java heap.

I understand now! Thanks a lot for your help :)

Regards,

-Enric

>
> It is much faster to build the Dom-tree directly from the StAX parser
> events.
>
> Perhaps you could try stax2dom, see https://stax2dom.dev.java.net/
> for a direct StAX to DOM mapping for the subtrees or find some ideas
> at http://woodstox.codehaus.org/StaxMisc.
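
A direct mapping from StAX events to DOM nodes could look roughly like the
sketch below. It is not the stax2dom API (which is not shown in this thread);
the class StaxSubtreeToDom and the method readSubtree are hypothetical names,
and the sketch deliberately ignores namespaces, comments and processing
instructions to stay short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: builds a DOM tree for one subtree straight from
// the pull parser, without the intermediate XML string.
final class StaxSubtreeToDom {

    // The reader must be positioned on a START_ELEMENT. On return it is
    // positioned on the matching END_ELEMENT.
    static Document readSubtree(XMLStreamReader r) throws Exception {
        if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("reader is not on a start element");
        }
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Node current = doc; // node that receives the next child
        int depth = 0;
        int event = r.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT: {
                    // Assumption: local names only, namespaces are dropped.
                    Element e = doc.createElement(r.getLocalName());
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    depth++;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    if (--depth == 0) {
                        return doc; // subtree complete
                    }
                    break;
                default:
                    break; // comments, PIs etc. are skipped in this sketch
            }
            event = r.next();
        }
    }
}
```

The returned Document can be passed as the context item to
XPath.evaluate(String, Object) or XPathExpression.evaluate(Object, QName), and
the caller can continue with nextTag() on the same reader to reach the next
subtree.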
>
> Regards
>
>      Andreas
>
> >On Mon, 16 Jan 2006 [EMAIL PROTECTED] wrote:
>
> > Hello Enric,
> >
> > depending on the needs of your application it may be an alternative
> > approach to combine a Stax-compliant (JSR-173) parser like woodstox (see
> > http://woodstox.codehaus.org/) and XPath
> >
> > 1.   Parse the tree with woodstox.
> >
> > 2.   For small subtrees build a (J)Dom-Tree.
> >
> > 3.   Use XPath to select nodes from the subtree.
>
>  >Something is not clear to me: how can I tell Xalan-XPATH to use the tree
> generated by woodstox? I thought that Xalan-XPATH creates its own DOM
> tree from InputSource to select the nodes.
>
> >Thanks for any clarification on this..
>
> >-Enric
>
>
>
> >
> > We used this approach to semantically compare two BMEcat messages
> > (see http://www.bmecat.org). The approach has been tested by comparing
> > two 900 MByte files.
> >
> > Regards,
> >
> >      Andreas
> >
> >
> > >>On Fri, 13 Jan 2006, Karr, David wrote:
> >
> > >> How many nodes is your Xpath expression returning?  If you're
> > >> essentially returning the vast majority of the nodes in the file, then
> > >> you're probably using the wrong tool for this job.  That is, don't use
> > >> Xpath for this.
> >
> > >The curious thing is that my XPath expression doesn't return any node. So
> > >I guess XPath needs to build a DOM tree to do its job, even if it returns
> > >nothing (could someone confirm this?)
> >
> > >Thanks to those who provided pointers to other tools. After googling a
> > >bit, I found a commercial product
> > >(http://www.eweek.com/article2/0,1759,1780265,00.asp) which they say can
> > >process a 1TB file by streaming instead of building a DOM. I also found
> > >'eXist', an open source native XML database
> > >(http://exist.sourceforge.net/), which they say can work with documents
> > >of up to 2^63 nodes.
> >
> > >>Regards,
> >
> > >>-Enric
> >
> > >>
> > >> > -----Original Message-----
> > >> > From: Enric Jaen [mailto:[EMAIL PROTECTED]
> > >> >
> > >> > >If you think there are bugs in the impl of XPath, please open a bug
> > >> > >report at https://issues.apache.org/jira/secure/Dashboard.jspa
> > >> > >and attach a valid test case that can demonstrate the problem.
> > >> >
> > >> > I don't think it is a bug. I rather think that XPATH builds a
> > >> > DOM tree when it returns a NodeSet (please correct me if I am
> > >> > wrong). When the file is about 6MB the JVM runs out of memory.
> > >> > Two workarounds I have tried are to increase the heap and to
> > >> > divide the XML file. Both push the evaluation limit
> > >> > farther, but there is still a limit.
> > >> >
> > >> > I think an XPATH implementation for SAX would be possible,
> > >> > such as Sequential XPATH, but I haven't gone deeply into this.
> > >> >
> > >> > -Enric
> > >> >
> > >> > On Fri, 13 Jan 2006, Enric Jaen wrote:
> > >> >
> > >> > > Hello, I got an OutOfMemory when I evaluate an XPATH expression on a
> > >> > > large XML file.
> > >> > >
> > >> > > I am using this code:
> > >> > >
> > >> > >         XPathFactory factory = XPathFactory.newInstance();
> > >> > >         XPath xpath = factory.newXPath();
> > >> > >         InputSource entities_is = new InputSource("file.xml");
> > >> > >         XPathExpression xpathExpr = xpath.compile(expr);
> > >> > >         return (NodeList) xpathExpr.evaluate(entities_is,
> > >> > >                 XPathConstants.NODESET);
> > >> > >
> > >> > > I am not an expert in XPATH development, therefore I'd appreciate
> > >> > > it if someone could give me an explanation of why this error is
> > >> > > happening.
> > >> > > Is this because XPATH uses DOM internally? If so, is there any
> > >> > > implementation of XPATH for SAX? Is there any other
> > >> > > explanation/solution?
> > >> > >
> > >> > > Thanks in advance for your help.
> > >> > > -Enric
> > >>
> > >>
> >
> >
> >
> >
>
>
>
>
>
> ---------------------------------------------------------------------------
>
> Andreas Rulle
>
> Intermoves AG
> Technologiepark 19
> 33100 Paderborn
>
> Tel.  + 49 (0) 52 51 1613-0
> * Fax  + 49 (0) 52 51 1613-99
> * mailto:[EMAIL PROTECTED]
>
>
>
