On Mon, 16 Jan 2006 [EMAIL PROTECTED] wrote:
Hello Andreas,
>
> for a brief introduction into StAX you could look at "An introduction into
> StAX" by Harold, E. R., see "http://www.xml.com/pub/a/2003/09/17/stax.html".
>
> StAX does not build a tree, but you can build the Dom-tree from the StAX
> events.
>
> A very simple, but runtime-consuming approach would be to
>
> * add the events from the StAX parser into an XMLEventWriter for each
>   subtree,
> * produce an XML string s_subtree for each subtree and
> * build a Dom-tree t_subtree from this string s_subtree.
>
> This Dom-tree t_subtree can be the input for Xalan-XPATH.

I see, divide and conquer. The large XML file is divided into smaller DOM
subtrees using woodstox (or any other pull parser), and then each subtree is
evaluated with XPATH individually. I guess this will work provided that each
subtree fits inside the Java heap.

I understand now! Thanks a lot for your help :)

Regards,
-Enric

> It is much faster to build the Dom-tree directly from the StAX parser
> events.
>
> Perhaps you could try stax2dom, see https://stax2dom.dev.java.net/
> for a direct StAX to DOM mapping for the subtrees or find some ideas
> at http://woodstox.codehaus.org/StaxMisc.
>
> Regards
>
> Andreas
>
> >On Mon, 16 Jan 2006 [EMAIL PROTECTED] wrote:
>
> > Hello Enric,
> >
> > depending on the needs of your application it may be an alternative
> > approach to combine a Stax-compliant (JSR-173) parser like woodstox (see
> > http://woodstox.codehaus.org/) and XPath
> >
> > 1. Parse the tree with woodstox.
> >
> > 2. For small subtrees build a (J)Dom-Tree.
> >
> > 3. Use XPath to select nodes from the subtree.
>
> >Something is not clear to me: how can I tell Xalan-XPATH to use the tree
> >generated by woodstox? I thought that Xalan-XPATH creates its own DOM
> >tree from the InputSource to select the nodes.
>
> >Thanks for any clarification on this.
>
> >-Enric
>
> > We used this approach to semantically compare two BMEcat messages
> > (see http://www.bmecat.org).
> > The comparison of two 900 MByte files has been tested.
> >
> > Regards,
> >
> > Andreas
> >
> >
> > >>On Fri, 13 Jan 2006, Karr, David wrote:
> >
> > >> How many nodes is your Xpath expression returning? If you're
> > >> essentially returning the vast majority of the nodes in the file, then
> > >> you're probably using the wrong tool for this job. That is, don't use
> > >> Xpath for this.
> >
> > >The curious thing is that my XPath expression doesn't return any node. So
> > >I guess XPath needs to build a DOM tree to do its job, even if it returns
> > >nothing (could someone confirm this?)
> >
> > >Thanks to those who provided pointers to other tools. After googling a
> > >bit, I found a commercial product
> > >(http://www.eweek.com/article2/0,1759,1780265,00.asp) where they say it can
> > >process a 1TB file by streaming instead of using DOM. I also found
> > >'exist', an open source native XML database (http://exist.sourceforge.net/),
> > >where they say it can work with documents with up to 2^63 nodes.
> >
> > >>Regards,
> >
> > >>-Enric
> >
> > >>
> > >> > -----Original Message-----
> > >> > From: Enric Jaen [mailto:[EMAIL PROTECTED]
> > >> >
> > >> > >If you think there are bugs in the impl of XPath, please open a bug
> > >> > >report at https://issues.apache.org/jira/secure/Dashboard.jspa
> > >> > >and attach a valid test case that can demonstrate the problem.
> > >> >
> > >> > I don't think it is a bug. I rather think that XPATH builds a
> > >> > DOM tree when it returns a NodeSet (please correct me if I am
> > >> > wrong). When the file is about 6MB, Java runs out of memory.
> > >> > Two workarounds I have tried are to increase the heap and to divide
> > >> > the xml file. Both workarounds push the evaluation limit
> > >> > further, but there is still a limit.
> > >> >
> > >> > I think an XPATH implementation for SAX would be possible,
> > >> > such as Sequential XPATH, but I haven't gone deeply into this.
> > >> >
> > >> > -Enric
> > >> >
> > >> > On Fri, 13 Jan 2006, Enric Jaen wrote:
> > >> >
> > >> > > Hello, I got an OutOfMemory error when I evaluate an XPATH expression
> > >> > > on a large XML file.
> > >> > >
> > >> > > I am using this code:
> > >> > >
> > >> > > XPathFactory factory = XPathFactory.newInstance();
> > >> > > XPath xpath = factory.newXPath();
> > >> > > InputSource entities_is=new InputSource("file.xml");
> > >> > > XPathExpression xpathExpr = xpath.compile(expr);
> > >> > > return (NodeList)xpathExpr.evaluate(entities_is,
> > >> > >         XPathConstants.NODESET);
> > >> > >
> > >> > > I am not an expert in XPATH development, therefore I'd
> > >> > > appreciate it if someone could give me an explanation of why
> > >> > > this error is happening.
> > >> > > Is this because XPATH uses DOM internally? If so, is there any
> > >> > > implementation of XPATH for SAX? Is there any other
> > >> > > explanation/solution?
> > >> > >
> > >> > > Thanks in advance for your help.
> > >> > > -Enric
>
> ---------------------------------------------------------------------------
>
> Andreas Rulle
>
> Intermoves AG
> Technologiepark 19
> 33100 Paderborn
>
> Tel. + 49 (0) 52 51 1613-0
> * Fax + 49 (0) 52 51 1613-99
> * mailto:[EMAIL PROTECTED]
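
For readers of the archive, here is a rough sketch of the divide-and-conquer idea discussed in this thread: stream the file with a StAX pull parser, build a small DOM only for each subtree of interest, and evaluate the XPath expression against that subtree. It uses only the JDK's StAX, DOM and XPath APIs rather than woodstox or stax2dom. The class name SubtreeXPath, the method names and the idea of selecting subtrees by element local name are assumptions made for illustration and do not come from the thread. Comments and processing instructions are skipped, and namespace declarations are not copied as attributes, so treat it as a starting point rather than a complete StAX-to-DOM mapping.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; its name and signatures are not from the thread.
final class SubtreeXPath {

    // Streams the input and evaluates expr against one small DOM per element
    // whose local name is subtreeName. Returns the text content of every match.
    static List<String> evaluate(Reader xml, String subtreeName, String expr) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
        List<String> hits = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && subtreeName.equals(reader.getLocalName())) {
                    // A fresh document per subtree, so earlier subtrees can be garbage collected.
                    Document doc = builder.newDocument();
                    copySubtree(reader, doc);
                    NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
                    for (int i = 0; i < nodes.getLength(); i++) {
                        hits.add(nodes.item(i).getTextContent());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return hits;
    }

    // Copies the element at the reader's current START_ELEMENT, with its
    // descendants, into doc. Leaves the reader on the matching END_ELEMENT.
    private static void copySubtree(XMLStreamReader reader, Document doc) throws XMLStreamException {
        Node current = doc;
        int depth = 0;
        int event = reader.getEventType();
        while (true) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element element = doc.createElementNS(emptyToNull(reader.getNamespaceURI()),
                            qualifiedName(reader.getPrefix(), reader.getLocalName()));
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        element.setAttributeNS(emptyToNull(reader.getAttributeNamespace(i)),
                                qualifiedName(reader.getAttributePrefix(i), reader.getAttributeLocalName(i)),
                                reader.getAttributeValue(i));
                    }
                    current.appendChild(element);
                    current = element;
                    depth++;
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    depth--;
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    current.appendChild(doc.createTextNode(reader.getText()));
                    break;
                default:
                    break; // comments and processing instructions are skipped in this sketch
            }
            if (depth == 0) {
                return;
            }
            event = reader.next();
        }
    }

    private static String qualifiedName(String prefix, String localName) {
        return (prefix == null || prefix.isEmpty()) ? localName : prefix + ":" + localName;
    }

    private static String emptyToNull(String uri) {
        return (uri == null || uri.isEmpty()) ? null : uri;
    }
}
```

With this sketch, a call such as SubtreeXPath.evaluate(new FileReader("file.xml"), "item", "/item/name") would keep only one "item" subtree in memory at a time, where "item" and "/item/name" are placeholder names. The expression is evaluated with each subtree as its own document, so absolute paths start at the subtree's root element and expressions that need to look across subtrees will not work.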