Decreasing XPath Performance on large files?

Thomas Maschutznig Tue, 04 Sep 2007 02:22:41 -0700

I am using xalan-j 2.7.0 on Java 1.5 together with JAXP 1.3, similar to the xalan-j ApplyXPathJAXP sample. I have to read data from a 20 MB XML file with approx. 3000 nodes directly below the document root; each one of these nodes contains some sub-nodes with attributes. I want to partially extract data from this file and create Java beans, so I chose XPath expressions to extract exactly the tag and attribute data I need.

First, I search for all of those 3000 nodes directly below the root like this:

  XPath xPath = XPathFactory.newInstance().newXPath();

  org.w3c.dom.NodeList nodes = (NodeList) xPath.evaluate("/Waveset/User", inputSource, XPathConstants.NODESET);

Then I go through all matching nodes in a for-loop and extract data from each node's content using around 5 to 10 relative XPath expressions.

  for(int i=0; i < nodes.getLength(); i++) {
    System.out.println("Identity Count is : " + i);
    node = (org.w3c.dom.Element) nodes.item(i);

    firstName = xPath.evaluate("[EMAIL PROTECTED]'firstname']/@value", node);
    lastName = xPath.evaluate("[EMAIL PROTECTED]'lastname']/@value", node);

    // some more similar lines here...
  }

I can read "Identity Count is : x" for the first 60 to 90 lines very fast, within 2 or 3 seconds, but then it starts slowing down, and at a count of around 1500 it takes up to 10 seconds, and later maybe even more, for one node to be processed (and that is after the JVM and GC options were tuned; before that it was significantly worse).

I tuned the JVM options, maximizing heap space and resizing eden space; I can see garbage collections happen every 20 to 30 seconds. My JVM options (on Windows 2003 x64, JDK 1.5.0_11 64-bit) right now are:

  -Xms4g -Xmx4g -XX:NewSize=2g -XX:ThreadStackSize=16384 -XX:+UseParallelGC -server -XX:+AggressiveOpts

Classpath is: .:IMR_Import_Lib.jar:antlr-2.7.6.jar:asm.jar:asm-attrs.jar:c3p0-0.9.1.jar:cglib-2.1.3.jar:commons-collections-2.1.1.jar:commons-logging-1.0.4.jar:dom4j-1.6.1.jar:ejb3-persistence.jar:hibernate3.jar:jdbc2_0-stdext.jar:jta.jar:log4j-1.2.14.jar:ojdbc14.jar:serializer.jar:xalan.jar:xercesImpl.jar:xml-apis.jar:hibernate-annotations.jar:hibernate-commons-annotations.jar


(there is a .properties file on .)

I also tried a modified version of the first xPath.evaluate(), explicitly creating a Document object of the XML, to no avail:

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document d = db.parse(new File(this.xmlFilePathName));

    XPath xPath = XPathFactory.newInstance().newXPath();

    NodeList nodes = (NodeList) xPath.evaluate("/Waveset/User", d, XPathConstants.NODESET);
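
For reference, the overall pattern described in this post (parse once, select the User nodes, then evaluate relative expressions against each node) could be reduced to a self-contained sketch along the following lines. This is only an illustration under stated assumptions: the class name UserExtractSketch, the method extractNames, the element name "Attribute", its "name" and "value" attributes, and the tiny in-memory XML are all hypothetical stand-ins, since the real relative expressions are obscured above and the real input is a 20 MB file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UserExtractSketch {

    // Parses the XML once into a DOM Document, selects every /Waveset/User
    // element, and evaluates two relative XPath expressions against each one.
    public static List<String> extractNames(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));

        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate("/Waveset/User", doc, XPathConstants.NODESET);

        List<String> names = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element user = (Element) nodes.item(i);
            // ASSUMPTION: "Attribute", "name" and "value" are made-up names,
            // not taken from the real file.
            String first = xPath.evaluate("Attribute[@name='firstname']/@value", user);
            String last = xPath.evaluate("Attribute[@name='lastname']/@value", user);
            names.add(first + " " + last);
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Tiny hypothetical stand-in for the real input file.
        String xml = "<Waveset>"
            + "<User><Attribute name='firstname' value='Ada'/>"
            + "<Attribute name='lastname' value='Lovelace'/></User>"
            + "<User><Attribute name='firstname' value='Alan'/>"
            + "<Attribute name='lastname' value='Turing'/></User>"
            + "</Waveset>";
        System.out.println(extractNames(xml));
    }
}
```

A sketch like this only reproduces the access pattern on a small input; it says nothing by itself about where the slowdown on the large file comes from.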

I am a little stuck here with the drastically decreasing performance around halfway through the XML file. Did I miss anything in my code? I know using a lot of XPath expressions like I do is very expensive, but why would the second half of the file take 5 times as long as the first one, while the first 100 /Waveset/User nodes are parsed within seconds?


 Thomas
