Very inefficient default behaviour for looking up DTMManager
------------------------------------------------------------

                 Key: XALANJ-2540
                 URL: https://issues.apache.org/jira/browse/XALANJ-2540
             Project: XalanJ2
          Issue Type: Improvement
      Security Level: No security risk; visible to anyone (Ordinary problems in 
Xalan projects.  Anybody can view the issue.)
          Components: DTM, XPath
    Affects Versions: 2.7, 2.7.1
            Reporter: Lukas Eder


I have analysed an issue that has been bothering me for some time. When 
executing XPath evaluations, it looks like a very significant amount of time is 
spent in the initialisation of the XPathContext. I have asked this question on 
Stack Overflow and answered it myself:

http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance

I think the default behaviour of 
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal 
and should be improved, statically. I imagine, it is unlikely that this 
configuration is going to change once classes have been loaded. Hence, the 
fallback lookup of META-INF/service/org.apache.xml.dtm.DTMManager should only 
be done once.

For reference, here's the question and answer again in JIRA:

----

I have come to an astonishing conclusion that this:

    Element e = (Element) 
document.getElementsByTagName("SomeElementName").item(0);
    String result = ((Element) e).getTextContent();

Seems to be an incredible 100x faster than this:

    // Accounts for 30%, can be cached
    XPathFactory factory = XPathFactory.newInstance();

    // Negligible
    XPath xpath = factory.newXPath();

    // Accounts for 70% (caching a compiled expression doesn't change much...)
    String result = (String) xpath.evaluate(
      "//SomeElementName", document, XPathConstants.STRING);

I'm using the JVM's default implementation of JAXP:

org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl

I'm really confused, because it's easy to see how JAXP could optimise the above 
XPath query to actually execute a simple getElementsByTagName() instead. But it 
doesn't seem to do that. This problem is limited to around 5-6 frequently used 
XPath calls, that are abstracted and hidden by an API. Those queries involve 
simple paths (e.g. /a/b/c, no variables, conditions) against an always 
available DOM Document only. So, if an optimisation can be done, it will be 
quite easy to achieve.

----

I have debugged and profiled my test-case and Xalan/JAXP in general. I managed 
to identify the big major problem in

org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()

It can be seen that every one of the 10k test XPath evaluations led to the 
classloader trying to lookup the DTMManager instance in some sort of default 
configuration. This configuration is not loaded into memory but accessed every 
time. Furthermore, this access seems to be protected by a lock on the 
ObjectFactory.class itself. When the access fails (by default), then the 
configuration is loaded from the xalan.jar file's

META-INF/service/org.apache.xml.dtm.DTMManager

configuration file. Every time!:

Fortunately, this behaviour can be overridden by specifying a JVM parameter 
like this:

-Dorg.apache.xml.dtm.DTMManager=
  org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
  com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

So here's a performance improvement overview for 10k consecutive XPath 
evaluations of //SomeNodeName against a 90k XML file (measured with 
System.nanoTime():

measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 
1.1.3   
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |        n/a   |

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: xalan-dev-unsubscr...@xml.apache.org
For additional commands, e-mail: xalan-dev-h...@xml.apache.org

Reply via email to