So it seems Xalan could easily be fixed, but nobody has done it. Also, the
project is abandoned. Should we still be depending on an abandoned
project? What is the alternative? Surely lots of projects still depend on
Xalan?
-------- Forwarded Message --------
Subject: [jira] [Commented] (XALANJ-2540) Very inefficient default
behaviour for looking up DTMManager
Date: Wed, 4 Apr 2018 20:05:00 +0000 (UTC)
From: Laszlo Hornyak (JIRA) <j...@apache.org>
To: matthew.broadh...@nbmlaw.co.uk
[
https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426099#comment-16426099
]
Laszlo Hornyak commented on XALANJ-2540:
----------------------------------------
It is an easy fix, but the project has been abandoned for years. It is better
if you set the system properties as described.
Very inefficient default behaviour for looking up DTMManager
------------------------------------------------------------
Key: XALANJ-2540
URL: https://issues.apache.org/jira/browse/XALANJ-2540
Project: XalanJ2
Issue Type: Improvement
Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
Components: DTM, XPath
Affects Versions: 2.7.1, 2.7
Reporter: Lukas Eder
Priority: Major
I have analysed an issue that has been bothering me for some time. When
executing XPath evaluations, it looks like a very significant amount of time is
spent in the initialisation of the XPathContext. I have asked this question on
Stack Overflow and answered it myself:
http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance
I think the default behaviour of
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal and should be improved by caching its result statically. I imagine it is unlikely that this configuration is going to change once classes have been loaded. Hence, the fallback lookup of META-INF/services/org.apache.xml.dtm.DTMManager should only be done once.
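A minimal sketch of the "look it up once" behaviour proposed above, assuming a simplified lookup order (system property, then the jar service file, then a built-in default). CachedFactoryLookup is a hypothetical class, not Xalan's ObjectFactory. It uses the initialization-on-demand holder idiom, so the JVM runs the lookup exactly once on first use, and later calls take no lock and do no I/O.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Xalan code: resolve the factory class name once and cache it.
final class CachedFactoryLookup {
    static final String FACTORY_ID = "org.apache.xml.dtm.DTMManager";
    static final String FALLBACK = "org.apache.xml.dtm.ref.DTMManagerDefault";

    private CachedFactoryLookup() {}

    // The JVM initializes Holder exactly once, on first access, and publishes
    // CLASS_NAME safely to all threads without an explicit lock.
    private static final class Holder {
        static final String CLASS_NAME = lookUp();
    }

    static String factoryClassName() {
        return Holder.CLASS_NAME; // no lock and no file I/O after the first call
    }

    private static String lookUp() {
        // 1. A system property wins, as with -Dorg.apache.xml.dtm.DTMManager=...
        String prop = System.getProperty(FACTORY_ID);
        if (prop != null && !prop.trim().isEmpty()) {
            return prop.trim();
        }
        // 2. The jar service provider file: read its first line, once.
        ClassLoader cl = CachedFactoryLookup.class.getClassLoader();
        try (InputStream in = cl.getResourceAsStream("META-INF/services/" + FACTORY_ID)) {
            if (in != null) {
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String line = reader.readLine();
                if (line != null && !line.trim().isEmpty()) {
                    return line.trim();
                }
            }
        } catch (IOException ignored) {
            // Fall through to the built-in default.
        }
        // 3. Built-in default.
        return FALLBACK;
    }
}
```

The trade-off is that a service file or property added after the first lookup is ignored, which the reporter argues is acceptable because the configuration does not change once classes are loaded.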
For reference, here's the question and answer again in JIRA:
----
I have come to the astonishing conclusion that this:
    Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
    String result = e.getTextContent();
seems to be an incredible 100x faster than this:
    // Accounts for 30%, can be cached
    XPathFactory factory = XPathFactory.newInstance();
    // Negligible
    XPath xpath = factory.newXPath();
    // Accounts for 70% (caching a compiled expression doesn't change much...)
    String result = (String) xpath.evaluate(
        "//SomeElementName", document, XPathConstants.STRING);
I'm using the JVM's default implementation of JAXP:
org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl
I'm really confused, because it's easy to see how JAXP could optimise the above
XPath query to actually execute a simple getElementsByTagName() instead. But it
doesn't seem to do that. This problem is limited to around 5-6 frequently used
XPath calls that are abstracted and hidden by an API. Those queries involve
simple paths (e.g. /a/b/c, with no variables or conditions) against an always
available DOM Document only. So, if an optimisation can be done, it should be
quite easy to achieve.
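The optimisation the question asks about could be sketched as a fast path in front of JAXP. SimplePathShortcut is a hypothetical helper (not part of JAXP or Xalan) that assumes namespace-free documents. It routes expressions of the exact form //Name to getElementsByTagName(), which returns matching elements in document order, and sends everything else through the regular XPath engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of JAXP or Xalan. Assumes documents without namespaces.
final class SimplePathShortcut {
    // Matches "//Name" only: no prefixes, predicates, axes or further steps.
    private static final Pattern DESCENDANT_NAME = Pattern.compile("//([A-Za-z_][\\w.-]*)");

    private SimplePathShortcut() {}

    // Returns the string value of the first match, like XPathConstants.STRING.
    static String evaluateString(String expr, Document doc) throws XPathExpressionException {
        Matcher m = DESCENDANT_NAME.matcher(expr);
        if (m.matches()) {
            // Fast path: "//Name" selects all Name elements in document order,
            // which is what getElementsByTagName returns for a namespace-free DOM.
            NodeList nodes = doc.getElementsByTagName(m.group(1));
            Node first = nodes.item(0);
            // An empty node-set converts to the empty string in XPath.
            return first == null ? "" : first.getTextContent();
        }
        // Anything else goes through the regular (slower) XPath engine.
        return (String) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.STRING);
    }
}
```

Only the fast path avoids the XPathContext setup. Any expression the pattern does not recognise keeps its exact XPath semantics.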
----
I have debugged and profiled my test case and Xalan/JAXP in general. I managed
to identify the major problem in
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName().
The profile shows that every one of the 10k test XPath evaluations led to the
classloader trying to look up the DTMManager implementation in some sort of default
configuration. This configuration is not loaded into memory but accessed every
time. Furthermore, this access seems to be protected by a lock on
ObjectFactory.class itself. When the access fails (which it does by default), the
configuration is loaded from the xalan.jar file's
META-INF/services/org.apache.xml.dtm.DTMManager
configuration file. Every time!
Fortunately, this behaviour can be overridden by specifying a JVM parameter
like this:
    -Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault
or, for the copy of Xalan bundled inside the JDK:
    -Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
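If changing the JVM command line is not an option, the same properties can presumably be set from code, since the lookup described here checks the system property first. DtmManagerProperties is a hypothetical class. install() should run early, ideally before the first XPath evaluation. Whether a given JDK still performs this lookup at all is an assumption worth verifying against the JDK in use.

```java
// Hypothetical bootstrap helper: call DtmManagerProperties.install() once at startup.
final class DtmManagerProperties {
    static final String XALAN_KEY = "org.apache.xml.dtm.DTMManager";
    static final String XALAN_VALUE = "org.apache.xml.dtm.ref.DTMManagerDefault";
    static final String JDK_KEY = "com.sun.org.apache.xml.internal.dtm.DTMManager";
    static final String JDK_VALUE = "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";

    private DtmManagerProperties() {}

    static void install() {
        setIfAbsent(XALAN_KEY, XALAN_VALUE); // standalone xalan.jar on the classpath
        setIfAbsent(JDK_KEY, JDK_VALUE);     // the JDK's repackaged internal copy
    }

    // Keep any value the user already passed with -D on the command line.
    private static void setIfAbsent(String key, String value) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }
}
```

Setting both keys is harmless: whichever Xalan copy is not in use simply never reads its property.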
So here's a performance improvement overview for 10k consecutive XPath
evaluations of //SomeNodeName against a 90k XML file (measured with
System.nanoTime()):
measured library        | Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
------------------------+-------------+-------------+--------------+------------
without optimisation    | 10400 ms    | 4717 ms     |              | 25500 ms
reusing XPathFactory    | 5995 ms     | 2829 ms     |              |
reusing XPath           | 5900 ms     | 2890 ms     |              |
reusing XPathExpression | 5800 ms     | 2915 ms     | 16000 ms     | 25000 ms
adding the JVM param    | 1163 ms     | 761 ms      | n/a          |
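A rough sketch of how the "reusing XPathExpression" row could be measured, assuming a single in-memory document and plain System.nanoTime() timing as in the original. XPathTiming is hypothetical. Running it once with and once without the -D flag above would give a comparison like the last table row. There is no JIT warm-up, so a harness such as JMH would give more trustworthy numbers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical micro-benchmark: compile the expression once, then evaluate it n times.
final class XPathTiming {
    private XPathTiming() {}

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns elapsed wall-clock milliseconds for n evaluations of a precompiled expression.
    static long timeReusedExpression(Document doc, String expr, int n) throws Exception {
        XPathExpression compiled = XPathFactory.newInstance().newXPath().compile(expr);
        int sink = 0; // consume results so the loop body is not trivially dead code
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += ((String) compiled.evaluate(doc, XPathConstants.STRING)).length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (sink < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsedMs;
    }
}
```

For example, XPathTiming.timeReusedExpression(XPathTiming.parse(xml), "//SomeNodeName", 10_000) returns the elapsed time in milliseconds for one configuration.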