[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085779#comment-13085779 ]
Michael Glavassevich commented on XALANJ-2540: ---------------------------------------------- It's probably not safe to return the same NodeList since those objects returned from the DOM are live [1]. That could allow the NodeList to change after it's been returned from the XPath evaluator. [1] http://www.w3.org/TR/DOM-Level-3-Core/core.html#td-live > Very inefficient default behaviour for looking up DTMManager > ------------------------------------------------------------ > > Key: XALANJ-2540 > URL: https://issues.apache.org/jira/browse/XALANJ-2540 > Project: XalanJ2 > Issue Type: Improvement > Security Level: No security risk; visible to anyone(Ordinary problems in > Xalan projects. Anybody can view the issue.) > Components: DTM, XPath > Affects Versions: 2.7.1, 2.7 > Reporter: Lukas Eder > > I have analysed an issue that has been bothering me for some time. When > executing XPath evaluations, it looks like a very significant amount of time > is spent in the initialisation of the XPathContext. I have asked this > question on Stack Overflow and answered it myself: > http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance > I think the default behaviour of > org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite > sub-optimal and should be improved, statically. I imagine, it is unlikely > that this configuration is going to change once classes have been loaded. > Hence, the fallback lookup of META-INF/service/org.apache.xml.dtm.DTMManager > should only be done once. > For reference, here's the question and answer again in JIRA: > ---- > I have come to an astonishing conclusion that this: > Element e = (Element) > document.getElementsByTagName("SomeElementName").item(0); > String result = ((Element) e).getTextContent(); > Seems to be an incredible 100x faster than this: > // Accounts for 30%, can be cached > XPathFactory factory = XPathFactory.newInstance(); > // Negligible > XPath xpath = factory.newXPath(); > // Accounts for 70% (caching a compiled expression doesn't change much...) > String result = (String) xpath.evaluate( > "//SomeElementName", document, XPathConstants.STRING); > I'm using the JVM's default implementation of JAXP: > org.apache.xpath.jaxp.XPathFactoryImpl > org.apache.xpath.jaxp.XPathImpl > I'm really confused, because it's easy to see how JAXP could optimise the > above XPath query to actually execute a simple getElementsByTagName() > instead. But it doesn't seem to do that. This problem is limited to around > 5-6 frequently used XPath calls, that are abstracted and hidden by an API. > Those queries involve simple paths (e.g. /a/b/c, no variables, conditions) > against an always available DOM Document only. So, if an optimisation can be > done, it will be quite easy to achieve. > ---- > I have debugged and profiled my test-case and Xalan/JAXP in general. I > managed to identify the big major problem in > org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() > It can be seen that every one of the 10k test XPath evaluations led to the > classloader trying to lookup the DTMManager instance in some sort of default > configuration. This configuration is not loaded into memory but accessed > every time. Furthermore, this access seems to be protected by a lock on the > ObjectFactory.class itself. When the access fails (by default), then the > configuration is loaded from the xalan.jar file's > META-INF/service/org.apache.xml.dtm.DTMManager > configuration file. Every time!: > Fortunately, this behaviour can be overridden by specifying a JVM parameter > like this: > -Dorg.apache.xml.dtm.DTMManager= > org.apache.xml.dtm.ref.DTMManagerDefault > or > -Dcom.sun.org.apache.xml.internal.dtm.DTMManager= > com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault > So here's a performance improvement overview for 10k consecutive XPath > evaluations of //SomeNodeName against a 90k XML file (measured with > System.nanoTime(): > measured library : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen > 1.1.3 > -------------------------------------------------------------------------------- > without optimisation : 10400ms | 4717ms | | > 25500ms > reusing XPathFactory : 5995ms | 2829ms | | > reusing XPath : 5900ms | 2890ms | | > reusing XPathExpression : 5800ms | 2915ms | 16000ms | > 25000ms > adding the JVM param : 1163ms | 761ms | n/a | -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: xalan-dev-unsubscr...@xml.apache.org For additional commands, e-mail: xalan-dev-h...@xml.apache.org