Hi, I've done very few changes to existing code: http://www.sectorbase.com/tmp/islast-20030227.zip
As shown below, results of this optimization are pretty good. I tested this code on the bunch of different XPath queries and XSL templates. If you want to see what they were, I can send it too. If you have a set of test cases you use to ensure conformity of the code, I'll be glad to check them too. In this case, please send me location of these tests. If you'll have any questions about changes I've done or comments, please send them to me. Basically changes are: 1. New interface org.apache.xml.dtm.DTMIteratorIsLast is intended to work with DTMIterator and responsible for this optimization. It just defines one method: isLastPos. I decided agains adding this method directly to DTMIterator, b/c this interface is outside of org.apache.xpath package and thus I wasn't sure if it's used somewhere else. But this interfaces can be joined, b/c this method is natural for DTMIterator interface. 2. extended interface org.apache.xpath.axes.SubContextList to include similar method isLastPos. 3. added internal function org.apache.xpath.functions.FuncIsLast. This function is created as the result of optimization. It uses DTMIteratorIsLast.isLastPos and SubContextList.isLastPos methods if available, otherwise, it performs "position() = last()". 3. Class org.apache.xpath.compiler.Compiler: added code for optimization of expressions "position() X last()", where X is '=', '!=', '>', '<', '>=', '<='. Const OPTIMIZE_IS_LAST switches this optimization on/off. 5. Class org.apache.xpath.axes.PredicatedNodeTest: 5.1 implemented next-node-prefetch system for is-last optimization 5.2 made few changes to methods: acceptNode and executePredicate to avoid duplicate score and predicate testing to additionally improve is-last optimization. 6. Class org.apache.xpath.axes.AxesWalker: 6.1 I fixed bug http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17400 (reported by me, but I'm pretty sure it does exist and this is the correct fix, plus it's very simple) 6.2 implemented next-node-prefetch isLast optimization. 7. Class org.apache.xpath.patterns.StepPattern: implemented isLastPos method. This is a good example how this system can improve performance w/o introducing caches and anything else. About 70% is accomplished by replacing "position() = last()" with "is-last()", b/c: a) two traversals for "position()" (in avg. 50% of all match pattern node-list) and last() (100% of node-list) are joined into one traversal. b) this one traversal only should traverse math pattern node-list, untill it finds current position (50% avg) + 1 more traverse, to ensure this position is the last in the list (or not). 8. Class org.apache.xpath.axes.LocPathIterator is made compatible with changes described, so that implementing this functionality in its direct descendants would be easy. 9. Classes org.apache.xpath.axes.WalkingIterator, org.apache.xpath.axes.BasicTestIterator (with descendants), org.apache.xpath.axes.NodeSequence: implemented isLastPos. 10. Class org.apache.xalan.lib.sql.SQLDocument: present implementation of this class (streaming mode) doesn't yield "position() X last()" correct result neither with nor w/o is-last optimization. But is-last optimization allows to modify this class in a way, so that it would yield correct results w/o big caching overhead. I showed one way how this can be done in this class. TESTING: So, as I said above, mainly is-last optimization has been done for 4 classes: AxesWalker, WalkingIterator, BasicTestIterator and StepPattern. These are results of testing against cases 1 - 3 with initial traversal size (before applying predicates or expressions, i.e count(/doc/nx)) of 10 and 100 for optimized and not optimized version. Numbers presented are calculated using the formula: [opt ratio] = [avg. time w/o opt ] / [avg time w/opt] count(/doc/nx) = 10 count(/doc/nx) = 100 cases 1 1.4 10.1 cases 2 1.8 2.8 cases 3 1.1 3.0 As you can see biggest improvement is in the area of AxesWalker+WalkingIterator queries. Below you can see queries used for cases and XML document. XML document (versions with with 10 and 100 "nx" nodes): <doc> <nx a="">1</nx> <nx>2</nx> <nx a="">3</nx> <nx>4</nx> <nx a="">5</nx> <nx a="">6</nx> <nx>7</nx> <nx>8</nx> <nx>9</nx> <nx>10</nx> </doc> Cases 1: AxesWalker + WalkingIterator: a) <xsl:copy-of select="/doc/nx[position() = last()]"/> b) <xsl:copy-of select="/doc/[EMAIL PROTECTED]() = last()]"/> c) <xsl:for-each select="/doc/nx[position() != last()]"> <xsl:if test="position() = last()"> <xsl:copy-of select="."/> </xsl:if> </xsl:for-each> Cases 2: BasicTestIterator: a) <xsl:for-each select="/doc"> <xsl:for-each select="nx[position() = last()]"> <xsl:copy-of select="."/> </xsl:for-each> </xsl:for-each> b) <xsl:for-each select="/doc"> <xsl:for-each select="nx"> <xsl:if test="position() = last()"> <xsl:copy-of select="."/> </xsl:if> </xsl:for-each> </xsl:for-each> c) <xsl:for-each select="/doc"> <xsl:for-each select="nx[position() != last()]"> <xsl:if test="position() = last()"> <xsl:copy-of select="."/> </xsl:if> </xsl:for-each> </xsl:for-each> Cases 3: StepPattern: a) <xsl:apply-templates select="/doc/nx"/> .... .... <xsl:template match="nx"> </xsl:template> <xsl:template match="nx[position() = last()]"> <xsl:copy-of select="."/> </xsl:template> Thanks, Dimitry -----Original Message----- From: Voytenko, Dimitry Sent: Friday, February 21, 2003 11:35 To: '[EMAIL PROTECTED]'; Voytenko, Dimitry Cc: [EMAIL PROTECTED] Subject: RE: position() = last() optimization feature request Hi Morris, >> The idea of the isLast() interface will be a good thing if the >> overhead for caching is small enough. If it'd be defined as a seperate interface (as I understand this is what you're suggesting), then implementations of only "double-fetch" critical iterators can implement this - which will make changes minimal. For the rest of DTMIterator's is-last can be implemented as "position() =(!=) last()". I currently don't see any cases when cache of more than one "node" will be necessary. Consider sql extension lib in streaming mode: to return value for isLast() method you need to just call ResultSet.next() once. So this would look like (approx): isLast() { if (!m_nextFetched) { m_nextFetched = true; m_hasMore = resultSet.next(); // current record is already cached in the streaming node } return m_hasMore; } nextNode() { if (!m_nextFetched) { // current implementation } else { m_nextFetched = false; // implementation similar to current, but w/o extra ResultSet.next(); } } Since, the idea seems to be good, I'll go ahead and make changes necessary for this functionality and post them here. Thanks, Dimitry -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, February 21, 2003 11:12 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Re: position() = last() optimization feature request The idea of the isLast() interface will be a good thing if the overhead for caching is small enough. By the way, I will also look at the optimizations for "node[last()]" in my XSLTC_DTM performance improvement work. This is better to be done on the DTM side. My current idea is to have the DTMAxisIterator support a getLastNode() interface. This is a separate issue from isLast. I will suggest you to look at the isLast issue first. Regards, Morris Kwan XSLT Development IBM Toronto Lab Tel: (905)413-3729 Email: [EMAIL PROTECTED] "Voytenko, Dimitry" To: [EMAIL PROTECTED] <[EMAIL PROTECTED] cc: data.com> Subject: position() = last() optimization feature request 02/20/2003 09:42 PM Please respond to xalan-dev Hi, Please take a look at the attached letter and response from W3C about is-last optimization. It looks like this could be advantageous for many queries (especially for sql extension in streaming mode). XPath expressions (sub-expressions) of view: position() ? last(), where "?" is =, !=, <, >, etc, can be easily optimized (org.apache.xpath.compiler.Compiler class) into the call to the internal "is-last()" function. Simpliest (or caching) iterators can implement this call just as "position () ? last()". And more complicated implementations would require (in the worst case) caching of only one (next) node. So this can really save resources for many queries. I could post changes that need to be done in Compiler class (they'd be pretty simple). But this would make sense only if DTMManager (and all other iterator interfaces) would have isLast() method, which already requires changes to the interfaces. >From the iteration standpoint java.util.Iterator and Enumeration interfaces all have hasNext() method (just as examples). Please, send your comments if this looks interesting. In addition, similar optimization could be done for queries "node[last()]" as call to internal "get-last()" method, and probably some others. Thanks, Dimitry _____________________________________________________ Sector Data, LLC, is not affiliated with Sector, Inc., or SIAC <<RE: Iteration functionality request for XPath 2.0>> ----- Message from "Kay, Michael" <[EMAIL PROTECTED]> on Fri, 20 Sep 2002 02:30:54 -0800 ----- To: "Voytenko, Dimitry" <[EMAIL PROTECTED]>, "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]> Subject: RE: Iteration functionality request for XPath 2.0 > .... > Sometimes it's very important to know if current node is the > last one in the node-set. There're two basic ways to do it in > XPath 1.0: > **** FIRST APPROACH: using "position" and "last": > <xsl:if test="position() = last()"> > .... > </xsl:if> > > Drawbacks: > If node-set implementation is iterative, i.e. new node > is fetched only when for-each instruction goes to a new cycle > and whole list of nodes is not cached in node-set object, > then the first call to last() function will go through all > nodes to find out their quantity. This will repeat the > node-set fetch procedure twice or will lead to the caching of > the nodes list..... > In my point of view, this could be easily solved by extending > list of node-set/context functions with one more function, > something like "is-last" (another function "is-first" could > also be added for symmetry, eventhough it'd be redundant). > This function would return boolean true if current node is > the last node in the node-set. Saxon has an internal function like your is-last(), and its optimizer rewrites the predicate [position()=last()] as [is-last()], for all the reasons you mention. I don't see a good reason why is-last() should be made available as a user-visible function, when this construct is very easy for optimizers to handle internally. Michael Kay _____________________________________________________ Sector Data, LLC, is not affiliated with Sector, Inc., or SIAC
