Hi,

I've done very few changes to existing code:
http://www.sectorbase.com/tmp/islast-20030227.zip

As shown below, results of this optimization are pretty good.

I tested this code on the bunch of different XPath queries and XSL
templates. If you want to see what they were, I can send it too. If you have
a set of test cases you use to ensure conformity of the code, I'll be glad
to check them too. In this case, please send me location of these tests.
If you'll have any questions about changes I've done or comments, please
send them to me.
Basically changes are:

1. New interface org.apache.xml.dtm.DTMIteratorIsLast is intended to work
with DTMIterator and responsible for this optimization. It just defines one
method: isLastPos. I decided agains adding this method directly to
DTMIterator, b/c this interface is outside of org.apache.xpath package and
thus I wasn't sure if it's used somewhere else. But this interfaces can be
joined, b/c this method is natural for DTMIterator interface.

2. extended interface org.apache.xpath.axes.SubContextList to include
similar method isLastPos.

3. added internal function org.apache.xpath.functions.FuncIsLast. This
function is created as the result of optimization. It uses
DTMIteratorIsLast.isLastPos and SubContextList.isLastPos methods if
available, otherwise, it performs "position() = last()".

3. Class org.apache.xpath.compiler.Compiler: added code for optimization of
expressions "position() X last()", where X is '=', '!=', '>', '<', '>=',
'<='. Const OPTIMIZE_IS_LAST switches this optimization on/off.

5. Class org.apache.xpath.axes.PredicatedNodeTest: 
5.1 implemented next-node-prefetch system for is-last optimization
5.2 made few changes to methods: acceptNode and executePredicate to avoid
duplicate score and predicate testing to additionally improve is-last
optimization.

6. Class org.apache.xpath.axes.AxesWalker:
6.1 I fixed bug http://nagoya.apache.org/bugzilla/show_bug.cgi?id=17400
(reported by me, but I'm pretty sure it does exist and this is the correct
fix, plus it's very simple)
6.2 implemented next-node-prefetch isLast optimization.

7. Class org.apache.xpath.patterns.StepPattern: implemented isLastPos
method. This is a good example how this system can improve performance w/o
introducing caches and anything else. About 70% is accomplished by replacing
"position() = last()" with "is-last()", b/c:
a) two traversals for "position()" (in avg. 50% of all match pattern
node-list) and last() (100% of node-list) are joined into one traversal.
b) this one traversal only should traverse math pattern node-list, untill it
finds current position (50% avg) + 1 more traverse, to ensure this position
is the last in the list (or not).

8. Class org.apache.xpath.axes.LocPathIterator is made compatible with
changes described, so that implementing this functionality in its direct
descendants would be easy.

9. Classes org.apache.xpath.axes.WalkingIterator,
org.apache.xpath.axes.BasicTestIterator (with descendants),
org.apache.xpath.axes.NodeSequence: implemented isLastPos.

10. Class org.apache.xalan.lib.sql.SQLDocument: present implementation of
this class (streaming mode) doesn't yield "position() X last()" correct
result neither with nor w/o is-last optimization. But is-last optimization
allows to modify this class in a way, so that it would yield correct results
w/o big caching overhead. I showed one way how this can be done in this
class.

TESTING:

So, as I said above, mainly is-last optimization has been done for 4
classes: AxesWalker, WalkingIterator, BasicTestIterator and StepPattern.

These are results of testing against cases 1 - 3 with initial traversal size
(before applying predicates or expressions, i.e count(/doc/nx)) of 10 and
100 for optimized and not optimized version. Numbers presented are
calculated using the formula:
        [opt ratio] = [avg. time w/o opt ] / [avg time w/opt]

                count(/doc/nx) = 10     count(/doc/nx) = 100
cases 1         1.4                     10.1
cases 2         1.8                     2.8
cases 3         1.1                     3.0

As you can see biggest improvement is in the area of
AxesWalker+WalkingIterator queries.

Below you can see queries used for cases and XML document.

XML document (versions with with 10 and 100 "nx" nodes):
<doc>
   <nx a="">1</nx>
   <nx>2</nx>
   <nx a="">3</nx>
   <nx>4</nx>
   <nx a="">5</nx>
   <nx a="">6</nx>
   <nx>7</nx>
   <nx>8</nx>
   <nx>9</nx>
   <nx>10</nx>
</doc>

Cases 1: AxesWalker + WalkingIterator:
a)      <xsl:copy-of select="/doc/nx[position() = last()]"/>
b)      <xsl:copy-of select="/doc/[EMAIL PROTECTED]() = last()]"/>
c)      <xsl:for-each select="/doc/nx[position() != last()]">
                <xsl:if test="position() = last()">
                                <xsl:copy-of select="."/>
                </xsl:if>
        </xsl:for-each>

Cases 2: BasicTestIterator:
a)      <xsl:for-each select="/doc">
                <xsl:for-each select="nx[position() = last()]">
                        <xsl:copy-of select="."/>
                </xsl:for-each>
        </xsl:for-each>
b)      <xsl:for-each select="/doc">
                <xsl:for-each select="nx">
                        <xsl:if test="position() = last()">
                                <xsl:copy-of select="."/>
                        </xsl:if>
                </xsl:for-each>
        </xsl:for-each>
c)      <xsl:for-each select="/doc">
                <xsl:for-each select="nx[position() != last()]">
                        <xsl:if test="position() = last()">
                                <xsl:copy-of select="."/>
                        </xsl:if>
                </xsl:for-each>
        </xsl:for-each>

Cases 3: StepPattern:
a)      <xsl:apply-templates select="/doc/nx"/>
....
....
<xsl:template match="nx">
</xsl:template>

<xsl:template match="nx[position() = last()]">
        <xsl:copy-of select="."/>
</xsl:template>


Thanks,
Dimitry


-----Original Message-----
From: Voytenko, Dimitry 
Sent: Friday, February 21, 2003 11:35
To: '[EMAIL PROTECTED]'; Voytenko, Dimitry
Cc: [EMAIL PROTECTED]
Subject: RE: position() = last() optimization feature request


Hi Morris,

>> The idea of the isLast() interface will be a good thing if the 
>> overhead for caching is small enough.
If it'd be defined as a seperate interface (as I understand this is what
you're suggesting), then implementations of only "double-fetch" critical
iterators can implement this - which will make changes minimal. For the rest
of DTMIterator's is-last can be implemented as "position() =(!=) last()".
I currently don't see any cases when cache of more than one "node" will be
necessary. Consider sql extension lib in streaming mode: to return value for
isLast() method you need to just call ResultSet.next() once. So this would
look like (approx):

isLast() {
  if (!m_nextFetched) {
    m_nextFetched = true;
    m_hasMore = resultSet.next(); 
    // current record is already cached in the streaming node
  }
  return m_hasMore;
}

nextNode() {
  if (!m_nextFetched) {
    // current implementation
  } else {
    m_nextFetched = false;
    // implementation similar to current, but w/o extra ResultSet.next();
  }
}

Since, the idea seems to be good, I'll go ahead and make changes necessary
for this functionality and post them here.

Thanks,
Dimitry

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Friday, February 21, 2003 11:12
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: position() = last() optimization feature request


The idea of the isLast() interface will be a good thing if the overhead for
caching is small enough.

By the way, I will also look at the optimizations for  "node[last()]"  in
my XSLTC_DTM performance improvement work. This is better to be done on the
DTM side. My current idea is to have the DTMAxisIterator support a
getLastNode() interface. This is a separate issue from isLast. I will
suggest you to look at the isLast issue first.

Regards,

Morris Kwan
XSLT Development
IBM Toronto Lab
Tel: (905)413-3729
Email: [EMAIL PROTECTED]



 

                      "Voytenko,

                      Dimitry"                 To:
[EMAIL PROTECTED]                                                    
                      <[EMAIL PROTECTED]        cc:

                      data.com>                Subject:  position() = last()
optimization feature request                            
 

                      02/20/2003 09:42

                      PM

                      Please respond to

                      xalan-dev

 

 




Hi,

Please take a look at the attached letter and response from W3C about
is-last optimization. It looks like this could be advantageous for many
queries (especially for sql extension in streaming mode). XPath expressions
(sub-expressions) of view:
             position() ? last(), where "?" is =, !=, <, >, etc,

can be easily optimized (org.apache.xpath.compiler.Compiler class) into the
call to the internal "is-last()" function.
Simpliest (or caching) iterators can implement this call just as "position
()
? last()". And more complicated implementations would require (in the worst
case) caching of only one (next) node. So this can really save resources
for
many queries.

I could post changes that need to be done in Compiler class (they'd be
pretty simple). But this would make sense only if DTMManager (and all other
iterator interfaces) would have isLast() method, which already requires
changes to the interfaces.
>From the iteration standpoint java.util.Iterator and Enumeration interfaces
all have hasNext() method (just as examples).

Please, send your comments if this looks interesting.

In addition, similar optimization could be done for queries "node[last()]"
as call to internal "get-last()" method, and probably some others.

Thanks,
Dimitry


_____________________________________________________
Sector Data, LLC, is not affiliated with Sector, Inc., or SIAC
 <<RE: Iteration functionality request for XPath 2.0>>

----- Message from "Kay, Michael" <[EMAIL PROTECTED]> on Fri, 20
Sep 2002 02:30:54 -0800 -----
 

      To: "Voytenko, Dimitry" <[EMAIL PROTECTED]>,
"'[EMAIL PROTECTED]'"              
          <[EMAIL PROTECTED]>

 

 Subject: RE: Iteration functionality request for XPath 2.0

 


> ....
> Sometimes it's very important to know if current node is the
> last one in the node-set. There're two basic ways to do it in
> XPath 1.0:
> **** FIRST APPROACH: using "position" and "last":
>            <xsl:if test="position() = last()">
>                        ....
>            </xsl:if>
>
> Drawbacks:
>            If node-set implementation is iterative, i.e. new node
> is fetched only when for-each instruction goes to a new cycle
> and whole list of nodes is not cached in node-set object,
> then the first call to last() function will go through all
> nodes to find out their quantity. This will repeat the
> node-set fetch procedure twice or will lead to the caching of
> the nodes list.....

> In my point of view, this could be easily solved by extending
> list of node-set/context functions with one more function,
> something like "is-last" (another function "is-first" could
> also be added for symmetry, eventhough it'd be redundant).
> This function would return boolean true if current node is
> the last node in the node-set.

Saxon has an internal function like your is-last(), and its optimizer
rewrites the predicate [position()=last()] as [is-last()], for all the
reasons you mention.

I don't see a good reason why is-last() should be made available as a
user-visible function, when this construct is very easy for optimizers to
handle internally.
Michael Kay




_____________________________________________________
Sector Data, LLC, is not affiliated with Sector, Inc., or SIAC

Reply via email to