XSLTC performance problems with xsl:key in Muenchian grouping
-------------------------------------------------------------

         Key: XALANJ-2295
         URL: http://issues.apache.org/jira/browse/XALANJ-2295
     Project: XalanJ2
        Type: Bug

  Components: XSLTC  
    Versions: 2.7    
    Reporter: Henry Zongaro
 Assigned to: Henry Zongaro 


Performance scales poorly for Muenchian grouping in XSLTC.  For an instruction 
like the following, the user would like only the first node in the node set 
returned by each reference to the key function to be processed in the 
predicate, which tests for node identity.

    <xsl:for-each select="item[generate-id() = generate-id(key('k', 
value)[1])]">

However, in XSLTC all nodes associated with each key value are copied into a 
temporary IntegerArray by the KeyIndex object used to represent the result of 
the key function, and then that result is always put through a 
DupFilterIterator, which sorts the nodes returned by the key function, even 
though the nodes produced by KeyIndex are always in document order; the 
DupFilterIterator seems to be used just as a mechanism for copying nodes from 
the KeyIndex object that was used to evaluate the result of the key function.  
So every node in the result of the key function is always copied at least twice.

Using a heap mechanism like that used in UnionPathIterator to sort the nodes 
would be more efficient, as it would only process each node once, and only as 
each node was needed.  Each node in the heap would be the set of nodes (which 
are already in document order because that's how KeyIndex is built to begin 
with) associated with a single key value for that reference to the key 
function, so pulling a single node would require at most O(log2(num_kv)) 
comparisons, where num_kv is the number of key values.

On top of that, a reference to key that has a positional predicate does not 
take advantage of the NthIterator class to directly retrieve the node.  This 
means that in an expression like

   key('k',keyvalues)[1]

the position of every node produced by the key is tested for equality to the 
value one.  Using NthIterator would mean as few nodes as possible would be 
retrieved - in this case,  just one.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to