[jira] Resolved: (XALANJ-2295) XSLTC performance problems with xsl:key in Muenchian grouping

Henry Zongaro (JIRA) Wed, 07 Jun 2006 13:31:36 -0700

     [ http://issues.apache.org/jira/browse/XALANJ-2295?page=all ]
     
Henry Zongaro resolved XALANJ-2295:
-----------------------------------


    Fix Version: Latest Development Code
     Resolution: Fixed

Applied patch in source repository.

> XSLTC performance problems with xsl:key in Muenchian grouping
> -------------------------------------------------------------
>
>          Key: XALANJ-2295
>          URL: http://issues.apache.org/jira/browse/XALANJ-2295
>      Project: XalanJ2
>         Type: Bug

>   Components: XSLTC
>     Versions: 2.7
>     Reporter: Henry Zongaro
>     Assignee: Henry Zongaro
>      Fix For: Latest Development Code
>  Attachments: gendata2295.xml, gendata2295.xsl, j2295.xsl, patch.j2295.txt, 
> patch.j2295.v2.txt
>
> Performance scales poorly for Muenchian grouping in XSLTC.  For an 
> instruction like the following, the user would like only the first node in 
> the node set returned by each reference to the key function to be processed 
> in the predicate, which tests for node identity.
>     <xsl:for-each select="item[generate-id() = generate-id(key('k', 
> value)[1])]">
> However, in XSLTC all nodes associated with each key value are copied into a 
> temporary IntegerArray by the KeyIndex object used to represent the result of 
> the key function, and then that result is always put through a 
> DupFilterIterator, which sorts the nodes returned by the key function, even 
> though the nodes produced by KeyIndex are always in document order; the 
> DupFilterIterator seems to be used just as a mechanism for copying nodes from 
> the KeyIndex object that was used to evaluate the result of the key function. 
>  So every node in the result of the key function is always copied at least 
> twice.
> Using a heap mechanism like that used in UnionPathIterator to sort the nodes 
> would be more efficient, as it would only process each node once, and only as 
> each node was needed.  Each node in the heap would be the set of nodes (which 
> are already in document order because that's how KeyIndex is built to begin 
> with) associated with a single key value for that reference to the key 
> function, so pulling a single node would require at most O(log2(num_kv)) 
> comparisons, where num_kv is the number of key values.
> On top of that, a reference to key that has a positional predicate does not 
> take advantage of the NthIterator class to directly retrieve the node.  This 
> means that in an expression like
>    key('k',keyvalues)[1]
> the position of every node produced by the key is tested for equality to the 
> value one.  Using NthIterator would mean as few nodes as possible would be 
> retrieved - in this case,  just one.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Resolved: (XALANJ-2295) XSLTC performance problems with xsl:key in Muenchian grouping

Reply via email to