[ http://issues.apache.org/jira/browse/XALANJ-2295?page=all ] Henry Zongaro resolved XALANJ-2295: -----------------------------------
Fix Version: Latest Development Code Resolution: Fixed Applied patch in source repository. > XSLTC performance problems with xsl:key in Muenchian grouping > ------------------------------------------------------------- > > Key: XALANJ-2295 > URL: http://issues.apache.org/jira/browse/XALANJ-2295 > Project: XalanJ2 > Type: Bug > Components: XSLTC > Versions: 2.7 > Reporter: Henry Zongaro > Assignee: Henry Zongaro > Fix For: Latest Development Code > Attachments: gendata2295.xml, gendata2295.xsl, j2295.xsl, patch.j2295.txt, > patch.j2295.v2.txt > > Performance scales poorly for Muenchian grouping in XSLTC. For an > instruction like the following, the user would like only the first node in > the node set returned by each reference to the key function to be processed > in the predicate, which tests for node identity. > <xsl:for-each select="item[generate-id() = generate-id(key('k', > value)[1])]"> > However, in XSLTC all nodes associated with each key value are copied into a > temporary IntegerArray by the KeyIndex object used to represent the result of > the key function, and then that result is always put through a > DupFilterIterator, which sorts the nodes returned by the key function, even > though the nodes produced by KeyIndex are always in document order; the > DupFilterIterator seems to be used just as a mechanism for copying nodes from > the KeyIndex object that was used to evaluate the result of the key function. > So every node in the result of the key function is always copied at least > twice. > Using a heap mechanism like that used in UnionPathIterator to sort the nodes > would be more efficient, as it would only process each node once, and only as > each node was needed. Each node in the heap would be the set of nodes (which > are already in document order because that's how KeyIndex is built to begin > with) associated with a single key value for that reference to the key > function, so pulling a single node would require at most O(log2(num_kv)) > comparisons, where num_kv is the number of key values. > On top of that, a reference to key that has a positional predicate does not > take advantage of the NthIterator class to directly retrieve the node. This > means that in an expression like > key('k',keyvalues)[1] > the position of every node produced by the key is tested for equality to the > value one. Using NthIterator would mean as few nodes as possible would be > retrieved - in this case, just one. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]