Hi,
I recently struck a rather interesting bug. I'm not sure whether this
qualifies as a bug in Xalan or simply a bug in my stylesheet.
In either case, it's a rather interesting and (for me) unexpected
interaction between the "Muenchian method" of handling "node grouping"
and the <xsl:sort> operation.
Attached are several files:
test.xml is test input
test1.xsl is a stylesheet that works fine, but does not perform sorting
in quite the desired manner.
test2.xsl is a stylesheet that makes the obvious modification to
test1.xsl - with results that quite surprised me.
test3.xsl is a "fixed" stylesheet, but the fix seems most unusual,
particularly in a supposedly "functional" programming language.
info.txt describes what I think is going on here. I've made it a
separate file so that people can try to figure it out for themselves if
they want a challenge :-)
I would appreciate feedback on whether I should:
(a) file a Xalan bug
(b) send a note to the XSL standards committee
(c) send a note to the webmaster of what appears to be the
prime source on the 'Muenchian method'
(http://www.jenitennison.com/xslt/grouping/muenchian.html)
(d) treat this as a plain bug that is obvious to everyone but me,
and go drown my shame in a few pints of the local ale...
NB: yes, it is also possible to implement the functionality of test3.xsl
using the exslt set:distinct function, which avoids the issue
completely.
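For reference, a minimal sketch of that set:distinct alternative. It assumes the processor supports the EXSLT sets module under the namespace http://exslt.org/sets, and it is not one of the attached files. It iterates over the distinct location-code attributes rather than testing key(...)[1], so the grouping step never depends on the order of the list returned by key().

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- Sketch only (not one of the attached files).
- Assumes the EXSLT sets module is available to the processor.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:set="http://exslt.org/sets"
                version="1.0">
<xsl:output method="text"/>
<xsl:template match="root">
<!-- one iteration per distinct location-code value -->
<xsl:for-each select="set:distinct(location/@location-code)">
<xsl:sort select="."/>
<xsl:variable name="code" select="."/>
<xsl:text>== Location </xsl:text>
<xsl:value-of select="$code"/>
<xsl:text>&#10;</xsl:text>
<!-- all locations sharing this code, sorted by id -->
<xsl:for-each select="/root/location[@location-code = $code]">
<xsl:sort select="@id"/>
<xsl:text> id: </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```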
Cheers,
Simon (aka [EMAIL PROTECTED])
<root>
<location id='3B' location-code='3'/>
<location id='2B' location-code='2'/>
<location id='1B' location-code='1'/>
<location id='3A' location-code='3'/>
<location id='2A' location-code='2'/>
<location id='1A' location-code='1'/>
</root>
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- test1.xsl
-
- This stylesheet nicely groups the location nodes by their location-code
- but it doesn't sort nodes by id within locations (see test2.xsl).
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>
<xsl:template match="root">
<xsl:for-each select="location">
<xsl:sort select="@location-code"/>
<xsl:variable name="thisNode" select="generate-id(.)"/>
<xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>
<xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>
<xsl:if test="$thisNode = $firstNodeAtSameLocation">
<xsl:text>== Location </xsl:text>
<xsl:value-of select="@location-code"/>
<xsl:text>
</xsl:text>
<xsl:for-each select="$nodesAtSameLocation">
<xsl:text> id: </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- test2.xsl
-
- This stylesheet demonstrates a surprising interaction between
- <xsl:sort> and the key() function, particularly when using the
- "Muenchian method" to group nodes.
-
- This stylesheet is an attempt to modify test1.xsl to add sorting
- of nodes by id within location groups - and it goes horribly wrong.
- See if you can spot the problem!
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>
<xsl:template match="root">
<xsl:for-each select="location">
<xsl:sort select="@location-code"/>
<xsl:variable name="thisNode" select="generate-id(.)"/>
<xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>
<xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>
<xsl:if test="$thisNode = $firstNodeAtSameLocation">
<xsl:text>== Location </xsl:text>
<xsl:value-of select="@location-code"/>
<xsl:text>
</xsl:text>
<xsl:for-each select="$nodesAtSameLocation">
<xsl:sort select="@id"/>
<xsl:text> id: </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- test3.xsl
-
- This stylesheet demonstrates a surprising interaction between
- <xsl:sort> and the key() function, particularly when using the
- "Muenchian method" to group nodes.
-
- This stylesheet fixes test2.xsl - with a rather odd-looking bit
- of code...
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>
<xsl:template match="root">
<xsl:for-each select="location">
<xsl:sort select="@location-code"/>
<xsl:variable name="thisNode" select="generate-id(.)"/>
<xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>
<!-- here's the odd code -->
<xsl:for-each select="$nodesAtSameLocation">
<xsl:sort select="@id"/>
</xsl:for-each>
<xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>
<xsl:if test="$thisNode = $firstNodeAtSameLocation">
<xsl:text>== Location </xsl:text>
<xsl:value-of select="@location-code"/>
<xsl:text>
</xsl:text>
<xsl:for-each select="$nodesAtSameLocation">
<xsl:sort select="@id"/>
<xsl:text> id: </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The cause of this problem is, I believe, as follows:
* the xsl:key declaration builds a map of (key->nodelist), with the
elements in the nodelist being inserted in document order.
* the key(...) function returns a *reference* to this list.
* the first time through the loop for a particular location code,
we compare the current node to the first element in the list
and it matches, so we enter the inner for-each loop.
* the xsl:sort inside that for-each loop then causes the list to be sorted.
THIS CAUSES THE LIST IN THE xsl:key MAP TO BE SORTED, thereby
changing what node is [1] in this list.
* later we enter the "for-each select=location" loop with some
other node that now matches element [1] in the list returned
by the key() function - and so output the group again.
So, is this situation:
(a) perfectly in compliance with the standard, and something people using
the "Muenchian method" in combination with <xsl:sort> should just
be aware of?
(b) something that should be fixed in key(), xsl:for-each or xsl:sort,
so that the sort operation doesn't modify the actual list held in
the key data structure and returned by the key function?
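
If the explanation above is right, one way to sidestep the problem without EXSLT would be to make the "first node in group" test independent of the list returned by key(). The sketch below is an assumption-laden variant of test2.xsl, not one of the attached files: it selects the first location of each group by looking along the preceding-sibling axis, and only uses key() to fetch the group members for output.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
- Sketch only (not one of the attached files).
- The first-in-group test uses the preceding-sibling axis instead of
- comparing against key(...)[1], so it does not depend on the order
- of the list that key() returns.
-->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>
<xsl:template match="root">
<!-- keep only locations with no earlier sibling sharing the same code -->
<xsl:for-each select="location[not(@location-code = preceding-sibling::location/@location-code)]">
<xsl:sort select="@location-code"/>
<xsl:text>== Location </xsl:text>
<xsl:value-of select="@location-code"/>
<xsl:text>&#10;</xsl:text>
<!-- sorting here may still reorder the key list, but nothing tests [1] any more -->
<xsl:for-each select="key('locationsByCode', @location-code)">
<xsl:sort select="@id"/>
<xsl:text> id: </xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>&#10;</xsl:text>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
```

The trade-off is that the preceding-sibling test scans earlier siblings for every location, which gives up the efficiency that is the main point of the Muenchian method on large inputs.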