Hi,

I recently struck a rather interesting bug. I'm not sure whether this
qualifies as a bug in Xalan or simply a bug in my stylesheet.

Either way, it's a rather interesting and (for me) unexpected
interaction between the "Muenchian method" of grouping nodes
and the <xsl:sort> operation.

Attached are several files:

test.xml is test input

test1.xsl is a stylesheet that works fine, but does not perform sorting
in quite the desired manner.

test2.xsl is a stylesheet that makes the obvious modification to
test1.xsl - with results that quite surprised me.

test3.xsl is a "fixed" stylesheet, but the fix seems most unusual,
particularly in a supposedly "functional" programming language.

info.txt describes what I think is going on here. I've made it a
separate file so that people can try to figure it out for themselves if
they want a challenge :-)

I would appreciate feedback on whether I should:
(a) file a Xalan bug
(b) send a note to the XSL standards committee
(c) send a note to the webmaster of what appears to be the 
   prime source on the 'Muenchian method'
   (http://www.jenitennison.com/xslt/grouping/muenchian.html)
(d) treat this as a plain bug that is obvious to everyone but me,
    and go drown my shame in a few pints of the local ale...


NB: yes, it is also possible to implement the functionality of test3.xsl
using the exslt set:distinct function, which avoids the issue
completely.

Cheers,

Simon (aka [EMAIL PROTECTED])

<root>
  <location id='3B' location-code='3'/>
  <location id='2B' location-code='2'/>
  <location id='1B' location-code='1'/>
  <location id='3A' location-code='3'/>
  <location id='2A' location-code='2'/>
  <location id='1A' location-code='1'/>
</root>
<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
  - test1.xsl
  -
  - This stylesheet nicely groups the location nodes by their location-code
  - but it doesn't sort nodes by id within locations (see test2.xsl).
  -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text"/>

  <xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>

  <xsl:template match="root">
    <xsl:for-each select="location">
      <xsl:sort select="@location-code"/>

      <xsl:variable name="thisNode" select="generate-id(.)"/>
      <xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>
      <xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>

      <xsl:if test="$thisNode = $firstNodeAtSameLocation">
        <xsl:text>== Location </xsl:text>
        <xsl:value-of select="@location-code"/>
        <xsl:text>&#x0a;</xsl:text>

        <xsl:for-each select="$nodesAtSameLocation">
          <xsl:text>  id: </xsl:text>
          <xsl:value-of select="@id"/>
          <xsl:text>&#x0a;</xsl:text>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
  - test2.xsl
  -
  - This stylesheet demonstrates a surprising interaction between
  - <xsl:sort> and the key() function, particularly when using the
  - "Muenchian method" to group nodes.
  -
  - This stylesheet is an attempt to modify test1.xsl to add sorting
  - of nodes by id within location groups - and it goes horribly wrong.
  - See if you can spot the problem!
  -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text"/>

  <xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>

  <xsl:template match="root">
    <xsl:for-each select="location">
      <xsl:sort select="@location-code"/>

      <xsl:variable name="thisNode" select="generate-id(.)"/>
      <xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>
      <xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>

      <xsl:if test="$thisNode = $firstNodeAtSameLocation">
        <xsl:text>== Location </xsl:text>
        <xsl:value-of select="@location-code"/>
        <xsl:text>&#x0a;</xsl:text>

        <xsl:for-each select="$nodesAtSameLocation">
          <xsl:sort select="@id"/>
          <xsl:text>  id: </xsl:text>
          <xsl:value-of select="@id"/>
          <xsl:text>&#x0a;</xsl:text>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

<?xml version="1.0" encoding="ISO-8859-1"?>

<!--
  - test3.xsl
  -
  - This stylesheet demonstrates a surprising interaction between
  - <xsl:sort> and the key() function, particularly when using the
  - "Muenchian method" to group nodes.
  -
  - This stylesheet fixes test2.xsl - with a rather odd-looking bit
  - of code...
  -->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="text"/>

  <xsl:key name="locationsByCode" match="/root/location" use="@location-code"/>

  <xsl:template match="root">
    <xsl:for-each select="location">
      <xsl:sort select="@location-code"/>

      <xsl:variable name="thisNode" select="generate-id(.)"/>
      <xsl:variable name="nodesAtSameLocation" select="key('locationsByCode', @location-code)"/>

      <!-- here's the odd code -->
      <xsl:for-each select="$nodesAtSameLocation">
        <xsl:sort select="@id"/>
      </xsl:for-each>

      <xsl:variable name="firstNodeAtSameLocation" select="generate-id($nodesAtSameLocation[1])"/>

      <xsl:if test="$thisNode = $firstNodeAtSameLocation">
        <xsl:text>== Location </xsl:text>
        <xsl:value-of select="@location-code"/>
        <xsl:text>&#x0a;</xsl:text>

        <xsl:for-each select="$nodesAtSameLocation">
          <xsl:sort select="@id"/>
          <xsl:text>  id: </xsl:text>
          <xsl:value-of select="@id"/>
          <xsl:text>&#x0a;</xsl:text>
        </xsl:for-each>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

The cause of this problem is, I believe, as follows:

* the xsl:key declaration builds a map of (key->nodelist), with the
  elements in each nodelist being inserted in document order.
* the key(...) function returns a *reference* to this list.
* the first time through the outer loop for a particular location code,
  we compare the current node to the first element in the list
  and it matches, so we enter the inner for-each loop.
* the for-each loop then causes the list to be sorted.
  THIS CAUSES THE LIST IN THE xsl:key MAP TO BE SORTED, thereby
  changing what node is [1] in this list.
* later we enter the "for-each select=location" loop with some
  other node that now matches element [1] in the list returned
  by the key() function - and so output the group again.
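
The sequence above can be modelled outside XSLT. What follows is a
minimal Python sketch of the hypothesis only, not of Xalan's actual
implementation: the names build_key_index and group_locations and the
sort_in_place flag are invented for illustration, and each node is
modelled as an (id, location-code) tuple taken from test.xml.

```python
# Toy model of the hypothesised behaviour. This is NOT Xalan's code;
# all names here are made up for illustration.

# (id, location-code) pairs in document order, as in test.xml
LOCATIONS = [
    ("3B", "3"), ("2B", "2"), ("1B", "1"),
    ("3A", "3"), ("2A", "2"), ("1A", "1"),
]


def build_key_index(nodes):
    """Model of xsl:key: map each location-code to its nodes in document order."""
    index = {}
    for node in nodes:
        index.setdefault(node[1], []).append(node)
    return index


def group_locations(nodes, sort_in_place):
    """Model of the template in test2.xsl.

    sort_in_place=True models the suspected bug: the inner sort reorders
    the very list stored in the key index.
    """
    index = build_key_index(nodes)
    output = []
    # Outer for-each, sorted by location-code (Python's sort is stable).
    for node in sorted(nodes, key=lambda n: n[1]):
        same_location = index[node[1]]   # key() returns the shared list
        if node is same_location[0]:     # the generate-id() comparison
            if sort_in_place:
                same_location.sort(key=lambda n: n[0])
                ordered = same_location
            else:
                ordered = sorted(same_location, key=lambda n: n[0])
            output.append((node[1], [n[0] for n in ordered]))
    return output


if __name__ == "__main__":
    for flag in (True, False):
        print("sort_in_place =", flag)
        for code, ids in group_locations(LOCATIONS, flag):
            print("== Location", code, ids)
```

In this model, sort_in_place=True emits every group twice (six headings
for three location codes), because the second node of each group has
become element [1] by the time the outer loop reaches it. With
sort_in_place=False each group is emitted once.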

So, is this situation:
(a) perfectly in compliance with the standard, and something people using
    the "muenchian method" in combination with <xsl:sort> should just
    be aware of?
(b) something that should be fixed in key(), xsl:for-each or xsl:sort,
    so that the sort operation doesn't modify the actual list held in
    the key datastructure and returned by the key function?
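
To make option (b) concrete, here is a small Python sketch of a key
index whose lookup hands out a copy rather than the stored list. The
KeyIndex class and its key method are hypothetical and only model the
idea; they say nothing about how Xalan is actually structured.

```python
class KeyIndex:
    """Hypothetical stand-in for the data structure built by xsl:key."""

    def __init__(self, nodes, use):
        self._groups = {}
        for node in nodes:  # nodes arrive in document order
            self._groups.setdefault(use(node), []).append(node)

    def key(self, value):
        # Return a shallow copy, so a caller that sorts the result
        # cannot disturb the document-order list kept in the index.
        return list(self._groups.get(value, []))


# (id, location-code) pairs in document order, as in test.xml
nodes = [("3B", "3"), ("2B", "2"), ("1B", "1"),
         ("3A", "3"), ("2A", "2"), ("1A", "1")]
index = KeyIndex(nodes, use=lambda n: n[1])

group = index.key("1")
group.sort()                           # caller sorts its own copy by id
first_after_sort = index.key("1")[0]   # still the first node in document order
```

Here a later lookup still sees ("1B", "1") as the first node for
location code 1, so the "is this the first node of its group" test
gives the same answer before and after the sort.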
