I happened to have something lying around that was close to what you
asked for, so I modified it and have included it here. Note that it does
not normalize whitespace:

 

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

     <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>

             

            <!-- Kill these in the output tree -->

     <xsl:template match="a"> 

     </xsl:template>

      

     <xsl:template match="/">

        <htmltext>

            <xsl:apply-templates />

        </htmltext>

     </xsl:template>

 

            <!--

                        For all other elements, processing instructions and
                        comments: output nothing for the node itself, process
                        its attributes and children, then emit a space.

            -->

            <xsl:template match="*|processing-instruction()|comment()">

               <xsl:choose>

                  <xsl:when test="not(node())"><xsl:apply-templates select="@*"/><xsl:text> </xsl:text></xsl:when>

                  <xsl:otherwise>

                     <xsl:apply-templates select="@*|node()"/><xsl:text> </xsl:text>

                  </xsl:otherwise>

               </xsl:choose>

            </xsl:template>

            

            <!--

                        Attributes contribute no text of their own; emit
                        only a separating space.

            -->

            <xsl:template match="@*">

               <xsl:apply-templates /><xsl:text> </xsl:text>

            </xsl:template>

</xsl:stylesheet>
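
If you also need the whitespace normalized, one possible approach (a sketch only, not run against your real documents) is to collect the text into a variable and pass it through normalize-space() before writing it out. The variable name "raw" is my own choice, and dropping <head> is an assumption based on the "Not wanted" title in your example. It also assumes the source elements are in no namespace, as in your sample; if your XHTML declares xmlns="http://www.w3.org/1999/xhtml", the match patterns would need a namespace prefix bound to that URI.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>

  <!-- Kill links and (assumption) the whole head in the output tree -->
  <xsl:template match="a|head"/>

  <xsl:template match="/">
    <!-- Gather all remaining text as a result tree fragment -->
    <xsl:variable name="raw">
      <xsl:apply-templates/>
    </xsl:variable>
    <!-- Trim the ends and collapse whitespace runs to single spaces -->
    <htmltext><xsl:value-of select="normalize-space($raw)"/></htmltext>
  </xsl:template>

  <!-- Every other element: process its children, then add a separator.
       Text nodes are copied by the built-in template rule. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
    <xsl:text> </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Against the sample source in your message this should produce the htmltext element you listed. Because the normalization happens once on the collected string, the extra separator spaces added after each element are collapsed along with the original line breaks.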

 

HTH,

 

 

________________________________

From: Peter Hollas [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 29, 2006 3:50 AM
To: xalan-j-users@xml.apache.org
Subject: XHTML link tag stripping

 

Hi everyone,

Please could someone provide an example stylesheet showing how to strip <a>
link tags out of a source XHTML document whilst retaining the remaining
node text from within the body. Preferably the output should have
normalised whitespace and a space separating each extracted piece of
text, e.g.

Source:

<html>
<head>
<title>Not wanted</title>
</head>
<body>
<a>Not wanted</a>
<div class="1">This text is wanted <a href="#">Not wanted</a> and so is
this</div> 
<p>Wanted</p>
</body>
</html>


Output:

<htmltext>This text is wanted and so is this Wanted</htmltext>

I'm sure that the solution is incredibly simple, but after days of
trying I keep hitting a brick wall. 

Many thanks, Peter.
