I need to do some limited text-node parsing in libxslt. I find parsing text very difficult in libxslt. Last time, when I was writing some SVG-scaling XSL, I gave up on the paths that were expressed as a string of coordinates, because non-destructive text parsing is too hard for me.
However, it's come upon me again: I need to detect certain text strings in a larger block of text and replace them with new nodes, with the surrounding text left intact. In perl, to mark up embedded urls in text I'd just do something like

    $string =~ s|(http://[^ ]*)|<a href="$1">$1</a>|gi;
    print $string;

Of course that's rather simplified, and I'd like to do it properly in libxslt, which for a start seems to suggest some sort of recursion. Here's my first cut, which outputs the text before the url, outputs an <a> tag and then recurses for the text following the url.

    <!-- enhance text by making <a>'s out of urls and email addresses -->
    <xsl:template match="text()" name="fixup-text">
      <xsl:param name="text" select="string(.)"/>
      <!-- regexp:match returns the whole match first, then one node per group -->
      <xsl:variable name="parts"
          select="regexp:match($text, '(.*?)(http://[^ ]*/)(.*)', 'i')"/>
      <xsl:choose>
        <xsl:when test="$parts">
          <!-- output text up to url somehow, except I don't know if .*? non-greediness is supported -->
          <xsl:value-of select="$parts[2]"/>
          <!-- output a tag -->
          <a>
            <xsl:attribute name="href"><xsl:value-of select="$parts[3]"/></xsl:attribute>
            <xsl:attribute name="target">_blank</xsl:attribute>
            <xsl:value-of select="$parts[3]"/>
          </a>
          <!-- recurse for the rest -->
          <xsl:call-template name="fixup-text">
            <xsl:with-param name="text" select="$parts[4]"/>
          </xsl:call-template>
        </xsl:when>
        <!-- no url left: output the text unchanged and stop recursing -->
        <xsl:otherwise><xsl:value-of select="$text"/></xsl:otherwise>
      </xsl:choose>
    </xsl:template>

Except libxslt doesn't seem to have regexp support, or at least none that is widely distributed or even packaged for most platforms (including mine). str:tokenize etc. are no good because they are too destructive: they destroy the separating tokens.

The simplest approach I can think of is a recursive parser that takes one character at a time, building up a string until it has collected either a url or a non-url run, which it then outputs appropriately, before carrying on through the rest of the (probably very large) text one character at a time. Clearly that is nuts.
I'll probably have to use contains, substring-before, substring-after, substring and string-length, and maybe str:tokenize just to get the lengths of substrings up to multiple delimiters. Clearly that is nuts too. Have I missed anything obvious?

Sam
_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
http://mail.gnome.org/mailman/listinfo/xslt
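
The contains / substring-before / substring-after route mentioned above might look roughly like the sketch below. It is only an illustration under stated assumptions, not a tested libxslt recipe: the template name "linkify" is made up, a url is assumed to start at the literal, lower-case 'http://' and to end at the first space, tab or newline, and https or mixed-case schemes are not handled. It uses only XPath 1.0 string functions, so no regexp or str: extensions are needed, and it recurses once per url rather than once per character.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy elements and attributes through unchanged. -->
  <xsl:template match="@*|*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- "linkify" is a hypothetical name: wrap each http:// url in <a>,
       leaving the surrounding text intact. -->
  <xsl:template match="text()" name="linkify">
    <xsl:param name="text" select="string(.)"/>
    <xsl:choose>
      <xsl:when test="contains($text, 'http://')">
        <!-- Text before the url is passed through untouched. -->
        <xsl:value-of select="substring-before($text, 'http://')"/>
        <!-- Everything after the scheme. -->
        <xsl:variable name="tail" select="substring-after($text, 'http://')"/>
        <!-- Map tab/newline/CR to spaces (same length), then cut at the
             first space; the appended space covers a url at end of text. -->
        <xsl:variable name="host-path"
            select="substring-before(
                      concat(translate($tail, '&#9;&#10;&#13;', '   '), ' '),
                      ' ')"/>
        <xsl:variable name="url" select="concat('http://', $host-path)"/>
        <a href="{$url}" target="_blank">
          <xsl:value-of select="$url"/>
        </a>
        <!-- Recurse on what follows the url, delimiter included. -->
        <xsl:call-template name="linkify">
          <xsl:with-param name="text"
              select="substring($tail, string-length($host-path) + 1)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No url left: output the remainder and stop. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Saved under a hypothetical name such as linkify.xsl, this could be tried with something like `xsltproc linkify.xsl input.xml`. Because translate() replaces each delimiter with exactly one space, the lengths line up with the original text and the delimiter itself survives in the output. Trailing punctuation directly after a url would still be swallowed into the link, so the delimiter set would probably need extending for real text.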
