Re: Problem with foreign characters,

2011-09-08 Thread Pascal Sancho
Hi Theresa,

&#195;&#169; is a UTF-8 sequence (0xC3 0xA9) that encodes EACUTE (é, U+00E9);
&#239;&#187;&#191; is a UTF-8 sequence (0xEF 0xBB 0xBF) that encodes the
BOM (U+FEFF; this is the UTF-8 signature).

You should have a look at how character encoding is handled in your app;
the issue seems to be there, since each UTF-8 byte is arriving as a
separate character.
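
As a rough illustration of what seems to be happening (a Python sketch, outside your Java/XSLT stack): the codes you list are what you get when UTF-8 bytes are decoded as Windows-1252. The Windows-1252 guess is an assumption, inferred from codes such as 8364 and 8220 in your template, and the function name repair_mojibake is hypothetical.

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo one round of 'UTF-8 bytes decoded as wrong_encoding'."""
    # Re-encode with the wrong charset to recover the original bytes,
    # then decode those bytes as the UTF-8 they really were.
    return text.encode(wrong_encoding).decode("utf-8")


# The sequences from this thread, written as escapes to keep the file ASCII.
e_acute = repair_mojibake("\u00c3\u00a9")        # &#195;&#169;
en_dash = repair_mojibake("\u00e2\u20ac\u201c")  # &#226;&#8364;&#8220;
bom = repair_mojibake("\u00ef\u00bb\u00bf")      # &#239;&#187;&#191;

# Prints the repaired code points: U+00E9 U+2013 U+FEFF
print("U+%04X U+%04X U+%04X" % (ord(e_acute), ord(en_dash), ord(bom)))
```

If this assumption holds, the real fix is to make whatever reads the page decode it as UTF-8, rather than to patch each sequence afterwards.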

That said, to convert a string in XSLT I can think of two ways:
 either in pure XSLT, using a recursive template (see below),
 or using an embedded script (see [1] for Xalan).

<xsl:template match="text()">
  <xsl:call-template name="text"/>
</xsl:template>

<xsl:template name="text">
  <xsl:param name="str" select="."/>
  <xsl:param name="find" select="'&#xa0;'"/>
  <xsl:param name="replace" select="'&#x20;'"/>
  <xsl:choose>
    <xsl:when test="contains($str,$find)">
      <xsl:value-of select="substring-before($str,$find)"/>
      <xsl:value-of select="$replace"/>
      <xsl:call-template name="text">
        <xsl:with-param name="str"
          select="substring-after($str,$find)"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$str"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

[1] http://xml.apache.org/xalan-j/extensions.html
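
To find the codes for the other characters, one option is to generate them offline. The following is a minimal Python sketch, assuming the page's UTF-8 bytes are being misread as Windows-1252 (an assumption, not something confirmed in this thread); mojibake_refs is a hypothetical helper name. It returns the garbled form of a character as numeric character references that could be pasted into a search-for parameter.

```python
def mojibake_refs(char, wrong_encoding="cp1252"):
    """Return the garbled form of char as numeric character references,
    or None if one of its UTF-8 bytes has no mapping in wrong_encoding."""
    try:
        # Encode correctly, then decode with the (assumed) wrong charset.
        garbled = char.encode("utf-8").decode(wrong_encoding)
    except UnicodeDecodeError:
        return None
    return "".join("&#%d;" % ord(c) for c in garbled)


# Characters seen in this thread: e-acute, O-diaeresis, en dash, BOM.
for ch in ("\u00e9", "\u00d6", "\u2013", "\ufeff"):
    print("U+%04X -> %s" % (ord(ch), mojibake_refs(ch)))
```

Some characters give None because a few byte values are undefined in Python's cp1252 codec; how those bytes actually arrive in the XSLT depends on the decoder in use, so they would need to be checked by hand.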


On 08/09/2011 13:48, Theresa Jayne Forster wrote:
 I have a minor issue and would like some help if I can,
 
 Before I start there are a couple of pointers here.
 1)  I cannot change the Java code nor the version of FOP (modified 0.23)
 2)  I have a partial resolution already in place
 3)  I am just looking for the way to get the information I need.
 
 I have code which scrapes a web page and rips out the text, turning it into
 a downloadable PDF.
 Some characters, like é, do not display correctly, so I am doing a replace
 in a template.
 I need to find what the characters are coming in as so I can convert
 them in the replace.
 For instance, the é character comes in as the character codes &#195;&#169;.
 How can I find the character codes coming in for all the other
 characters (or convert them on the fly within XSL)?
 
 My template currently is as follows:
 <xsl:template name="loose_nasty_entities">
   <xsl:param name="thisstring" select="."/>
   <xsl:variable name="thisstring1">
     <xsl:call-template name="replace">
       <xsl:with-param name="str" select="$thisstring"/>
       <xsl:with-param name="search-for" select="'&#226;&#8364;&#8220;'"/>
       <xsl:with-param name="replace-with" select="'-'"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="thisstring2">
     <xsl:call-template name="replace">
       <xsl:with-param name="str" select="$thisstring1"/>
       <xsl:with-param name="search-for" select="'&#239;&#187;&#191;'"/>
       <xsl:with-param name="replace-with" select="''"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="thisstring3">
     <xsl:call-template name="replace">
       <xsl:with-param name="str" select="$thisstring2"/>
       <xsl:with-param name="search-for" select="'&#194;'"/>
       <xsl:with-param name="replace-with" select="''"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="thisstring4">
     <xsl:call-template name="replace">
       <xsl:with-param name="str" select="$thisstring3"/>
       <xsl:with-param name="search-for" select="'&#195;&#169;'"/>
       <xsl:with-param name="replace-with" select="'é'"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:variable name="thisstring5">
     <xsl:call-template name="replace">
       <xsl:with-param name="str" select="$thisstring4"/>
       <xsl:with-param name="search-for" select="'&#195;&#8211;'"/>
       <xsl:with-param name="replace-with" select="'&#214;'"/>
     </xsl:call-template>
   </xsl:variable>
   <xsl:value-of select="$thisstring5"/>
 </xsl:template>
 
 Kindest regards
 Theresa Forster
-- 
Pascal

-
To unsubscribe, e-mail: fop-users-unsubscr...@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-h...@xmlgraphics.apache.org



RE: Problem with foreign characters,

2011-09-08 Thread Theresa Jayne Forster
Well, what happens is that my XSLT pulls in an HTML web page via TagSoup,
so I have no visibility of it until it reaches me in the XSL...


Kindest regards


Theresa Forster
Senior Software Developer


