On 2009-02-02, Bart Degryse <bart.degr...@indicator.be> wrote: > > --=__PartF6DE34E1.0__= > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: quoted-printable > > Thanks for the ideas! > The function Jasen suggests works partially. > There are also entities like > ‘ > ’ > “ > ” > …
these work fine here. jasen=# select htmlent('‘ ’ “ ” …'); htmlent ----------- ‘ ’ “ ” … (1 row) > The output of the htmlent function for these looks like | I see a pipe symbol '|' is that what you wrote? > Was that what you meant with >>"characters outside of the LATIN-1 space >> are not handled but apparently this is what you want." ? I was under a mistaken impression of the encoding of HTML numeric entities. > Because in that case, they should be handled too. > How should that be done. > Thanks, it works here because server_encoding is UTF8; to check do this. show server_encoding;show client_encoding; it looks like you's need to convert the numbers to utt-8 in a bytea type and then use convert to translate then to your preferred encoding (appears to be win125x) postgresql8.3 appears to provide no way to generate UTF-8 in a bytea type so if you can't use a utf-8 encoding for your database you'll have to write your own. >>>> Jasen Betts <ja...@xnet.co.nz> 2009-01-31 12:47 >>> > On 2009-01-30, Bart Degryse <bart.degr...@indicator.be> wrote: >> >> --=3D__Part8EA648F8.0__=3D >> Content-Type: text/plain; charset=3DUTF-8 >> Content-Transfer-Encoding: quoted-printable >> >> Hi, >> I have a text field with data like this: 'de patiënt niet' >> (without the quotes). >> I would like to convert this string to look like this: 'de pati=C3=ABnt >> niet' >> Basically what I need to do (I think) is >> - get rid of the &, # and ; >> - convert the number to hex >> - make a UTF8 from that (thus: \xEB) > that is not UTF8. > > the UTF8 representation for '=C3=AB' in SQL is e'\xC3\xAB' or chr(235) > > your input appears to be encoded in LATIN-1 (or possibly 8859-13) > > > I think you'll need to write a function. > > here, where I have database encoding UTF8 this appears to work as you > desire. > > CREATE or replace FUNCTION htmlent(inp text) returns text as > $f$ > DECLARE > str text; > BEGIN > str=3D regexp_replace(quote_literal( inp) > ,$$&#(\d+);$$ > ,$$'||chr(\1)||'$$=20 > ,'g'); > execute 'select '||str into str; > return str; > END > $f$ > LANGUAGE PLPGSQL; > > select htmlent('de patiënt niet'); > > probably the above should be expanded to handle named entities=20 > like '&' too. > > characters outside of the LATIN-1 space are not handled=20 > but aparently this is what you want. > > > --=20 > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql > > --=__PartF6DE34E1.0__= > Content-Type: text/html; charset=UTF-8 > Content-Transfer-Encoding: quoted-printable > ><HTML><HEAD> ><META http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8"> ><META content=3D"MSHTML 6.00.2800.1264" name=3DGENERATOR></HEAD> ><BODY style=3D"MARGIN: 4px 4px 1px; FONT: 10pt Tahoma"> ><DIV>Thanks for the ideas!</DIV> ><DIV>The function Jasen suggests works partially.</DIV> ><DIV>There are also entities like</DIV> ><DIV>&#8216;</DIV> ><DIV> ><DIV>&#8217;</DIV> ><DIV>&#8220;</DIV> ><DIV>&#8221;</DIV> ><DIV>&#8230;</DIV> ><DIV>The output of the htmlent function for these looks like | </DIV> ><DIV>Was that what you meant with "characters outside of the LATIN-1 space = > are not handled</DIV> ><DIV>but apparently this is what you want." ?</DIV> ><DIV>Because in that case, they should be handled too.</DIV> ><DIV>How should that be done.</DIV> ><DIV>Thanks,</DIV> ><DIV>Bart </DIV><BR><BR>>>> Jasen Betts <ja...@xnet.co.nz&= > gt; 2009-01-31 12:47 >>><BR>On 2009-01-30, Bart Degryse <Bart.D= > egr...@indicator.be> wrote:<BR>><BR>> --=3D__Part8EA648F8.0__=3D<B= > R>> Content-Type: text/plain; charset=3DUTF-8<BR>> Content-Transfer-E= > ncoding: quoted-printable<BR>><BR>> Hi,<BR>> I have a text field = > with data like this: 'de pati&#235;nt niet'<BR>> (without the = > quotes).<BR>> I would like to convert this string to look like this: = > 'de pati=C3=ABnt<BR>> niet'<BR>> Basically what I need to do (I = > think) is<BR>> - get rid of the &, # and ;<BR>> - convert the = > number to hex<BR>> - make a UTF8 from that (thus: \xEB)<BR>that is not = > UTF8.<BR><BR>the UTF8 representation for '=C3=AB' in SQL is = > e'\xC3\xAB' or chr(235)<BR><BR>your input appears to be encoded in LATIN-1 = > (or possibly 8859-13)<BR><BR><BR>I think you'll need to write a function.<B= > R><BR>here, where I have database encoding UTF8 this appears to work as = > you<BR>desire.<BR><BR>CREATE or replace FUNCTION htmlent(inp text) returns = > text as<BR>$f$<BR>DECLARE<BR> str text;<BR>BEGIN<BR>str=3D regexp_rep= > lace(quote_literal( inp)<BR>  = > ; = > ,$$&#(\d+);$$<BR> ,$$'||chr(\1)||'$$ <BR> = > ,'g');<BR>execute 'select '||str into str;<BR>return str;<BR>END<BR>= > $f$<BR>LANGUAGE PLPGSQL;<BR><BR>select htmlent('de pati&#235;nt = > niet');<BR><BR>probably the above should be expanded to handle named = > entities <BR>like '&amp;' too.<BR><BR>characters outside of the = > LATIN-1 space are not handled <BR>but aparently this is what you want.<BR><= > BR><BR>-- <BR>Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)<BR= >>To make changes to your subscription:<BR><A href=3D"http://www.postgresql.= > org/mailpref/pgsql-sql">http://www.postgresql.org/mailpref/pgsql-sql</A><BR= >></DIV></BODY></HTML> > > --=__PartF6DE34E1.0__=-- > -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql