On 2009-02-02, Bart Degryse wrote:
>
> --=__PartF6DE34E1.0__=
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
>
> Thanks for the ideas!
> The function Jasen suggests works partially.
> There are also entities like
> ‘
> ’
> “
> ”
> …
these work fine here.
jasen=# select htmlent('‘ ’ “ ” …');
htmlent
---
‘ ’ “ ” …
(1 row)
> The output of the htmlent function for these looks like |
I see a pipe symbol '|' is that what you wrote?
> Was that what you meant with
>>"characters outside of the LATIN-1 space
>> are not handled but apparently this is what you want." ?
I was under a mistaken impression of the encoding of HTML numeric
entities.
> Because in that case, they should be handled too.
> How should that be done.
> Thanks,
it works here because server_encoding is UTF8;
to check do this.
show server_encoding;show client_encoding;
it looks like you's need to convert the numbers to utt-8 in a bytea
type and then use convert to translate then to your preferred encoding
(appears to be win125x)
postgresql8.3 appears to provide no way to generate UTF-8 in a bytea
type so if you can't use a utf-8 encoding for your database you'll
have to write your own.
Jasen Betts 2009-01-31 12:47 >>>
> On 2009-01-30, Bart Degryse wrote:
>>
>> --=3D__Part8EA648F8.0__=3D
>> Content-Type: text/plain; charset=3DUTF-8
>> Content-Transfer-Encoding: quoted-printable
>>
>> Hi,
>> I have a text field with data like this: 'de patiënt niet'
>> (without the quotes).
>> I would like to convert this string to look like this: 'de pati=C3=ABnt
>> niet'
>> Basically what I need to do (I think) is
>> - get rid of the &, # and ;
>> - convert the number to hex
>> - make a UTF8 from that (thus: \xEB)
> that is not UTF8.
>
> the UTF8 representation for '=C3=AB' in SQL is e'\xC3\xAB' or chr(235)
>
> your input appears to be encoded in LATIN-1 (or possibly 8859-13)
>
>
> I think you'll need to write a function.
>
> here, where I have database encoding UTF8 this appears to work as you
> desire.
>
> CREATE or replace FUNCTION htmlent(inp text) returns text as
> $f$
> DECLARE
> str text;
> BEGIN
> str=3D regexp_replace(quote_literal( inp)
> ,$$(\d+);$$
> ,$$'||chr(\1)||'$$=20
> ,'g');
> execute 'select '||str into str;
> return str;
> END
> $f$
> LANGUAGE PLPGSQL;
>
> select htmlent('de patiënt niet');
>
> probably the above should be expanded to handle named entities=20
> like '&' too.
>
> characters outside of the LATIN-1 space are not handled=20
> but aparently this is what you want.
>
>
> --=20
> Sent via pgsql-sql mailing list ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>
> --=__PartF6DE34E1.0__=
> Content-Type: text/html; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
>
>
>
>
>
>Thanks for the ideas!
>The function Jasen suggests works partially.
>There are also entities like
>‘
>
>’
>“
>”
>…
>The output of the htmlent function for these looks like |
>Was that what you meant with "characters outside of the LATIN-1 space =
> are not handled
>but apparently this is what you want." ?
>Because in that case, they should be handled too.
>How should that be done.
>Thanks,
>Bart >>> Jasen Betts gt; 2009-01-31 12:47 >>>On 2009-01-30, Bart Degryse [email protected]> wrote:>> --=3D__Part8EA648F8.0__=3D R>> Content-Type: text/plain; charset=3DUTF-8> Content-Transfer-E=
> ncoding: quoted-printable>> Hi,> I have a text field =
> with data like this: 'de patiënt niet'> (without the =
> quotes).> I would like to convert this string to look like this: =
> 'de pati=C3=ABnt> niet'> Basically what I need to do (I =
> think) is> - get rid of the &, # and ;> - convert the =
> number to hex> - make a UTF8 from that (thus: \xEB)that is not =
> UTF8.the UTF8 representation for '=C3=AB' in SQL is =
> e'\xC3\xAB' or chr(235)your input appears to be encoded in LATIN-1 =
> (or possibly 8859-13)I think you'll need to write a function. R>here, where I have database encoding UTF8 this appears to work as =
> youdesire.CREATE or replace FUNCTION htmlent(inp text) returns =
> text as$f$DECLARE str text;BEGINstr=3D regexp_rep=
> lace(quote_literal( inp)=
> ; =
> ,$$(\d+);$$ ,$$'||chr(\1)||'$$ =
> ,'g');execute 'select '||str into str;return str;END=
> $f$LANGUAGE PLPGSQL;select htmlent('de patiënt =
> niet');probably the above should be expanded to handle named =
> entities like '&' too.characters outside of the =
> LATIN-1 space are not handled but aparently this is what you want.<=
> BR>-- Sent via pgsql-sql mailing list ([email protected])>To make changes to your subscription:http://www.postgresql.=
> org/mailpref/pgsql-sql">http://www.postgresql.org/mailpref/pgsql-sql>
>
> --=__PartF6DE34E1.0__=--
>
--
Sent via pgsql-sql mailing list ([email protected]