Hi Lynna,

We are running PostgreSQL with UNICODE encoding on a regular
basis for our shop. This means it stores and retrieves strings
encoded as UTF-8. If you don't need special collation rules,
that's the way to go. However, we use Python/Zope in front of
the DB for presentation, and PHP may behave differently.

Another problem with web clients is that they sometimes submit
forms in a default charset rather than the one your form's HTML
page was sent in. That is, you send a page as UTF-8 and the POST
request comes in as ISO-8859-1 or something like that. This is
irritating and worth investigating.
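
The effect of such a mismatch is easy to reproduce. A minimal Python
sketch, using the surname from the error message quoted below as the
example value: each accented letter is two bytes in UTF-8, and a
consumer that reads those bytes as ISO-8859-1 shows two unrelated
characters instead.

```python
# Sketch: how UTF-8 bytes look when something wrongly treats them
# as ISO-8859-1 (Latin-1).
original = "Ascenção"

utf8_bytes = original.encode("utf-8")      # what a UTF-8 page submits
misread = utf8_bytes.decode("iso-8859-1")  # what a Latin-1 reader sees

print(utf8_bytes)  # b'Ascen\xc3\xa7\xc3\xa3o'
print(misread)     # AscenÃ§Ã£o
```

If you see strings like "AscenÃ§Ã£o" anywhere along the way, UTF-8 data
is being displayed or re-encoded as Latin-1 at that point.
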
Try a simple recording proxy, or a packet sniffer that can
reassemble TCP streams, to log what's going over the wire.
One workaround for this browser bug is to mark the page with
a well-known string that gets sent back in the answer (say, in a
hidden form field) and undergoes the same charset rules as the
rest of the form. When you get your string back with the answer,
you can check which encoding/charset was used.
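
A minimal sketch of that check in Python, assuming the hidden field is
named charset_probe and carries the letter "ä" (both hypothetical
choices), and that only UTF-8 and ISO-8859-1 need to be told apart. The
helper names detect_form_charset and decode_field are made up for
illustration; in a PHP or Zope app the same byte comparison would sit
wherever the raw form data is read.

```python
from typing import Optional
from urllib.parse import unquote_to_bytes

# Hypothetical probe: the form carries a hidden field such as
#   <input type="hidden" name="charset_probe" value="ä">
# "ä" is encoded differently in UTF-8 (C3 A4) and ISO-8859-1 (E4).
PROBE = "\u00e4"
CANDIDATES = ("utf-8", "iso-8859-1")


def detect_form_charset(raw_probe: bytes) -> Optional[str]:
    """Return the charset whose encoding of PROBE matches the raw bytes."""
    for charset in CANDIDATES:
        if raw_probe == PROBE.encode(charset):
            return charset
    return None  # unexpected bytes: log the request and investigate


def decode_field(raw_value: bytes, raw_probe: bytes) -> str:
    """Decode another form field with the charset the probe revealed."""
    charset = detect_form_charset(raw_probe)
    if charset is None:
        raise ValueError(f"unrecognised probe bytes: {raw_probe!r}")
    return raw_value.decode(charset)


# Form values arrive URL-encoded; unquote_to_bytes yields the raw bytes.
print(detect_form_charset(unquote_to_bytes("%C3%A4")))  # utf-8
print(detect_form_charset(unquote_to_bytes("%E4")))     # iso-8859-1
print(decode_field(unquote_to_bytes("Ascen%E7%E3o"),
                   unquote_to_bytes("%E4")))            # Ascenção
```

The point of comparing bytes rather than trusting any declared charset
is that the probe goes through whatever the browser actually did to the
rest of the form.
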

HTH
Tino Wildenhain

Lynna Landstreet wrote:
Hello,

I'm running into a bit of trouble with a Unicode-enabled PostgreSQL database
(some of the data consists of artist and/or image names in other languages,
like French, Spanish, German and Portuguese, which frequently have accents,
and I don't want people entering data to have to use ASCII codes). Having (I
thought) managed to get past the issues of exporting text as Unicode in
order to import it into the database and uploading the text files as binary
instead of data to keep them Unicode/UTF-8 as I upload them, and then using
psql's \copy command to insert the data into the database, I can't get the
special characters to display properly on the web. :-(

I'm not even sure how to tell if the problem is on the input side or the
output side - as in, whether it's that the data in the database got muddled
on the way in and is not valid Unicode, or whether it's OK but every means I
try to use to view it doesn't want to accept Unicode. I'm pretty sure the
text files got to the server OK as Unicode, because I was able to view them
directly with a web browser and the special characters were OK then. But
when I imported them into the database, I was not then able to view the
special characters correctly, either in my browser through the PHP frontend
I'm developing for the database or phpPgAdmin, or via Telnet/SSH. So I don't
know if the problem came about somehow while using \copy to import them, or
with the means I'm using to view them.
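
One way to separate the input side from the output side is to check
whether the uploaded file is valid UTF-8 before running \copy at all. A
minimal Python sketch, assuming the file is read as raw bytes; the
function name first_invalid_utf8 and the sample rows (tab-separated, as
in \copy's default text format) are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def first_invalid_utf8(lines: Iterable[bytes]) -> Optional[Tuple[int, int]]:
    """Return (line number, byte offset within that line) of the first
    byte sequence that is not valid UTF-8, or None if every line decodes.

    Checking line by line is safe because no multi-byte UTF-8 sequence
    contains the newline byte 0x0A.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            return lineno, exc.start
    return None


# Hypothetical sample rows: the first is UTF-8, the second Latin-1.
sample = [b"485\tTeresa\tAscen\xc3\xa7\xc3\xa3o\n",
          b"486\tTeresa\tAscen\xe7\xe3o\n"]
print(first_invalid_utf8(sample))  # (2, 16)
```

Since iterating over a file opened in binary mode ("rb") yields byte
lines, the same function can be pointed directly at the uploaded file.
If it returns None, the file itself is fine and the problem lies later.
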

I've set the charset encoding of my PHP pages to UTF-8, and the default
encoding in my browser as well, but that doesn't help. And I've tried
editing the data through phpPgAdmin to restore the special characters, but
got the following error message:

Error - /[path to my web directory]/phpPgAdmin/tbl_replace.php -- Line: 77

PostgreSQL said: ERROR: Invalid UNICODE character sequence found (0xe7e36f)
Your query: UPDATE "artists" SET "artist_id" = 485, "firstname" = 'Teresa', "lastname" =
'Ascenção'... [rest of query deleted]


Ironically, the accented characters in her last name (a c with a cedilla and
an a with a tilde, in case they don't show up here) displayed fine in the
error message! But it wouldn't enter them into the database.
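
The byte sequence in the error, 0xe7e36f, is exactly "ção" encoded as
ISO-8859-1 (Latin-1), which suggests the UPDATE reached the server as
Latin-1 bytes while the server expected UTF-8. A short Python check of
that reading:

```python
# Sketch: reproduce the byte sequence quoted in the PostgreSQL error.
name_part = "ção"

latin1 = name_part.encode("iso-8859-1")
utf8 = name_part.encode("utf-8")
print(latin1.hex())  # e7e36f  (matches the error message)
print(utf8.hex())    # c3a7c3a36f  (what a UTF-8 database expects)

# The Latin-1 bytes are not valid UTF-8: 0xE7 announces a 3-byte
# sequence, but the next byte, 0xE3, is not a continuation byte
# (continuation bytes lie in 0x80-0xBF).
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

If that reading is right, the place to look is whatever sits between
the browser and the database (the page's charset, the form submission,
or the connection's client_encoding setting), not the data already
stored.
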

Questions that come to mind:

1. Does anyone have any idea what's going wrong here?
2. Can \copy reduce UTF-8 text to plain ASCII while importing data from a
text file?
3. If so, can it be made not to, maybe through adding some kind of parameter
to the command? Or is there a better way to import the data?
4. Is it correct for the database encoding to be "UNICODE", or should it be
UTF-8 specifically? My impression thus far was that Unicode and UTF-8 were
more or less the same thing, but maybe more or less isn't good enough.
5. Does a web form have to be specially coded to accept text with accented
characters into a database, or does the encoding of the database itself
and/or the web page the form is on determine that?
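
On question 4, the distinction is that Unicode assigns each character a
number (a code point), while UTF-8 is one way of turning code points
into bytes. PostgreSQL's UNICODE server encoding is UTF-8; later
releases name it UTF8 and, as far as I know, still accept UNICODE as an
alias. A small Python illustration with one of the letters from the
surname above:

```python
# Sketch: one Unicode character, one code point, several byte forms.
ch = "ç"

print(f"U+{ord(ch):04X}")             # U+00E7  (Unicode code point)
print(ch.encode("utf-8").hex())       # c3a7    (UTF-8: two bytes)
print(ch.encode("utf-16-be").hex())   # 00e7    (UTF-16 big-endian)
print(ch.encode("iso-8859-1").hex())  # e7      (Latin-1, not Unicode)
```
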

Any assistance would be much appreciated...


Lynna



