We need to decide on how to handle encoding information embedded in xml 
data that is passed through the client/server encoding conversion.

Here is an example:

Client encoding is A, server encoding is B.  Client sends an xml datum 
that looks like this:

INSERT INTO table VALUES (xmlparse(document '<?xml version="1.0" 
encoding="C"?><content>...</content>'));

Assuming that A, B, and C are all distinct, this could fail in a number 
of places.

I suggest that we make the system ignore all encoding declarations in 
xml data.  That is, in the above example, the string would actually 
have to be encoded in the client encoding A on the client, would be 
converted to B on the server and stored as such.  As far as I can tell, 
this is easily implemented and allowed by the XML standard.
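
To illustrate, here is a minimal sketch of what the input side could 
look like, assuming the xml type reuses the text type's varlena 
representation (the function name and details are mine for 
illustration, not an actual patch, and well-formedness checking is 
omitted).  The literal reaches the input function already converted 
from the client encoding to the server encoding by the normal 
conversion machinery, so the function just stores the bytes and never 
consults the encoding attribute of the XML declaration:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(xml_in);

Datum
xml_in(PG_FUNCTION_ARGS)
{
    char       *s = PG_GETARG_CSTRING(0);
    size_t      len = strlen(s);
    text       *result;

    /*
     * The string has already been converted from the client encoding
     * to the server encoding by the protocol layer; any encoding="..."
     * attribute in the XML declaration is deliberately ignored.
     */
    result = (text *) palloc(len + VARHDRSZ);
    SET_VARSIZE(result, len + VARHDRSZ);
    memcpy(VARDATA(result), s, len);

    PG_RETURN_TEXT_P(result);
}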

The same would be done on the way back.  The datum would arrive in 
encoding A on the client.  It might be implementation-dependent whether 
the datum actually contains an XML declaration specifying an encoding 
and whether that encoding might read A, B, or C -- I haven't figured 
that out yet -- but the client will always be required to consider it 
to be A.
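
Under the same assumptions, the output function would simply hand the 
stored bytes back and let the ordinary server-to-client conversion 
deliver the datum in the client's encoding; again only a sketch:

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(xml_out);

Datum
xml_out(PG_FUNCTION_ARGS)
{
    text       *x = PG_GETARG_TEXT_P(0);
    int         len = VARSIZE(x) - VARHDRSZ;
    char       *result;

    /*
     * Return the stored bytes unchanged; the protocol layer converts
     * the cstring from the server encoding to the client encoding on
     * the way out, so the client always sees its own encoding.
     */
    result = (char *) palloc(len + 1);
    memcpy(result, VARDATA(x), len);
    result[len] = '\0';

    PG_RETURN_CSTRING(result);
}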

What should be done about the binary send/receive functionality?  The 
send/receive functions for the text type communicate all data in the 
server encoding, so it seems reasonable to do the same here.
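
A send function along those lines could be as simple as the following 
sketch, which transmits the stored (server-encoding) bytes verbatim and 
does not touch any embedded XML declaration (names and details are 
again assumptions, not the actual implementation):

#include "postgres.h"
#include "fmgr.h"
#include "libpq/pqformat.h"

PG_FUNCTION_INFO_V1(xml_send);

Datum
xml_send(PG_FUNCTION_ARGS)
{
    text       *x = PG_GETARG_TEXT_P(0);
    StringInfoData buf;

    /*
     * Transmit the stored bytes as-is, i.e. in the server encoding,
     * without rewriting any embedded XML declaration.
     */
    pq_begintypsend(&buf);
    pq_sendbytes(&buf, VARDATA(x), VARSIZE(x) - VARHDRSZ);
    PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}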

Comments?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
