On 02/21/16 23:10, Tom Lane wrote:
> Another variable is that your answers might depend on what format you
> assume the client is trying to convert from/to. (It's presumably not
> text JSON, but then what is it?)
This connects tangentially to a question I've been meaning to ask
for a while, since I was looking at the representation of XML.
As far as I can tell, XML is simply stored in its character serialized
representation (very likely compressed, if large enough to TOAST), and
the text in/out methods simply deal in that representation. The 'binary'
send/recv methods seem to differ only in possibly using a different
character encoding on the wire.
Now, also as I understand it, there's no requirement that a type even
/have/ binary send/recv methods. Text in/out it always needs, but send/recv
only if they are interesting enough to buy you something. I'm not sure
the XML send/recv really do buy anything. It is not as if they present the
XML in any more structured or tokenized form. If they buy anything at all,
it may be only an extra transcoding that the other end will probably
immediately do in reverse.
So, if that's the situation, is there some other, really simple, choice
for what XML send/recv might usefully do, that would buy more than what
they do now?
Well, PGLZ is in libpgcommon now, right? What if xml send wrote a flag
to indicate compressed or not and then, if the stored value is already
compressed, streamed it right out as is, with no expansion on the server? I could see
that being a worthwhile win, /without even having to devise some
XML-specific encoding/. (XML has a big expansion ratio.)
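
To make the flag idea concrete, here is a rough sketch of what the two
ends might do. It is only an illustration under assumptions: the
one-byte flag layout and the names FlaggedXmlWire, send and recv are
made up for this message, and java.util.zip's Deflater/Inflater stand in
for PGLZ, which has no stock Java implementation. Nothing here is an
existing PostgreSQL or pgjdbc wire format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlaggedXmlWire {
    // Hypothetical flag values; not part of any real protocol.
    static final byte RAW = 0;
    static final byte COMPRESSED = 1;

    // "Server" side: prefix one flag byte and pass the stored bytes
    // through untouched, so an already-compressed value is never expanded.
    static byte[] send(byte[] stored, boolean storedCompressed) {
        byte[] wire = new byte[stored.length + 1];
        wire[0] = storedCompressed ? COMPRESSED : RAW;
        System.arraycopy(stored, 0, wire, 1, stored.length);
        return wire;
    }

    // "Client" side: read the flag and decompress only if needed.
    static String recv(byte[] wire) {
        if (wire.length == 0) {
            throw new IllegalArgumentException("missing flag byte");
        }
        byte[] payload = Arrays.copyOfRange(wire, 1, wire.length);
        if (wire[0] == COMPRESSED) {
            payload = inflate(payload);
        }
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Stand-in for the compression already applied to the stored value.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed payload", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

The point of the sketch is only that send() does no work proportional to
the expanded size; the cost of decompression moves to whichever client
actually wants the text.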
And, since that idea is not inherently XML-specific ... does the JSONB
representation have the same properties? How about even text or bytea?
The XML question has a related, JDBC-specific part. JDBC presents XML
via interfaces that can deal in Source and Result objects, and these
come in different flavors (DOMSource, an all-in-memory tree, SAXSource
and StAXSource, both streaming tokenized forms, or StreamSource, a
streaming, character-serialized form). Client code can ask for one of
those forms explicitly, or use null to say it doesn't care. In the
doesn't-care case, the driver is expected to choose the form closest
to what it's got under the hood; the client can convert if necessary,
and if it had any other preference, it would have said so. For pgjdbc,
that choice would naturally be the character StreamSource, because that
/is/ the form it's got under the hood, but for reasons mysterious to me,
pgjdbc actually chooses DOMSource in the don't-care case, and then
expends the full effort of turning the serialized stream it does have
into a full in-memory DOM that the client hasn't asked for and might
not even want. I know this is more a pgjdbc question, but I mention it
here just because it's so much like the what-should-send/recv-do question,
repeated at another level.
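
For illustration, the behavior I would have expected in the don't-care
case looks roughly like the sketch below. It uses only the standard
javax.xml.transform classes; the class LazySourceSketch and its methods
defaultSource and toDom are hypothetical names for this message, not
pgjdbc code, and the serialized value is passed in as a plain String to
keep the example free of any database connection.

```java
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class LazySourceSketch {
    // Don't-care case: hand back the form the driver already holds.
    // Wrapping the serialized text in a StreamSource does no parsing.
    static Source defaultSource(String serializedXml) {
        return new StreamSource(new StringReader(serializedXml));
    }

    // A client that does want a DOM can build one itself with an
    // identity transform, paying the parsing cost only when it chooses to.
    static Document toDom(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult();
            identity.transform(source, result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not build DOM", e);
        }
    }
}
```

Against a real driver the client side would go through
java.sql.SQLXML.getSource(Class), passing StreamSource.class,
DOMSource.class, and so on, or null for no preference.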