The minimum-refactoring solution to this would be to tweak
pg_do_encoding_conversion() so that if the src_encoding is SQL_ASCII but
the dest_encoding isn't, it does pg_verify_mbstr() rather than nothing.
I'm not sure if this would break anything we need to have work,
though. Thoughts? Do we want to back-patch such a change?
I looked through all the callers of pg_do_encoding_conversion(), and
AFAICS this change is a good idea. There are a whole bunch of places
that use pg_do_encoding_conversion() to convert from the database encoding
to encoding X (most usually UTF8), and right now if you do that in a
SQL_ASCII database you have no assurance whatever that what is produced
is actually valid in encoding X. I think we need to close that loophole.
The more I looked into mbutils.c, the less happy I got. The attached
proposed patch takes care of the missing-verification hole in
pg_do_encoding_conversion() and pg_server_to_any(), and also gets rid
of what I believe to be obsolete provisions in pg_do_encoding_conversion
to work if called outside a transaction --- if you consider it working
to completely fail to honor its API contract. That should no longer be
necessary now that we perform client-server encoding conversions via
perform_default_encoding_conversion rather than here.
For testing I inserted an Assert(IsTransactionState()); at the top of
pg_do_encoding_conversion(), and couldn't trigger it, so I'm fairly sure
this change is OK. Still, back-patching it might not be a good thing.
On the other hand, if there is still any code path that can get here
outside a transaction, we probably ought to find out rather than allow
completely bogus data to be returned.
I also made some more-cosmetic improvements, notably removing a bunch
of Asserts that are certainly dead code because the relevant variables
are never NULL.
I've not done anything yet about simplifying unnecessary calls of
pg_do_encoding_conversion into pg_server_to_any/pg_any_to_server.
How much of this is back-patch material, do you think?
regards, tom lane
diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 08440e9..7f43cae 100644
! * This file contains public functions for conversion between
! * client encoding and server (database) encoding.
! * Tatsuo Ishii
! * src/backend/utils/mb/mbutils.c
! * mbutils.c
! * This file contains functions for encoding conversion.
! * The string-conversion functions in this file share some API quirks.
! * Note the following:
! * The functions return a palloc'd, null-terminated string if conversion
! * is required. However, if no conversion is performed, the given source
! * string pointer is returned as-is.
! * Although the presence of a length argument means that callers can pass
! * non-null-terminated strings, care is required because the same string
! * will be passed back if no conversion occurs. Such callers *must* check
! * whether result == src and handle that case differently.
! * If the source and destination encodings are the same, the source string
! * is returned without any verification; it's assumed to be valid data.
! * If that might not be the case, the caller is responsible for validating
! * the string using a separate call to pg_verify_mbstr(). Whenever the
! * source and destination encodings are different, the functions ensure that
! * the result is validly encoded according to the destination encoding.
! * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
! * Portions Copyright (c) 1994, Regents of the University of California
! * IDENTIFICATION
! * src/backend/utils/mb/mbutils.c
const char *
! * Apply encoding conversion on src and return it. The encoding
! * conversion function is chosen from the pg_conversion system catalog
! * marked as default. If it is not found in the schema search path,
! * it's taken from pg_catalog schema. If it even is not in the schema,
! * warn and return src.
! * If conversion occurs, a palloc'd null-terminated string is returned.
! * In the case of no conversion, src is returned.
! * CAUTION: although the presence of a length argument means that callers