2017-03-12 0:56 GMT+01:00 Noah Misch <n...@leadboat.com>:

> On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > Today I played with xml_recv function and with xml processing functions.
> >
> > xml_recv function ensures correct encoding from document encoding to server
> > encoding. But the decl section holds original encoding info - that should
> > be obsolete after encoding. Sometimes we solve this issue by removing decl
> > section - see the xml_out function.
> >
> > Sometimes we don't do it - lot of functions uses direct conversion from
> > xmltype to xmlChar.
>
> > There are possible two fixes
> >
> > a) clean decl on input - the encoding info can be removed from decl part
> >
> > b) use xml_out_internal everywhere before transformation to
> > xmlChar. pg_xmlCharStrndup can be good candidate.
>
> I'd prefer (a) if the xml type were a new feature, because no good can come of
> storing an encoding in each xml field when we know the actual encoding is the
> database encoding.  However, if you implemented (a), we'd still see untreated
> values brought over via pg_upgrade.  Therefore, I would try (b) first.  I
> suspect the intent of xml_parse() was to implement (b); it will be interesting
> to see your test case that malfunctions.
>

I looked at this again and found that the issue is limited to the xpath
function. Functions based on xml_parse work without problems.

xpath_internal calls xmlCtxtReadMemory directly on the stored value, without
the correct encoding sanitization, so the fix is pretty simple:

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
 		ns_count = 0;
 	}
 
-	datastr = VARDATA(data);
-	len = VARSIZE(data) - VARHDRSZ;
+	datastr = xml_out_internal(data, 0);
+	len = strlen(datastr);
+
 	xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
 	if (xpath_len == 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_DATA_EXCEPTION),

Regards

Pavel
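
For readers who want to see the idea outside the backend, here is a minimal standalone sketch of what "sanitizing" the declaration means: the document bytes stay as they are, and only the stale encoding pseudo-attribute is dropped from the XML declaration. The function strip_decl_encoding is hypothetical and is not PostgreSQL code; it is only meant to mirror the spirit of the fix above, and it assumes a NUL-terminated document whose declaration, if any, starts at the first byte.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): return a malloc'd copy of
 * doc with the encoding pseudo-attribute removed from a leading XML
 * declaration.  Documents without such a declaration are copied unchanged.
 * The caller frees the result.  Returns NULL only if malloc fails.
 */
static char *
strip_decl_encoding(const char *doc)
{
    size_t      len = strlen(doc);
    char       *out = malloc(len + 1);
    const char *decl_end;
    const char *start;
    const char *p;
    const char *close_quote;
    char        quote;

    if (out == NULL)
        return NULL;

    /* Default result: an unchanged copy. */
    memcpy(out, doc, len + 1);

    /* Only a declaration at the very start of the document counts. */
    if (strncmp(doc, "<?xml ", 6) != 0)
        return out;

    decl_end = strstr(doc, "?>");
    start = strstr(doc, "encoding");
    if (decl_end == NULL || start == NULL || start > decl_end)
        return out;

    /* Expect: encoding, optional space, '=', optional space, quoted name. */
    p = start + strlen("encoding");
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '=')
        return out;
    p++;
    while (isspace((unsigned char) *p))
        p++;
    quote = *p;
    if (quote != '"' && quote != '\'')
        return out;
    close_quote = strchr(p + 1, quote);
    if (close_quote == NULL || close_quote > decl_end)
        return out;

    /* Also drop the whitespace that separated the attribute from the
     * previous one. */
    while (start > doc && isspace((unsigned char) start[-1]))
        start--;

    /* Keep everything before the attribute and everything after it. */
    memcpy(out, doc, (size_t) (start - doc));
    strcpy(out + (start - doc), close_quote + 1);
    return out;
}
```

Given a value whose declaration still names ISO-8859-2 although the bytes were already converted to the server encoding, the sketch returns the same document with only the version left in the declaration, so a parser no longer tries to reinterpret the bytes. It matches on the literal word "encoding" inside the declaration and does no real XML parsing, so it is an illustration only.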