2017-03-12 22:26 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>: > > > 2017-03-12 21:57 GMT+01:00 Noah Misch <n...@leadboat.com>: > >> On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote: >> > 2017-03-12 0:56 GMT+01:00 Noah Misch <n...@leadboat.com>: >> > > On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote: >> > > > There are possible two fixes >> > > > >> > > > a) clean decl on input - the encoding info can be removed from decl >> part >> > > > >> > > > b) use xml_out_internal everywhere before transformation to >> > > > xmlChar. pg_xmlCharStrndup can be good candidate. >> > > >> > > I'd prefer (a) if the xml type were a new feature, because no good can >> > > come of >> > > storing an encoding in each xml field when we know the actual >> encoding is >> > > the >> > > database encoding. However, if you implemented (a), we'd still see >> > > untreated >> > > values brought over via pg_upgrade. Therefore, I would try (b) >> first. I >> > > suspect the intent of xml_parse() was to implement (b); it will be >> > > interesting >> > > to see your test case that malfunctions. >> > > >> > >> > I looked there again and I found so this issue is related to xpath >> function >> > only >> > >> > Functions based on xml_parse are working without problems. >> xpath_internal >> > uses own direct xmlCtxtReadMemory without correct encoding sanitation. >> > >> > so fix is pretty simple >> >> Please add a test case. >> > > It needs a application - currently there is not possibility to import XML > document via recv API :( > > I wrote a pgimportdoc utility, but it is not part of core > > >> >> > --- a/src/backend/utils/adt/xml.c >> > +++ b/src/backend/utils/adt/xml.c >> > @@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype >> *data, >> > ArrayType *namespaces, >> > ns_count = 0; >> > } >> > >> > - datastr = VARDATA(data); >> > - len = VARSIZE(data) - VARHDRSZ; >> > + datastr = xml_out_internal(data, 0); >> >> Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly? >> The >> answer is probably in the archives, because someone understood the problem >> enough to document "Some XML-related functions may not work at all on >> non-ASCII data when the server encoding is not UTF-8. This is known to be >> an >> issue for xpath() in particular." > > > Probably there are two possible issues > > 1. what I touched - recv function does encoding to database encoding - but > document encoding is not updated. > > 2. there are not possibility to encode from document encoding to database > encoding. > > >> > + len = strlen(datastr); >> > + >> > xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ; >> > + >> >> The two lines of functional change don't create a cause for more >> newlines, so >> don't add these two newlines. >> > > ok > diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 1908b13db5..2786d5b1cb 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3874,8 +3874,8 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, ns_count = 0; }
- datastr = VARDATA(data); - len = VARSIZE(data) - VARHDRSZ; + datastr = xml_out_internal(data, 0); + len = strlen(datastr); xpath_len = VARSIZE_ANY_EXHDR(xpath_expr_text); if (xpath_len == 0) ereport(ERROR,
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 1908b13db5..2786d5b1cb 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -3874,8 +3874,8 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, ns_count = 0; } - datastr = VARDATA(data); - len = VARSIZE(data) - VARHDRSZ; + datastr = xml_out_internal(data, 0); + len = strlen(datastr); xpath_len = VARSIZE_ANY_EXHDR(xpath_expr_text); if (xpath_len == 0) ereport(ERROR,
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers