On Sun, Aug 20, 2017 at 10:54:57AM +0200, Pavel Stehule wrote: > 2017-08-20 9:21 GMT+02:00 Noah Misch <n...@leadboat.com>: > > On Mon, Aug 07, 2017 at 05:10:14PM -0700, Noah Misch wrote: > > > On Wed, Apr 05, 2017 at 10:53:39PM +0200, Pavel Stehule wrote: > > > > One possible fix - and similar technique is used more times in xml.c > > > > code > > > > .. xmlroot > > > > > > > + /* remove header */ > > > > + parse_xml_decl(string, &header_len, NULL, NULL, NULL); > > > > > > > - doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, > > > > 0); > > > > + doc = xmlCtxtReadMemory(ctxt, (char *) string + header_len, len > > > > - > > > > > I like this parse_xml_decl() approach better. Would you update > > > your patch to use it and to add the test case I give above? > > > > Again, would you make those two edits? > > Now, I am not sure so it is correct fix. We will fix case, when server is > in UTF8, but maybe we will break some cases when server is not in UTF8 > (although it is broken already).
That's right. > I am think so correct solution is encoding to UTF8 and passing encoding > parameter. It will works safely in all cases - and we don't need cut > header. We should not to cut header if server encoding is not in UTF8 and > we don't pass encoding as parameter. When we pass encoding as parameter, > then cutting header is not necessary. xpath-bugfix.patch affected only xml values containing an xml declaration with "encoding" attribute. In UTF8 databases, this latest proposal (xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch. In non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values containing non-ASCII data. In a LATIN1 database, the following works today but breaks under your latest proposal: SELECT xpath('text()', ('<x>' || convert_from('\xc2b0', 'LATIN1') || '</x>')::xml); It's acceptable to break that, since the documentation explicitly disclaims support for it. xpath-bugfix.patch breaks different use cases, which are likewise acceptable to break. See my 2017-08-08 review for details. We have xpath-bugfix.patch and xpath-parsing-error-fix.patch. Both are equivalent under supported use cases (xpath in UTF8 databases). Among non-supported use cases, they each make different things better and different things worse. We should prefer to back-patch the version harming fewer applications. I expect non-ASCII data is more common than xml declarations with "encoding" attribute, so xpath-bugfix.patch will harm fewer applications. Having said that, I now see a third option. Condition this thread's patch's effects on GetDatabaseEncoding()==PG_UTF8. That way, we fix supported cases, and we remain bug-compatible in unsupported cases. I think that's better than the other options discussed so far. If you agree, please send a patch based on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and the two edits I described earlier. Thanks, nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers