Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net: On 1/7/15 8:51 PM, Ali Akbar wrote: So now +1 for back-patching this. Done. Was a bit of work to get it into all the old versions. Wow. Thanks. Btw, for bug-fix patches like this, should the patch creator (me) also create patches for back branches? Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote: 2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net: On 1/7/15 8:51 PM, Ali Akbar wrote: So now +1 for back-patching this. Done. Was a bit of work to get it into all the old versions. Wow. Thanks. Btw, for bug-fix patches like this, should the patch creator (me) also create patches for back branches? As I understand it, back-patches are the committer's responsibility. The submitter might make suggestions as to how this might be approached if it doesn't appear trivial. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On Mon, Jan 19, 2015 at 2:38 AM, David Fetter da...@fetter.org wrote: On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote: 2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net: Btw, for bug-fix patches like this, should the patch creator (me) also create patches for back branches? As I understand it, back-patches are the committer's responsibility. The submitter might make suggestions as to how this might be approached if it doesn't appear trivial. TBH, I would imagine that patches that can be applied to back-branches are a better start point than plain scratch particularly if there are diffs in stable branches compared to HEAD. Everybody's time is important. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Michael Paquier michael.paqu...@gmail.com writes: On Mon, Jan 19, 2015 at 2:38 AM, David Fetter da...@fetter.org wrote: On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote: 2015-01-18 10:44 GMT+07:00 Peter Eisentraut pete...@gmx.net: Btw, for bug-fix patches like this, should the patch creator (me) also create patches for back branches? As I understand it, back-patches are the committer's responsibility. The submitter might make suggestions as to how this might be approached if it doesn't appear trivial. TBH, I would imagine that patches that can be applied to back-branches are a better start point than plain scratch particularly if there are diffs in stable branches compared to HEAD. Everybody's time is important. Yeah --- and I'd argue that it's largely a waste of time to work on creating back-branch patches until the HEAD patch is in final form. Since we've generally reserved the right for the committer to whack patches around before committing, I think this means the committer also has to do the work of back-patch adjustment. Now a committer who doesn't feel like doing that could certainly say here's the version of the HEAD patch that I'm willing to commit, but it doesn't apply cleanly in back branches; could you work up back-branch equivalents?. But that hasn't been the usual approach. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 1/7/15 8:51 PM, Ali Akbar wrote: So now +1 for back-patching this. Done. Was a bit of work to get it into all the old versions. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
committed version 7 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Peter Eisentraut pete...@gmx.net writes: committed version 7 Isn't that a back-patchable bug fix? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
I ran a test using postgres-US.fo built in the PostgreSQL source tree, which is 38 MB, and ran select unnest(xpath('//fo:bookmark-title', b, array[array['fo', 'http://www.w3.org/1999/XSL/Format']])) from data; (Table contains one row only.) The timings were basically indistinguishable between the three code versions. I'll try to reproduce your test. How big is your file? Do you have a link to the actual file? Could you share your load script? I use this xml sample: http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml Basically i loaded the xml to table u 100 times. Load script attached. Regards, -- Ali Akbar load_test.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-14 22:18 GMT+07:00 Ali Akbar the.ap...@gmail.com: I ran a test using postgres-US.fo built in the PostgreSQL source tree, which is 38 MB, and ran select unnest(xpath('//fo:bookmark-title', b, array[array['fo', 'http://www.w3.org/1999/XSL/Format']])) from data; (Table contains one row only.) The timings were basically indistinguishable between the three code versions. I'll try to reproduce your test. How big is your file? Do you have a link to the actual file? Could you share your load script? I use this xml sample: http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml Basically i loaded the xml to table u 100 times. Load script attached. Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think i'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. Thanks Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 11:02 GMT+07:00 Ali Akbar the.ap...@gmail.com: 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! Sorry, typo here. What i mean isn't invalid xml, but incomplete xml. Hence the next call to xpath with the previous result as its input will become a problem because the namespace will not match. What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. Thanks Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote: 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. OK. Could you then move this patch to the new CF with Needs Review with Peter as reviewer then? He seems to be looking at it anyway seeing the update from 12/11. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-12-15 11:06 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar the.ap...@gmail.com wrote: 2014-12-15 10:19 GMT+07:00 Michael Paquier michael.paqu...@gmail.com: On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar the.ap...@gmail.com wrote: Peter, while reviewing the better performing patch myself, now i think the patch needs more work to be committed. The structuring of the method will be confusing in the long term. I think I'll restructure the patch in the next commitfest. So i propose to break the patch: 1. We apply the current patch which uses xmlNodeCopy, so that the long-standing bug will be fixed in postgres. 2. I'll work with the performance enhancement in the next commitfest. Maybe for (2), the current better-performing patch can be viewed as PoC of the expected performance. Ali, are you currently working on that? Would you mind re-creating new entries in the commit fest app for the new set of patches that you are planning to do? For now I am switching this patch as returned with feedback. Thanks, What i mean, the last patch (v7 patch) as it is is enough to fix the bug (nested xpath namespace problem). I think the performance regression is still acceptable (in my case it's ~20%), because the bug is severe. Currently, xpath can return invalid xml because the namespace isn't included in the output! What i'll be working is the v4 patch, because it turns out the v4 patch has better performance (~10%, but Peter's test shows it isn't the case). But, the problem is the v4 patch is organized wrongly, and hacks around the libxml's xml node structure (duplicating the namespace on the existing structure). I'll work on that, but it will affects the performance benefit. So what i propose is, we close the longstanding bug in this comitfest with the v7 patch. I'll work on improving the performance, without compromising good code structure. If the result is good, i'll submit the patch. OK. Could you then move this patch to the new CF with Needs Review with Peter as reviewer then? He seems to be looking at it anyway seeing the update from 12/11. OK. Moved to the new CF. Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 11/5/14 9:50 AM, Ali Akbar wrote: I noticed somewhat big performance regression if the xml is large (i use PRODML Product Volume sample from energistics.org http://energistics.org): * Without patch (tested 3 times): select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u; Time: 84,012 ms Time: 85,683 ms Time: 88,766 ms * With latest v6 patch (notice the correct result with namespace definition): select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u; Time: 108,912 ms Time: 108,267 ms Time: 114,848 ms It's 23% performance regression. * Just curious, i'm also testing v5 patch performance (notice the namespace in the result): select unnest(xpath('//a:flow', x, ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u; Time: 92,552 ms Time: 97,440 ms Time: 99,309 ms The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is much more cleaner than v5patch, should we consider the performance benefit? I ran a test using postgres-US.fo built in the PostgreSQL source tree, which is 38 MB, and ran select unnest(xpath('//fo:bookmark-title', b, array[array['fo', 'http://www.w3.org/1999/XSL/Format']])) from data; (Table contains one row only.) The timings were basically indistinguishable between the three code versions. I'll try to reproduce your test. How big is your file? Do you have a link to the actual file? Could you share your load script? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-11-04 22:16 GMT+07:00 Peter Eisentraut pete...@gmx.net: On 10/6/14 10:24 PM, Ali Akbar wrote: While reviewing the patch myself, i spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in first versions, because of the small modification, context patch format obfucates the changes. After reimplementation this isn't the case anymore) I think the problem this patch is addressing is real, and your approach is sound, but I'd ask you to go back to the xmlCopyNode() version, and try to add a test case for why the second argument = 1 is necessary. I don't see any other problems. OK. Because upstream code is fixed in current version, i'll revert to the previous version. Test case added to regression test. With =1 argument, the result is correct: local:piece xmlns:local=\http://127.0.0.1\; xmlns=\http://127.0.0.2\; id=\1\ internalnumber one/internal internal2/ /local:piece without the argument, the result is not correct, all children will be lost. Because of that, the other regression test will fail too because the children is not copied: *** 584,593 -- Text XPath expressions evaluation SELECT xpath('/value', data) FROM xmltest; ! xpath ! -- ! {valueone/value} ! {valuetwo/value} (2 rows) SELECT xpath(NULL, NULL) IS NULL FROM xmltest; --- 584,593 -- Text XPath expressions evaluation SELECT xpath('/value', data) FROM xmltest; !xpath ! ! {value/} ! {value/} (2 rows) SELECT xpath(NULL, NULL) IS NULL FROM xmltest; *** ... cut updated patch attached. I noticed somewhat big performance regression if the xml is large (i use PRODML Product Volume sample from energistics.org): * Without patch (tested 3 times): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow + kindgas lift/kind+ ... Time: 84,012 ms Time: 85,683 ms Time: 88,766 ms * With latest v6 patch (notice the correct result with namespace definition): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow xmlns=http://www.prodml.org/schemas/1series; + ... Time: 108,912 ms Time: 108,267 ms Time: 114,848 ms It's 23% performance regression. * Just curious, i'm also testing v5 patch performance (notice the namespace in the result): select unnest(xpath('//a:flow', x, ARRAY[['a',' http://www.prodml.org/schemas/1series']])) from u; unnest --- flow xmlns=http://www.prodml.org/schemas/1series; + kindgas lift/kind+ Time: 92,552 ms Time: 97,440 ms Time: 99,309 ms The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is much more cleaner than v5patch, should we consider the performance benefit? Anyway, thanks for the review! :) Regards, -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/11/14 4:36 AM, Ali Akbar wrote: But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? This problem was fixed in libxml2 2.9.2 (see https://bugzilla.gnome.org/show_bug.cgi?id=708681). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 7/8/14 6:03 AM, Ali Akbar wrote: If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might be still exists. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly. This analysis might very well be right, but I think we should prove it with a test case. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
On 10/6/14 10:24 PM, Ali Akbar wrote: While reviewing the patch myself, i spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in first versions, because of the small modification, context patch format obfucates the changes. After reimplementation this isn't the case anymore) I think the problem this patch is addressing is real, and your approach is sound, but I'd ask you to go back to the xmlCopyNode() version, and try to add a test case for why the second argument = 1 is necessary. I don't see any other problems. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
While reviewing the patch myself, i spotted some formatting problems in the code. Fixed in this v5 patch. Also, this patch uses context patch format (in first versions, because of the small modification, context patch format obfucates the changes. After reimplementation this isn't the case anymore) Thanks, Ali Akbar *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *** *** 141,149 static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, --- 141,151 pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur, ! PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *** *** 3594,3620 SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, #ifdef USE_LIBXML /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; buf = xmlBufferCreate(); PG_TRY(); { xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { xmlBufferFree(buf); PG_RE_THROW(); } --- 3596,3706 #ifdef USE_LIBXML + /* check ns definition of node and its childrens. If any one of ns is + * not defined in node and it's children, but in the node's parent, + * copy the definition to node. + */ + static void + xml_checkandcopyns(xmlNodePtr root, + PgXmlErrorContext *xmlerrcxt, + xmlNodePtr node, + xmlNsPtr lastns_before) + { + xmlNsPtr ns = NULL; + xmlNsPtr cur_ns; + xmlNodePtr cur; + + if (node-ns != NULL) + { + /* check in new nses */ + cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next; + while (cur_ns != NULL) + { + if (cur_ns-href != NULL) + { + if (((cur_ns-prefix == NULL) (node-ns-prefix == NULL)) || + ((cur_ns-prefix != NULL) (node-ns-prefix != NULL) + xmlStrEqual(cur_ns-prefix, node-ns-prefix))) + { + ns = cur_ns; + break; + } + } + cur_ns = cur_ns-next; + } + if (ns == NULL) /* not in new nses */ + { + ns = xmlSearchNs(NULL, node-parent, node-ns-prefix); + + if (ns != NULL) + { + ns = xmlNewNs(root, ns-href, ns-prefix); + + if (ns == NULL xmlerrcxt-err_occurred) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not allocate xmlNs); + } + } + } + /* check and copy ns for children recursively */ + cur = node-children; + while (cur != NULL) + { + xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before); + cur = cur-next; + } + } + /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNsPtr lastns_before; + xmlNsPtr ns; + xmlNsPtr next; buf = xmlBufferCreate(); + PG_TRY(); { + lastns_before = cur-nsDef; + if (lastns_before != NULL) + { + while (lastns_before-next != NULL) + lastns_before = lastns_before-next; + } + xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before); + xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); + + /* delete and free new nses */ + ns = lastns_before == NULL ? cur-nsDef : lastns_before-next; + while (ns != NULL) + { + next = ns-next; + xmlFree(ns); + ns = next; + } + + if (lastns_before == NULL) + cur-nsDef = NULL; + else + lastns_before-next = NULL; } PG_CATCH(); { + /* new namespaces will be freed while free-ing the node, so we + * won't free it here + */ xmlBufferFree(buf); PG_RE_THROW(); } *** *** 3660,3666 xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate) { int result = 0; Datum datum; --- 3746,3753 */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, !
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Greetings, Because of the memory bug in xmlCopyNode, this is a new patch with different method. In this patch, instead of using xmlCopyNode to bring the namespace back, we added the required namespaces to the node before dumping the node to string, and cleaning it up afterwards (because the same node could be dumped again in next xpath result). Thanks, Ali Akbar 2014-07-11 15:36 GMT+07:00 Ali Akbar the.ap...@gmail.com: Greetings, Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport). From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion. But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? Thanks, Ali Akbar -- Ali Akbar diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c index 422be69..f34c87e 100644 --- a/src/backend/utils/adt/xml.c +++ b/src/backend/utils/adt/xml.c @@ -141,9 +141,11 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); -static text *xml_xmlnodetoxmltype(xmlNodePtr cur); +static text *xml_xmlnodetoxmltype(xmlNodePtr cur, + PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, - ArrayBuildState **astate); + ArrayBuildState **astate, + PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, @@ -3590,27 +3592,109 @@ SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, #ifdef USE_LIBXML +/* check ns definition of node and its childrens. If any one of ns is + * not defined in node and it's children, but in the node's parent, + * copy the definition to node. + */ +static void +xml_checkandcopyns(xmlNodePtr root, + PgXmlErrorContext *xmlerrcxt, + xmlNodePtr node, + xmlNsPtr lastns_before) +{ + xmlNsPtr ns = NULL; + xmlNsPtr cur_ns; + xmlNodePtr cur; + + if (node-ns != NULL) + { + /* check in new nses */ + cur_ns = lastns_before == NULL ? node-nsDef : lastns_before-next; + while (cur_ns != NULL) + { + if (cur_ns-href != NULL) + { +if (((cur_ns-prefix == NULL) (node-ns-prefix == NULL)) || + ((cur_ns-prefix != NULL) (node-ns-prefix != NULL) + xmlStrEqual(cur_ns-prefix, node-ns-prefix))) +{ + ns = cur_ns; + break; +} + } + cur_ns = cur_ns-next; + } + if (ns == NULL) /* not in new nses */ + { + ns = xmlSearchNs(NULL, node-parent, node-ns-prefix); + + if (ns != NULL) { +ns = xmlNewNs(root, ns-href, ns-prefix); + +if (ns == NULL xmlerrcxt-err_occurred) { + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not allocate xmlNs); +} + } + } + } + /* check and copy ns for children recursively */ + cur = node-children; + while (cur != NULL) { + xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before); + cur = cur-next; + } +} + /* * Convert XML node to text (dump subtree in case of element, * return value otherwise) */ static text * -xml_xmlnodetoxmltype(xmlNodePtr cur) +xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNsPtr lastns_before; + xmlNsPtr ns; + xmlNsPtr next; buf = xmlBufferCreate(); + PG_TRY(); { + lastns_before = cur-nsDef; + if (lastns_before != NULL) { +while (lastns_before-next != NULL) { + lastns_before = lastns_before-next; +} + } + xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before); + xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); + + /* delete and free new nses */ + ns = lastns_before == NULL ? cur-nsDef : lastns_before-next; + while (ns != NULL) { +next = ns-next; +xmlFree(ns); +ns = next; + } + if (lastns_before == NULL) { +cur-nsDef = NULL; + } else { +lastns_before-next = NULL; + } } PG_CATCH(); { + /* new namespaces will be freed while free-ing the node, so we + * won't free it here + */ xmlBufferFree(buf); PG_RE_THROW(); } @@ -3656,7 +3740,8 @@ xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, - ArrayBuildState **astate) + ArrayBuildState
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Greetings, Attached modified patch to handle xmlCopyNode returning NULL. The patch is larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt is needed for calling xml_ereport). From libxml2 source, it turns out that the other cases that xmlCopyNode will return NULL will not happen. So in this patch i assume that the only case is memory exhaustion. But i found some bug in libxml2's code, because we call xmlCopyNode with 1 as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode whether it's NULL. So if someday the recursion returns NULL (maybe because of memory exhaustion), it will SEGFAULT. Knowing this but in libxml2 code, is this patch is still acceptable? Thanks, Ali Akbar *** a/src/backend/utils/adt/xml.c --- b/src/backend/utils/adt/xml.c *** *** 141,149 static bool print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, --- 141,151 pg_enc encoding, int standalone); static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, int encoding); ! static text *xml_xmlnodetoxmltype(xmlNodePtr cur, ! PgXmlErrorContext *xmlerrcxt); static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt); #endif /* USE_LIBXML */ static StringInfo query_to_xml_internal(const char *query, char *tablename, *** *** 3595,3620 SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename, * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; buf = xmlBufferCreate(); PG_TRY(); { ! xmlNodeDump(buf, NULL, cur, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); xmlBufferFree(buf); } else --- 3597,3636 * return value otherwise) */ static text * ! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt) { xmltype*result; if (cur-type == XML_ELEMENT_NODE) { xmlBufferPtr buf; + xmlNodePtr cur_copy; buf = xmlBufferCreate(); + + /* the result of xmlNodeDump won't contain namespace definitions + * from parent nodes, but xmlCopyNode duplicates a node along + * with its required namespace definitions. + */ + cur_copy = xmlCopyNode(cur, 1); + + if (cur_copy == NULL) + xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY, + could not serialize xml); + PG_TRY(); { ! xmlNodeDump(buf, NULL, cur_copy, 0, 1); result = xmlBuffer_to_xmltype(buf); } PG_CATCH(); { + xmlFreeNode(cur_copy); xmlBufferFree(buf); PG_RE_THROW(); } PG_END_TRY(); + xmlFreeNode(cur_copy); xmlBufferFree(buf); } else *** *** 3656,3662 xml_xmlnodetoxmltype(xmlNodePtr cur) */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate) { int result = 0; Datum datum; --- 3672,3679 */ static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, ! ArrayBuildState **astate, ! PgXmlErrorContext *xmlerrcxt) { int result = 0; Datum datum; *** *** 3678,3684 xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj, for (i = 0; i result; i++) { ! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i])); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); --- 3695,3702 for (i = 0; i result; i++) { ! datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj-nodesetval-nodeTab[i], ! xmlerrcxt)); *astate = accumArrayResult(*astate, datum, false, XMLOID, CurrentMemoryContext); *** *** 3882,3890 xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces, * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate); else ! (void) xml_xpathobjtoxmlarray(xpathobj, astate); } PG_CATCH(); { --- 3900,3908 * Extract the results as requested. */ if (res_nitems != NULL) ! *res_nitems =
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
I don't know enough about XML/XPATH to know if this is a good idea or not, Actually currently because of the namespace problem, xpath() returns wrong result (even worse: invalid xml!). So this patch corrects the behavior of xpath() to the correct one. but I did go read the documentation of xmlCopyNode(), and I notice it says the second argument is extended: if 1 do a recursive copy (properties, namespaces and children when applicable) if 2 copy properties and namespaces (when applicable) Do we really want it to copy children? Perhaps the value should be 2? (I have no idea, I'm just raising the question.) If we put 1 as the parameter, the resulting Node will link to the existing children. In this case, the namespace problem for the children might be still exists. If any of the children uses different namespace(s) than the parent, the namespace definition will not be copied correctly. Just to be safe, in the patch 1 passed as the second argument. Ideally, we can traverse the Node object and generate XML accordingly, but it is a lot of work, and a lot of duplicating libxml's code. Maybe we should push this upstream to libxml? I also notice that it says Returns: a new #xmlNodePtr, or NULL in case of error. and the patch is failing to cover the error case. Most likely, that would represent out-of-memory, so it definitely seems to be something we should account for. Ah, overlooked that. Skimming through libxml2's source, it looks like there are some other cases beside out-of-memory. Will update the patch according to these cases. Thanks for the review. -- Ali Akbar
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
Abhijit Menon-Sen a...@2ndquadrant.com writes: I can confirm that it works fine. I have attached here a very slightly tweaked version of the patch (removed trailing whitespace, and changed some comment text). I'm marking this ready for committer. I don't know enough about XML/XPATH to know if this is a good idea or not, but I did go read the documentation of xmlCopyNode(), and I notice it says the second argument is extended: if 1 do a recursive copy (properties, namespaces and children when applicable) if 2 copy properties and namespaces (when applicable) Do we really want it to copy children? Perhaps the value should be 2? (I have no idea, I'm just raising the question.) I also notice that it says Returns: a new #xmlNodePtr, or NULL in case of error. and the patch is failing to cover the error case. Most likely, that would represent out-of-memory, so it definitely seems to be something we should account for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen a...@2ndquadrant.com: Thanks for the patch, and welcome to Postgres development. I can confirm that it works fine. I have attached here a very slightly tweaked version of the patch (removed trailing whitespace, and changed some comment text). I'm marking this ready for committer. Thanks for the review. Hope i will be able to contribute a little here and there in the future. -- Ali Akbar