Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-18 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jan 19, 2015 at 2:38 AM, David Fetter  wrote:
>> On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
>>> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut :
 Btw, for bug-fix patches like this, should the patch creator (me) also
 create patches for back branches?

>> As I understand it, back-patches are the committer's responsibility.
>> The submitter might make suggestions as to how this might be
>> approached if it doesn't appear trivial.

> TBH, I would imagine that patches that can be applied to back-branches
> are a better start point than plain scratch particularly if there are
> diffs in stable branches compared to HEAD. Everybody's time is
> important.

Yeah --- and I'd argue that it's largely a waste of time to work on
creating back-branch patches until the HEAD patch is in final form.
Since we've generally reserved the right for the committer to whack
patches around before committing, I think this means the committer
also has to do the work of back-patch adjustment.

Now a committer who doesn't feel like doing that could certainly say
"here's the version of the HEAD patch that I'm willing to commit, but
it doesn't apply cleanly in back branches; could you work up back-branch
equivalents?".  But that hasn't been the usual approach.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-18 Thread Michael Paquier
On Mon, Jan 19, 2015 at 2:38 AM, David Fetter  wrote:
> On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
>> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut :
>> Btw, for bug-fix patches like this, should the patch creator (me) also
>> create patches for back branches?
>
> As I understand it, back-patches are the committer's responsibility.
> The submitter might make suggestions as to how this might be
> approached if it doesn't appear trivial.
TBH, I would imagine that patches that can be applied to back-branches
are a better start point than plain scratch particularly if there are
diffs in stable branches compared to HEAD. Everybody's time is
important.
--
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-18 Thread David Fetter
On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut :
> 
> > On 1/7/15 8:51 PM, Ali Akbar wrote:
> > > So now +1 for back-patching this.
> >
> > Done.  Was a bit of work to get it into all the old versions.
> >
> 
> Wow. Thanks.
> 
> Btw, for bug-fix patches like this, should the patch creator (me) also
> create patches for back branches?

As I understand it, back-patches are the committer's responsibility.

The submitter might make suggestions as to how this might be
approached if it doesn't appear trivial.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-18 Thread Ali Akbar
2015-01-18 10:44 GMT+07:00 Peter Eisentraut :

> On 1/7/15 8:51 PM, Ali Akbar wrote:
> > So now +1 for back-patching this.
>
> Done.  Was a bit of work to get it into all the old versions.
>

Wow. Thanks.

Btw, for bug-fix patches like this, should the patch creator (me) also
create patches for back branches?

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-17 Thread Peter Eisentraut
On 1/7/15 8:51 PM, Ali Akbar wrote:
> So now +1 for back-patching this.

Done.  Was a bit of work to get it into all the old versions.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-07 Thread Ali Akbar
>
> Peter Eisentraut  writes:
> > committed version 7
>
Thanks!

2015-01-07 13:28 GMT+07:00 Tom Lane :

> Isn't that a back-patchable bug fix?
>

Upthread, i noted:

> For back versions, i think because this patch changes xpath() behavior, we
> will only apply this to future versions. The old behavior is wrong
> (according to XPath standard) for not including namespaces, but maybe there
> are some application that depends on the old behavior.
>

Reviewing the behavior on 9.3, now i think the old behavior isn't usable
(the resulting xml is not even processable in postgres):

# select unnest(xpath('//a:b', 'http://test.com/a";>12'::xml,
array[array['a','http://test.com/a']]));
unnest
--
 1
 2
(2 rows)

# select xpath('//b:b', unnest(xpath('//a:b', 'http://test.com/a";>12'::xml, array[array['a','
http://test.com/a']])));
ERROR:  could not parse XML document
DETAIL:  line 1: Namespace prefix b on b is not defined
1

Maybe some application uses the result directly, but correct xml-using
applications should handle namespace correctly, so if '1'
becomes 'http://test.com/a' >1', there should be no
issue in those applications.


So now +1 for back-patching this.

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-06 Thread Tom Lane
Peter Eisentraut  writes:
> committed version 7

Isn't that a back-patchable bug fix?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2015-01-06 Thread Peter Eisentraut
committed version 7


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 11:06 GMT+07:00 Michael Paquier :
>
> On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar  wrote:
> > 2014-12-15 10:19 GMT+07:00 Michael Paquier :
> >>
> >> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar  wrote:
> >> > Peter, while reviewing the better performing patch myself, now i think
> >> > the
> >> > patch needs more work to be committed. The structuring of the method
> >> > will be
> >> > confusing in the long term. I think I'll restructure the patch in the
> >> > next
> >> > commitfest.
> >> > So i propose to break the patch:
> >> > 1. We apply the current patch which uses xmlNodeCopy, so that the
> >> > long-standing bug will be fixed in postgres.
> >> > 2. I'll work with the performance enhancement in the next commitfest.
> >> >
> >> > Maybe for (2), the current better-performing patch can be viewed as
> PoC
> >> > of
> >> > the expected performance.
> >>
> >> Ali, are you currently working on that? Would you mind re-creating new
> >> entries in the commit fest app for the new set of patches that you are
> >> planning to do?
> >> For now I am switching this patch as returned with feedback.
> >> Thanks,
> >
> >
> > What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> > (nested xpath namespace problem). I think the performance regression is
> > still acceptable (in my case it's ~20%), because the bug is severe.
> > Currently, xpath can return invalid xml because the namespace isn't
> included
> > in the output!
> >
> > What i'll be working is the v4 patch, because it turns out the v4 patch
> has
> > better performance (~10%, but Peter's test shows it isn't the case). But,
> > the problem is the v4 patch is organized wrongly, and hacks around the
> > libxml's xml node structure (duplicating the namespace on the existing
> > structure). I'll work on that, but it will affects the performance
> benefit.
> >
> > So what i propose is, we close the longstanding bug in this comitfest
> with
> > the v7 patch. I'll work on improving the performance, without
> compromising
> > good code structure. If the result is good, i'll submit the patch.
> OK. Could you then move this patch to the new CF with Needs Review
>
with Peter as reviewer then? He seems to be looking at it anyway
> seeing the update from 12/11.
>

OK. Moved to the new CF.

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar  wrote:
> 2014-12-15 10:19 GMT+07:00 Michael Paquier :
>>
>> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar  wrote:
>> > Peter, while reviewing the better performing patch myself, now i think
>> > the
>> > patch needs more work to be committed. The structuring of the method
>> > will be
>> > confusing in the long term. I think I'll restructure the patch in the
>> > next
>> > commitfest.
>> > So i propose to break the patch:
>> > 1. We apply the current patch which uses xmlNodeCopy, so that the
>> > long-standing bug will be fixed in postgres.
>> > 2. I'll work with the performance enhancement in the next commitfest.
>> >
>> > Maybe for (2), the current better-performing patch can be viewed as PoC
>> > of
>> > the expected performance.
>>
>> Ali, are you currently working on that? Would you mind re-creating new
>> entries in the commit fest app for the new set of patches that you are
>> planning to do?
>> For now I am switching this patch as returned with feedback.
>> Thanks,
>
>
> What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> (nested xpath namespace problem). I think the performance regression is
> still acceptable (in my case it's ~20%), because the bug is severe.
> Currently, xpath can return invalid xml because the namespace isn't included
> in the output!
>
> What i'll be working is the v4 patch, because it turns out the v4 patch has
> better performance (~10%, but Peter's test shows it isn't the case). But,
> the problem is the v4 patch is organized wrongly, and hacks around the
> libxml's xml node structure (duplicating the namespace on the existing
> structure). I'll work on that, but it will affects the performance benefit.
>
> So what i propose is, we close the longstanding bug in this comitfest with
> the v7 patch. I'll work on improving the performance, without compromising
> good code structure. If the result is good, i'll submit the patch.
OK. Could you then move this patch to the new CF with Needs Review
with Peter as reviewer then? He seems to be looking at it anyway
seeing the update from 12/11.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 11:02 GMT+07:00 Ali Akbar :
>
> 2014-12-15 10:19 GMT+07:00 Michael Paquier :
>>
>> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar  wrote:
>> > Peter, while reviewing the better performing patch myself, now i think
>> the
>> > patch needs more work to be committed. The structuring of the method
>> will be
>> > confusing in the long term. I think I'll restructure the patch in the
>> next
>> > commitfest.
>> > So i propose to break the patch:
>> > 1. We apply the current patch which uses xmlNodeCopy, so that the
>> > long-standing bug will be fixed in postgres.
>> > 2. I'll work with the performance enhancement in the next commitfest.
>> >
>> > Maybe for (2), the current better-performing patch can be viewed as PoC
>> of
>> > the expected performance.
>>
>> Ali, are you currently working on that? Would you mind re-creating new
>> entries in the commit fest app for the new set of patches that you are
>> planning to do?
>> For now I am switching this patch as returned with feedback.
>> Thanks,
>>
>
> What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> (nested xpath namespace problem). I think the performance regression is
> still acceptable (in my case it's ~20%), because the bug is severe.
> Currently, xpath can return invalid xml because the namespace isn't
> included in the output!
>

Sorry, typo here. What i mean isn't "invalid xml", but "incomplete xml".
Hence the next call to xpath with the previous result as its input will
become a problem because the namespace will not match.


>
> What i'll be working is the v4 patch, because it turns out the v4 patch
> has better performance (~10%, but Peter's test shows it isn't the case).
> But, the problem is the v4 patch is organized wrongly, and hacks around the
> libxml's xml node structure (duplicating the namespace on the existing
> structure). I'll work on that, but it will affects the performance benefit.
>
> So what i propose is, we close the longstanding bug in this comitfest with
> the v7 patch. I'll work on improving the performance, without compromising
> good code structure. If the result is good, i'll submit the patch.
>
> Thanks
>

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-15 10:19 GMT+07:00 Michael Paquier :
>
> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar  wrote:
> > Peter, while reviewing the better performing patch myself, now i think
> the
> > patch needs more work to be committed. The structuring of the method
> will be
> > confusing in the long term. I think I'll restructure the patch in the
> next
> > commitfest.
> > So i propose to break the patch:
> > 1. We apply the current patch which uses xmlNodeCopy, so that the
> > long-standing bug will be fixed in postgres.
> > 2. I'll work with the performance enhancement in the next commitfest.
> >
> > Maybe for (2), the current better-performing patch can be viewed as PoC
> of
> > the expected performance.
>
> Ali, are you currently working on that? Would you mind re-creating new
> entries in the commit fest app for the new set of patches that you are
> planning to do?
> For now I am switching this patch as returned with feedback.
> Thanks,
>

What i mean, the last patch (v7 patch) as it is is enough to fix the bug
(nested xpath namespace problem). I think the performance regression is
still acceptable (in my case it's ~20%), because the bug is severe.
Currently, xpath can return invalid xml because the namespace isn't
included in the output!

What i'll be working is the v4 patch, because it turns out the v4 patch has
better performance (~10%, but Peter's test shows it isn't the case). But,
the problem is the v4 patch is organized wrongly, and hacks around the
libxml's xml node structure (duplicating the namespace on the existing
structure). I'll work on that, but it will affects the performance benefit.

So what i propose is, we close the longstanding bug in this comitfest with
the v7 patch. I'll work on improving the performance, without compromising
good code structure. If the result is good, i'll submit the patch.

Thanks

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Michael Paquier
On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar  wrote:
> Peter, while reviewing the better performing patch myself, now i think the
> patch needs more work to be committed. The structuring of the method will be
> confusing in the long term. I think I'll restructure the patch in the next
> commitfest.
> So i propose to break the patch:
> 1. We apply the current patch which uses xmlNodeCopy, so that the
> long-standing bug will be fixed in postgres.
> 2. I'll work with the performance enhancement in the next commitfest.
>
> Maybe for (2), the current better-performing patch can be viewed as PoC of
> the expected performance.

Ali, are you currently working on that? Would you mind re-creating new
entries in the commit fest app for the new set of patches that you are
planning to do?
For now I am switching this patch as returned with feedback.
Thanks,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
2014-12-14 22:18 GMT+07:00 Ali Akbar :
>
>
> I ran a test using postgres-US.fo built in the PostgreSQL source tree,
>> which is 38 MB, and ran
>>
>> select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
>> 'http://www.w3.org/1999/XSL/Format']])) from data;
>>
>> (Table contains one row only.)
>>
>> The timings were basically indistinguishable between the three code
>> versions.
>>
>> I'll try to reproduce your test.  How big is your file?  Do you have a
>> link to the actual file?  Could you share your load script?
>>
>
> I use this xml sample:
> http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml
>
> Basically i loaded the xml to table u 100 times. Load script attached.
>

Peter, while reviewing the better performing patch myself, now i think the
patch needs more work to be committed. The structuring of the method will
be confusing in the long term. I think i'll restructure the patch in the
next commitfest.

So i propose to break the patch:
1. We apply the current patch which uses xmlNodeCopy, so that the
long-standing bug will be fixed in postgres.
2. I'll work with the performance enhancement in the next commitfest.

Maybe for (2), the current better-performing patch can be viewed as PoC of
the expected performance.

Regards,
-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-14 Thread Ali Akbar
>
> I ran a test using postgres-US.fo built in the PostgreSQL source tree,
> which is 38 MB, and ran
>
> select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
> 'http://www.w3.org/1999/XSL/Format']])) from data;
>
> (Table contains one row only.)
>
> The timings were basically indistinguishable between the three code
> versions.
>
> I'll try to reproduce your test.  How big is your file?  Do you have a
> link to the actual file?  Could you share your load script?
>

I use this xml sample:
http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

Basically i loaded the xml to table u 100 times. Load script attached.

Regards,
-- 
Ali Akbar


load_test.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-12-11 Thread Peter Eisentraut
On 11/5/14 9:50 AM, Ali Akbar wrote:
> I noticed somewhat big performance regression if the xml is large (i use
> PRODML Product Volume sample from energistics.org ):
> * Without patch (tested 3 times):
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 84,012 ms
> Time: 85,683 ms
> Time: 88,766 ms

> * With latest v6 patch (notice the correct result with namespace
> definition):
> 
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 108,912 ms
> Time: 108,267 ms
> Time: 114,848 ms
> 
> 
> It's 23% performance regression.
> 
> * Just curious, i'm also testing v5 patch performance (notice the
> namespace in the result):
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 92,552 ms
> Time: 97,440 ms
> Time: 99,309 ms
> 
> The regression is only 13%. I know the xmlCopyNode() version (v6 patch)
> is much more cleaner than v5patch, should we consider the performance
> benefit?

I ran a test using postgres-US.fo built in the PostgreSQL source tree,
which is 38 MB, and ran

select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
'http://www.w3.org/1999/XSL/Format']])) from data;

(Table contains one row only.)

The timings were basically indistinguishable between the three code
versions.

I'll try to reproduce your test.  How big is your file?  Do you have a
link to the actual file?  Could you share your load script?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-25 Thread Ali Akbar
2014-11-05 21:50 GMT+07:00 Ali Akbar :

> 2014-11-04 22:16 GMT+07:00 Peter Eisentraut :
>
>> I think the problem this patch is addressing is real, and your approach
>>> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
>>> try to add a test case for why the second argument = 1 is necessary.  I
>>> don't see any other problems.
>>>
>>
>> OK. Because upstream code is fixed in current version, i'll revert to the
>> previous version. Test case added to regression test. With =1 argument, the
>> result is correct:
>> http://127.0.0.1\"; xmlns=\"http://127.0.0.2\";
>> id=\"1\">
>>number one
>>
>>  
>>
>> without the argument, the result is not correct, all children will be
>> lost. Because of that, the other regression test will fail too because the
>> children is not copied:
>> *** 584,593 
>>
>>   -- Text XPath expressions evaluation
>>   SELECT xpath('/value', data) FROM xmltest;
>> ! xpath
>> ! --
>> !  {one}
>> !  {two}
>>   (2 rows)
>>
>>   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
>> --- 584,593 
>>
>>   -- Text XPath expressions evaluation
>>   SELECT xpath('/value', data) FROM xmltest;
>> !xpath
>> ! 
>> !  {}
>> !  {}
>>   (2 rows)
>>
>>   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
>> ***
>> ... 
>>
>> updated patch attached.
>>
>
> I noticed somewhat big performance regression if the xml is large (i use
> PRODML Product Volume sample from energistics.org):
> * Without patch (tested 3 times):
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> ---
>  
> +
>  gas
> lift+
> ...
> Time: 84,012 ms
> Time: 85,683 ms
> Time: 88,766 ms
>
>
> * With latest v6 patch (notice the correct result with namespace
> definition):
>
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> ---
>  http://www.prodml.org/schemas/1series";>
> +
> ...
> Time: 108,912 ms
> Time: 108,267 ms
> Time: 114,848 ms
>
>
> It's 23% performance regression.
>
> * Just curious, i'm also testing v5 patch performance (notice the
> namespace in the result):
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> ---
>  http://www.prodml.org/schemas/1series";>
> +
>  gas
> lift+
> Time: 92,552 ms
> Time: 97,440 ms
> Time: 99,309 ms
>
> The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is
> much more cleaner than v5patch, should we consider the performance benefit?
>
> Anyway, thanks for the review! :)
>

commit bac2739 in master by Tom Lane changes *astate definition
in xml_xpathobjtoxmlarray, this attached v7 patch rebases v6 patch with
latest master.

For performance comparison, i also rebased the v5 patch attached with name
v5-141126.patch


-- 
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState *astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState *astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3594,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
  
  #ifdef USE_LIBXML
  
  /*
   * Convert XML node to text (dump subtree in case of element,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
  			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW(

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-05 Thread Ali Akbar
2014-11-04 22:16 GMT+07:00 Peter Eisentraut :

>
> On 10/6/14 10:24 PM, Ali Akbar wrote:
>> > While reviewing the patch myself, i spotted some formatting problems in
>> > the code. Fixed in this v5 patch.
>> >
>> > Also, this patch uses context patch format (in first versions, because
>> > of the small modification, context patch format obfucates the changes.
>> > After reimplementation this isn't the case anymore)
>>
>> I think the problem this patch is addressing is real, and your approach
>> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
>> try to add a test case for why the second argument = 1 is necessary.  I
>> don't see any other problems.
>>
>
> OK. Because upstream code is fixed in current version, i'll revert to the
> previous version. Test case added to regression test. With =1 argument, the
> result is correct:
> http://127.0.0.1\"; xmlns=\"http://127.0.0.2\";
> id=\"1\">
>number one
>
>  
>
> without the argument, the result is not correct, all children will be
> lost. Because of that, the other regression test will fail too because the
> children is not copied:
> *** 584,593 
>
>   -- Text XPath expressions evaluation
>   SELECT xpath('/value', data) FROM xmltest;
> ! xpath
> ! --
> !  {one}
> !  {two}
>   (2 rows)
>
>   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
> --- 584,593 
>
>   -- Text XPath expressions evaluation
>   SELECT xpath('/value', data) FROM xmltest;
> !xpath
> ! 
> !  {}
> !  {}
>   (2 rows)
>
>   SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
> ***
> ... 
>
> updated patch attached.
>

I noticed somewhat big performance regression if the xml is large (i use
PRODML Product Volume sample from energistics.org):
* Without patch (tested 3 times):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 
+
 gas
lift+
...
Time: 84,012 ms
Time: 85,683 ms
Time: 88,766 ms


* With latest v6 patch (notice the correct result with namespace
definition):

select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 http://www.prodml.org/schemas/1series";>
+
...
Time: 108,912 ms
Time: 108,267 ms
Time: 114,848 ms


It's 23% performance regression.

* Just curious, i'm also testing v5 patch performance (notice the namespace
in the result):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
---
 http://www.prodml.org/schemas/1series";>
+
 gas
lift+
Time: 92,552 ms
Time: 97,440 ms
Time: 99,309 ms

The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is
much more cleaner than v5patch, should we consider the performance benefit?

Anyway, thanks for the review! :)

Regards,
--
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-05 Thread Ali Akbar
2014-11-04 22:16 GMT+07:00 Peter Eisentraut :

> On 10/6/14 10:24 PM, Ali Akbar wrote:
> > While reviewing the patch myself, i spotted some formatting problems in
> > the code. Fixed in this v5 patch.
> >
> > Also, this patch uses context patch format (in first versions, because
> > of the small modification, context patch format obfucates the changes.
> > After reimplementation this isn't the case anymore)
>
> I think the problem this patch is addressing is real, and your approach
> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
> try to add a test case for why the second argument = 1 is necessary.  I
> don't see any other problems.
>

OK. Because upstream code is fixed in current version, i'll revert to the
previous version. Test case added to regression test. With =1 argument, the
result is correct:
http://127.0.0.1\"; xmlns=\"http://127.0.0.2\";
id=\"1\">
   number one
   
 

without the argument, the result is not correct, all children will be lost.
Because of that, the other regression test will fail too because the
children is not copied:
*** 584,593 

  -- Text XPath expressions evaluation
  SELECT xpath('/value', data) FROM xmltest;
! xpath
! --
!  {one}
!  {two}
  (2 rows)

  SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
--- 584,593 

  -- Text XPath expressions evaluation
  SELECT xpath('/value', data) FROM xmltest;
!xpath
! 
!  {}
!  {}
  (2 rows)

  SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
***
... 

updated patch attached.

Regards,
-- 
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3599,3624  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  		xmlBufferFree(buf);
  	}
  	else
--- 3601,3640 
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNodePtr cur_copy;
  
  		buf = xmlBufferCreate();
+ 
+ 		/* the result of xmlNodeDump won't contain namespace definitions
+ 		 * from parent nodes, but xmlCopyNode duplicates a node along
+ 		 * with its required namespace definitions.
+ 		 */
+ 		cur_copy = xmlCopyNode(cur, 1);
+ 
+ 		if (cur_copy == NULL)
+ 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not serialize xml");
+ 
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
+ 			xmlFreeNode(cur_copy);
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 		xmlFreeNode(cur_copy);
  		xmlBufferFree(buf);
  	}
  	else
***
*** 3660,3666  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3676,3683 
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt)
  {
  	int			result = 0;
  	Datum		datum;
***
*** 3682,3688  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
  
  	for (i = 0; i < result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i]));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
--- 3699,3706 
  
  	fo

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 10/6/14 10:24 PM, Ali Akbar wrote:
> While reviewing the patch myself, i spotted some formatting problems in
> the code. Fixed in this v5 patch.
> 
> Also, this patch uses context patch format (in first versions, because
> of the small modification, context patch format obfucates the changes.
> After reimplementation this isn't the case anymore)

I think the problem this patch is addressing is real, and your approach
is sound, but I'd ask you to go back to the xmlCopyNode() version, and
try to add a test case for why the second argument = 1 is necessary.  I
don't see any other problems.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 7/8/14 6:03 AM, Ali Akbar wrote:
> If we put 1 as the parameter, the resulting Node will link to the
> existing children. In this case, the namespace problem for the children
> might be still exists. If any of the children uses different
> namespace(s) than the parent, the namespace definition will not be
> copied correctly.

This analysis might very well be right, but I think we should prove it
with a test case.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-11-04 Thread Peter Eisentraut
On 7/11/14 4:36 AM, Ali Akbar wrote:
> But i found some bug in libxml2's code, because we call xmlCopyNode with
> 1 as the second parameter, internally xmlCopyNode will call
> xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
> xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
> whether it's NULL. So if someday the recursion returns NULL (maybe
> because of memory exhaustion), it will SEGFAULT.
> 
> Knowing this but in libxml2 code, is this patch is still acceptable?

This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-10-06 Thread Ali Akbar
While reviewing the patch myself, i spotted some formatting problems in the
code. Fixed in this v5 patch.

Also, this patch uses context patch format (in first versions, because of
the small modification, context patch format obfucates the changes. After
reimplementation this isn't the case anymore)

Thanks,
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3594,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
  
  #ifdef USE_LIBXML
  
  /*
   * Convert XML node to text (dump subtree in case of element,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
  			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
--- 3596,3706 
  
  #ifdef USE_LIBXML
  
+ /* check ns definition of node and its childrens. If any one of ns is
+  * not defined in node and it's children, but in the node's parent,
+  * copy the definition to node.
+  */
+ static void
+ xml_checkandcopyns(xmlNodePtr root,
+    PgXmlErrorContext *xmlerrcxt,
+    xmlNodePtr node,
+    xmlNsPtr lastns_before)
+ {
+ 	xmlNsPtr ns = NULL;
+ 	xmlNsPtr cur_ns;
+ 	xmlNodePtr cur;
+ 
+ 	if (node->ns != NULL)
+ 	{
+ 		/* check in new nses */
+ 		cur_ns = lastns_before == NULL ? node->nsDef : lastns_before->next;
+ 		while (cur_ns != NULL)
+ 		{
+ 			if (cur_ns->href != NULL)
+ 			{
+ if (((cur_ns->prefix == NULL) && (node->ns->prefix == NULL)) ||
+ 	((cur_ns->prefix != NULL) && (node->ns->prefix != NULL) &&
+ 	 xmlStrEqual(cur_ns->prefix, node->ns->prefix)))
+ {
+ 	ns = cur_ns;
+ 	break;
+ }
+ 			}
+ 			cur_ns = cur_ns->next;
+ 		}
+ 		if (ns == NULL) /* not in new nses */
+ 		{
+ 			ns = xmlSearchNs(NULL, node->parent, node->ns->prefix);
+ 
+ 			if (ns != NULL)
+ 			{
+ ns = xmlNewNs(root, ns->href, ns->prefix);
+ 
+ if (ns == NULL && xmlerrcxt->err_occurred)
+ 	xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not allocate xmlNs");
+ 			}
+ 		}
+ 	}
+ 	/* check and copy ns for children recursively */
+ 	cur = node->children;
+ 	while (cur != NULL)
+ 	{
+ 		xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before);
+ 		cur = cur->next;
+ 	}
+ }
+ 
  /*
   * Convert XML node to text (dump subtree in case of element,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNsPtr lastns_before;
+ 		xmlNsPtr ns;
+ 		xmlNsPtr next;
  
  		buf = xmlBufferCreate();
+ 
  		PG_TRY();
  		{
+ 			lastns_before = cur->nsDef;
+ 			if (lastns_before != NULL)
+ 			{
+ while (lastns_before->next != NULL)
+ 	lastns_before = lastns_before->next;
+ 			}
+ 			xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before);
+ 
  			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
+ 
+ 			/* delete and free new nses */
+ 			ns = lastns_before == NULL ? cur->nsDef : lastns_before->next;
+ 			while (ns != NULL)
+ 			{
+ next = ns->next;
+ xmlFree(ns);
+ ns = next;
+ 			}
+ 
+ 			if (lastns_before == NULL)
+ cur->nsDef = NULL;
+ 			else
+ lastns_before->next = NULL;
  		}
  		PG_CATCH();
  		{
+ 			/* new namespaces will be freed while free-ing the node, so we
+ 			 * won't free it here
+ 			 */
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
***
*** 3660,3666  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3746,3753 
   */
  static int
  xml_xpathobjtoxmla

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-08-29 Thread Ali Akbar
Greetings,

Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the  node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).

Thanks,
Ali Akbar


2014-07-11 15:36 GMT+07:00 Ali Akbar :

> Greetings,
>
> Attached modified patch to handle xmlCopyNode returning NULL. The patch is
> larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
> is needed for calling xml_ereport).
>
> From libxml2 source, it turns out that the other cases that xmlCopyNode
> will return NULL will not happen. So in this patch i assume that the only
> case is memory exhaustion.
>
> But i found some bug in libxml2's code, because we call xmlCopyNode with 1
> as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
> recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
> check the return of xmlStaticCopyNode whether it's NULL. So if someday the
> recursion returns NULL (maybe because of memory exhaustion), it will
> SEGFAULT.
>
> Knowing this but in libxml2 code, is this patch is still acceptable?
>
> Thanks,
> Ali Akbar
>



-- 
Ali Akbar
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..f34c87e 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -141,9 +141,11 @@ static bool print_xml_decl(StringInfo buf, const xmlChar *version,
 			   pg_enc encoding, int standalone);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
 		  bool preserve_whitespace, int encoding);
-static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
+static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
+	   PgXmlErrorContext *xmlerrcxt);
 static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
-	   ArrayBuildState **astate);
+	   ArrayBuildState **astate,
+	   PgXmlErrorContext *xmlerrcxt);
 #endif   /* USE_LIBXML */
 
 static StringInfo query_to_xml_internal(const char *query, char *tablename,
@@ -3590,27 +3592,109 @@ SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
 
 #ifdef USE_LIBXML
 
+/* check ns definition of node and its childrens. If any one of ns is
+ * not defined in node and it's children, but in the node's parent,
+ * copy the definition to node.
+ */
+static void
+xml_checkandcopyns(xmlNodePtr root,
+   PgXmlErrorContext *xmlerrcxt,
+   xmlNodePtr node,
+   xmlNsPtr lastns_before)
+{
+	xmlNsPtr ns = NULL;
+	xmlNsPtr cur_ns;
+	xmlNodePtr cur;
+
+	if (node->ns != NULL)
+	{
+		/* check in new nses */
+		cur_ns = lastns_before == NULL ? node->nsDef : lastns_before->next;
+		while (cur_ns != NULL)
+		{
+			if (cur_ns->href != NULL)
+			{
+if (((cur_ns->prefix == NULL) && (node->ns->prefix == NULL)) ||
+	((cur_ns->prefix != NULL) && (node->ns->prefix != NULL) &&
+	 xmlStrEqual(cur_ns->prefix, node->ns->prefix)))
+{
+	ns = cur_ns;
+	break;
+}
+			}
+			cur_ns = cur_ns->next;
+		}
+		if (ns == NULL) /* not in new nses */
+		{
+			ns = xmlSearchNs(NULL, node->parent, node->ns->prefix);
+
+			if (ns != NULL) {
+ns = xmlNewNs(root, ns->href, ns->prefix);
+
+if (ns == NULL && xmlerrcxt->err_occurred) {
+	xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+		"could not allocate xmlNs");
+}
+			}
+		}
+	}
+	/* check and copy ns for children recursively */
+	cur = node->children;
+	while (cur != NULL) {
+		xml_checkandcopyns(root, xmlerrcxt, cur, lastns_before);
+		cur = cur->next;
+	}
+}
+
 /*
  * Convert XML node to text (dump subtree in case of element,
  * return value otherwise)
  */
 static text *
-xml_xmlnodetoxmltype(xmlNodePtr cur)
+xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
 	xmltype*result;
 
 	if (cur->type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNsPtr lastns_before;
+		xmlNsPtr ns;
+		xmlNsPtr next;
 
 		buf = xmlBufferCreate();
+
 		PG_TRY();
 		{
+			lastns_before = cur->nsDef;
+			if (lastns_before != NULL) {
+while (lastns_before->next != NULL) {
+	lastns_before = lastns_before->next;
+}
+			}
+			xml_checkandcopyns(cur, xmlerrcxt, cur, lastns_before);
+
 			xmlNodeDump(buf, NULL, cur, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
+
+			/* delete and free new nses */
+			ns = lastns_before == NULL ? cur->nsDef : lastns_before->next;
+			while (ns != NULL) {
+next = ns->next;
+xmlFree(ns);
+ns = next;
+			}
+			if (lastns_before == NULL) {
+cur->nsDef = NULL;
+			} else {
+lastns_before->next = NULL;
+			}
 		}
 		PG_CATCH();
 		{
+			/* new namespaces will be freed while free-ing the node, so we
+			 * won't free it here
+			 */
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
@@ -3656,7 +3740,8 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
  */
 static int
 xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
-	   ArrayBuild

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-11 Thread Ali Akbar
Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

>From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar
*** a/src/backend/utils/adt/xml.c
--- b/src/backend/utils/adt/xml.c
***
*** 141,149  static bool print_xml_decl(StringInfo buf, const xmlChar *version,
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
--- 141,151 
  			   pg_enc encoding, int standalone);
  static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg,
  		  bool preserve_whitespace, int encoding);
! static text *xml_xmlnodetoxmltype(xmlNodePtr cur,
! 	   PgXmlErrorContext *xmlerrcxt);
  static int xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt);
  #endif   /* USE_LIBXML */
  
  static StringInfo query_to_xml_internal(const char *query, char *tablename,
***
*** 3595,3620  SPI_sql_row_to_xmlelement(int rownum, StringInfo result, char *tablename,
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
  
  		buf = xmlBufferCreate();
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
  		xmlBufferFree(buf);
  	}
  	else
--- 3597,3636 
   * return value otherwise)
   */
  static text *
! xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
  {
  	xmltype*result;
  
  	if (cur->type == XML_ELEMENT_NODE)
  	{
  		xmlBufferPtr buf;
+ 		xmlNodePtr cur_copy;
  
  		buf = xmlBufferCreate();
+ 
+ 		/* the result of xmlNodeDump won't contain namespace definitions
+ 		 * from parent nodes, but xmlCopyNode duplicates a node along
+ 		 * with its required namespace definitions.
+ 		 */
+ 		cur_copy = xmlCopyNode(cur, 1);
+ 
+ 		if (cur_copy == NULL)
+ 			xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
+ 		"could not serialize xml");
+ 
  		PG_TRY();
  		{
! 			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
  			result = xmlBuffer_to_xmltype(buf);
  		}
  		PG_CATCH();
  		{
+ 			xmlFreeNode(cur_copy);
  			xmlBufferFree(buf);
  			PG_RE_THROW();
  		}
  		PG_END_TRY();
+ 		xmlFreeNode(cur_copy);
  		xmlBufferFree(buf);
  	}
  	else
***
*** 3656,3662  xml_xmlnodetoxmltype(xmlNodePtr cur)
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate)
  {
  	int			result = 0;
  	Datum		datum;
--- 3672,3679 
   */
  static int
  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
! 	   ArrayBuildState **astate,
! 	   PgXmlErrorContext *xmlerrcxt)
  {
  	int			result = 0;
  	Datum		datum;
***
*** 3678,3684  xml_xpathobjtoxmlarray(xmlXPathObjectPtr xpathobj,
  
  	for (i = 0; i < result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i]));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
--- 3695,3702 
  
  	for (i = 0; i < result; i++)
  	{
! 		datum = PointerGetDatum(xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[i],
! 	 xmlerrcxt));
  		*astate = accumArrayResult(*astate, datum,
     false, XMLOID,
     CurrentMemoryContext);
***
*** 3882,3890  xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems = xml_xpathobjtoxmlarray(xpathobj, astate);
  		else
! 			(void) xml_xpathobjtoxmlarray(xpathobj, astate);
  	}
  	PG_CATCH();
  	{
--- 3900,3908 
  		 * Extract the results as requested.
  		 */
  		if (res_nitems != NULL)
! 			*res_nitems = xml_xpa

Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-08 Thread Ali Akbar
> I don't know enough about XML/XPATH to know if this is a good idea or not,
>

Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.


> but I did go read the documentation of xmlCopyNode(), and I notice it says
> the second argument is
>
> extended: if 1 do a recursive copy (properties, namespaces and children
> when applicable) if 2 copy properties and namespaces (when applicable)
>
> Do we really want it to "copy children"?  Perhaps the value should be 2?
> (I have no idea, I'm just raising the question.)
>

If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.

Just to be safe, in the patch 1 passed as the second argument.

Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?

I also notice that it says
>
> Returns: a new #xmlNodePtr, or NULL in case of error.
>
> and the patch is failing to cover the error case.  Most likely, that
> would represent out-of-memory, so it definitely seems to be something
> we should account for.
>

Ah, overlooked that.

Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.

Thanks for the review.

-- 
Ali Akbar


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-07-07 Thread Tom Lane
Abhijit Menon-Sen  writes:
> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).

> I'm marking this ready for committer.

I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"?  Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case.  Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-06-22 Thread Ali Akbar
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen :

> Thanks for the patch, and welcome to Postgres development.
>


> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).
>
> I'm marking this ready for committer.
>

Thanks for the review. Hope i will be able to contribute a little here and
there in the future.

-- 
Ali Akbar