Re: [HACKERS] review: xml_is_well_formed
Mike Fowler m...@mlfowler.com writes: On 11/08/10 21:27, Tom Lane wrote: Yes. Mike, are you expecting to submit a new version before the end of the week? Yes and here it is, apologies for the delay. I have re-implemented xml_is_well_formed such that it is sensitive to the XMLOPTION. The additional _document and _content methods are now present. Tests and documentation adjusted to suit. Applied with minor cleanups, mostly in the documentation. The only thing that seems worth remarking on is that since xml_is_well_formed now depends on a GUC variable, it *cannot* be marked IMMUTABLE. The right marking in such cases is STABLE. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
Hello I checked last version: * there are not a problem with regress and contrib regress tests * the design is simple and clean now - well documented notes: * don't get a patch via copy/paste from mailing list archive - there are a broken xml2 tests via this access! * I didn't find a sentence so default for xml_is_well_formed a content checking - some like without change of xmloption, the behave is same as xml_is_well_formed_content Regards Pavel Stehule 2010/8/11 Mike Fowler m...@mlfowler.com: On 11/08/10 21:27, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: On Mon, Aug 9, 2010 at 10:41 AM, Robert Haasrobertmh...@gmail.com wrote: There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. Another reasonable option here would be to forget about having xml_is_well_formed() per se and ONLY offer xml_is_well_formed_content() and xml_is_well_formed_document(). We already have xml_is_well_formed(); just dropping it doesn't seem like a helpful choice. As a project management note, this CommitFest is over in 4 days, so unless we have a new version of this patch real soon now we need to defer it to the September 15th CommitFest Yes. Mike, are you expecting to submit a new version before the end of the week? Yes and here it is, apologies for the delay. I have re-implemented xml_is_well_formed such that it is sensitive to the XMLOPTION. The additional _document and _content methods are now present. Tests and documentation adjusted to suit. Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: To be honest I'm happiest with returning a boolean, even if there is some confusion over content only being valid. Though changing the return value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the same results can be achieved by simply running: SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml IS DOCUMENT; Note that this wouldn't necessarily work because it is not guaranteed that the well-formedness test is executed before the cast to xml. SQL doesn't short-circuit left to right. (A CASE expression could work.) There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. Another reasonable option here would be to forget about having xml_is_well_formed() per se and ONLY offer xml_is_well_formed_content() and xml_is_well_formed_document(). As a project management note, this CommitFest is over in 4 days, so unless we have a new version of this patch real soon now we need to defer it to the September 15th CommitFest (of course not precluding the possibility that someone will pick it up and commit it sooner, but we're not going to postpone 9.1alpha1 for this patch). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
Robert Haas robertmh...@gmail.com writes: On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote: There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. Another reasonable option here would be to forget about having xml_is_well_formed() per se and ONLY offer xml_is_well_formed_content() and xml_is_well_formed_document(). We already have xml_is_well_formed(); just dropping it doesn't seem like a helpful choice. As a project management note, this CommitFest is over in 4 days, so unless we have a new version of this patch real soon now we need to defer it to the September 15th CommitFest Yes. Mike, are you expecting to submit a new version before the end of the week? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 11/08/10 21:27, Tom Lane wrote: Robert Haasrobertmh...@gmail.com writes: On Mon, Aug 9, 2010 at 10:41 AM, Robert Haasrobertmh...@gmail.com wrote: There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. Another reasonable option here would be to forget about having xml_is_well_formed() per se and ONLY offer xml_is_well_formed_content() and xml_is_well_formed_document(). We already have xml_is_well_formed(); just dropping it doesn't seem like a helpful choice. As a project management note, this CommitFest is over in 4 days, so unless we have a new version of this patch real soon now we need to defer it to the September 15th CommitFest Yes. Mike, are you expecting to submit a new version before the end of the week? Yes and here it is, apologies for the delay. I have re-implemented xml_is_well_formed such that it is sensitive to the XMLOPTION. The additional _document and _content methods are now present. Tests and documentation adjusted to suit. Regards, -- Mike Fowler Registered Linux user: 379787 *** a/contrib/xml2/pgxml.sql.in --- b/contrib/xml2/pgxml.sql.in *** *** 5,18 SET search_path = public; --SQL for XML parser - CREATE OR REPLACE FUNCTION xml_is_well_formed(text) RETURNS bool - AS 'MODULE_PATHNAME' - LANGUAGE C STRICT IMMUTABLE; - -- deprecated old name for xml_is_well_formed CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool ! AS 'MODULE_PATHNAME', 'xml_is_well_formed' ! LANGUAGE C STRICT IMMUTABLE; CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text AS 'MODULE_PATHNAME' --- 5,14 --SQL for XML parser -- deprecated old name for xml_is_well_formed CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool ! AS 'xml_is_well_formed' ! LANGUAGE INTERNAL STRICT IMMUTABLE; CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text AS 'MODULE_PATHNAME' *** a/contrib/xml2/uninstall_pgxml.sql --- b/contrib/xml2/uninstall_pgxml.sql *** *** 29,33 DROP FUNCTION xml_encode_special_chars(text); -- deprecated old name for xml_is_well_formed DROP FUNCTION xml_valid(text); - - DROP FUNCTION xml_is_well_formed(text); --- 29,31 *** a/contrib/xml2/xpath.c --- b/contrib/xml2/xpath.c *** *** 27,33 PG_MODULE_MAGIC; /* externally accessible functions */ - Datum xml_is_well_formed(PG_FUNCTION_ARGS); Datum xml_encode_special_chars(PG_FUNCTION_ARGS); Datum xpath_nodeset(PG_FUNCTION_ARGS); Datum xpath_string(PG_FUNCTION_ARGS); --- 27,32 *** *** 71,97 pgxml_parser_init(void) } - /* Returns true if document is well-formed */ - - PG_FUNCTION_INFO_V1(xml_is_well_formed); - - Datum - xml_is_well_formed(PG_FUNCTION_ARGS) - { - text *t = PG_GETARG_TEXT_P(0); /* document buffer */ - int32 docsize = VARSIZE(t) - VARHDRSZ; - xmlDocPtr doctree; - - pgxml_parser_init(); - - doctree = xmlParseMemory((char *) VARDATA(t), docsize); - if (doctree == NULL) - PG_RETURN_BOOL(false); /* i.e. not well-formed */ - xmlFreeDoc(doctree); - PG_RETURN_BOOL(true); - } - - /* Encodes special characters (, , , and \r) as XML entities */ PG_FUNCTION_INFO_V1(xml_encode_special_chars); --- 70,75 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 8625,8630 SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF 'townstownTor --- 8625,8736 supports XPath, which is a subset of XQuery. /para /sect3 + +sect3 + titlexml_is_well_formed/title + + indexterm + primaryxml_is_well_formed/primary + secondarywell formed/secondary + /indexterm + + synopsis + functionxml_is_well_formed/function(replaceabletext/replaceable) + /synopsis + + para + The function functionxml_is_well_formed/function evaluates whether + the replaceabletext/replaceable is well formed XML content, returning + a boolean. It is useful for predetermining whether a cast to the XML type + will succeed, and therefore honours the current value of the + literalXMLOPTION/literal session configuration parameter. + /para + para + Example: + screen![CDATA[ + SET xmloption TO DOCUMENT; + SELECT xml_is_well_formed(''); + xml_is_well_formed + + f + (1 row) + + SELECT xml_is_well_formed('abc/'); + xml_is_well_formed + + t + (1 row) + + SET xmloption TO CONTENT; + SELECT xml_is_well_formed('abc'); + xml_is_well_formed + + t + (1 row) + ]]/screen + /para + para + In addition to the structure checks, the function ensures that namespaces are correcty matched. + screen![CDATA[ + SET xmloption TO DOCUMENT; + SELECT xml_is_well_formed('pg:foo
Re: [HACKERS] review: xml_is_well_formed
On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: To be honest I'm happiest with returning a boolean, even if there is some confusion over content only being valid. Though changing the return value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the same results can be achieved by simply running: SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml IS DOCUMENT; Note that this wouldn't necessarily work because it is not guaranteed that the well-formedness test is executed before the cast to xml. SQL doesn't short-circuit left to right. (A CASE expression could work.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut pete...@gmx.net wrote: On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote: To be honest I'm happiest with returning a boolean, even if there is some confusion over content only being valid. Though changing the return value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the same results can be achieved by simply running: SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml IS DOCUMENT; Note that this wouldn't necessarily work because it is not guaranteed that the well-formedness test is executed before the cast to xml. SQL doesn't short-circuit left to right. (A CASE expression could work.) There's also the fact that it would probably end up parsing the data twice. Given xmloption, I'm inclined to think Tom has it right: provided xml_is_well_formed() that follows xmloption, plus a specific version for each of content and document. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
Peter Eisentraut pete...@gmx.net writes: On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. I think I agree with the later discussion that xml_is_well_formed() should tell you whether a cast to xml would succeed (and hence it has to pay attention to XMLOPTION). However, it seems to also make sense to provide the other two functions suggested here, both to satify people who know XML and to offer tests that will tell you whether XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed. Merging the three cases into one function doesn't seem like a win for either compatibility or usability. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Sun, Aug 8, 2010 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. I think I agree with the later discussion that xml_is_well_formed() should tell you whether a cast to xml would succeed (and hence it has to pay attention to XMLOPTION). However, it seems to also make sense to provide the other two functions suggested here, both to satify people who know XML and to offer tests that will tell you whether XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed. Merging the three cases into one function doesn't seem like a win for either compatibility or usability. +1. I didn't realize how funky the XMLOPTION stuff was at the start of this discussion; I think your analysis here is spot-on. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 06/08/10 21:55, Peter Eisentraut wrote: On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote: Or perhaps it could return a string instead of a boolean: content, document, or NULL if it's neither. I like the sound of that. In fact this helps workaround the IS DOCUMENT and IS CONTENT limitations such that you can you can select only content, only documents or both is you use IS NOT NULL. Unless anyone sees a reason that this function needs to remain a boolean function, I'll rework the patch over the weekend. What is the actual use case for this function? Is the above behavior actually useful? The idea is to be able to filter a table that contains XML in TEXT that might not be well formed. Knowing that you're only dealing with well formed XML prevents you blowing up when you attempt the cast. One reason to stick with boolean is backward compatibility. To be honest I'm happiest with returning a boolean, even if there is some confusion over content only being valid. Though changing the return value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the same results can be achieved by simply running: SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml IS DOCUMENT; Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 03/08/10 16:15, Peter Eisentraut wrote: On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed f (1 row) test=# SET xmloption TO CONTENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed t (1 row) with the inverse for DOCUMENTS? To me this makes the most sense as it makes the function behave much more like the other xml functions. -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Fri, Aug 6, 2010 at 4:28 AM, Mike Fowler m...@mlfowler.com wrote: On 03/08/10 16:15, Peter Eisentraut wrote: On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed f (1 row) That will make using this function a huge hassle, won't it? Functions that do different things depending on GUC settings are usually troublesome. Having three functions would be more sensible if we need all three behaviors, but I don't see why we do. Or perhaps it could return a string instead of a boolean: content, document, or NULL if it's neither. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 06/08/10 12:31, Robert Haas wrote: Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed f (1 row) That will make using this function a huge hassle, won't it? Functions that do different things depending on GUC settings are usually troublesome. Having three functions would be more sensible if we need all three behaviors, but I don't see why we do. Or perhaps it could return a string instead of a boolean: content, document, or NULL if it's neither. I like the sound of that. In fact this helps workaround the IS DOCUMENT and IS CONTENT limitations such that you can you can select only content, only documents or both is you use IS NOT NULL. Unless anyone sees a reason that this function needs to remain a boolean function, I'll rework the patch over the weekend. Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote: What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed f (1 row) That will make using this function a huge hassle, won't it? Functions that do different things depending on GUC settings are usually troublesome. Having three functions would be more sensible if we need all three behaviors, but I don't see why we do. Upthread you opined that this function should essentially indicate whether a cast to type xml would succeed, and observing the xmloption is an essential part of that. I had assumed the original patch actually did that. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote: Or perhaps it could return a string instead of a boolean: content, document, or NULL if it's neither. I like the sound of that. In fact this helps workaround the IS DOCUMENT and IS CONTENT limitations such that you can you can select only content, only documents or both is you use IS NOT NULL. Unless anyone sees a reason that this function needs to remain a boolean function, I'll rework the patch over the weekend. What is the actual use case for this function? Is the above behavior actually useful? One reason to stick with boolean is backward compatibility. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Fri, Aug 6, 2010 at 4:52 PM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote: What about making the function sensitive to the XML OPTION, such that: test=# SET xmloption TO DOCUMENT; SET text=# SELECT xml_is_well_formed('foo'); xml_is_well_formed f (1 row) That will make using this function a huge hassle, won't it? Functions that do different things depending on GUC settings are usually troublesome. Having three functions would be more sensible if we need all three behaviors, but I don't see why we do. Upthread you opined that this function should essentially indicate whether a cast to type xml would succeed, and observing the xmloption is an essential part of that. I had assumed the original patch actually did that. Oh. I didn't realize the casting behavior was already dependent on the GUC. Yuck. Maybe we should following the GUC after all, then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
Hello 2010/8/3 Peter Eisentraut pete...@gmx.net: On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote: Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. Maybe there should be xml_is_well_formed() xml_is_well_formed_document() xml_is_well_formed_content() I agree that consistency with SQL/XML is desirable, but for someone coming from the outside, the unqualified claim that 'foo' is well-formed XML might sound suspicious. yes, it is little bit curious - but it can be just documented. Now, I don't think, so we need more functions. Regards Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
2010/7/31 Robert Haas robertmh...@gmail.com: On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. Are you speaking of XML content fragments that SQL/XML defines? Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. I agree with this idea - so I am able to do: postgres=# select 'xxx'::xml; xml - xxx (1 row) I have not any suggestions now - so I'll change flag to ready to commit Regards Pavel Stehule -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
2010/8/2 Pavel Stehule pavel.steh...@gmail.com: 2010/7/31 Robert Haas robertmh...@gmail.com: On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. Are you speaking of XML content fragments that SQL/XML defines? Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. I agree with this idea - so I am able to do: postgres=# select 'xxx'::xml; xml - xxx (1 row) I have not any suggestions now - so I'll change flag to ready to commit sorry - contrib module should be a fixed patch attached Regards Pavel Stehule -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company *** ./contrib/xml2/pgxml.sql.in.orig 2010-03-01 19:07:59.0 +0100 --- ./contrib/xml2/pgxml.sql.in 2010-08-02 08:40:27.885789791 +0200 *** *** 5,18 --SQL for XML parser - CREATE OR REPLACE FUNCTION xml_is_well_formed(text) RETURNS bool - AS 'MODULE_PATHNAME' - LANGUAGE C STRICT IMMUTABLE; - -- deprecated old name for xml_is_well_formed CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool ! AS 'MODULE_PATHNAME', 'xml_is_well_formed' ! LANGUAGE C STRICT IMMUTABLE; CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text AS 'MODULE_PATHNAME' --- 5,14 --SQL for XML parser -- deprecated old name for xml_is_well_formed CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool ! AS 'xml_is_well_formed' ! LANGUAGE INTERNAL STRICT IMMUTABLE; CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text AS 'MODULE_PATHNAME' *** ./contrib/xml2/uninstall_pgxml.sql.orig 2007-11-13 05:24:29.0 +0100 --- ./contrib/xml2/uninstall_pgxml.sql 2010-08-02 08:38:40.134785878 +0200 *** *** 29,33 -- deprecated old name for xml_is_well_formed DROP FUNCTION xml_valid(text); - - DROP FUNCTION xml_is_well_formed(text); --- 29,31 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On 02/08/10 07:46, Pavel Stehule wrote: I have not any suggestions now - so I'll change flag to ready to commit sorry - contrib module should be a fixed patch attached Thanks Pavel, you saved me some time! Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. Are you speaking of XML content fragments that SQL/XML defines? Well-formedness should probably only allow XML documents. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote: On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote: * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. Are you speaking of XML content fragments that SQL/XML defines? Well-formedness should probably only allow XML documents. I think the point of this function is to determine whether a cast to xml will throw an error. The behavior should probably match exactly whatever test would be applied there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review: xml_is_well_formed
Hi Pavel, Thanks for taking the time to review my patch. Attached is a new version addressing your concerns. On 29/07/10 14:21, Pavel Stehule wrote: I have a few issues: * broken regress test (fedora 13 - xmllint: using libxml version 20707) postgres=# SELECT xml_is_well_formed('pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo'); xml_is_well_formed f (1 row) this xml is broken - but in regress tests is ok [pa...@pavel-stehule ~]$ xmllint xxx xxx:1: parser error : error parsing attribute name pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo This is a problem that was observered recently by Thom Brown - the commit fest webapp adds the semicolon after the quote. If you look at the attachment I sent in a email client you'll find there is no semicolon. The patch attached here will also not have the semicolon. * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. * I don't understand to this fragment PG_TRY(); + { + size_t count; + xmlChar*version = NULL; + int standalone = -1; +. + res_code = parse_xml_decl(string,count,version, NULL,standalone); + if (res_code != 0) + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT, + invalid XML content: invalid XML declaration, + res_code); +. + doc = xmlNewDoc(version); + doc-encoding = xmlStrdup((const xmlChar *) UTF-8); + doc-standalone = 1; why? This function can raise exception when declaration is wrong. It is wrong - I think, this function should returns false instead exception. You're quite right, I should be returning false here and not causing an exception. I have corrected this in the attached patch. Regards, -- Mike Fowler Registered Linux user: 379787 *** a/contrib/xml2/xpath.c --- b/contrib/xml2/xpath.c *** *** 27,33 PG_MODULE_MAGIC; /* externally accessible functions */ - Datum xml_is_well_formed(PG_FUNCTION_ARGS); Datum xml_encode_special_chars(PG_FUNCTION_ARGS); Datum xpath_nodeset(PG_FUNCTION_ARGS); Datum xpath_string(PG_FUNCTION_ARGS); --- 27,32 *** *** 70,97 pgxml_parser_init(void) xmlLoadExtDtdDefaultValue = 1; } - - /* Returns true if document is well-formed */ - - PG_FUNCTION_INFO_V1(xml_is_well_formed); - - Datum - xml_is_well_formed(PG_FUNCTION_ARGS) - { - text *t = PG_GETARG_TEXT_P(0); /* document buffer */ - int32 docsize = VARSIZE(t) - VARHDRSZ; - xmlDocPtr doctree; - - pgxml_parser_init(); - - doctree = xmlParseMemory((char *) VARDATA(t), docsize); - if (doctree == NULL) - PG_RETURN_BOOL(false); /* i.e. not well-formed */ - xmlFreeDoc(doctree); - PG_RETURN_BOOL(true); - } - - /* Encodes special characters (, , , and \r) as XML entities */ PG_FUNCTION_INFO_V1(xml_encode_special_chars); --- 69,74 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *** *** 8554,8562 SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab; ]]/screen /para /sect3 sect3 ! titleXML Predicates/title indexterm primaryIS DOCUMENT/primary --- 8554,8566 ]]/screen /para /sect3 + /sect2 + + sect2 +titleXML Predicates/title sect3 ! titleIS DOCUMENT/title indexterm primaryIS DOCUMENT/primary *** *** 8574,8579 SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab; --- 8578,8675 between documents and content fragments. /para /sect3 + +sect3 + titlexml_is_well_formed/title + + indexterm + primaryxml_is_well_formed/primary + secondarywell formed/secondary + /indexterm + + synopsis + functionxml_is_well_formed/function(replaceabletext/replaceable) + /synopsis + + para + The function functionxml_is_well_formed/function evaluates whether + the replaceabletext/replaceable is well formed XML content, returning + a boolean. + /para + para + Example: + screen![CDATA[ + SELECT xml_is_well_formed('foobar/foo'); + xml_is_well_formed + + t + (1 row) + + SELECT xml_is_well_formed('foobar/foo'); + xml_is_well_formed + + f + (1 row) + + SELECT xml_is_well_formed('foobarstuff/foo'); + xml_is_well_formed + + f + (1 row) + ]]/screen + /para + para + In addition to the structure checks, the function ensures that namespaces are correcty matched. + screen![CDATA[ + SELECT xml_is_well_formed('pg:foo
Re: [HACKERS] review: xml_is_well_formed
Hello 2010/7/30 Mike Fowler m...@mlfowler.com: Hi Pavel, Thanks for taking the time to review my patch. Attached is a new version addressing your concerns. On 29/07/10 14:21, Pavel Stehule wrote: I have a few issues: * broken regress test (fedora 13 - xmllint: using libxml version 20707) ok - main regress test is ok now, next I checked a contrib test for xml2 inside contrib/xml2 make installcheck, and there is a problem SET client_min_messages = warning; \set ECHO none + psql:pgxml.sql:10: ERROR: could not find function xml_is_well_formed in file /usr/local/pgsql.xwf/lib/pgxml.so + psql:pgxml.sql:15: ERROR: could not find function xml_is_well_formed in file /usr/local/pgsql.xwf/lib/pgxml.so RESET client_min_messages; select query_to_xml('select 1 as x',true,false,''); you have to remove declaration from pgxml.sql.in and uninstall_pgxml.sql, and other related files in contrib/xml2/ regress test postgres=# SELECT xml_is_well_formed('pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo'); xml_is_well_formed f (1 row) this xml is broken - but in regress tests is ok [pa...@pavel-stehule ~]$ xmllint xxx xxx:1: parser error : error parsing attribute name pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo This is a problem that was observered recently by Thom Brown - the commit fest webapp adds the semicolon after the quote. If you look at the attachment I sent in a email client you'll find there is no semicolon. The patch attached here will also not have the semicolon. ok - attached patch is correct, ... please, can you remove a broken patch? * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? Yes this is OK, pure text is valid XML content. It interesting for me - is somewhere some documentation about it? My colleagues speak some else :) http://zvon.org/comp/r/tut-XML.html#Pages~MinimalQ20XnumberQ20XofQ20Xelements http://www.w3.org/TR/REC-xml/#sec-prolog-dtd I am not a specialist on XML - so just don't know * I don't understand to this fragment PG_TRY(); + { + size_t count; + xmlChar *version = NULL; + int standalone = -1; +. + res_code = parse_xml_decl(string,count,version, NULL,standalone); + if (res_code != 0) + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT, + invalid XML content: invalid XML declaration, + res_code); +. + doc = xmlNewDoc(version); + doc-encoding = xmlStrdup((const xmlChar *) UTF-8); + doc-standalone = 1; why? This function can raise exception when declaration is wrong. It is wrong - I think, this function should returns false instead exception. You're quite right, I should be returning false here and not causing an exception. I have corrected this in the attached patch. ok Regards, -- Mike Fowler Registered Linux user: 379787 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] review: xml_is_well_formed
Hello I looked on patch https://commitfest.postgresql.org/action/patch_view?id=334 .This patch moves function xml_is_well_formed from contrib xm2 to core. * Is the patch in context diff format? yes * Does it apply cleanly to the current CVS HEAD? yes * Does it include reasonable tests, necessary doc patches, etc? yes * Does the patch actually implement that? yes * Do we want that? yes * Do we already have it? yes - simplified version in core * Does it follow SQL spec, or the community-agreed behavior? no - I didn't find any resources about conformance with SQL spec, but it has same behave like original contrib function * Does it include pg_dump support (if applicable)? not related * Are there dangers? no *Are there any assertion failures or crashes? not found I have a few issues: * broken regress test (fedora 13 - xmllint: using libxml version 20707) postgres=# SELECT xml_is_well_formed('pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo'); xml_is_well_formed f (1 row) this xml is broken - but in regress tests is ok [pa...@pavel-stehule ~]$ xmllint xxx xxx:1: parser error : error parsing attribute name pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo * xml_is_well_formed returns true for simple text postgres=# SELECT xml_is_well_formed(''); xml_is_well_formed t (1 row) it is probably wrong result - is it ok?? * I don't understand to this fragment PG_TRY(); + { + size_t count; + xmlChar*version = NULL; + int standalone = -1; +. + res_code = parse_xml_decl(string, count, version, NULL, standalone); + if (res_code != 0) + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT, + invalid XML content: invalid XML declaration, + res_code); +. + doc = xmlNewDoc(version); + doc-encoding = xmlStrdup((const xmlChar *) UTF-8); + doc-standalone = 1; why? This function can raise exception when declaration is wrong. It is wrong - I think, this function should returns false instead exception. Regards Pavel Stehule postgres=# select version(); version -- PostgreSQL 9.1devel on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.4 20100630 (Red Hat 4.4.4-10), 64-bit (1 row) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers