Re: [HACKERS] review: xml_is_well_formed

2010-08-13 Thread Tom Lane
Mike Fowler m...@mlfowler.com writes:
 On 11/08/10 21:27, Tom Lane wrote:
 Yes.  Mike, are you expecting to submit a new version before the end of
 the week?

 Yes and here it is, apologies for the delay. I have re-implemented 
 xml_is_well_formed such that it is sensitive to the XMLOPTION. The 
 additional _document and _content methods are now present. Tests and 
 documentation adjusted to suit.

Applied with minor cleanups, mostly in the documentation.  The only
thing that seems worth remarking on is that since xml_is_well_formed
now depends on a GUC variable, it *cannot* be marked IMMUTABLE.  The
right marking in such cases is STABLE.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-12 Thread Pavel Stehule
Hello

I checked last version:

* there are not a problem with regress and contrib regress tests
* the design is simple and clean now - well documented

notes:
* don't get a patch via copy/paste from mailing list archive - there
are a broken xml2 tests via this access!
* I didn't find a sentence so default for xml_is_well_formed a content
checking - some like without change of xmloption, the behave is same
as xml_is_well_formed_content

Regards

Pavel Stehule



2010/8/11 Mike Fowler m...@mlfowler.com:
 On 11/08/10 21:27, Tom Lane wrote:

 Robert Haasrobertmh...@gmail.com  writes:

 On Mon, Aug 9, 2010 at 10:41 AM, Robert Haasrobertmh...@gmail.com
  wrote:

 There's also the fact that it would probably end up parsing the data
 twice.  Given xmloption, I'm inclined to think Tom has it right:
 provided xml_is_well_formed() that follows xmloption, plus a specific
 version for each of content and document.

 Another reasonable option here would be to forget about having
 xml_is_well_formed() per se and ONLY offer
 xml_is_well_formed_content() and xml_is_well_formed_document().

 We already have xml_is_well_formed(); just dropping it doesn't seem like
 a helpful choice.

 As a project management note, this CommitFest is over in 4 days, so
 unless we have a new version of this patch real soon now we need to
 defer it to the September 15th CommitFest

 Yes.  Mike, are you expecting to submit a new version before the end of
 the week?


 Yes and here it is, apologies for the delay. I have re-implemented
 xml_is_well_formed such that it is sensitive to the XMLOPTION. The
 additional _document and _content methods are now present. Tests and
 documentation adjusted to suit.

 Regards,

 --
 Mike Fowler
 Registered Linux user: 379787


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-11 Thread Robert Haas
On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut pete...@gmx.net wrote:
 On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
 To be honest I'm happiest with returning a boolean, even if there is
 some confusion over content only being valid. Though changing the
 return
 value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
 the
 same results can be achieved by simply running:

 SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
 data::xml IS DOCUMENT;

 Note that this wouldn't necessarily work because it is not guaranteed
 that the well-formedness test is executed before the cast to xml.  SQL
 doesn't short-circuit left to right.  (A CASE expression could work.)

 There's also the fact that it would probably end up parsing the data
 twice.  Given xmloption, I'm inclined to think Tom has it right:
 provided xml_is_well_formed() that follows xmloption, plus a specific
 version for each of content and document.

Another reasonable option here would be to forget about having
xml_is_well_formed() per se and ONLY offer
xml_is_well_formed_content() and xml_is_well_formed_document().

As a project management note, this CommitFest is over in 4 days, so
unless we have a new version of this patch real soon now we need to
defer it to the September 15th CommitFest (of course not precluding
the possibility that someone will pick it up and commit it sooner, but
we're not going to postpone 9.1alpha1 for this patch).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas robertmh...@gmail.com wrote:
 There's also the fact that it would probably end up parsing the data
 twice.  Given xmloption, I'm inclined to think Tom has it right:
 provided xml_is_well_formed() that follows xmloption, plus a specific
 version for each of content and document.

 Another reasonable option here would be to forget about having
 xml_is_well_formed() per se and ONLY offer
 xml_is_well_formed_content() and xml_is_well_formed_document().

We already have xml_is_well_formed(); just dropping it doesn't seem like
a helpful choice.

 As a project management note, this CommitFest is over in 4 days, so
 unless we have a new version of this patch real soon now we need to
 defer it to the September 15th CommitFest

Yes.  Mike, are you expecting to submit a new version before the end of
the week?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-11 Thread Mike Fowler

On 11/08/10 21:27, Tom Lane wrote:

Robert Haasrobertmh...@gmail.com  writes:

On Mon, Aug 9, 2010 at 10:41 AM, Robert Haasrobertmh...@gmail.com  wrote:

There's also the fact that it would probably end up parsing the data
twice.  Given xmloption, I'm inclined to think Tom has it right:
provided xml_is_well_formed() that follows xmloption, plus a specific
version for each of content and document.



Another reasonable option here would be to forget about having
xml_is_well_formed() per se and ONLY offer
xml_is_well_formed_content() and xml_is_well_formed_document().


We already have xml_is_well_formed(); just dropping it doesn't seem like
a helpful choice.


As a project management note, this CommitFest is over in 4 days, so
unless we have a new version of this patch real soon now we need to
defer it to the September 15th CommitFest


Yes.  Mike, are you expecting to submit a new version before the end of
the week?



Yes and here it is, apologies for the delay. I have re-implemented 
xml_is_well_formed such that it is sensitive to the XMLOPTION. The 
additional _document and _content methods are now present. Tests and 
documentation adjusted to suit.


Regards,

--
Mike Fowler
Registered Linux user: 379787
*** a/contrib/xml2/pgxml.sql.in
--- b/contrib/xml2/pgxml.sql.in
***
*** 5,18  SET search_path = public;
  
  --SQL for XML parser
  
- CREATE OR REPLACE FUNCTION xml_is_well_formed(text) RETURNS bool
- AS 'MODULE_PATHNAME'
- LANGUAGE C STRICT IMMUTABLE;
- 
  -- deprecated old name for xml_is_well_formed
  CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool
! AS 'MODULE_PATHNAME', 'xml_is_well_formed'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text
  AS 'MODULE_PATHNAME'
--- 5,14 
  
  --SQL for XML parser
  
  -- deprecated old name for xml_is_well_formed
  CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool
! AS 'xml_is_well_formed'
! LANGUAGE INTERNAL STRICT IMMUTABLE;
  
  CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text
  AS 'MODULE_PATHNAME'
*** a/contrib/xml2/uninstall_pgxml.sql
--- b/contrib/xml2/uninstall_pgxml.sql
***
*** 29,33  DROP FUNCTION xml_encode_special_chars(text);
  
  -- deprecated old name for xml_is_well_formed
  DROP FUNCTION xml_valid(text);
- 
- DROP FUNCTION xml_is_well_formed(text);
--- 29,31 
*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
***
*** 27,33  PG_MODULE_MAGIC;
  
  /* externally accessible functions */
  
- Datum		xml_is_well_formed(PG_FUNCTION_ARGS);
  Datum		xml_encode_special_chars(PG_FUNCTION_ARGS);
  Datum		xpath_nodeset(PG_FUNCTION_ARGS);
  Datum		xpath_string(PG_FUNCTION_ARGS);
--- 27,32 
***
*** 71,97  pgxml_parser_init(void)
  }
  
  
- /* Returns true if document is well-formed */
- 
- PG_FUNCTION_INFO_V1(xml_is_well_formed);
- 
- Datum
- xml_is_well_formed(PG_FUNCTION_ARGS)
- {
- 	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
- 	int32		docsize = VARSIZE(t) - VARHDRSZ;
- 	xmlDocPtr	doctree;
- 
- 	pgxml_parser_init();
- 
- 	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
- 	if (doctree == NULL)
- 		PG_RETURN_BOOL(false);	/* i.e. not well-formed */
- 	xmlFreeDoc(doctree);
- 	PG_RETURN_BOOL(true);
- }
- 
- 
  /* Encodes special characters (, , ,  and \r) as XML entities */
  
  PG_FUNCTION_INFO_V1(xml_encode_special_chars);
--- 70,75 
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 8625,8630  SELECT xmlexists('//town[text() = ''Toronto'']' PASSING BY REF 'townstownTor
--- 8625,8736 
   supports XPath, which is a subset of XQuery.
  /para
 /sect3
+ 
+sect3
+ titlexml_is_well_formed/title
+ 
+ indexterm
+  primaryxml_is_well_formed/primary
+  secondarywell formed/secondary
+ /indexterm
+ 
+ synopsis
+ functionxml_is_well_formed/function(replaceabletext/replaceable)
+ /synopsis
+ 
+ para
+  The function functionxml_is_well_formed/function evaluates whether
+  the replaceabletext/replaceable is well formed XML content, returning
+  a boolean. It is useful for predetermining whether a cast to the XML type
+  will succeed, and therefore honours the current value of the
+  literalXMLOPTION/literal session configuration parameter.
+ /para
+ para
+ Example:
+ screen![CDATA[
+ SET xmloption TO DOCUMENT;
+ SELECT xml_is_well_formed('');
+  xml_is_well_formed 
+ 
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('abc/');
+  xml_is_well_formed 
+ 
+  t
+ (1 row)
+ 
+ SET xmloption TO CONTENT;
+ SELECT xml_is_well_formed('abc');
+  xml_is_well_formed 
+ 
+  t
+ (1 row)
+ ]]/screen
+ /para
+ para
+ In addition to the structure checks, the function ensures that namespaces are correcty matched.
+ screen![CDATA[
+ SET xmloption TO DOCUMENT;
+ SELECT xml_is_well_formed('pg:foo 

Re: [HACKERS] review: xml_is_well_formed

2010-08-09 Thread Peter Eisentraut
On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
 To be honest I'm happiest with returning a boolean, even if there is 
 some confusion over content only being valid. Though changing the
 return 
 value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
 the 
 same results can be achieved by simply running:
 
 SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
 data::xml IS DOCUMENT; 

Note that this wouldn't necessarily work because it is not guaranteed
that the well-formedness test is executed before the cast to xml.  SQL
doesn't short-circuit left to right.  (A CASE expression could work.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-09 Thread Robert Haas
On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut pete...@gmx.net wrote:
 On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
 To be honest I'm happiest with returning a boolean, even if there is
 some confusion over content only being valid. Though changing the
 return
 value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
 the
 same results can be achieved by simply running:

 SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
 data::xml IS DOCUMENT;

 Note that this wouldn't necessarily work because it is not guaranteed
 that the well-formedness test is executed before the cast to xml.  SQL
 doesn't short-circuit left to right.  (A CASE expression could work.)

There's also the fact that it would probably end up parsing the data
twice.  Given xmloption, I'm inclined to think Tom has it right:
provided xml_is_well_formed() that follows xmloption, plus a specific
version for each of content and document.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-08 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

 Maybe there should be

 xml_is_well_formed()
 xml_is_well_formed_document()
 xml_is_well_formed_content()

 I agree that consistency with SQL/XML is desirable, but for someone
 coming from the outside, the unqualified claim that 'foo' is well-formed
 XML might sound suspicious.

I think I agree with the later discussion that xml_is_well_formed()
should tell you whether a cast to xml would succeed (and hence it has to
pay attention to XMLOPTION).  However, it seems to also make sense to
provide the other two functions suggested here, both to satify people
who know XML and to offer tests that will tell you whether
XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed.

Merging the three cases into one function doesn't seem like a win
for either compatibility or usability.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-08 Thread Robert Haas
On Sun, Aug 8, 2010 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

 Maybe there should be

 xml_is_well_formed()
 xml_is_well_formed_document()
 xml_is_well_formed_content()

 I agree that consistency with SQL/XML is desirable, but for someone
 coming from the outside, the unqualified claim that 'foo' is well-formed
 XML might sound suspicious.

 I think I agree with the later discussion that xml_is_well_formed()
 should tell you whether a cast to xml would succeed (and hence it has to
 pay attention to XMLOPTION).  However, it seems to also make sense to
 provide the other two functions suggested here, both to satify people
 who know XML and to offer tests that will tell you whether
 XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed.

 Merging the three cases into one function doesn't seem like a win
 for either compatibility or usability.

+1.  I didn't realize how funky the XMLOPTION stuff was at the start
of this discussion; I think your analysis here is spot-on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-07 Thread Mike Fowler

On 06/08/10 21:55, Peter Eisentraut wrote:

On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote:

Or perhaps it could return a string instead of a boolean: content,
document, or NULL if it's neither.



I like the sound of that. In fact this helps workaround the IS
DOCUMENT
and IS CONTENT limitations such that you can you can select only
content, only documents or both is you use IS NOT NULL.

Unless anyone sees a reason that this function needs to remain a
boolean function, I'll rework the patch over the weekend.


What is the actual use case for this function?  Is the above behavior
actually useful?


The idea is to be able to filter a table that contains XML in TEXT that 
might not be well formed. Knowing that you're only dealing with well 
formed XML prevents you blowing up when you attempt the cast.




One reason to stick with boolean is backward compatibility.



To be honest I'm happiest with returning a boolean, even if there is 
some confusion over content only being valid. Though changing the return 
value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the 
same results can be achieved by simply running:


SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml 
IS DOCUMENT;


Regards,
--
Mike Fowler
Registered Linux user: 379787

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Mike Fowler

On 03/08/10 16:15, Peter Eisentraut wrote:

On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:

Well-formedness should probably only allow XML documents.


I think the point of this function is to determine whether a cast to
xml will throw an error.  The behavior should probably match exactly
whatever test would be applied there.


Maybe there should be

xml_is_well_formed()
xml_is_well_formed_document()
xml_is_well_formed_content()

I agree that consistency with SQL/XML is desirable, but for someone
coming from the outside, the unqualified claim that 'foo' is well-formed
XML might sound suspicious.


What about making the function sensitive to the XML OPTION, such that:

test=# SET xmloption TO DOCUMENT;
SET
text=# SELECT xml_is_well_formed('foo');

  xml_is_well_formed
 
  f
 (1 row)

test=# SET xmloption TO CONTENT;
SET
text=# SELECT xml_is_well_formed('foo');

  xml_is_well_formed
 
  t
 (1 row)

with the inverse for DOCUMENTS? To me this makes the most sense as it makes the 
function behave much more like the other xml functions.

--
Mike Fowler
Registered Linux user: 379787


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Robert Haas
On Fri, Aug 6, 2010 at 4:28 AM, Mike Fowler m...@mlfowler.com wrote:
 On 03/08/10 16:15, Peter Eisentraut wrote:

 On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:

 Well-formedness should probably only allow XML documents.

 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

 Maybe there should be

 xml_is_well_formed()
 xml_is_well_formed_document()
 xml_is_well_formed_content()

 I agree that consistency with SQL/XML is desirable, but for someone
 coming from the outside, the unqualified claim that 'foo' is well-formed
 XML might sound suspicious.

 What about making the function sensitive to the XML OPTION, such that:

 test=# SET xmloption TO DOCUMENT;
 SET
 text=# SELECT xml_is_well_formed('foo');

  xml_is_well_formed
  
  f
  (1 row)

That will make using this function a huge hassle, won't it?  Functions
that do different things depending on GUC settings are usually
troublesome.  Having three functions would be more sensible if we need
all three behaviors, but I don't see why we do.

Or perhaps it could return a string instead of a boolean: content,
document, or NULL if it's neither.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Mike Fowler

On 06/08/10 12:31, Robert Haas wrote:


Maybe there should be

xml_is_well_formed()
xml_is_well_formed_document()
xml_is_well_formed_content()

I agree that consistency with SQL/XML is desirable, but for someone
coming from the outside, the unqualified claim that 'foo' is well-formed
XML might sound suspicious.
   

What about making the function sensitive to the XML OPTION, such that:

test=# SET xmloption TO DOCUMENT;
SET
text=# SELECT xml_is_well_formed('foo');

  xml_is_well_formed
  
  f
  (1 row)
 

That will make using this function a huge hassle, won't it?  Functions
that do different things depending on GUC settings are usually
troublesome.  Having three functions would be more sensible if we need
all three behaviors, but I don't see why we do.

Or perhaps it could return a string instead of a boolean: content,
document, or NULL if it's neither.
   


I like the sound of that. In fact this helps workaround the IS DOCUMENT 
and IS CONTENT limitations such that you can you can select only 
content, only documents or both is you use IS NOT NULL.


Unless anyone sees a reason that this function needs to remain a boolean 
function, I'll rework the patch over the weekend.


Regards,

--
Mike Fowler
Registered Linux user: 379787


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Peter Eisentraut
On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote:
  What about making the function sensitive to the XML OPTION, such
 that:
 
  test=# SET xmloption TO DOCUMENT;
  SET
  text=# SELECT xml_is_well_formed('foo');
 
   xml_is_well_formed
   
   f
   (1 row)
 
 That will make using this function a huge hassle, won't it?  Functions
 that do different things depending on GUC settings are usually
 troublesome.  Having three functions would be more sensible if we need
 all three behaviors, but I don't see why we do.

Upthread you opined that this function should essentially indicate
whether a cast to type xml would succeed, and observing the xmloption is
an essential part of that.  I had assumed the original patch actually
did that.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Peter Eisentraut
On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote:
  Or perhaps it could return a string instead of a boolean: content,
  document, or NULL if it's neither.
 
 
 I like the sound of that. In fact this helps workaround the IS
 DOCUMENT 
 and IS CONTENT limitations such that you can you can select only 
 content, only documents or both is you use IS NOT NULL.
 
 Unless anyone sees a reason that this function needs to remain a
 boolean function, I'll rework the patch over the weekend. 

What is the actual use case for this function?  Is the above behavior
actually useful?

One reason to stick with boolean is backward compatibility.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-06 Thread Robert Haas
On Fri, Aug 6, 2010 at 4:52 PM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote:
  What about making the function sensitive to the XML OPTION, such
 that:
 
  test=# SET xmloption TO DOCUMENT;
  SET
  text=# SELECT xml_is_well_formed('foo');
 
   xml_is_well_formed
   
   f
   (1 row)

 That will make using this function a huge hassle, won't it?  Functions
 that do different things depending on GUC settings are usually
 troublesome.  Having three functions would be more sensible if we need
 all three behaviors, but I don't see why we do.

 Upthread you opined that this function should essentially indicate
 whether a cast to type xml would succeed, and observing the xmloption is
 an essential part of that.  I had assumed the original patch actually
 did that.

Oh.

I didn't realize the casting behavior was already dependent on the GUC.  Yuck.

Maybe we should following the GUC after all, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-03 Thread Peter Eisentraut
On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
  Well-formedness should probably only allow XML documents.
 
 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

Maybe there should be

xml_is_well_formed()
xml_is_well_formed_document()
xml_is_well_formed_content()

I agree that consistency with SQL/XML is desirable, but for someone
coming from the outside, the unqualified claim that 'foo' is well-formed
XML might sound suspicious.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-03 Thread Pavel Stehule
Hello

2010/8/3 Peter Eisentraut pete...@gmx.net:
 On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
  Well-formedness should probably only allow XML documents.

 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

 Maybe there should be

 xml_is_well_formed()
 xml_is_well_formed_document()
 xml_is_well_formed_content()

 I agree that consistency with SQL/XML is desirable, but for someone
 coming from the outside, the unqualified claim that 'foo' is well-formed
 XML might sound suspicious.


yes, it is little bit curious - but it can be just documented. Now, I
don't think, so we need more functions.

Regards

Pavel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-02 Thread Pavel Stehule
2010/7/31 Robert Haas robertmh...@gmail.com:
 On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
  * xml_is_well_formed returns true for simple text
 
  postgres=# SELECT xml_is_well_formed('');
    xml_is_well_formed
  
    t
  (1 row)
 
  it is probably wrong result - is it ok??
 

 Yes this is OK, pure text is valid XML content.

 Are you speaking of XML content fragments that SQL/XML defines?

 Well-formedness should probably only allow XML documents.

 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

I agree with this idea - so I am able to do:

postgres=# select 'xxx'::xml;
 xml
-
 xxx
(1 row)

I have not any suggestions now - so I'll change flag to ready to commit

Regards

Pavel Stehule


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-02 Thread Pavel Stehule
2010/8/2 Pavel Stehule pavel.steh...@gmail.com:
 2010/7/31 Robert Haas robertmh...@gmail.com:
 On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
  * xml_is_well_formed returns true for simple text
 
  postgres=# SELECT xml_is_well_formed('');
    xml_is_well_formed
  
    t
  (1 row)
 
  it is probably wrong result - is it ok??
 

 Yes this is OK, pure text is valid XML content.

 Are you speaking of XML content fragments that SQL/XML defines?

 Well-formedness should probably only allow XML documents.

 I think the point of this function is to determine whether a cast to
 xml will throw an error.  The behavior should probably match exactly
 whatever test would be applied there.

 I agree with this idea - so I am able to do:

 postgres=# select 'xxx'::xml;
  xml
 -
  xxx
 (1 row)

 I have not any suggestions now - so I'll change flag to ready to commit

sorry - contrib module should be a fixed

patch attached



 Regards

 Pavel Stehule


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company


*** ./contrib/xml2/pgxml.sql.in.orig	2010-03-01 19:07:59.0 +0100
--- ./contrib/xml2/pgxml.sql.in	2010-08-02 08:40:27.885789791 +0200
***
*** 5,18 
  
  --SQL for XML parser
  
- CREATE OR REPLACE FUNCTION xml_is_well_formed(text) RETURNS bool
- AS 'MODULE_PATHNAME'
- LANGUAGE C STRICT IMMUTABLE;
- 
  -- deprecated old name for xml_is_well_formed
  CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool
! AS 'MODULE_PATHNAME', 'xml_is_well_formed'
! LANGUAGE C STRICT IMMUTABLE;
  
  CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text
  AS 'MODULE_PATHNAME'
--- 5,14 
  
  --SQL for XML parser
  
  -- deprecated old name for xml_is_well_formed
  CREATE OR REPLACE FUNCTION xml_valid(text) RETURNS bool
! AS 'xml_is_well_formed'
! LANGUAGE INTERNAL STRICT IMMUTABLE;
  
  CREATE OR REPLACE FUNCTION xml_encode_special_chars(text) RETURNS text
  AS 'MODULE_PATHNAME'
*** ./contrib/xml2/uninstall_pgxml.sql.orig	2007-11-13 05:24:29.0 +0100
--- ./contrib/xml2/uninstall_pgxml.sql	2010-08-02 08:38:40.134785878 +0200
***
*** 29,33 
  
  -- deprecated old name for xml_is_well_formed
  DROP FUNCTION xml_valid(text);
- 
- DROP FUNCTION xml_is_well_formed(text);
--- 29,31 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-08-02 Thread Mike Fowler

On 02/08/10 07:46, Pavel Stehule wrote:



I have not any suggestions now - so I'll change flag to ready to commit


sorry - contrib module should be a fixed

patch attached



Thanks Pavel, you saved me some time!

Regards,

--
Mike Fowler
Registered Linux user: 379787

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-07-31 Thread Peter Eisentraut
On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
  * xml_is_well_formed returns true for simple text
 
  postgres=# SELECT xml_is_well_formed('');
xml_is_well_formed
  
t
  (1 row)
 
  it is probably wrong result - is it ok??
 
 
 Yes this is OK, pure text is valid XML content.

Are you speaking of XML content fragments that SQL/XML defines?

Well-formedness should probably only allow XML documents.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-07-31 Thread Robert Haas
On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut pete...@gmx.net wrote:
 On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
  * xml_is_well_formed returns true for simple text
 
  postgres=# SELECT xml_is_well_formed('');
    xml_is_well_formed
  
    t
  (1 row)
 
  it is probably wrong result - is it ok??
 

 Yes this is OK, pure text is valid XML content.

 Are you speaking of XML content fragments that SQL/XML defines?

 Well-formedness should probably only allow XML documents.

I think the point of this function is to determine whether a cast to
xml will throw an error.  The behavior should probably match exactly
whatever test would be applied there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: xml_is_well_formed

2010-07-30 Thread Mike Fowler

Hi Pavel,

Thanks for taking the time to review my patch. Attached is a new version 
addressing your concerns.


On 29/07/10 14:21, Pavel Stehule wrote:

I have a few issues:
* broken regress test (fedora 13 - xmllint: using libxml version 20707)

postgres=# SELECT xml_is_well_formed('pg:foo
xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo');
  xml_is_well_formed

  f
(1 row)

this xml is broken - but in regress tests is ok

[pa...@pavel-stehule ~]$ xmllint xxx
xxx:1: parser error : error parsing attribute name
pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo
   


This is a problem that was observered recently by Thom Brown - the 
commit fest webapp adds the semicolon after the quote. If you look at 
the attachment I sent in a email client you'll find there is no 
semicolon. The patch attached here will also not have the semicolon.



* xml_is_well_formed returns true for simple text

postgres=# SELECT xml_is_well_formed('');
  xml_is_well_formed

  t
(1 row)

it is probably wrong result - is it ok??
   


Yes this is OK, pure text is valid XML content.


* I don't understand to this fragment

PG_TRY();
+   {
+   size_t  count;
+   xmlChar*version = NULL;
+   int standalone = -1;
+.
+   res_code = parse_xml_decl(string,count,version,
NULL,standalone);
+   if (res_code != 0)
+   xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT,
+ invalid XML
content: invalid XML declaration,
+   res_code);
+.
+   doc = xmlNewDoc(version);
+   doc-encoding = xmlStrdup((const xmlChar *) UTF-8);
+   doc-standalone = 1;

why? This function can raise exception when declaration is wrong. It
is wrong - I think, this function should returns false instead
exception.

   


You're quite right,  I should be returning false here and not causing an 
exception. I have corrected this in the attached patch.


Regards,

--
Mike Fowler
Registered Linux user: 379787

*** a/contrib/xml2/xpath.c
--- b/contrib/xml2/xpath.c
***
*** 27,33  PG_MODULE_MAGIC;
  
  /* externally accessible functions */
  
- Datum		xml_is_well_formed(PG_FUNCTION_ARGS);
  Datum		xml_encode_special_chars(PG_FUNCTION_ARGS);
  Datum		xpath_nodeset(PG_FUNCTION_ARGS);
  Datum		xpath_string(PG_FUNCTION_ARGS);
--- 27,32 
***
*** 70,97  pgxml_parser_init(void)
  	xmlLoadExtDtdDefaultValue = 1;
  }
  
- 
- /* Returns true if document is well-formed */
- 
- PG_FUNCTION_INFO_V1(xml_is_well_formed);
- 
- Datum
- xml_is_well_formed(PG_FUNCTION_ARGS)
- {
- 	text	   *t = PG_GETARG_TEXT_P(0);		/* document buffer */
- 	int32		docsize = VARSIZE(t) - VARHDRSZ;
- 	xmlDocPtr	doctree;
- 
- 	pgxml_parser_init();
- 
- 	doctree = xmlParseMemory((char *) VARDATA(t), docsize);
- 	if (doctree == NULL)
- 		PG_RETURN_BOOL(false);	/* i.e. not well-formed */
- 	xmlFreeDoc(doctree);
- 	PG_RETURN_BOOL(true);
- }
- 
- 
  /* Encodes special characters (, , ,  and \r) as XML entities */
  
  PG_FUNCTION_INFO_V1(xml_encode_special_chars);
--- 69,74 
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 8554,8562  SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
  ]]/screen
  /para
 /sect3
  
 sect3
! titleXML Predicates/title
  
  indexterm
   primaryIS DOCUMENT/primary
--- 8554,8566 
  ]]/screen
  /para
 /sect3
+   /sect2
+ 
+   sect2
+titleXML Predicates/title
  
 sect3
! titleIS DOCUMENT/title
  
  indexterm
   primaryIS DOCUMENT/primary
***
*** 8574,8579  SELECT xmlagg(x) FROM (SELECT * FROM test ORDER BY y DESC) AS tab;
--- 8578,8675 
   between documents and content fragments.
  /para
 /sect3
+ 
+sect3
+ titlexml_is_well_formed/title
+ 
+ indexterm
+  primaryxml_is_well_formed/primary
+  secondarywell formed/secondary
+ /indexterm
+ 
+ synopsis
+ functionxml_is_well_formed/function(replaceabletext/replaceable)
+ /synopsis
+ 
+ para
+  The function functionxml_is_well_formed/function evaluates whether
+  the replaceabletext/replaceable is well formed XML content, returning
+  a boolean.
+ /para
+ para
+ Example:
+ screen![CDATA[
+ SELECT xml_is_well_formed('foobar/foo');
+  xml_is_well_formed
+ 
+  t
+ (1 row)
+ 
+ SELECT xml_is_well_formed('foobar/foo');
+  xml_is_well_formed
+ 
+  f
+ (1 row)
+ 
+ SELECT xml_is_well_formed('foobarstuff/foo');
+  xml_is_well_formed
+ 
+  f
+ (1 row)
+ ]]/screen
+ /para
+ para
+ In addition to the structure checks, the function ensures that namespaces are correcty matched.
+ screen![CDATA[
+ SELECT xml_is_well_formed('pg:foo 

Re: [HACKERS] review: xml_is_well_formed

2010-07-30 Thread Pavel Stehule
Hello

2010/7/30 Mike Fowler m...@mlfowler.com:
 Hi Pavel,

 Thanks for taking the time to review my patch. Attached is a new version
 addressing your concerns.

 On 29/07/10 14:21, Pavel Stehule wrote:

 I have a few issues:
 * broken regress test (fedora 13 - xmllint: using libxml version 20707)

ok - main regress test is ok now, next I checked a contrib test for xml2

inside contrib/xml2 make installcheck, and there is a problem

  SET client_min_messages = warning;
  \set ECHO none
+ psql:pgxml.sql:10: ERROR:  could not find function
xml_is_well_formed in file /usr/local/pgsql.xwf/lib/pgxml.so
+ psql:pgxml.sql:15: ERROR:  could not find function
xml_is_well_formed in file /usr/local/pgsql.xwf/lib/pgxml.so
  RESET client_min_messages;
  select query_to_xml('select 1 as x',true,false,'');

you have to remove declaration from pgxml.sql.in and
uninstall_pgxml.sql, and other related files in contrib/xml2/ regress
test



 postgres=# SELECT xml_is_well_formed('pg:foo
 xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo');
  xml_is_well_formed
 
  f
 (1 row)

 this xml is broken - but in regress tests is ok

 [pa...@pavel-stehule ~]$ xmllint xxx
 xxx:1: parser error : error parsing attribute name
 pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo


 This is a problem that was observered recently by Thom Brown - the commit
 fest webapp adds the semicolon after the quote. If you look at the
 attachment I sent in a email client you'll find there is no semicolon. The
 patch attached here will also not have the semicolon.


ok - attached patch is correct, ... please, can you remove a broken patch?


 * xml_is_well_formed returns true for simple text

 postgres=# SELECT xml_is_well_formed('');
  xml_is_well_formed
 
  t
 (1 row)

 it is probably wrong result - is it ok??


 Yes this is OK, pure text is valid XML content.

It interesting for me - is somewhere some documentation about it?

My colleagues speak some else :)

http://zvon.org/comp/r/tut-XML.html#Pages~MinimalQ20XnumberQ20XofQ20Xelements
http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

I am not a specialist on XML - so just don't know


 * I don't understand to this fragment

        PG_TRY();
 +       {
 +               size_t          count;
 +               xmlChar    *version = NULL;
 +               int                     standalone = -1;
 +.
 +               res_code = parse_xml_decl(string,count,version,
 NULL,standalone);
 +               if (res_code != 0)
 +                       xml_ereport_by_code(ERROR,
 ERRCODE_INVALID_XML_CONTENT,
 +                                                 invalid XML
 content: invalid XML declaration,
 +                                                       res_code);
 +.
 +               doc = xmlNewDoc(version);
 +               doc-encoding = xmlStrdup((const xmlChar *) UTF-8);
 +               doc-standalone = 1;

 why? This function can raise exception when declaration is wrong. It
 is wrong - I think, this function should returns false instead
 exception.



 You're quite right,  I should be returning false here and not causing an
 exception. I have corrected this in the attached patch.


ok





 Regards,

 --
 Mike Fowler
 Registered Linux user: 379787



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] review: xml_is_well_formed

2010-07-29 Thread Pavel Stehule
Hello

I looked on patch
https://commitfest.postgresql.org/action/patch_view?id=334 .This patch
moves function xml_is_well_formed from contrib xm2 to core.

*  Is the patch in context diff format?
yes

* Does it apply cleanly to the current CVS HEAD?
yes

* Does it include reasonable tests, necessary doc patches, etc?
yes

*  Does the patch actually implement that?
yes

* Do we want that?
yes

* Do we already have it?
yes - simplified version in core

* Does it follow SQL spec, or the community-agreed behavior?
no - I didn't find any resources about conformance with SQL spec, but
it has same behave like original contrib function

* Does it include pg_dump support (if applicable)?
not related

* Are there dangers?
no

*Are there any assertion failures or crashes?

not found

I have a few issues:
* broken regress test (fedora 13 - xmllint: using libxml version 20707)

postgres=# SELECT xml_is_well_formed('pg:foo
xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo');
 xml_is_well_formed

 f
(1 row)

this xml is broken - but in regress tests is ok

[pa...@pavel-stehule ~]$ xmllint xxx
xxx:1: parser error : error parsing attribute name
pg:foo xmlns:pg=http://postgresql.org/stuff;;bar/pg:foo

* xml_is_well_formed returns true for simple text

postgres=# SELECT xml_is_well_formed('');
 xml_is_well_formed

 t
(1 row)

it is probably wrong result - is it ok??

* I don't understand to this fragment

   PG_TRY();
+   {
+   size_t  count;
+   xmlChar*version = NULL;
+   int standalone = -1;
+.
+   res_code = parse_xml_decl(string, count, version,
NULL, standalone);
+   if (res_code != 0)
+   xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT,
+ invalid XML
content: invalid XML declaration,
+   res_code);
+.
+   doc = xmlNewDoc(version);
+   doc-encoding = xmlStrdup((const xmlChar *) UTF-8);
+   doc-standalone = 1;

why? This function can raise exception when declaration is wrong. It
is wrong - I think, this function should returns false instead
exception.


Regards

Pavel Stehule

postgres=# select version();
   version

--

 PostgreSQL 9.1devel on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.4.4 20100630 (Red Hat 4.4.4-10),
 64-bit
(1 row)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers