Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-08-11 Thread Peter Eisentraut
On fre, 2011-07-29 at 11:37 +0200, Florian Pflug wrote:
 On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
  On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
  On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
  Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
  would behave as in
  
  SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 
  'amp;'::XML AS text) AS a))
  
  With both 9.1 and 9.2 this query returns
  
  xmlelement 
  
  t a=amp;amp;/
  
  i.e. makes the value of a represent the *literal* string 'amp;', *not*
  the literal string ''. Just to be sure there's no miss-understanding here
  - is this what you expect?
  
  Well, I expect it to fail.
 
 Now you've lost me. What exactly should fail under what circumstances?

To me, the best solution still appears to be forbidding passing values
of type xml to XMLATTRIBUTES, unless we find an obviously better
solution that is not, I came up with this custom escape function that I
tweaked so that it appears to make sense.

  Unfortunately, in the latest SQL/XML standard the final
  answer it nested deep in the three other standards, so I don't have an
  answer right now.  But there are plenty of standards in this area, so
  I'd hope that one of them can give us the right behavior, instead of us
  making something up.
 
 Which standards to you have in mind there? If you can point me to a place
 where I can obtain them, I could check if there's something in them
 which helps.

In SQL/XML 2008, the actual behavior of XMLSERIALIZE is delegated to
XSLT 2.0 and XQuery 1.0 Serialization.  I'm not familiar with this
latter standard, but it appears to have lots of options and parameters,
one of which might help us.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-08-11 Thread Florian Pflug
On Aug11, 2011, at 09:16 , Peter Eisentraut wrote:
 On fre, 2011-07-29 at 11:37 +0200, Florian Pflug wrote:
 On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
 On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
 On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
 Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
 would behave as in
 
 SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 
 'amp;'::XML AS text) AS a))
 
 With both 9.1 and 9.2 this query returns
 
xmlelement 
 
 t a=amp;amp;/
 
 i.e. makes the value of a represent the *literal* string 'amp;', *not*
 the literal string ''. Just to be sure there's no miss-understanding here
 - is this what you expect?
 
 Well, I expect it to fail.
 
 Now you've lost me. What exactly should fail under what circumstances?
 
 To me, the best solution still appears to be forbidding passing values
 of type xml to XMLATTRIBUTES, unless we find an obviously better
 solution that is not, I came up with this custom escape function that I
 tweaked so that it appears to make sense.

Hm, OK, I see your point. However, if we simply raise an error in 9.2,
and do nothing else, that we make it impossible to use the result of
an XPath expression as an XML attribute value. Not just inconvenient,
but impossible, so I don't think we can do that. We'd thus need to add a
function 

  XMLUNESCAPE(XML) RETURNS TEXT

to restore that functionality. Defining a sane behaviour for such a function,
however, seems no easier than defining sane behaviour for an XML attribute
of already of type XML. The core of the problems remains to define the result
of XMLUNESCAPE('tagcontent/tag'), just as the core of the XMLATTRIBUTES
problems is to define XMLELEMENT(... XMLATTRIBUTES('tagcontent/tag' as a)).

Thinking about this further, it seems that we essentially have two distinct
classes of XML values. Some are essentially plain text, but might contains
entity references, while others are real XML fragments which contain at
least one tag. That suggests that a sensible behaviour for XMLUNESCAPE might
be to return a string with the entity references resolved in the former case,
and simply return an error in the latter.

To summarize, we'd have

  XMLUNESCAPE(''::XML) - 'a'
  XMLUNESCAPE('a'::XML) - 'a'
  XMLUNESCAPE('lt;'::XML) - ''
  XMLUNESCAPE('t/'::XML) - error

To not break applications needlessly, I'd then be inclined to make

  XMLATTRIBUTES(xml_value as a)

mean

  XMLATTRIBUTES(XMLUNESCAPE(xml_value) as a)

i.e. throw an error if xml_value contains anything but plain text and
entity references. But I could probably also live with not doing that.

 Unfortunately, in the latest SQL/XML standard the final
 answer it nested deep in the three other standards, so I don't have an
 answer right now.  But there are plenty of standards in this area, so
 I'd hope that one of them can give us the right behavior, instead of us
 making something up.
 
 Which standards to you have in mind there? If you can point me to a place
 where I can obtain them, I could check if there's something in them
 which helps.
 
 In SQL/XML 2008, the actual behavior of XMLSERIALIZE is delegated to
 XSLT 2.0 and XQuery 1.0 Serialization.  I'm not familiar with this
 latter standard, but it appears to have lots of options and parameters,
 one of which might help us.

I'll try to obtain a copy of that. Thanks.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-29 Thread Florian Pflug
On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
 On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
 On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
 Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
 would behave as in
 
 SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 'amp;'::XML 
 AS text) AS a))
 
 With both 9.1 and 9.2 this query returns
 
 xmlelement 
 
 t a=amp;amp;/
 
 i.e. makes the value of a represent the *literal* string 'amp;', *not*
 the literal string ''. Just to be sure there's no miss-understanding here
 - is this what you expect?
 
 Well, I expect it to fail.

Now you've lost me. What exactly should fail under what circumstances?

 What's the workaround if you have a value of type XML, e.g. 'amp;',
 and want to set an attribute to the value represented by that XML fragment
 (i.e. '')? Since we have no XMLUNESCAPE function, I don't see an easy
 way to do that. Maybe I'm missing something, though.
 
 It may be worth researching whether the XMLSERIALIZE function is
 actually doing what it should, or whether it could/should do some
 unescaping.

I don't see how that could work. It can't unescape anything in e.g.
'tamp;/t', because the result would be a quite useless not-really-XML
kind of thing. It could unescape 'amp;' but that kind of content-dependent
behaviour seem even worse than my proposed escaping rules for XMLATTRIBUTES.

 Unfortunately, in the latest SQL/XML standard the final
 answer it nested deep in the three other standards, so I don't have an
 answer right now.  But there are plenty of standards in this area, so
 I'd hope that one of them can give us the right behavior, instead of us
 making something up.

Which standards to you have in mind there? If you can point me to a place
where I can obtain them, I could check if there's something in them
which helps.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-28 Thread Peter Eisentraut
On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
 On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
  Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
  would behave as in
  
  SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 'amp;'::XML 
  AS text) AS a))
 
 With both 9.1 and 9.2 this query returns
 
  xmlelement 
 
  t a=amp;amp;/
 
 i.e. makes the value of a represent the *literal* string 'amp;', *not*
 the literal string ''. Just to be sure there's no miss-understanding here
 - is this what you expect?

Well, I expect it to fail.

 What's the workaround if you have a value of type XML, e.g. 'amp;',
 and want to set an attribute to the value represented by that XML fragment
 (i.e. '')? Since we have no XMLUNESCAPE function, I don't see an easy
 way to do that. Maybe I'm missing something, though.

It may be worth researching whether the XMLSERIALIZE function is
actually doing what it should, or whether it could/should do some
unescaping.  Unfortunately, in the latest SQL/XML standard the final
answer it nested deep in the three other standards, so I don't have an
answer right now.  But there are plenty of standards in this area, so
I'd hope that one of them can give us the right behavior, instead of us
making something up.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-27 Thread Peter Eisentraut
On tis, 2011-07-26 at 22:44 +0200, Florian Pflug wrote:
 While reviewing the (now applied) XPATH escaping patches, Radoslaw
 found one
 case where the previous failure of XPATH to escape its return value
 was offset
 by XMLATTRIBUTES insistence to escape all input values, even if
 they're
 already of type XML.
 
 To wit, if you do
 
   SELECT XMLELEMENT(NAME t, XMLATTRIBUTES('amp;'::XML AS a))
 
 you get
 
  xmlelement 
 
  t a=amp;amp;/ 

Per SQL standard, the attribute values may not be of type XML, so maybe
we should just prohibit it.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-27 Thread Florian Pflug
On Jul27, 2011, at 16:18 , Peter Eisentraut wrote:
 On tis, 2011-07-26 at 22:44 +0200, Florian Pflug wrote:
 While reviewing the (now applied) XPATH escaping patches, Radoslaw
 found one
 case where the previous failure of XPATH to escape its return value
 was offset
 by XMLATTRIBUTES insistence to escape all input values, even if
 they're
 already of type XML.
 
 To wit, if you do
 
  SELECT XMLELEMENT(NAME t, XMLATTRIBUTES('amp;'::XML AS a))
 
 you get
 
 xmlelement 
 
 t a=amp;amp;/ 
 
 Per SQL standard, the attribute values may not be of type XML, so maybe
 we should just prohibit it.

We probably should have, but I think it's too late for that. I don't
believe I'm the only one who uses XPATH results as attribute values,
and we'd severely break that use-case.

You might say the same thing about my proposal, of course, but I believe
the risk is much smaller there. Applications would only break if they
  (a) Pass XML from a source other than a XPath expression selecting
  a text or attribute and
  (b) actually want double-escaping to occur.

As a data point, I've written an application with makes heavy use of
our XML infrastructure over the last few months (as you might have guessed
from the stream of patches ;-)). That application would be pretty much
untroubled by the changes to XMLATTRIBUTES I proposed, but would be
severely broken if we rejected values of type XML all together.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-27 Thread Peter Eisentraut
On ons, 2011-07-27 at 19:37 +0200, Florian Pflug wrote:
  Per SQL standard, the attribute values may not be of type XML, so
 maybe
  we should just prohibit it.
 
 We probably should have, but I think it's too late for that. I don't
 believe I'm the only one who uses XPATH results as attribute values,
 and we'd severely break that use-case.
 
 You might say the same thing about my proposal, of course, but I
 believe
 the risk is much smaller there. Applications would only break if they
   (a) Pass XML from a source other than a XPath expression selecting
   a text or attribute and
   (b) actually want double-escaping to occur.

Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
would behave as in

SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 'amp;'::XML AS 
text) AS a))

which is what it indeed does in 9.1.

So if we don't want to restrict this, for backward compatibility, then I
would suggest that we fix it to work like it used to.

I would be very hesitant about adding another escape mechanism that
escapes some things but not others.  We already have two or three of
those for XML, and it doesn't seem worth adding another one just for
this, which is outside the standard and for which a valid workaround
exists.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-27 Thread Florian Pflug
On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
 Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
 would behave as in
 
 SELECT XMLELEMENT(NAME t, XMLATTRIBUTES(XMLSERIALIZE(content 'amp;'::XML 
 AS text) AS a))

With both 9.1 and 9.2 this query returns

 xmlelement 

 t a=amp;amp;/

i.e. makes the value of a represent the *literal* string 'amp;', *not*
the literal string ''. Just to be sure there's no miss-understanding here
- is this what you expect?

 which is what it indeed does in 9.1.
 
 So if we don't want to restrict this, for backward compatibility, then I
 would suggest that we fix it to work like it used to.

There's currently no difference in behaviour between 9.1 and 9.2 there.
We've only modified XPATH to always correctly escape it's result in 9.2,
so there's only a difference if you pass the result of XPATH() to
XMLATTRIBUTES. Which I figured to be the most likely reason for to pass
values of type XML to XMLATTRIBUTES, but maybe you disagree there.

 I would be very hesitant about adding another escape mechanism that
 escapes some things but not others.  We already have two or three of
 those for XML, and it doesn't seem worth adding another one just for
 this, which is outside the standard and for which a valid workaround
 exists.

What's the workaround if you have a value of type XML, e.g. 'amp;',
and want to set an attribute to the value represented by that XML fragment
(i.e. '')? Since we have no XMLUNESCAPE function, I don't see an easy
way to do that. Maybe I'm missing something, though.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] XMLATTRIBUTES vs. values of type XML

2011-07-26 Thread Florian Pflug
Hi

While reviewing the (now applied) XPATH escaping patches, Radoslaw found one
case where the previous failure of XPATH to escape its return value was offset
by XMLATTRIBUTES insistence to escape all input values, even if they're
already of type XML.

To wit, if you do

  SELECT XMLELEMENT(NAME t, XMLATTRIBUTES('amp;'::XML AS a))

you get

 xmlelement 

 t a=amp;amp;/

which is somewhat surprising. Especially since

  SELECT XMLELEMENT(NAME t, 'amp;'::XML);

gives

  xmlelement  
--
 tamp;/t

as one would except. Now, it seems rather unlikely that a real application
would actually contain a query similar to the former one - you usually don 't
store XML in the attributes of XML nodes. But since XPATH() returns XML, it's
not unlikely that an application would do

  SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...))

As it stands, on 9.2 the values returned by the XPath expression will be
escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1
the value will be escaped only once.

Since this seems to be the most likely situation in which XMLATTRIBUTES would
receive an input argument of type XML, and since it's not entirely logical for
XMLATTRIBUTES to unconditionally escapes values already of type XML, I propose
to change the behaviour of XMLATTRIBUTES as follows.

Values not of type XML are be treated as they always have been, i.e. all
special characters (') are replaced by an entity reference (lt;,
gt;, amp;, quot;, apos;).

Values of type XML are assumed to be already escaped, so '' isn't treated as
a special character to avoid double escaping. The other special characters
(,,,') are escaped as usual.

The safety of this relies on the fact that '' may never appear in a
well-formed XML document, except as part of an entity reference. This seems to
be the case, except if CDATA sections are involved, which may contain plain
ampersands. So the actual escaping rule would have to be a bit more complex
than I made it sound above - we'd need to detect CDATA sections, and
re-enabled the escaping of '' until the end of such a section.

To actually implement this, I'd remove the use of libxml from the
implementation of XMLELEMENT, and instead use our already existing
escape_xml() function, enhanced with the ability to handle the partial
escaping algorithm outlined above. We already only use libxml to escape
attribute values, so doing that isn't a radical departure from xml.c's ways.
As an added benefit, all the encoding-related problems of XMLELEMENT and
XMLATTRIBUTES would go away once libxml is removed from this code path. So
XPATH() would be the only remaining function which breaks in non-UTF-8
databases.

Comments? Thoughts? Suggestions?

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers