Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-11-09 Thread Pavel Stehule
Hi

2017-11-06 14:00 GMT+01:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Thank you for the new patch.
>
> - The latest patch is missing xpath_parser.h at least since
>   ns-3. That of the first (not-numbered) version was still
>   usable.
>
> - c29c578 conflicts on doc/src/sgml/func.sgml
>
>
> At Sun, 15 Oct 2017 12:06:11 +0200, Pavel Stehule 
> wrote in  b9efon...@mail.gmail.com>
> > 2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp>:
> >
> > > Hi, thanks for the new patch.
> > >
> > > # The patch is missing xpath_parser.h. That of the first patch was
> usable.
> > >
> > > At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule <
> pavel.steh...@gmail.com>
> > > wrote in  > > mail.gmail.com>
> > > > Hi
> > > >
> > > > now xpath and xpath_exists supports default namespace too
> > >
> > > At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule <
> pavel.steh...@gmail.com>
> > > wrote in  > > gmail.com>
> > > > > 1. Uniformity among simliar features
> > > > >
> > > > >   As mentioned in the proposal, but it is lack of uniformity that
> > > > >   the xpath transformer is applied only to xmltable and not for
> > > > >   other xpath related functions.
> > > > >
> > > >
> > > > I have to fix the XPath function. The SQL/XML function Xmlexists
> doesn't
> > > > support namespaces/
> > >
> > > Sorry, I forgot to care about that. (And the definition of
> > > namespace array is of course fabricated by me). I'd like to leave
> > > this to committers.  Anyway it is working but the syntax (or
> > > whether it is acceptable) is still arguable.
> > >
> > > SELECT xpath('/a/text()', 'http://example.com";>
> > > test',
> > >  ARRAY[ARRAY['', 'http://example.com']]);
> > > |  xpath
> > > | 
> > > |  {test}
> > > | (1 row)
> > >
> > >
> > > The internal name is properly rejected, but the current internal
> > > name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> > > are preserving some short names and reject them as
> > > user-defined. Doesn't just 'pgsqlxml' work?
> > >
> > >
> > > Default namespace correctly become to be applied on bare
> > > attribute names.
> > >
> > > > updated doc,
> > > > fixed all variants of expected result test file
> > >
> > > Sorry for one by one comment but I found another misbehavior.
> > >
> > > create table t1 (id int, doc xml);
> > > insert into t1
> > >values
> > >(5, 'http://x.y";>50 > > rows>');
> > > select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> > > '/x:rows/x:row' passing t1.doc columns data int PATH
> > > 'child::x:a[1][attribute::hoge="haha"]') as x;
> > > |  data
> > > | --
> > > |50
> > >
> > > but the following fails.
> > >
> > > select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > > '/rows/row' passing t1.doc columns data int PATH
> > > 'child::a[1][attribute::hoge="haha"]') as x;
> > > |  data
> > > | --
> > > |
> > > | (1 row)
> > >
> > > Perhaps child::a is not prefixed by the transformation.
> > >
> >
> > the problem was in unwanted attribute modification. The parser didn't
> > detect "attribute::hoge" as attribute. Updated parser does it. I reduce
> > duplicated code there more.
>
> It worked as expected. But the comparison of "attribute" is
> missing t1.length = 9 so the following expression wrongly passes.
>
> child::a[1][attributeabcdefg::hoge="haha"
>
> It is confusing that is_qual_name becomes true when t2 is not a
> "qual name", and the way it treats a double-colon is hard to
> understand.
>
> It essentially does inserting the default namespace before
> unqualified non-attribute name. I believe we can easily
> look-ahead to detect a double colon and it would make things
> simpler. Could you consider something like the attached patch?
> (applies on top of ns-4 patch.)
>
> > > XPath might be complex enough so that it's worth switching to
> > > yacc/lex based transformer that is formally verifiable and won't
> > > need a bunch of cryptic tests that finally cannot prove the
> > > completeness. synchronous_standy_names is far simpler than XPath
> > > but using yacc/lex parser.
> > >
> > >
> > > Anyway the following is nitpicking of the current xpath_parser.c.
> > >
> > > - NODENAME_FIRSTCHAR allows '-' as the first char but it is
> > >   excluded from NameStartChar (https://www.w3.org/TR/REC-
> > > xml/#NT-NameStartChar)
> > >   I think characters with high-bit set is okay.
> > >   Also IS_NODENAME_CHAR should be changed.
> > >
> >
> > fixed
> >
> >
> > > - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
> > >   but have different naming schemes. Can these are named in the same
> way?
> > >
> >
> > fixed
> >
> >
> > > - The current transoformer seems to using up to one token stack
> > >   depth. Maybe the stack is needless. (pushed token is always
> > >   popped just after)
> > >
> >
> > fixed
>
> Thank you.
>
> I found another (and should be the last, so sorry..) functional
> defect in this. This doesn't add default namespace if the ta

Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-11-06 Thread Kyotaro HORIGUCHI
Thank you for the new patch.

- The latest patch is missing xpath_parser.h at least since
  ns-3. That of the first (not-numbered) version was still
  usable.

- c29c578 conflicts on doc/src/sgml/func.sgml


At Sun, 15 Oct 2017 12:06:11 +0200, Pavel Stehule  
wrote in 
> 2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp>:
> 
> > Hi, thanks for the new patch.
> >
> > # The patch is missing xpath_parser.h. That of the first patch was usable.
> >
> > At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule 
> > wrote in  > mail.gmail.com>
> > > Hi
> > >
> > > now xpath and xpath_exists supports default namespace too
> >
> > At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule 
> > wrote in  > gmail.com>
> > > > 1. Uniformity among simliar features
> > > >
> > > >   As mentioned in the proposal, but it is lack of uniformity that
> > > >   the xpath transformer is applied only to xmltable and not for
> > > >   other xpath related functions.
> > > >
> > >
> > > I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> > > support namespaces/
> >
> > Sorry, I forgot to care about that. (And the definition of
> > namespace array is of course fabricated by me). I'd like to leave
> > this to committers.  Anyway it is working but the syntax (or
> > whether it is acceptable) is still arguable.
> >
> > SELECT xpath('/a/text()', 'http://example.com";>
> > test',
> >  ARRAY[ARRAY['', 'http://example.com']]);
> > |  xpath
> > | 
> > |  {test}
> > | (1 row)
> >
> >
> > The internal name is properly rejected, but the current internal
> > name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> > are preserving some short names and reject them as
> > user-defined. Doesn't just 'pgsqlxml' work?
> >
> >
> > Default namespace correctly become to be applied on bare
> > attribute names.
> >
> > > updated doc,
> > > fixed all variants of expected result test file
> >
> > Sorry for one by one comment but I found another misbehavior.
> >
> > create table t1 (id int, doc xml);
> > insert into t1
> >values
> >(5, 'http://x.y";>50 > rows>');
> > select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> > '/x:rows/x:row' passing t1.doc columns data int PATH
> > 'child::x:a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | --
> > |50
> >
> > but the following fails.
> >
> > select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > '/rows/row' passing t1.doc columns data int PATH
> > 'child::a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | --
> > |
> > | (1 row)
> >
> > Perhaps child::a is not prefixed by the transformation.
> >
> 
> the problem was in unwanted attribute modification. The parser didn't
> detect "attribute::hoge" as attribute. Updated parser does it. I reduce
> duplicated code there more.

It worked as expected. But the comparison of "attribute" is
missing t1.length = 9 so the following expression wrongly passes.

child::a[1][attributeabcdefg::hoge="haha"

It is confusing that is_qual_name becomes true when t2 is not a
"qual name", and the way it treats a double-colon is hard to
understand.

It essentially does inserting the default namespace before
unqualified non-attribute name. I believe we can easily
look-ahead to detect a double colon and it would make things
simpler. Could you consider something like the attached patch?
(applies on top of ns-4 patch.)

> > XPath might be complex enough so that it's worth switching to
> > yacc/lex based transformer that is formally verifiable and won't
> > need a bunch of cryptic tests that finally cannot prove the
> > completeness. synchronous_standy_names is far simpler than XPath
> > but using yacc/lex parser.
> >
> >
> > Anyway the following is nitpicking of the current xpath_parser.c.
> >
> > - NODENAME_FIRSTCHAR allows '-' as the first char but it is
> >   excluded from NameStartChar (https://www.w3.org/TR/REC-
> > xml/#NT-NameStartChar)
> >   I think characters with high-bit set is okay.
> >   Also IS_NODENAME_CHAR should be changed.
> >
> 
> fixed
> 
> 
> > - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
> >   but have different naming schemes. Can these are named in the same way?
> >
> 
> fixed
> 
> 
> > - The current transoformer seems to using up to one token stack
> >   depth. Maybe the stack is needless. (pushed token is always
> >   popped just after)
> >
> 
> fixed

Thank you.

I found another (and should be the last, so sorry..) functional
defect in this. This doesn't add default namespace if the tag
name in a predicate is 'and' or 'or'. It needs to be fixed, or
wrote in the documentation as a restriction. (seem hard to fix
it..)

create table t1 (id int, doc xml);
insert into t1 values (1, 'http://x.y";>5060');
select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
'/x:rows/x:row' passing t1.doc columns data int PATH
'x:val[../x:and = 60]') as x;
 data
--
   50
(1 row)
select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http

Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-10-15 Thread Pavel Stehule
2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hi, thanks for the new patch.
>
> # The patch is missing xpath_parser.h. That of the first patch was usable.
>
> At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule 
> wrote in  mail.gmail.com>
> > Hi
> >
> > now xpath and xpath_exists supports default namespace too
>
> At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule 
> wrote in  gmail.com>
> > > 1. Uniformity among simliar features
> > >
> > >   As mentioned in the proposal, but it is lack of uniformity that
> > >   the xpath transformer is applied only to xmltable and not for
> > >   other xpath related functions.
> > >
> >
> > I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> > support namespaces/
>
> Sorry, I forgot to care about that. (And the definition of
> namespace array is of course fabricated by me). I'd like to leave
> this to committers.  Anyway it is working but the syntax (or
> whether it is acceptable) is still arguable.
>
> SELECT xpath('/a/text()', 'http://example.com";>
> test',
>  ARRAY[ARRAY['', 'http://example.com']]);
> |  xpath
> | 
> |  {test}
> | (1 row)
>
>
> The internal name is properly rejected, but the current internal
> name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> are preserving some short names and reject them as
> user-defined. Doesn't just 'pgsqlxml' work?
>
>
> Default namespace correctly become to be applied on bare
> attribute names.
>
> > updated doc,
> > fixed all variants of expected result test file
>
> Sorry for one by one comment but I found another misbehavior.
>
> create table t1 (id int, doc xml);
> insert into t1
>values
>(5, 'http://x.y";>50 rows>');
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> '/x:rows/x:row' passing t1.doc columns data int PATH
> 'child::x:a[1][attribute::hoge="haha"]') as x;
> |  data
> | --
> |50
>
> but the following fails.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH
> 'child::a[1][attribute::hoge="haha"]') as x;
> |  data
> | --
> |
> | (1 row)
>
> Perhaps child::a is not prefixed by the transformation.
>

the problem was in unwanted attribute modification. The parser didn't
detect "attribute::hoge" as attribute. Updated parser does it. I reduce
duplicated code there more.


> XPath might be complex enough so that it's worth switching to
> yacc/lex based transformer that is formally verifiable and won't
> need a bunch of cryptic tests that finally cannot prove the
> completeness. synchronous_standy_names is far simpler than XPath
> but using yacc/lex parser.
>
>
> Anyway the following is nitpicking of the current xpath_parser.c.
>
> - NODENAME_FIRSTCHAR allows '-' as the first char but it is
>   excluded from NameStartChar (https://www.w3.org/TR/REC-
> xml/#NT-NameStartChar)
>   I think characters with high-bit set is okay.
>   Also IS_NODENAME_CHAR should be changed.
>

fixed


> - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
>   but have different naming schemes. Can these are named in the same way?
>

fixed


> - The current transoformer seems to using up to one token stack
>   depth. Maybe the stack is needless. (pushed token is always
>   popped just after)
>

fixed

Regards

Pavel

>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index b52407822d..af72a07326 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10468,7 +10468,8 @@ SELECT xml_is_well_formed_document('http://postgresql.org/stuf
  second the namespace URI. It is not required that aliases provided in
  this array be the same as those being used in the XML document itself (in
  other words, both in the XML document and in the xpath
- function context, aliases are local).
+ function context, aliases are local). Default namespace has
+ empty name (empty string) and should be only one.
 
 
 
@@ -10487,8 +10488,8 @@ SELECT xpath('/my:a/text()', 'http://example.com";>test',
 
  To deal with default (anonymous) namespaces, do something like this:
 

Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-10-02 Thread Kyotaro HORIGUCHI
At Mon, 2 Oct 2017 12:43:19 +0200, Pavel Stehule  
wrote in 
> > Sorry, I forgot to care about that. (And the definition of
> > namespace array is of course fabricated by me). I'd like to leave
> > this to committers.  Anyway it is working but the syntax (or
> > whether it is acceptable) is still arguable.
> >
> > SELECT xpath('/a/text()', 'http://example.com";>
> > test',
> >  ARRAY[ARRAY['', 'http://example.com']]);
> > |  xpath
> > | 
> > |  {test}
> > | (1 row)
> >
> >
> > The internal name is properly rejected, but the current internal
> > name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> > are preserving some short names and reject them as
> > user-defined. Doesn't just 'pgsqlxml' work?
> >
> 
> LibXML2 does trim to 100 bytes length names. So
> pgdefnamespace.pgsqlxml.internal
> is safe from this perspective.
> 
> I would to decraese a risk of possible collision, so longer string is
> better. Maybe "pgsqlxml.internal" is good enoug - I have not a idea. But if
> somewhere will be this string printed, then
> "pgdefnamespace.pgsqlxml.internal" has clean semantic, and it is reason,
> why I prefer this string. PostgreSQL uses 63 bytes names - and this string
> is correct too.

Ok, I'm fine with that.

> > Default namespace correctly become to be applied on bare
> > attribute names.
> >
> > > updated doc,
> > > fixed all variants of expected result test file
> >
> > Sorry for one by one comment but I found another misbehavior.
> >
> > create table t1 (id int, doc xml);
> > insert into t1
> >values
> >(5, 'http://x.y";>50 > rows>');
> > select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> > '/x:rows/x:row' passing t1.doc columns data int PATH
> > 'child::x:a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | --
> > |50
> >
> > but the following fails.
> >
> > select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > '/rows/row' passing t1.doc columns data int PATH
> > 'child::a[1][attribute::hoge="haha"]') as x;
> > |  data
> > | --
> > |
> > | (1 row)
> >
> > Perhaps child::a is not prefixed by the transformation.
> >
> > XPath might be complex enough so that it's worth switching to
> > yacc/lex based transformer that is formally verifiable and won't
> > need a bunch of cryptic tests that finally cannot prove the
> > completeness. synchronous_standy_names is far simpler than XPath
> > but using yacc/lex parser.
> >
> 
> I don't think (not yet) - it is simple state machine now, and when the code
> will be stable, then will not be modified.

Hmm. Ok, agreed. I didn't mean the current shape ought to be
changed.

> Thank you for comments, I'll look on it


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-10-02 Thread Pavel Stehule
2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hi, thanks for the new patch.
>
> # The patch is missing xpath_parser.h. That of the first patch was usable.
>
> At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule 
> wrote in  mail.gmail.com>
> > Hi
> >
> > now xpath and xpath_exists supports default namespace too
>
> At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule 
> wrote in  gmail.com>
> > > 1. Uniformity among simliar features
> > >
> > >   As mentioned in the proposal, but it is lack of uniformity that
> > >   the xpath transformer is applied only to xmltable and not for
> > >   other xpath related functions.
> > >
> >
> > I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> > support namespaces/
>
> Sorry, I forgot to care about that. (And the definition of
> namespace array is of course fabricated by me). I'd like to leave
> this to committers.  Anyway it is working but the syntax (or
> whether it is acceptable) is still arguable.
>
> SELECT xpath('/a/text()', 'http://example.com";>
> test',
>  ARRAY[ARRAY['', 'http://example.com']]);
> |  xpath
> | 
> |  {test}
> | (1 row)
>
>
> The internal name is properly rejected, but the current internal
> name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> are preserving some short names and reject them as
> user-defined. Doesn't just 'pgsqlxml' work?
>

LibXML2 does trim to 100 bytes length names. So
pgdefnamespace.pgsqlxml.internal
is safe from this perspective.

I would to decraese a risk of possible collision, so longer string is
better. Maybe "pgsqlxml.internal" is good enoug - I have not a idea. But if
somewhere will be this string printed, then
"pgdefnamespace.pgsqlxml.internal" has clean semantic, and it is reason,
why I prefer this string. PostgreSQL uses 63 bytes names - and this string
is correct too.


>
> Default namespace correctly become to be applied on bare
> attribute names.
>
> > updated doc,
> > fixed all variants of expected result test file
>
> Sorry for one by one comment but I found another misbehavior.
>
> create table t1 (id int, doc xml);
> insert into t1
>values
>(5, 'http://x.y";>50 rows>');
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> '/x:rows/x:row' passing t1.doc columns data int PATH
> 'child::x:a[1][attribute::hoge="haha"]') as x;
> |  data
> | --
> |50
>
> but the following fails.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH
> 'child::a[1][attribute::hoge="haha"]') as x;
> |  data
> | --
> |
> | (1 row)
>
> Perhaps child::a is not prefixed by the transformation.
>
> XPath might be complex enough so that it's worth switching to
> yacc/lex based transformer that is formally verifiable and won't
> need a bunch of cryptic tests that finally cannot prove the
> completeness. synchronous_standy_names is far simpler than XPath
> but using yacc/lex parser.
>

I don't think (not yet) - it is simple state machine now, and when the code
will be stable, then will not be modified.

Thank you for comments, I'll look on it

Regards

Pavel


>
> Anyway the following is nitpicking of the current xpath_parser.c.
>
> - NODENAME_FIRSTCHAR allows '-' as the first char but it is
>   excluded from NameStartChar (https://www.w3.org/TR/REC-
> xml/#NT-NameStartChar)
>   I think characters with high-bit set is okay.
>   Also IS_NODENAME_CHAR should be changed.
>
> - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
>   but have different naming schemes. Can these are named in the same way?
>
> - The current transoformer seems to using up to one token stack
>   depth. Maybe the stack is needless. (pushed token is always
>   popped just after)
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-10-02 Thread Kyotaro HORIGUCHI
Hi, thanks for the new patch.

# The patch is missing xpath_parser.h. That of the first patch was usable.

At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule  
wrote in 
> Hi
> 
> now xpath and xpath_exists supports default namespace too

At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule  
wrote in 
> > 1. Uniformity among simliar features
> >
> >   As mentioned in the proposal, but it is lack of uniformity that
> >   the xpath transformer is applied only to xmltable and not for
> >   other xpath related functions.
> >
> 
> I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
> support namespaces/

Sorry, I forgot to care about that. (And the definition of
namespace array is of course fabricated by me). I'd like to leave
this to committers.  Anyway it is working but the syntax (or
whether it is acceptable) is still arguable.

SELECT xpath('/a/text()', 'http://example.com";>test',
 ARRAY[ARRAY['', 'http://example.com']]);
|  xpath  
| 
|  {test}
| (1 row)


The internal name is properly rejected, but the current internal
name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
are preserving some short names and reject them as
user-defined. Doesn't just 'pgsqlxml' work?


Default namespace correctly become to be applied on bare
attribute names.

> updated doc,
> fixed all variants of expected result test file

Sorry for one by one comment but I found another misbehavior.

create table t1 (id int, doc xml);
insert into t1 
   values
   (5, 'http://x.y";>50');
select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x), '/x:rows/x:row' 
passing t1.doc columns data int PATH 'child::x:a[1][attribute::hoge="haha"]') 
as x;
|  data 
| --
|50

but the following fails.

select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'), '/rows/row' 
passing t1.doc columns data int PATH 'child::a[1][attribute::hoge="haha"]') as 
x;
|  data 
| --
|  
| (1 row)

Perhaps child::a is not prefixed by the transformation.

XPath might be complex enough so that it's worth switching to
yacc/lex based transformer that is formally verifiable and won't
need a bunch of cryptic tests that finally cannot prove the
completeness. synchronous_standy_names is far simpler than XPath
but using yacc/lex parser.


Anyway the following is nitpicking of the current xpath_parser.c.

- NODENAME_FIRSTCHAR allows '-' as the first char but it is
  excluded from NameStartChar (https://www.w3.org/TR/REC-xml/#NT-NameStartChar)
  I think characters with high-bit set is okay.
  Also IS_NODENAME_CHAR should be changed.

- NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
  but have different naming schemes. Can these are named in the same way?

- The current transoformer seems to using up to one token stack
  depth. Maybe the stack is needless. (pushed token is always
  popped just after)

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-09-27 Thread Pavel Stehule
Hi

now xpath and xpath_exists supports default namespace too

updated doc,
fixed all variants of expected result test file

Regards

Pavel
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2f036015cc..610f709933 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10477,7 +10477,8 @@ SELECT xml_is_well_formed_document('http://postgresql.org/stuf
  second the namespace URI. It is not required that aliases provided in
  this array be the same as those being used in the XML document itself (in
  other words, both in the XML document and in the xpath
- function context, aliases are local).
+ function context, aliases are local). Default namespace has
+ empty name (empty string) and should be only one.
 
 
 
@@ -10496,8 +10497,8 @@ SELECT xpath('/my:a/text()', 'http://example.com";>test',
 
  To deal with default (anonymous) namespaces, do something like this:
 

Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-09-27 Thread Pavel Stehule
Hi

2017-09-25 13:25 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hello, this patch have been ignored for a long time since its proposal...
>
> At Sat, 11 Mar 2017 20:44:31 +0100, Pavel Stehule 
> wrote in  com>
> > Hi
> >
> > This proposal is followup of implementation of XMLTABLE.
> >
> > Lot of XML documents has assigned document namespace.
> >
> > http://x.y";>10
> >
> > For these XML document any search path must use schema "http://x.y";.
> This
> > is not too intuitive, and from XMLTABLE usage is not too user friendly,
> > because the default column path (same like column name) cannot be used. A
> > solution of this issue is default namespace - defined in SQL/XML.
> >
> > example - related to previous xml
> >
> > without default namespace:
> > XMLTABLE(NAMESPACES('http://x.y' AS aux),
> > '/aux:rows/aux:row' PASSING ...
> > COLUMNS a int PATH 'aux:a')
> >
> > with default namespace
> > XMLTABLE(NAMESPACES(DEFAULT 'http://x.y'),
> > '/rows/row' PASSING ...
> > COLUMNS a int);
> >
> >
> > Unfortunately the libxml2 doesn't support default namespaces in XPath
> > expressions. Because the libxml2 functionality is frozen, there is not
> big
> > chance for support in near future. A implementation is not too hard -
> > although it requires simple XPath expressions state translator.
> >
> > The databases with XMLTABLE implementation supports default namespace for
> > XPath expressions.
> >
> > The patch for initial implementation is attached.
>
> The original message is a bit less informative for those who
> wants to review this but are not accustomed (like me) to this
> area. I try to augment this with a bit more information. (Perhaps)
>
> An example of this issue can be as follows.
>
> create table t1 (id int, doc xml);
> insert into t1
>   values
>   (1, 'http://x.y";>10'),
>   (2, 'http://x.y";>20'),
>   (3, 'http://x.y";>30'),
>   (4, 'http://x.y";>40');
> select x.* from t1, xmltable('/rows/row' passing t1.doc columns data int
> PATH 'a') as x;
> |  data
> | --
> | (0 rows)
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as n),
> '/n:rows/n:row' passing t1.doc columns data int PATH 'n:a') as x;
> |  data
> | --
> |10
> |20
> |30
> |40
> | (4 rows)
>
> But, currently the follwing command fails with error.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH 'a') as x;
> | ERROR:  DEFAULT namespace is not supported
>
> This patch let PostgreSQL allow this syntax by transforming xpath
> string when DEFAULT namespace is defined.
>

> ===
> I have some review comments.
>
> This patch still applies with shifts and works as expected.
>
> 1. Uniformity among simliar features
>
>   As mentioned in the proposal, but it is lack of uniformity that
>   the xpath transformer is applied only to xmltable and not for
>   other xpath related functions.
>

I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
support namespaces/

>
>
> 2. XML comformance
>
>I'm not yet sure this works for the all extent of the
>available syntax but at least fails for the following
>expression.
>
> (delete from t1;)
> insert into t1
>values
>(5, 'http://x.y";>50 rows>');
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH 'a[1][@hoge]') as x;
> >  data
> > --
> >
> > (1 row)
>
>
>   The following expression works.
>
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as x),
> '/x:rows/x:row' passing t1.doc columns data int PATH 'x:a[1][@hoge]') as x;
> >  data
> > --
> >50
> > (1 row)
>
>   The w3c says as follows.
>
>   https://www.w3.org/TR/xml-names/#defaulting
>   > The namespace name for an unprefixed attribute name always has no
> value.
>
>   We currently don't have a means to make sure that this works
>   correctly for the whole extent. More regression test helps?
>

I fixed this issue and I used your examples as regression tests

>
>
> 3. The predefined prefix for default namespace
>
>   The patch defines the name of the defaut namespace as
>   "pgdefnamespace". If a default namespace is defined, a
>   namespace is made with the name and with_default_ns is true. If
>   a namespace with the name is defined, a namespace is made also
>   with the same name but with_default_ns is false. This causes a
>   confused behavior.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.x', '
> http://x.y' as pgdefnamespace), '/rows/row' passing t1.doc columns data
> int PATH 'a') as x;
> |  data
> | --
> |10
> |20
> |30
> |40
> | (4 rows)
>
>  The reason for the design is the fact that xmlXPathRegisterNs
>  doesn't accept NULL or empty string as a namespace prefix and it
>  only accepts a string consists of valid XML caharacters.
>
>  Even if we are to live with such res

Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

2017-09-25 Thread Kyotaro HORIGUCHI
Hello, this patch have been ignored for a long time since its proposal...

At Sat, 11 Mar 2017 20:44:31 +0100, Pavel Stehule  
wrote in 
> Hi
> 
> This proposal is followup of implementation of XMLTABLE.
> 
> Lot of XML documents has assigned document namespace.
> 
> http://x.y";>10
> 
> For these XML document any search path must use schema "http://x.y";. This
> is not too intuitive, and from XMLTABLE usage is not too user friendly,
> because the default column path (same like column name) cannot be used. A
> solution of this issue is default namespace - defined in SQL/XML.
> 
> example - related to previous xml
> 
> without default namespace:
> XMLTABLE(NAMESPACES('http://x.y' AS aux),
> '/aux:rows/aux:row' PASSING ...
> COLUMNS a int PATH 'aux:a')
> 
> with default namespace
> XMLTABLE(NAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' PASSING ...
> COLUMNS a int);
> 
> 
> Unfortunately the libxml2 doesn't support default namespaces in XPath
> expressions. Because the libxml2 functionality is frozen, there is not big
> chance for support in near future. A implementation is not too hard -
> although it requires simple XPath expressions state translator.
> 
> The databases with XMLTABLE implementation supports default namespace for
> XPath expressions.
> 
> The patch for initial implementation is attached.

The original message is a bit less informative for those who
wants to review this but are not accustomed (like me) to this
area. I try to augment this with a bit more information. (Perhaps)

An example of this issue can be as follows.

create table t1 (id int, doc xml);
insert into t1 
  values
  (1, 'http://x.y";>10'),
  (2, 'http://x.y";>20'),
  (3, 'http://x.y";>30'),
  (4, 'http://x.y";>40');
select x.* from t1, xmltable('/rows/row' passing t1.doc columns data int PATH 
'a') as x;
|  data 
| --
| (0 rows)
select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as n), '/n:rows/n:row' 
passing t1.doc columns data int PATH 'n:a') as x;
|  data 
| --
|10
|20
|30
|40
| (4 rows)

But, currently the follwing command fails with error.

select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'), '/rows/row' 
passing t1.doc columns data int PATH 'a') as x;
| ERROR:  DEFAULT namespace is not supported

This patch let PostgreSQL allow this syntax by transforming xpath
string when DEFAULT namespace is defined.

===
I have some review comments.

This patch still applies with shifts and works as expected.

1. Uniformity among simliar features

  As mentioned in the proposal, but it is lack of uniformity that
  the xpath transformer is applied only to xmltable and not for
  other xpath related functions.


2. XML comformance

   I'm not yet sure this works for the all extent of the
   available syntax but at least fails for the following
   expression.

(delete from t1;)
insert into t1 
   values
   (5, 'http://x.y";>50');

select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'), '/rows/row' 
passing t1.doc columns data int PATH 'a[1][@hoge]') as x;
>  data 
> --
>  
> (1 row)


  The following expression works.

select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as x), '/x:rows/x:row' 
passing t1.doc columns data int PATH 'x:a[1][@hoge]') as x;
>  data 
> --
>50
> (1 row)

  The w3c says as follows.

  https://www.w3.org/TR/xml-names/#defaulting
  > The namespace name for an unprefixed attribute name always has no value.  

  We currently don't have a means to make sure that this works
  correctly for the whole extent. More regression test helps?
  

3. The predefined prefix for default namespace

  The patch defines the name of the defaut namespace as
  "pgdefnamespace". If a default namespace is defined, a
  namespace is made with the name and with_default_ns is true. If
  a namespace with the name is defined, a namespace is made also
  with the same name but with_default_ns is false. This causes a
  confused behavior.

select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.x', 'http://x.y' 
as pgdefnamespace), '/rows/row' passing t1.doc columns data int PATH 'a') as x;
|  data 
| --
|10
|20
|30
|40
| (4 rows)

 The reason for the design is the fact that xmlXPathRegisterNs
 doesn't accept NULL or empty string as a namespace prefix and it
 only accepts a string consists of valid XML caharacters.

 Even if we are to live with such restriction and such name is
 hardly used, a namespace prefix with the name should be
 rejected.


4. A mistake in the documentaion ?

 The documentaion says about the XMLNAMESPACES clause as the
 folows.

https://www.postgresql.org/docs/10/static/functions-xml.html
> The optional XMLNAMESPACES clause is a comma-separated list of
> namespaces. It specifies the XML namespaces used in the document
> and their aliases.

As far as looking into XmlTableSetNamespace, (and if I read the
documentation correctly) th