Re: [xml] Problem with data in interleave in RELAX NG validation

2018-11-22 Thread Daniel Veillard via xml
On Sun, Oct 14, 2018 at 09:02:29PM +0200, Nikolai Weibull via xml wrote:
> Hi!
> 
> OK, I managed to decode it somewhat.  The issue seems to be that we build
> groups of what can be matched by the interleave, but that these groups don’t
> include data, list, and value elements, only element and text elements.

  This may be an oversight in mapping the RNG data model onto the
libxml2 internal structures that reflect it. Another mismatch of this kind
is TEXT vs. CDATA: the two are separate node types in the libxml2 model
but unified as simple text nodes in the RelaxNG one.
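
A minimal sketch of that TEXT vs. CDATA mismatch, for illustration only. The
node_kind enum and rng_child_count() are hypothetical stand-ins, not libxml2
API. It counts children the way the RELAX NG data model would, assuming each
run of adjacent text and CDATA nodes collapses into a single string:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for libxml2 node types (not the real enum). */
typedef enum { NODE_ELEMENT, NODE_TEXT, NODE_CDATA } node_kind;

/* libxml2 keeps text and CDATA apart; RELAX NG sees both as string data. */
static int is_text_like(node_kind k)
{
    return k == NODE_TEXT || k == NODE_CDATA;
}

/* Count children as RELAX NG would: every element is one item and every
 * maximal run of adjacent text/CDATA nodes is one string. */
static size_t rng_child_count(const node_kind *kids, size_t n)
{
    size_t count = 0;
    int in_text = 0;

    assert(kids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_text_like(kids[i])) {
            if (!in_text)
                count++;        /* a new string run starts here */
            in_text = 1;
        } else {
            count++;            /* an element child */
            in_text = 0;
        }
    }
    return count;
}
```

With the children TEXT, CDATA, ELEMENT, TEXT this counts three items: one
string, one element, one string.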

> This patch extends xmlRelaxNGGetElements so that it can return these
> elements for us in xmlRelaxNGComputeInterleaves.  Then we make sure to
> update xmlRelaxNGNodeMatchesList as well so that it accepts the correct
> types.

  That sounds reasonable.

> The testsuite passes and my test below does as well.
> 
> I’m a bit surprised that interleaves simply wouldn’t allow for data, list,
> and value elements previously, so I’m wondering if there was a reason for
> the code to be the way it was and that the fix should be placed somewhere
> else or if it was simply an oversight.  Either way, this does seem to be the

  I honestly can't remember!

> correct solution. If someone could confirm that this solution is what we’re
> looking for, I’ll add some proper test cases and apply another merge request
> on git.gnome.org.

  This indeed seems to be working. Would you mind sending patches that add
regression tests for this? That way I can incorporate them into an upcoming
release.

   Thanks a lot!

Daniel


Re: [xml] Problem with data in interleave in RELAX NG validation

2018-11-22 Thread Daniel Veillard via xml
On Sat, Oct 13, 2018 at 12:23:09AM +0200, Nikolai Weibull via xml wrote:
> Hi!
> 
> This remains unfixed.  I have absolutely no idea what’s going on in the
> interleave validation code.  Daniel, could you please put together some
> minor documentation on how the interleave validation code works?  It’s very
> complicated.
> 
> Thank you,
> 
>  Nikolai

  Hi Nikolai,

I must admit I don't think I have really touched that code for 10 years, so
it's a bit hard :-) ... Some of the points behind the RNG implementation in
libxml2 are:

  1/ I didn't use the derivation method suggested by James Clark. The problem
     is that if you want to validate an extremely large document, you
     potentially end up building extremely large in-memory derivation data.
  2/ Where possible, I convert the generated RNG data structure to a normal
     regexp and then use that to validate the subtree. This reuses existing
     code and goes way faster thanks to regexp compilation, but it's
     certainly confusing when trying to understand how it all works!
  3/ Based on the above, I built streaming RNG validation for people who want
     to do RNG validation without a full in-memory tree.

So yes, deciphering that code, both the compilation side and the runtime
side, is... hard. Sorry about that :-)
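
To make point 1/ a little more concrete, here is a minimal sketch of
derivative-based matching over a flat sequence of element names. It is not
libxml2 code, and every name in it (pattern, mk, deriv, matches, the fixed
pool) is hypothetical. It also skips the simplification and memoization a
real derivative validator would do, so each input token allocates fresh
pattern nodes, which hints at the memory concern above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { P_EMPTY, P_NOT_ALLOWED, P_TOKEN,
               P_CHOICE, P_GROUP, P_INTERLEAVE } pkind;

typedef struct pattern {
    pkind kind;
    const char *name;              /* used by P_TOKEN only */
    const struct pattern *a, *b;   /* used by the binary operators only */
} pattern;

/* Fixed arena: every derivative step takes fresh nodes from here. */
#define POOL_MAX 4096
static pattern pool[POOL_MAX];
static size_t pool_used;

static const pattern *mk(pkind kind, const char *name,
                         const pattern *a, const pattern *b)
{
    assert(pool_used < POOL_MAX);
    pattern *p = &pool[pool_used++];
    p->kind = kind; p->name = name; p->a = a; p->b = b;
    return p;
}

/* Can the pattern match the empty sequence? */
static int nullable(const pattern *p)
{
    switch (p->kind) {
    case P_EMPTY:      return 1;
    case P_CHOICE:     return nullable(p->a) || nullable(p->b);
    case P_GROUP:
    case P_INTERLEAVE: return nullable(p->a) && nullable(p->b);
    default:           return 0;
    }
}

/* The pattern that remains after consuming one token. */
static const pattern *deriv(const pattern *p, const char *tok)
{
    switch (p->kind) {
    case P_TOKEN:
        return mk(strcmp(p->name, tok) == 0 ? P_EMPTY : P_NOT_ALLOWED,
                  NULL, NULL, NULL);
    case P_CHOICE:
        return mk(P_CHOICE, NULL, deriv(p->a, tok), deriv(p->b, tok));
    case P_GROUP: {
        const pattern *x = mk(P_GROUP, NULL, deriv(p->a, tok), p->b);
        return nullable(p->a)
            ? mk(P_CHOICE, NULL, x, deriv(p->b, tok)) : x;
    }
    case P_INTERLEAVE:             /* the token may come from either side */
        return mk(P_CHOICE, NULL,
                  mk(P_INTERLEAVE, NULL, deriv(p->a, tok), p->b),
                  mk(P_INTERLEAVE, NULL, p->a, deriv(p->b, tok)));
    default:                       /* P_EMPTY and P_NOT_ALLOWED */
        return mk(P_NOT_ALLOWED, NULL, NULL, NULL);
    }
}

static int matches(const pattern *p, const char *const *toks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p = deriv(p, toks[i]);
    return nullable(p);
}
```

An interleave of tokens "a" and "b" built with mk() accepts both orders and
rejects a repeated "a", while pool_used keeps growing with every token
consumed.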

There is also a known issue with validating interleaves of interleaves. I have
a mess there; it has been in the back of my mind for a decade, but
surprisingly few people seem to hit it (or they just use Jing!).

Daniel


-- 
Daniel Veillard  | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Problem with data in interleave in RELAX NG validation

2018-10-15 Thread Ross Reedstrom via xml
Nikolai - 
Glad to see someone attacking these. I've got some RNG schemas that we've
had to validate with Jing, since libxml2 was giving issues similar to
what you're seeing, and I was even more daunted by the code than you
seem to be. If you could push this branch somewhere I can pull it from,
I'd love to see if your fixes help my schemas.

Ross



Re: [xml] Problem with data in interleave in RELAX NG validation

2018-10-14 Thread Nikolai Weibull via xml

Hi!

OK, I managed to decode it somewhat.  The issue seems to be that 
we build groups of what can be matched by the interleave, but that 
these groups don’t include data, list, and value elements, only 
element and text elements.  This patch extends 
xmlRelaxNGGetElements so that it can return these elements for us 
in xmlRelaxNGComputeInterleaves.  Then we make sure to 
update xmlRelaxNGNodeMatchesList as well so that it accepts the
correct types.
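
The idea can be sketched outside libxml2 as follows. This is a simplified
model, not the real code: node_kind, def_kind, group and find_group() are
hypothetical names, element names are not compared, and a single `patched`
flag stands in for both halves of the change (collecting data, list and value
definitions into the groups, and letting text nodes match them):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the libxml2 types. */
typedef enum { NODE_ELEMENT, NODE_TEXT, NODE_CDATA } node_kind;
typedef enum { DEF_ELEMENT, DEF_TEXT, DEF_DATATYPE,
               DEF_LIST, DEF_VALUE } def_kind;

/* One interleave branch and the definitions its content can start with. */
typedef struct {
    const def_kind *defs;
    size_t ndefs;
} group;

/* Does this definition accept character data?  With patched == 0 only
 * <text/> does, which models the behaviour before the patch. */
static int def_accepts_text(def_kind d, int patched)
{
    if (d == DEF_TEXT)
        return 1;
    return patched &&
           (d == DEF_DATATYPE || d == DEF_LIST || d == DEF_VALUE);
}

/* Index of the first group that can take the node, or -1 if none can. */
static int find_group(node_kind node, const group *groups, size_t ngroups,
                      int patched)
{
    assert(groups != NULL || ngroups == 0);
    for (size_t g = 0; g < ngroups; g++) {
        for (size_t i = 0; i < groups[g].ndefs; i++) {
            def_kind d = groups[g].defs[i];

            if (node == NODE_ELEMENT && d == DEF_ELEMENT)
                return (int) g;
            if ((node == NODE_TEXT || node == NODE_CDATA) &&
                def_accepts_text(d, patched))
                return (int) g;
        }
    }
    return -1;
}
```

For an interleave with one element branch and one data branch, a text node
finds no group when patched is 0, which corresponds to the "extra content:
text" error, and lands in the data branch when patched is 1.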


The testsuite passes and my test below does as well.

I’m a bit surprised that interleaves simply wouldn’t allow for 
data, list, and value elements previously, so I’m wondering if 
there was a reason for the code to be the way it was and that the 
fix should be placed somewhere else or if it was simply an 
oversight.  Either way, this does seem to be the correct solution. 
If someone could confirm that this solution is what we’re looking 
for, I’ll add some proper test cases and apply another merge 
request on git.gnome.org.


Best regards,
 Nikolai

---
 relaxng.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/relaxng.c b/relaxng.c
index 8306e546..3ed03ff4 100644
--- a/relaxng.c
+++ b/relaxng.c
@@ -3993,7 +3993,7 @@ xmlRelaxNGGenerateAttributes(xmlRelaxNGParserCtxtPtr ctxt,
  * xmlRelaxNGGetElements:
  * @ctxt:  a Relax-NG parser context
  * @def:  the definition definition
- * @eora:  gather elements (0) or attributes (1)
+ * @eora:  gather elements (0), attributes (1) or elements and text (2)
  *
  * Compute the list of top elements a definition can generate
  *
@@ -4019,7 +4019,12 @@ xmlRelaxNGGetElements(xmlRelaxNGParserCtxtPtr ctxt,
     while (cur != NULL) {
         if (((eora == 0) && ((cur->type == XML_RELAXNG_ELEMENT) ||
                              (cur->type == XML_RELAXNG_TEXT))) ||
-            ((eora == 1) && (cur->type == XML_RELAXNG_ATTRIBUTE))) {
+            ((eora == 1) && (cur->type == XML_RELAXNG_ATTRIBUTE)) ||
+            ((eora == 2) && ((cur->type == XML_RELAXNG_DATATYPE) ||
+                             (cur->type == XML_RELAXNG_ELEMENT) ||
+                             (cur->type == XML_RELAXNG_LIST) ||
+                             (cur->type == XML_RELAXNG_TEXT) ||
+                             (cur->type == XML_RELAXNG_VALUE)))) {
             if (ret == NULL) {
                 max = 10;
                 ret = (xmlRelaxNGDefinePtr *)
@@ -4374,7 +4379,7 @@ xmlRelaxNGComputeInterleaves(void *payload, void *data,
         if (cur->type == XML_RELAXNG_TEXT)
             is_mixed++;
         groups[nbgroups]->rule = cur;
-        groups[nbgroups]->defs = xmlRelaxNGGetElements(ctxt, cur, 0);
+        groups[nbgroups]->defs = xmlRelaxNGGetElements(ctxt, cur, 2);
         groups[nbgroups]->attrs = xmlRelaxNGGetElements(ctxt, cur, 1);
         nbgroups++;
         cur = cur->next;
@@ -9280,7 +9285,10 @@ xmlRelaxNGNodeMatchesList(xmlNodePtr node, xmlRelaxNGDefinePtr * list)
                 return (1);
         } else if (((node->type == XML_TEXT_NODE) ||
                     (node->type == XML_CDATA_SECTION_NODE)) &&
-                   (cur->type == XML_RELAXNG_TEXT)) {
+                   ((cur->type == XML_RELAXNG_DATATYPE) ||
+                    (cur->type == XML_RELAXNG_LIST) ||
+                    (cur->type == XML_RELAXNG_TEXT) ||
+                    (cur->type == XML_RELAXNG_VALUE))) {
             return (1);
         }
         cur = list[i++];
-- 
2.19.1




Re: [xml] Problem with data in interleave in RELAX NG validation

2018-10-12 Thread Nikolai Weibull via xml

Hi!

This remains unfixed.  I have absolutely no idea what’s going on 
in the interleave validation code.  Daniel, could you please put 
together some minor documentation on how the interleave validation 
code works?  It’s very complicated.


Thank you,

 Nikolai



[xml] Problem with data in interleave in RELAX NG validation

2018-09-09 Thread Nikolai Weibull via xml

Hi!

Given the following input RELAX NG grammar:

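[RELAX NG grammar not preserved: the archive stripped the XML markup. The
surviving attribute values are "http://relaxng.org/ns/structure/1.0" and
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes".]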


and the following input document a.xml:

c

xmllint reports:

a.xml:1: element a: Relax-NG validity error : Element a has extra content: text
a.xml fails to validate

Changing the interleave to a group solves the issue, so the 
problem is with how interleaves are validated.


I looked at xmlRelaxNGValidateInterleave() and I sadly have no 
idea what’s going on.  Please point me in the right direction and 
I’ll gladly write a patch.


 Nikolai