Re: [xml] Exponential running time in RELAX NG matching of optional attributes

2019-08-23 Thread Daniel Veillard via xml
On Mon, Jul 22, 2019 at 11:59:42AM +0200, Jan Pokorný via xml wrote:
> On 27/06/19 14:15 +0200, Nikolai Weibull via xml wrote:
> > The following RELAX NG schema requires exponential running time when
> > matching attributes (15 attributes is where my computer begins to
> > show the symptoms, this may vary with your set-up):
> > 
> > a.rng:
> > 
> > 
> > http://relaxng.org/ns/structure/1.0;>
> >   
> > 
> >   a
> >   
> >  > ns="">a
> >  > ns="">b
> >  > ns="">c
> >  > ns="">d
> >  > ns="">e
> >  > ns="">f
> >  > ns="">g
> >  > ns="">h
> >  > ns="">i
> >  > ns="">j
> >  > ns="">k
> >  > ns="">l
> >  > ns="">m
> >  > ns="">n
> >  > ns="">o
> >   
> > 
> >   
> > 
> > 
> > a.xml:
> > 
> > 
> >  > l="12" m="13" n="14" o="15"/>
> > 
> > % time xmllint --noout --relaxng a.rng a.xml
> > real: 3.175, user: 3.147, system: 0.014 (99%)
> > 
> > Changing the group to an interleave, that is,
> > 
> > b.rng:
> > 
> > 
> > http://relaxng.org/ns/structure/1.0;>
> >   
> > 
> >   a
> >   
> >  > ns="">a
> >  > ns="">b
> >  > ns="">c
> >  > ns="">d
> >  > ns="">e
> >  > ns="">f
> >  > ns="">g
> >  > ns="">h
> >  > ns="">i
> >  > ns="">j
> >  > ns="">k
> >  > ns="">l
> >  > ns="">m
> >  > ns="">n
> >  > ns="">o
> >   
> > 
> >   
> > 
> > 
> > gives
> > 
> > % time xmllint --noout --relaxng b.rng a.xml
> > real: 0.008, user: 0.003, system: 0.002 (54%)
> > 
> > Making the attributes required instead of optional also fixes the issue:
> > 
> > c.rng:
> > 
> > 
> > http://relaxng.org/ns/structure/1.0;>
> >   
> > 
> >   a
> >   
> > a
> > b
> > c
> > d
> > e
> > f
> > g
> > h
> > i
> > j
> > k
> > l
> > m
> > n
> > o
> >   
> > 
> >   
> > 
> > 
> > % time xmllint --noout --relaxng c.rng a.xml
> > real: 0.008, user: 0.003, system: 0.002 (55%)
> > 
> > The problem seems to be that xmlRelaxNGValidateDefinition keeps
> > adding states that I feel don’t need checking, but I can’t figure
> > out how to avoid this from happening.

  Hum, I'm way behind  sorry

yes that sounds like there is a graph generation issue, best is to compile
the regexp support with things like DEBUG_REGEXP_GRAPH enabled and trace down
with pencil and paper what kind of graph it generates. That's how most of that 
code
ended up being debugged 

> 
> I can only nod to your analysis, ran into similar observations
> in the past, and they were likewise XML attributes (that parameters to
> fence device instances boil down to) related:

> https://bugzilla.redhat.com/show_bug.cgi?id=1224378#c7
> 
> Callgrind proved to be a useful tool to get better quantitative numbers.

  yup but that's way down

> > Daniel, do you have any input on this?  I’d gladly work on fixing
> > this, but I feel that I need some guidance as to what’s going wrong
> > and perhaps also how to fix it.
> 
> Would be nice to get libxml's RelaxNG validation support closer
> to parity with jing (the only other freely available validator I am
> aware of that can be directly used for a peer comparison), for sure.

  well jing use an orthogonal approach, derivating the document graph directly
while libxml2 uses validation structure derivation instead (when it can't 
compile
down to a regexp). In that case it should be regexp based.
  The two will performs massively differently on different problems.

Daniel

> ___
> xml mailing list, project page  http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml


-- 
Daniel Veillard  | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Exponential running time in RELAX NG matching of optional attributes

2019-07-23 Thread Michael Stahl

On 22.07.19 11:59, Jan Pokorný via xml wrote:

Would be nice to get libxml's RelaxNG validation support closer
to parity with jing (the only other freely available validator I am
aware of that can be directly used for a peer comparison), for sure.


there's also the old Sun Multi-Schema Validator, which we use in the ODF 
Toolkit validator; it's Java based of course but for exponential 
algorithms you can still compare i guess.


https://github.com/kohsuke/msv

___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Exponential running time in RELAX NG matching of optional attributes

2019-07-22 Thread Jan Pokorný via xml
On 27/06/19 14:15 +0200, Nikolai Weibull via xml wrote:
> The following RELAX NG schema requires exponential running time when
> matching attributes (15 attributes is where my computer begins to
> show the symptoms, this may vary with your set-up):
> 
> a.rng:
> 
> 
> http://relaxng.org/ns/structure/1.0;>
>   
> 
>   a
>   
>  ns="">a
>  ns="">b
>  ns="">c
>  ns="">d
>  ns="">e
>  ns="">f
>  ns="">g
>  ns="">h
>  ns="">i
>  ns="">j
>  ns="">k
>  ns="">l
>  ns="">m
>  ns="">n
>  ns="">o
>   
> 
>   
> 
> 
> a.xml:
> 
> 
>  m="13" n="14" o="15"/>
> 
> % time xmllint --noout --relaxng a.rng a.xml
> real: 3.175, user: 3.147, system: 0.014 (99%)
> 
> Changing the group to an interleave, that is,
> 
> b.rng:
> 
> 
> http://relaxng.org/ns/structure/1.0;>
>   
> 
>   a
>   
>  ns="">a
>  ns="">b
>  ns="">c
>  ns="">d
>  ns="">e
>  ns="">f
>  ns="">g
>  ns="">h
>  ns="">i
>  ns="">j
>  ns="">k
>  ns="">l
>  ns="">m
>  ns="">n
>  ns="">o
>   
> 
>   
> 
> 
> gives
> 
> % time xmllint --noout --relaxng b.rng a.xml
> real: 0.008, user: 0.003, system: 0.002 (54%)
> 
> Making the attributes required instead of optional also fixes the issue:
> 
> c.rng:
> 
> 
> http://relaxng.org/ns/structure/1.0;>
>   
> 
>   a
>   
> a
> b
> c
> d
> e
> f
> g
> h
> i
> j
> k
> l
> m
> n
> o
>   
> 
>   
> 
> 
> % time xmllint --noout --relaxng c.rng a.xml
> real: 0.008, user: 0.003, system: 0.002 (55%)
> 
> The problem seems to be that xmlRelaxNGValidateDefinition keeps
> adding states that I feel don’t need checking, but I can’t figure
> out how to avoid this from happening.

I can only nod to your analysis, ran into similar observations
in the past, and they were likewise XML attributes (that parameters to
fence device instances boil down to) related:

https://bugzilla.redhat.com/show_bug.cgi?id=1224378#c7

Callgrind proved to be a useful tool to get better quantitative numbers.

> Daniel, do you have any input on this?  I’d gladly work on fixing
> this, but I feel that I need some guidance as to what’s going wrong
> and perhaps also how to fix it.

Would be nice to get libxml's RelaxNG validation support closer
to parity with jing (the only other freely available validator I am
aware of that can be directly used for a peer comparison), for sure.

-- 
Jan (Poki)


pgp5HF3mWGh93.pgp
Description: PGP signature
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml