Re: [xml] xmllint reports non-determinist content model for schema

Johan Corveleyn Thu, 23 Aug 2012 06:48:30 -0700

On Thu, Aug 23, 2012 at 3:03 PM, Daniel Veillard <veill...@redhat.com> wrote:
> On Wed, Aug 22, 2012 at 10:40:20PM +0200, Johan Corveleyn wrote:
>> On Fri, Aug 17, 2012 at 10:32 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
>> > On Fri, Feb 24, 2012 at 2:14 PM, Johan Corveleyn <jcor...@gmail.com> wrote:
>> > ...
>> >> I'm running into a "non-determinist" error with a schema even though I
>> >> don't think it's non-determinist (and both XSV and Xerces agree with
>> >> me; they too have no problem with the schema). I have reduced my
>> >> test-case to a simple example, see below.
>> >
>> > On Tue, Feb 28, 2012 at 6:00 AM, Daniel Veillard <veill...@redhat.com> 
>> > wrote:
>> > ...
>> >>  Well in that case it seems the underlying code is generating a
>> >> wrong automata so it should be a matter of running that minimal
>> >> test case under gdb with the breakpoint appropriately set to find out
>> >> what wrong transition got added ...
>> >
>> > I'd like to take a stab at this issue. I've been able to get the
>> > latest 2.9.0 snapshot (from
>> > ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz) to build in
>> > Visual Studio 2010. I can still reproduce the issue.
>> >
>> > But I need some guidance. What part of the code should I be looking
>> > at? Where do I have to set the "breakpoint appropriately to find out
>> > the wrong transition that got added"?
>> >
>> > Alternatively, if you or anyone else on this list can fix the issue
>> > already, that would be even better :-).
>> >
>> > To summarize, the issue is this:
>> >> Given the following schema:
>> >>
>> >>         <?xml version="1.0" encoding="UTF-8"?>
>> >>         <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema";>
>> >>                 <xs:element name="rules">
>> >>                         <xs:complexType>
>> >>                                 <xs:sequence>
>> >>                                         <xs:element name="rule" 
>> >> minOccurs="0" maxOccurs="unbounded"/>
>> >>                                         <xs:sequence minOccurs="0" 
>> >> maxOccurs="1">
>> >>                                                 <xs:element 
>> >> name="specialRule" minOccurs="1" maxOccurs="1"/>
>> >>                                                 <xs:element name="rule" 
>> >> minOccurs="0" maxOccurs="unbounded"/>
>> >>                                         </xs:sequence>
>> >>                                 </xs:sequence>
>> >>                         </xs:complexType>
>> >>                 </xs:element>
>> >>         </xs:schema>
>> >>
>> >> xmllint complains with "Schemas parser error : local
>> >> complex type: The content model is not determinist."
>> >> But it isn't.
>> >
>> > See also https://bugzilla.gnome.org/show_bug.cgi?id=670865.
>>
>> Hi,
>>
>> I could really use some guidance here. I've been able to set up
>> everything in Visual Studio 2010, and reproduced the issue from within
>> the debugger. So I can set a breakpoint etc. But I mostly lack some
>> basic knowledge about how this all works.
>>
>> Is there some (preferable concise) reading about basic concepts like
>> how xml schemas automata work, how these are built, etc? What should
>> the automata in this case look like (in terms of the data structures
>> inside the code)? How can I see that the automata is non-determinist,
>> and what would a determinist automata look like, ... ?
>
>     Johan,
>
>   sorry for the slow answer,
>
> I will look at this, promise, but all that code is rooted in automata
> theory, there is quite some litterature about it, and well unless you
> go though the process of learning that code and possibly the theory
> behind it, it's not something that can be hacked trivially. Basically
> I run this in debug mode, look at the automata generated, there is pro
> bably an error there and then this need fixing and a lot of checking.
> To be honnest that one of the hardest part of the library to change
> with the core of the parser.
>   Theory is usually learnt though the 'Dragon Book', by Aho and Al.
> and the code is actually more complex since using extended automata
> operations not found in basic automata ...
>
>     If you have time looking at this may be interesting, but
> it's really not trivial at least more complex than other parts of the
> code :-)


Thanks for your answer, and for pointing out that it's not so simple
:-). It seems that this is a bit over my head then (I have little clue
about automata theory, so it would probably take me weeks or months to
understand what's going on ...). I hope you can find some time to dig
in further in the not-too-distant future :-). I'm always willing to
help out, test, ... if that would be useful (though my usefulness
might be limited because of my non-understanding of things ...).

Thanks for all your efforts,
-- 
Johan
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml

Re: [xml] xmllint reports non-determinist content model for schema

Reply via email to