On Fri, 3 Apr 2020 16:05:04 +0200
Martin Vidner <mvid...@suse.cz> wrote:

> On Mon, Mar 30, 2020 at 10:07:53PM +0200, josef Reidinger wrote:
> > Hi,
> > I am currently working on research how to improve XML parser in
> > YaST. What we have nowadays is libxml2 based c++ parser ( that
> > almost noone use directly ) and XML module ( module as a code, not
> > YaST module :). I check usage of XML module and main usage is data
> > to XML and back ( with variant xml as string or xml as file ).
> > There is just two additional functionality. One is checking xml
> > error ( almost noone use it ) and setting metadata for generated
> > xml ( bad API as it should be part of that data to XML method ).  
> 
> Most importantly, we have the initial concepts wrong.
> 
> This is not about a "YaST parser" for "XML". What YaST  parses and
> writes is a specific subset of XML, let's call it YaST-XML:
> 
> 1) It has a 1 to 1 correspondence* to YCP/Ruby  data types (maps, lists,
> booleans, symbols, integers, strings)
> 
> 2) It uses a namespace, xmlns="http://www.suse.com/1.0/yast2ns"

Can we set the namespace if it is not defined in the XML? What puzzles me most
about the format (not the parser) is not that the XML is hard to read, but that
it is very hard and error-prone to write or modify. A mandatory namespace is
unnecessary from my POV.
Can we simply assume this namespace if none is defined?
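
To make the question concrete, here is a rough sketch of what I mean. It is
plain Python with only the standard library, not YaST code, and the helper name
`qualify` is made up for illustration. It puts every element that has no
namespace into the YaST one, so the rest of the code sees the same tree either
way.

```python
import xml.etree.ElementTree as ET

YAST_NS = "http://www.suse.com/1.0/yast2ns"

def qualify(element, default_ns=YAST_NS):
    """Put every un-namespaced element of the tree into default_ns, in place."""
    for node in element.iter():
        # ElementTree spells namespaced tags as "{uri}name".
        if isinstance(node.tag, str) and not node.tag.startswith("{"):
            node.tag = "{%s}%s" % (default_ns, node.tag)
    return element

with_ns = ET.fromstring('<profile xmlns="%s"><a>1</a></profile>' % YAST_NS)
without_ns = qualify(ET.fromstring("<profile><a>1</a></profile>"))

# Both documents now carry identical tags.
print(with_ns.tag == without_ns.tag)
print(with_ns[0].tag == without_ns[0].tag)
```

A profile that already declares the namespace is left untouched, so this would
stay backward compatible.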

> 
> 3) It uses config:type attributes for (1)
> where xmlns:config="http://www.suse.com/1.0/configns" is a different
> namespace (WTF).

Same here. Do we need a separate namespace in our own subset of XML for just
one attribute?

> 
> 4) Arrays are tagged "listitem" in the generic case but we have a
> long list of specific tags for specific arrays.

Yes, I have already run into it, and it is defined programmatically (so you
need to modify code if you add a new list). It is also used only for writing.
Reading does not care about the name.
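
Roughly, the asymmetry looks like the sketch below. It is Python with the
standard library only, not the real implementation. The `ITEM_TAGS` table and
the list names in it are hypothetical examples, not the actual list kept in the
YaST code.

```python
import xml.etree.ElementTree as ET

CONFIG_NS = "http://www.suse.com/1.0/configns"
ET.register_namespace("config", CONFIG_NS)

# Hypothetical excerpt of the table: list name -> tag used for its items.
ITEM_TAGS = {"packages": "package", "users": "user"}

def write_list(name, values):
    parent = ET.Element(name, {"{%s}type" % CONFIG_NS: "list"})
    for value in values:
        # Lists missing from the table fall back to the generic tag.
        item = ET.SubElement(parent, ITEM_TAGS.get(name, "listitem"))
        item.text = value
    return ET.tostring(parent, encoding="unicode")

def read_list(xml):
    # Reading takes the children in order and never looks at their tag.
    return [child.text for child in ET.fromstring(xml)]

print(read_list(write_list("packages", ["vim", "zsh"])))
print(read_list(write_list("patterns", ["base"])))
```

Only `write_list` needs the table, so a new list requires a code change on the
writing side only.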

> 
> *: there are corner cases, like having trouble distinguishing a
> missing value from an empty value
> 
> One exception to YaST-XML is the one-click installer which uses a
> non-YCP XML schema.

The question is whether we should use the YaST XML parser for a non-YaST/YCP
XML schema at all. I think that e.g. for SCC we use a generic XML parser,
because the YaST one does not bring any value there, and maybe the same is true
in this case.

> 
> > So my question is what we would like to have better?
> > One thing for
> > sure that hit us often is optional schema validation ( as some XML
> > is prevalidated like control files for products of roles, but
> > autoyast is user generated/written ).  
> 
> Yes, validation is good.
> 
> > Also some nicer error
> > reporting would be nice because current XMLError method is almost
> > never used (and yes, you should read nicer as using exception that
> > can/have to be catched otherwise it report popup with internal
> > error and not cause some strange error later ).  
> 
> Better error handling is also good.
> 
> >  Do you think that
> > it makes sense at all to have own module as ruby, perl and also
> > python, for whose we currently have bindings, all have own good (
> > good as better then our ) parser. So does it makes sense to have
> > own XML parser beside backward compatibility and for new stuff as
> > already seen on some places just use rexml or nokogiri that e.g.
> > already have support for relax ng validation[1]? Or do we have
> > some functionality that we would like to have on top of standard
> > parsers?  
> 
> As explained at the top, we must have a special library because we
> have a special kind of XML.

But do we need that special kind of XML? Why can we not use common XML, or
something that supports types, like YAML or JSON?

> 
> > 
> > Only thing that current parser have on top of generic xml parsers
> > is understanding of type attribute that do automatic type
> > conversion so `<a type="boolean">true</a>` is returned as `true`
> > and not `"true"`. But this magic  
> 
> It is not magic. Calling things magic will make people avoid
> understanding them which is bad.

It is magic for people who work with common XML. XML as a whole is just
structured strings: structured into elements with names, attributes and values.

> 
> > is also source of some bugs as
> > e.g. hash does not have this type attribute and result is that
> > `<a><key>b</key>c</a>` is returned as `"c"` and not hash, which
> > cause many recent failures we get with typos in autoyast profiles.  
> 
> Let's have test cases for these to ensure that the schemas can
> distinguish them and the error reports are helpful.

I think the source of this is that we use typed XML but omit the types for
string and hash and just guess them. As usual, we stopped in the middle of the
road.
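
As an illustration of a stricter reader, here is a minimal sketch in Python
with the standard library only. It is not how the current parser works, the
function name `to_value` is made up, and it handles only a few types. The point
is that an untyped element with both children and text is reported instead of
being silently turned into a string.

```python
import xml.etree.ElementTree as ET

CONFIG_NS = "http://www.suse.com/1.0/configns"
TYPE_ATTR = "{%s}type" % CONFIG_NS

def to_value(element):
    """Convert an element to a Python value, refusing to guess on mixed content."""
    children = list(element)
    text = (element.text or "").strip()
    declared = element.get(TYPE_ATTR)

    if declared == "boolean":
        return text == "true"
    if declared == "integer":
        return int(text)
    if declared == "list":
        return [to_value(child) for child in children]
    if children:
        # Untyped element with children: treat it as a map, but report stray
        # text next to the children instead of letting it win silently.
        stray = text or "".join((c.tail or "").strip() for c in children)
        if stray:
            raise ValueError(
                "<%s> mixes child elements with text %r" % (element.tag, stray)
            )
        return {child.tag: to_value(child) for child in children}
    return text  # untyped leaf: plain string

doc = ET.fromstring(
    '<p xmlns:config="%s">'
    '<flag config:type="boolean">true</flag><name>x</name></p>' % CONFIG_NS
)
print(to_value(doc))
```

With this, `to_value(ET.fromstring("<a><key>b</key>c</a>"))` raises a
`ValueError` that names the element, rather than returning `"c"`.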

> 
> > And as bonus we do not specify this types in schema, so during
> > validation if you omit type it is still valid xml, but it crashes
> > in code as it expect different type.  
> 
> We must use the correct terms:
> 
> WELL-FORMED XML means, roughly, syntactically correct disregarding
> the DTD or schema
> 
> VALID XML means, obeying the DTD or schema (in addition to being
> well formed)
> 
> For example, any XML parser can check for well-formedness otherwise
> it is not worth being called a XML parser. We do not get bugs about
> malformed profiles, people are competent enough not to use them.
> 
> The bug-reported profiles are invalid, either in the sense of not
> obeying the autoyast schema, or even violating some of the common
> properties of YaST-XML.

My problem here is that types are mandatory in YaST-XML, except for string and
hash, yet the RELAX NG schema we use does not require them.
I also do not like that we specify types in multiple places: partly in the
schema (which validates the value but does not check that the attribute is
set), in the XML itself, and sometimes in code.


So from what you write, I understand that it makes sense to have a specific
parser for the subset of XML we use. My question is whether we should try to
improve that subset somehow: for example, not putting types in a separate
namespace, not requiring the global namespace to be declared, and, if we use
validation, taking the type from the schema when it is not defined in the XML.
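
A rough sketch of the last idea, in Python with the standard library only. The
`SCHEMA_TYPES` table is a hypothetical stand-in for type information that would
have to be extracted from the RELAX NG schema, and the element names are made
up.

```python
import xml.etree.ElementTree as ET

CONFIG_TYPE = "{http://www.suse.com/1.0/configns}type"

# Hypothetical stand-in for type information extracted from the schema.
SCHEMA_TYPES = {"enabled": "boolean", "timeout": "integer"}

def effective_type(element):
    """Type from the document if present, otherwise from the schema table."""
    return (
        element.get(CONFIG_TYPE)          # current namespaced attribute
        or element.get("type")            # plain attribute, no namespace
        or SCHEMA_TYPES.get(element.tag)  # fall back to the schema
        or "string"
    )

doc = ET.fromstring(
    "<service><enabled>true</enabled>"
    "<timeout type='integer'>5</timeout><name>x</name></service>"
)
print([(child.tag, effective_type(child)) for child in doc])
```

The type would then live in one place, the schema, and the attribute in the XML
would become an optional override.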

What do you think?


> 
> > I would welcome any suggestions or ideas how your ideal xml parser
> > should look like.  

