Re: [xml] Add new pretty-printing and sorting options for saving XML
On Thu, Oct 07, 2010 at 09:05:46AM -, Adam Spragg wrote:
> > On 05/10/2010, Adam Spragg a...@spra.gg wrote:
> > > The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.
> >
> > Are you familiar with the Canonical XML W3C Recommendation and its implementation in libxml2?
> [snip]
> > The idea seems reasonable, but I don't know if adding code to libxml2 is the right first step. It's a core library people are rightly nervous about updating, and with only an implementation and no spec to go off,
>
> Hmmm, if I redid the sort part of the patch to stand completely on its own, renamed the option to XML_SAVE_CANONICAL, and used it to implement the Canonical XML spec instead, would that likely be more acceptable?
>
> I could do a respin of the in-tag pretty-printing patch afterward if anyone thought it was still worth discussing/speccing.

Actually, I went through your patches now. I think this new formatting is an interesting addition since it's guaranteed to be non-destructive, but reimplementing/reinventing the C14N spec doesn't sound so good (unless it comes as a patch reusing the existing c14n code). So I did apply and commit the first 3 patches nearly as is, adding the new xmllint option. IMHO there isn't really a need at the xmllint level for the following, since --c14n just implements the spec. At the API level c14n really comes as a separate module.

Daniel

--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
dan...@veillard.com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Add new pretty-printing and sorting options for saving XML
Daniel,

On Tuesday 12 Oct 2010 08:40:52 Daniel Veillard wrote:
> On Tue, Oct 12, 2010 at 09:34:11AM +0200, Daniel Veillard wrote:
> > On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
> > > libxml developers,
> > >
> > > Please find for your consideration a series of patches to add 2 new xmlSaveOptions to libxml.
> > >
> > > XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files, without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.
>
> Hum, relooking at your patch here, I may have misunderstood how you tried to do this, I will recheck... Maybe this can be isolated from the canonicalization attempt and useful as such...

Any news on rechecking this?

Adam
Re: [xml] Add new pretty-printing and sorting options for saving XML
Hiya,

On Thursday 07 Oct 2010 07:45:43 Martin (gzlist) wrote:
> On 05/10/2010, Adam Spragg a...@spra.gg wrote:
> > The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.
>
> Are you familiar with the Canonical XML W3C Recommendation and its implementation in libxml2?

A bit familiar. I wasn't particularly aware of it while I was coding, but looking at it now, it does ring bells. It may well have inspired part of this.

> It has a similar result, but without the aim to insert breaks to make line-oriented diff and merge tools happier.

Well, I split up the re-ordering and the whitespace changes into two separate options, so you could do one without the other.

> > XML_SAVE_WSNONSIG is a new pretty-printing format [snip]
>
> I presume this is based on the Henri Sivonen suggestion?
> http://hsivonen.iki.fi/producing-xml/#prettyprinting

Again, I had seen the idea somewhere else before, but couldn't remember where. It may have been his, or it may have been from someone else. Can't say.

> In the responses I've seen to that, there's been a fair bit of pushback, for instance from Uche Ogbuji here:
> http://www.ibm.com/developerworks/xml/library/x-think35.html#listing1

"I disagree because I think it makes for ugly markup that's not friendly to manipulation by people."

Well, I think ugly is a matter of familiarity. I think the GNU coding guidelines recommend what is to me an unworkably ugly and awkward C brace style, but plenty of other people seem to have got used to it and be productive. I will admit though, I do think it is rather ugly if your document already contains lots of pretty-printing whitespace in the content. But if not, it seems OK to me.

As for being manipulable by people, well, I always thought that XML was primarily a machine generated/readable language, which happens to be fairly human-readable in order to make debugging and quick hacks easier.

On top of that, if people don't find that representation very workable, there's no reason why they shouldn't be able to use xmllint (or a similar tool) to reformat the document into another pretty-printed format which they can deal with easily, and then transform it back afterwards.

Heh. I'm not suggesting that this format be made the default. I just want to make it available as an option.

> The other concern is as you're introducing breaks for every element and attribute, lots of lines start looking the same. That tends to make the default, simpler diff algorithms produce suboptimal output.

I was going to cross that bridge when I came to it. :-)

> > Please let me know what you think of the idea and patches. Are they suitable for libxml? At all? With work? (If so, what?)
>
> The idea seems reasonable, but I don't know if adding code to libxml2 is the right first step. It's a core library people are rightly nervous about updating, and with only an implementation and no spec to go off, it wouldn't be easy for others to interoperate with your new formatting style.

OK. Thanks.

Adam
Re: [xml] Add new pretty-printing and sorting options for saving XML
> On 05/10/2010, Adam Spragg a...@spra.gg wrote:
> > The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.
>
> Are you familiar with the Canonical XML W3C Recommendation and its implementation in libxml2?
[snip]
> The idea seems reasonable, but I don't know if adding code to libxml2 is the right first step. It's a core library people are rightly nervous about updating, and with only an implementation and no spec to go off,

Hmmm, if I redid the sort part of the patch to stand completely on its own, renamed the option to XML_SAVE_CANONICAL, and used it to implement the Canonical XML spec instead, would that likely be more acceptable?

I could do a respin of the in-tag pretty-printing patch afterward if anyone thought it was still worth discussing/speccing.

Adam
Re: [xml] Add new pretty-printing and sorting options for saving XML
Hi.

> On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
> there is already an implementation in libxml of C14N

Oh, I missed that. I figured the features I was looking for would be part of the save API, given how it affects what gets saved. Or maybe the tree API, for doing things like adding implied attributes and re-ordering parts of the tree. I didn't think of looking in the xmllint help as the best way of figuring out what options would be available in the library.

In retrospect, reading the API Menu contents carefully, or googling for "libxml c14n", should have been my first stop. Doh! :-)

> The main problem from my POV is you started developing those patches apparently without fully understanding the current state of the art and code, and unfortunately this looks like a lot of wasted effort :-\

Well, the patches weren't that much work. Finding the odd couple of contiguous hours here and there to sit down and do them was the hard part, and I'd have needed that to write up a readable proposal anyway. I figured it would be better to produce a first draft of actual code which could be discussed and batted back and forth, rather than starting with "wouldn't it be great if..."

> I don't like to turn down contributions but in this case, I'm afraid it would add more confusion than really improve the user experience.

Seriously, don't worry about it. I was absolutely expecting the first version of the patch set to get rejected for one reason or another, maybe with suggestions for improvement, maybe with "not a suitable feature for this library". Obviously I wasn't expecting "this is already implemented", but I can't think of a much better reason for rejection!

Thanks,

Adam
Re: [xml] Add new pretty-printing and sorting options for saving XML
On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
> libxml developers,
>
> Please find for your consideration a series of patches to add 2 new xmlSaveOptions to libxml.
>
> XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files, without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.

Still *any* text node is significant in XML. Any indenting is by definition destructive.

> XML_SAVE_SORT is an option which sorts XML nodes whose order is unimportant to XML files. This includes the order of attributes within elements, the order of namespace declarations within elements, and element, attribute and entity declarations within doctypes.
>
> The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.

There is already an implementation in libxml of C14N, which is the official W3C standard for canonical XML. It exists and has been deployed and used for nearly 10 years, including for digital signatures of XML. Why write a second implementation which has no standardization at all?

http://www.w3.org/TR/xml-c14n

> Please let me know what you think of the idea and patches. Are they suitable for libxml? At all? With work? (If so, what?)

The main problem from my POV is that you started developing those patches apparently without fully understanding the current state of the art and code, and unfortunately this looks like a lot of wasted effort :-\ I don't like to turn down contributions, but in this case I'm afraid it would add more confusion than really improve the user experience.

See xmllint options:

  --c14n : save in W3C canonical format v1.0 (with comments)
  --c14n11 : save in W3C canonical format v1.1 (with comments)
  --exc-c14n : save in W3C exclusive canonical format (with comments)

Daniel
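As a rough illustration of what canonicalization buys (the xmllint route would simply be `xmllint --c14n file.xml`), here is a minimal sketch using Python's standard library rather than libxml2. Note the assumption: `xml.etree.ElementTree.canonicalize` implements C14N 2.0, not the C14N 1.0 / 1.1 / exclusive variants behind the xmllint options listed above, so this shows the general idea only. The sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same document: different attribute order,
# different quoting, and an empty element written two ways.
doc_a = '<doc b="2" a="1"><empty/></doc>'
doc_b = "<doc a='1'   b='2'><empty></empty></doc>"

# canonicalize() returns the canonical form as a string (Python 3.8+).
canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

# Both inputs collapse to one byte-for-byte identical form, which is
# what makes canonical XML usable for signatures and comparisons.
print(canon_a)
print(canon_a == canon_b)
```

The canonical form is a single unbroken run of markup, so it does not by itself give line-oriented tools anything to work with; that part is what the in-tag whitespace option in this thread is about.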
Re: [xml] Add new pretty-printing and sorting options for saving XML
On Tue, Oct 12, 2010 at 09:34:11AM +0200, Daniel Veillard wrote:
> On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
> > libxml developers,
> >
> > Please find for your consideration a series of patches to add 2 new xmlSaveOptions to libxml.
> >
> > XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files, without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.
>
> Still *any* text node is significant in XML. Any indenting is by definition destructive.

Hum, relooking at your patch here, I may have misunderstood how you tried to do this, I will recheck... Maybe this can be isolated from the canonicalization attempt and useful as such...

Daniel
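The objection quoted in this message, that indenting is destructive, does hold for conventional pretty-printing, which is the case the in-tag approach is trying to avoid. A minimal sketch of the conventional case, using Python's `xml.dom.minidom` purely as a stand-in (it is not libxml2, and the helper name `text_nodes` is made up for the example):

```python
from xml.dom import minidom

src = "<a><b/></a>"

# Parse the original, then pretty-print it the conventional way and
# parse the result again.
before = minidom.parseString(src).documentElement
pretty = minidom.parseString(src).toprettyxml(indent="  ")
after = minidom.parseString(pretty).documentElement


def text_nodes(node):
    """Return the data of every text node that is a direct child of node."""
    return [c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE]


# The original <a> has no text children; the indented copy has gained
# whitespace-only text nodes around <b/>, so the tree has changed.
print(text_nodes(before))
print(text_nodes(after))
```

Whitespace placed inside a tag, before the closing `>`, never reaches the parsed tree, which is why a format built only on that kind of whitespace can leave every text node untouched.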
Re: [xml] Add new pretty-printing and sorting options for saving XML
On 05/10/2010, Adam Spragg a...@spra.gg wrote:
> The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.

Are you familiar with the Canonical XML W3C Recommendation and its implementation in libxml2?

http://www.w3.org/TR/xml-c14n
http://xmlsoft.org/html/libxml-c14n.html

It has a similar result, but without the aim to insert breaks to make line-oriented diff and merge tools happier.

> XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files, without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.

I presume this is based on the Henri Sivonen suggestion?

http://hsivonen.iki.fi/producing-xml/#prettyprinting

In the responses I've seen to that, there's been a fair bit of pushback, for instance from Uche Ogbuji here:

http://www.ibm.com/developerworks/xml/library/x-think35.html#listing1

The other concern is that, as you're introducing breaks for every element and attribute, lots of lines start looking the same. That tends to make the default, simpler diff algorithms produce suboptimal output.

> Please let me know what you think of the idea and patches. Are they suitable for libxml? At all? With work? (If so, what?)

The idea seems reasonable, but I don't know if adding code to libxml2 is the right first step. It's a core library people are rightly nervous about updating, and with only an implementation and no spec to go off, it wouldn't be easy for others to interoperate with your new formatting style.

Martin
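To make the diff discussion a little more concrete, here is a small sketch using Python's `difflib` on two hand-written documents laid out with one attribute per line. The layout is an assumption made for illustration; it is not taken from the patches, whose exact output is not shown in this thread.

```python
import difflib

# Two versions of a document in an assumed in-tag layout; only the
# value of attribute "b" differs between them.
old = '<doc\n  a="1"\n  b="2"\n  ></doc\n>'
new = '<doc\n  a="1"\n  b="3"\n  ></doc\n>'

diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(),
                                 lineterm=""))

# Keep only the added/removed lines, skipping the "---"/"+++" file headers.
changed = [line for line in diff
           if line.startswith(("+", "-"))
           and not line.startswith(("+++", "---"))]

for line in changed:
    print(line)
```

A single attribute edit shows up as a one-line change here. The concern raised above still applies to larger documents, where many lines consisting of just `>` or similar closing fragments are identical and give a simple diff algorithm little to anchor on.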
[xml] Add new pretty-printing and sorting options for saving XML
libxml developers,

Please find for your consideration a series of patches to add 2 new xmlSaveOptions to libxml.

XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace *within* tags, where permitted by the XML standard, to re-line and indent XML files, without changing any element content at all. No whitespace is added to, removed from, or altered in any text node of the document, and no text nodes are added or removed either.

XML_SAVE_SORT is an option which sorts XML nodes whose order is unimportant to XML files. This includes the order of attributes within elements, the order of namespace declarations within elements, and element, attribute and entity declarations within doctypes.

The idea of these options is to be able to combine them to produce a canonical, nearly line-oriented format for XML files.

The goal is to be able to produce XML files which can be manipulated with standard POSIX-style command-line tools much better than is currently possible, particularly by diff(1) and patch(1). Of course, once diff and patch can work effectively on XML files (something that they currently do very badly at), revision control systems (e.g. git) will get much better at storing and merging them too, particularly if combined with hooks to enforce the canonical style.

Please let me know what you think of the idea and patches. Are they suitable for libxml? At all? With work? (If so, what?)

Thanks,

Adam Spragg
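For readers who want to see the idea in miniature, the following is a hedged sketch of in-tag whitespace combined with attribute sorting, written against Python's standard library. It is not the code from the patches and makes no claim to match their output: the function name `write_in_tag`, the two-space indent, and the exact placement of line breaks are all assumptions. It also ignores namespaces, comments, processing instructions, and doctypes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

INDENT = "  "  # assumed indent width, chosen for the example


def write_in_tag(elem, depth=0, sort_attrs=True):
    """Serialize elem with line breaks placed only inside tags.

    Breaks go before each attribute and before the closing '>' of start
    and end tags, where XML permits whitespace; text and tail content is
    written back exactly as parsed.
    """
    inner = "\n" + INDENT * (depth + 1)
    attrs = list(elem.attrib.items())
    if sort_attrs:
        attrs.sort()  # attribute order carries no meaning in XML
    parts = ["<" + elem.tag]
    for name, value in attrs:
        parts.append(inner + name + "=" + quoteattr(value))
    parts.append(inner + ">")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(write_in_tag(child, depth + 1, sort_attrs))
        parts.append(escape(child.tail or ""))
    parts.append("</" + elem.tag + "\n" + INDENT * depth + ">")
    return "".join(parts)


src = '<doc b="2" a="1"><item x="1">text &amp; more</item> tail</doc>'
out = write_in_tag(ET.fromstring(src))
print(out)

# Round trip: the reformatted document canonicalizes to the same form
# as the source, i.e. no text node was added, removed, or altered.
print(ET.canonicalize(out) == ET.canonicalize(src))
```

The round-trip check at the end is the property the proposal depends on: the output spans many lines, yet a parser sees the same tree as before.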