Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-11-03 Thread Daniel Veillard
On Thu, Oct 07, 2010 at 09:05:46AM -, Adam Spragg wrote:
  On 05/10/2010, Adam Spragg a...@spra.gg wrote:
  The idea of these options is to be able to combine them to produce a
  canonical, nearly line-oriented format for XML files.
 
  Are you familiar with the Canonical XML W3C Recommendation and its
  implementation in libxml2?
 [snip]
  The idea seems reasonable, but I don't know if adding code to libxml2
  is the right first step. It's a core library people are rightly
  nervous about updating, and with only an implementation and no spec to
  go off,
 
 Hmmm... if I redid the sort part of the patch to stand completely on its
 own, renamed the option to XML_SAVE_CANONICAL, and used it to implement the
 Canonical XML spec instead, would that likely be more acceptable?
 
 I could do a respin of the in-tag pretty-printing patch afterward if
 anyone thought it was still worth discussing/speccing.

  Actually I went through your patches now.
So I think this new formatting is an interesting addition since it's
guaranteed to be non-destructive, but reimplementing/reinventing the
C14N spec doesn't sound so good (unless it comes as a patch reusing the
existing c14n code).
  So I did apply and commit the first 3 patches nearly as is, adding the
new xmllint option. IMHO there isn't really a need at the xmllint level
for the following patches since --c14n just implements the spec. At the API
level c14n really comes as a separate module.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
dan...@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-26 Thread Adam Spragg
Daniel,

On Tuesday 12 Oct 2010 08:40:52 Daniel Veillard wrote:
 On Tue, Oct 12, 2010 at 09:34:11AM +0200, Daniel Veillard wrote:
  On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
   libxml developers,
   
   Please find for your consideration a series of patches to add 2 new
   xmlSaveOptions to libxml.
   
   XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
   *within* tags, where permitted by the XML standard, to re-line and
   indent XML files, without changing any element content at all. No
   whitespace is added to, removed from, or altered in any text node of
   the document, and no text nodes are added or removed either.
 
   Hum, relooking at your patch here, I may have misunderstood how you
 tried to do this, I will recheck... Maybe this can be isolated from
 the canonicalization attempt and useful as such...

Any news on rechecking this?

Adam


Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-23 Thread Adam Spragg
Hiya,

On Thursday 07 Oct 2010 07:45:43 Martin (gzlist) wrote:
 On 05/10/2010, Adam Spragg a...@spra.gg wrote:
  The idea of these options is to be able to combine them to produce a
  canonical, nearly line-oriented format for XML files.
 
 Are you familiar with the Canonical XML W3C Recommendation and its
 implementation in libxml2?

A bit familiar. I wasn't particularly aware of it while I was coding, but 
looking at it now, it does ring bells. It may well have inspired part of this.

 It has a similar result, but without the aim to insert breaks to make
 line-oriented diff and merge tools happier.

Well, I split up the re-ordering and the whitespace changes into two separate 
options, so you could do one without the other.

  XML_SAVE_WSNONSIG is a new pretty-printing format
[snip]
 I presume this is based on the Henri Sivonen suggestion?
 
 http://hsivonen.iki.fi/producing-xml/#prettyprinting

Again, I had seen the idea somewhere else before, but couldn't remember where. 
It may have been his, or it may have been from someone else. Can't say.

 In the responses I've seen to that, there's been a fair bit of
 pushback, for instance from Uche Ogbuji here:
 
 http://www.ibm.com/developerworks/xml/library/x-think35.html#listing1

"I disagree because I think it makes for ugly markup that's not friendly to 
manipulation by people."

Well, I think ugly is a matter of familiarity. I think the GNU coding 
guidelines recommend what is to me an unworkably ugly and awkward C brace 
style, but plenty of other people seem to have got used to it and be 
productive.

I will admit though, I do think it is rather ugly if your document already 
contains lots of pretty-printing whitespace in the content. But if not, it 
seems OK to me.

As for being manipulable by people, well, I always thought that XML was 
primarily a machine generated/readable language, which happens to be fairly 
human-readable in order to make debugging and quick hacks easier.

On top of that, if people don't find that representation very workable, 
there's no reason why they shouldn't be able to use xmllint (or a similar 
tool) to reformat the document into another pretty-printed format which they 
can deal with easily, and then transform it back afterwards.

Heh. I'm not suggesting that this format be made the default. I just want to 
make it available as an option.

 The other concern is as you're introducing breaks for every element
 and attribute, lots of lines start looking the same. That tends to
 make the default, simpler diff algorithms produce suboptimal output.

I was going to cross that bridge when I came to it. :-)

  Please let me know what you think of the idea and patches. Are they
  suitable for libxml? At all? With work? (If so, what?)
 
 The idea seems reasonable, but I don't know if adding code to libxml2
 is the right first step. It's a core library people are rightly
 nervous about updating, and with only an implementation and no spec to
 go off, it wouldn't be easy for others to interoperate with your new
 formatting style.

OK. Thanks.

Adam


Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-23 Thread Adam Spragg
 On 05/10/2010, Adam Spragg a...@spra.gg wrote:
 The idea of these options is to be able to combine them to produce a
 canonical, nearly line-oriented format for XML files.

 Are you familiar with the Canonical XML W3C Recommendation and its
 implementation in libxml2?
[snip]
 The idea seems reasonable, but I don't know if adding code to libxml2
 is the right first step. It's a core library people are rightly
 nervous about updating, and with only an implementation and no spec to
 go off,

Hmmm... if I redid the sort part of the patch to stand completely on its
own, renamed the option to XML_SAVE_CANONICAL, and used it to implement the
Canonical XML spec instead, would that likely be more acceptable?

I could do a respin of the in-tag pretty-printing patch afterward if
anyone thought it was still worth discussing/speccing.

Adam


Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-23 Thread Adam Spragg
Hi.

 On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
   there is already an implementation in libxml of C14N

Oh, I missed that.

I figured the features I was looking for would be part of the save API,
given how it affects what gets saved. Or maybe the tree API for doing
things like adding implied attributes and re-ordering parts of the tree.

I didn't think of looking in the xmllint help as the best way of figuring
out what options would be available in the library.

In retrospect, reading the API Menu contents carefully, or googling for
libxml c14n should have been my first stop. Doh! :-)

   The main problem from my POV is you started developing those patches
 apparently without fully understanding the current state of the art and
 code, and unfortunately this looks like a lot of wasted effort :-\

Well, the patches weren't that much work. Finding the odd couple of
contiguous hours here and there to sit down and do them was the hard part,
and I'd have needed that to write up a readable proposal anyway. I figured
it would be better to produce a first draft of actual code which could be
discussed and batted back and forth, rather than starting with "wouldn't
it be great if..."

   I don't like to turn down contributions but in this case, I'm afraid it
 would add more confusion than really improve the user experience.

Seriously, don't worry about it. I was absolutely expecting the first
version of the patch set to get rejected for one reason or another, maybe
with suggestions for improvement, maybe with "not a suitable feature for
this library". Obviously I wasn't expecting "this is already implemented",
but I can't think of a much better reason for rejection!

Thanks,

Adam



Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-12 Thread Daniel Veillard
On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
 libxml developers,
 
 Please find for your consideration a series of patches to add 2 new
 xmlSaveOptions to libxml.
 
 XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
 *within* tags, where permitted by the XML standard, to re-line and
 indent XML files, without changing any element content at all. No
 whitespace is added to, removed from, or altered in any text node of
 the document, and no text nodes are added or removed either.

  Still *any* text node is significant in XML. Any indenting is by
definition destructive.
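
A minimal sketch of this objection, using Python's standard xml.dom.minidom
rather than libxml2 (an assumption made purely for illustration; the helper
name count_text_nodes is hypothetical). Conventional indenting puts
whitespace between elements, and that whitespace comes back as extra text
nodes when the file is parsed again:

```python
from xml.dom import minidom


def count_text_nodes(node):
    """Count every text node below `node`, at any depth."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            total += 1
        total += count_text_nodes(child)
    return total


compact = "<list><item>one</item><item>two</item></list>"
doc = minidom.parseString(compact)
before = count_text_nodes(doc)

# Conventional pretty-printing: whitespace is written *between* elements.
indented = doc.toprettyxml(indent="  ")
after = count_text_nodes(minidom.parseString(indented))

print(before, after)
```

The reparsed document holds more text nodes than the original, which is the
sense in which ordinary indenting is destructive. Whitespace placed inside
the tags themselves never becomes a text node, so it does not have this
effect.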

 XML_SAVE_SORT is an option which sorts XML nodes whose order is
 unimportant to XML files. This includes the order of attributes within
 elements, the order of namespace declarations within elements, and
 element, attribute & entity declarations within doctypes.
 
 The idea of these options is to be able to combine them to produce a
 canonical, nearly line-oriented format for XML files.

  There is already an implementation in libxml of C14N, which is the
official W3C standard for canonical XML. It has existed, and been deployed
and used, for nearly 10 years, including for digital signatures of XML.
  Why write a second implementation which has no standardization
at all?

   http://www.w3.org/TR/xml-c14n
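
As a rough illustration of what canonicalization gives you, the sketch below
uses Python's standard library (xml.etree.ElementTree.canonicalize, Python
3.8 or later) instead of libxml2's c14n module. Note the assumption: Python
implements C14N 2.0, not the 1.0, 1.1 and exclusive variants exposed by
xmllint, so the example is limited to behaviour those versions have in
common (attributes sorted, values double-quoted, empty elements written as
start/end tag pairs).

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same document: different attribute order,
# different quote characters, different empty-element syntax.
first = "<doc b='2' a=\"1\"><empty/></doc>"
second = '<doc a="1"   b="2"><empty></empty></doc>'

canon_first = canonicalize(first)
canon_second = canonicalize(second)

print(canon_first)
print(canon_first == canon_second)
```

Both inputs canonicalize to the same string. The result is not re-lined in
any way, so it does not by itself help line-oriented tools.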

 Please let me know what you think of the idea and patches. Are they
 suitable for libxml? At all? With work? (If so, what?)

  The main problem from my POV is you started developing those patches
apparently without fully understanding the current state of the art and
code, and unfortunately this looks like a lot of wasted effort :-\
  I don't like to turn down contributions but in this case, I'm afraid it
would add more confusion than really improve the user experience.
See xmllint options:
--c14n : save in W3C canonical format v1.0 (with comments)
--c14n11 : save in W3C canonical format v1.1 (with comments)
--exc-c14n : save in W3C exclusive canonical format (with comments)


Daniel



Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-12 Thread Daniel Veillard
On Tue, Oct 12, 2010 at 09:34:11AM +0200, Daniel Veillard wrote:
 On Tue, Oct 05, 2010 at 10:22:22PM +0100, Adam Spragg wrote:
  libxml developers,
  
  Please find for your consideration a series of patches to add 2 new
  xmlSaveOptions to libxml.
  
  XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
  *within* tags, where permitted by the XML standard, to re-line and
  indent XML files, without changing any element content at all. No
  whitespace is added to, removed from, or altered in any text node of
  the document, and no text nodes are added or removed either.
 
   Still *any* text node is significant in XML. Any indenting is by
 definition destructive.

  Hum, relooking at your patch here, I may have misunderstood how you
tried to do this, I will recheck... Maybe this can be isolated from
the canonicalization attempt and be useful as such...

Daniel



Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-07 Thread Martin (gzlist)
On 05/10/2010, Adam Spragg a...@spra.gg wrote:

 The idea of these options is to be able to combine them to produce a
 canonical, nearly line-oriented format for XML files.

Are you familiar with the Canonical XML W3C Recommendation and its
implementation in libxml2?

http://www.w3.org/TR/xml-c14n
http://xmlsoft.org/html/libxml-c14n.html

It has a similar result, but without the aim to insert breaks to make
line-oriented diff and merge tools happier.

 XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
 *within* tags, where permitted by the XML standard, to re-line and
 indent XML files, without changing any element content at all. No
 whitespace is added to, removed from, or altered in any text node of
 the document, and no text nodes are added or removed either.

I presume this is based on the Henri Sivonen suggestion?

http://hsivonen.iki.fi/producing-xml/#prettyprinting

In the responses I've seen to that, there's been a fair bit of
pushback, for instance from Uche Ogbuji here:

http://www.ibm.com/developerworks/xml/library/x-think35.html#listing1

The other concern is that, as you're introducing breaks for every element
and attribute, lots of lines start looking the same. That tends to
make the default, simpler diff algorithms produce suboptimal output.

 Please let me know what you think of the idea and patches. Are they
 suitable for libxml? At all? With work? (If so, what?)

The idea seems reasonable, but I don't know if adding code to libxml2
is the right first step. It's a core library people are rightly
nervous about updating, and with only an implementation and no spec to
go off, it wouldn't be easy for others to interoperate with your new
formatting style.

Martin


[xml] Add new pretty-printing and sorting options for saving XML

2010-10-06 Thread Adam Spragg
libxml developers,

Please find for your consideration a series of patches to add 2 new
xmlSaveOptions to libxml.

XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
*within* tags, where permitted by the XML standard, to re-line and
indent XML files, without changing any element content at all. No
whitespace is added to, removed from, or altered in any text node of
the document, and no text nodes are added or removed either.
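
To make the idea concrete, here is a minimal sketch in Python (standard
library only, not the libxml2 patch itself). The function names and the
exact layout of the line breaks are assumptions chosen for illustration and
need not match what XML_SAVE_WSNONSIG emits. It handles only elements,
attributes and text nodes; comments, CDATA sections and processing
instructions are ignored.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def save_in_tag_ws(node, depth=0):
    """Serialize an element, adding line breaks only *inside* tags."""
    pad = "\n" + "  " * depth
    out = "<" + node.tagName
    attrs = node.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        out += pad + "  " + attr.name + "=" + quoteattr(attr.value)
    out += pad + ">"  # whitespace before '>' is allowed in a start tag
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            out += save_in_tag_ws(child, depth + 1)
        elif child.nodeType == child.TEXT_NODE:
            out += escape(child.data)  # text is written untouched
    out += "</" + node.tagName + pad + ">"  # ...and before '>' in an end tag
    return out


def text_nodes(node):
    """Collect the data of every text node below `node`, in document order."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            found.extend(text_nodes(child))
    return found


SOURCE = '<doc id="1" lang="en"><title>Hello</title><p>One <b>two</b></p></doc>'
original = minidom.parseString(SOURCE).documentElement
relined = save_in_tag_ws(original)
reparsed = minidom.parseString(relined).documentElement

print(relined)
```

Parsing the re-lined output gives back exactly the same text nodes as the
original, because every added line break sits inside a tag rather than in
element content.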

XML_SAVE_SORT is an option which sorts XML nodes whose order is
unimportant to XML files. This includes the order of attributes within
elements, the order of namespace declarations within elements, and
element, attribute & entity declarations within doctypes.
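
The attribute part of this can be sketched in a few lines of Python using
xml.etree.ElementTree. This is not the patch's code: the function name
sort_attributes and the choice to sort by attribute name are assumptions,
and namespace declarations and doctype declarations are not handled.

```python
import xml.etree.ElementTree as ET


def sort_attributes(elem):
    """Reorder the attributes of `elem` and its descendants by name, in place."""
    for node in elem.iter():
        ordered = sorted(node.attrib.items())
        node.attrib.clear()
        node.attrib.update(ordered)


# The same data written with two different attribute orders.
a = ET.fromstring('<item b="2" a="1"><sub z="26" y="25"/></item>')
b = ET.fromstring('<item a="1" b="2"><sub y="25" z="26"/></item>')

sort_attributes(a)
sort_attributes(b)

out_a = ET.tostring(a, encoding="unicode")
out_b = ET.tostring(b, encoding="unicode")
print(out_a == out_b)
```

Once both trees are sorted they serialize to identical strings, so a
line-oriented tool no longer sees a difference where the XML itself has
none.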

The idea of these options is to be able to combine them to produce a
canonical, nearly line-oriented format for XML files.

The goal is to be able to produce XML files which can be manipulated
with standard POSIX-style command-line tools much better than is
currently possible, particularly by diff(1) and patch(1). Of course,
once diff and patch can work effectively on XML files (something that
they currently do very badly at) then revision control systems
(e.g. git) will get much better at storing and merging them too -
particularly if combined with hooks to enforce the canonical style.
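
A small sketch of the effect on diffs, using Python's difflib as a stand-in
for diff(1). The documents, the host name and the helper changed_lines are
made up for illustration, and the re-lined layout is written by hand rather
than produced by the patches.

```python
import difflib


def changed_lines(old, new):
    """Return only the added/removed lines of a unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    body = list(diff)[2:]  # drop the '---' and '+++' file header lines
    return [line for line in body if line.startswith(("+", "-"))]


# The same one-attribute change, applied to two layouts of one document.
one_line_old = '<server host="a.example" port="80" tls="no"><name>web</name></server>'
one_line_new = one_line_old.replace('port="80"', 'port="443"')

relined_old = "\n".join([
    '<server',
    '  host="a.example"',
    '  port="80"',
    '  tls="no"',
    '  ><name',
    '    >web</name',
    '  ></server',
    '>',
])
relined_new = relined_old.replace('port="80"', 'port="443"')

print(changed_lines(one_line_old, one_line_new))
print(changed_lines(relined_old, relined_new))
```

With everything on one line, the diff replaces the whole document. With one
attribute per line, it touches only the port attribute.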

Please let me know what you think of the idea and patches. Are they
suitable for libxml? At all? With work? (If so, what?)

Thanks,

Adam Spragg
