Re: [xml] Add new pretty-printing and sorting options for saving XML

2010-10-06 Thread Martin (gzlist)
On 05/10/2010, Adam Spragg wrote:
>
> The idea of these options is to be able to combine them to produce a
> "canonical", nearly line-oriented format for XML files.

Are you familiar with the "Canonical XML" W3C Recommendation and its
implementation in libxml2?

It has a similar result, but without the aim to insert breaks to make
line-oriented diff and merge tools happier.
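For a rough feel of what canonicalization does and does not do, here is a minimal sketch. It does not use the libxml2 API; it assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize() implementing C14N 2.0 (a later revision than the Recommendation current at the time of this thread). The two sample documents are made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same document: attribute order and
# whitespace inside the tags differ, element content does not.
doc_a = '<root b="2" a="1"><child/></root>'
doc_b = '<root\n   a="1"\n   b="2"\n><child></child></root>'

# canonicalize() returns the canonical form as a string when no
# output file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# Both inputs collapse to one canonical form, and that form does not
# carry over the line breaks that doc_b had inside its tags.
print(canon_a == canon_b)
print("\n" in canon_a)
```

So the canonical form gives a stable serialization to compare against, but it makes no attempt to lay the document out one item per line.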

> XML_SAVE_WSNONSIG is a new pretty-printing format which adds whitespace
> *within* tags, where permitted by the XML standard, to re-line and
> indent XML files, without changing any element content at all. No
> whitespace is added to, removed from, or altered in any text node of
> the document, and no text nodes are added or removed either.
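To make the property described above concrete, here is a small sketch. It is not the patch itself and not libxml2 code: it uses Python's standard xml.etree.ElementTree parser, and the "relined" layout is only my guess at the general style, not the exact output of XML_SAVE_WSNONSIG. It checks that whitespace placed inside tags leaves every text node untouched.

```python
import xml.etree.ElementTree as ET

def tree_shape(elem):
    """Return (tag, attributes, text, tail) for every element, in document order."""
    return [(e.tag, dict(e.attrib), e.text, e.tail) for e in elem.iter()]

# The same document twice: once compact, once with line breaks and
# indentation placed only inside the tags (hypothetical layout).
compact = '<a x="1"><b>text</b><c y="2"/></a>'
relined = '<a\n  x="1"\n  ><b\n    >text</b\n  ><c\n    y="2"\n  /></a\n>'

# If in-tag whitespace is truly insignificant, both parse to the same
# elements, attributes and text nodes.
same = tree_shape(ET.fromstring(compact)) == tree_shape(ET.fromstring(relined))
print(same)
```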

I presume this is based on the Henri Sivonen suggestion?

In the responses I've seen to that, there's been a fair bit of
pushback, for instance from Uche Ogbuji here:

The other concern is that, as you're introducing breaks for every
element and attribute, lots of lines start looking the same. That tends
to make the default, simpler diff algorithms produce suboptimal output.
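A crude way to put a number on "lots of lines look the same" is to count what fraction of lines are duplicates of some other line. The sketch below is only an illustration: the sample layout is hypothetical (one break per element and attribute, as I understand the proposal), and duplicate_line_ratio is a helper name I made up, not part of any diff tool.

```python
from collections import Counter

# Hypothetical sample in a one-item-per-line layout, with all added
# whitespace kept inside the tags.
SAMPLE = """<items
  ><item
    class="row"
    >a</item
  ><item
    class="row"
    >b</item
  ><item
    class="row"
    >c</item
  ></items
>"""

def duplicate_line_ratio(text):
    """Fraction of lines whose content (ignoring indentation) occurs more than once."""
    lines = [line.strip() for line in text.splitlines()]
    counts = Counter(lines)
    repeated = sum(n for n in counts.values() if n > 1)
    return repeated / len(lines)

print(duplicate_line_ratio(SAMPLE))
```

The more such indistinguishable lines there are, the more ways a line-based diff has to pair them up, and the less likely it is to pick the pairing a human would consider natural.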

> Please let me know what you think of the idea and patches. Are they
> suitable for libxml? At all? With work? (If so, what?)

The idea seems reasonable, but I don't know if adding code to libxml2
is the right first step. It's a core library people are rightly
nervous about updating, and with only an implementation and no spec to
go off, it wouldn't be easy for others to interoperate with your new
formatting style.

Martin
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


Re: [xml] strange end-tag position (parsing html)

2010-10-06 Thread David Gatwood
On Oct 6, 2010, at 10:08 AM, rcs...@gmail.com wrote:

> On Wed, Oct 6, 2010 at 12:18 AM, Steven Falken wrote:
>> Hi,
>> I'm trying to parse bare.txt (attached, yes it is simply cnn.com). For
>> this purpose I'm using parse.c (also attached).
>> The output is output.txt (Attachment!).
>> If you look at bare.txt, you see a  block from line 826 to
>> line 886. Now if you look at output.txt, you see the
>>