Re: Minimally intrusive XML editing using Python

2009-11-23 Thread Thomas Lotze
Please consider this a reply to any unanswered messages I received in
response to my original post.

Dave Angel wrote:

 What's your real problem, or use case?  Are you just concerned with 
 diffing, or are others likely to read the xml, and want it formatted the 
 way it already is?

I'd like to put the XML under revision control along with other stuff.
Other people should be able to make sense of the diffs and I'd rather not
require them to configure their tools to use some XML differ.

 And how general do you need this tool to be?  For 
 example, if the only thing you're doing is modifying existing attributes 
 or existing tags, the minimal change would be pretty unambiguous.  But 
 if you're adding tags, or adding content on what was an empty element, 
 then the requirement gets fuzzy  And finding an existing library for 
 something fuzzy is unlikely.

Sure. I guess it's something like an 80/20 problem: Changing attributes in
a way that keeps the rest of the XML intact will go a long way and as
we're talking about XML that is supposed to be looked at by humans, I
would base any further requirements on the assumption that it's
pretty-printed in some way so that removing an element, for example, can
be defined by touching as few lines as possible, and adding one can be
restricted to adding a line in the appropriate place. If more complex
stuff isn't as well-defined, that would be entirely OK with me.

 Sample input, change list, and desired output would be very  useful.

I'd like to be able to reliably produce a diff like this using a program
that lets me change the value in some useful way, which might be dragging
a point across a map with the mouse in this example:

--- foo.gpx 2009-05-30 19:45:45.0 +0200
+++ bar.gpx 2009-11-23 17:41:36.0 +0100
@@ -11,7 +11,7 @@
   speed0.792244/speed
   fix2d/fix
 /trkpt
-trkpt lat=50.605995000 lon=10.70968
+trkpt lat=50.605985000 lon=10.70968
   ele508.30/ele
 time2009-05-30T16:37:10Z/time
   course15.15/course


-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-23 Thread Nobody
On Mon, 23 Nov 2009 17:45:24 +0100, Thomas Lotze wrote:

 What's your real problem, or use case?  Are you just concerned with 
 diffing, or are others likely to read the xml, and want it formatted the 
 way it already is?
 
 I'd like to put the XML under revision control along with other stuff.
 Other people should be able to make sense of the diffs and I'd rather not
 require them to configure their tools to use some XML differ.

In which case, the data format isn't XML, but a subset of it (and
probably an under-defined subset at that, unless you invest a lot of
effort in defining it).

That defeats one of the advantages of using a standardised container
format such as XML, i.e. being able to take advantage of existing tools
and libraries (which will be written to the offical standard, not to your
private standard).

One option is to require the files to be in a canonical form. Depending
upon your revision control system, you may be able to configure it to
either check that updated files are in this form, or even to convert them
automatically.

If your existing files aren't in such a form, there will be a one-time
penalty (i.e. a huge diff) when converting the files.


-- 
http://mail.python.org/mailman/listinfo/python-list


Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
I wonder what Python XML library is best for writing a program that makes
small modifications to an XML file in a minimally intrusive way. By that I
mean that information the program doesn't recognize is kept, as are
comments and whitespace, the order of attributes and even whitespace
around attributes. In short, I want to be able to change an XML file while
producing minimal textual diffs.

Most libraries don't allow controlling the order of and the whitespace
around attributes, so what's generally left to do is store snippets of
original text along with the model objects and re-use that for writing the
edited XML if the model wasn't modified by the program. Does a library
exist that helps with this? Does any XML library at all allow structured
access to the text representation of a tag with its attributes?

Thank you very much.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Stefan Behnel
Thomas Lotze, 18.11.2009 13:55:
 I wonder what Python XML library is best for writing a program that makes
 small modifications to an XML file in a minimally intrusive way. By that I
 mean that information the program doesn't recognize is kept, as are
 comments and whitespace, the order of attributes and even whitespace
 around attributes. In short, I want to be able to change an XML file while
 producing minimal textual diffs.

Take a look at canonical XML (C14N). In short, that's the only way to get a
predictable XML serialisation that can be used for textual diffs. It's
supported by lxml.

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Chris Rebert
On Wed, Nov 18, 2009 at 4:55 AM, Thomas Lotze tho...@thomas-lotze.de wrote:
 I wonder what Python XML library is best for writing a program that makes
 small modifications to an XML file in a minimally intrusive way. By that I
 mean that information the program doesn't recognize is kept, as are
 comments and whitespace, the order of attributes and even whitespace
 around attributes. In short, I want to be able to change an XML file while
 producing minimal textual diffs.

 Most libraries don't allow controlling the order of and the whitespace
 around attributes, so what's generally left to do is store snippets of
 original text along with the model objects and re-use that for writing the
 edited XML if the model wasn't modified by the program. Does a library
 exist that helps with this? Does any XML library at all allow structured
 access to the text representation of a tag with its attributes?

Have you considered using an XML-specific diff tool such as:
* One off this list:
http://www.manageability.org/blog/stuff/open-source-xml-diff-in-java
* xmldiff (it's in Python even): http://www.logilab.org/859
* diffxml: http://diffxml.sourceforge.net/

[Note: I haven't actually used any of these.]

Cheers,
Chris
--
http://blog.rebertia.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
Stefan Behnel wrote:

 Take a look at canonical XML (C14N). In short, that's the only way to get a
 predictable XML serialisation that can be used for textual diffs. It's
 supported by lxml.

Thank you for the pointer. IIUC, c14n is about changing an XML document so
that its textual representation is reproducible. While this representation
would certainly solve my problem if I were to deal with input that's
already in c14n form, it doesn't help me handling arbitrarily formatted
XML in a minimally intrusive way.

IOW, I don't want the XML document to obey the rules of a process, but
instead I want a process that respects the textual form my input happens
to have.

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Thomas Lotze
Chris Rebert wrote:

 Have you considered using an XML-specific diff tool such as:

I'm afraid I'll have to fall back to using such a thing if I don't find a
solution to what I actually want to do.

I do realize that XML isn't primarily about its textual representation, so
I guess I shouldn't be surprised if what I'm looking for doesn't exist.
Still, it would be nice if it did...

-- 
Thomas


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Anthra Norell

Thomas Lotze wrote:

Chris Rebert wrote:

  

Have you considered using an XML-specific diff tool such as:



I'm afraid I'll have to fall back to using such a thing if I don't find a
solution to what I actually want to do.

I do realize that XML isn't primarily about its textual representation, so
I guess I shouldn't be surprised if what I'm looking for doesn't exist.
Still, it would be nice if it did...

  

Thomas,
  I just might have what you are looking for. But I want to be sure I 
understand your problem. So, please show a small sample and explain how 
you want it modified.

Frederic
--
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Dave Angel

Thomas Lotze wrote:

Chris Rebert wrote:

  

Have you considered using an XML-specific diff tool such as:



I'm afraid I'll have to fall back to using such a thing if I don't find a
solution to what I actually want to do.

I do realize that XML isn't primarily about its textual representation, so
I guess I shouldn't be surprised if what I'm looking for doesn't exist.
Still, it would be nice if it did...

  
What's your real problem, or use case?  Are you just concerned with 
diffing, or are others likely to read the xml, and want it formatted the 
way it already is?  And how general do you need this tool to be?  For 
example, if the only thing you're doing is modifying existing attributes 
or existing tags, the minimal change would be pretty unambiguous.  But 
if you're adding tags, or adding content on what was an empty element, 
then the requirement gets fuzzy  And finding an existing library for 
something fuzzy is unlikely.


Sample input, change list, and desired output would be very  useful.

DaveA

--
http://mail.python.org/mailman/listinfo/python-list


Re: Minimally intrusive XML editing using Python

2009-11-18 Thread Nobody
On Wed, 18 Nov 2009 13:55:52 +0100, Thomas Lotze wrote:

 I wonder what Python XML library is best for writing a program that makes
 small modifications to an XML file in a minimally intrusive way. By that I
 mean that information the program doesn't recognize is kept, as are
 comments and whitespace, the order of attributes and even whitespace
 around attributes. In short, I want to be able to change an XML file while
 producing minimal textual diffs.
 
 Most libraries don't allow controlling the order of and the whitespace
 around attributes, so what's generally left to do is store snippets of
 original text along with the model objects and re-use that for writing the
 edited XML if the model wasn't modified by the program. Does a library
 exist that helps with this? Does any XML library at all allow structured
 access to the text representation of a tag with its attributes?

Expat parsers have a CurrentByteIndex field, while SAX parsers have
locators. You can use this to identify the portions of the input which
need to be processed, and just copy everything else. One downside is that
these only report either the beginning (Expat) or end (SAX) of the tag;
you'll have to deduce the other side yourself.

OTOH, diff is probably the wrong tool for the job.

-- 
http://mail.python.org/mailman/listinfo/python-list