Re: Minimally intrusive XML editing using Python
Please consider this a reply to any unanswered messages I received in response to my original post. Dave Angel wrote: What's your real problem, or use case? Are you just concerned with diffing, or are others likely to read the xml, and want it formatted the way it already is? I'd like to put the XML under revision control along with other stuff. Other people should be able to make sense of the diffs and I'd rather not require them to configure their tools to use some XML differ. And how general do you need this tool to be? For example, if the only thing you're doing is modifying existing attributes or existing tags, the minimal change would be pretty unambiguous. But if you're adding tags, or adding content on what was an empty element, then the requirement gets fuzzy And finding an existing library for something fuzzy is unlikely. Sure. I guess it's something like an 80/20 problem: Changing attributes in a way that keeps the rest of the XML intact will go a long way and as we're talking about XML that is supposed to be looked at by humans, I would base any further requirements on the assumption that it's pretty-printed in some way so that removing an element, for example, can be defined by touching as few lines as possible, and adding one can be restricted to adding a line in the appropriate place. If more complex stuff isn't as well-defined, that would be entirely OK with me. Sample input, change list, and desired output would be very useful. I'd like to be able to reliably produce a diff like this using a program that lets me change the value in some useful way, which might be dragging a point across a map with the mouse in this example: --- foo.gpx 2009-05-30 19:45:45.0 +0200 +++ bar.gpx 2009-11-23 17:41:36.0 +0100 @@ -11,7 +11,7 @@ speed0.792244/speed fix2d/fix /trkpt -trkpt lat=50.605995000 lon=10.70968 +trkpt lat=50.605985000 lon=10.70968 ele508.30/ele time2009-05-30T16:37:10Z/time course15.15/course -- Thomas -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
On Mon, 23 Nov 2009 17:45:24 +0100, Thomas Lotze wrote: What's your real problem, or use case? Are you just concerned with diffing, or are others likely to read the xml, and want it formatted the way it already is? I'd like to put the XML under revision control along with other stuff. Other people should be able to make sense of the diffs and I'd rather not require them to configure their tools to use some XML differ. In which case, the data format isn't XML, but a subset of it (and probably an under-defined subset at that, unless you invest a lot of effort in defining it). That defeats one of the advantages of using a standardised container format such as XML, i.e. being able to take advantage of existing tools and libraries (which will be written to the offical standard, not to your private standard). One option is to require the files to be in a canonical form. Depending upon your revision control system, you may be able to configure it to either check that updated files are in this form, or even to convert them automatically. If your existing files aren't in such a form, there will be a one-time penalty (i.e. a huge diff) when converting the files. -- http://mail.python.org/mailman/listinfo/python-list
Minimally intrusive XML editing using Python
I wonder what Python XML library is best for writing a program that makes small modifications to an XML file in a minimally intrusive way. By that I mean that information the program doesn't recognize is kept, as are comments and whitespace, the order of attributes and even whitespace around attributes. In short, I want to be able to change an XML file while producing minimal textual diffs. Most libraries don't allow controlling the order of and the whitespace around attributes, so what's generally left to do is store snippets of original text along with the model objects and re-use that for writing the edited XML if the model wasn't modified by the program. Does a library exist that helps with this? Does any XML library at all allow structured access to the text representation of a tag with its attributes? Thank you very much. -- Thomas -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
Thomas Lotze, 18.11.2009 13:55: I wonder what Python XML library is best for writing a program that makes small modifications to an XML file in a minimally intrusive way. By that I mean that information the program doesn't recognize is kept, as are comments and whitespace, the order of attributes and even whitespace around attributes. In short, I want to be able to change an XML file while producing minimal textual diffs. Take a look at canonical XML (C14N). In short, that's the only way to get a predictable XML serialisation that can be used for textual diffs. It's supported by lxml. Stefan -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
On Wed, Nov 18, 2009 at 4:55 AM, Thomas Lotze tho...@thomas-lotze.de wrote: I wonder what Python XML library is best for writing a program that makes small modifications to an XML file in a minimally intrusive way. By that I mean that information the program doesn't recognize is kept, as are comments and whitespace, the order of attributes and even whitespace around attributes. In short, I want to be able to change an XML file while producing minimal textual diffs. Most libraries don't allow controlling the order of and the whitespace around attributes, so what's generally left to do is store snippets of original text along with the model objects and re-use that for writing the edited XML if the model wasn't modified by the program. Does a library exist that helps with this? Does any XML library at all allow structured access to the text representation of a tag with its attributes? Have you considered using an XML-specific diff tool such as: * One off this list: http://www.manageability.org/blog/stuff/open-source-xml-diff-in-java * xmldiff (it's in Python even): http://www.logilab.org/859 * diffxml: http://diffxml.sourceforge.net/ [Note: I haven't actually used any of these.] Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
Stefan Behnel wrote: Take a look at canonical XML (C14N). In short, that's the only way to get a predictable XML serialisation that can be used for textual diffs. It's supported by lxml. Thank you for the pointer. IIUC, c14n is about changing an XML document so that its textual representation is reproducible. While this representation would certainly solve my problem if I were to deal with input that's already in c14n form, it doesn't help me handling arbitrarily formatted XML in a minimally intrusive way. IOW, I don't want the XML document to obey the rules of a process, but instead I want a process that respects the textual form my input happens to have. -- Thomas -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
Chris Rebert wrote: Have you considered using an XML-specific diff tool such as: I'm afraid I'll have to fall back to using such a thing if I don't find a solution to what I actually want to do. I do realize that XML isn't primarily about its textual representation, so I guess I shouldn't be surprised if what I'm looking for doesn't exist. Still, it would be nice if it did... -- Thomas -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
Thomas Lotze wrote: Chris Rebert wrote: Have you considered using an XML-specific diff tool such as: I'm afraid I'll have to fall back to using such a thing if I don't find a solution to what I actually want to do. I do realize that XML isn't primarily about its textual representation, so I guess I shouldn't be surprised if what I'm looking for doesn't exist. Still, it would be nice if it did... Thomas, I just might have what you are looking for. But I want to be sure I understand your problem. So, please show a small sample and explain how you want it modified. Frederic -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
Thomas Lotze wrote: Chris Rebert wrote: Have you considered using an XML-specific diff tool such as: I'm afraid I'll have to fall back to using such a thing if I don't find a solution to what I actually want to do. I do realize that XML isn't primarily about its textual representation, so I guess I shouldn't be surprised if what I'm looking for doesn't exist. Still, it would be nice if it did... What's your real problem, or use case? Are you just concerned with diffing, or are others likely to read the xml, and want it formatted the way it already is? And how general do you need this tool to be? For example, if the only thing you're doing is modifying existing attributes or existing tags, the minimal change would be pretty unambiguous. But if you're adding tags, or adding content on what was an empty element, then the requirement gets fuzzy And finding an existing library for something fuzzy is unlikely. Sample input, change list, and desired output would be very useful. DaveA -- http://mail.python.org/mailman/listinfo/python-list
Re: Minimally intrusive XML editing using Python
On Wed, 18 Nov 2009 13:55:52 +0100, Thomas Lotze wrote: I wonder what Python XML library is best for writing a program that makes small modifications to an XML file in a minimally intrusive way. By that I mean that information the program doesn't recognize is kept, as are comments and whitespace, the order of attributes and even whitespace around attributes. In short, I want to be able to change an XML file while producing minimal textual diffs. Most libraries don't allow controlling the order of and the whitespace around attributes, so what's generally left to do is store snippets of original text along with the model objects and re-use that for writing the edited XML if the model wasn't modified by the program. Does a library exist that helps with this? Does any XML library at all allow structured access to the text representation of a tag with its attributes? Expat parsers have a CurrentByteIndex field, while SAX parsers have locators. You can use this to identify the portions of the input which need to be processed, and just copy everything else. One downside is that these only report either the beginning (Expat) or end (SAX) of the tag; you'll have to deduce the other side yourself. OTOH, diff is probably the wrong tool for the job. -- http://mail.python.org/mailman/listinfo/python-list