This is an interesting post from Jon Udell about the two extremes of authoring HTML, with their pros and cons, and the bridges he developed between them.

February 19, 2007
Blogging from Word 2007, crossing the chasm [1]

    To that end, I’m developing some Python code to help me
    wrangle Word’s default .docx format, which is a zip file
    containing the document in WordML and a bunch of other
    stuff. At the end of this entry you can see what I’ve got
    so far. I’m using this code to explore what kind of XML I
    can inject programmatically into a Word 2007 document,
    what kind comes back after a round trip through the
    application, how that XML relates to the HTML that gets
    published to WordPress, and which of these
    representations will be the canonical one that I’ll want
    to store and process.

     So far my conclusion is that none of these
    representations will be the canonical one, and that I’ll
    need to find (or more likely create) a transform to and
    from the canonical representation where I’ll store and
    process all my stuff. We’ll see how it goes.

[1] http://blog.jonudell.net/2007/02/19/blogging-from-word-2007-crossing-the-chasm/
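
For anyone who wants to look inside the container described in the quote, here is a minimal Python sketch of my own (it is not the code from the post). It assumes the main document part sits at word/document.xml, which is where Word normally puts it; the helper names and the toy package are hypothetical and only there so the example runs without a real Word file.

```python
import io
import zipfile

# Assumption: the main document part is stored under this name in the package.
MAIN_PART = "word/document.xml"


def list_parts(docx_file):
    """Return the names of all parts stored in the .docx zip container.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return archive.namelist()


def read_main_document(docx_file):
    """Return the XML of the main document part as text."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read(MAIN_PART).decode("utf-8")


def make_toy_package():
    """Build an in-memory stand-in for a .docx (hypothetical, not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr(MAIN_PART, "<document><body>Hello</body></document>")
    return buffer.getvalue()


if __name__ == "__main__":
    data = make_toy_package()
    print(list_parts(io.BytesIO(data)))
    print(read_main_document(io.BytesIO(data)))
```

With a real file, passing its path to list_parts() should show the "bunch of other stuff" that travels alongside the document XML.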


I like the mention of the canonical form.
It is not exactly the same canonical form as his, but it would be good to have an HTML canonical form for editing. It would help in building tools such as htmldiff, tidy serialization, and source code visualizers in editing tools.

It would also help authors work the way they want with their files and still exchange files between parties:

my source code layout <-- T1 --> canonical form <-- T2 --> your source code layout

T1 and T2 being formatting transformations.
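
To make the idea concrete, here is a rough Python sketch of one direction of such a transformation (source layout to canonical form). The canonical form is my own assumption for illustration, not an agreed format: lowercase names, attributes sorted by name and double-quoted, whitespace collapsed, one token per line. The names Canonicalizer and canonicalize are hypothetical. It ignores comments and doctypes and does not preserve whitespace in pre, so it is a toy, not a proposal.

```python
import difflib
from html import escape
from html.parser import HTMLParser


class Canonicalizer(HTMLParser):
    """Re-serialize HTML as one token per line (an assumed canonical layout)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        parts = [tag]
        for name, value in sorted(attrs, key=lambda pair: pair[0]):
            if value is None:  # attribute without a value, e.g. "disabled"
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.lines.append("<%s>" % " ".join(parts))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> like <br> so both layouts give the same output.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.lines.append("</%s>" % tag)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.lines.append(escape(text, quote=False))


def canonicalize(source):
    """Return the canonical serialization of an HTML fragment."""
    parser = Canonicalizer()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.lines) + "\n"


mine = '<P  Class="note"   ID=a>Hello,\n   <EM>world</EM></P>'
yours = '<p id="a" class="note">\n  Hello, <em>world</em>\n</p>'

if __name__ == "__main__":
    # Two different source layouts, one canonical form.
    print(canonicalize(mine) == canonicalize(yours))
    # A diff on the canonical form shows only the real change.
    changed = yours.replace("world", "web")
    for line in difflib.unified_diff(
        canonicalize(mine).splitlines(),
        canonicalize(changed).splitlines(),
        lineterm="",
    ):
        print(line)
```

The reverse direction (canonical form back to each author's preferred layout) would be a pretty-printer per author, which is not shown here. The point is that an htmldiff working on the canonical form does not report layout noise.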



--
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
  QA Weblog - http://www.w3.org/QA/
     *** Be Strict To Be Cool ***


