BACK STORY

On a different list, and not just here on ooo-dev, there has been some surprise at seeing us put binaries (ODF documents) into some SVN locations used by the PPMC.
My impression is that the experienced hands here at the ASF expect to see DIFFs in SVN commit messages, but binaries don't get DIFFed, since the result is usually unintelligible and almost always uninteresting. For some, it is news that ODF packages are not XML files. Someone suggested that one could unpack the Zip of these documents, diff the respective XML parts, and let that serve as a DIFF of the changes. They also noted that they had never seen that done.

THE INSIGHT

On seeing that suggestion (clearly the kind of thing developers think of, since it is what we do), it struck me that we have a geeks-are-from-Mars, users-are-from-Venus situation here. I think the clash of expectations has to do with the differences in the tools that apply at the level each of us works at, and with how we see what it is we are working on. We need to understand that we really have different sets of experience, and they are all important in the context of the OpenOffice.org project.

A GEEKY LOOK

Here is a geeky explanation of why it does no good to find a better way to show DIFFs of the XML inside an ODF package if what you want to know is what an author, contributor, or committer changed. (You might want that as a forensics tool, but not for knowing what someone changed in the course of their work on a document.)

My (updated) explanation: The problem is that diffing the XML is not what's wanted. That's like decompiling two programs and posting a diff of the assembly language. (There are also binary blobs in the Zipped ODF package; I said "blogs" by mistake in another post.) The level of abstraction that matters when accounting for changes in a document in one of these formats is the presentation or print-preview level. There are document-compare utilities that provide such functions. It's like the comparison you get between two versions of a wiki page: anywhere I've looked, it is shown not as a comparison of the WikiText but as a comparison of the resulting presentation.
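For anyone curious what the unpack-and-diff suggestion would actually involve, here is a minimal Python sketch using only the standard library. It assumes the usual ODF package layout, in which the document body lives in a part named content.xml; the function names read_part_lines and diff_odf_part are hypothetical, made up for this illustration. Because content.xml is typically written as one very long line, the sketch re-indents it before diffing so that a line-oriented diff has something to work with.

```python
import difflib
import zipfile
from xml.dom import minidom


def read_part_lines(package, part="content.xml"):
    """Unzip one XML part of an ODF package and pretty-print it into lines.

    `package` may be a filesystem path or a binary file-like object.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(package) as odf:
        raw = odf.read(part)
    # Re-indent the XML so each element lands on its own line.
    pretty = minidom.parseString(raw).toprettyxml(indent="  ")
    # Drop whitespace-only lines that pretty-printing may introduce.
    return [line for line in pretty.splitlines() if line.strip()]


def diff_odf_part(old_package, new_package, part="content.xml"):
    """Return a unified diff (as a list of lines) of one XML part of two packages."""
    return list(difflib.unified_diff(
        read_part_lines(old_package, part),
        read_part_lines(new_package, part),
        fromfile="old/" + part,
        tofile="new/" + part,
        lineterm="",
    ))
```

Calling diff_odf_part("v1.odt", "v2.odt") and printing the lines yields a diff of element markup, not of the document as a reader sees it. Even pretty-printed, a small wording or formatting change can surface as shifted style names and restructured spans, and the binary parts in the package are ignored entirely, which is the object-code-level view described above.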
(I know that at Apache we have a production process in which SVN is used as a publishing location and we see diffs of Markdown, a kind of plain-text markup. I know that fits beautifully into the source-code revision model of developer toolcraft, but you wouldn't want to learn about changes in an ODF document that way, BECAUSE IT IS NOT WHAT IS AUTHORED.)

The ODF format also has change-tracking provisions (historically called red-lining, in my experience), and the software products handle them with varying degrees of reliability. This amounts to showing a kind of merge: the removed text and the inserted text both appear in the document, distinguished by highlighting and various forms of strikethrough. A reviewer can accept or reject each change, make further changes, and so on.

So there are (at least) two different levels of envisioning, of toolcraft, and of work practices among us. At one level, there is the world of SVN, compilers and build processes, and source code in simply formatted text. For ODF (and OOXML and other formats like them), the XML in the Zip is object code, not source code. The source-code counterpart is at quite another level.

Worlds are colliding here on Apache OpenOffice.org. It is going to be very interesting to see what we learn from each other and how we manage to function in some kind of shared culture within the Apache Way. Some of us navigate both levels with some fluency. That is not the case for most of us and, I am learning, it is not natural for me either: OpenOffice is not my tool of choice apart from using it as an ODF forensics tool, and my development toolcraft is not SVN, LAMP, etc.

It is very important to grasp this, because if we don't recognize it, the authors of documentation and the people working at the user-issues level are going to be left with no way to fit in and little that feels appropriate to their specialized activities.

- Dennis
