I think that the important part here is that others can review the work being done. When that work is encapsulated behind binary formats, then it makes it *very* difficult to perform that review.
Sure, some artifacts in the repository *need* to be binary. Nobody will dispute that. But when the primary work of this PMC can be done in a reviewable format, then it helps all of us to make that happen.

Cheers,
-g

On Wed, Jun 22, 2011 at 01:29, Dave Fisher <[email protected]> wrote:
> On Jun 21, 2011, at 8:58 PM, Daniel Shahaf wrote:
>
>> Dennis E. Hamilton wrote on Tue, Jun 21, 2011 at 19:20:13 -0700:
>>> BACK STORY
>>>
>>> On a different list, not just here on ooo-dev, there has been some
>>> surprise to see us putting binaries (ODF documents) into some SVN
>>> locations used by the PPMC.
>>>
>>> My impression is that the experienced hands here in ASF are expecting
>>> to see DIFFs in commit messages on SVN, but binaries don't get DIFFed
>>> since it is usually unintelligible and almost always uninteresting.
>>> For some, it is new news that ODF packages are not XML files.
>>>
>>> Someone suggested that one could unpack the Zip of these documents and
>>> then do diffs of the respective XML parts and that could serve as
>>> a DIFF on what the changes are. They also noticed they'd never seen
>>> that done.
>>>
>>> THE INSIGHT
>>>
>>> On seeing that suggestion (clearly the kinds of things developers
>>> think of, it being what we do), it struck me that we have a geeks are
>>> from Mars, users are from Venus situation here.
>>>
>>> I think the clash of expectations has to do with the differences in
>>> tools that are applicable at the level we work at, and how we see what
>>> it is we are at work on.
>>>
>>> We need to understand that we really have different experience sets,
>>> and they all are important in the context of the OpenOffice.org
>>> project.
>>>
>>> A GEEKY LOOK
>>>
>>> Here is a geeky explanation of why it does no good to figure out
>>> a better way to show DIFFs of the XML inside an ODF package if you
>>> want to know what an author contributor/committer changed. (You might
>>> want that as a forensics tool, but not for knowing what someone
>>> changed in the course of their work on a document.)
>>>
>>> My (updated) explanation:
>>>
>>
>> Long email. In the end, the expectation is for commit mails to contain
>> reviewable diffs, I don't think you've addressed how that might be done?
>
> As far as I know binary files are acceptable elsewhere in SVN.
>
>>
>> (as opposed to how it shouldn't be done)
>
> Generally ODF files will be documentation and testcases, and generally
> consistent, like PNGs, JPEGs, etc. No one complains about PDFs or any of the
> MS Office formats in SVN. We haven't seemed to care about that in the Apache
> POI project, I can't answer for PDFBox.
>
> I unzipped an ODF zip then each part is a huge set of verbose xml on two
> lines. Header and data. For example, content.xml.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <office:document-content
> xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
> xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
> xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
> xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
> xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
> xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
> xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
> xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
> xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
> xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
> xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns: ....
>
> Diff won't work easily. Maybe SVN needs to provide "zip" storage and then
> "xml" diff within. Could the Subversion project whip that out now? We'll wait
> until they do before we proceed. I'm being sarcastic here. But if it is
> available now that would be pretty cool.
>
> The real issue is that a binary document was used to update a table where
> everyone made changes. Changes that were important to those viewing the
> commit messages. I know we all love office documents around here, but ...
>
> Maybe we should be exchanging that particular file as a CSV.
>
> (BTW - I notice that Calc's save options don't include XLSX, etc.)
>
> Best Regards,
> Dave
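
For anyone who wants to try the unzip-and-diff idea from the thread, here is a minimal sketch in Python, not something Subversion or the project provides. The function names pretty_part and diff_odf are made up for illustration. It assumes the package has a content.xml part, and it re-indents the XML first because, as Dave notes, each part is otherwise stored on just a couple of very long lines. Re-indenting can disturb whitespace inside text runs, so treat the output as a review aid only; it shows XML-level changes, which (per Dennis's point) is not the same thing as showing what an author meant to change.

```python
import difflib
import io
import xml.dom.minidom
import zipfile


def pretty_part(package_bytes: bytes, part: str = "content.xml") -> list[str]:
    """Return one XML part of an ODF package as a list of indented lines."""
    # An ODF document is a Zip package; read the requested part out of it.
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        raw = package.read(part)
    # Re-indent so that a line-based diff has something to work with.
    pretty = xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
    return [line for line in pretty.splitlines() if line.strip()]


def diff_odf(old: bytes, new: bytes, part: str = "content.xml") -> str:
    """Unified diff of the same XML part in two versions of a package."""
    lines = difflib.unified_diff(
        pretty_part(old, part),
        pretty_part(new, part),
        fromfile="old/" + part,
        tofile="new/" + part,
        lineterm="",
    )
    return "\n".join(lines)
```

Called as diff_odf(open("old.odt", "rb").read(), open("new.odt", "rb").read()), with the two file names being placeholders, it returns a unified diff as text, or an empty string when the chosen part is identical in both packages.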
