It's true that we could bend things to fit the SVN DIFF model, for example by
using a CSV, or by accepting that HTML diffs aren't very illuminating and
taking them for what they are worth.

Of course, in my chosen example, using a CSV loses a lot of information,
including anything that was done in the design of the spreadsheet to
facilitate its use.

Now, in this case, the spreadsheet was from a committer. And other committers
knew how to retrieve it, update it, and resubmit it to SVN with a sufficiently
informative commit message.  This is not a complex case; it was just
illustrative of the different level at which the work is done.

The question I did not answer, because I do not know the answer:  What is a
straightforward way for someone who was not raised as a Martian to contribute
without being compelled to commit unnatural (for a Venusian) acts?  What is a
way to contribute that does not require an unnatural change in
already-successful ways of working?  And what is the cutover where the
contribution is substantial enough that an iCLA is required anyhow?

It seems to me there is an impedance mismatch for non-developer contributions 
of content that becomes part of an Apache deliverable.  I don't question 
policies that are involved.  I am wondering about the logistics and the 
friction of shoe-horning contributors into a practice that is designed around 
submission of patches and requires arcane Martian technology.

Perhaps this is too hypothetical.

I would like to hear from non-developer members of ooo-Dev who want to
contribute, and to learn what the nature of the envisioned contribution is.
Maybe some concrete use cases can clear this up for all of us.

 - Dennis

-----Original Message-----
From: Dave Fisher [mailto:[email protected]] 
Sent: Tuesday, June 21, 2011 22:30
To: [email protected]
Cc: Dennis E. Hamilton
Subject: Re: Consequences of Working in Office Documents Here

On Jun 21, 2011, at 8:58 PM, Daniel Shahaf wrote:

> Dennis E. Hamilton wrote on Tue, Jun 21, 2011 at 19:20:13 -0700:
>> BACK STORY
>> 
>> On a different list, not just here on ooo-dev, there has been some
>> surprise to see us putting binaries (ODF documents) into some SVN
>> locations used by the PPMC. 
>> 
>> My impression is that the experienced hands here in ASF are expecting
>> to see DIFFs in commit messages on SVN, but binaries don't get DIFFed
>> since it is usually unintelligible and almost always uninteresting.
>> For some, it is new news that ODF packages are not XML files.
>> 
>> Someone suggested that one could unpack the Zip of these documents and
>> then do diffs of the respective XML parts and that could serve as
>> a DIFF on what the changes are.  They also noticed they'd never seen
>> that done.
>> 
>> THE INSIGHT
>> 
>> On seeing that suggestion (clearly the kinds of things developers
>> think of, it being what we do), it struck me that we have a geeks are
>> from Mars, users are from Venus situation here.
>> 
>> I think the clash of expectations has to do with the differences in
>> tools that are applicable at the level we work at, and how we see what
>> it is we are at work on.
>> 
>> We need to understand that we really have different experience sets,
>> and they all are important in the context of the OpenOffice.org
>> project.
>> 
>> A GEEKY LOOK
>> 
>> Here is a geeky explanation of why it does no good to figure out
>> a better way to show DIFFs of the XML inside an ODF package if you
>> want to know what an author contributor/committer changed.  (You might
>> want that as a forensics tool, but not for knowing what someone
>> changed in the course of their work on a document.)
>> 
>> My (updated) explanation:
>> 
> 
> Long email.  In the end, the expectation is for commit mails to contain
> reviewable diffs, I don't think you've addressed how that might be done?

As far as I know binary files are acceptable elsewhere in SVN.

> 
> (as opposed to how it shouldn't be done)

Generally ODF files will be documentation and testcases, and generally
consistent, like PNGs, JPEGs, etc. No one complains about PDFs or any of the
MS Office formats in SVN. We haven't seemed to care about that in the Apache
POI project; I can't answer for PDFBox.

I unzipped an ODF package, and each part is a huge run of verbose XML on two
lines: header and data. For example, content.xml:

<?xml version="1.0" encoding="UTF-8"?>
<office:document-content 
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" 
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" 
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" 
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0" 
xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns:dc="http://purl.org/dc/elements/1.1/" 
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" 
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0" 
xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0" 
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" 
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" 
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0" xmlns: ....

Diff won't work easily. Maybe SVN needs to provide "zip" storage and then "xml"
diff within. Could the Subversion project whip that out now? We'll wait until
they do before we proceed. I'm being sarcastic here. But if it is available now
that would be pretty cool.
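
For what it is worth, the unzip-and-diff idea can be sketched outside of SVN
in a few lines of Python. This is only a rough illustration, not something
Subversion provides: the function names pretty_parts and diff_odf are made up
for this example, and it assumes every XML part fits in memory. It
pretty-prints each XML part first, so the diff is no longer fighting the
two-line layout shown above. The result is still a forensic view of the
markup, not a description of what the author meant to change.

```python
import difflib
import io
import zipfile
from xml.dom import minidom


def pretty_parts(odf_bytes):
    """Return {part name: pretty-printed lines} for each XML part.

    Hypothetical helper: treats the ODF package as a plain Zip archive
    and re-indents every member whose name ends in ".xml".
    """
    parts = {}
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as package:
        for name in package.namelist():
            if name.endswith(".xml"):
                dom = minidom.parseString(package.read(name))
                parts[name] = dom.toprettyxml(indent="  ").splitlines()
    return parts


def diff_odf(old_bytes, new_bytes):
    """Unified diff over the pretty-printed XML parts of two packages."""
    old, new = pretty_parts(old_bytes), pretty_parts(new_bytes)
    lines = []
    # Walk every part present in either package; a missing part diffs
    # against an empty list, so added or removed parts show up too.
    for name in sorted(set(old) | set(new)):
        lines.extend(difflib.unified_diff(
            old.get(name, []), new.get(name, []),
            fromfile="old/" + name, tofile="new/" + name, lineterm=""))
    return lines
```

Calling diff_odf with the bytes of two revisions of the same document returns
a list of diff lines that could be pasted into a mail, one section per XML
part that changed.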

The real issue is that a binary document was used to update a table where 
everyone made changes. Changes that were important to those viewing the commit 
messages. I know we all love office documents around here, but ...

Maybe we should be exchanging that particular file as a CSV.

(BTW - I notice that Calc's save options don't include XLSX, etc.)

Best Regards,
Dave
