What's next for ODF Toolkit? Call for participation & personal roadmaps!

Svante Schubert Thu, 13 Apr 2017 05:23:40 -0700

Rob Weir once told me that participating in open source is like scratching
the part that itches most. Rob was one of the Jedi of the ISO document
"clone wars", when Microsoft cloned the concept of ODF for OOXML, creating
a second ISO standard for office documents that reuses the idea of zipped
XML files and some more [1].

One thing is for sure: the ODF Toolkit is a wonderful place to play around
with new concepts for file formats, as we have already done:

1. From the ODF grammar (RelaxNG)
<http://incubator.apache.org/odftoolkit/0.6.2-incubating/schema2template/>
we generate a typed ODF DOM tree
<http://incubator.apache.org/odftoolkit/0.6.2-incubating/odfdom/overview-summary.html#The_ODFDOM_Layers>,
so developers do not have to know the details of the verbose ODF schema
<http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-schema.rng>,
and XSL tooling works out of the box, as XSLT runs directly on an ODF
document
<http://incubator.apache.org/odftoolkit/xsltrunner/ODFXSLTRunner.html>.
2. We have the most advanced ODF validator
<http://odf-validator.rhcloud.com>, doing its job mostly unseen, for
example in regression testing of LibreOffice and in the ODF test server
<https://gitlab.com/odfplugfest/odfserver/>. It reports not only invalid
XML but also violated constraints of the package/ZIP format via the XML
ErrorHandler interface
<http://www.saxproject.org/apidoc/org/xml/sax/ErrorHandler.html>.
3. Many companies use the ODF Toolkit for automated access to information
in documents and to adapt documents according to any kind of pattern.
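
To illustrate item 2, here is a minimal sketch of how XML problems and
package-level findings could both be reported through the SAX ErrorHandler
interface. It is not the validator's actual code: the class name
CollectingErrorHandler, the helper check(), the file name example.odt and
the wording of the package message are assumptions made for this example.
Only the org.xml.sax and javax.xml.parsers calls are standard Java API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical handler that collects every reported problem as a text line. */
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override public void warning(SAXParseException e) { record("WARNING", e); }
    @Override public void error(SAXParseException e) { record("ERROR", e); }
    @Override public void fatalError(SAXParseException e) { record("FATAL", e); }

    private void record(String level, SAXParseException e) {
        messages.add(level + " " + e.getSystemId() + ": " + e.getMessage());
    }

    /** Parses the XML text and returns the handler with whatever was reported. */
    static CollectingErrorHandler check(String xml) throws Exception {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The parser stops after a fatal error; the handler already recorded it.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CollectingErrorHandler handler = check("<a><b></a>"); // not well-formed
        // A package-level finding (not an XML error) uses the same interface.
        // The message text and system id below are illustrative only.
        handler.error(new SAXParseException(
                "the 'mimetype' entry is not the first entry of the ZIP",
                null, "example.odt/mimetype", -1, -1));
        handler.messages.forEach(System.out::println);
    }
}
```

The point of the design is that a caller needs only one reporting channel,
no matter whether a finding comes from the XML parser or from a check of
the ZIP package.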

After the release, I am curious how you developers on the ODF dev list
might add the functionality you are missing most.
Remember: when nothing is given back, the project will never flourish into
a top-level project of Apache and might in the worst case even vanish.

Therefore, I would like to give you my personal roadmap of improved
functionality for the ODF toolkit and would love to know about yours or ask
for assistance!

From what I have learned, the complexity of file formats can be tamed best
by applying as much automation as possible. Therefore, our focus should be
on the generation part of ODFDOM.
Currently, the sequence and choice of XML elements provided by the grammar
are not generated into XML. In addition, if there are multiple children
with an xml:id (or an equivalent attribute named in our configuration), a
map should be generated on demand (see JIRA feature
<https://issues.apache.org/jira/browse/ODFTOOLKIT-182>). Once the above is
done, one of the oldest written parts of ODFDOM, the style functionality,
could be generated, and work would become so much easier.
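
To make the "map on demand" idea concrete, here is a minimal sketch using
only the standard W3C DOM API. It is not ODFDOM code and not what the
generator produces today: the class IdIndexedElement and its methods are
hypothetical, and the index is built on the first lookup and is not
refreshed if the tree changes afterwards.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical wrapper that indexes child elements by xml:id on first use. */
final class IdIndexedElement {
    private final Element parent;
    private Map<String, Element> childrenById; // null until the first lookup

    IdIndexedElement(Element parent) {
        this.parent = parent;
    }

    /** Returns the direct child with the given xml:id, or null if none exists. */
    Element childById(String id) {
        if (childrenById == null) {
            childrenById = buildIndex();
        }
        return childrenById.get(id);
    }

    private Map<String, Element> buildIndex() {
        Map<String, Element> index = new HashMap<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                Element child = (Element) n;
                // getAttributeNS returns "" when the attribute is absent.
                String id = child.getAttributeNS(XMLConstants.XML_NS_URI, "id");
                if (!id.isEmpty()) {
                    index.put(id, child);
                }
            }
        }
        return index;
    }

    /** Parses namespace-aware XML text and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getAttributeNS to work
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element section = parseRoot(
                "<text:section xmlns:text="
                + "\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
                + "<text:p xml:id=\"p1\">First</text:p>"
                + "<text:p xml:id=\"p2\">Second</text:p>"
                + "<text:p>No id</text:p>"
                + "</text:section>");
        IdIndexedElement indexed = new IdIndexedElement(section);
        System.out.println(indexed.childById("p2").getTextContent());
    }
}
```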

User components (image, paragraph, table, each character, etc.) have to be
identified directly in the RelaxNG (e.g. as a comment above the start
element), so they can be generated as well for the high level, and an
explicit hand-written Simple API would become more or less redundant. Best
of all, the more we generate, the easier we can fix things on a large
scale, and we might even generate not only an ODF model as Java DOM source
code but also C++ source code for some Android/iOS model.

Finally, a document shall be mapped to a sequence of changes of the user
components mentioned above (similar to the edit sequence of a user
creating the document), and new changes from ODF editors shall be merged
into the document. The upcoming ODF change-tracking will be based on
defined changes, as opposed to prior-state XML parts saved aside without
knowing what they mean, and therefore with no chance of letting such parts
overlap each other.
Changes are the next big step of document evolution. While documents will
always have their meaning as a snapshot of the overall state (like the
document being signed), changes are a mandatory invention to get rid of our
ping-pong document exchange via email/disk and to allow asynchronous
simultaneous editing by multiple users across multiple applications, which
ask: What did you change?
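
As a toy illustration of a document as a sequence of changes, the sketch
below replays a list of changes on a model that is just a list of paragraph
strings. Everything in it is hypothetical (the ChangeReplay class, the
Change interface and the three kinds of change); it is neither the ODF
change-tracking proposal nor ODF Toolkit API, and it leaves out merging of
concurrent changes entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a document state is the result of replaying changes. */
final class ChangeReplay {

    /** One defined change applied to a very simple paragraph-list model. */
    interface Change {
        void applyTo(List<String> paragraphs);
    }

    static Change insertParagraph(int index, String text) {
        return paragraphs -> paragraphs.add(index, text);
    }

    static Change replaceText(int index, String text) {
        return paragraphs -> paragraphs.set(index, text);
    }

    static Change deleteParagraph(int index) {
        return paragraphs -> paragraphs.remove(index);
    }

    /** Starts from an empty document and applies the changes in order. */
    static List<String> replay(List<Change> changes) {
        List<String> paragraphs = new ArrayList<>();
        for (Change change : changes) {
            change.applyTo(paragraphs);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<Change> history = Arrays.asList(
                insertParagraph(0, "Hello"),
                insertParagraph(1, "World"),
                replaceText(1, "ODF"),
                deleteParagraph(0));
        System.out.println(replay(history)); // prints [ODF]
    }
}
```

Because every change says what it does, a second application can answer
"What did you change?" by reading the list instead of comparing two
snapshots.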

But foremost, I would love to play around with some analysis of the ODF
grammar in some GraphDB [2]
<https://lists.apache.org/thread.html/810459b3eadbebe81ed2a9720a5467d1a5a3daa0da999baf02ce2305@%3Codf-dev.incubator.apache.org%3E>
[3] <http://markmail.org/message/yjq7arijymspndiu>, in order to create some
tooling to identify user components from the ODF RelaxNG.

If you have some personal needs/roadmap you are capable of working on, I
would love to hear about it.
Otherwise, if you have no personal need but like the basic idea described,
I would love to have some assistance, for instance on the JIRA feature
<https://issues.apache.org/jira/browse/ODFTOOLKIT-182> mentioned above. It
is not an easy foe, though, more likely the BOSS at the end of the game
level.. ;)

Looking forward to receiving some feedback,
Svante

[1] Did you ever hear that the first draft version of the OpenDocument
Format was called OpenOffice XML, but was renamed because the name was too
close to one existing application?
Based on this, Microsoft adapted "OpenOffice XML" to "Office Open XML".
Some bright idea! :)
[2] https://lists.apache.org/thread.html/810459b3eadbebe81ed2a9720a5467d1a5a3daa0da999baf02ce2305@%3Codf-dev.incubator.apache.org%3E
[3] http://markmail.org/message/yjq7arijymspndiu