[
https://issues.apache.org/jira/browse/ODFTOOLKIT-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224264#comment-13224264
]
Rob Weir commented on ODFTOOLKIT-308:
-------------------------------------
Good thoughts. The other part is the glue between the command line tools.
That was always the real power of the Unix tools, that they could easily be
combined. For example, I recently did this to search for all openoffice.org
email addresses in a downloaded copy of the openoffice website, deduping and
sorting by how many times each address appeared:
    grep -o -r -i --no-filename --include="*.html" \
        "[[:alnum:]+\.\_\-]*@openoffice.org" . | sort | uniq -c | sort -n -r
So, powerful command line tools that each do one thing well. And then a way to
pipe the outputs of one to become the inputs of another. The trick will be
that an ODF document is a ZIP file containing multiple XML files, and possibly
other resources, like JPG images. If we pipe the binary ZIP, then we're forcing
each tool in the chain to do the uncompress/compress, which is bad for
performance. There is also the issue of repeated parsing/serialization of the
XML. So perhaps we don't use the OS's command line but create our own command
line processor, entirely in a single JVM instance. Or there might be other
clever ways of making this efficient.
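
To make the single-JVM idea concrete, here is a minimal sketch using only the
JDK, not the ODF Toolkit API. Everything named in it (OdfPipelineSketch,
zipWithContent, parseContent, replaceText, runPipeline) is hypothetical and
made up for illustration, and the tiny ZIP it builds holds only a content.xml
with toy markup rather than a real ODF package. The point is only that the
package is uncompressed and parsed once, the filters are chained on the
in-memory DOM, and serialization happens once at the end:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OdfPipelineSketch {

    /** Builds a tiny ZIP holding only content.xml (a stand-in for an ODF package). */
    public static byte[] zipWithContent(String xml) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Uncompresses and parses content.xml exactly once. */
    public static Document parseContent(byte[] zipBytes) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("content.xml")) {
                    byte[] xml = zip.readAllBytes(); // reads the current entry only
                    return DocumentBuilderFactory.newInstance()
                            .newDocumentBuilder()
                            .parse(new ByteArrayInputStream(xml));
                }
            }
        }
        throw new IllegalArgumentException("no content.xml in package");
    }

    /** A filter that replaces text in every text node, loosely analogous to sed. */
    public static UnaryOperator<Document> replaceText(String from, String to) {
        return doc -> {
            rewriteTextNodes(doc.getDocumentElement(), from, to);
            return doc;
        };
    }

    static void rewriteTextNodes(Node node, String from, String to) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(from, to));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            rewriteTextNodes(children.item(i), from, to);
        }
    }

    /** The "pipe": each filter receives the DOM produced by the previous one. */
    public static Document runPipeline(Document doc, List<UnaryOperator<Document>> filters) {
        for (UnaryOperator<Document> filter : filters) {
            doc = filter.apply(doc);
        }
        return doc;
    }

    /** Serializes the DOM once, after the whole chain has run. */
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] pkg = zipWithContent("<doc><p>I like sausages</p></doc>");
        Document doc = parseContent(pkg);
        Document result = runPipeline(doc, List.of(
                replaceText("sausages", "bacon"),
                replaceText("like", "love")));
        System.out.println(serialize(result));
    }
}
```

A real version would presumably hand each filter the toolkit's document object
instead of a raw DOM, and would write the result back into the ZIP along with
the untouched entries. The trade-off is that filters written this way cannot be
mixed freely with ordinary OS tools like sort or uniq.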
> GSoC: ODF Command Line Tools
> -----------------------------
>
> Key: ODFTOOLKIT-308
> URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-308
> Project: ODF Toolkit
> Issue Type: New Feature
> Reporter: Rob Weir
> Assignee: Rob Weir
> Labels: gsoc2012, mentor
>
> GNU/Linux, and UNIX before it, have shown the great power of text
> processing via simple command line tools, combined with operating system
> facilities for piping and redirection. This filter-based text processing is
> what makes shell programming so powerful. But it only works well for text
> documents.
> But what about more complex, WYSIWYG documents, spreadsheets, word
> processors, with more complex formats, often not text based at all? The tool
> set becomes far weaker.
> The Apache ODF Toolkit is a Java API that gives a high level view of a
> document, and enables programmatic manipulation of a document. We have
> functions for doing things like search & replace. There is a lot you can do
> using the ODF Toolkit. But it still requires Java programming, and that
> limits its reach to professional programmers.
> What if we could write, using the ODF Toolkit, a set of command line
> utilities that made it easy to do both simple and complex text manipulation
> tasks from a command line, things like:
> 1) Concatenate documents
> 2) Replace slide 3 in presentation A with slide 3 from presentation B
> 3) Apply the styles of document A to all documents in the current directory
> 4) Find all occurrences of "sausages" in the given document and add a
> hyperlink to sausages.com
> and so on.
> Clearly analogs of cat, grep, diff and sed are obvious ones. Maybe something
> awk-like that works with spreadsheets? No need to be slavish to the original
> tools, but create something of similar power that operates on ODF
> documents. For example, an alternative solution might be to write a new
> shell processor that has native commands for ODF document manipulation.