[jira] [Commented] (ODFTOOLKIT-308) GSoC: ODF Command Line Tools

Rob Weir (Commented) (JIRA) Wed, 07 Mar 2012 05:03:23 -0800

    [ 
https://issues.apache.org/jira/browse/ODFTOOLKIT-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224264#comment-13224264
 ]


Rob Weir commented on ODFTOOLKIT-308:
-------------------------------------

Good thoughts.   The other part is the glue between the command line tools.  
That was always the real power of the Unix tools, that they could easily be 
combined.  For example, I recently did this to search for all openoffice.org 
email addresses in a downloaded copy of the openoffice website, deduping and 
sorting by how many times each address appeared:


grep -o -r -i --no-filename --include="*.html" \
  "[[:alnum:]+\.\_\-]*@openoffice.org" . | sort | uniq -c | sort -n -r

So, powerful command line tools that each do one thing well.  And then a way to 
pipe the outputs of one to become the inputs of another.  The trick will be 
that an ODF document is a ZIP file containing multiple XML files, and possibly 
other resources, like JPG images. If we pipe the binary ZIP, then we're forcing 
each tool in the chain to do the uncompress/compress, which is bad for 
performance.  There is also the issue of repeated parsing/serialization of the 
XML.   So perhaps we don't use the OS's command line but create our own command 
line processor, entirely in a single JVM instance.  Or there might be other 
clever ways of making this efficient.
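
To make the trade-off concrete, here is a minimal sketch of the "single 
process" idea, written in Python purely for illustration (the ODF Toolkit 
itself is Java, and none of these function names come from it).  It assumes 
the document text lives in content.xml inside the ZIP package, builds a tiny 
sample package in memory rather than a fully valid ODF file, and uses 
hypothetical helpers (make_sample_odt, replace_text, run_pipeline, 
paragraphs).  The point is only that the package is unzipped once, the XML 
is parsed once, every stage works on the same in-memory tree, and the result 
is serialized and zipped once at the end.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# ODF namespace URIs for the office: and text: prefixes.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)


def make_sample_odt(paragraph_texts):
    """Build a tiny ODF-like package in memory (illustrative, not a complete .odt)."""
    root = ET.Element(f"{{{OFFICE_NS}}}document-content")
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for content in paragraph_texts:
        ET.SubElement(text, f"{{{TEXT_NS}}}p").text = content
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", ET.tostring(root, encoding="utf-8"),
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def replace_text(old, new):
    """Return a stage (hypothetical sed analog) that edits paragraph text in place."""
    def stage(root):
        for p in root.iter(f"{{{TEXT_NS}}}p"):
            if p.text:
                p.text = p.text.replace(old, new)
        return root
    return stage


def run_pipeline(odt_bytes, stages):
    """Unzip once, parse once, run every stage on the tree, then zip once."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zin:
        entries = {name: zin.read(name) for name in zin.namelist()}
    root = ET.fromstring(entries["content.xml"])
    for stage in stages:
        root = stage(root)  # no compression or XML parsing between stages
    entries["content.xml"] = ET.tostring(root, encoding="utf-8")
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zout:
        for name, data in entries.items():
            ctype = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(name, data, compress_type=ctype)
    return out.getvalue()


def paragraphs(odt_bytes):
    """Read back the paragraph texts from a package, for inspection."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Chaining two stages, for example 
run_pipeline(doc, [replace_text("sausages", "bacon"), replace_text("bacon", "BACON")]), 
costs one unzip and one parse in total.  Piping the binary ZIP between two 
separate OS processes would repeat both steps in each process.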
                
> GSoC:  ODF Command Line Tools
> -----------------------------
>
>                 Key: ODFTOOLKIT-308
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-308
>             Project: ODF Toolkit
>          Issue Type: New Feature
>            Reporter: Rob Weir
>            Assignee: Rob Weir
>              Labels: gsoc2012, mentor
>
> GNU/Linux, and UNIX before it, have shown the great power of text 
> processing via simple command line tools, combined with operating system 
> facilities for piping and redirection. This filter-based text processing is 
> what makes shell programming so powerful.  But it only works well for text 
> documents.  What about more complex, WYSIWYG documents, such as spreadsheets 
> and word processor documents, with more complex formats, often not text 
> based at all?  The tool set becomes far weaker.
> The Apache ODF Toolkit is a Java API that gives a high level view of a 
> document, and enables programmatic manipulation of a document.  We have 
> functions for doing things like search & replace.  There is a lot you can do 
> using the ODF Toolkit.  But it still requires Java programming, and that 
> limits its reach to professional programmers.
> What if we could write, using the ODF Toolkit, a set of command line 
> utilities that made it easy to do both simple and complex text manipulation 
> tasks from a command line, things like:
> 1) Concatenate documents
> 2) Replace slide 3 in presentation A with slide 3 from presentation B
> 3) Apply the styles of document A to all documents in the current directory
> 4) Find all occurrences of "sausages" in the given document and add a 
> hyperlink to sausages.com
> and so on.
> Clearly analogs of cat, grep, diff and sed are obvious ones. Maybe something 
> awk-like that works with spreadsheets?  No need to be slavish to the original 
> tools; the goal is something of similar power that operates on ODF 
> documents.  For example, an alternative solution might be to write a new 
> shell processor that has native commands for ODF document manipulation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
