To me, we really are creating the lowest level required for all of this.
The API for manipulating the file format.

From there, we need other things to allow a generic way to do it.  To me
that generic way is XML/XSLT.

The Serializer for Excel, which I'm moving back over here "very soon" from
Cocoon, takes the Gnumeric XML file format and makes Excel.  I always
intended to do a generator, but the serializer didn't catch fire like I
thought it would (everyone went straight to the API for various reasons).

Consider, though, that writing a stylesheet to convert Gnumeric format to,
say, HTML is a child's task.  There are of course format capability
differences, but those are going to be there no matter what.
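
To make that concrete, here is a minimal sketch using the JDK's own
javax.xml.transform.  The input XML is a made-up, heavily simplified
spreadsheet vocabulary (it is NOT the real Gnumeric schema), and the class
and constant names are only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SheetToHtml {
    // Hypothetical, simplified spreadsheet XML; not the actual Gnumeric format.
    static final String SHEET_XML =
        "<sheet name='Budget'>"
      + "<row><cell>Item</cell><cell>Cost</cell></row>"
      + "<row><cell>Coffee</cell><cell>3.50</cell></row>"
      + "</sheet>";

    // One template per structure: sheet -> table, row -> tr, cell -> td.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/sheet'>"
      + "<table><xsl:apply-templates select='row'/></table>"
      + "</xsl:template>"
      + "<xsl:template match='row'>"
      + "<tr><xsl:apply-templates select='cell'/></tr>"
      + "</xsl:template>"
      + "<xsl:template match='cell'>"
      + "<td><xsl:value-of select='.'/></td>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SHEET_XML, STYLESHEET));
    }
}
```

Anything the target can't express (formulas, number formats and so on)
simply has no template here, which is where the capability differences
show up.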

What I'd like to see is POI focus on the M$ file formats at the low level,
a new top-level Apache project focus on *other* types of binary file
formats, and then all of us work on projects which group certain types of
like file formats into XML grammars (serializers/generators).

To me the vocabulary of the XML is practically irrelevant.  Provided the XML
format is closely coupled with the binary format, you can always count on
XSLT to make a transformation.

Our mistake was co-opting the Gnumeric format.  It seemed like a good idea
because it was *kinda* close to Excel.  We knew they'd cooperate (kinda) and
such.  The trouble was that the format wasn't as similar to the Excel format
as we thought.  The structure and the capabilities were different enough
that there was a mismatch from day one.

The better thing would have been to just represent the binary structures as
XML and then do transforms via XSLT.
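
A rough sketch of what "just represent the binary structures as XML" could
look like.  It assumes the usual BIFF record framing (a 2-byte record id
followed by a 2-byte length, both little-endian, then the data); the
<records>/<record> vocabulary and the class name are invented for this
example and are not an existing POI API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class RecordsToXml {
    // Walks a stream of [sid][length][data] records and dumps each one
    // as an XML element with the payload as hex. Hypothetical vocabulary.
    static String toXml(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        StringBuilder xml = new StringBuilder("<records>");
        while (buf.remaining() >= 4) {
            int sid = buf.getShort() & 0xFFFF;     // record type id
            int length = buf.getShort() & 0xFFFF;  // payload size in bytes
            if (length > buf.remaining()) {
                throw new IllegalArgumentException("truncated record");
            }
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < length; i++) {
                hex.append(String.format("%02X", buf.get() & 0xFF));
            }
            xml.append(String.format(
                "<record sid=\"0x%04X\" length=\"%d\">%s</record>",
                sid, length, hex));
        }
        return xml.append("</records>").toString();
    }

    public static void main(String[] args) {
        byte[] sample = {
            0x34, 0x12, 0x02, 0x00, (byte) 0xAB, (byte) 0xCD, // made-up record
            0x0A, 0x00, 0x00, 0x00                            // empty record
        };
        System.out.println(toXml(sample));
    }
}
```

Because the XML mirrors the records one for one, nothing is lost on the way
in, and any friendlier vocabulary becomes a stylesheet away.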

Now take HSSF: users don't have to know much about the low level format to
use it.  The structure maybe, but not the format.  That's why we have a
"High level model" for manipulating cells and rows and the familiar
structures of the spreadsheet.

However, Rome wasn't built in a day.  Once the low level structures in HWPF
exist you can wrap them into high level models.  From there you can tie an
XML vocabulary to that model.  From there you can transform the XML to other
things.  From there you can digest that XML and apply it to the Java
model... and back around the circle.
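
The circle, boiled down to a toy.  The Paragraph class below is purely
hypothetical (it is not HWPF's model, which doesn't exist yet at this
level) and the <p bold="..."> vocabulary is made up; the point is only
that the model writes XML and can digest the same XML back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ParagraphRoundTrip {
    // Hypothetical high level model object; stands in for a real one.
    static final class Paragraph {
        final String text;
        final boolean bold;
        Paragraph(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    // Model -> XML.
    static String toXml(Paragraph p) {
        return "<p bold=\"" + p.bold + "\">" + escape(p.text) + "</p>";
    }

    // XML -> model (the "digest" step).
    static Paragraph fromXml(String xml) {
        try {
            Element e = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            return new Paragraph(e.getTextContent(),
                                 Boolean.parseBoolean(e.getAttribute("bold")));
        } catch (Exception ex) {
            throw new IllegalStateException("could not digest XML", ex);
        }
    }

    // Minimal escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Paragraph back = fromXml(toXml(new Paragraph("Hello & welcome", true)));
        System.out.println(back.text + " bold=" + back.bold);
    }
}
```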

It's important that we don't underestimate the work we have before us.  It
took 2 guys basically 6 months (granted that included the underlying OLE 2
CDF) to write out a blank spreadsheet with a few numbers in the cells.  Word
is a MUCH more complicated file format.

I need to take a closer look at the HWPF stuff.  We should be moving to
create a high level model fairly soon.  It should be very close to the
capabilities of Word.  Above that we can strive for something more generic
(FOP/etc).

-Andy


On 7/18/03 4:14 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

>> I like the idea of finding an existing Document Object Model to use. We
>> should all look around and see what's out there. My number one criterion
>> for our DOM is intuitiveness. I would like to find one that fits our needs
>> and is intuitive.
> 
> Intuitiveness is important.  We want users to be productive without too
> much unnecessary effort.  I think the right level of abstraction gets you
> 90% of it.  Anything so the application developer can remain ignorant of
> all that gobbledygook that you and Praveen exchange:  "In the explanation
> given for 'sprmPlncLvl' it says The sprm is three bytes long and consists
> of the sprm code and a one byte two's complement value."  Ouch!  If users
> can remain as ignorant as I am of what that means, this project will be a
> success!
> 
> Obviously, using standards, official or de facto, is also a good thing.
> Existing things out there in this domain (rich text models) include:
> 
> 1) HTML/XHTML DOM
> 2) XSL:FOP
> 3) OpenOffice.org / OASIS Open Office XML
> 4) XML using some other vocabulary
> 
> Any others?
> 
> You take each one of these and weigh it against a few criteria:
> 
> 1) Does it allow a clean separation of content and style?  Presumably Word
> is big on that and we don't want to lose that.
> 2) Is it expressive enough to represent the breadth of Word functionality
> that is important to us?
> 3) Is it easy to work with, lend itself to tooling, etc.?
> 4) Is it popular, widely adopted, etc., such that you might get some
> synergy with other projects?
> 5) Does it lend itself to a high-performance implementation?
> 6) Does it make the simple stuff simple while at the same time allowing
> more ambitious users to do more ambitious things?
> 
> HTML by itself fails 1) and 2).  Adding CSS stylesheets could remedy that,
> but it would still lack page-level features, like headers/footers, page
> numbers, or even hard or soft page breaks.
> 
> FOP gives a lot more support, though it is rather complex.
> 
> OpenOffice.org mixes FOP with several other standard markups like MathML,
> SVG and XLink.  But it gets complex pretty quickly -- A simple "hello
> world" document generates XML with the following namespaces:
> 
> xmlns:office="http://openoffice.org/2000/office"
> xmlns:style="http://openoffice.org/2000/style"
> xmlns:text="http://openoffice.org/2000/text"
> xmlns:table="http://openoffice.org/2000/table"
> xmlns:draw="http://openoffice.org/2000/drawing"
> xmlns:fo="http://www.w3.org/1999/XSL/Format"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:number="http://openoffice.org/2000/datastyle"
> xmlns:svg="http://www.w3.org/2000/svg"
> xmlns:chart="http://openoffice.org/2000/chart"
> xmlns:dr3d="http://openoffice.org/2000/dr3d"
> xmlns:math="http://www.w3.org/1998/Math/MathML"
> xmlns:form="http://openoffice.org/2000/form"
> xmlns:script="http://openoffice.org/2000/script"
> 
> But it is something of a moving target now that OASIS is drafting a format
> standard based on it.  But I think it will be an attractive and widely
> used format once it re-emerges as a standard.
> 
> I'm afraid I've raised more questions than I've answered ;-)  But in the
> end I really don't see anything out there that jumps out and says "I'm the
> API you want".
> 
> -Rob

-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?

