To me, we really are creating the lowest level required for all of this: the API for manipulating the file format.
From there, we need other things to allow a generic way to do it. To me that generic way is XML/XSLT. The Serializer for Excel, which I'm moving back over here from Cocoon "very soon", takes the Gnumeric XML file format and makes Excel. I always intended to do a generator, but the serializer didn't catch fire like I thought it would (everyone went straight to the API for various reasons). Consider, though, that writing a stylesheet to convert Gnumeric format to, say, HTML is a child's task. There are of course differences in format capabilities, but that's going to be there no matter what.

What I'd like to see is POI focus on the M$ file formats at the low level, a new top-level Apache project focus on *other* types of binary file formats, and then us work on projects which group like file formats into XML grammars (serializers/generators). To me the vocabulary of the XML is practically irrelevant: provided the XML format is closely coupled with the binary format, you can always count on XSLT to make a transformation.

Our mistake was co-opting the Gnumeric format. It seemed like a good idea because it was *kinda* close to Excel. We knew they'd cooperate (kinda) and such. The trouble was that the format wasn't as similar to the Excel format as we thought. The structure and the capabilities were different enough that there was a mismatch from day one. The better thing would have been to just represent the binary structures as XML and then do transforms via XSLT.

Now take HSSF: users don't have to know much about the low level format to use it. The structure maybe, but not the format. That's why we have a "high level model" for manipulating cells and rows and the familiar structures of the spreadsheet. However, Rome wasn't built in a day. Once the low level structures in HWPF exist you can wrap them into high level models. From there you can tie an XML vocabulary to that model. From there you can transform the XML to other things.
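To make the "binary structures as XML, then XSLT" idea a little more concrete, here is a minimal sketch using the standard JAXP `javax.xml.transform` API that ships with the JDK. The `<sheet>`/`<cell>` vocabulary and the class name `XmlToHtml` are purely hypothetical: this is not the Gnumeric schema or anything POI provides. The vocabulary stands in for an XML rendering that mirrors low-level cell records one-to-one, and a small stylesheet turns it into an HTML table.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToHtml {

    // Hypothetical XML vocabulary closely coupled to a binary structure:
    // one <cell> element per cell record. NOT the Gnumeric schema.
    static final String SHEET_XML =
          "<sheet>"
        + "<cell row='0' col='0'>42</cell>"
        + "<cell row='0' col='1'>hello</cell>"
        + "</sheet>";

    // Minimal XSLT 1.0 stylesheet: each cell record becomes an HTML table cell.
    // indent='no' keeps the output compact (the html method defaults to indenting).
    static final String TO_HTML_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/sheet'>"
        + "<table><tr><xsl:apply-templates select='cell'/></tr></table>"
        + "</xsl:template>"
        + "<xsl:template match='cell'><td><xsl:value-of select='.'/></td></xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the XML, returning the result as a string.
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SHEET_XML, TO_HTML_XSL));
    }
}
```

Targeting a different output format means swapping the stylesheet string; the Java side does not change. That decoupling is why the exact XML vocabulary matters less than how closely it tracks the binary format.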
From there you can digest that XML and apply it to the Java model... and back around the circle.

It's important that we don't underestimate the work we have before us. It took two guys basically six months (granted, that included the underlying OLE 2 CDF) to write out a blank spreadsheet with a few numbers in the cells. Word is a MUCH more complicated file format. I need to take a closer look at the HWPF stuff. We should be moving to create a high level model fairly soon. It should be very close to the capabilities of Word. Above that we can strive for something more generic (FOP, etc.).

-Andy

On 7/18/03 4:14 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

>> I like the idea of finding an existing Document Object Model to use. We
>> should all look around and see what's out there. My number one criterion
>> for our DOM is intuitiveness. I would like to find one that fits our
>> needs and is intuitive.
>
> Intuitiveness is important. We want users to be productive without too
> much unnecessary effort. I think the right level of abstraction gets you
> 90% of it. Anything so the application developer can remain ignorant of
> all that gobbledygook that you and Praveen exchange: "In the explanation
> given for 'sprmPlncLvl' it says the sprm is three bytes long and consists
> of the sprm code and a one byte two's complement value." Ouch! If users
> can remain as ignorant as I am of what that means, this project will be a
> success!
>
> Obviously, using standards, official or de facto, is also a good thing.
> Existing things out there in this domain (rich text models) include:
>
> 1) HTML/XHTML DOM
> 2) XSL:FOP
> 3) OpenOffice.org / OASIS Open Office XML
> 4) XML using some other vocabulary
>
> Any others?
>
> You take each one of these and weigh it against a few criteria:
>
> 1) Does it allow a clean separation of content and style? Presumably Word
> is big on that and we don't want to lose that.
> 2) Is it expressive enough to represent the breadth of Word functionality
> that is important to us?
> 3) Is it easy to work with, does it lend itself to tooling, etc.?
> 4) Is it popular, widely adopted, etc., such that you might get some
> synergy with other projects?
> 5) Does it lend itself to a high-performance implementation?
> 6) Does it make the simple stuff simple while at the same time allowing
> more ambitious users to do more ambitious things?
>
> HTML by itself fails 1) and 2). Adding CSS stylesheets could remedy that,
> but it would still lack page-level features, like headers/footers, page
> numbers, or even hard or soft page breaks.
>
> FOP gives a lot more support, though it is rather complex.
>
> OpenOffice.org mixes FOP with several other standard markups like MathML,
> SVG and XLink. But it gets complex pretty quickly: a simple "hello
> world" document generates XML with the following namespaces:
>
> xmlns:office="http://openoffice.org/2000/office"
> xmlns:style="http://openoffice.org/2000/style"
> xmlns:text="http://openoffice.org/2000/text"
> xmlns:table="http://openoffice.org/2000/table"
> xmlns:draw="http://openoffice.org/2000/drawing"
> xmlns:fo="http://www.w3.org/1999/XSL/Format"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:number="http://openoffice.org/2000/datastyle"
> xmlns:svg="http://www.w3.org/2000/svg"
> xmlns:chart="http://openoffice.org/2000/chart"
> xmlns:dr3d="http://openoffice.org/2000/dr3d"
> xmlns:math="http://www.w3.org/1998/Math/MathML"
> xmlns:form="http://openoffice.org/2000/form"
> xmlns:script="http://openoffice.org/2000/script"
>
> It is something of a moving target now that OASIS is drafting a format
> standard based on it, but I think it will be an attractive and widely
> used format once it re-emerges as a standard.
>
> I'm afraid I've raised more questions than I've answered ;-) But in the
> end I really don't see anything out there that jumps out and says "I'm the
> API you want".
>
> -Rob

-- 
Andrew C. Oliver
http://www.superlinksoftware.com/poi.jsp
Custom enhancements and Commercial Implementation for Jakarta POI

http://jakarta.apache.org/poi
For Java and Excel, Got POI?
