On Wed, 09 Oct 2002 07:23:53 -0400, Addy, Jonathan wrote:

> I'd like to extract the text out of powerpoint documents from Java but
> having read the overview of POIFS I still cannot gauge whether it is
> worthwhile trying to use this technology or not. Before I investigate
> much further can anybody tell me if i'm barking up the right or wrong
> tree?
> 
> Thanks,
> Jon.
 
Okay... So let's start with a basic understanding of what POIFS is and
what it is not.  POIFS lets you read and write OLE 2 Compound Document
format.  All Office documents are written in OLE 2 Compound Document
format.  However, OLE 2 Compound Document format is in essence a "zip"
file or archive of sorts that somewhat loosely resembles the old DOS FAT
filesystem.  So a PPT is in OLE 2 Compound Document format for sure, but
the relationship is much like a bunch of "zipped" HTML files.  So yes, you
can unzip them with WinZip (POIFS in this analogy), but that still doesn't
let you parse the HTML files.
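
To make the container-versus-contents point a bit more concrete, here is
a rough sketch in plain Java (no POI involved).  It only checks whether a
file starts with the 8-byte OLE 2 Compound Document signature.  The class
and method names (Ole2Sniffer, looksLikeOle2) are made up for this
example and are not part of POI.  A "yes" tells you the file is an OLE 2
container, and nothing at all about how the PowerPoint data inside it is
laid out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a POI class.
class Ole2Sniffer {
    // The 8-byte signature found at the start of an OLE 2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes begin with the OLE 2 signature.
    static boolean looksLikeOle2(byte[] data) {
        if (data == null || data.length < OLE2_MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads the first 8 bytes of a file and checks them against the signature.
    static boolean looksLikeOle2(Path file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }
        return looksLikeOle2(header);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            boolean ole2 = looksLikeOle2(Paths.get(name));
            System.out.println(name + ": " + (ole2 ? "OLE 2 container" : "not OLE 2"));
        }
    }
}
```

Run it against a .ppt, .xls or .doc and you should get the same answer
for all three, which is exactly the point: the wrapper is shared, and the
format-specific parsing is the part still left to do.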

So eventually I plan to create an API for manipulating PPT (much like
HSSF manipulates Excel files); however, it will be after we've basically
finished HSSF and have taken a big hunk out of HDF.  This is not to say
it can't start before then, it just will require contributors with the
drive and skill necessary to do it.

The good thing about PPT format is that it is newer and more modern than
Excel and Word format, so it makes use of property sets as opposed to
being written as one huge blob in an OLE 2 CDF file, for which one has to
write tools to parse the big fat blob.  With PPT presumably you'd just
have to interpret the property set entries.  The bad news is information
on the file format has been heavily guarded from day 1 (so far as I can
tell), so we'll have to be pretty dern methodical about it.

So if you need to read/write PPT in Java, you'd definitely need POIFS,
but you'd also have some additional work to do!  And we'd be happy to
have your contributions as part of POI!

-Andy


