On Wed, 09 Oct 2002 07:23:53 -0400, Addy, Jonathan wrote:
> I'd like to extract the text out of powerpoint documents from Java but
> having read the overview of POIFS I still cannot gauge whether it is
> worthwhile trying to use this technology or not. Before I investigate
> much further can anybody tell me if i'm barking up the right or wrong
> tree?
>
> Thanks,
> Jon.

Okay... So let's start with a basic understanding of what POIFS is and what it is not.

POIFS lets you read and write the OLE 2 Compound Document format (CDF). All Office documents are written in OLE 2 Compound Document format. However, an OLE 2 Compound Document is in essence a "zip" file, an archive of sorts whose internal layout loosely resembles the old DOS FAT filesystem. So a PPT is in OLE 2 Compound Document format for sure, but the relationship is much like a bunch of "zipped" HTML files: yes, you can unzip them with WinZip (POIFS in this analogy), but that still doesn't let you parse the HTML files.
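To make the "zip file that resembles a FAT filesystem" comparison a bit more concrete, here is a rough sketch of the kind of bookkeeping POIFS takes care of for you: checking the 8-byte OLE 2 signature, reading the sector sizes out of the 512-byte header, and following a stream's sector chain through the FAT the way DOS follows cluster chains. This is NOT the POIFS API. The class and method names (CompoundDocSketch, parseHeader, followChain) are made up for illustration, the header offsets assume the usual compound document header layout (sector shift at 0x1E, mini sector shift at 0x20, FAT sector count at 0x2C, first directory sector at 0x30), and the demo builds a fake header and a toy FAT in memory rather than reading a real .ppt.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: hypothetical names, not the POIFS API.
public class CompoundDocSketch {
    // First 8 bytes of every OLE 2 Compound Document.
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // FAT marker meaning "this sector is the last one in its chain".
    static final int END_OF_CHAIN = 0xFFFFFFFE;

    static final class Header {
        final int sectorSize;           // usually 512 bytes (shift 9)
        final int miniSectorSize;       // usually 64 bytes (shift 6)
        final int fatSectorCount;       // how many sectors hold the FAT itself
        final int firstDirectorySector; // where the "directory" of streams starts

        Header(int sectorSize, int miniSectorSize, int fatSectorCount, int firstDirectorySector) {
            this.sectorSize = sectorSize;
            this.miniSectorSize = miniSectorSize;
            this.fatSectorCount = fatSectorCount;
            this.firstDirectorySector = firstDirectorySector;
        }
    }

    // Reads a few fields from the 512-byte header (all values little-endian).
    static Header parseHeader(byte[] data) {
        if (data.length < 512) {
            throw new IllegalArgumentException("header must be at least 512 bytes");
        }
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (data[i] != SIGNATURE[i]) {
                throw new IllegalArgumentException("not an OLE 2 Compound Document");
            }
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = buf.getShort(0x1E) & 0xFFFF;
        int miniSectorShift = buf.getShort(0x20) & 0xFFFF;
        int fatSectors = buf.getInt(0x2C);
        int firstDir = buf.getInt(0x30);
        return new Header(1 << sectorShift, 1 << miniSectorShift, fatSectors, firstDir);
    }

    // Follows a chain of sector numbers through the FAT, like DOS cluster chains.
    static List<Integer> followChain(int[] fat, int start) {
        List<Integer> chain = new ArrayList<>();
        int current = start;
        while (current != END_OF_CHAIN) {
            if (current < 0 || current >= fat.length) {
                throw new IllegalStateException("bad sector index " + current);
            }
            if (chain.size() >= fat.length) {
                throw new IllegalStateException("cycle in FAT chain");
            }
            chain.add(current);
            current = fat[current];
        }
        return chain;
    }

    public static void main(String[] args) {
        // Fake header built in memory instead of reading a real .ppt file.
        ByteBuffer hdr = ByteBuffer.allocate(512).order(ByteOrder.LITTLE_ENDIAN);
        hdr.put(SIGNATURE);
        hdr.putShort(0x1E, (short) 9); // 2^9 = 512-byte sectors
        hdr.putShort(0x20, (short) 6); // 2^6 = 64-byte mini sectors
        hdr.putInt(0x2C, 1);           // one FAT sector
        hdr.putInt(0x30, 1);           // directory starts at sector 1
        Header h = parseHeader(hdr.array());
        System.out.println("sector size " + h.sectorSize + ", directory at sector " + h.firstDirectorySector);

        // Toy FAT: a stream starting at sector 0 continues 0 -> 3 -> 2, then ends.
        int[] fat = { 3, END_OF_CHAIN, END_OF_CHAIN, 2 };
        System.out.println(followChain(fat, 0)); // [0, 3, 2]
    }
}
```

Even with all of that working, what you end up with is just the raw bytes of each stream; interpreting those bytes is the "parse the HTML" half of the analogy.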
So eventually I plan to create an API for manipulating PPT (much like HSSF manipulates Excel files); however, that will come after we've basically finished HSSF and have taken a big hunk out of HDF. This is not to say it can't start before then; it will just require contributors with the drive and skill necessary to do it.

The good thing about the PPT format is that it is newer and more modern than the Excel and Word formats, so it makes use of property sets, as opposed to being written as one huge blob in an OLE 2 CDF file that you have to write tools to parse. With PPT you'd presumably just have to interpret the property set entries. The bad news is that information on the file format has been heavily guarded from day 1 (so far as I can tell), so we'll have to be pretty dern methodical about it.

So if you need to read/write PPT in Java, you'd definitely need POIFS, but you'd also have some additional work to do! And we'd be happy to have your contributions as part of POI!

-Andy
