Nick Burch wrote:
> On Tue, 9 Jan 2007, Joerg Hohwiller wrote:
>> Besides, I used the official POI release, which is very old. I did NOT
>> try the HEAD from svn.
> 
> You should probably try with the svn head, you will generally have more
> luck with HWPF and HSLF from there.
Okay, thanks for the tip.
> 
>> I could NOT even open most of the documents. The constructor threw an
>> exception, something about an illegal file format or a bad magic number.
> 
> I use HSLF for a web spider that tries lots of random documents, and
> it's OK on almost all of them, so it's odd that you're having such problems.
> 
>>> (Normally you want to catch CorruptPowerPointFileException and
>>> EncryptedPowerPointFileException, and skip over them, and catch
>>> ArrayIndexOutOfBoundsException, and report bugs for those)
>>
>> If an ArrayIndexOutOfBoundsException is thrown by a method to which the
>> user did not pass an index, the implementation looks like a hack to me.
>> The same applies to NullPointerExceptions.
> 
> These two are caused by powerpoint files containing things that we
> didn't know they might, and which our test documents don't. If you
> report bugs for them, and include the problem document, we can try and
> figure out which of our assumptions on the file format are wrong, and
> work to fix them.
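A minimal sketch of the skip-or-report policy described above, in Java. The two exception classes are unchecked stand-ins named after the HSLF exceptions, so the sketch compiles without POI on the classpath; `DocumentOpener`, `Outcome` and `triage` are hypothetical names. A real spider would call the POI constructor and catch the real classes instead.

```java
import java.util.ArrayList;
import java.util.List;

public class SpiderTriage {

    // Hypothetical stand-ins named after the HSLF exceptions, so this sketch
    // compiles without POI; catch the real POI classes in practice.
    static class CorruptPowerPointFileException extends RuntimeException {
        CorruptPowerPointFileException(String msg) { super(msg); }
    }

    static class EncryptedPowerPointFileException extends RuntimeException {
        EncryptedPowerPointFileException(String msg) { super(msg); }
    }

    // Hypothetical hook: in a real spider this would construct the slideshow.
    interface DocumentOpener {
        void open(String name);
    }

    enum Outcome { OK, SKIPPED, BUG }

    static Outcome triage(String name, DocumentOpener opener, List<String> bugReports) {
        try {
            opener.open(name);
            return Outcome.OK;
        } catch (CorruptPowerPointFileException | EncryptedPowerPointFileException e) {
            // Expected for damaged or password-protected files: just skip them.
            return Outcome.SKIPPED;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Hints at a wrong file-format assumption in the library: remember
            // the document so it can be attached to a bug report.
            bugReports.add(name + ": " + e);
            return Outcome.BUG;
        }
    }

    public static void main(String[] args) {
        List<String> bugs = new ArrayList<>();
        // Fake opener that fails according to the (made-up) file name.
        DocumentOpener opener = name -> {
            if (name.startsWith("corrupt")) throw new CorruptPowerPointFileException(name);
            if (name.startsWith("locked")) throw new EncryptedPowerPointFileException(name);
            if (name.startsWith("odd")) throw new ArrayIndexOutOfBoundsException(42);
        };
        for (String doc : new String[] {"ok.ppt", "corrupt.ppt", "locked.ppt", "odd.ppt"}) {
            System.out.println(doc + " -> " + triage(doc, opener, bugs));
        }
        System.out.println("bug reports: " + bugs);
    }
}
```

Only the ArrayIndexOutOfBoundsException case ends up in the bug list; corrupt and encrypted files are counted as skipped.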
I already debugged into it. It occurred when an UnknownRecord was created.
It is generally not a good idea to assume anything about data you don't
even know. In such situations you should always check indices and lengths
before accessing or copying arrays.
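A minimal sketch of the kind of check meant here, assuming a record whose body length is read from its (possibly damaged) header. `readBody` and `CorruptRecordException` are hypothetical names, not HSLF code: the idea is to validate offset and length against the buffer before `System.arraycopy`, and to report a malformed record instead of letting an ArrayIndexOutOfBoundsException escape.

```java
import java.util.Arrays;

public class RecordBounds {

    // Hypothetical checked exception: reports a malformed record instead of
    // letting an ArrayIndexOutOfBoundsException escape from library code.
    static class CorruptRecordException extends Exception {
        CorruptRecordException(String msg) { super(msg); }
    }

    // Copies `length` bytes starting at `offset`, validating the values taken
    // from the record header against the real buffer size first.
    static byte[] readBody(byte[] data, int offset, int length) throws CorruptRecordException {
        // Written as `offset > data.length - length` so that a huge length
        // cannot overflow int the way `offset + length > data.length` could.
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new CorruptRecordException("record at offset " + offset + " claims "
                    + length + " bytes, buffer holds " + data.length);
        }
        byte[] body = new byte[length];
        System.arraycopy(data, offset, body, 0, length);
        return body;
    }

    public static void main(String[] args) throws Exception {
        byte[] file = {1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(readBody(file, 2, 3))); // [3, 4, 5]
        try {
            readBody(file, 4, 10); // length taken from a damaged header
        } catch (CorruptRecordException e) {
            System.out.println("skipped: " + e.getMessage());
        }
    }
}
```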
Besides, I have seen printStackTrace() calls, which are generally a bad
idea in a library. Please use nested exceptions for situations like this.
I hope this has been fixed in the 2.5 years since that release...
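A minimal sketch of exception chaining instead of printStackTrace(), assuming a hypothetical `SlideShowLoadException` and `readHeader` loader (not POI API). The low-level IOException is passed on as the cause, so the caller keeps the full stack trace and decides whether to log, skip or abort.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class NestedExceptions {

    // Hypothetical library exception; the (message, cause) constructor keeps
    // the original failure attached instead of printing it and carrying on.
    static class SlideShowLoadException extends Exception {
        SlideShowLoadException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Hypothetical loader: reads a fixed 4-byte header from the stream.
    static byte[] readHeader(InputStream in) throws SlideShowLoadException {
        byte[] header = new byte[4];
        try {
            // readFully throws EOFException if the stream ends too early.
            new DataInputStream(in).readFully(header);
            return header;
        } catch (IOException e) {
            // Instead of e.printStackTrace(): wrap and rethrow.
            throw new SlideShowLoadException("cannot read document header", e);
        }
    }

    public static void main(String[] args) {
        try {
            readHeader(new ByteArrayInputStream(new byte[] {1, 2}));
        } catch (SlideShowLoadException e) {
            // The caller sees both the library message and the root cause.
            System.out.println(e.getMessage() + ", caused by " + e.getCause());
        }
    }
}
```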
> 
>> My problem is that I extract many parts of the text twice from the file.
>> It seems to me that they really are in there twice, even though the
>> duplicates are not visible to the PowerPoint user.
> 
> Yup, that's to be expected on quicksaved files.
> QuickButCruddyTextExtractor will do something similar.
Okay.
> 
> Your only option if you want to avoid that is to implement all the
> PersistPtr stuff, then parse SlideListWithTexts, and DoTheRightThing(tm)
> with it all. At which point, you've re-implemented most of hslf....
Those are useful hints. I will have a look at that and also compare this
approach with using the latest trunk. Thanks!
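A minimal sketch of the central PersistPtr idea, assuming the persist directories have already been parsed into maps from persist ID to file offset and are listed oldest save first (parsing the records is not shown, and `effectiveDirectory`/`staleOffsets` are hypothetical helpers). It assumes that after a quicksave the newest entry for an ID wins, so offsets that only older directories point at belong to superseded copies, which would explain the duplicated text.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PersistMerge {

    // Each map is one parsed persist directory (persist ID -> file offset),
    // listed from the oldest save to the newest quicksave.
    static Map<Integer, Integer> effectiveDirectory(List<Map<Integer, Integer>> directories) {
        Map<Integer, Integer> merged = new TreeMap<>();
        for (Map<Integer, Integer> dir : directories) {
            // A later save that re-points an ID replaces the older offset.
            merged.putAll(dir);
        }
        return merged;
    }

    // Offsets no longer referenced by any ID belong to superseded copies,
    // e.g. stale text that shows up twice during naive extraction.
    static Set<Integer> staleOffsets(List<Map<Integer, Integer>> directories) {
        Set<Integer> all = new HashSet<>();
        for (Map<Integer, Integer> dir : directories) {
            all.addAll(dir.values());
        }
        all.removeAll(effectiveDirectory(directories).values());
        return all;
    }

    public static void main(String[] args) {
        // Made-up offsets: the quicksave rewrote persist object 2 at offset 900.
        List<Map<Integer, Integer>> dirs = List.of(Map.of(1, 100, 2, 400), Map.of(2, 900));
        System.out.println(effectiveDirectory(dirs)); // {1=100, 2=900}
        System.out.println(staleOffsets(dirs));       // [400]
    }
}
```

Text records would then only be extracted from containers reachable through the merged directory, skipping the stale copies.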
> 
> Nick
Regards
  Jörg

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
