Hi Nick,

I tested whether "Save As" fixes the issue and found some very
interesting results.
For example, take an Excel file A, which POI can't read at a string ABC.
Open file A in Excel and save it as file B. Then a POI error occurs
at a different string, DEF, when reading file B.
Open file B in Excel and save it as file C. Then a POI error occurs
at a different string, GHI, when reading file C.
Open file C in Excel and save it as file D. Then a POI error occurs
at a different string, JKL, when reading file D.
Open file A again in Excel and save it as file B2. Then a POI error
occurs at the same string DEF when reading file B2.
Open file B again in Excel and save it as file C2. Then a POI error
occurs at the same string GHI when reading file C2.
Open file C again in Excel and save it as file D2. Then a POI error
occurs at the same string JKL when reading file D2.

So, "Save As" creates a file which POI can't read at a different
string. The pattern is reproducible, but I have not been able to
create such a file from scratch. My problematic Excel file contains
about 50 sheets, and each sheet has 300-1000 non-empty cells. Maybe
this issue occurs only for large files.

The only workaround is to use a debugger to find the strings that cause
the error, re-enter those strings in Excel, and save. There are many
strings that cause errors, so I have to repeat the debug-and-edit cycle
until POI can successfully read the file. Another very interesting fact
is that, after I "clean" the problematic file until POI can read it, if
I open the cleaned file in Excel and do "Save As," POI can't read the
newly saved file!

Of course, all the files I described above can be opened in Excel, and
the text and phonetic data are not corrupted at all. (I checked the
phonetic data with the =PHONETIC(cell) function.)
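
For reference, the byte shortfall from my earlier message quoted below
(18 double-byte characters expected, only 29 bytes left in
RecordInputStream) can be checked with a little arithmetic. This is
only a sketch: the class and method names are made up for illustration
and are not part of POI, and it assumes 2 bytes per phonetic character.

```java
// Hypothetical helper, not part of POI: checks how many bytes a run of
// double-byte characters needs against the bytes left in a record.
public class PhoneticRunCheck {
    // Assumption: each phonetic character is stored as 2 bytes.
    static final int BYTES_PER_CHAR = 2;

    // Bytes required to hold charCount double-byte characters.
    static int bytesNeeded(int charCount) {
        return charCount * BYTES_PER_CHAR;
    }

    // Number of complete characters that fit in the remaining bytes.
    static int wholeCharsAvailable(int bytesRemaining) {
        return bytesRemaining / BYTES_PER_CHAR;
    }

    // Bytes left over after the last complete character.
    static int leftoverBytes(int bytesRemaining) {
        return bytesRemaining % BYTES_PER_CHAR;
    }

    // Bytes missing from the record (0 if there are enough).
    static int shortfall(int charCount, int bytesRemaining) {
        return Math.max(0, bytesNeeded(charCount) - bytesRemaining);
    }

    public static void main(String[] args) {
        int declaredChars = 18;  // characters POI tries to read
        int bytesRemaining = 29; // bytes left in the record

        System.out.println("needed bytes:   " + bytesNeeded(declaredChars));
        System.out.println("whole chars:    " + wholeCharsAvailable(bytesRemaining));
        System.out.println("leftover bytes: " + leftoverBytes(bytesRemaining));
        System.out.println("shortfall:      " + shortfall(declaredChars, bytesRemaining));
    }
}
```

With the numbers from the failing record, 36 bytes are needed but only
29 are present (14 characters + 1 byte), so the read is 7 bytes short.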

apptaro


On Mon, Feb 14, 2011 at 9:04 PM, Nick Burch <[email protected]> wrote:
> On Thu, 10 Feb 2011, Taro App wrote:
>>
>> Oops, there's a comment that isContinueNext "Should never be called
>> before end of current record" so the code must be correct.
>
> Or at least correct to the best of our knowledge...
>
>> I'm not sure how it happens, but the exception is raised when POI tries to
>> read phonetic text in ExtRst of UnicodeString. POI tries to read 18
>> double-byte characters, but RecordInputStream has only 29 bytes (14
>> characters + 1 byte.) If I check the Excel file with binary editor,
>> everything up to here seems correct.
>
> Hmm, the phonetic text stuff went in much later to the file format. It's not
> impossible that the team working on it had different ideas about continue
> records to the team who did the original work
>
>> When I opened the Excel file and deleted sheets which do not contain the
>> problematic string, the problem disappeared. I tried deleting different
>> sheets, and the problem sometimes disappeared and sometimes did not. I also
>> tried re-entering the problematic string by cutting and pasting in Excel, and
>> the problem always disappeared. My best guess is that Excel sometimes saves
>> corrupted data. Because the corruption occurs in phonetic text, which is not
>> usually visible to users, it is not very obvious.
>
> If you open the file in Excel and do "Save As", does it fix the issue, or is
> it only changing text / removing sheets that fixes it?
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>
