We recently received a customer's Excel workbook and attempted to read it 
with Apache POI 3.7 beta 1.  POI is unable to open the workbook and throws an 
error on the initial open every time.  I have tried several different ways of 
opening the workbook, none of which worked.

Unfortunately I am unable to provide the workbook as it contains sensitive 
data.  Obfuscating the data is also not an option as saving the workbook fixes 
the issue.  I hope this post can at least add to the knowledge base.

Facts:

1.       File is an Excel 2003 (.xls and detected as 2003 version) workbook 
with one sheet.

2.       Excel 2007 opens it without an issue.

3.       Metadata in the spreadsheet indicates it was created by Crystal 
Decisions.  It contains an Author: tag "Crystal Decisions" and a Comments: tag 
"Powered by Crystal".  It also contains 7 custom properties named 
"Business Objects Context Information1", "...2", "...3" and so on.  Each of 
these custom properties has a string value that looks like random characters.  
All of the values are 256 characters long except the last one, which is 8 
characters long.

4.       File came from the Western Europe region and does appear to contain 
double-byte or extended ASCII characters.

Observations:

1.       Opening the file and resaving it immediately fixes the issue.  This 
also results in the file reducing in size from 26,595KB to 20,208KB.  Roughly a 
6MB drop in size by simply resaving it.

2.       Looking at the file in a text editor or hex viewer after it has been 
resaved shows heavy modification.  The end of the unmodified file was a line 
containing the Crystal metadata.  The resaved/fixed file contains several lines 
indicating normal Excel 2003 metadata after the Crystal metadata.

3.       HSSF in user mode, either directly creating an HSSFWorkbook or using 
the generic workbook factory, results in the following error:

Initialisation of record 0xFF left 6 bytes remaining still to be read.
null
org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:156)
org.apache.poi.hssf.record.RecordFactoryInputStream.nextRecord(RecordFactoryInputStream.java:216)
org.apache.poi.hssf.record.RecordFactory.createRecords(RecordFactory.java:439)
org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:278)
org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:203)
org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:319)
org.apache.poi.hssf.usermodel.HSSFWorkbook.<init>(HSSFWorkbook.java:300)
org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:60)


4.       HSSF in event mode using its default constructor results in the 
following error (and no, this is not a path error):

Exception in thread "main" java.io.FileNotFoundException: no such entry: "Workbook"
      at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:278)
      at org.apache.poi.poifs.filesystem.DirectoryNode.createDocumentInputStream(DirectoryNode.java:128)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:63)
      at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processWorkbookEvents(HSSFEventFactory.java:53)
      at XLS2CSVmra.process(XLS2CSVmra.java:126)
      at XLS2CSVmra.main(XLS2CSVmra.java:323)


Is it possible that this is an older version of the BIFF8 format?  I'm unsure 
how to identify the file as such.  The workbook factory and the methods it 
uses (POIFSFileSystem.hasPOIFSHeader()) seem to indicate it has a valid Excel 
header, but nothing can then read the file completely.
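
In case it helps anyone checking a similar file, here is a minimal Java sketch 
of two low-level checks that need nothing beyond the JDK.  The first compares 
the leading 8 bytes of the file against the OLE2 container signature, which as 
far as I can tell is essentially all that POIFSFileSystem.hasPOIFSHeader() 
looks at, so passing it says nothing about the streams inside.  The second 
decodes the version field of a BOF record (sid 0x0809), where 0x0600 means 
BIFF8 and 0x0500 means BIFF5/BIFF7.  The names XlsSniffer, hasOle2Signature 
and describeBof are made up for illustration.  describeBof assumes the first 
bytes of the workbook stream have already been extracted by some other means 
(a hex viewer, for example); it does not parse the OLE2 container itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of POI.
final class XlsSniffer {
    // OLE2 compound document signature: the first 8 bytes of an .xls container.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // True if the given bytes start with the OLE2 signature.
    static boolean hasOle2Signature(byte[] fileStart) {
        return fileStart.length >= 8
            && Arrays.equals(Arrays.copyOf(fileStart, 8), OLE2_SIGNATURE);
    }

    // Reads an unsigned 16-bit little-endian value.
    private static int u16le(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    // Assumption: streamStart holds the first bytes of the workbook stream,
    // i.e. a BOF record laid out as sid (2 bytes), length (2), version (2).
    static String describeBof(byte[] streamStart) {
        if (streamStart.length < 6) {
            return "too short";
        }
        int sid = u16le(streamStart, 0);
        if (sid != 0x0809) {
            return "not a BIFF5-8 BOF record (sid=0x" + Integer.toHexString(sid) + ")";
        }
        int version = u16le(streamStart, 4);
        switch (version) {
            case 0x0600: return "BIFF8";
            case 0x0500: return "BIFF5/BIFF7";
            default:     return "unknown BIFF version 0x" + Integer.toHexString(version);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsSniffer <file.xls>");
            return;
        }
        byte[] head = new byte[8];
        try (DataInputStream in =
                 new DataInputStream(Files.newInputStream(Paths.get(args[0])))) {
            in.readFully(head); // throws EOFException if the file is under 8 bytes
        }
        System.out.println("OLE2 signature present: " + hasOle2Signature(head));
    }
}
```

Run with the file path as the only argument, this reports just the container 
signature.  A file can pass that check and still hold a workbook stream that 
is not BIFF8, which would be consistent with the header looking valid while 
the records fail to parse.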

Thanks again for any help,
Matt
