Liem Do <liem <at> waterware.com> writes:

> Is there any way of reading MS Outlook files (.msg)? I basically just want
> to be able to extract the To, From, and other header fields. Can this be
> done with POI? Does anybody have any samples of this, or specific details
> on which classes and methods are used to do this?

The good news is: .MSG files are OLE Doc files (also called OLE Structured 
Storage). So you can actually read them with POI, using POIFS (not HSSF, 
which parses Excel data structures).
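
A quick way to confirm that a given .MSG file really is an OLE Doc file is to 
check the 8-byte signature that every OLE2 compound file starts with. The 
sketch below is plain Java, not POI code; the class name OleSignature and the 
method looksLikeOle are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of POI.
final class OleSignature {
    // The 8-byte signature found at offset 0 of an OLE2 compound file.
    static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the buffer starts with the OLE2 signature.
    static boolean looksLikeOle(byte[] header) {
        return header.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }
}
```

Pass it the first bytes of the file; an Excel .xls file and an Outlook .MSG 
file should both pass this check, since both use the same container format.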

The bad news is: POI does not contain any support for the particular structure 
of a .MSG file.

The further bad news is: .MSG files are a different flavor of OLE Structured 
Storage than Excel files. Anyone who knows OLE Structured Storage will 
understand this: the mini-stream in an Excel file has 64-byte blocks, while 
the mini-stream in a .MSG file has 32-byte blocks. Unfortunately, POI has the 
mini-stream block size hardcoded at 64 bytes and ignores the OLE Doc header, 
which states that the mini-stream uses 32-byte blocks.

You can prove the above by dumping an Excel file and looking at its header. 
At offset 0x20 you will find 0x06, meaning that small data streams will use 
64-byte blocks in the mini-stream (64 = 2^6). If you dump an Outlook MSG file 
and look at the same byte, you will find 0x05, meaning that small data 
streams will use 32-byte blocks in the mini-stream (32 = 2^5).
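
If you would rather not read a hex dump, the same check can be sketched in a 
few lines of plain Java. This is not POI code: the class MiniBlockProbe and 
its methods are hypothetical names, and the sketch assumes the value at offset 
0x20 is a 2-byte little-endian exponent, as in the compound file header layout.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, not part of POI.
final class MiniBlockProbe {
    // Header offset of the mini-stream block size exponent.
    static final int MINI_SHIFT_OFFSET = 0x20;

    // Reads the exponent (assumed: unsigned 16-bit, little-endian).
    static int miniBlockShift(byte[] header) {
        if (header.length < MINI_SHIFT_OFFSET + 2) {
            throw new IllegalArgumentException("header too short");
        }
        int lo = header[MINI_SHIFT_OFFSET] & 0xFF;
        int hi = header[MINI_SHIFT_OFFSET + 1] & 0xFF;
        return lo | (hi << 8);
    }

    // Block size in bytes is 2 raised to the exponent.
    static int miniBlockSize(byte[] header) {
        return 1 << miniBlockShift(header);
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[512];
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            in.readFully(header); // fails if the file is shorter than 512 bytes
        }
        System.out.println("mini-stream block size: " + miniBlockSize(header));
    }
}
```

Run it once against an .xls file and once against a .MSG file and compare the 
two numbers it prints.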

See org.apache.poi.poifs.storage.SmallDocumentBlock, which has a field

private static final int _block_size = 64;

Hardcoding this is wrong. The block size is defined in the header at byte 
offset 0x20 (the 33rd byte of the header), stored as a power-of-two exponent.
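
To see why the constant matters, here is a minimal sketch of how a mini-stream 
block is located once the exponent from the header is known. It is not how POI 
is written; MiniStreamOffsets, blockOffset and readBlock are hypothetical 
names, and the mini-stream is assumed to be already loaded into a byte array.

```java
import java.util.Arrays;

// Hypothetical helper, not part of POI.
final class MiniStreamOffsets {
    // Byte offset of block number `index` for a block size of 2^shift.
    static long blockOffset(int index, int shift) {
        return (long) index << shift;
    }

    // Copies one block out of an in-memory mini-stream.
    static byte[] readBlock(byte[] miniStream, int index, int shift) {
        int start = Math.toIntExact(blockOffset(index, shift));
        int end = start + (1 << shift);
        if (end > miniStream.length) {
            throw new IllegalArgumentException(
                "block " + index + " is past the end of the mini-stream");
        }
        return Arrays.copyOfRange(miniStream, start, end);
    }
}
```

With an exponent of 5, block 3 starts at byte 96; with a hardcoded exponent of 
6, the same block number is looked up at byte 192. A reader that assumes the 
wrong size therefore returns the wrong bytes for every block after the first.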

The further further bad news is: You want to read properties like To and 
From, which are stored in a truly bizarre way. Each recipient is listed in 
its own storage stream. So a .MSG file contains many little storage streams, 
each one holding information about a single property such as a recipient or 
sender. Since these streams are tiny, they go into the mini-stream, which POI 
does not parse properly.



