It looks like I was looking at the wrong HWPFDocument byte [] after all.

I have a demo HWPFDocument, read from a Word file that has two GIF
images inserted and embedded.

I have been told that, inside the doc file (the HWPFDocument
I am programming with), the bytes for each image start at
character code point 0x01, that is, byte 01.

I have found 48 such bytes in the file, and 72 such characters. This holds
even if I cast the bytes to type (char).


What is the start byte for one image's data, and where does that image's data 
end?

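import java.io.File;
import java.io.FileInputStream;
import java.lang.reflect.Field;
import java.util.Arrays;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

FileInputStream input = new FileInputStream(new File("demo.doc"));
POIFSFileSystem fileSystem = HWPFDocument.verifyAndBuildPOIFS(input);
HWPFDocument document = new HWPFDocument(fileSystem);
input.close();

// Read the non-public _mainStream field via reflection.
Field dataStream = document.getClass().getDeclaredField("_mainStream");
dataStream.setAccessible(true);
byte[] dataArray = (byte[]) dataStream.get(document);
System.out.println(Arrays.toString(dataArray));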


// 0x01 in text content.
byte marker = 0x01; // (char 01)

int found = 0;

for (int i = 0; i < dataArray.length; i++)
{
    if (dataArray[i] == marker)
    {
        found++;
        System.out.println("Found 0x01 at offset " + i);
    }
}
System.out.println("Total found: " + found);
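
The scan for 0x01 bytes can also be written as a small self-contained helper that records the offset of every match, which makes it easier to compare the hits against the 48 bytes counted earlier. This is only a minimal sketch: the class name MarkerScan, the method findMarkerOffsets and the sample array are made up for illustration and are not part of POI. It locates the marker bytes only; it does not by itself say where an image's data begins or ends.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkerScan {
    // Hypothetical helper: returns the offset of every byte equal to marker.
    static List<Integer> findMarkerOffsets(byte[] data, byte marker) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (data[i] == marker) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Stand-in data; with a real document this would be dataArray.
        byte[] sample = {0x41, 0x01, 0x42, 0x00, 0x01};
        List<Integer> offsets = findMarkerOffsets(sample, (byte) 0x01);
        System.out.println("Found " + offsets.size() + " marker bytes at " + offsets);
    }
}
```

With the byte array read from the document, the call would be findMarkerOffsets(dataArray, (byte) 0x01).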
