It looks like I was looking at the wrong HWPFDocument byte [] after all.
I have a demo HWPFDocument,
which is read from a Word file that has two GIF
images inserted and embedded.
I have been told that the bytes for the images
inside the .doc file (the HWPFDocument
I am programming with) start at
the character 0x01 (byte 01).
I have found 48 such bytes in the file and 72 such characters. This holds
even if I cast the bytes to type (char).
What is the start byte for one image's data, and where does that image's data
end?
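import java.io.File;
import java.io.FileInputStream;
import java.lang.reflect.Field;
import java.util.Arrays;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Load the Word file into an HWPFDocument.
FileInputStream input = new FileInputStream(new File("demo.doc"));
POIFSFileSystem fileSystem = HWPFDocument.verifyAndBuildPOIFS(input);
HWPFDocument document = new HWPFDocument(fileSystem);
input.close();

// Read the _mainStream byte array via reflection.
Field dataStream = document.getClass().getDeclaredField("_mainStream");
dataStream.setAccessible(true);
byte[] dataArray = (byte[]) dataStream.get(document);
System.out.println(Arrays.toString(dataArray));

// Look for 0x01 (char 01) in the text content.
byte marker = 0x01;
int found = 0;
for (int i = 0; i < dataArray.length; i++) {
    if (dataArray[i] == marker) {
        System.out.println("Found at offset " + i);
        found++; // count the match; marker itself must stay 0x01
    }
}
System.out.println(found);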
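
To check the scanning logic on its own, without Apache POI on the classpath, the loop above can be pulled into a small standalone helper. This is only a sketch: the class name `ByteScan`, the method `findAll`, and the sample bytes are all made up for illustration and are not part of POI. It returns every offset where a byte pattern occurs, so the same method can look for the single 0x01 marker or, as an alternative sanity check, for the ASCII signature `GIF89a` that opens a GIF file. The signature search assumes the GIF bytes are stored verbatim somewhere in the array being scanned, which this post has not confirmed for Word files.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Apache POI.
final class ByteScan {

    /** Returns every offset at which pattern occurs in data. */
    static List<Integer> findAll(byte[] data, byte[] pattern) {
        List<Integer> offsets = new ArrayList<>();
        if (pattern.length == 0) {
            return offsets; // nothing to search for
        }
        for (int i = 0; i + pattern.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Made-up sample bytes standing in for a document stream.
        byte[] sample = {0x00, 0x01, 'G', 'I', 'F', '8', '9', 'a', 0x01, 0x3B};

        // Offsets of the single-byte 0x01 marker.
        System.out.println(findAll(sample, new byte[] {0x01})); // prints [1, 8]

        // Offsets of the GIF89a signature (assumes verbatim GIF storage).
        byte[] gifSignature = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findAll(sample, gifSignature)); // prints [2]
    }
}
```

With the real document, `dataArray` from the snippet above would be passed in place of `sample`. Keeping the offsets in a list, rather than only printing "Found!", makes it possible to compare the marker positions against wherever the image data turns out to be.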