> Anybody want to continue this discussion?
Yes of course!

The best thing would be to throw some code and have people play around a bit with it. 

In terms of how to fit it into poi ... 

Well, in general, to read, the first place would be to hook it into RecordFactory, and 
branch off from there. 

In terms of high level access, we try to keep it mapped to excel... is an image linked 
to a cell, or to a sheet?? we have similar links from the high level objects.....

Writing may be more difficult,  but in general, every record serializes itself, so i 
dont think it will be impossible to have a custom serialisation process..... 

just some thought, probably needs more playing around... 

Regards
-
Avik


  




   -------Original Message-------
   > From: Michael Zalewski <[EMAIL PROTECTED]>
   > Subject: RE: reading images out of xls files (and transforming them?)
   > Sent: 25 Oct 2003 09:43:32
   >
   >  I have worked out the details of how this works. But I am not sure how to
   >  fit it into HSSF.
   >  
   >  Images in Excel spreadsheets can be of two types. The kind with 'Insert
   >  Picture', and the kind you get with 'Insert Object'. The 'Insert Picture'
   >  ones are the easiest to handle. They work like this:
   >  
   >  There is a record type called MSODRAWINGGROUP (type 0xebh, it's not
   >  implemented in HSSF). It contains a hierarchy of drawing records, used by
   >  the drawing layer. Inside this hierarchy, you will find something called a
   >  BLIP Entry record (it's not a BIFF record - it's a piece of the
   >  MSODRAWINGGROUP structure).
   >  
   >  IMDATA type 07fh is an obsolete record type (I think). You might be able to
   >  get it when you read an older version of Excel into Excel 97. When you use
   >  Excel 97 and above, Insert | Picture gives you the MSODRAWINGGROUP record I
   >  describe below.
   >  
   >  Drawing records come in two types - containers and atoms. Containers are
   >  like file directories - they contain one or more child drawing records, some
   >  of which can also be containers. Atoms are like files, or leafs of a tree.
   >  
   >  The layout of a drawing record is like this: There is an eight byte fixed
   >  length header, and a variable length detail part. The length of the variable
   >  length part is described in the header. The header has 4 fields
   >  
   >  The first two bytes contain 2 fields - the instance variable and the record
   >  version. But the instance variable is 12 bits (1 1/2 bytes), and the record
   >  version is the remaining 4 bytes. The interpretation of instance data
   >  depends on the record type. Often it is zero. The version is always the same
   >  for the same record type in Excel. I suspect other Office products may have
   >  different versions, or perhaps later versions of Excel might have higher
   >  versions of some record types. The version is always 0xfh for a container.
   >  Anything else tells you the record is an atom.
   >  
   >  Bytes 3 - 4 are the record ID, which tells you the type of record. Most of
   >  the records have a record ID like F003h. The record you are interested in
   >  will have a record ID of F007h (BLIP Entry - I will explain later) In the
   >  case of a BLIP container, the instance data is the number of drawings in the
   >  worksheet.
   >  
   >  Bytes 5 - 8 are the length of the variable part. Container records can have
   >  nested child records in the variable length part. Atoms might have a
   >  zero-length variable part.
   >  
   >  So if a container had two atoms, and each atom had no variable length data,
   >  there would be 3 drawing group records. The length field in the container
   >  would be 16 (because the two atoms are both 8 bytes long). And the length
   >  field in each atom would be 0. The total length of a drawing record would be
   >  the length field + 8.
   >  
   >  BLIP Entry Atoms are the parts that contain the binary image. The version in
   >  Excel is 6, and the instance data is a sequentially assigned number called
   >  the BLIP ID. (I didn't make these terms up). Other drawing records will
   >  refer to the image by the value of the BLIP ID.
   >  
   >  The image is usually in PNG format. You can tell the format by the first
   >  byte of the atom payload (the variable length part), which is 0 for a
   >  deleted entry, 5 for JPEG, 6 for PNG, and 7 for bitmaps. (But most images
   >  get converted to PNG, even if you insert them as JPG or BMP). If the first
   >  byte is 0x6h, you should see a marker byte with a value of 0xff at offset
   >  60, and you should find the PNG image beginning at offset 61. (I'm skipping
   >  a lot of detail here. I'm not even sure of the correct interpretation of all
   >  the bytes in between.) I think you will always find something at offset 36
   >  that looks like an atom, with a record type of f01e, and instance data 'n'
   >  (06eh). I think that means that the PNG is not compressed.
   >  
   >  Now for the fun part. The MSODRAWINGGROUP record I described above can be
   >  huge. In theory, over a gigbabyte. But Biff records can only be about 8K. So
   >  what if your image is too large to fit?
   >  
   >  You would expect that Excel would store the overflow in CONTINUE records,
   >  like it does for other records which get to be longer than the 8K limit. But
   >  apparently, that is not true for drawing records, which get stored as
   >  contiguous MSODRAWINGGROUP records.
   >  
   >  So HSSF wants to split this nested structure that I described above into
   >  discrete sequential units, and it usually splits right in the middle of an
   >  atom. That makes the implementation hard in HSSF (otherwise, I have most of
   >  a patch completed for inserting pictures. But the patch is *huge* -- 6
   >  modified HSSF classes so far, 6 new HSSF classes,  45 new classes to
   >  represent various atoms, containers, and other structures, and some 30 odd
   >  test cases.
   >  
   >  I am stuck on how to interface this patch with HSSF. If anyone has any
   >  thoughts, I would love to hear. Frankly, I thought someone else would have
   >  provided a solution by now - I thought at least two people were working on
   >  it. Anybody want to continue this discussion?
   >  
   >  Michael Zalewski
   >  
   >  -----Original Message-----
   >  From: Steve Widmar [mailto:[EMAIL PROTECTED]
   >  Sent: Friday, October 24, 2003 2:09 AM
   >  To: [EMAIL PROTECTED]
   >  Subject: reading images out of xls files (and transforming them?)
   >  
   >  Hi -
   >  
   >  I have been asked to support a requirement to read images included inside an
   >  .xls binary and publish them online (converted to gif|png)
   >  (plus a whole bunch of other stuff I *can* figger out how to do myself with
   >  Poi, thanks to the Poi community...).
   >  
   >  It doesn't look like this functionality is supported in  2.0 pre3.
   >  
   >  I searched the javadocs (esp. HSSFCell and ..record.* - conspicuous gap
   >  betw. HideObjectRecord and IndexRecord where IMDATA support s/b?!), the
   >  other online dox and archives ...
   >  
   >  I found a lot of threads on writing images, with the general recommendation
   >  being using a template etc.etc. but nothing about reading and nothing
   >  indicating it is supported.
   >  
   >  I found this bit of inspiration from Andy
   >  http://www.mail-archive.com/[EMAIL PROTECTED]/msg02255.html, but
   >  nothing else and I suspect his long-term estimates are like mine (aggressive
   >  and assuming nothing else comes up).
   >  
   >  I haven't yet tucked into the code analysis - but I did see where the patch
   >  to read embedded images would be a beefy effort
   >  (http://www.mail-archive.com/[EMAIL PROTECTED]/msg02035.html).
   >  
   >  If anyone out there wants to give me a nudge in the right direction
   >  (like elaboratiing a bit on "at least 10 classes, includes 3 new 
substantial
   >  ones and possibly an entire new top level port"), I wouldn't mind at all 
to
   >  scratch this itch myself... (and I even gots a copy of MS Excel 97
   >  Developer's Kit).
   >  
   >  Thanks -
   >  Steve
   >  
   >  
   >  ---------------------------------------------------------------------
   >  To unsubscribe, e-mail: [EMAIL PROTECTED]
   >  For additional commands, e-mail: [EMAIL PROTECTED]
   -------Original Message-------


Reply via email to