Alrighty! Thanks! -Matt
On Wed Feb 04 2015 at 6:23:58 PM Nick Burch <[email protected]> wrote:

> On Wed, 4 Feb 2015, Matt Bachmann wrote:
> > When I play with the TIKA jar file with a simple excel file I get
> > something like what I have below. Code I write to do the parsing pulls
> > out something similar. The data is generally correct. But, in the
> > parsing the position of cells is completely lost.
>
> That's to be expected - Tika will only return you text for cells which are
> really defined in the file. It won't generate dummy entries for "missing"
> cells which Excel optimised out of the file for being blank. This avoids
> bloating the Tika output, and keeps the Tika code much simpler
>
> > Is this possible with TIKA? I have google around and have not found
> > much. Do I have to drop down to POI to do this?
>
> You'll need to use POI if you want full control over missing rows or
> missing cells.
>
> For working with .xls files, you'd probably want something like the
> example "missing records aware" streaming xls to csv converter:
> https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/hssf/eventusermodel/examples/XLS2CSVmra.java
> For .xlsx you'll need some similar logic in a sax-based parser
>
> Or, if you have the memory, it's all very easy, as detailed on the site:
> http://poi.apache.org/spreadsheet/quick-guide.html#Iterator
>
> Nick
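
For anyone finding this thread later, here is a rough sketch of the "missing cells aware" idea Nick describes, written in plain Java with no POI or Tika calls. It assumes you have already collected each defined cell's column index and text for one row (with POI's in-memory model the index would come from something like `Cell.getColumnIndex()`). The class `SparseRowDemo` and the method `toCsvLine` are made-up names for illustration, not part of either library, and the sketch does no CSV quoting or escaping.

```java
import java.util.Map;
import java.util.TreeMap;

public class SparseRowDemo {

    // Hypothetical helper: expands sparse (columnIndex -> text) cells into
    // one CSV line with exactly `width` columns. Columns that have no entry
    // in the map are written as empty fields, so cell positions survive.
    // Note: values are written as-is, with no CSV quoting or escaping.
    public static String toCsvLine(Map<Integer, String> cells, int width) {
        StringBuilder sb = new StringBuilder();
        for (int col = 0; col < width; col++) {
            if (col > 0) {
                sb.append(',');
            }
            String text = cells.get(col);
            if (text != null) {
                sb.append(text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Only columns 0 and 3 are "really defined" in this row.
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "id");
        row.put(3, "total");
        System.out.println(toCsvLine(row, 5)); // prints "id,,,total,"
    }
}
```

The same gap-filling would be needed for whole rows that are absent from the file. The width to pad to is something you have to choose yourself, for example the largest column index seen across the sheet.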
