Thanks Nick,

We want to use Tika because it supports many document formats, not just xls or doc like POI. I think streamed parsing also makes Tika a lot faster and more efficient than POI, even for large documents of 15 MB or more.

I understand that Tika uses POI under the covers to parse Excel. So is there some way to tell Tika (and in turn POI) to follow a Missing Cell Policy? This would help produce a text document in a very readable format when cells are missing.

Any direction is really appreciated.

-Adish

-----Original Message-----
From: Nick Burch [mailto:[email protected]]
Sent: Friday, January 27, 2012 7:04 AM
To: '[email protected]'
Subject: RE: Excel Parser - Blank Cell

On Thu, 26 Jan 2012, Gangwal, Adish (IS Consultant) wrote:
> When I parse the excel which has an empty cell, it doesn't create a
> extra tab character.
>
> If there are three cells of which middle one is empty, it skips the
> middle cell and only outputs 1st and 3rd cell with a tab

Tika itself doesn't generate tab characters, it generates xhtml table elements. It's the text content handler that does tabs.

In general though, Tika will generate the text that is present. If you're trying to generate a CSV or similar, and want full control over what shows up, missing cells etc, then I'd suggest you look at using Apache POI directly.

Nick
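
To make the requested behaviour concrete, here is a minimal sketch of what "one tab per column, even for a missing cell" could look like. It is plain Java and does not call Tika or POI: the class `SparseRowText`, the method `toTabbedLine`, and the idea of passing the row as a map from column index to text are all assumptions made for illustration. When reading the sheet with POI directly, the cell text would presumably come from asking each column index in turn via `Row.getCell(int, MissingCellPolicy)`, for example with the `CREATE_NULL_AS_BLANK` policy.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of Tika or POI. */
public class SparseRowText {

    /**
     * Joins a sparse row into one tab-separated line, emitting an empty
     * field for every column index that has no cell.
     *
     * @param cellsByColumn text of the cells that exist, keyed by 0-based column index
     * @param columnCount   number of columns the output line should cover
     */
    public static String toTabbedLine(Map<Integer, String> cellsByColumn, int columnCount) {
        StringBuilder line = new StringBuilder();
        for (int col = 0; col < columnCount; col++) {
            if (col > 0) {
                line.append('\t');
            }
            // A missing cell contributes an empty field, so later columns stay aligned.
            line.append(cellsByColumn.getOrDefault(col, ""));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "first");
        row.put(2, "third"); // column 1 has no cell
        // Show the tabs visibly: prints first<TAB><TAB>third
        System.out.println(toTabbedLine(row, 3).replace("\t", "<TAB>"));
    }
}
```

For the three-cell case described in the quoted message, the middle column then yields two consecutive tabs instead of being skipped, which keeps the first and third values in their own columns.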
