Thanks, Nick.

We want to use Tika because it supports many document formats, not just xls 
or doc as POI does.
I think streamed parsing also makes Tika faster and more efficient than POI, 
even when parsing large docs of 15 MB or more.

I understand that Tika uses POI under the covers to parse Excel. So is there 
some way to tell Tika (and in turn POI) to follow a missing cell policy?

This would help produce a text document in a readable format when cells are 
missing.

Any direction is really appreciated.

-Adish



-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: Friday, January 27, 2012 7:04 AM
To: '[email protected]'
Subject: RE: Excel Parser - Blank Cell

On Thu, 26 Jan 2012, Gangwal, Adish (IS Consultant) wrote:
> When I parse an Excel file that has an empty cell, it doesn't create an 
> extra tab character.
>
> If there are three cells of which middle one is empty, it skips the 
> middle cell and only outputs 1st and 3rd cell with a tab

Tika itself doesn't generate tab characters; it generates XHTML table elements. 
It's the text content handler that turns those into tabs.

In general though, Tika will generate the text that is present.

If you're trying to generate a CSV or similar, and want full control over what 
shows up, missing cells etc, then I'd suggest you look at using Apache POI 
directly.
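
To illustrate what "full control over missing cells" could look like, here is 
a minimal sketch of the padding step in plain Java, with no POI or Tika 
dependency. The class name MissingCellPadding, the method toTabSeparated, and 
the idea of modelling a row as a map from zero-based column index to cell text 
are all assumptions made for illustration, not part of either library. With 
POI itself, the cell text would come from something like Row.getCell(int, 
MissingCellPolicy) rather than from a map.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class MissingCellPadding {

    // Render a sparse row as tab-separated text. Every column index from 0
    // up to the last populated column gets a field; columns with no cell
    // become empty fields, so the tab count stays stable.
    // Assumes all column indexes are non-negative.
    public static String toTabSeparated(SortedMap<Integer, String> cells) {
        if (cells.isEmpty()) {
            return "";
        }
        int lastColumn = cells.lastKey();
        StringBuilder line = new StringBuilder();
        for (int col = 0; col <= lastColumn; col++) {
            if (col > 0) {
                line.append('\t');
            }
            String text = cells.get(col);
            if (text != null) {
                line.append(text);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Three columns, with the middle cell missing.
        SortedMap<Integer, String> row = new TreeMap<>();
        row.put(0, "first");
        row.put(2, "third");
        // Show tabs visibly: prints first<TAB><TAB>third
        System.out.println(toTabSeparated(row).replace("\t", "<TAB>"));
    }
}
```

The point of the sketch is that the loop walks column indexes rather than the 
cells that happen to exist, which is why the empty middle cell still produces 
its tab.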

Nick
