Re: Excel Parsing Issues With Tika 0.3

Jukka Zitting Mon, 30 Mar 2009 10:25:23 -0700

Hi,

On Sat, Mar 28, 2009 at 6:18 AM, David Weekly <da...@pbwiki.com> wrote:
> So this is part "bug report" (the columns of the first sheet should
> definitely be included!)


Agreed. Can you please file a Jira bug report for this? It looks
similar to some of the zero- vs. one-based index issues we faced when
upgrading to POI 3.5.

> and part query as to whether or not there is a plan
> w/Tika to extract more than sheet & cell data from documents.

Doing so would be very nice. You may want to file a Jira improvement
request for that.

And if you're familiar with Apache POI (or willing to learn it),
patches would of course also be welcome. :-) Otherwise I don't know
when one of us will encounter a similar need.

You may also want to contact the POI project to see if they've already
implemented text extraction improvements that would cover these
features. Last week at ApacheCon I noticed that they've recently
been improving the out-of-the-box text extraction features in POI.

BR,

Jukka Zitting
