Re: Excel Parsing Issues With Tika 0.3

David Weekly Mon, 06 Apr 2009 17:52:09 -0700

TIKA-214 has now been filed, along with the sample XLS file.

https://issues.apache.org/jira/browse/TIKA-214


Should I separately bother the POI folks about this issue?

Incidentally, although it's sad and hacky, it may be worth noting that
concatenating the output of "strings" and "strings -el" does a decent job
of pulling unique strings out of the file (although it also picks up
font names, etc.).
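
For anyone who wants to try the same hack without binutils, here is a
rough Python sketch of the idea. It only approximates what "strings" and
"strings -el" do (printable ASCII runs, plus the same characters stored as
16-bit little-endian), and the function name and the minimum run length of
4 are my own assumptions, not anything from Tika or POI.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Approximate `strings` plus `strings -el`, keeping first occurrences only."""
    # Runs of printable ASCII bytes, roughly what plain `strings` reports.
    ascii_runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # Runs of printable ASCII stored as UTF-16LE (each character followed by
    # a NUL byte), roughly what `strings -el` reports.
    utf16_runs = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)

    candidates = [run.decode("ascii") for run in ascii_runs]
    candidates += [run.decode("utf-16-le") for run in utf16_runs]

    # De-duplicate while preserving the order in which strings were found.
    seen = set()
    unique = []
    for text in candidates:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique
```

To run it against a spreadsheet, read the file in binary mode and pass the
bytes in, e.g. extract_strings(open(path, "rb").read()), where path points
at your own XLS file. Like the shell version, it will happily return font
names and other noise along with the cell text.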

-David


2009/3/30 Jukka Zitting <[email protected]>:
> Hi,
>
> On Sat, Mar 28, 2009 at 6:18 AM, David Weekly <[email protected]> wrote:
>> So this is part "bug report" (the columns of the first sheet should
>> definitely be included!)
>
> Agreed. Can you please file a Jira bug report for this? It looks
> similar to some of the zero- vs. one-based index issues we faced when
> upgrading to POI 3.5.
>
>> and part query as to whether or not there is a plan
>> w/Tika to extract more than sheet & cell data from documents.
>
> Doing so would be very nice. You may want to file a Jira improvement
> request for that.
>
> And if you're familiar with Apache POI (or willing to learn it),
> patches would of course also be welcome. :-) Otherwise I don't know
> when one of us will encounter a similar need.
>
> You may also want to contact the POI project to see if they've already
> implemented text extraction improvements that would cover these
> features. Last week at the ApacheCon I noticed that they've recently
> been improving the out-of-the-box text extraction features in POI.
>
> BR,
>
> Jukka Zitting
>



-- 
Follow me on Twitter! http://twitter.com/dweekly
