Re: Problem with WordParser

Mats Norén Fri, 21 Dec 2007 05:11:08 -0800

Done.
It seems that the error occurs on the last TextPiece, if it occurs at all.


On Dec 20, 2007 8:11 PM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Dec 20, 2007 1:54 PM, Mats Norén <[EMAIL PROTECTED]> wrote:
> > On Dec 20, 2007 11:03 AM, Jukka Zitting <[EMAIL PROTECTED]> wrote:
> > > You may want to contact the POI mailing lists, as I don't think many
> > > of us have too much experience with POI internals.
> >
> > The thing is that it's the WordParser.java that calls
> > TextPiece.substring with a negative value, so my guess is that it's
> > the algorithm itself that for some corner case does the wrong thing.
>
> I'm sorry, you're of course right. I somehow misunderstood you as
> referring to a class within POI. Too much on my mind lately...
>
> As Niall already pointed out, this seems like a bug in our parser code.
>
> > It's my understanding that the text extraction in Jackrabbit is based
> > on textmining.org which is the basis for the WordParser in Tika, is
> > that correct or have I got it wrong?
>
> Yes. The code actually ended up in Tika through Nutch and Lius, but
> originates from the same textmining.org codebase that Jackrabbit is
> also currently using. Unfortunately Ryan Ackley is no longer maintaining
> the code, which leaves us with few options other than embedding the
> code in Tika. It would IMHO be best if we could push all the complex
> file format logic out to separate parser libraries (preferably POI in
> this case), where there would likely be people with much better
> understanding of that specific format and the related parsing code.
> Anyway, until we get rid of the code we should try to maintain it the
> best we can, so please file a bug report for this issue. :-)
>
> BR,
>
> Jukka Zitting
>
