Hi Jukka,
Thanks for the quick reply.
I see 50 copies of the content in the extracted text output. I have
attached a sample Outlook (msg) file to this mail (which happens to be a
mail from you to the dev group). Hope it helps.
Thanks again,
Kumar
-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitt...@gmail.com]
Sent: Thursday, February 05, 2009 5:22 AM
To: tika-dev@lucene.apache.org
Subject: Re: Microsoft Outlook (msg) files get parsed 50 times in TikaGUI
Hi,
On Wed, Feb 4, 2009 at 12:00 PM, Jana, Kumar Raja <kj...@ptc.com> wrote:
> I was feeding various document formats to the TikaGUI tool and found
> that Microsoft Outlook (msg) files get parsed around 50 times!!!
Hmm, that's quite a lot... How does this "50 times" appear, do you get
50 copies of the message content in the extracted text output? Do you
have an example file that you could share with us?
BR,
Jukka Zitting
--- Begin Message ---
Hi,
As discussed, I've been working on making the type detection code more
modular and extensible. This work has progressed pretty well, and now
I'd like to start integrating the results with the rest of Tika. To do
this, I'll need to modify the current MIME type classes. However, I
don't know all the ways in which this code is being used out there.
If you're directly using the classes in org.apache.tika.mime, please
let me know about your use cases and the classes/methods you're
accessing. Otherwise I might end up breaking your application.
BR,
Jukka Zitting
--- End Message ---