Also, anyone have any recommendations for parsing various mail
repositories like Outlook, Mac Mail (which I think is mbox), etc.?
"mstor" is a JavaMail implementation that should do a good job of handling
mbox parsing for you. I've used it, but it looks like the license isn't
Apache :( http://mstor.sourceforge.net/
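For anyone unfamiliar with the format: an mbox file is just messages
concatenated together, each introduced by a "From " separator line. As a
rough illustration of that (this is not mstor or JavaMail, only Python's
standard-library mailbox module, and the sample messages and helper names
are made up), a minimal sketch:

```python
import mailbox
import os
import tempfile

# Made-up sample data: two messages in mbox format.
# Each message starts with a "From " separator line.
SAMPLE_MBOX = (
    "From alice@example.com Fri Jun 29 21:57:00 2007\n"
    "From: alice@example.com\n"
    "Subject: First\n"
    "\n"
    "Hello from the first message.\n"
    "\n"
    "From bob@example.com Fri Jun 29 22:10:00 2007\n"
    "From: bob@example.com\n"
    "Subject: Second\n"
    "\n"
    "Hello from the second message.\n"
)


def read_subjects(path):
    """Return the Subject header of every message in an mbox file."""
    box = mailbox.mbox(path)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()


def demo():
    """Write the sample mbox to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", newline="\n") as f:
            f.write(SAMPLE_MBOX)
        return read_subjects(path)


if __name__ == "__main__":
    print(demo())
```

Outlook stores are a different, proprietary format, so this only covers the
mbox side of the question.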
I'm not up to speed with the latest Tika developments, for which I must
apologise - I've been buried in other work since its inception.
Cheers,
Mark.
----- Original Message -----
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, June 29, 2007 9:57 PM
Subject: Questions
Hey Gang,
I was wondering if you had a todo list or something somewhere? I have
been loosely following the discussions here and see the general outline
of what the goals are here: http://www.mail-archive.com/tika-
[EMAIL PROTECTED]/msg00024.html (Tika discussions in Amsterdam)
Here's where I am at: I am considering extracting the Nutch parsing
plugins for a project I am undertaking and wrapping them for my own
purposes, but knowing Tika is around, I would just as soon do this in the
context of Tika, or at least try to help out that way and have it become
a part of Tika. I have not looked at Lius yet. I guess I am wondering
if you have some interfaces in mind that you want to hook into, or is the
Nutch model (or Lius model) already going to serve as the main model? I
pretty much think the Nutch model has everything I need at the moment,
but I don't want to carry around the whole set of Nutch dependencies. I
am not worried about content detection at this point so much as
extraction.
Is the plan to adopt a similar plugin approach as Nutch?
So, I guess the question is what can I do at this point to help? Should
I just go ahead with my needs and then give it back as a patch and you
can decide what to do with it from there? I am in somewhat of a hurry
to get the basics working in the next week or so.
Also, anyone have any recommendations for parsing various mail
repositories like Outlook, Mac Mail (which I think is mbox), etc.?
Cheers,
Grant