Also, anyone have any recommendations for parsing various mail repositories like Outlook, Mac Mail (which I think is mbox), etc.?

"mstor" is a JavaMail implementation which should do a good job of handling mbox parsing for you. I've used it, but it looks like the license isn't Apache :( http://mstor.sourceforge.net/
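For a feel of what mbox parsing involves at its simplest, here is a rough sketch in plain Java. It is not mstor's API or implementation; the class name MboxSplitter and its split method are made up for illustration. It assumes the common mbox convention that each message starts with a "From " separator line at the beginning of the file or after a blank line, and it does not undo ">From " escaping or handle Content-Length variants.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of mstor or JavaMail. */
final class MboxSplitter {

    /**
     * Splits mbox text into raw messages (headers plus body).
     * Assumption: a line starting with "From " at the start of the input,
     * or directly after a blank line, begins a new message.
     */
    static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        boolean previousBlank = true; // start of input counts as a boundary

        for (String line : mbox.split("\r?\n", -1)) {
            if (previousBlank && line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                // The separator line itself is dropped from the message text.
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line).append('\n');
            }
            previousBlank = line.isEmpty();
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String sample =
            "From a@example.org Fri Jun 29 21:57:00 2007\n"
            + "Subject: one\n\nfirst body\n\n"
            + "From b@example.org Fri Jun 29 22:10:00 2007\n"
            + "Subject: two\n\nsecond body\n";
        System.out.println(split(sample).size() + " messages found");
    }
}
```

Each returned string could then be handed to whatever does the actual header and body extraction. A real library such as mstor is still the better choice for anything beyond a quick experiment, since real-world mailboxes vary in how they escape and delimit messages.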

I'm not up to speed with the latest Tika developments, for which I must apologise. I've been buried in other work since its inception.

Cheers,
Mark.

----- Original Message -----
From: "Grant Ingersoll" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Friday, June 29, 2007 9:57 PM
Subject: Questions


Hey Gang,

I was wondering if you had a todo list or something somewhere? I have been loosely following the discussions here and see the general outline of what the goals are here: http://www.mail-archive.com/tika- [EMAIL PROTECTED]/msg00024.html (Tika discussions in Amsterdam)

Here's where I am at: I am considering extracting the Nutch parsing plugins for a project I am undertaking and wrapping them for my own purposes, but knowing Tika is around, I would just as soon do this in the context of Tika, or at least try to help out that way and have it become a part of Tika. I have not looked at Lius yet. I guess I am wondering if you have some interfaces in mind that you want to hook into, or is the Nutch model (or Lius model) already going to serve as the main model? I pretty much think the Nutch model has everything I need at the moment, but I don't want to carry around the whole set of Nutch dependencies. I am not worried about content detection at this point so much as extraction.

Is the plan to adopt a similar plugin approach as Nutch?

So, I guess the question is what can I do at this point to help? Should I just go ahead with my needs and then give it back as a patch and you can decide what to do with it from there? I am in somewhat of a hurry to get the basics working in the next week or so.

Also, anyone have any recommendations for parsing various mail repositories like Outlook, Mac Mail (which I think is mbox), etc.?

Cheers,
Grant





