Also, please feel free to tell me I am getting too far ahead of
things... :-)
On Jun 29, 2007, at 4:57 PM, Grant Ingersoll wrote:
Hey Gang,
I was wondering if you had a todo list or something somewhere? I
have been loosely following the discussions here and see the
general outline of what the goals are here:
http://www.mail-archive.com/[email protected]/msg00024.html
(Tika discussions in Amsterdam)
Here's where I am at: I am considering extracting the Nutch
parsing plugins for a project I am undertaking and wrapping them
for my own purposes, but knowing Tika is around, I would just as
soon do this in the context of Tika, or at least try to help out
that way and have it become a part of Tika. I have not looked at
Lius yet. I guess I am wondering if you have some interfaces in
mind that you want to hook into, or is the Nutch model (or Lius
model) already going to serve as the main model? I pretty much
think the Nutch model has everything I need at the moment, but I
don't want to carry around the whole set of Nutch dependencies. I
am not worried about content detection at this point so much as
extraction.
Is the plan to adopt a plugin approach similar to Nutch's?
So, I guess the question is what can I do at this point to help?
Should I just go ahead with my needs and then give it back as a
patch and you can decide what to do with it from there? I am in
somewhat of a hurry to get the basics working in the next week or so.
Also, does anyone have any recommendations for parsing various mail
repositories like Outlook, Mac Mail (which I think is mbox), etc.?
Cheers,
Grant