Hey Gang,
I was wondering if you have a to-do list somewhere? I have been
loosely following the discussions here and have seen the general
outline of the goals at:
http://www.mail-archive.com/tika-[EMAIL PROTECTED]/msg00024.html
(Tika discussions in Amsterdam)
Here's where I am at: I am considering extracting the Nutch parsing
plugins for a project I am undertaking and wrapping them for my own
purposes, but knowing Tika is around, I would just as soon do this in
the context of Tika, or at least try to help out that way and have it
become a part of Tika. I have not looked at Lius yet. I guess I am
wondering if you have some interfaces in mind that you want to hook
into, or is the Nutch model (or Lius model) already going to serve as
the main model? I pretty much think the Nutch model has everything I
need at the moment, but I don't want to carry around the whole set of
Nutch dependencies. I am not worried about content detection at this
point so much as extraction.
Is the plan to adopt a plugin approach similar to Nutch's?
So, I guess the question is: what can I do at this point to help?
Should I just go ahead with my needs and then give it back as a patch
and you can decide what to do with it from there? I am in somewhat
of a hurry to get the basics working in the next week or so.
Also, does anyone have any recommendations for parsing various mail
repositories like Outlook, Mac Mail (which I think is mbox), etc.?
Cheers,
Grant