On Jun 29, 2007, at 6:36 PM, Jukka Zitting wrote:
I would recommend that you just go forward with your plan and don't
wait for us. :-) One thing you may want to take a look at is "Lius
Lite" in the Tika issue tracker, which contains a trimmed version of
the Lius framework, but if you are already familiar with Nutch then it
probably makes more sense to stick with that. I believe the eventual
Tika framework will end up incorporating concepts from both Nutch and
Lius (among others).
It would certainly be interesting to see what you end up with and
perhaps hear a brief summary of the main issues and concerns you
encountered. This is exactly the sort of stuff that Tika should
support, so your contributions would be very much welcome!
Well, you will definitely get that chance at some point in time.
My main concern w/ extracting Nutch is all the dependencies on
Hadoop, etc. But it does seem like the shortest path for me.
-Grant