For now I only need to crawl hundreds of pages; previously I wrote this kind
of thing from scratch in Perl. I want something that lets me get started
quickly and still allows me to scale in the future. I like that Droids is a
framework and that I only have to do minimal work to get started. Apache Tika
is the framework for parsing, and it looks right for the job. Parsing is the
part I have a hard time evaluating in Nutch. Some of what I have read on the
mailing list suggests that extraction is still not all that easy to do with
Nutch. Am I wrong?

Mark
