Hi all,

I need your advice!

I need to harvest blog posts and news articles and extract the date, author,
title, text and, if possible, the comments from each. The way I see it, I
have two choices: deploy a Nutch crawler or, as a friend suggested, use
Webhose.io.

The Webhose.io site has its own Build or Buy
<https://webhose.io/white-papers/build-or-buy> comparison, but I wanted to
hear a Nutch user's take on it.

Why did you go with Nutch and not with a service like Webhose.io? What is
the catch?

Thank you,

Jon
