Re: Nutch 2 with Cassandra as a storage is not crawling data properly

Lewis John Mcgibbney Tue, 24 Feb 2015 09:06:38 -0800

Hi Sumant,

Please see my replies below


On Mon, Feb 23, 2015 at 10:11 PM, <[email protected]> wrote:

>
> I am using Nutch 2.x with Cassandra as storage. Currently I am crawling
> only one website, and data is being loaded into Cassandra in byte code
> format. When I use the readdb command in Nutch, I do not get any useful
> crawl data.
>
> Below are the details of different files and output I am getting:
>
> *========== command to run crawler =====================*
>
> bin/crawl urls/ crawlDir/ solr_link 3
>
>
1) Which version of Nutch are you using? Are you using the 2.x source code
from [0], e.g. 2.4-SNAPSHOT?
2) Which version of Cassandra are you using? The Cassandra version currently
recommended for this Nutch codebase is 2.0.2, with the Gora 0.5 dependencies.
3) The way you are invoking the crawl script is pretty strange. Please read
the description of the input parameters at
https://github.com/apache/nutch/blob/2.x/src/bin/crawl#L33
4) The solr_url parameter is optional. This means that if you supply it and
it is incorrect, the script will undoubtedly throw an error/exception.
5) Once you've addressed the above, please paste the logs of your crawl
task somewhere and share them.
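Points 3 and 4 can be checked before a crawl is launched. Below is a minimal POSIX shell sketch, assuming the 2.x script's usage is `crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>` (please confirm this against the input parameters linked in point 3 for your checkout). The helper `check_crawl_args` is hypothetical and not part of Nutch. It only rejects argument lists like the one in your command, where `solr_link` is a placeholder rather than a URL and `crawlDir/` looks like a Nutch 1.x style crawl directory rather than a crawl ID.

```shell
#!/bin/sh
# Sketch only: sanity-check arguments before calling Nutch 2.x bin/crawl.
# ASSUMPTION: usage is  crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
# check_crawl_args is a hypothetical helper, not part of Nutch.
check_crawl_args() {
  if [ "$#" -eq 4 ]; then
    solr_url=$3
    rounds=$4
    # A placeholder such as "solr_link" is not a URL, so reject it early.
    case "$solr_url" in
      http://*|https://*) ;;
      *) echo "solrUrl must be an http(s) URL, got: $solr_url" >&2; return 1 ;;
    esac
  elif [ "$#" -eq 3 ]; then
    # solrUrl omitted (it is optional, per point 4).
    rounds=$3
  else
    echo "usage: check_crawl_args <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>" >&2
    return 1
  fi

  crawl_id=$2
  # ASSUMPTION: the crawl ID is a plain identifier used to prefix the storage
  # schema names, so a directory path like "crawlDir/" is suspicious.
  case "$crawl_id" in
    ''|*/*) echo "crawlID should be a plain identifier, got: $crawl_id" >&2; return 1 ;;
  esac

  # numberOfRounds must be a whole number.
  case "$rounds" in
    ''|*[!0-9]*) echo "numberOfRounds must be a whole number, got: $rounds" >&2; return 1 ;;
  esac
  return 0
}
```

For example, `check_crawl_args urls/ crawlDir/ solr_link 3` is rejected, while `check_crawl_args urls/ webcrawl http://localhost:8983/solr/ 3 && bin/crawl urls/ webcrawl http://localhost:8983/solr/ 3` only starts the crawl once the arguments pass. The crawl ID `webcrawl` and the Solr URL are made-up values.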
Thank you
Lewis


[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/
