Difference between Nutch fetch list and number of indexed documents

2015-09-27 Thread Daniel Holmes
Hi, I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In my tests there is a gap between the number of results fetched by Nutch and the number of documents indexed in Solr. For example, one of the crawls fetched 23343 pages and 1146 images successfully, while in Solr 19250 docs

Re: Unable to use Nutch 2.3 crawl script for MySQL, Mongo, or Cassandra

2015-09-27 Thread Lewis John Mcgibbney
Hi Drulea, On Sun, Sep 27, 2015 at 7:36 AM, wrote: > I’m using nutch 2.3 on OS X 10.9.5 with homebrew. From the start I would like to point you at the current release candidate for Nutch 2.3.1. The VOTE is currently open and the release candidate is

Re: [VOTE] Release Apache Nutch 2.3.1

2015-09-27 Thread Sebastian Nagel
+1 - tests pass - verified signatures - ran a test crawl using HBase 0.98.14. The documentation [1] needs to be updated for Gora 0.6.1, right? I also had to copy hbase-common to $NUTCH_HOME/runtime/local/lib/ but that's probably because it's not exactly the same HBase version used by Gora. Sebastian

Re: Configuring rotating agent in Nutch

2015-09-27 Thread Karanjeet Singh
I am facing the same problem here. I tried rebuilding it, but in the logs I can only see the agent name set in the http.agent.name property. By $NUTCH_HOME/conf do you mean the runtime/local/conf directory? Also, can you please brief me on how the rotation works? Does the agent rotate after crawling