Hi,
I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In
my tests there is a gap between the number of results fetched by Nutch and
the number of documents indexed in Solr. For example, one of the crawls
fetched 23343 pages and 1146 images successfully, while in Solr 19250
docs
Hi Drulea,
On Sun, Sep 27, 2015 at 7:36 AM, wrote:
>
> I’m using nutch 2.3 on OS X 10.9.5 with homebrew.
>
From the start I would like to point you at the current release candidate
for Nutch 2.3.1. The VOTE is currently open and the release candidate is
+1
- tests pass
- verified signatures
- run a test crawl using HBase 0.98.14
The documentation [1] needs to be updated for Gora 0.6.1, right?
I also had to copy hbase-common to $NUTCH_HOME/runtime/local/lib/
but that's probably because it's not exactly the same HBase version used by Gora.
Sebastian
I am facing the same problem here. I tried rebuilding it, but in the logs I can
only see the agent name set in the http.agent.name property.
By $NUTCH_HOME/conf do you mean the runtime/local/conf directory?
Also, can you please brief me on how the rotation works? Does the agent
rotate after crawling