Hi,
> Map input records=79
> Map output records=0
... and no IndexerJob:DocumentCount counter
The map function got 79 records as input,
but did not write anything to the indexer.
There are a couple of reasons why a document is skipped,
e.g., nothing parsed, missing markers, errors in indexing
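The check described above (records read by the map step, none written) can be sketched as a small script over the counter lines. This is only an illustrative sketch: the helper names `parse_counters` and `all_records_skipped` are hypothetical and not part of Nutch or Hadoop, and the `Name=value` line format is assumed from the two counter lines quoted above.

```python
import re


def parse_counters(log_text):
    """Collect 'Name=123' counter lines (optionally quoted with '>') into a dict."""
    counters = {}
    for line in log_text.splitlines():
        # Skip leading quote markers and whitespace, then expect name=integer.
        match = re.match(r"^[>\s]*(.+?)\s*=\s*(\d+)\s*$", line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def all_records_skipped(counters):
    """True when the map step read records but emitted none of them."""
    read = counters.get("Map input records", 0)
    written = counters.get("Map output records", 0)
    return read > 0 and written == 0


# Sample taken from the counter lines quoted in this thread.
sample = """> Map input records=79
> Map output records=0"""

counters = parse_counters(sample)
print(counters)
print(all_records_skipped(counters))
```

If the second value printed is `True`, every input document was skipped before reaching the indexer, which matches the situation described here.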
I want to crawl AJAX-populated content using Nutch.
I tried this with selenium-grid-plugin on Nutch 1.14.
After following all the steps from the GitHub page nutch-selenium-grid-plugin, I
am not able to fetch the AJAX-loaded content.
I have a docker-selenium hub and node running on my Mac.
But I am still
I got this after setting log4j.logger.org.apache.hadoop to INFO:
2018-03-02 17:29:40,157 INFO indexer.IndexingJob - IndexingJob: starting
2018-03-02 17:29:40,775 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Hi,
It looks more like there is nothing to index.
Unfortunately, in 2.x no log messages are
enabled by default that indicate how many documents
are sent to the index back-ends.
The easiest way is to enable Job counters in
conf/log4j.properties by adding the line: