Re: Regarding Indexing to elasticsearch

2018-03-02 Thread Sebastian Nagel
Hi,
> Map input records=79
> Map output records=0
... and no IndexerJob:DocumentCount counter
The map function got 79 records as input, but did not write anything to the indexer. There are a couple of reasons why a document is skipped, e.g., nothing parsed, missing markers, errors in indexing

Crawling of AJAX populated content.

2018-03-02 Thread narendra singh arya
I want to crawl AJAX-populated content using Nutch. I tried this with the selenium-grid-plugin on Nutch 1.14. After following all the steps from the GitHub page nutch-selenium-grid-plugin, I am not able to fetch the AJAX-loaded content. I have a docker-selenium hub and node running on my Mac. But I am still

Re: Regarding Indexing to elasticsearch

2018-03-02 Thread Yash Thenuan Thenuan
I got this after setting log4j.logger.org.apache.hadoop to info:
2018-03-02 17:29:40,157 INFO indexer.IndexingJob - IndexingJob: starting
2018-03-02 17:29:40,775 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Re: Regarding Indexing to elasticsearch

2018-03-02 Thread Sebastian Nagel
Hi, it looks more like there is nothing to index. Unfortunately, in 2.x no log messages are enabled by default that indicate how many documents are sent to the index back-ends. The easiest way is to enable Job counters in conf/log4j.properties by adding the line: