Need Nutch to Index to Different Folder

2019-07-22 Thread Rushi
Hi All, I need some help with this. I have two different servers, say *S1* and *S2*, which run Nutch and Solr, and Nutch indexes the data into the *DATA* folder in *Solr*. Now my requirement is that Nutch should index the data into a new folder on server *S3*. This folder will contain only the index files

Bayan Group Extractor plugin for Nutch: Spanish Accent Character Issue

2018-01-25 Thread Rushi
Hello Everyone, I am having an issue while crawling a Spanish website: some of the accented characters are not converted properly. Here is an example: Infección (wrong one) should be Infección (correct). Note: this is with the *Bayan Group Extractor plugin.* Is there any change that I need to make to co
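A frequent cause of this kind of corruption (assumed here, since the excerpt shows neither the raw bytes nor the plugin's code) is UTF-8 text being decoded as ISO-8859-1 somewhere in the pipeline, which turns "ó" into "Ã³". The Python sketch below reproduces that mistake and reverses it. The function names are hypothetical, and a real fix belongs at the point where the page bytes are decoded rather than in a post-processing step.

```python
def simulate_latin1_misdecode(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misdecode(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    word = "Infección"
    broken = simulate_latin1_misdecode(word)
    print(broken)                           # InfecciÃ³n
    print(repair_latin1_misdecode(broken))  # Infección
```

Each accented letter occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, which is why one wrong letter becomes two.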

Search with Accent and without Accent Characters

2018-02-13 Thread Rushi
Hello All, I integrated Nutch with Solr and everything seems to be fine so far, but I am having an issue when searching for some Spanish accented characters: the search results are not the same. With the accent (example: investigación) the search gives the correct result, but without the accent (example: investigacion) it gives zero results.
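Solr usually handles this by folding accents at both index time and query time, for example by adding `solr.ASCIIFoldingFilterFactory` to the analyzer of the field type used by the searched fields and then reindexing. The Python sketch below only illustrates the idea with Unicode NFD decomposition; `fold_accents` and `matches` are hypothetical helpers, not Solr code, and they cover fewer cases than Solr's filter (ligatures such as "æ", for instance, are left unchanged).

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Lowercase text and strip combining accents, e.g. 'Investigación' -> 'investigacion'."""
    # NFD splits a precomposed letter such as 'ó' into 'o' + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Drop nonspacing combining marks (Unicode category 'Mn') and keep the base letters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def matches(query: str, indexed_term: str) -> bool:
    """Compare a query term and an indexed term after folding both the same way."""
    return fold_accents(query) == fold_accents(indexed_term)


if __name__ == "__main__":
    print(fold_accents("investigación"))              # investigacion
    print(matches("investigacion", "investigación"))  # True
    print(matches("investigación", "investigacion"))  # True
```

Because both sides go through the same folding, the accented and unaccented spellings compare equal; folding only one side would reproduce the zero-results behavior.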

Crawling/Indexing Issue on Dev and Staging Server URLs

2018-07-19 Thread Rushi
Hi all, I have been using Nutch for the last 6 months and it works with production URLs without any issue. For testing purposes I want to make this work on Dev/staging. I followed these steps: changed my regex-filter to use the development domain address, and ran this command ./bin/crawl -i -D solr.serve
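Nutch's regex URL filter (`regex-urlfilter.txt`) reads rules that start with `+` (include) or `-` (exclude); the first rule whose pattern matches a URL decides the outcome, and a URL that matches no rule is rejected. The Python sketch below mimics those semantics so a rule set can be checked against dev/staging URLs before a crawl. The host `dev.example.com` and the helper names are hypothetical, and Nutch itself evaluates the patterns as Java regular expressions, so syntax details may differ.

```python
import re

# Hypothetical rule set; dev.example.com stands in for the real development host.
RULES = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# accept anything on the development host
+^https?://dev\.example\.com/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+pattern' / '-pattern' lines into (include, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return True if the first matching rule is an include rule."""
    for include, regex in rules:
        if regex.search(url):  # the pattern may match anywhere in the URL
            return include
    return False  # no rule matched: the URL is rejected


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("https://dev.example.com/page",
                "https://www.example.com/page",
                "mailto:someone@example.com"):
        print(url, "->", "accepted" if accepts(url, rules) else "rejected")
```

Because the first match wins, a leftover `+` rule for the production domain placed above the catch-all `-.` would still let production URLs through.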

Re: Crawling/Indexing Issue on Dev and Staging Server URLs

2018-07-20 Thread Rushi
.delete(CleaningJob.java:174) at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:197) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:208) On Fri, Jul 20, 2018 at 2:06 AM Sebastian Nagel wrote: > Hi,

Re: Crawling/Indexing Issue on Dev and Staging Server URLs

2018-07-23 Thread Rushi
tion pool shut down". > Which version of Solr are you running? It should be > Solr 5.5.0 for Nutch 1.13. > > Sebastian > > On 07/20/2018 03:58 PM, Rushi wrote: > > Thanks for the response Sebastian, > > Yeah i changed my seeds and i am using Nutch 1.13

Re: Crawling/Indexing Issue on Dev and Staging Server URLs

2018-07-26 Thread Rushi
I don't see a way to find out why the > page failed to fetch. The CrawlDb contains the fetch status and > usually also a status message which explains the failure. > > Best, > Sebastian > > On 07/23/2018 04:15 PM, Rushi wrote: > > Hi Sebastian, > > I am usi