Re: Connect Solr and Nutch in Ubuntu 18

2018-10-05 Thread Timeka Cobb
No problem and sorry about that!

On Fri, Oct 5, 2018 at 11:50 AM Sebastian Nagel wrote:
> Hi Timeka,
>
> > because Solr is missing the
> > files from its packet for it to work.
>
> There are many Solr versions available and it easily may happen that the
> description in the Wiki is outdated or

Re: Connect Solr and Nutch in Ubuntu 18

2018-10-05 Thread Sebastian Nagel
Hi Timeka,

> because Solr is missing the
> files from its packet for it to work.

There are many Solr versions available and it easily may happen that the description in the Wiki is outdated or not applicable for your combination of Nutch and Solr. Please try to give as much information as

Re: Alternatives to Solr

2018-10-05 Thread Timeka Cobb
Thank you so very much On Fri, Oct 5, 2018, 3:41 PM Yash Thenuan Thenuan wrote: > You can use elasticsearch. > > On Sat, 6 Oct 2018, 00:58 Timeka Cobb, wrote: > > > Hello folks! Does anyone know of a good alternative to Solr? Im asking > this > > becasue Ive been trying to connect the 2 and

Re: Alternatives to Solr

2018-10-05 Thread Yash Thenuan Thenuan
You can use Elasticsearch.

On Sat, 6 Oct 2018, 00:58 Timeka Cobb wrote:
> Hello folks! Does anyone know of a good alternative to Solr? I'm asking this
> because I've been trying to connect the two and it's been so frustrating.
> The Nutch Wiki is extremely unreliable when it comes to Solr and

Alternatives to Solr

2018-10-05 Thread Timeka Cobb
Hello folks! Does anyone know of a good alternative to Solr? I'm asking this because I've been trying to connect the two and it's been so frustrating. The Nutch Wiki is extremely unreliable when it comes to Solr, and every site I go to for info leads me nowhere. Does anyone know of something else I

Re: Regex to block some patterns

2018-10-05 Thread Sebastian Nagel
Hi Amarnath,

the only possibility is that https://www.abc.com/ is skipped
- by another rule in regex-urlfilter.txt
- or by another URL filter plugin.

Please check your configuration carefully. You may also use the tool bin/nutch filterchecker to test the filters beforehand: every active filter
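One way "another rule" can skip a URL: the rules in regex-urlfilter.txt are checked from top to bottom and the first matching rule decides, so an early, over-broad rule shadows everything below it. The Python sketch below illustrates that, assuming the format described in the comments of Nutch's default regex-urlfilter.txt (`+` accepts, `-` rejects, a URL matching no rule is ignored) and approximating Java regex partial matching with `re.search`. The rules in `config` are hypothetical, and `parse_rules` and `accepts` are illustrative helpers, not Nutch APIs.

```python
import re

def parse_rules(text):
    """Turn regex-urlfilter.txt-style text into (accept, pattern) pairs.

    Assumed format: blank lines and '#' comments are skipped; every other
    line is '+' (accept) or '-' (reject) followed by a regular expression.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        sign, regex = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(regex)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose regex is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):  # partial match, approximating Java's Matcher.find()
            return accept
    return False  # no rule matched: the URL is ignored


# Hypothetical configuration: the broad first rule shadows the rules below it.
config = r"""
# over-broad reject rule placed too early
-abc\.com
# the rule that was actually intended
-modal[-_a-zA-Z0-9]*\.html
# accept anything else
+.
"""

rules = parse_rules(config)
print(accepts(rules, "https://www.abc.com/"))      # False, rejected by the first rule
print(accepts(rules, "https://www.example.org/"))  # True, falls through to "+."
```

Deleting the over-broad first rule would let https://www.abc.com/ fall through to `+.`. The sketch only models one filter file, so the real plugin chain is still worth checking with `bin/nutch filterchecker`, since other URL filter plugins can also reject a URL.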

Re: Connect Solr and Nutch in Ubuntu 18

2018-10-05 Thread govind nitk
The information given is not sufficient to figure out the problem.

1. You need to add indexer-solr to the plugins list (the plugin.includes property).
2. Check the "solr index properties" in nutch-default.xml (there are a lot of properties).

Check out https://wiki.apache.org/nutch/NutchTutorial for a detailed explanation.

On Fri, Oct 5, 2018

Re: Regex to block some patterns

2018-10-05 Thread govind nitk
Also, check the last regex line:

# accept anything else
+.

If by mistake you have made it negative (-.), everything will be discarded.

Best,
Govind

On Fri, Oct 5, 2018 at 1:02 PM Sebastian Nagel wrote:
> Hi Amarnath,
>
> the only possibility is that https://www.abc.com/ is skipped
> - by
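Govind's warning about the catch-all line can be checked with a few lines of Python. This sketch assumes first-match-wins evaluation (as described in the comments of Nutch's default regex-urlfilter.txt) and uses `re.search` as a stand-in for Java regex matching; the image-suffix rule and the `first_match` helper are illustrative, not taken from this thread.

```python
import re

def first_match(rules, url):
    """rules: list of (accept, regex); the first regex found in the URL decides."""
    for accept, regex in rules:
        if re.search(regex, url):
            return accept
    return False  # no rule matched: the URL is ignored

# Illustrative specific rule, followed by the catch-all last line.
specific = [(False, r"\.(gif|jpg|png)$")]
correct = specific + [(True, r".")]    # "+." : accept anything else
mistaken = specific + [(False, r".")]  # "-." : reject anything else

url = "https://www.abc.com/"
print(first_match(correct, url))   # True
print(first_match(mistaken, url))  # False
```

With `-.` as the last line, only URLs accepted by an earlier `+` rule survive; if there is no such rule, nothing gets through.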

Encoding issue in solr

2018-10-05 Thread UMA MAHESWAR
Hi all,

I am using Nutch for crawling and indexing into Solr. While storing data into Solr, I am facing an encoding issue with a site that has the title:

title : ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren und Radialventilatoren aus Linz, Österreich

but in Solr it is stored as below
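The message is cut off before it shows how Solr stored the title, so the cause cannot be confirmed from this snippet. One common pattern, offered here only as an assumption, is UTF-8 bytes being decoded with a single-byte charset such as Windows-1252 somewhere between fetching, parsing and indexing. This Python sketch reproduces what that would do to "Österreich" and shows that, if this is the failure mode, the damage is reversible.

```python
# Title text from the message above.
title = ("ebm-papst Motoren & Ventilatoren GmbH - Axialventilatoren "
         "und Radialventilatoren aus Linz, Österreich")

# Assumed failure mode: UTF-8 bytes decoded as Windows-1252 (cp1252).
garbled = title.encode("utf-8").decode("cp1252")
print(garbled.split(", ")[-1])  # "Ö" (bytes C3 96) turns into "Ã–"

# Reversing the wrong decode recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == title)  # True
```

If the stored value looks like this, patching the text in Solr would only hide a charset misdetection earlier in the pipeline.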

Re: Connect Solr and Nutch in Ubuntu 18

2018-10-05 Thread Timeka Cobb
I see that, but in the instructions they say to create resources, and the command line the Nutch Wiki offers doesn't work because Solr is missing the files from its packet for it to work. I will try again. Thank ya so much for ya help

On Fri, Oct 5, 2018, 4:48 AM govind nitk wrote:
> Info given is

Re: Regex to block some patterns

2018-10-05 Thread Amarnatha Reddy
Hi Sebastian,

Thanks for the update. After spending a long time on it, here is my regex pattern to block my use case:

-.*(modal[-_a-zA-Z0-9]*[\.]html|exit.html[\/]?\??.*|model[-_a-zA-Z0-9]*[\.]html|exitpage.*|exitPage.*)

There was some other pattern which caused everything to be blocked; I rectified it.

Thanks,
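To see which URLs Amarnath's rule catches before running a crawl, the pattern (without its leading `-`) can be tried in Python. This is a sketch: `re.search` approximates the partial matching Nutch's regex filter does with Java regexes, and the sample URLs are hypothetical.

```python
import re

# Amarnath's rule with the leading "-" (reject) sign removed.
BLOCK = re.compile(
    r".*(modal[-_a-zA-Z0-9]*[\.]html|exit.html[\/]?\??.*"
    r"|model[-_a-zA-Z0-9]*[\.]html|exitpage.*|exitPage.*)"
)

def blocked(url):
    """True if the rule's regex is found anywhere in the URL."""
    return BLOCK.search(url) is not None

# Hypothetical URLs for illustration.
for url in [
    "https://www.abc.com/",
    "https://www.abc.com/products/modal-contact.html",
    "https://www.abc.com/exit.html?target=partner",
    "https://www.abc.com/exitPage/info",
]:
    print(blocked(url), url)
```

The `.` in `exit.html` is unescaped, so it matches any character in that position (for example `exit-html`); writing `exit\.html` would restrict it to a literal dot.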