Re: Parsed segment has outlinks filtered

2019-10-19 Thread Sachin Mittal
> Hi, Setting the prop parse.filter.urls=false does not filter out the outlinks. I get all the outlinks for my parsed url. So this is working as expected. However it has caused something
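For readers following the thread, here is a minimal sketch of how the property named above might be overridden. It assumes the override lives in conf/nutch-site.xml; the file location and the description text are assumptions for illustration and are not quoted from any message in this thread.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumed to be placed in conf/nutch-site.xml -->
<configuration>
  <property>
    <name>parse.filter.urls</name>
    <!-- false: the parser does not apply URL filters to outlinks -->
    <value>false</value>
    <description>Illustrative description (assumption): whether the parser
    filters outlinks with the configured URL filters.</description>
  </property>
</configuration>
```

With the value set to false, the parse step keeps every outlink, which matches the behaviour reported in the message above.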

RE: Parsed segment has outlinks filtered

2019-10-18 Thread yossi.tamari
Hi, Setting the prop parse.filter.urls=false does not filter out the outlinks. I get all the outlinks for my parsed url. So this is working as expected. However it has caused something unwarranted on the FetcherThread as now it seems to be fetching

Re: Parsed segment has outlinks filtered

2019-10-18 Thread Sachin Mittal
l not filter outlinks at all. >> >> Yossi. >> >> -Original Message- >> From: Sachin Mittal >> Sent: Thursday, 17 October 2019 19:15 >> To: user@nutch.apache.org >> Subject: Parsed segment has outlinks filtered >> >> Hi,

Re: Parsed segment has outlinks filtered

2019-10-18 Thread Sebastian Nagel
alse, the Parser >> will not filter outlinks at all. >> >> Yossi. >> >> -Original Message- >> From: Sachin Mittal >> Sent: Thursday, 17 October 2019 19:15 >> To: user@nutch.apache.org >> Subject: Parsed segment has outlinks

RE: Parsed segment has outlinks filtered

2019-10-17 Thread yossi.tamari
Mittal Sent: Thursday, 17 October 2019 19:15 To: user@nutch.apache.org Subject: Parsed segment has outlinks filtered Hi, I was a bit confused about the outlinks generated from a parsed url. If I use the utility: bin/nutch parsechecker url The generated outlinks include all the outlinks. However if I

Parsed segment has outlinks filtered

2019-10-17 Thread Sachin Mittal
Hi, I was a bit confused about the outlinks generated from a parsed url. If I use the utility: bin/nutch parsechecker url The generated outlinks include all the outlinks. However, if I check the dump of the parsed segment generated by the nutch crawl script, using the command: bin/nutch readseg -dump