Re: Regex to block some patterns

2018-10-05 Thread Amarnatha Reddy
modal|exit).*\.html
> >>
> >> Will work for all examples given.
> >>
> >> You can test regexes really well online [1]. If each input has true for
> >> lookingAt, Nutch's regexfilter will filter the URLs.
> >>
> >> Regards,

Re: Regex to block some patterns

2018-10-05 Thread govind nitk
AM Markus Jelsma <markus.jel...@openindex.io> wrote:
> >> Hi Amarnatha,
> >>
> >> -^.+(?:modal|exit).*\.html
> >>
> >> Will work for all examples given.
> >>
> >> You can test regexes really well online [1]. If

Re: Regex to block some patterns

2018-10-05 Thread Sebastian Nagel
each input has true for
>> lookingAt, Nutch's regexfilter will filter the URLs.
>>
>> Regards,
>> Markus
>>
>> [1] https://www.regexplanet.com/advanced/java/index.html
>>
>>
>> -Original message-
>>> From:Amarnatha Reddy

Re: Regex to block some patterns

2018-10-03 Thread Amarnatha Reddy
> filter the URLs.
>
> Regards,
> Markus
>
> [1] https://www.regexplanet.com/advanced/java/index.html
>
>
> -Original message-
> > From:Amarnatha Reddy
> > Sent: Wednesday 3rd October 2018 15:23
> > To: user@nutch.apache.org
> > Subject: Regex to blo

RE: Regex to block some patterns

2018-10-03 Thread Markus Jelsma
-Original message-
> From:Amarnatha Reddy
> Sent: Wednesday 3rd October 2018 15:23
> To: user@nutch.apache.org
> Subject: Regex to block some patterns
>
> Hi Team,
>
> I need some assistance to block patterns in my current setup.
>
> Always m

Regex to block some patterns

2018-10-03 Thread Amarnatha Reddy
Hi Team,

I need some assistance blocking patterns in my current setup.

My seed URL is always *https://www.abc.com/* and I need to crawl all pages except the patterns below in Nutch 1.15.

Blocking patterns: *modal(.*).html*, *exit.html?* and *exit.html/?*

Sample pages
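
For anyone who wants to check the pattern suggested in the replies (`-^.+(?:modal|exit).*\.html`) without an online tester, here is a minimal sketch in Python. It assumes the leading `-` is only the exclusion marker of the filter rule and not part of the regex itself, and it uses `re.match`, which anchors at the start of the string without requiring a full match, as a stand-in for Java's `lookingAt`. The sample URLs and the `is_blocked` helper are hypothetical and made up for illustration; they are not the sample pages from the original message.

```python
import re

# Regex from the thread, without the leading "-" (assumed here to be the
# exclusion marker of the filter rule rather than part of the expression).
BLOCK = re.compile(r"^.+(?:modal|exit).*\.html")


def is_blocked(url: str) -> bool:
    """Return True if the URL matches the blocking pattern from its start."""
    # re.match tries to match at the beginning of the string only and does
    # not need to consume the whole string, similar to Matcher.lookingAt().
    return BLOCK.match(url) is not None


# Hypothetical URLs, for illustration only.
samples = [
    "https://www.abc.com/content/modal-login.html",
    "https://www.abc.com/exit.html?url=https://example.org",
    "https://www.abc.com/exit.html/?url=https://example.org",
    "https://www.abc.com/products/index.html",
]

for url in samples:
    print(url, "->", "blocked" if is_blocked(url) else "kept")
```

Note that the pattern is fairly broad: any URL that contains `modal` or `exit` somewhere before a later `.html` is matched, so a path such as `/exit-strategy/guide.html` would be blocked as well. If that is too aggressive for your site, the alternation could be tightened.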