Re: Regex to block some patterns

2018-10-05 Thread Amarnatha Reddy
Hi Sebastian,

Thanks for the update. After spending a long time on it, here is the regex pattern that blocks my use case:

-.*(modal[-_a-zA-Z0-9]*[\.]html|exit.html[\/]?\??.*|model[-_a-zA-Z0-9]*[\.]html|exitpage.*|exitPage.*)

Some other pattern was causing everything to be blocked; I have rectified it.

Thanks,
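For anyone who wants to sanity-check a rule like this outside Nutch, here is a minimal Java sketch. It assumes the asterisks around the rule in the archived message are bold markers and not part of the regex, and that the filter applies the pattern with Matcher.lookingAt as described elsewhere in this thread. The class name FinalRuleCheck and all URLs except the seed https://www.abc.com/ are made up for illustration.

```java
import java.util.regex.Pattern;

class FinalRuleCheck {
    // Rule body from the message, without the leading "-" (reject marker).
    // Assumption: the surrounding asterisks in the archive are formatting only.
    static final Pattern RULE = Pattern.compile(
        ".*(modal[-_a-zA-Z0-9]*[\\.]html|exit.html[\\/]?\\??.*"
        + "|model[-_a-zA-Z0-9]*[\\.]html|exitpage.*|exitPage.*)");

    // True when the rule matches a prefix of the URL, i.e. the URL would be blocked.
    static boolean isBlocked(String url) {
        return RULE.matcher(url).lookingAt();
    }

    public static void main(String[] args) {
        // Only the first URL (the seed) comes from the thread; the rest are hypothetical.
        String[] urls = {
            "https://www.abc.com/",
            "https://www.abc.com/about.html",
            "https://www.abc.com/modal_signup.html",
            "https://www.abc.com/exitPage"
        };
        for (String url : urls) {
            System.out.println(isBlocked(url) + "  " + url);
        }
    }
}
```

Under these assumptions the seed and about.html are not blocked, while the modal and exitPage URLs are.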

Re: Regex to block some patterns

2018-10-05 Thread govind nitk
Also, check the last line of your regex rules:

# accept anything else
+.

If you have mistakenly made it negative (-.), everything will be discarded.

Best,
Govind

On Fri, Oct 5, 2018 at 1:02 PM Sebastian Nagel wrote:
> Hi Amarnath,
>
> the only possibility is that https://www.abc.com/ is skipped
> - by
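To illustrate why the sign on the catch-all line matters, here is a simplified Java model of a first-match rule list. It is a sketch, not Nutch's code: it assumes rules are tried top to bottom, the first matching rule decides, and a URL matching no rule is dropped. FirstMatchFilter and accepts are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class FirstMatchFilter {
    // Returns true if the URL is accepted by the rule list.
    // Each rule line is a sign ('+' accept, '-' reject) followed by a regex.
    static boolean accepts(List<String> ruleLines, String url) {
        for (String line : ruleLines) {
            boolean accept = line.charAt(0) == '+';
            Pattern pattern = Pattern.compile(line.substring(1));
            if (pattern.matcher(url).lookingAt()) {
                return accept; // assumption: the first matching rule decides
            }
        }
        return false; // assumption: a URL that matches no rule is dropped
    }

    public static void main(String[] args) {
        String seed = "https://www.abc.com/";
        List<String> good = Arrays.asList("-^.+(?:modal|exit).*\\.html", "+.");
        List<String> bad = Arrays.asList("-^.+(?:modal|exit).*\\.html", "-.");
        System.out.println("catch-all +. : " + accepts(good, seed));
        System.out.println("catch-all -. : " + accepts(bad, seed));
    }
}
```

With "+." as the last rule the seed is accepted; with "-." every non-empty URL hits a reject rule, so nothing is left to inject.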

Re: Regex to block some patterns

2018-10-05 Thread Sebastian Nagel
Hi Amarnath,

the only possibility is that https://www.abc.com/ is skipped
- by another rule in regex-urlfilter.txt
- or another URL filter plugin

Please check your configuration carefully. You may also use the tool bin/nutch filterchecker to test the filters beforehand: every active filter

Re: Regex to block some patterns

2018-10-03 Thread Amarnatha Reddy
Hi Markus,

Thanks a lot for the quick update. I applied the same rule, but now everything is rejected and there are no URLs left to inject.

I applied the same regex:
-^.+(?:modal|exit).*\.html

seed.txt:
https://www.abc.com/

The regex seems fine, but it's not working with the Nutch 1.15 regex block...any

RE: Regex to block some patterns

2018-10-03 Thread Markus Jelsma
Hi Amarnatha,

-^.+(?:modal|exit).*\.html

will work for all examples given. You can test regexes really well online [1]. If each input has true for lookingAt, Nutch's regexfilter will filter the URLs.

Regards,
Markus

[1] https://www.regexplanet.com/advanced/java/index.html

-Original
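The same lookingAt check can be done locally with java.util.regex. This is a minimal sketch under the assumption that the filter behaves as described above. LookingAtCheck is a made-up class name, and apart from the seed https://www.abc.com/ the URLs are hypothetical stand-ins, since the original examples are not part of this listing.

```java
import java.util.regex.Pattern;

class LookingAtCheck {
    // Rule body without the leading "-" that marks it as a reject rule.
    static final Pattern RULE = Pattern.compile("^.+(?:modal|exit).*\\.html");

    // True when the pattern matches a prefix of the URL (Matcher.lookingAt).
    static boolean isRejected(String url) {
        return RULE.matcher(url).lookingAt();
    }

    public static void main(String[] args) {
        // Hypothetical test URLs, except for the seed in the first entry.
        String[] urls = {
            "https://www.abc.com/",
            "https://www.abc.com/index.html",
            "https://www.abc.com/modal-login.html",
            "https://www.abc.com/exit.html?url=x"
        };
        for (String url : urls) {
            System.out.println(isRejected(url) + "  " + url);
        }
    }
}
```

The first two URLs give false and the last two give true, so this rule on its own would not reject the seed URL.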