Hi Sebastian,
Thanks for the update. After a lot of trial and error, here is the regex
pattern that blocks the URLs for my use case:
-.*(modal[-_a-zA-Z0-9]*[\.]html|exit.html[\/]?\??.*|model[-_a-zA-Z0-9]*[\.]html|exitpage.*|exitPage.*)
Another pattern was causing everything to be blocked; I have corrected it.
Thanks,
Also, check the last rule in your regex file:
# accept anything else
+.
If you have changed it to a negative rule (-.) by mistake, every URL will be discarded.
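As a rough illustration of why that last rule matters, here is a minimal Python sketch of first-match-wins filtering: rules are tried top to bottom, the first rule that matches decides, and a URL that matches no rule is dropped. This is not Nutch code. The accepts helper and the sample URLs are made up for illustration, and Python's re.search is only a stand-in for the Java regex matching that Nutch performs.

```python
import re

# The reject rule from the message, followed by the catch-all accept rule.
REJECT_RULE = (
    r"-.*(modal[-_a-zA-Z0-9]*[\.]html|exit.html[\/]?\??.*"
    r"|model[-_a-zA-Z0-9]*[\.]html|exitpage.*|exitPage.*)"
)
GOOD_RULES = [REJECT_RULE, "+."]
# Same file, but with the last rule turned negative by mistake.
BAD_RULES = [REJECT_RULE, "-."]


def accepts(url, rules):
    """Hypothetical helper: return True if the first matching rule is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    # No rule matched: the URL is dropped.
    return False


# Hypothetical sample URLs, not taken from the thread.
SAMPLES = [
    "https://www.abc.com/",
    "https://www.abc.com/modal-login.html",
    "https://www.abc.com/exitpage?x=1",
]

if __name__ == "__main__":
    for url in SAMPLES:
        print(url, accepts(url, GOOD_RULES), accepts(url, BAD_RULES))
```

With the catch-all "+." in place only the modal and exit pages are rejected; with "-." as the last rule, every sample URL is rejected, including the seed.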
Best,
Govind
On Fri, Oct 5, 2018 at 1:02 PM Sebastian Nagel
wrote:
Hi Amarnath,
the only possibility is that https://www.abc.com/ is skipped
- by another rule in regex-urlfilter.txt
- or another URL filter plugin
Please check your configuration carefully. You may also use the tool
bin/nutch filterchecker
to test the filters beforehand: every active filter
Hi Markus,
Thanks a lot for the quick update, but I applied the same rule and the seed
URL is completely rejected, so there are no more URLs to inject.
I have applied the same regex: -^.+(?:modal|exit).*\.html
seed.txt: https://www.abc.com/
The regex seems fine, but it's not working with the Nutch 1.15 regex block...any
Hi Amarnatha,
-^.+(?:modal|exit).*\.html
Will work for all examples given.
You can test regexes really well online [1]. If each input shows true for
lookingAt, Nutch's regex filter will filter the URLs.
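For a quick local check along the same lines, here is a small Python sketch. It assumes that Python's re.match, which anchors at the start of the input but not at the end, is a fair stand-in for Java's Matcher.lookingAt() on a pattern this simple. The looking_at helper and all URLs except the seed are hypothetical, since the original example URLs are not repeated here.

```python
import re

# The pattern from the rule above, without the leading '-' that marks it
# as a reject rule in regex-urlfilter.txt.
PATTERN = re.compile(r"^.+(?:modal|exit).*\.html")


def looking_at(url):
    """Hypothetical helper: True if the pattern matches a prefix of the URL."""
    return PATTERN.match(url) is not None


# The seed from the thread plus made-up page URLs for illustration.
URLS = [
    "https://www.abc.com/",
    "https://www.abc.com/modal-login.html",
    "https://www.abc.com/exit.html?page=2",
    "https://www.abc.com/products.html",
]

if __name__ == "__main__":
    for url in URLS:
        print(looking_at(url), url)
```

Under these assumptions the seed https://www.abc.com/ does not match the pattern, so this rule alone should not reject it, which fits the suggestion that another rule or filter plugin is responsible.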
Regards,
Markus
[1] https://www.regexplanet.com/advanced/java/index.html