Is there any way to find out which URLs were filtered?

Also, the line -^.{513,}$ was added because the update job was failing consistently with the MongoDB exception "key too large to index".

Thanks and Regards,
Shubham Gupta

On Wednesday 05 October 2016 01:50 PM, Sachin Shaju wrote:
For the time being you can comment out this line -^.{513,}$ and check.

Regards,
Sachin Shaju

[email protected]
+919539887554

On Wed, Oct 5, 2016 at 11:41 AM, shubham.gupta <[email protected]> wrote:

My current regex-urlfilter properties are as follows:

# skip file: ftp: and mailto: urls
#-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
# for a more extensive coverage use the urlfilter-suffix plugin
#-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|pdf|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$

# skip URLs containing certain characters as probable queries, etc.
#-[?*!@=]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
#-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
-^(http://up.anv.bz)
+.

# skip URLs longer than 512 characters
-^.{513,}$
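
One way to see which rule decides the fate of a given URL is to replay the rules outside Nutch. The following is only a minimal Python sketch, not Nutch's own code: it assumes the filter applies the first matching rule in file order (as the header of the stock regex-urlfilter.txt describes), that patterns match anywhere in the URL, and that a URL matching no rule is dropped. The names `parse_rules` and `explain` are hypothetical helpers, and Python's `re` is used as a stand-in for Java regex, which may differ on more exotic patterns.

```python
import re

# Active (uncommented) rules from the listing above, in file order.
RULES_TEXT = r"""
# accept anything else
-^(http://up.anv.bz)
+.

# skip URLs longer than 512 characters
-^.{513,}$
"""


def parse_rules(text):
    """Return a list of (sign, compiled_regex, source_line) for each rule line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign, re.compile(pattern), line))
    return rules


def explain(url, rules):
    """Return (accepted, deciding_rule); deciding_rule is None if no rule matched."""
    for sign, regex, source in rules:
        # Assumption: partial match anywhere in the URL, first match wins.
        if regex.search(url):
            return sign == "+", source
    return False, None  # assumption: unmatched URLs are dropped


RULES = parse_rules(RULES_TEXT)

if __name__ == "__main__":
    samples = ["http://up.anv.bz/page", "http://example.com/" + "a" * 600]
    for url in samples:
        accepted, rule = explain(url, RULES)
        print("ACCEPT" if accepted else "REJECT", rule, url[:60])
```

Under these assumptions, the 600-character sample URL is accepted by `+.` before the length rule is ever reached, so it may be worth checking whether -^.{513,}$ needs to sit above `+.` to have any effect.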

Thanks and Regards,
Shubham Gupta

On Wednesday 05 October 2016 11:29 AM, Sachin Shaju wrote:

My regex-urlfilter properties are as follows:
# skip file: ftp: and mailto: urls
-^(file|ftp|mailto):

# skip image and other suffixes we can't yet parse
# for a more extensive coverage use the urlfilter-suffix plugin
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|pdf|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$

# skip URLs containing certain characters as probable queries, etc.
#-[?*!@=]

# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/

# accept anything else
-^(http://up.anv.bz)
+.

# skip URLs longer than 512 characters
-^.{513,}$

