Hello Sachin,
Once a URL is rejected by any plugin, it is rejected entirely, so only URLs that
pass all enabled filter plugins are kept.
If you want specific queries to pass the regex-urlfilter, you must let them pass
explicitly above the -[?*!@=] line (the first matching rule wins), e.g.
+passThisQuery=
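The Java sketch below makes both rules concrete. It is a minimal illustration, not Nutch's actual code. All class and method names (UrlFilterSketch, RegexRules, filterAll, exampleChain), the example.com URLs and the second ".pdf" plugin are hypothetical. The sketch assumes the regex-urlfilter semantics described here:

- The first rule whose regex is found in the URL decides the outcome. A + rule keeps the URL and a - rule rejects it.
- A URL that matches no rule is rejected.
- A URL is only kept if every plugin in the chain keeps it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Minimal sketch (hypothetical names, not Nutch's own classes) of how
 * regex-urlfilter style rules and a chain of URL filter plugins decide a URL.
 */
public class UrlFilterSketch {

    /** A filter returns the URL to keep it, or null to reject it. */
    interface UrlFilter {
        String filter(String url);
    }

    /** One rule line: '+' accepts, '-' rejects, the rest is a regex. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Regex-style filter: the first rule whose pattern is found in the URL decides. */
    static final class RegexRules implements UrlFilter {
        private final List<Rule> rules = new ArrayList<>();

        RegexRules(List<String> lines) {
            for (String raw : lines) {
                String line = raw.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                char sign = line.charAt(0);
                if (sign != '+' && sign != '-') {
                    throw new IllegalArgumentException("Rule must start with + or -: " + line);
                }
                rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
            }
        }

        @Override
        public String filter(String url) {
            for (Rule rule : rules) {
                if (rule.pattern.matcher(url).find()) {
                    return rule.accept ? url : null; // first match wins
                }
            }
            return null; // no rule matched: reject
        }
    }

    /** Runs every filter in turn; a single rejection rejects the URL. */
    static String filterAll(List<UrlFilter> filters, String url) {
        String current = url;
        for (UrlFilter f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    /** Example chain: regex rules plus a hypothetical second plugin. */
    static List<UrlFilter> exampleChain() {
        UrlFilter regex = new RegexRules(List.of(
                "+passThisQuery=", // allow rule, placed before the default rule
                "-[?*!@=]",        // default "probable query" rule
                "+."));            // accept anything else
        UrlFilter noPdf = url -> url.endsWith(".pdf") ? null : url; // hypothetical plugin
        return List.of(regex, noPdf);
    }

    public static void main(String[] args) {
        List<UrlFilter> chain = exampleChain();
        String[] urls = {
            "https://example.com/page?passThisQuery=1",
            "https://example.com/page?other=1",
            "https://example.com/doc.pdf",
            "https://example.com/page"
        };
        for (String url : urls) {
            // '+' = kept by every filter, '-' = rejected by at least one
            System.out.println((filterAll(chain, url) != null ? "+" : "-") + url);
        }
    }
}
```

Running main prints:

```
+https://example.com/page?passThisQuery=1
-https://example.com/page?other=1
-https://example.com/doc.pdf
+https://example.com/page
```

The .pdf URL passes the regex rules but is dropped by the second plugin. If +passThisQuery= is moved below -[?*!@=], the first URL is rejected as well, because the - rule then matches first.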
Use bin/nutch filterchecker -stdIn for quick testing.
Regards,
Markus
-Original message-
> From:Sachin Mittal
> Sent: Monday 21st October 2019 14:22
> To: user@nutch.apache.org
> Subject: Adding specific query parameters to nutch url filters
>
> Hi,
> I have checked the regex-urlfilter and by default I see this line:
>
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
>
> In my case, for a particular URL I want to crawl a specific query, so I
> wanted to know which file would be best to change to enable this.
>
> Would it be regex-urlfilter? I also see the filter files suffix-urlfilter
> and fast-urlfilter.
>
> Would adding filters to either of the latter two files help?
> Any idea why these filters exist, i.e. what their potential use case
> is?
>
> Also, if I enable multiple filter plugins backed by these files, how does
> URL filtering work? Are only those URLs that pass all the plugins selected
> for fetching, or those that pass any one of them?
>
> Thanks
> Sachin
>