Hello Sachin,

Once a URL is rejected by any one of the URL filter plugins, it is rejected entirely. A URL is only kept if every active filter accepts it.
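Here is a minimal Python sketch of that chaining behaviour. It assumes each filter is modelled as a function that returns the URL to keep it, or None to reject it, which mirrors how a Nutch URLFilter returns the URL or null. The two filter functions are hypothetical stand-ins, not Nutch code:

```python
from typing import Callable, Iterable, Optional

# A filter returns the (possibly rewritten) URL to accept it, or None to reject it.
UrlFilter = Callable[[str], Optional[str]]


def apply_filters(url: str, filters: Iterable[UrlFilter]) -> Optional[str]:
    """Run url through every filter in order; the first rejection wins."""
    for f in filters:
        result = f(url)
        if result is None:
            return None  # rejected here, later filters are never consulted
        url = result  # hand this filter's output to the next filter
    return url


# Hypothetical stand-ins for two filter plugins.
def suffix_filter(url: str) -> Optional[str]:
    return None if url.endswith((".jpg", ".gif")) else url


def query_filter(url: str) -> Optional[str]:
    return None if "?" in url else url


chain = [suffix_filter, query_filter]
print(apply_filters("https://example.org/page.html", chain))  # kept
print(apply_filters("https://example.org/page?id=1", chain))  # None, query_filter rejects
print(apply_filters("https://example.org/logo.jpg", chain))   # None, suffix_filter rejects
```

So relaxing one filter file is not enough to let a query URL through if another active filter still rejects it.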
If you want specific queries to pass the regex-urlfilter, you must let them pass explicitly above the -[?*!@=] line, e.g.:

+passThisQuery=

Use bin/nutch filterchecker -stdIn for quick testing.

Regards,
Markus

-----Original message-----
> From: Sachin Mittal <sjmit...@gmail.com>
> Sent: Monday 21st October 2019 14:22
> To: user@nutch.apache.org
> Subject: Adding specfic query parameters to nutch url filters
>
> Hi,
> I have checked the regex-urlfilter and by default I see this line:
>
> # skip URLs containing certain characters as probable queries, etc.
> -[?*!@=]
>
> In my case, for a particular URL I want to crawl a specific query, so I wanted
> to know which file would be the best one to change to enable this.
>
> Would it be regex-urlfilter? I also see the filter files suffix-urlfilter
> and fast-urlfilter.
>
> Would adding filters in either of the latter two files help?
> Any idea why these filters are added, i.e. what would be the potential
> use case?
>
> Also, if I add multiple filter plugins backed by these files, how does
> URL filtering work? Are only the URLs that pass all the plugins selected
> to be fetched, or those that pass any one of them?
>
> Thanks
> Sachin
>
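The accept rule has to sit above -[?*!@=] because regex-urlfilter checks its rules top to bottom and the first matching rule decides: a "+" rule accepts the URL, a "-" rule rejects it, and a URL matching no rule is rejected. Below is a minimal Python sketch of that first-match logic. It assumes rules are written as the same +pattern / -pattern lines used in regex-urlfilter.txt and that a pattern may match anywhere in the URL (modelled with re.search). The example URLs and the catch-all "+." rule are illustrative only:

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[bool, "re.Pattern[str]"]  # (accept?, compiled pattern)


def parse_rules(lines: List[str]) -> List[Rule]:
    """Turn '+regex' / '-regex' lines into (accept, pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def regex_filter(url: str, rules: List[Rule]) -> Optional[str]:
    """Return url if the first matching rule is '+', otherwise None."""
    for accept, pattern in rules:
        if pattern.search(url):  # pattern may match anywhere in the URL
            return url if accept else None
    return None  # no rule matched: reject


rules = parse_rules([
    "# let this particular query through before the generic query rule",
    "+passThisQuery=",
    "# skip URLs containing certain characters as probable queries, etc.",
    "-[?*!@=]",
    "# accept anything else",
    "+.",
])

print(regex_filter("https://example.org/search?passThisQuery=nutch", rules))  # kept
print(regex_filter("https://example.org/search?other=1", rules))  # None
print(regex_filter("https://example.org/about", rules))  # kept by "+."
```

If the first two rules are swapped, the passThisQuery URL is rejected as well, because -[?*!@=] matches its "?" before the accept rule is ever reached.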