Have you already tried switching off the regexp in crawl-urlfilter.txt?

If you use
bin/nutch crawl...
for crawling, then crawl-urlfilter.txt is the file that must be changed.

Compare the other lines, too; see "# skip everything else" and "# accept anything else".
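For example, the stock filter file ships with rules along these lines (the exact patterns may differ between Nutch versions, so treat this as an illustration rather than your exact file): the character-class rule drops any URL containing ?, *, !, @, or =, which would explain query-style links disappearing. Commenting that rule out, and making sure the final rule accepts rather than skips everything else, lets such URLs through:

```text
# skip URLs containing certain characters as probable queries, etc.
# (comment this out to keep links with ? and =)
# -[?*!@=]

# accept anything else
+.
```

Remember that bin/nutch crawl reads crawl-urlfilter.txt, so editing only regex-urlfilter.txt has no effect in that mode.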

On 31.08.2010 at 10:32, jitendra rajput wrote:
Hi,

I am trying to write an XpathBasedLinkExtractor which extracts links from an
HTML page using XPaths.
But all of the extracted links that contain characters like [? , = ] are
being filtered out, and I am not able to nail down where this is happening.
They are not going into the segments.
I have also commented out the regular expression -[...@=] in
regex-urlfilter.txt, but it still shows the same behaviour.

Can anyone give me an idea about this? Where am I going wrong? I have been
stuck on this since yesterday.

Any help would be highly appreciated.

Thanks
Jitendra
