Re: Nutch excludeNodes Patch

2019-10-10 Thread Dave Beckstrom
and xhtml, Nutch should use HtmlParser instead.
>
> Regards,
> Markus
>
> -Original message-
> > From:Dave Beckstrom
> > Sent: Wednesday 9th October 2019 22:10
> > To: user@nutch.apache.org
> > Subject: Nutch excludeNodes Patch

RE: Nutch excludeNodes Patch

2019-10-09 Thread Markus Jelsma
HtmlParser instead.

Regards,
Markus

-Original message-
> From:Dave Beckstrom
> Sent: Wednesday 9th October 2019 22:10
> To: user@nutch.apache.org
> Subject: Nutch excludeNodes Patch
>
> Hi Everyone!
>
> We are running Nutch 1.15.
>
> We are

Nutch excludeNodes Patch

2019-10-09 Thread Dave Beckstrom
Hi Everyone!

We are running Nutch 1.15.

We are trying to implement the nutch-585-excludeNodes.patch described on: https://issues.apache.org/jira/browse/NUTCH-585

It's acting like it's not running. We don't get an error when the crawl runs, no errors in the hadoop logs, it just doesn't exclude