[SECURITY] Nutch 2.3.1 affected by downstream dependency CVE-2016-6809

2019-10-14 Thread Lewis John McGibbney
Title: Nutch 2.3.1 affected by downstream dependency CVE-2016-6809
Vulnerable Versions: 2.3.1 (1.16 is not vulnerable)
Disclosure date: 2018-10-22
Credit: Pierre Ernst, Salesforce
Summary: Remote Code Execution in Apache Nutch 2.3.1 when crawling a web site containing malicious content

RE: [ANNOUNCE] Apache Nutch 1.16 Release

2019-10-14 Thread Markus Jelsma
Thanks Sebastian!
-Original message-
> From: Sebastian Nagel
> Sent: Friday 11th October 2019 17:03
> To: user@nutch.apache.org
> Cc: d...@nutch.apache.org; annou...@apache.org
> Subject: [ANNOUNCE] Apache Nutch 1.16 Release
>
> Hi folks!
>
> The Apache Nutch [0] Project Management

Re: metatags missing with parse-html

2019-10-14 Thread Sebastian Nagel
Hi Dave, could you share an example document? Which Nutch version is used?
I tried to reproduce the problem without success using Nutch v1.16:
- example document: Test metatags test for metatag extraction
- using parse-html (works)
> bin/nutch indexchecker -Dmetatags.names='*' \

Unable to index on Hadoop 3.2.0 with 1.16

2019-10-14 Thread Markus Jelsma
Hello, We're upgrading our stuff to 1.16 and got a peculiar problem when we started indexing:
2019-10-14 13:50:30,586 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: text width is less than 1, was <-41> at

Re: Unable to index on Hadoop 3.2.0 with 1.16

2019-10-14 Thread Sebastian Nagel
Hi Markus, I've tested in pseudo-distributed mode with Hadoop 3.2.1, including indexing into Solr. It worked. Could be a dependency version issue similar to that causing NUTCH-2706. But that's only an assumption. Since the IndexWriters.describe() is for help only, I would just deactivate this