[nutch] branch branch-1.16 updated: Nutch 1.16 release - update year for API docs - add link to release notes

2019-10-02 Thread snagel
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch branch-1.16
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/branch-1.16 by this push:
 new c9278f6  Nutch 1.16 release - update year for API docs - add link to release notes
c9278f6 is described below

commit c9278f651d90ad04e280581141813b36d6a0740b
Author: Sebastian Nagel 
AuthorDate: Wed Oct 2 12:41:01 2019 +0200

Nutch 1.16 release
- update year for API docs
- add link to release notes
---
 CHANGES.txt        | 3 ++-
 default.properties | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 2c18e38..82e71f8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,6 +1,7 @@
 # Nutch Change Log
 
-Nutch 1.16 Release (01/10/2019)
+Nutch 1.16 Release 02/10/2019 (dd/mm/yyyy)
+Release Report: https://s.apache.org/l2j94
 
 Comments
 
diff --git a/default.properties b/default.properties
index 298c6fd..a4f8209 100644
--- a/default.properties
+++ b/default.properties
@@ -16,7 +16,7 @@
 name=apache-nutch
 version=1.16
 final.name=${name}-${version}
-year=2018
+year=2019
 
 basedir = ./
 src.dir = ./src/java



[nutch] annotated tag release-1.16 updated (c9278f6 -> 6f15fba)

2019-10-02 Thread snagel
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to annotated tag release-1.16
in repository https://gitbox.apache.org/repos/asf/nutch.git.


*** WARNING: tag release-1.16 was modified! ***

from c9278f6  (commit)
  to 6f15fba  (tag)
 tagging c9278f651d90ad04e280581141813b36d6a0740b (commit)
 replaces release-1.13
  by Sebastian Nagel
  on Wed Oct 2 12:43:21 2019 +0200

- Log -
Apache Nutch 1.16 RC#1
---


No new revisions were added by this update.

Summary of changes:



svn commit: r36162 [3/3] - /dev/nutch/1.16/

2019-10-02 Thread snagel
Propchange: dev/nutch/1.16/CHANGES.txt
--
svn:eol-style = native

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz
==
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz
--
svn:mime-type = application/x-gzip

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc
==
--- dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc Wed Oct  2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvTMACgkQxm6nt9sK
+nG1TiAgAtz2BdIb00tCcn11TdHlu9cs31gjxOIK3OShVePMadlby9lSXNuLPUJFU
+rQPU9ZQkFlmPVcyB6HCuoj2xZ/THDWiYtjqPqzCrlzw0TQ6R4ZOGxlK1OpuMEeir
+mSTaphZq4reYZn4gIiKuetaf9x89a5EgbdEhFkP+K2+hIafjIqoUnKvmdD43VGrz
+j+CkEVFYBKuDXJSUUmMj2UTSG7arPpRbDhJPi28vkD3vmCuOXpWDUK9W1rjpDjkv
+w3EdbEqEqsIU1qtdIO0uL80/IvxBnJgu6r8HkAxcm8JO/ERXOxWi7uegL3PTSblD
+KgeqrMr/qUo19yQFcPUOybTYvxlDtw==
+=BBSh
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512
==
--- dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512 Wed Oct  2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-bin.tar.gz)= 487d73f03fea161d823fa5f425102a2e11f0fdb53b6d76c3787dbc42b4a1b0e51673ab4810aafad1e1321364ec3a1f83742bd904d1b220c6919297a3a0b0b053

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip
==
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-bin.zip
--
svn:mime-type = application/octet-stream

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc
==
--- dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc Wed Oct  2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvUIACgkQxm6nt9sK
+nG3zoQgAjXAUuwBFjoWYSE+7uzxvYw9BkSueixxmHt6MiYLkN1Je3slvXWfv8oZe
+fAr9OMGsRsy8DQrIxJ3y0SV4+rBHoRoa6A4gOSBfCL2QuWtb180ilDNIGHHWR0GY
+qXLO32WrKn8J8cCxWTiBfyhLj6syo/tYrolg+QLdr7XnNGbzdqRhGPI8KBzzDRV4
+VZkxwnphLc760+BakrwD+SiGPWZeXbACH6tAbkiUWANDMSCtvISXFXSo6jN3aHCM
+T3dJR8ZgnileUF5+VhhfovWzFiN1NgyzMAKvI9eKnAYKw7Wb+re9zI4SLtveWAwD
+WchN/76h0EF3/fwY2i96hxvoEeCAgQ==
+=HAvA
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512
==
--- dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512 Wed Oct  2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-bin.zip)= 8836d465b537d538acbce73ae34a848c75d366e4b5574bce2ed6a080b358436e67bd8b92978f8c3ea4ea922aef88301d71d417b0c4ac8f6ade5373c1966bfc86

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz
==
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz
--
svn:mime-type = application/x-gzip

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc
==
--- dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc Wed Oct  2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvUMACgkQxm6nt9sK
+nG169QgArTzLvR/x0UhnLGqP6Bmx2Cm+sTn/9ZNLTfw7GRT4nb4/ZulHuFT5oifu
+Dj+pygQ13N/XCOUYdZzV7EtmC4gkB+ngP2wPM+RsCQYM3NnnrqlbE8cAMlxlMJmc
+ejKRGNg5kuw7/jhUQVh/Is6qCib5m7jtoG7hwL5UJ6bMg1+Yd2ObB3QwPGugXfej
+x/PriaFkvpRjpCjLUwZ1/WcnRqWvRTyHPSTfaO/CHYfWhl8F2SJy+0OfwEcjPJi/
+jxlZwNq81D9/O6WYIfSIDvVKoHKvfH4kh2is+yTOvq7Npz0ua0PMfXFMZYz7d/1l
+8vA/3pBehS4iWopqVSw8vzytMTXI6g==
+=/CT2
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512
==
--- dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512 Wed Oct  2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-src.tar.gz)= dc33eedd7b00bd8dcebff60bd97178cada8b76fa435044e405462b3887b9c9c7d9ea550df4f97bd18d29372a79abb98f7e9e3882be95e98c8d9591d21583fe8e

Added: dev/nutch/1.16/apache-nutch-1.16-src.zip

svn commit: r36162 [2/3] - /dev/nutch/1.16/

2019-10-02 Thread snagel


Added: dev/nutch/1.16/CHANGES.txt
==
--- dev/nutch/1.16/CHANGES.txt (added)
+++ dev/nutch/1.16/CHANGES.txt Wed Oct  2 15:17:14 2019
@@ -0,0 +1,3032 @@
+# Nutch Change Log
+
+Nutch 1.16 Release 02/10/2019 (dd/mm/yyyy)
+Release Report: https://s.apache.org/l2j94
+
+Comments
+
+-  schema.xml has been moved to indexer-solr plugin directory. This file is provided as a
+   reference/guide for Solr users (NUTCH-2654)
+
+Breaking Changes
+
+-  The value of crawl.gen.delay is now read in milliseconds as stated in the description
+   in nutch-default.xml. Previously, the value has been read in days, see NUTCH-1842 for
+   further information.
+
+-  HostDB entries have been moved from Integer to Long in order to accomodate very large
+   hosts. Remove your existing HostDB and recreate it with bin/nutch updatehostdb, see
+   NUTCH-2694 for additional information.
+
+-  The signature class TextProfileSignature has been improved to be stable over
+   consecutive runs by sorting tokens by frequency first and secondarily in lexicographic
+   order.  If an existing CrawlDb contains signatures generated by TextProfileSignature
+   these are likely to change when upgrading to Nutch 1.16.  The previous behavior relying
+   on a semi-stable pseudo-random hash sorting could be restored setting the property
+   `db.signature.text_profile.sec_sort_lex` to `false`. See also NUTCH-2381.
+
+Bug
+
+[NUTCH-1063] - OutlinkExtractor test generates an exception but does not fail
+[NUTCH-1842] - crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly
+[NUTCH-2279] - LinkRank fails when using Hadoop MR output compression
+[NUTCH-2381] - In some situations the class TextProfileSignature gives different signatures for the same text "profile" page.
+[NUTCH-2387] - Nutch should not index document with "noindex" meta
+[NUTCH-2457] - Embedded documents likely not correctly parsed by Tika
+[NUTCH-2475] - If and else-if branches has the same condition
+[NUTCH-2482] - index-geoip not to add null values to document fields
+[NUTCH-2585] - NPE in TrieStringMatcher
+[NUTCH-2598] - URLNormalizerChecker fails on invalid URLs in input
+[NUTCH-2606] - MIME detection is wrong for plain-text documents send as Content-Type "application/msword"
+[NUTCH-2635] - Generator writes unneeded temporary output
+[NUTCH-2639] - bin/nutch fails to set native library path on Cygwin causing jobs to fail with UnsatisfiedLinkError
+[NUTCH-2641] - ClassCastException in webui
+[NUTCH-2642] - MoreIndexingFilter parses ISO 8601 UTC dates in local time zone
+[NUTCH-2643] - ant target "resolve-default" to depend on "init"
+[NUTCH-2644] - CrawlDbReader -dump ignores filter options
+[NUTCH-2645] - Webgraph tools ignore command-line options
+[NUTCH-2650] - -addBinaryContent -base64 flags are causing "String length must be a multiple of four" error in IndexingJob
+[NUTCH-2652] - Fetcher launches more fetch tasks than fetch lists
+[NUTCH-2655] - Update Solr schema.xml for Solr 7.x
+[NUTCH-2656] - Update description to configure Solr 7.x in tutorial
+[NUTCH-2673] - EOFException protocol-http
+[NUTCH-2674] - HostDb: dump shows wrong column headers
+[NUTCH-2680] - Documentation: https supported by multiple protocol plugins not only httpclient
+[NUTCH-2687] - Regex for reading title from Content-Disposition is wrong
+[NUTCH-2694] - HostDB to aggregate by long instead of integer
+[NUTCH-2696] - Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
+[NUTCH-2699] - Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered
+[NUTCH-2703] - parse-tika: Boilerpipe should not run for non-(X)HTML pages
+[NUTCH-2706] - -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
+[NUTCH-2715] - WARCExporter fails on large records
+[NUTCH-2716] - protocol-http: Response headers are not stored for a compressed response
+[NUTCH-2717] - Generator cannot open hostDB
+[NUTCH-2722] - Fetch dependencies via https
+[NUTCH-2723] - Indexer Solr not to decode URLs before deletion
+[NUTCH-2724] - Metadata indexer not to emit empty values
+[NUTCH-2729] - protocol-okhttp: fix marking of truncated content
+[NUTCH-2731] - Solr Cleanup Step Fails when Authentication is Required
+[NUTCH-2738] - Generator: document property generate.restrict.status
+[NUTCH-2740] - Generator: generate.max.count overflow not logged
+
+New Feature
+
+[NUTCH-2676] - Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
+
+Improvement
+
+[NUTCH-1014] - Migrate from Apache ORO to java.util.regex
+[NUTCH-1021] - Migrate 

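Editorial note on the crawl.gen.delay breaking change listed in the CHANGES.txt above: since the value is now read in milliseconds rather than days, a delay that was chosen with the old days-based reading in mind has to be multiplied out before upgrading. The following is only a minimal arithmetic sketch. The helper name days_to_millis is hypothetical and not part of Nutch, and whether a given existing value was meant as days or as milliseconds depends on the site configuration (see NUTCH-1842).

```python
# Hypothetical helper, not part of Nutch: converts a delay given in days
# into the millisecond value that crawl.gen.delay expects as of Nutch 1.16.
MILLIS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def days_to_millis(days):
    """Return the number of milliseconds in `days` (fractions allowed)."""
    if days < 0:
        raise ValueError("delay must be non-negative")
    return int(days * MILLIS_PER_DAY)


if __name__ == "__main__":
    # A seven-day delay expressed in milliseconds.
    print(days_to_millis(7))
```

The resulting number would then be used as the value of crawl.gen.delay in the site configuration.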
svn commit: r36162 [1/3] - /dev/nutch/1.16/

2019-10-02 Thread snagel
Author: snagel
Date: Wed Oct  2 15:17:14 2019
New Revision: 36162

Log:
Apache Nutch 1.16 RC#1

Added:
dev/nutch/1.16/
dev/nutch/1.16/CHANGES.txt   (with props)
dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz   (with props)
dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc
dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512
dev/nutch/1.16/apache-nutch-1.16-bin.zip   (with props)
dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc
dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512
dev/nutch/1.16/apache-nutch-1.16-src.tar.gz   (with props)
dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc
dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512
dev/nutch/1.16/apache-nutch-1.16-src.zip   (with props)
dev/nutch/1.16/apache-nutch-1.16-src.zip.asc
dev/nutch/1.16/apache-nutch-1.16-src.zip.sha512
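
Editorial note: each artifact in this release candidate is accompanied by a .sha512 file whose content has the form `SHA512(<file name>)= <hex digest>`, as shown in the diffs above. The following is a minimal sketch of how such a file could be checked against a downloaded artifact. The function names (parse_sha512_file, sha512_of, verify) are illustrative assumptions and not part of any Nutch or ASF tooling, and the check does not cover the PGP signatures in the .asc files.

```python
import hashlib
import re
from pathlib import Path

# Matches the digest format seen in the .sha512 files above:
# "SHA512(<name>)= <128 hex characters>". The \s* also tolerates a line
# break between the "=" and the digest.
_DIGEST_LINE = re.compile(
    r"SHA512\((?P<name>[^)]+)\)=\s*(?P<hex>[0-9a-fA-F]{128})"
)


def parse_sha512_file(text):
    """Return (file name, lowercase hex digest) from .sha512 file content."""
    match = _DIGEST_LINE.search(text)
    if match is None:
        raise ValueError("no 'SHA512(<name>)= <hex>' entry found")
    return match.group("name"), match.group("hex").lower()


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(artifact, checksum_file):
    """True if the checksum file names the artifact and the digests match."""
    name, expected = parse_sha512_file(Path(checksum_file).read_text())
    return name == Path(artifact).name and sha512_of(artifact) == expected
```

For example, `verify("apache-nutch-1.16-bin.tar.gz", "apache-nutch-1.16-bin.tar.gz.sha512")` would return True only if both the recorded file name and the digest agree with the local file.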