[nutch] branch branch-1.16 updated: Nutch 1.16 release - update year for API docs - add link to release notes
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch branch-1.16
in repository https://gitbox.apache.org/repos/asf/nutch.git

The following commit(s) were added to refs/heads/branch-1.16 by this push:
     new c9278f6  Nutch 1.16 release - update year for API docs - add link to release notes
c9278f6 is described below

commit c9278f651d90ad04e280581141813b36d6a0740b
Author: Sebastian Nagel
AuthorDate: Wed Oct 2 12:41:01 2019 +0200

    Nutch 1.16 release - update year for API docs - add link to release notes
---
 CHANGES.txt        | 3 ++-
 default.properties | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 2c18e38..82e71f8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,6 +1,7 @@
 # Nutch Change Log
 
-Nutch 1.16 Release (01/10/2019)
+Nutch 1.16 Release 02/10/2019 (dd/mm/)
+Release Report: https://s.apache.org/l2j94
 
 Comments
 
diff --git a/default.properties b/default.properties
index 298c6fd..a4f8209 100644
--- a/default.properties
+++ b/default.properties
@@ -16,7 +16,7 @@
 name=apache-nutch
 version=1.16
 final.name=${name}-${version}
-year=2018
+year=2019
 
 basedir = ./
 src.dir = ./src/java
[nutch] annotated tag release-1.16 updated (c9278f6 -> 6f15fba)
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a change to annotated tag release-1.16
in repository https://gitbox.apache.org/repos/asf/nutch.git.

*** WARNING: tag release-1.16 was modified! ***

    from c9278f6  (commit)
      to 6f15fba  (tag)
 tagging c9278f651d90ad04e280581141813b36d6a0740b (commit)
replaces release-1.13
      by Sebastian Nagel
      on Wed Oct 2 12:43:21 2019 +0200

- Log -----------------------------------------------------------------
Apache Nutch 1.16 RC#1
-----------------------------------------------------------------------

No new revisions were added by this update.

Summary of changes:
svn commit: r36162 [3/3] - /dev/nutch/1.16/
Propchange: dev/nutch/1.16/CHANGES.txt
------------------------------------------------------------------------------
    svn:eol-style = native

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/x-gzip

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc Wed Oct 2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvTMACgkQxm6nt9sK
+nG1TiAgAtz2BdIb00tCcn11TdHlu9cs31gjxOIK3OShVePMadlby9lSXNuLPUJFU
+rQPU9ZQkFlmPVcyB6HCuoj2xZ/THDWiYtjqPqzCrlzw0TQ6R4ZOGxlK1OpuMEeir
+mSTaphZq4reYZn4gIiKuetaf9x89a5EgbdEhFkP+K2+hIafjIqoUnKvmdD43VGrz
+j+CkEVFYBKuDXJSUUmMj2UTSG7arPpRbDhJPi28vkD3vmCuOXpWDUK9W1rjpDjkv
+w3EdbEqEqsIU1qtdIO0uL80/IvxBnJgu6r8HkAxcm8JO/ERXOxWi7uegL3PTSblD
+KgeqrMr/qUo19yQFcPUOybTYvxlDtw==
+=BBSh
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512 Wed Oct 2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-bin.tar.gz)= 487d73f03fea161d823fa5f425102a2e11f0fdb53b6d76c3787dbc42b4a1b0e51673ab4810aafad1e1321364ec3a1f83742bd904d1b220c6919297a3a0b0b053

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip
==============================================================================
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-bin.zip
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc Wed Oct 2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvUIACgkQxm6nt9sK
+nG3zoQgAjXAUuwBFjoWYSE+7uzxvYw9BkSueixxmHt6MiYLkN1Je3slvXWfv8oZe
+fAr9OMGsRsy8DQrIxJ3y0SV4+rBHoRoa6A4gOSBfCL2QuWtb180ilDNIGHHWR0GY
+qXLO32WrKn8J8cCxWTiBfyhLj6syo/tYrolg+QLdr7XnNGbzdqRhGPI8KBzzDRV4
+VZkxwnphLc760+BakrwD+SiGPWZeXbACH6tAbkiUWANDMSCtvISXFXSo6jN3aHCM
+T3dJR8ZgnileUF5+VhhfovWzFiN1NgyzMAKvI9eKnAYKw7Wb+re9zI4SLtveWAwD
+WchN/76h0EF3/fwY2i96hxvoEeCAgQ==
+=HAvA
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512 Wed Oct 2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-bin.zip)= 8836d465b537d538acbce73ae34a848c75d366e4b5574bce2ed6a080b358436e67bd8b92978f8c3ea4ea922aef88301d71d417b0c4ac8f6ade5373c1966bfc86

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz
==============================================================================
Binary file - no diff available.

Propchange: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz
------------------------------------------------------------------------------
    svn:mime-type = application/x-gzip

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc (added)
+++ dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc Wed Oct 2 15:17:14 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAABCgAdFiEE/4Kkh/ktcOUv934Kxm6nt9sKnG0FAl2UvUMACgkQxm6nt9sK
+nG169QgArTzLvR/x0UhnLGqP6Bmx2Cm+sTn/9ZNLTfw7GRT4nb4/ZulHuFT5oifu
+Dj+pygQ13N/XCOUYdZzV7EtmC4gkB+ngP2wPM+RsCQYM3NnnrqlbE8cAMlxlMJmc
+ejKRGNg5kuw7/jhUQVh/Is6qCib5m7jtoG7hwL5UJ6bMg1+Yd2ObB3QwPGugXfej
+x/PriaFkvpRjpCjLUwZ1/WcnRqWvRTyHPSTfaO/CHYfWhl8F2SJy+0OfwEcjPJi/
+jxlZwNq81D9/O6WYIfSIDvVKoHKvfH4kh2is+yTOvq7Npz0ua0PMfXFMZYz7d/1l
+8vA/3pBehS4iWopqVSw8vzytMTXI6g==
+=/CT2
+-----END PGP SIGNATURE-----

Added: dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512
==============================================================================
--- dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512 (added)
+++ dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512 Wed Oct 2 15:17:14 2019
@@ -0,0 +1 @@
+SHA512(apache-nutch-1.16-src.tar.gz)= dc33eedd7b00bd8dcebff60bd97178cada8b76fa435044e405462b3887b9c9c7d9ea550df4f97bd18d29372a79abb98f7e9e3882be95e98c8d9591d21583fe8e

Added: dev/nutch/1.16/apache-nutch-1.16-src.zip
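
The .sha512 files in this commit use a `SHA512(<file name>)= <hex digest>` layout. As a rough illustration of how a downloaded release candidate could be checked against such a line, here is a minimal Python sketch. The helper names (`parse_sha512_line`, `sha512_of`, `verify_artifact`) are hypothetical and are not part of Nutch or of the ASF release tooling; the sketch only assumes the line layout shown in the diffs above.

```python
import hashlib
import re
from pathlib import Path
from typing import Tuple

# Matches lines such as "SHA512(apache-nutch-1.16-bin.zip)= 8836d4..."
# (layout taken from the .sha512 diffs above; a SHA-512 digest is 128 hex chars).
_SHA512_LINE = re.compile(
    r"^SHA512\((?P<name>[^)]+)\)=\s*(?P<digest>[0-9a-fA-F]{128})$"
)


def parse_sha512_line(line: str) -> Tuple[str, str]:
    """Return (file name, lowercase hex digest) from one checksum line."""
    match = _SHA512_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'SHA512(name)= digest' line: {line!r}")
    return match.group("name"), match.group("digest").lower()


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(artifact: Path, checksum_line: str) -> bool:
    """True if both the file name and the SHA-512 digest match the line."""
    name, expected = parse_sha512_line(checksum_line)
    return name == artifact.name and sha512_of(artifact) == expected
```

A caller would read the single line of, for example, apache-nutch-1.16-bin.zip.sha512 and pass it to `verify_artifact` together with the path of the downloaded archive. A checksum match only shows the download is intact; the .asc signatures still need to be verified separately with OpenPGP tooling.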
svn commit: r36162 [2/3] - /dev/nutch/1.16/
Added: dev/nutch/1.16/CHANGES.txt
==============================================================================
--- dev/nutch/1.16/CHANGES.txt (added)
+++ dev/nutch/1.16/CHANGES.txt Wed Oct 2 15:17:14 2019
@@ -0,0 +1,3032 @@
+# Nutch Change Log
+
+Nutch 1.16 Release 02/10/2019 (dd/mm/)
+Release Report: https://s.apache.org/l2j94
+
+Comments
+
+- schema.xml has been moved to indexer-solr plugin directory. This file is provided as a
+  reference/guide for Solr users (NUTCH-2654)
+
+Breaking Changes
+
+- The value of crawl.gen.delay is now read in milliseconds as stated in the description
+  in nutch-default.xml. Previously, the value has been read in days, see NUTCH-1842 for
+  further information.
+
+- HostDB entries have been moved from Integer to Long in order to accomodate very large
+  hosts. Remove your existing HostDB and recreate it with bin/nutch updatehostdb, see
+  NUTCH-2694 for additional information.
+
+- The signature class TextProfileSignature has been improved to be stable over
+  consecutive runs by sorting tokens by frequency first and secondarily in lexicographic
+  order. If an existing CrawlDb contains signatures generated by TextProfileSignature
+  these are likely to change when upgrading to Nutch 1.16. The previous behavior relying
+  on a semi-stable pseudo-random hash sorting could be restored setting the property
+  `db.signature.text_profile.sec_sort_lex` to `false`. See also NUTCH-2381.
+
+Bug
+
+[NUTCH-1063] - OutlinkExtractor test generates an exception but does not fail
+[NUTCH-1842] - crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly
+[NUTCH-2279] - LinkRank fails when using Hadoop MR output compression
+[NUTCH-2381] - In some situations the class TextProfileSignature gives different signatures for the same text "profile" page.
+[NUTCH-2387] - Nutch should not index document with "noindex" meta
+[NUTCH-2457] - Embedded documents likely not correctly parsed by Tika
+[NUTCH-2475] - If and else-if branches has the same condition
+[NUTCH-2482] - index-geoip not to add null values to document fields
+[NUTCH-2585] - NPE in TrieStringMatcher
+[NUTCH-2598] - URLNormalizerChecker fails on invalid URLs in input
+[NUTCH-2606] - MIME detection is wrong for plain-text documents send as Content-Type "application/msword"
+[NUTCH-2635] - Generator writes unneeded temporary output
+[NUTCH-2639] - bin/nutch fails to set native library path on Cygwin causing jobs to fail with UnsatisfiedLinkError
+[NUTCH-2641] - ClassCastException in webui
+[NUTCH-2642] - MoreIndexingFilter parses ISO 8601 UTC dates in local time zone
+[NUTCH-2643] - ant target "resolve-default" to depend on "init"
+[NUTCH-2644] - CrawlDbReader -dump ignores filter options
+[NUTCH-2645] - Webgraph tools ignore command-line options
+[NUTCH-2650] - -addBinaryContent -base64 flags are causing "String length must be a multiple of four" error in IndexingJob
+[NUTCH-2652] - Fetcher launches more fetch tasks than fetch lists
+[NUTCH-2655] - Update Solr schema.xml for Solr 7.x
+[NUTCH-2656] - Update description to configure Solr 7.x in tutorial
+[NUTCH-2673] - EOFException protocol-http
+[NUTCH-2674] - HostDb: dump shows wrong column headers
+[NUTCH-2680] - Documentation: https supported by multiple protocol plugins not only httpclient
+[NUTCH-2687] - Regex for reading title from Content-Disposition is wrong
+[NUTCH-2694] - HostDB to aggregate by long instead of integer
+[NUTCH-2696] - Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
+[NUTCH-2699] - Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered
+[NUTCH-2703] - parse-tika: Boilerpipe should not run for non-(X)HTML pages
+[NUTCH-2706] - -addBinaryContent flag can cause "String length must be a multiple of four" error in IndexingJob
+[NUTCH-2715] - WARCExporter fails on large records
+[NUTCH-2716] - protocol-http: Response headers are not stored for a compressed response
+[NUTCH-2717] - Generator cannot open hostDB
+[NUTCH-2722] - Fetch dependencies via https
+[NUTCH-2723] - Indexer Solr not to decode URLs before deletion
+[NUTCH-2724] - Metadata indexer not to emit empty values
+[NUTCH-2729] - protocol-okhttp: fix marking of truncated content
+[NUTCH-2731] - Solr Cleanup Step Fails when Authentication is Required
+[NUTCH-2738] - Generator: document property generate.restrict.status
+[NUTCH-2740] - Generator: generate.max.count overflow not logged
+
+New Feature
+
+[NUTCH-2676] - Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
+
+Improvement
+
+[NUTCH-1014] - Migrate from Apache ORO to java.util.regex
+[NUTCH-1021] - Migrate
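
The TextProfileSignature entry under "Breaking Changes" above says signatures become stable by sorting tokens by frequency first and lexicographically second. The following Python sketch illustrates only that ordering idea. It is not the Nutch implementation: the function names, the tokenizer, the descending frequency order, the profile text layout and the use of SHA-256 are all assumptions made for illustration, and any other processing the real class performs is omitted.

```python
import hashlib
import re
from collections import Counter
from typing import List, Tuple


def ordered_profile(text: str) -> List[Tuple[str, int]]:
    """Return (token, count) pairs in a deterministic order.

    Primary key: frequency (descending here, which is an assumption).
    Secondary key: the token itself, in lexicographic order, so that
    tokens with equal counts never depend on hash iteration order.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # simplistic tokenizer (assumption)
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def stable_profile_signature(text: str) -> str:
    """Hash the ordered profile; SHA-256 is an arbitrary choice for this sketch."""
    profile = "\n".join(f"{count} {token}" for token, count in ordered_profile(text))
    return hashlib.sha256(profile.encode("utf-8")).hexdigest()
```

Because ties between equally frequent tokens are broken by the token text, two runs over the same page (or over pages with the same token counts in a different order) produce the same profile string and therefore the same signature. Without the secondary key, the order of tied tokens would depend on whatever order the underlying container happens to yield.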
svn commit: r36162 [1/3] - /dev/nutch/1.16/
Author: snagel
Date: Wed Oct 2 15:17:14 2019
New Revision: 36162

Log:
Apache Nutch 1.16 RC#1

Added:
    dev/nutch/1.16/
    dev/nutch/1.16/CHANGES.txt   (with props)
    dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz   (with props)
    dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.asc
    dev/nutch/1.16/apache-nutch-1.16-bin.tar.gz.sha512
    dev/nutch/1.16/apache-nutch-1.16-bin.zip   (with props)
    dev/nutch/1.16/apache-nutch-1.16-bin.zip.asc
    dev/nutch/1.16/apache-nutch-1.16-bin.zip.sha512
    dev/nutch/1.16/apache-nutch-1.16-src.tar.gz   (with props)
    dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.asc
    dev/nutch/1.16/apache-nutch-1.16-src.tar.gz.sha512
    dev/nutch/1.16/apache-nutch-1.16-src.zip   (with props)
    dev/nutch/1.16/apache-nutch-1.16-src.zip.asc
    dev/nutch/1.16/apache-nutch-1.16-src.zip.sha512