[nutch] branch master updated: NUTCH-2763 protocol-okhttp (store.http.headers): add whitespace in status line after status code also when message is empty

2020-02-27 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 9449417 NUTCH-2763 protocol-okhttp
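
For illustration only, a minimal Java sketch of the behaviour named in the NUTCH-2763 subject: the HTTP/1.1 status-line grammar is "HTTP-version SP status-code SP reason-phrase", so the space after the status code is still written when the message is empty. The class and method names are hypothetical and are not taken from the protocol-okhttp plugin.

```java
/** Hypothetical helper, not the actual protocol-okhttp code. */
final class StatusLineSketch {
    private StatusLineSketch() {}

    /**
     * Builds "HTTP-version SP status-code SP reason-phrase".
     * The space after the status code is always emitted,
     * even when the message is empty or null.
     */
    static String format(String httpVersion, int statusCode, String message) {
        String reason = (message == null) ? "" : message;
        return httpVersion + " " + statusCode + " " + reason;
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + format("HTTP/1.1", 200, "OK") + "]");
        System.out.println("[" + format("HTTP/1.1", 200, "") + "]");
    }
}
```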

[nutch] branch master updated (142a026 -> ac4f2f4)

2020-02-27 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 142a026 Merge pull request #495 from sebastian-nagel/NUTCH-2672-build-docs-use-https new 8e5837f NUTCH-2767

[nutch] branch master updated: NUTCH-2768 FetcherThread: unnecessary usage of class casts

2020-02-27 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 77ec28f NUTCH-2768 FetcherThread: unnecessary

[nutch] branch master updated: NUTCH-2760 protocol-okhttp: properly record HTTP version in request message header - use HTTP protocol from connection (instead of response) for request message stored i

2020-01-09 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 21018be NUTCH-2760 protocol-okhttp: properly

[nutch] branch master updated (3bbc6dd -> c4dd7c1)

2020-01-09 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 3bbc6dd Merge pull request #489 from sebastian-nagel/NUTCH-2760-protocol-okhttp-request-message-http-version

[nutch] branch master updated: NUTCH-2759 bin/crawl: Rename option --num-slaves - renamed to --num-fetchers

2020-01-19 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 040d71d NUTCH-2759 bin/crawl: Rename option

[nutch] branch master updated (a118c85 -> 0a2ffa7)

2020-01-19 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from a118c85 Merge pull request #491 from sebastian-nagel/NUTCH-2759-bin-crawl-rename-num-slaves new a209946 NUTCH

[nutch] branch master updated: Fix for NUTCH-1863: Add JSON format dump output to readdb command (#490)

2019-12-27 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 8a663f9 Fix for NUTCH-1863: Add JSON format

[nutch] branch master updated: NUTCH-2777 - Upgrade to Hadoop 3.1

2020-04-10 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new d7b6ccf NUTCH-2777 - Upgrade to Hadoop 3.1

[nutch] branch master updated: NUTCH-2775 Fetcher to guarantee minimum delay even if robots.txt defines shorter Crawl-delay - guaranteed minimum delay is configured by `fetcher.min.crawl.delay` (defau

2020-04-10 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new e6bc451 NUTCH-2775 Fetcher to guarantee minimum
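
A rough Java sketch of the rule described in the NUTCH-2775 subject, assuming it amounts to taking the larger of the delay requested in robots.txt and the configured minimum (`fetcher.min.crawl.delay`). The class name, method name, and the use of milliseconds are assumptions made for this example, not the Fetcher's actual code.

```java
/** Hypothetical helper, not the actual Nutch Fetcher code. */
final class CrawlDelaySketch {
    private CrawlDelaySketch() {}

    /**
     * Returns the delay to apply between two requests to the same host.
     * A Crawl-delay shorter than the guaranteed minimum is raised to the minimum;
     * a longer Crawl-delay is kept unchanged.
     */
    static long effectiveDelayMillis(long robotsCrawlDelayMillis, long minCrawlDelayMillis) {
        return Math.max(robotsCrawlDelayMillis, minCrawlDelayMillis);
    }

    public static void main(String[] args) {
        // robots.txt asks for 1 s, the assumed minimum is 5 s.
        System.out.println(effectiveDelayMillis(1_000L, 5_000L));
        // robots.txt asks for 10 s, which is already above the minimum.
        System.out.println(effectiveDelayMillis(10_000L, 5_000L));
    }
}
```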

[nutch] branch master updated (0cd0022 -> 6f51618)

2020-04-19 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 0cd0022 Merge pull request #507 from balashashanka/NUTCH-2777 new f999ca5 NUTCH-2757 : Indexer-elastic: add

[nutch] branch master updated (6f51618 -> dcbb0f2)

2020-04-21 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 6f51618 Merge pull request #508 from balashashanka/NUTCH-2757 new 6741574 NUTCH-2755: Remove obsolete plugin

[nutch] branch master updated: NUTCH-2773 SegmentReader (-dump or -get): show HTML content as UTF-8 - if called with command-line flag `-recode` (or if property `segment.reader.content.recode` is true

2020-03-13 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 5076430 NUTCH-2773 SegmentReader (-dump or -get

[nutch] branch master updated: NUTCH-2774 Annotate methods implementing the Hadoop API by @Override - annotate classes implementing Hadoop interfaces - annotate few classes implementing Nutch interfac

2020-03-13 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 22e668d NUTCH-2774 Annotate methods

[nutch] branch master updated (ebc2152 -> 4443cc1)

2020-03-13 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from ebc2152 Merge pull request #498 from sebastian-nagel/NUTCH-2763-protocol-okhttp-store-headers-status-line add

[nutch] branch master updated: NUTCH-2778 indexer-elastic to properly log errors - add log output in BulkProcessor.Listener - do not throw an exception in BulkProcessor.Listener (ignored anyway)

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 81a4b92 NUTCH-2778 indexer-elastic to properly

[nutch] branch master updated: NUTCH-2779 Upgrade to Tika 1.24.1

2020-04-24 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new e1ba9f1 NUTCH-2779 Upgrade to Tika 1.24.1

[nutch] branch master updated: NUTCH-2434 Add methods to reset parameters HTMLMetaTags (apply patch contributed by Markus)

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new a0ed0b4 NUTCH-2434 Add methods to reset

[nutch] branch master updated: NUTCH-2781 Increase default Java heap size - increase default value for NUTCH_HEAPSIZE to 4096 MB (from 1000 MB) - remove -Dmapred.child.java.opts=-Xmx1000m from default

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 3214840 NUTCH-2781 Increase default Java heap

[nutch] branch master updated (49eb1bd -> 52eec66)

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 49eb1bd Merge pull request #511 from sebastian-nagel/NUTCH-2779-tika-1.24.1 new 5b4f595 NUTCH-2780 : Upgrade

[nutch] branch master updated: NUTCH-2783 Use (more) parametrized logging - replace logging messages with string concatenations by parametrized calls - remove LOG.isInfoEnabled() where parametrized lo

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 3d3018b NUTCH-2783 Use (more) parametrized

[nutch] branch master updated: NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode - bin/crawl - add hint how to set map and reduce task memory via -D ... options - use

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new b5e794e NUTCH-2501 allow to set Java heap size

[nutch] branch master updated: NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode - fix examples of `-D property=value` in bin/crawl : there must be a blank after `-D`

2020-04-28 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new a455eb5 NUTCH-2501 allow to set Java heap size

[nutch] branch master updated: NUTCH-2495: Use -deleteGone instead of clean job in crawl script while indexing

2020-04-30 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 7ebd35d NUTCH-2495: Use -deleteGone instead

[nutch] branch master updated: NUTCH-2784 Tool to list Nutch properties and configured values

2020-04-30 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new a20c261 NUTCH-2784 Tool to list Nutch

[nutch] branch master updated: NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation - modify ant build.xml to copy nutch-default.xml into docs/api/resources/ - adapt XSLT table

2020-04-30 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 462ca6e NUTCH-2743 Add list of Nutch properties

[nutch] branch master updated: NUTCH-2772 Debugging parse filter to show serialized DOM tree

2020-04-30 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new caea3a0 NUTCH-2772 Debugging parse filter

[nutch] branch master updated: NUTCH-2776 Fetcher to temporarily deduplicate followed redirects - cache followed redirect targets for a configurable time (`fetcher.redirect.dedupcache.seconds`) - if a

2020-04-30 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 0f33d18 NUTCH-2776 Fetcher to temporarily

[nutch] branch master updated: NUTCH-1945 Test for XLSX parser - add Tika unit test for XLSX files - bundle instance variables and utility methods in class TikaParserTest - clean up javadoc comments

2020-05-12 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 0341f0d NUTCH-1945 Test for XLSX parser - add

[nutch] branch master updated: NUTCH-2758 Add plugin READMEs to binary release packages

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 90502bd NUTCH-2758 Add plugin READMEs to binary

[nutch] branch master updated: NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new c573c70 NUTCH-2753 Add -listen option

[nutch] branch master updated: NUTCH-2785 FreeGenerator: command-line option to define number of generated fetch lists - add command-line option `-numFetchers` to FreeGenerator - in local mode: genera

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 72f3ff2 NUTCH-2785 FreeGenerator: command-line

[nutch] branch master updated: NUTCH-2002 parse and index checkers to check robots.txt - applied Julien's patch to recent code base - also check redirects whether they are allowed - add command-line p

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 46db3ed NUTCH-2002 parse and index checkers

[nutch] branch master updated (e61a8a3 -> 9139d6e)

2020-05-14 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from e61a8a3 Merge pull request #525 from sebastian-nagel/NUTCH-1945 new b543b8b NUTCH-2419 Some URL filters

[nutch] branch master updated: NUTCH-1194 Generator: CrawlDB lock should be released earlier - release CrawlDb lock after select step, in case, generated items are not marked in CrawlDb (generate.upda

2020-05-05 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 11eea5a NUTCH-1194 Generator: CrawlDB lock

[nutch] branch master updated (0b46ac2 -> 680df6b)

2020-09-14 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 0b46ac2 Merge pull request #551 from sebastian-nagel/NUTCH-2823 new 66f50be NUTCH-2824 urlnormalizer-basic

[nutch] branch master updated: NUTCH-2818 Fix Apache Rat task to check sources for license headers - automatize download of Apache Rat jar file - write report to build/apache-rat-report.txt

2020-08-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 4c8dd07 NUTCH-2818 Fix Apache Rat task to check

[nutch] branch master updated: NUTCH-2814 HttpDateFormat's internal time zone may change after parsing a date - reset time zone to GMT after parsing a date

2020-08-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 88cd369 NUTCH-2814 HttpDateFormat's internal
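
The fix named in the NUTCH-2814 subject can be sketched in plain Java as follows. The JDK documents that the time zone of a `DateFormat` may be overwritten by a call to `parse`, so the sketch sets it back to GMT after every parse. The class name and the date pattern are assumptions for this example; this is not the HttpDateFormat source.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical wrapper, not the actual HttpDateFormat class. */
final class GmtDateFormatSketch {
    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Assumed pattern for HTTP-style dates.
    private final SimpleDateFormat formatter =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);

    GmtDateFormatSketch() {
        formatter.setTimeZone(GMT);
    }

    /** Parses a date; the formatter's time zone is reset to GMT afterwards. */
    synchronized Date parse(String text) throws ParseException {
        try {
            return formatter.parse(text);
        } finally {
            // parse() may have replaced the zone with the one found in the input.
            formatter.setTimeZone(GMT);
        }
    }

    /** Formats a date in GMT. */
    synchronized String format(Date date) {
        return formatter.format(date);
    }
}
```

Methods are synchronized because `SimpleDateFormat` is not thread-safe; a caller that parses a date carrying another zone name and then formats a date should still get GMT output.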

[nutch] branch master updated: NUTCH-2697 Upgrade Ivy to fix the issue of an unset packaging.type property NUTCH-2671 Upgrade ant ivy library - upgrade Ivy (2.4.0 -> 2.5.0) - upgrade all plugins build

2020-08-17 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new d07c075 NUTCH-2697 Upgrade Ivy to fix the issue

svn commit: r1880550 - in /nutch/cms_site/trunk: content/credits.md templates/std.html

2020-08-03 Thread snagel
Author: snagel Date: Mon Aug 3 15:27:59 2020 New Revision: 1880550 URL: http://svn.apache.org/viewvc?rev=1880550&view=rev Log: Add Shashanka Balakuntala as committer and PMC Modified: nutch/cms_site/trunk/content/credits.md nutch/cms_site/trunk/templates/std.html Modified: nutch/cms_site

svn commit: r1880551 - /nutch/cms_site/trunk/content/credits.md

2020-08-03 Thread snagel
Author: snagel Date: Mon Aug 3 15:30:57 2020 New Revision: 1880551 URL: http://svn.apache.org/viewvc?rev=1880551&view=rev Log: Add Shashanka Balakuntala as committer and PMC Modified: nutch/cms_site/trunk/content/credits.md Modified: nutch/cms_site/trunk/content/credits.md URL: http

svn commit: r1063816 - /websites/production/nutch/content/

2020-08-03 Thread snagel
Author: snagel Date: Mon Aug 3 15:55:59 2020 New Revision: 1063816 Log: Add Shashanka Balakuntala as committer and PMC Added: websites/production/nutch/content/ - copied from r1063815, websites/staging/nutch/trunk/content/

svn commit: r1880552 - /nutch/cms_site/trunk/content/credits.md

2020-08-03 Thread snagel
Author: snagel Date: Mon Aug 3 15:34:04 2020 New Revision: 1880552 URL: http://svn.apache.org/viewvc?rev=1880552&view=rev Log: Add Shashanka Balakuntala as committer and PMC Modified: nutch/cms_site/trunk/content/credits.md Modified: nutch/cms_site/trunk/content/credits.md URL: http

[nutch] branch master updated (e33aaa1 -> 2f5a8ad)

2020-08-03 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from e33aaa1 NUTCH-2811 : Setup Github workflows for prs (#543) new f24ccab [NUTCH-2801] RobotsRulesParser command

[nutch] branch master updated: NUTCH-2810 FreeGenerator to actually apply configured number of fetch lists

2020-08-03 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 46f7dc2 NUTCH-2810 FreeGenerator to actually

[nutch] branch master updated: NUTCH-2817 Avoid check for equality of URL path and file part using ==/!= - replace check whether URL path and file are identical by check whether URL has a query - clea

2020-08-11 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new c68780d NUTCH-2817 Avoid check for equality
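
As a hedged illustration of the NUTCH-2817 subject: in `java.net.URL`, `getFile()` returns the path plus the query (if any), while `getPath()` returns the path alone. Comparing the two strings with `==` or `!=` tests object identity rather than content, so asking directly whether the URL has a query is the clearer check. The class and method names below are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

/** Hypothetical helper, not the code changed in the Nutch commit. */
final class UrlQuerySketch {
    private UrlQuerySketch() {}

    /** True if the URL carries a query part, i.e. a '?' followed by a (possibly empty) query. */
    static boolean hasQuery(URL url) {
        return url.getQuery() != null;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL withQuery = URI.create("https://example.org/a/b?x=1").toURL();
        URL withoutQuery = URI.create("https://example.org/a/b").toURL();
        System.out.println(hasQuery(withQuery));
        System.out.println(hasQuery(withoutQuery));
    }
}
```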

[nutch] branch master updated: NUTCH-2816 Add Spotbugs target to ant build - called on-demand as ant target "spotbugs" - creates spotbugs report ("build/nutch-spotbugs.html") covering Nutch core and p

2020-08-11 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 8b85324 NUTCH-2816 Add Spotbugs target to ant

[nutch] branch master updated: Prepare for new development after release of 1.17 - bump version number (1.17-SNAPSHOT -> 1.18-SNAPSHOT) - add 1.17 changes / release notes - update links to Hadoop and

2020-07-01 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new a1adce7 Prepare for new development after

svn commit: r40263 - /dev/nutch/1.17/ /release/nutch/1.17/

2020-07-01 Thread snagel
Author: snagel Date: Wed Jul 1 09:42:51 2020 New Revision: 40263 Log: Release Apache Nutch 1.17 Added: release/nutch/1.17/ - copied from r40262, dev/nutch/1.17/ Removed: dev/nutch/1.17/

svn commit: r1879444 - /nutch/cms_site/trunk/content/version_control.md

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 14:06:44 2020 New Revision: 1879444 URL: http://svn.apache.org/viewvc?rev=1879444&view=rev Log: - improve header version control Modified: nutch/cms_site/trunk/content/version_control.md Modified: nutch/cms_site/trunk/content/version_control.md URL: http

svn commit: r1879439 - in /nutch/cms_site/trunk: content/assets/css/bootstrap.css content/assets/img/nutch_logo_tm.gif content/assets/img/nutch_logo_tm.png content/index.md content/javadoc.md template

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 12:28:38 2020 New Revision: 1879439 URL: http://svn.apache.org/viewvc?rev=1879439&view=rev Log: Release of Nutch 1.17 - fix release headline (1.17 not 1.16) - improve layout: * rerender logo from plain vector graphics (https://upload.wikimedia.org/wikipedia/commons

svn commit: r1062489 - /websites/production/nutch/content/

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 14:27:21 2020 New Revision: 1062489 Log: - release Nutch 1.17 - improvements/fixes navigation bar Added: websites/production/nutch/content/ - copied from r1062488, websites/staging/nutch/trunk/content/

svn commit: r40271 - /release/nutch/1.16/

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 14:52:30 2020 New Revision: 40271 Log: Remove 1.16 after release of 1.17 Removed: release/nutch/1.16/

svn commit: r1879446 - /nutch/cms_site/trunk/content/version_control.md

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 14:22:28 2020 New Revision: 1879446 URL: http://svn.apache.org/viewvc?rev=1879446&view=rev Log: - improve header version control Modified: nutch/cms_site/trunk/content/version_control.md Modified: nutch/cms_site/trunk/content/version_control.md URL: http

svn commit: r1879442 - in /nutch/cms_site/trunk/content: bot.md credits.md downloads.md index.md javadoc.md mailing_lists.md

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 13:55:29 2020 New Revision: 1879442 URL: http://svn.apache.org/viewvc?rev=1879442&view=rev Log: Unify license headers Modified: nutch/cms_site/trunk/content/bot.md nutch/cms_site/trunk/content/credits.md nutch/cms_site/trunk/content/downloads.md nutch

svn commit: r1879443 - in /nutch/cms_site/trunk: content/version_control.md templates/std.html

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 14:04:14 2020 New Revision: 1879443 URL: http://svn.apache.org/viewvc?rev=1879443&view=rev Log: - improve header version control - fix link "contribute" (Nutch wiki) Modified: nutch/cms_site/trunk/content/version_control.md nutch/cms_site/trunk

svn commit: r1879440 - in /nutch/cms_site/trunk: content/assets/css/bootstrap.css templates/std.html

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 13:02:13 2020 New Revision: 1879440 URL: http://svn.apache.org/viewvc?rev=1879440&view=rev Log: Fix Google search box Modified: nutch/cms_site/trunk/content/assets/css/bootstrap.css nutch/cms_site/trunk/templates/std.html Modified: nutch/cms_site/trunk/content

svn commit: r1062491 - /websites/production/nutch/content/

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 15:49:52 2020 New Revision: 1062491 Log: Improve placement of Google search box on Chromium Added: websites/production/nutch/content/ - copied from r1062490, websites/staging/nutch/trunk/content/

svn commit: r1879450 - in /nutch/cms_site/trunk: content/assets/css/bootstrap.css templates/std.html

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 15:47:16 2020 New Revision: 1879450 URL: http://svn.apache.org/viewvc?rev=1879450&view=rev Log: Improve placement of Google search box on Chromium Modified: nutch/cms_site/trunk/content/assets/css/bootstrap.css nutch/cms_site/trunk/templates/std.html Modified

[nutch] branch master updated (a1adce7 -> ff15671)

2020-07-14 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from a1adce7 Prepare for new development after release of 1.17 - bump version number (1.17-SNAPSHOT -> 1.18-SNAPS

[nutch] branch master updated: NUTCH-2782: protocol-http / lib-http: support TLSv1.3

2020-07-14 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 996ff8b NUTCH-2782: protocol-http / lib-http

svn commit: r1879431 - in /nutch/cms_site/trunk/content: ./ apidocs/apidocs-1.17/ apidocs/apidocs-1.17/org/ apidocs/apidocs-1.17/org/apache/ apidocs/apidocs-1.17/org/apache/nutch/ apidocs/apidocs-1.17

2020-07-02 Thread snagel
Author: snagel Date: Thu Jul 2 08:23:10 2020 New Revision: 1879431 URL: http://svn.apache.org/viewvc?rev=1879431&view=rev Log: Release of Nutch 1.17 - add link to Nutch properties (nutch-default.xml) to javadoc page [This commit notification would consist of 267 parts, which exceeds the limit

[nutch] 01/02: Nutch 1.16 release - update links to Hadoop and Solr API docs

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch branch-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git commit 46f7bf0f72869aae908700b1ffa3e0277fcbef13 Author: Sebastian Nagel AuthorDate: Wed Jun 17 23:00:09 2020 +0200

[nutch] branch branch-1.17 updated (77fa56e -> 1386c5a)

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch branch-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git. discard 77fa56e Nutch 1.16 release - update current year in API docs etc. - update version number - add changes

[nutch] 02/02: Nutch 1.17 release - update current year in API docs etc. - update version number - add changes / release notes

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to annotated tag release-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git commit 1386c5a815f706a3fe6d9e1960924502285c3282 Author: Sebastian Nagel AuthorDate: Wed Jun 17 23:10:35 2020 +0200

[nutch] 01/02: Nutch 1.17 release - update links to Hadoop and Solr API docs

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to annotated tag release-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git commit dd93c81d4b8f8039a3c87e4ca1a50bcbe0c45cb4 Author: Sebastian Nagel AuthorDate: Wed Jun 17 23:00:09 2020 +0200

[nutch] annotated tag release-1.17 updated (e68bd87 -> eff98db)

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to annotated tag release-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git. *** WARNING: tag release-1.17 was modified! *** from e68bd87 (tag) to eff98db (tag) tagging

[nutch] 02/02: Nutch 1.16 release - update current year in API docs etc. - update version number - add changes / release notes

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch branch-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git commit 77fa56e34ccd4ecf35f14111a4a3a0e2912e7f29 Author: Sebastian Nagel AuthorDate: Wed Jun 17 23:10:35 2020 +0200

[nutch] branch branch-1.17 created (now 77fa56e)

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch branch-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git. at 77fa56e Nutch 1.16 release - update current year in API docs etc. - update version number - add changes

[nutch] annotated tag release-1.17 updated (77fa56e -> e68bd87)

2020-06-18 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to annotated tag release-1.17 in repository https://gitbox.apache.org/repos/asf/nutch.git. *** WARNING: tag release-1.17 was modified! *** from 77fa56e (commit) to e68bd87 (tag) tagging

svn commit: r40079 [2/3] - /dev/nutch/1.17/

2020-06-18 Thread snagel
rs missing in bin/nutch +[NUTCH-2220] - Rename db.* options used only by the linkdb to linkdb.* + +Nutch 1.11 Release 03/12/2015 (dd/mm/yyyy) +Release Report: http://s.apache.org/nutch11 + +* NUTCH-2176 Clean up of log4j.properties (markus) + +* NUTCH-2107 plugin.xml to validate against plugin

svn commit: r40079 [3/3] - /dev/nutch/1.17/

2020-06-18 Thread snagel
Propchange: dev/nutch/1.17/CHANGES.txt -- svn:eol-style = native Added: dev/nutch/1.17/apache-nutch-1.17-bin.tar.gz == Binary file - no diff

svn commit: r40079 [1/3] - /dev/nutch/1.17/

2020-06-18 Thread snagel
Author: snagel Date: Thu Jun 18 10:16:13 2020 New Revision: 40079 Log: Apache Nutch 1.17 RC#1 Added: dev/nutch/1.17/ dev/nutch/1.17/CHANGES.txt (with props) dev/nutch/1.17/apache-nutch-1.17-bin.tar.gz (with props) dev/nutch/1.17/apache-nutch-1.17-bin.tar.gz.asc dev/nutch

[nutch] branch master updated: NUTCH-2790 indexer-csv: escape field leading quote character

2020-06-10 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 6fa02ef NUTCH-2790 indexer-csv: escape field

[nutch] branch master updated: NUTCH-2496 Speed up link inversion step in crawling script

2020-06-09 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 7fba6df NUTCH-2496 Speed up link inversion step

[nutch] branch master updated (9139d6e -> 1cb64df)

2020-06-09 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git. from 9139d6e Merge pull request #526 from sebastian-nagel/NUTCH-2419-urlfilter-rule-file-precedence new f0e1e3d

[nutch] branch master updated: NUTCH-2791 Handle GCS URLs in stats commands

2020-06-11 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git The following commit(s) were added to refs/heads/master by this push: new 6b6e74c NUTCH-2791 Handle GCS URLs in stats

[nutch] 34/35: NUTCH-2817 Avoid check for equality of URL path and file part using ==/!= - replace check whether URL path and file are identical by check whether URL has a query - clean up code and im

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 69deffa67d76eb61ddabe29d54575c5b6635a4e2 Author: Sebastian Nagel AuthorDate: Sat Aug 8 10:54:42 2020 +0200 NUTCH

[nutch] 12/35: NUTCH-2720 ROBOTS metatag ignored when capitalized - move string "robots" to constant in metadata.Nutch - make string lowercase not depend on system locale

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit fa319a60f30dbb0efcd67e306c611d66b7b379f1 Author: Sebastian Nagel AuthorDate: Sun May 17 14:37:47 2020 +0200 NUTCH

[nutch] 33/35: NUTCH-2816 Add Spotbugs target to ant build - called on-demand as ant target "spotbugs" - creates spotbugs report ("build/nutch-spotbugs.html") covering Nutch core and plugins

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit e7a3da373ffb534d943e3911ffee52e8fdcb5691 Author: Sebastian Nagel AuthorDate: Thu Aug 6 19:24:35 2020 +0200 NUTCH

[nutch] 09/35: NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 79f3c0ad54025c4d3f87c625faecc807be2a04b9 Author: Sebastian Nagel AuthorDate: Fri Sep 27 22:51:29 2019 +0200 NUTCH

[nutch] 08/35: NUTCH-1945 Test for XLSX parser - add Tika unit test for XLSX files - bundle instance variables and utility methods in class TikaParserTest - clean up javadoc comments

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 37590194f0f604d8a39a2ae814a1874079715822 Author: Sebastian Nagel AuthorDate: Tue May 5 13:25:15 2020 +0200 NUTCH

[nutch] 22/35: [NUTCH-2796] Upgrade to crawler-commons 1.1

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 6fb5ebb572c1d4b65861416c508cdf9275518553 Author: Sebastian Nagel AuthorDate: Mon Jul 6 14:02:39 2020 +0200 [NUTCH

[nutch] 21/35: Prepare for new development after release of 1.17 - bump version number (1.17-SNAPSHOT -> 1.18-SNAPSHOT) - add 1.17 changes / release notes - update links to Hadoop and Solr API docs -

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 4b505f2dc54b29f3d6477014b5195b93d66970e5 Author: Sebastian Nagel AuthorDate: Wed Jun 17 23:00:09 2020 +0200 Prepare

[nutch] 24/35: NUTCH-2782: protocol-http / lib-http: support TLSv1.3

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 50eba77091d03b72d42e16e9daf1c2868a3165af Author: shbalaku AuthorDate: Fri Jul 10 23:07:36 2020 +0530 NUTCH-2782

[nutch] 14/35: NUTCH-2790 indexer-csv: escape field leading quote character

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 6c654988fe4576783c00ab3c6329e175231eda88 Author: Patrick Mezard AuthorDate: Tue Jun 9 17:00:16 2020 +0200 NUTCH

[nutch] 01/35: NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation - modify ant build.xml to copy nutch-default.xml into docs/api/resources/ - adapt XSLT table layout - remove

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 7f51c2530795d7203aa1d0834be8e1c2c1373531 Author: Sebastian Nagel AuthorDate: Wed Apr 29 13:03:01 2020 +0200 NUTCH

[nutch] 32/35: NUTCH-2811 : Setup Github workflows for prs (#543)

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit b4b81f7f3bc28df502cf3acef638dcd9c132f3da Author: Madhawa Gunasekara AuthorDate: Mon Aug 3 17:10:45 2020 +0200 NUTCH

[nutch] 25/35: NUTCH-2805: Rename plugin urlfilter-domainblacklist (#540)

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 4cc60483266c3d74f648e083ec731349b22bcc8d Author: Shashanka Balakuntala Srinivasa AuthorDate: Wed Jul 29 20:05:04 2020

[nutch] 16/35: NUTCH-2788 ParseData: improve presentation of Metadata in method toString() - switch to multi-line presentation of Metadata in ParseData::toString - default implementation of Metadata::

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit e8673d143446840e434506566f5362023ffaeca3 Author: Sebastian Nagel AuthorDate: Tue Jun 9 11:41:37 2020 +0200 NUTCH

[nutch] 10/35: NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 83011a08b98c55406583eb068d516ccb9f137266 Author: Sebastian Nagel AuthorDate: Wed May 13 14:39:15 2020 +0200 NUTCH

[nutch] 17/35: NUTCH-2789 Docker README: update links to point to cwiki

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit f08c9db74de45374c15acba6162576bb80437817 Author: Sebastian Nagel AuthorDate: Tue Jun 9 12:06:17 2020 +0200 NUTCH

[nutch] 28/35: NUTCH-1190 MoreIndexingFilter: move data formats used to parse "lastModified" to a config file

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 2c3d864222ef79ed19f33399b5abcd392f27c82a Author: Jakob Berlin AuthorDate: Mon Aug 3 15:56:44 2020 +0200 NUTCH-1190

[nutch] 29/35: [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back - if no agent names are given as command-line arguments use values of http.agent.name and http

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit d3d3b31da07f9755c663c20056d57b9a0b172171 Author: Sebastian Nagel AuthorDate: Fri Jul 10 15:13:49 2020 +0200 [NUTCH

[nutch] 05/35: NUTCH-2002 parse and index checkers to check robots.txt - applied Julien's patch to recent code base - also check redirects whether they are allowed - add command-line parameter `-check

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit aed6fa71fa7cd07740235e4c4aeca8380ddb9b48 Author: Sebastian Nagel AuthorDate: Thu Apr 30 12:58:05 2020 +0200 NUTCH

[nutch] 06/35: NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 72b941fa93fe082f095235612b02ec8be2af4f18 Author: Sebastian Nagel AuthorDate: Thu Apr 30 17:07:30 2020 +0200 NUTCH

[nutch] 15/35: NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly - add JsonSerializer to write common Writable types (null, boolean, numbers) - remaining "unknown" W

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 41d3eb13d3e5f608bc9a21f1e1b946bf1c7bf46d Author: Sebastian Nagel AuthorDate: Tue Jun 9 14:17:40 2020 +0200 NUTCH

[nutch] 31/35: NUTCH-2810 FreeGenerator to actually apply configured number of fetch lists

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit a51b0f52c43ff057288d019ec1a2dee1f09675c4 Author: Sebastian Nagel AuthorDate: Mon Jul 27 12:05:23 2020 +0200 NUTCH

[nutch] 23/35: [NUTCH-2730] SitemapProcessor to treat sitemap URLs as Set instead of List - sitemap links from robots.txt are treated as set by crawler-commons (since crawler-commons 1.1) - sitemaps r

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 7b163542e4d319f95a7d4d06db77d910250bceb0 Author: Sebastian Nagel AuthorDate: Mon Jul 6 14:03:33 2020 +0200 [NUTCH

[nutch] 02/35: NUTCH-2434 Add methods to reset parameters HTMLMetaTags (apply patch contributed by Markus)

2020-08-16 Thread snagel
This is an automated email from the ASF dual-hosted git repository. snagel pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/nutch.git commit 1b27ab8264b91a7116931908f3074022c68c826b Author: Sebastian Nagel AuthorDate: Tue May 5 11:27:35 2020 +0200 NUTCH
