[jira] [Updated] (NUTCH-2667) Update Tika and Commons Collections 4

2018-10-23 Thread Lewis John McGibbney (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2667:

Description: Tika and Commons Collections 4 need to be updated. This issue 
needs to address them.  (was: Tika and Commons Collections 4 need to be updated 
due to known CVE's.
This issue needs to address them.)

> Update Tika and Commons Collections 4
> -
>
> Key: NUTCH-2667
> URL: https://issues.apache.org/jira/browse/NUTCH-2667
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 2.4
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 2.4
>
>
> Tika and Commons Collections 4 need to be updated. This issue needs to 
> address them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2667) Update Tika and Commons Collections 4

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661740#comment-16661740
 ] 

ASF GitHub Bot commented on NUTCH-2667:
---

lewismc opened a new pull request #403: NUTCH-2667 Update Tika and Commons 
Collections 4
URL: https://github.com/apache/nutch/pull/403
 
 
   This issue addresses https://issues.apache.org/jira/browse/NUTCH-2667


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Tika and Commons Collections 4
> -
>
> Key: NUTCH-2667
> URL: https://issues.apache.org/jira/browse/NUTCH-2667
> Project: Nutch
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 2.4
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 2.4
>
>
> Tika and Commons Collections 4 need to be updated due to known CVE's.
> This issue needs to address them.





[jira] [Created] (NUTCH-2667) Update Tika and Commons Collections 4

2018-10-23 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2667:
---

 Summary: Update Tika and Commons Collections 4
 Key: NUTCH-2667
 URL: https://issues.apache.org/jira/browse/NUTCH-2667
 Project: Nutch
  Issue Type: Improvement
  Components: build
Affects Versions: 2.4
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 2.4


Tika and Commons Collections 4 need to be updated due to known CVE's.
This issue needs to address them.





[jira] [Commented] (NUTCH-2630) Fetcher to log skipped records by robots.txt

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661726#comment-16661726
 ] 

ASF GitHub Bot commented on NUTCH-2630:
---

lewismc commented on issue #387: NUTCH-2630 Fetcher to log skipped records by 
robots.txt
URL: https://github.com/apache/nutch/pull/387#issuecomment-432511339
 
 
   +1




> Fetcher to log skipped records by robots.txt
> 
>
> Key: NUTCH-2630
> URL: https://issues.apache.org/jira/browse/NUTCH-2630
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 1.15
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: 1.16
>
>
> To analyze problems it would be helpful if fetcher logs URLs which are 
> disallowed in the robots.txt - see [discussion on user mailing 
> list|https://lists.apache.org/thread.html/7fe5b02104ea866aba183d009a5fad59ad4e4daf8954593ef0123dd6@%3Cuser.nutch.apache.org%3E].
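The logging requested in NUTCH-2630 could look roughly like the sketch below. It is not the Nutch fetcher code: the class RobotsSkipLogSketch, the method filterAllowed and the isAllowed predicate are hypothetical names that stand in for the real robots.txt rules check, and java.util.logging is used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.logging.Logger;

/** Hypothetical sketch: log every URL that is skipped because of robots.txt. */
class RobotsSkipLogSketch {
  private static final Logger LOG = Logger.getLogger("RobotsSkipLogSketch");

  /**
   * Returns the URLs accepted by the rules check and logs each rejected URL.
   * The isAllowed predicate is an assumption standing in for a real
   * robots.txt rules lookup.
   */
  static List<String> filterAllowed(List<String> urls, Predicate<String> isAllowed) {
    List<String> allowed = new ArrayList<>();
    for (String url : urls) {
      if (isAllowed.test(url)) {
        allowed.add(url);
      } else {
        // One log line per skipped record makes the skip visible in the fetcher log.
        LOG.info("Denied by robots.txt: " + url);
      }
    }
    return allowed;
  }
}
```

Logging at the point where the URL is dropped keeps the skip visible without changing which URLs are fetched.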





[jira] [Commented] (NUTCH-2655) Update Solr schema.xml for Solr 7.x

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661725#comment-16661725
 ] 

ASF GitHub Bot commented on NUTCH-2655:
---

lewismc commented on issue #395: NUTCH-2655 Update Solr schema.xml for Solr 7.x
URL: https://github.com/apache/nutch/pull/395#issuecomment-432511163
 
 
   +1




> Update Solr schema.xml for Solr 7.x
> ---
>
> Key: NUTCH-2655
> URL: https://issues.apache.org/jira/browse/NUTCH-2655
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer, plugin
>Affects Versions: 1.15
>Reporter: Sebastian Nagel
>Priority: Major
> Fix For: 1.16
>
>
> The Solr schema.xml is not compatible with Solr 7.x which is used by Nutch 
> 1.15. I've tested Solr 7.3.1 and 7.5.0: when using the current schema.xml, 
> Solr fails and complains about unknown field types:
> {noformat}
> 2018-10-15 12:55:24.484 ERROR (qtp102617125-17) [ x:nutch] 
> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error 
> CREATEing SolrCore 'nutch': Unable to create core [nutch] Caused by: 
> fieldType 'pdates' not found in the schema
> {noformat}





[jira] [Commented] (NUTCH-2658) Add README file to all plugins in src/plugin

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661724#comment-16661724
 ] 

ASF GitHub Bot commented on NUTCH-2658:
---

lewismc commented on issue #398: NUTCH-2658 Add README for the index-links 
plugin
URL: https://github.com/apache/nutch/pull/398#issuecomment-43257
 
 
   +1




> Add README file to all plugins in src/plugin
> 
>
> Key: NUTCH-2658
> URL: https://issues.apache.org/jira/browse/NUTCH-2658
> Project: Nutch
>  Issue Type: Improvement
>  Components: documentation, plugin
>Reporter: Jorge Luis Betancourt Gonzalez
>Priority: Trivial
>
> Since we've migrated a good portion of our workflow to GitHub, we could 
> consider adding a {{README.md}} file to the root of each plugin in 
> {{src/plugin}}. 
> This is a good place to have plugin-specific documentation: which fields the 
> plugin adds to the indexer, which configuration options it has, etc. Also, 
> since the README.md is rendered by GitHub automatically, it is a good link to 
> point users to.
> I think that a good example is the {{indexer-cloudsearch}} plugin. On top of 
> that, a README is a good source of information to point users to when they 
> ask questions regarding a specific plugin.





[jira] [Comment Edited] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Akshar Dave (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661680#comment-16661680
 ] 

Akshar Dave edited comment on NUTCH-2665 at 10/24/18 4:08 AM:
--

Were you able to commit this change and build successfully? With the 
property defined in ivysettings.xml, as shown in 
[65c4fed|https://gitbox.apache.org/repos/asf?p=nutch.git;a=commitdiff;h=65c4fedfacdb873a050e97a50602ed366c7b5a98],
I am trying to build locally after merging all the changes and am getting a 
dependency-related error:

[ivy:retrieve]  public: tried
[ivy:retrieve] 
http://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1/javax.ws.rs-api-2.1.${packaging.type}


was (Author: axr):
Were you able to commit this change and build successfully? I am trying to 
build locally after merging all the changes and am getting a 
dependency-related error:

[ivy:resolve] ::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::
[ivy:resolve] :: javax.measure#unit-api;working@axr.local: not found
[ivy:resolve] ::

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Akshar Dave (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661680#comment-16661680
 ] 

Akshar Dave commented on NUTCH-2665:


Were you able to commit this change and build successfully? I am trying to 
build locally after merging all the changes and am getting a 
dependency-related error:

[ivy:resolve] ::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::
[ivy:resolve] :: javax.measure#unit-api;working@axr.local: not found
[ivy:resolve] ::






Jenkins build is back to normal : Nutch-trunk #3578

2018-10-23 Thread Apache Jenkins Server



[jira] [Commented] (NUTCH-2659) Add missing Apache license headers

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661343#comment-16661343
 ] 

Hudson commented on NUTCH-2659:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3577 (See 
[https://builds.apache.org/job/Nutch-trunk/3577/])
NUTCH-2659 Add missing Apache license headers (snagel: 
[https://github.com/apache/nutch/commit/8bf04cd49c7930ef62e3bc41aa3ee55917a4018f])
* (edit) src/java/org/apache/nutch/webui/pages/instances/InstancePanel.java
* (edit) 
src/plugin/indexer-dummy/src/java/org/apache/nutch/indexwriter/dummy/DummyConstants.java
* (edit) src/plugin/scoring-depth/build.xml
* (edit) src/plugin/parse-metatags/plugin.xml
* (edit) src/plugin/scoring-depth/plugin.xml
* (edit) 
src/plugin/scoring-depth/src/java/org/apache/nutch/scoring/depth/DepthScoringFilter.java
* (edit) 
src/plugin/headings/src/test/org/apache/nutch/parse/headings/TestHeadingsParseFilter.java
* (edit) src/java/org/apache/nutch/webui/pages/settings/SettingsPage.java
* (edit) src/plugin/index-replace/plugin.xml
* (edit) src/java/org/apache/nutch/indexer/IndexWriterParams.java
* (edit) 
src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/package-info.java
* (edit) src/java/org/apache/nutch/tools/WARCUtils.java
* (edit) src/test/org/apache/nutch/crawl/TODOTestCrawlDbStates.java
* (edit) src/java/org/apache/nutch/scoring/AbstractScoringFilter.java
* (edit) src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java


> Add missing Apache license headers
> --
>
> Key: NUTCH-2659
> URL: https://issues.apache.org/jira/browse/NUTCH-2659
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.15
>Reporter: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.16
>
>
> Should add Apache license headers to source files (at least, *.java) - some 
> files lack the license header.





Build failed in Jenkins: Nutch-trunk #3577

2018-10-23 Thread Apache Jenkins Server


Changes:

[snagel] NUTCH-2659 Add missing Apache license headers

--
[...truncated 349.50 KB...]
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.opennlp#opennlp-tools;1.9.0!opennlp-tools.jar(bundle) (43ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/commons-io/commons-io/2.6/commons-io-2.6.jar ...
[ivy:resolve] ... (209kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] commons-io#commons-io;2.6!commons-io.jar (24ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/github/openjson/openjson/1.0.10/openjson-1.0.10.jar
 ...
[ivy:resolve] .. (26kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] com.github.openjson#openjson;1.0.10!openjson.jar 
(21ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar ...
[ivy:resolve]  (235kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] com.google.code.gson#gson;2.8.5!gson.jar (25ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/slf4j/jul-to-slf4j/1.7.25/jul-to-slf4j-1.7.25.jar
 ...
[ivy:resolve] .. (4kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.slf4j#jul-to-slf4j;1.7.25!jul-to-slf4j.jar 
(21ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0.jar ...
[ivy:resolve]  (923kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] net.java.dev.jna#jna;4.3.0!jna.jar (39ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/jsoup/jsoup/1.11.3/jsoup-1.11.3.jar ...
[ivy:resolve] .. (386kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.jsoup#jsoup;1.11.3!jsoup.jar (27ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/httpcomponents/httpmime/4.5.6/httpmime-4.5.6.jar
 ...
[ivy:resolve] ... (40kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.httpcomponents#httpmime;4.5.6!httpmime.jar (22ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/core/sis-utility/0.8/sis-utility-0.8.jar
 ...
[ivy:resolve] .. (812kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.core#sis-utility;0.8!sis-utility.jar(bundle) (33ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/storage/sis-netcdf/0.8/sis-netcdf-0.8.jar
 ...
[ivy:resolve] . (71kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.storage#sis-netcdf;0.8!sis-netcdf.jar(bundle) (22ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/core/sis-metadata/0.8/sis-metadata-0.8.jar
 ...
[ivy:resolve] .. (627kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.core#sis-metadata;0.8!sis-metadata.jar(bundle) (31ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/opengis/geoapi/3.0.1/geoapi-3.0.1.jar ...
[ivy:resolve] .. (211kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.opengis#geoapi;3.0.1!geoapi.jar(bundle) (24ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/uima/uimafit-core/2.2.0/uimafit-core-2.2.0.jar
 ...
[ivy:resolve]  (167kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.uima#uimafit-core;2.2.0!uimafit-core.jar (23ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/uima/uimaj-core/2.9.0/uimaj-core-2.9.0.jar
 ...
[ivy:resolve] 
... (1553kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.apache.uima#uimaj-core;2.9.0!uimaj-core.jar 
(52ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/jdom/jdom2/2.0.6/jdom2-2.0.6.jar ...
[ivy:resolve] .. (297kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.jdom#jdom2;2.0.6!jdom2.jar (26ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.9.6/jackson-core-2.9.6.jar
 ...
[ivy:resolve] ... (316kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
com.fasterxml.jackson.core#jackson-core;2.9.6!jackson-core.jar(bundle) (27ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.9.6/jackson-databind-2.9.6.jar
 ...
[ivy:resolve]  
(1317kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
com.fasterxml.jackson.core#jackson-databind;2.9.6!jackson-databind.jar(bundle) 
(45ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.9.6/jackson-annotations-2.9.6.jar
 ...
[ivy:resolve]  (65kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
com.fasterxml.jackson.core#jackson-annotations;2.9.6!jackson-annotations.jar(bundle)
 (26ms)

[jira] [Commented] (NUTCH-2655) Update Solr schema.xml for Solr 7.x

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661285#comment-16661285
 ] 

ASF GitHub Bot commented on NUTCH-2655:
---

jorgelbg commented on issue #395: NUTCH-2655 Update Solr schema.xml for Solr 7.x
URL: https://github.com/apache/nutch/pull/395#issuecomment-432417798
 
 
   +1 LGTM









[jira] [Commented] (NUTCH-2661) Move TestOutlinks to the proper path

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661261#comment-16661261
 ] 

Hudson commented on NUTCH-2661:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3576 (See 
[https://builds.apache.org/job/Nutch-trunk/3576/])
NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path 
(jorge-luis.betancourt: 
[https://github.com/apache/nutch/commit/c8fcb78890df4e917a1d36ae945a1e01e02c0c78])
* (delete) 
src/plugin/index-links/src/test/org/apache/nutch/parse/TestOutlinks.java
* (add) src/test/org/apache/nutch/parse/TestOutlinks.java


> Move TestOutlinks to the proper path
> 
>
> Key: NUTCH-2661
> URL: https://issues.apache.org/jira/browse/NUTCH-2661
> Project: Nutch
>  Issue Type: Improvement
>Reporter: Jorge Luis Betancourt Gonzalez
>Assignee: Jorge Luis Betancourt Gonzalez
>Priority: Trivial
> Fix For: 1.16
>
>
> Initially, I placed the {{TestOutlinks}} class in the index-links plugin, 
> as that was when I found the bug with the {{hashCode}}. Now I realise 
> that this test is better placed in the {{src/test/org/apache/nutch/parse}} 
> directory, especially since it does not cover any plugin-specific code.
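A {{hashCode}} problem of the kind mentioned above is usually caught by a test that checks the equals/hashCode contract. The sketch below illustrates such a check under stated assumptions: Link and LinkContractCheck are hypothetical classes written for this example, not the Nutch Outlink class or the actual TestOutlinks test.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for a link value class; not the Nutch Outlink class. */
class Link {
  private final String toUrl;
  private final String anchor;

  Link(String toUrl, String anchor) {
    this.toUrl = toUrl;
    this.anchor = anchor;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Link)) return false;
    Link other = (Link) o;
    return Objects.equals(toUrl, other.toUrl) && Objects.equals(anchor, other.anchor);
  }

  // Uses the same fields as equals(); otherwise equal links could land in
  // different hash buckets and a HashSet would keep both of them.
  @Override
  public int hashCode() {
    return Objects.hash(toUrl, anchor);
  }
}

/** Hypothetical helper used to check de-duplication behaviour. */
class LinkContractCheck {
  /** Returns how many distinct links remain after adding them to a HashSet. */
  static int distinctCount(Link... links) {
    Set<Link> set = new HashSet<>();
    for (Link link : links) {
      set.add(link);
    }
    return set.size();
  }
}
```

Because such a check exercises only the value class itself, it needs no plugin on the classpath, which fits the move to the core test directory.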





Build failed in Jenkins: Nutch-trunk #3576

2018-10-23 Thread Apache Jenkins Server


Changes:

[jorge-luis.betancourt] NUTCH-2661 Move the TestOutlinks class into the 
o.a.n.parse path

--
[...truncated 349.12 KB...]
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.opennlp#opennlp-tools;1.9.0!opennlp-tools.jar(bundle) (185ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/commons-io/commons-io/2.6/commons-io-2.6.jar ...
[ivy:resolve] . (209kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] commons-io#commons-io;2.6!commons-io.jar (146ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/github/openjson/openjson/1.0.10/openjson-1.0.10.jar
 ...
[ivy:resolve] . (26kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] com.github.openjson#openjson;1.0.10!openjson.jar 
(131ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/google/code/gson/gson/2.8.5/gson-2.8.5.jar ...
[ivy:resolve] .. (235kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] com.google.code.gson#gson;2.8.5!gson.jar (26ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/slf4j/jul-to-slf4j/1.7.25/jul-to-slf4j-1.7.25.jar
 ...
[ivy:resolve] .. (4kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.slf4j#jul-to-slf4j;1.7.25!jul-to-slf4j.jar 
(21ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/net/java/dev/jna/jna/4.3.0/jna-4.3.0.jar ...
[ivy:resolve] ... (923kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] net.java.dev.jna#jna;4.3.0!jna.jar (38ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/jsoup/jsoup/1.11.3/jsoup-1.11.3.jar ...
[ivy:resolve] .. (386kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.jsoup#jsoup;1.11.3!jsoup.jar (28ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/httpcomponents/httpmime/4.5.6/httpmime-4.5.6.jar
 ...
[ivy:resolve] .. (40kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.httpcomponents#httpmime;4.5.6!httpmime.jar (21ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/core/sis-utility/0.8/sis-utility-0.8.jar
 ...
[ivy:resolve] ... (812kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.core#sis-utility;0.8!sis-utility.jar(bundle) (37ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/storage/sis-netcdf/0.8/sis-netcdf-0.8.jar
 ...
[ivy:resolve] ... (71kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.storage#sis-netcdf;0.8!sis-netcdf.jar(bundle) (29ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/sis/core/sis-metadata/0.8/sis-metadata-0.8.jar
 ...
[ivy:resolve] .. (627kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.sis.core#sis-metadata;0.8!sis-metadata.jar(bundle) (31ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/opengis/geoapi/3.0.1/geoapi-3.0.1.jar ...
[ivy:resolve] ... (211kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.opengis#geoapi;3.0.1!geoapi.jar(bundle) (24ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/uima/uimafit-core/2.2.0/uimafit-core-2.2.0.jar
 ...
[ivy:resolve]  (167kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.uima#uimafit-core;2.2.0!uimafit-core.jar (123ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/uima/uimaj-core/2.9.0/uimaj-core-2.9.0.jar
 ...
[ivy:resolve] 
... (1553kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.apache.uima#uimaj-core;2.9.0!uimaj-core.jar 
(55ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/jdom/jdom2/2.0.6/jdom2-2.0.6.jar ...
[ivy:resolve] .. (297kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.jdom#jdom2;2.0.6!jdom2.jar (26ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.9.6/jackson-core-2.9.6.jar
 ...
[ivy:resolve] ... (316kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
com.fasterxml.jackson.core#jackson-core;2.9.6!jackson-core.jar(bundle) (37ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.9.6/jackson-databind-2.9.6.jar
 ...
[ivy:resolve] . 
(1317kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
com.fasterxml.jackson.core#jackson-databind;2.9.6!jackson-databind.jar(bundle) 
(44ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.9.6/jackson-annotations-2.9.6.jar
 ...
[ivy:resolve]  (65kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 

[jira] [Commented] (NUTCH-2659) Add missing Apache license headers

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661253#comment-16661253
 ] 

ASF GitHub Bot commented on NUTCH-2659:
---

jorgelbg commented on issue #396: NUTCH-2659 Add missing Apache license headers
URL: https://github.com/apache/nutch/pull/396#issuecomment-432412749
 
 
   +1









[jira] [Commented] (NUTCH-2659) Add missing Apache license headers

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661254#comment-16661254
 ] 

ASF GitHub Bot commented on NUTCH-2659:
---

jorgelbg closed pull request #396: NUTCH-2659 Add missing Apache license headers
URL: https://github.com/apache/nutch/pull/396
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/src/java/org/apache/nutch/indexer/IndexWriterParams.java 
b/src/java/org/apache/nutch/indexer/IndexWriterParams.java
index cc91ec02a..952dc9ee1 100644
--- a/src/java/org/apache/nutch/indexer/IndexWriterParams.java
+++ b/src/java/org/apache/nutch/indexer/IndexWriterParams.java
@@ -1,3 +1,20 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.nutch.indexer;
 
 import org.apache.hadoop.util.StringUtils;
diff --git a/src/java/org/apache/nutch/scoring/AbstractScoringFilter.java 
b/src/java/org/apache/nutch/scoring/AbstractScoringFilter.java
index d74c7fbe1..cd592740d 100644
--- a/src/java/org/apache/nutch/scoring/AbstractScoringFilter.java
+++ b/src/java/org/apache/nutch/scoring/AbstractScoringFilter.java
@@ -1,3 +1,20 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.nutch.scoring;
 
 import java.util.Collection;
diff --git a/src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java 
b/src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java
index 6f89b16f2..27f119806 100644
--- a/src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java
+++ b/src/java/org/apache/nutch/tools/CommonCrawlFormatWARC.java
@@ -1,3 +1,20 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
 package org.apache.nutch.tools;
 
 import java.io.ByteArrayInputStream;
diff --git a/src/java/org/apache/nutch/tools/WARCUtils.java 
b/src/java/org/apache/nutch/tools/WARCUtils.java
index a705ae7c1..dab3ba764 100644
--- a/src/java/org/apache/nutch/tools/WARCUtils.java
+++ b/src/java/org/apache/nutch/tools/WARCUtils.java
@@ -1,3 +1,20 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * 

[jira] [Commented] (NUTCH-2661) Move TestOutlinks to the proper path

2018-10-23 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661199#comment-16661199
 ] 

ASF GitHub Bot commented on NUTCH-2661:
---

jorgelbg closed pull request #399: NUTCH-2661 Move the TestOutlinks class into 
the o.a.n.parse path
URL: https://github.com/apache/nutch/pull/399
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/src/plugin/index-links/src/test/org/apache/nutch/parse/TestOutlinks.java 
b/src/test/org/apache/nutch/parse/TestOutlinks.java
similarity index 100%
rename from 
src/plugin/index-links/src/test/org/apache/nutch/parse/TestOutlinks.java
rename to src/test/org/apache/nutch/parse/TestOutlinks.java


 




> Move TestOutlinks to the proper path
> 
>
> Key: NUTCH-2661
> URL: https://issues.apache.org/jira/browse/NUTCH-2661
> Project: Nutch
>  Issue Type: Improvement
>Reporter: Jorge Luis Betancourt Gonzalez
>Assignee: Jorge Luis Betancourt Gonzalez
>Priority: Trivial
> Fix For: 1.16
>
>
> Initially, I placed the {{TestOutlinks}} class in the index-links plugin, 
> because that was when I found the bug with {{hashCode}}. I now realise that 
> this test belongs in the {{test/org/apache/nutch/parse}} directory, all the 
> more so because it does not cover any plugin-specific code.





[jira] [Updated] (NUTCH-2666) increase default value for http.content.limit

2018-10-23 Thread Marco Ebbinghaus (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Ebbinghaus updated NUTCH-2666:

Description: 
The default value for http.content.limit in nutch-default.xml ("The length 
limit for downloaded content using the http:// protocol, in bytes. If this 
value is nonnegative (>=0), content longer than it will be truncated; 
otherwise, no truncation at all. Do not confuse this setting with the 
file.content.limit setting.") is set to 64 kB. Maybe this default value should 
be increased, as many pages today are larger than 64 kB.

I ran into this when crawling a single website whose pages are much larger 
than 64 kB: with every crawl cycle the count of db_unfetched URLs decreased 
until it hit zero and the crawler became inactive, because the first 64 kB of 
each page always contained the same set of navigation links.

The description might also be updated, as the limit applies not only to the 
http protocol but also to https.

  was:
The default value for http.content.limit in nutch-default.xml ("The length 
limit for downloaded content using the http:// protocol, in bytes. If this 
value is nonnegative (>=0), content longer than it will be truncated; 
otherwise, no truncation at all. Do not confuse this setting with the 
file.content.limit setting.") is set to 64 kB. Maybe this default value should 
be increased, as many pages today are larger than 64 kB.

The description might also be updated, as the limit applies not only to the 
http protocol but also to https.


> increase default value for http.content.limit
> -
>
> Key: NUTCH-2666
> URL: https://issues.apache.org/jira/browse/NUTCH-2666
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 1.15
>Reporter: Marco Ebbinghaus
>Priority: Minor
>
> The default value for http.content.limit in nutch-default.xml ("The length 
> limit for downloaded content using the http:// protocol, in bytes. If this 
> value is nonnegative (>=0), content longer than it will be truncated; 
> otherwise, no truncation at all. Do not confuse this setting with the 
> file.content.limit setting.") is set to 64 kB. Maybe this default value 
> should be increased, as many pages today are larger than 64 kB.
> I ran into this when crawling a single website whose pages are much larger 
> than 64 kB: with every crawl cycle the count of db_unfetched URLs decreased 
> until it hit zero and the crawler became inactive, because the first 64 kB 
> of each page always contained the same set of navigation links.
> The description might also be updated, as the limit applies not only to the 
> http protocol but also to https.
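
The stall described above can be reproduced with a toy model. The sketch below is a simulation under stated assumptions, not Nutch code: the class `CrawlStallSketch`, the page names, and the site layout (navigation links at the top of every page, article links below the truncation point) are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CrawlStallSketch {

    // Hypothetical site layout: these links sit at the top of every page.
    static final List<String> NAV_LINKS = Arrays.asList("/home", "/about", "/contact");

    // Hypothetical article links that sit below the truncation point.
    static final List<String> BODY_LINKS = Arrays.asList(
            "/article-1", "/article-2", "/article-3", "/article-4", "/article-5");

    // Outlinks the parser can see; truncation hides everything after the navigation.
    static List<String> outlinks(boolean truncated) {
        List<String> links = new ArrayList<>(NAV_LINKS);
        if (!truncated) {
            links.addAll(BODY_LINKS);
        }
        return links;
    }

    // Fetches until no unfetched URL is left and returns the number of pages fetched.
    static int crawl(String seed, boolean truncated) {
        Set<String> known = new HashSet<>();
        Deque<String> unfetched = new ArrayDeque<>();
        known.add(seed);
        unfetched.add(seed);
        int fetched = 0;
        while (!unfetched.isEmpty()) {
            unfetched.poll();
            fetched++;
            for (String link : outlinks(truncated)) {
                if (known.add(link)) {   // only newly discovered URLs are queued
                    unfetched.add(link);
                }
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        System.out.println("truncated:   " + crawl("/home", true));
        System.out.println("untruncated: " + crawl("/home", false));
    }
}
```

In this model the truncated crawl stops after the three navigation pages, while the untruncated crawl also reaches the five article pages.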





[jira] [Updated] (NUTCH-2666) increase default value for http.content.limit

2018-10-23 Thread Marco Ebbinghaus (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Ebbinghaus updated NUTCH-2666:

Description: 
The default value for http.content.limit in nutch-default.xml ("The length 
limit for downloaded content using the http:// protocol, in bytes. If this 
value is nonnegative (>=0), content longer than it will be truncated; 
otherwise, no truncation at all. Do not confuse this setting with the 
file.content.limit setting.") is set to 64 kB. Maybe this default value should 
be increased, as many pages today are larger than 64 kB.

The description might also be updated, as the limit applies not only to the 
http protocol but also to https.

  was:
The default value for http.content.limit ("The length limit for downloaded 
content using the http:// protocol, in bytes. If this value is nonnegative 
(>=0), content longer than it will be truncated; otherwise, no truncation at 
all. Do not confuse this setting with the file.content.limit setting.") is set 
to 64 kB. Maybe this default value should be increased, as many pages today 
are larger than 64 kB.

The description might also be updated, as the limit applies not only to the 
http protocol but also to https.


> increase default value for http.content.limit
> -
>
> Key: NUTCH-2666
> URL: https://issues.apache.org/jira/browse/NUTCH-2666
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 1.15
>Reporter: Marco Ebbinghaus
>Priority: Minor
>
> The default value for http.content.limit in nutch-default.xml ("The length 
> limit for downloaded content using the http:// protocol, in bytes. If this 
> value is nonnegative (>=0), content longer than it will be truncated; 
> otherwise, no truncation at all. Do not confuse this setting with the 
> file.content.limit setting.") is set to 64 kB. Maybe this default value 
> should be increased, as many pages today are larger than 64 kB.
> The description might also be updated, as the limit applies not only to the 
> http protocol but also to https.





[jira] [Created] (NUTCH-2666) increase default value for http.content.limit

2018-10-23 Thread Marco Ebbinghaus (JIRA)
Marco Ebbinghaus created NUTCH-2666:
---

 Summary: increase default value for http.content.limit
 Key: NUTCH-2666
 URL: https://issues.apache.org/jira/browse/NUTCH-2666
 Project: Nutch
  Issue Type: Improvement
  Components: fetcher
Affects Versions: 1.15
Reporter: Marco Ebbinghaus


The default value for http.content.limit ("The length limit for downloaded 
content using the http:// protocol, in bytes. If this value is nonnegative 
(>=0), content longer than it will be truncated; otherwise, no truncation at 
all. Do not confuse this setting with the file.content.limit setting.") is set 
to 64 kB. Maybe this default value should be increased, as many pages today 
are larger than 64 kB.

The description might also be updated, as the limit applies not only to the 
http protocol but also to https.
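
The truncation rule quoted above can be sketched as follows. This is a minimal illustration, not Nutch's actual fetcher code: the class `ContentLimitSketch` and the method `applyLimit` are hypothetical names, and the value 64 * 1024 assumes that "64kb" means 65536 bytes.

```java
import java.util.Arrays;

public class ContentLimitSketch {

    // Hypothetical helper, not part of Nutch: applies a content limit with
    // the semantics given in the property description.
    static byte[] applyLimit(byte[] content, int limit) {
        // A negative limit means no truncation at all; content that already
        // fits within the limit is returned unchanged.
        if (limit < 0 || content.length <= limit) {
            return content;
        }
        // A nonnegative limit keeps only the first `limit` bytes.
        return Arrays.copyOf(content, limit);
    }

    public static void main(String[] args) {
        byte[] page = new byte[200 * 1024]; // a page of 200 * 1024 bytes
        System.out.println(applyLimit(page, 64 * 1024).length); // truncated
        System.out.println(applyLimit(page, -1).length);        // untouched
    }
}
```

Anything a page carries beyond the limit, including links further down the page, never reaches the parser under this rule.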





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660625#comment-16660625
 ] 

Markus Jelsma commented on NUTCH-2665:
--

I'll commit this one later today, if I don't forget, unless there are further 
objections.


> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Updated] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2665:
-
Attachment: NUTCH-2665.patch

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660525#comment-16660525
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Updated patch defining the property in ivysettings.xml.

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch, NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Sebastian Nagel (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660518#comment-16660518
 ] 

Sebastian Nagel commented on NUTCH-2665:


+1 Thanks, [~markus17]!
For 1.x I needed several attempts to get the fix for the javax.ws dependency 
working on the [Jenkins builds|https://builds.apache.org/job/Nutch-trunk/]. 
Defining packaging.type=jar in default.properties didn't work, nor did passing 
it as an Ant parameter (equivalent to {{ant -Dpackaging.type=jar ...}}). 
Defining the property in ivysettings.xml finally solved it, see 
[65c4fed|https://gitbox.apache.org/repos/asf?p=nutch.git;a=commitdiff;h=65c4fedfacdb873a050e97a50602ed366c7b5a98].
Can you integrate this change into your patch?

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Commented] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16660455#comment-16660455
 ] 

Markus Jelsma commented on NUTCH-2665:
--

Patch for 2.x!

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Updated] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2665:
-
Attachment: NUTCH-2665.patch

> Upgrade to Apache Tika 1.19.1
> -
>
> Key: NUTCH-2665
> URL: https://issues.apache.org/jira/browse/NUTCH-2665
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Affects Versions: 2.3.1
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 2.4
>
> Attachments: NUTCH-2665.patch
>
>
> Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
> Apache Tika 1.19.1 as well.





[jira] [Created] (NUTCH-2665) Upgrade to Apache Tika 1.19.1

2018-10-23 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-2665:


 Summary: Upgrade to Apache Tika 1.19.1
 Key: NUTCH-2665
 URL: https://issues.apache.org/jira/browse/NUTCH-2665
 Project: Nutch
  Issue Type: Task
  Components: parser
Affects Versions: 2.3.1
Reporter: Markus Jelsma
Assignee: Markus Jelsma
 Fix For: 2.4


Borrowing from [~wastl-nagel]'s efforts on NUTCH-2651, 2.x can be upgraded to 
Apache Tika 1.19.1 as well.


