[jira] [Commented] (NUTCH-2824) urlnormalizer-basic to unescape percent-encoded host names

2020-10-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220422#comment-17220422
 ] 

Hudson commented on NUTCH-2824:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #10 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/10/])
NUTCH-2824 urlnormalizer-basic to unescape percent-encoded host names (snagel: 
[https://github.com/apache/nutch/commit/66f50be86d870ddd9b420334bfa8f6c0f9a79ce6])
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java
* (edit) 
src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
NUTCH-2824 urlnormalizer-basic to unescape percent-encoded host names (snagel: 
[https://github.com/apache/nutch/commit/44cdb20469989fc865ec062daabdf606974c48fc])
* (edit) 
src/plugin/urlnormalizer-basic/src/test/org/apache/nutch/net/urlnormalizer/basic/TestBasicURLNormalizer.java


> urlnormalizer-basic to unescape percent-encoded host names
> --
>
> Key: NUTCH-2824
> URL: https://issues.apache.org/jira/browse/NUTCH-2824
> Project: Nutch
> Issue Type: Bug
> Components: plugin, urlnormalizer
> Affects Versions: 1.17
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.18
>
>
> BasicURLNormalizer should unescape percent-encoded characters in host names, 
> similar to what web browsers do. Examples: 
> https://example%2Ecom/ should become https://example.com/, and 
> https://www.0251-sachverst%c3%a4ndiger.de/ should become 
> https://www.xn--0251-sachverstndiger-ozb.de/
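
To illustrate the behavior requested in the issue, here is a minimal sketch of unescaping a percent-encoded host name and converting the result to its ASCII (Punycode) form. It is not the code from the actual commits: the class name HostUnescapeSketch and the method unescapeHost are hypothetical, and the sketch assumes that escaped bytes are UTF-8 and that malformed escapes are left untouched.

```java
import java.io.ByteArrayOutputStream;
import java.net.IDN;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Nutch.
class HostUnescapeSketch {

  /** Decodes %XX escapes in a host name (assumed UTF-8) and returns the IDN ASCII form. */
  static String unescapeHost(String host) {
    StringBuilder out = new StringBuilder();
    ByteArrayOutputStream pending = new ByteArrayOutputStream();
    for (int i = 0; i < host.length(); i++) {
      char c = host.charAt(i);
      if (c == '%' && i + 2 < host.length()) {
        int hi = Character.digit(host.charAt(i + 1), 16);
        int lo = Character.digit(host.charAt(i + 2), 16);
        if (hi >= 0 && lo >= 0) {
          // Collect consecutive escaped bytes so multi-byte UTF-8 sequences decode together.
          pending.write((hi << 4) | lo);
          i += 2;
          continue;
        }
      }
      flush(pending, out);
      out.append(c); // unescaped character (or malformed escape) is copied as-is
    }
    flush(pending, out);
    // Lowercase, then convert any non-ASCII labels to Punycode ("xn--...").
    return IDN.toASCII(out.toString().toLowerCase(Locale.ROOT));
  }

  private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
    if (pending.size() > 0) {
      out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
      pending.reset();
    }
  }

  public static void main(String[] args) {
    System.out.println(unescapeHost("example%2Ecom"));
    System.out.println(unescapeHost("www.0251-sachverst%c3%a4ndiger.de"));
  }
}
```

Note that java.net.IDN.toASCII throws IllegalArgumentException for input it cannot convert, so a real normalizer would presumably need to decide whether to keep or reject such URLs.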



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUTCH-2823) IllegalStateException in IndexWriters.describe() when validating url param for SolrIndexer

2020-10-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17220421#comment-17220421
 ] 

Hudson commented on NUTCH-2823:
---

FAILURE: Integrated in Jenkins build Nutch » Nutch-trunk #10 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/10/])
NUTCH-2823 IllegalStateException in IndexWriters.describe() when validating url 
param for SolrIndexer (snagel: 
[https://github.com/apache/nutch/commit/96bd7577b7276c91f01e6b226742805b481151b4])
* (edit) src/java/org/apache/nutch/indexer/IndexWriters.java


> IllegalStateException in IndexWriters.describe() when validating url param 
> for SolrIndexer
> --
>
> Key: NUTCH-2823
> URL: https://issues.apache.org/jira/browse/NUTCH-2823
> Project: Nutch
> Issue Type: Bug
> Components: indexer, plugin
> Affects Versions: 1.16, 1.17
> Reporter: Joe Gilvary
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.18
>
>
> The string validation in IndexWriters.describe() fails when the value in 
> index-writers.xml is too long.
> I encountered the exception when using three comma-separated URL values in a 
> config that worked for Nutch 1.15. The schema doesn't allow multiple values, 
> but the documentation says a comma-separated list works.
> Indexing ran without the exception when I changed to use only one host's URL 
> (Solr Cloud). Sebastian reproduced the error with a long string value for the 
> param, so it's not directly due to the comma-separated values.
> While googling I found this thread in the archives where Markus encountered 
> it going from 1.15 to 1.16:
> mail-archives.apache.org/mod_mbox/nutch-user/201910.mbox/<05eda22b-14b2-309f-3bc7-d6d85c218...@googlemail.com>
> I also found a change in 1.16 that might be relevant: NUTCH-2602
>  https://issues.apache.org/jira/browse/NUTCH-2602
> My stack trace:
> java.lang.Exception: java.lang.IllegalStateException: text width is less than 1, was <-26>
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
> Caused by: java.lang.IllegalStateException: text width is less than 1, was <-26>
>     at org.apache.commons.lang3.Validate.validState(Validate.java:829)
>     at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
>     at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
>     at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
>     at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
>     at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
>     at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.(ReduceTask.java:542)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>  
>  Thanks,
>  Joe





Build failed in Jenkins: Nutch » Nutch-trunk #10

2020-10-25 Thread Apache Jenkins Server
See 


Changes:

[Sebastian Nagel] NUTCH-2823 IllegalStateException in IndexWriters.describe() 
when validating url param for SolrIndexer

[Sebastian Nagel] NUTCH-2824 urlnormalizer-basic to unescape percent-encoded 
host names

[Sebastian Nagel] NUTCH-2824 urlnormalizer-basic to unescape percent-encoded 
host names


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on H33 (ubuntu) in workspace 

No credentials specified
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/nutch.git # timeout=10
Fetching upstream changes from https://github.com/apache/nutch.git
 > git --version # timeout=10
 > git fetch --tags --progress -- https://github.com/apache/nutch.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git rev-parse refs/remotes/origin/master^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/master^{commit} # timeout=10
Checking out Revision 680df6ba1dc68ad5ede5fca743304593d4d5b0a3 
(refs/remotes/origin/master)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 680df6ba1dc68ad5ede5fca743304593d4d5b0a3 # timeout=10
Commit message: "Merge pull request #552 from sebastian-nagel/NUTCH-2824"
 > git rev-list --no-walk f3afee07af94beb43ac3c4ba784af69a7d4323dc # timeout=10
[Nutch-trunk] $ /home/jenkins/tools/ant/latest/bin/ant -file build.xml 
-Dtest.junit.output.format=xml clean nightly javadoc
Buildfile: 
Trying to override old definition of task javac
  [taskdef] Could not load definitions from resource 
org/apache/rat/anttasks/antlib.xml. It could not be found.
  [taskdef] Could not load definitions from resource 
edu/umd/cs/findbugs/anttask/tasks.properties. It could not be found.

clean-build:
   [delete] Deleting directory 


clean-default-lib:

clean-test-lib:

clean-lib:

clean-dist:

clean-runtime:

clean:

ivy-probe-antlib:

ivy-download:
  [taskdef] Could not load definitions from resource 
org/apache/rat/anttasks/antlib.xml. It could not be found.
  [taskdef] Could not load definitions from resource 
edu/umd/cs/findbugs/anttask/tasks.properties. It could not be found.

ivy-download-unchecked:

ivy-init-antlib:

ivy-init:

init:
[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 

[mkdir] Created dir: 


clean-default-lib:

resolve-default:
[ivy:resolve] :: Apache Ivy 2.5.0 - 20191020104435 :: 
https://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = 

  [taskdef] Could not load definitions from resource 
org/apache/rat/anttasks/antlib.xml. It could not be found.
  [taskdef] Could not load definitions from resource 
edu/umd/cs/findbugs/anttask/tasks.properties. It could not be found.

copy-libs:

compile-core:
[javac] Compiling 300 source files to 

[javac] :377: warning: [deprecation] WOULDBLOCK in ProtocolStatus has been deprecated
[javac] case ProtocolStatus.WOULDBLOCK:
[javac]^
[javac] :431: warning: [deprecation] BLOCKED in ProtocolStatus has been deprecated
[javac] case ProtocolStatus.BLOCKED:
[javac]^
[javac] :214: warning: [deprecation] open(Configuration,String) in IndexWriter has been deprecated
[javac]   entry.getValue().getIndexWriter().open(conf, name);
[javac]^
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 3 warnings
[javac] Creating empty