Re: Unable to index on Hadoop 3.2.0 with 1.16

2020-08-12 Thread Sebastian Nagel
Hi Joe,

> I eliminated it when I updated the index-writers.xml for the solr_indexer_1
> to use only a single URL.

Thanks for the hint. I'm able to reproduce the error by adding an overlong URL 
to
  


Could you open an issue to fix this on
https://issues.apache.org/jira/projects/NUTCH ?

Thanks!

Best,
Sebastian


On 8/12/20 5:35 PM, Gilvary, Joseph wrote:
> Hi,
> 
> I wasn't on the list when this discussion happened, so I hope this will 
> thread correctly in archives. I linked to the archive below and tried to 
> include enough here to ensure searchers can find it if this won't thread.
> 
> I was getting an error with Nutch 1.17.  I never used 1.16, but upgraded from 
> 1.15 recently.
> 
> java.lang.Exception: java.lang.IllegalStateException: text width is less than 1, was <-26>
> at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
> Caused by: java.lang.IllegalStateException: text width is less than 1, was <-26>
> at org.apache.commons.lang3.Validate.validState(Validate.java:829)
> at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
> at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
> at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
> at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
> at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
> at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
> at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
> at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
> at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 
> This looks like the error that Markus Jelsma described in the earlier 
> discussion, though the invalid text width in my case was -26. I eliminated it 
> when I updated the index-writers.xml for the solr_indexer_1 to use only a 
> single URL. I don't know where the -26 comes from or the -41 Markus was 
> getting, but the fact that they were different values told me that the issue 
> would be in the site-specific differences in our configs.
> 
> Adding the link in the archive where I found the earlier discussion:
> http://mail-archives.apache.org/mod_mbox/nutch-user/201910.mbox/%3c05eda22b-14b2-309f-3bc7-d6d85c218...@googlemail.com%3E
> 
> Adding the only potentially relevant Jira link I found while searching:
> https://issues.apache.org/jira/browse/NUTCH-2602
> 
> It seems potentially relevant because Markus started getting the error after 
> migrating to 1.16 and I started getting it when I went from 1.15 to 1.17.
> 
> Thanks. Stay safe, stay healthy,
> 
> Joe
> 



Re: Unable to index on Hadoop 3.2.0 with 1.16

2020-08-12 Thread Gilvary, Joseph
Hi,

I wasn't on the list when this discussion happened, so I hope this will thread 
correctly in archives. I linked to the archive below and tried to include 
enough here to ensure searchers can find it if this won't thread.

I was getting an error with Nutch 1.17.  I never used 1.16, but upgraded from 
1.15 recently.

java.lang.Exception: java.lang.IllegalStateException: text width is less than 1, was <-26>
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.IllegalStateException: text width is less than 1, was <-26>
at org.apache.commons.lang3.Validate.validState(Validate.java:829)
at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

This looks like the error that Markus Jelsma described in the earlier 
discussion, though the invalid text width in my case was -26. I eliminated it 
when I updated the index-writers.xml for the solr_indexer_1 to use only a 
single URL. I don't know where the -26 comes from or the -41 Markus was 
getting, but the fact that they were different values told me that the issue 
would be in the site-specific differences in our configs.
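
One way such site-specific negative widths could arise is sketched below. This is a hypothetical model, not the asciitable or Nutch implementation: it assumes a fixed total table width (80 is a guess), that the name and value columns take the width of their longest cell, and that the last column receives whatever is left. The class name, method names, and example URLs are made up for illustration.

```java
public class ColumnWidthSketch {

    // Length of the longest cell in a column.
    static int longest(String[] cells) {
        int max = 0;
        for (String cell : cells) {
            max = Math.max(max, cell.length());
        }
        return max;
    }

    // ASSUMPTION: the first two columns are as wide as their longest cell
    // and the last column gets whatever remains of a fixed table width.
    static int remainingWidth(int tableWidth, String[] names, String[] values) {
        return tableWidth - longest(names) - longest(values);
    }

    public static void main(String[] args) {
        int tableWidth = 80; // assumed value, not taken from the thread
        String[] names = {"type", "urls"};
        // Hypothetical parameter values: one short URL vs. a long URL list.
        String[] oneUrl = {"http", "http://localhost:8983/solr/nutch"};
        String[] manyUrls = {"http",
            "http://solr-a.example.org:8983/solr/nutch,"
            + "http://solr-b.example.org:8983/solr/nutch,"
            + "http://solr-c.example.org:8983/solr/nutch"};
        System.out.println("single URL: " + remainingWidth(tableWidth, names, oneUrl));
        System.out.println("URL list:   " + remainingWidth(tableWidth, names, manyUrls));
    }
}
```

Under these assumptions the remainder stays positive for a single short URL and drops below 1 for a long comma-separated list, by an amount that depends on the configured value. That would be consistent with two sites seeing different negative numbers.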

Adding the link in the archive where I found the earlier discussion:
http://mail-archives.apache.org/mod_mbox/nutch-user/201910.mbox/%3c05eda22b-14b2-309f-3bc7-d6d85c218...@googlemail.com%3E

Adding the only potentially relevant Jira link I found while searching:
https://issues.apache.org/jira/browse/NUTCH-2602

It seems potentially relevant because Markus started getting the error after 
migrating to 1.16 and I started getting it when I went from 1.15 to 1.17.

Thanks. Stay safe, stay healthy,

Joe


Re: Unable to index on Hadoop 3.2.0 with 1.16

2019-10-22 Thread Sebastian Nagel

Hi Markus,

any updates on this? Just to make sure the issue gets resolved.

Thanks,
Sebastian

On 14.10.19 17:08, Markus Jelsma wrote:

Hello,

We're upgrading our stuff to 1.16 and got a peculiar problem when we started 
indexing:

2019-10-14 13:50:30,586 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: text width is less than 1, was <-41>
at org.apache.commons.lang3.Validate.validState(Validate.java:829)
at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

The only IndexWriter we use is SolrIndexer, and locally everything is just fine. 


Any thoughts?

Thanks,
Markus





Re: Unable to index on Hadoop 3.2.0 with 1.16

2019-10-14 Thread Sebastian Nagel
Hi Markus,

I've tested in pseudo-distributed mode with Hadoop 3.2.1,
including indexing into Solr. It worked.

Could be a dependency version issue similar to that
causing NUTCH-2706. But that's only an assumption.

Since IndexWriters.describe() is for help only,
I would just deactivate this method and open an issue to
investigate the reason. We also need to think about when and
where to output the index writer options. It may be better to
call the describe() methods of the indexer plugins explicitly
via IndexingJob --help or similar.
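
Because the description is help output only, a less drastic alternative to deactivating it would be to guard the call so that a rendering failure is reported instead of aborting the task. The following is a minimal sketch, not Nutch code: SafeDescribe and describeSafely are hypothetical names, and the Supplier stands in for whatever actually renders the description.

```java
import java.util.function.Supplier;

public class SafeDescribe {

    // Runs a help-only description supplier. If rendering throws a
    // RuntimeException, returns a short fallback message instead of
    // letting the exception propagate to the caller.
    static String describeSafely(Supplier<String> describe) {
        try {
            return describe.get();
        } catch (RuntimeException e) {
            return "(index writer description unavailable: " + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        // Normal case: the description is passed through unchanged.
        System.out.println(describeSafely(() -> "solr_indexer_1: ok"));

        // Failure case: simulates the exception seen in this thread.
        System.out.println(describeSafely(() -> {
            throw new IllegalStateException("text width is less than 1, was <-41>");
        }));
    }
}
```

Catching only RuntimeException keeps the guard narrow: errors unrelated to rendering the description, such as checked I/O failures while opening the index writer, would still surface.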

Best,
Sebastian

On 14.10.19 17:08, Markus Jelsma wrote:
> Hello,
> 
> We're upgrading our stuff to 1.16 and got a peculiar problem when we started 
> indexing:
> 
> 2019-10-14 13:50:30,586 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: text width is less than 1, was <-41>
>   at org.apache.commons.lang3.Validate.validState(Validate.java:829)
>   at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
>   at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
>   at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
>   at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
>   at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
>   at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
>   at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 
> The only IndexWriter we use is SolrIndexer, and locally everything is just 
> fine. 
> 
> Any thoughts?
> 
> Thanks,
> Markus
> 



Unable to index on Hadoop 3.2.0 with 1.16

2019-10-14 Thread Markus Jelsma
Hello,

We're upgrading our stuff to 1.16 and got a peculiar problem when we started 
indexing:

2019-10-14 13:50:30,586 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: text width is less than 1, was <-41>
at org.apache.commons.lang3.Validate.validState(Validate.java:829)
at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

The only IndexWriter we use is SolrIndexer, and locally everything is just 
fine. 

Any thoughts?

Thanks,
Markus