Re: Unable to index on Hadoop 3.2.0 with 1.16
Hi Joe,

> I eliminated it when I updated the index-writers.xml for the solr_indexer_1
> to use only a single URL.

Thanks for the hint. I'm able to reproduce the error by adding an overlong URL to [...]

Could you open an issue to fix this on https://issues.apache.org/jira/projects/NUTCH ?

Thanks!

Best,
Sebastian

On 8/12/20 5:35 PM, Gilvary, Joseph wrote:
> [...]
Re: Unable to index on Hadoop 3.2.0 with 1.16
Hi,

I wasn't on the list when this discussion happened, so I hope this will thread correctly in the archives. I've linked to the archive below and tried to include enough here that searchers can find it if it doesn't thread.

I was getting an error with Nutch 1.17. I never used 1.16; I upgraded from 1.15 recently.

java.lang.Exception: java.lang.IllegalStateException: text width is less than 1, was <-26>
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:492)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:559)
Caused by: java.lang.IllegalStateException: text width is less than 1, was <-26>
    at org.apache.commons.lang3.Validate.validState(Validate.java:829)
    at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
    at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
    at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
    at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
    at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:347)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

This looks like the error Markus Jelsma described in the earlier discussion, though the invalid text width in my case was -26. I eliminated it when I updated index-writers.xml so that solr_indexer_1 uses only a single URL. I don't know where the -26 comes from, or the -41 Markus was getting, but the fact that the values differ suggested to me that the issue lies in site-specific differences between our configs.

Here is the archive link where I found the earlier discussion:
http://mail-archives.apache.org/mod_mbox/nutch-user/201910.mbox/%3c05eda22b-14b2-309f-3bc7-d6d85c218...@googlemail.com%3E

The only potentially relevant Jira issue I found while searching:
https://issues.apache.org/jira/browse/NUTCH-2602

It seems potentially relevant because Markus started getting the error after migrating to 1.16, and I started getting it when I went from 1.15 to 1.17.

Thanks. Stay safe, stay healthy,

Joe
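Joe's observation that the negative width differs per site (-26 here, -41 for Markus), together with the fact that a single URL avoids it and an overlong URL reproduces it, would fit a simple arithmetic explanation. The sketch below models it under an assumption that is not confirmed in this thread: that `IndexWriters.describe()` gives the middle column of its table whatever width is left after subtracting the widths of the first and last columns from a fixed table width. The table width of 150, the URL, and the class and method names are all hypothetical, and the code does not use the asciitable library itself.

```java
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of a three-column table layout in which the middle
 * column receives whatever width the first and last columns leave over.
 * Illustration only; this is not Nutch's or asciitable's actual code.
 */
public class ColumnWidthSketch {

    /** Width left for the middle column; negative when values are too long. */
    public static int middleWidth(int tableWidth, List<String[]> rows) {
        int maxFirst = 0;
        int maxLast = 0;
        for (String[] row : rows) {
            maxFirst = Math.max(maxFirst, row[0].length()); // option name
            maxLast = Math.max(maxLast, row[2].length());   // option value
        }
        return tableWidth - maxFirst - maxLast;
    }

    public static void main(String[] args) {
        int tableWidth = 150; // hypothetical fixed render width
        String singleUrl = "http://localhost:8983/solr/nutch"; // hypothetical URL
        // Several URLs joined with commas, as in a multi-URL "url" parameter
        String manyUrls = String.join(",", Collections.nCopies(5, singleUrl));

        List<String[]> single = Collections.singletonList(
                new String[] {"url", "Solr URL(s)", singleUrl});
        List<String[]> many = Collections.singletonList(
                new String[] {"url", "Solr URL(s)", manyUrls});

        System.out.println("single URL -> middle width " + middleWidth(tableWidth, single));
        System.out.println("five URLs  -> middle width " + middleWidth(tableWidth, many));
    }
}
```

With these made-up values the single URL leaves 115 characters for the middle column, while five joined URLs leave -17. If the assumed layout rule is right, the exact negative number depends on each site's configured values, which would be consistent with the different widths reported in this thread.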
Re: Unable to index on Hadoop 3.2.0 with 1.16
Hi Markus,

any updates on this? Just to make sure the issue gets resolved.

Thanks,
Sebastian

On 14.10.19 17:08, Markus Jelsma wrote:
> Hello,
>
> We're upgrading our stuff to 1.16 and got a peculiar problem when we started
> indexing:
> [...]
Re: Unable to index on Hadoop 3.2.0 with 1.16
Hi Markus,

I've tested in pseudo-distributed mode with Hadoop 3.2.1, including indexing into Solr, and it worked. It could be a dependency version issue similar to the one causing NUTCH-2706, but that's only an assumption.

Since IndexWriters.describe() is for help only, I would just deactivate this method and open an issue to investigate the cause. We also need to think about when and where to output the index writer options. It may be better to call the describe() methods of the indexer plugins explicitly, via IndexingJob --help or similar.

Best,
Sebastian

On 14.10.19 17:08, Markus Jelsma wrote:
> Hello,
>
> We're upgrading our stuff to 1.16 and got a peculiar problem when we started
> indexing:
> [...]
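Sebastian's point that the index writer description is help output suggests that a rendering failure there should not abort a reduce task. A minimal sketch of that idea, under stated assumptions: `describeSafely` is a hypothetical helper, a `Supplier<String>` stands in for the real `IndexWriters.describe()` call, `java.util.logging` stands in for Nutch's own logging, and only `IllegalStateException` is caught because that is the exception seen in the traces in this thread. Printing the description only for `--help`, as also suggested, would be a separate change.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

public class SafeDescribeSketch {

    private static final Logger LOG = Logger.getLogger(SafeDescribeSketch.class.getName());

    /**
     * Hypothetical helper: returns the help-only description, or a fallback
     * string if rendering it fails, instead of propagating the exception.
     */
    public static String describeSafely(Supplier<String> describe) {
        try {
            return describe.get();
        } catch (IllegalStateException e) {
            // e.g. "text width is less than 1, was <-26>" from the table renderer
            LOG.warning("Could not render index writer description: " + e.getMessage());
            return "(index writer description unavailable)";
        }
    }

    public static void main(String[] args) {
        // Stand-in for a describe() call that renders successfully
        Supplier<String> working = () -> "SolrIndexWriter: url=http://localhost:8983/solr/nutch";
        // Stand-in for a describe() call that fails while rendering the table
        Supplier<String> failing = () -> {
            throw new IllegalStateException("text width is less than 1, was <-26>");
        };

        System.out.println(describeSafely(working));
        System.out.println(describeSafely(failing)); // logs a warning, job continues
    }
}
```

The design choice here is to degrade the optional output rather than the indexing itself; deactivating the call entirely, as proposed above, would be the simpler alternative.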
Unable to index on Hadoop 3.2.0 with 1.16
Hello,

We're upgrading our stuff to 1.16 and got a peculiar problem when we started indexing:

2019-10-14 13:50:30,586 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalStateException: text width is less than 1, was <-41>
    at org.apache.commons.lang3.Validate.validState(Validate.java:829)
    at de.vandermeer.skb.interfaces.transformers.textformat.Text_To_FormattedText.transform(Text_To_FormattedText.java:215)
    at de.vandermeer.asciitable.AT_Renderer.renderAsCollection(AT_Renderer.java:250)
    at de.vandermeer.asciitable.AT_Renderer.render(AT_Renderer.java:128)
    at de.vandermeer.asciitable.AsciiTable.render(AsciiTable.java:191)
    at org.apache.nutch.indexer.IndexWriters.describe(IndexWriters.java:326)
    at org.apache.nutch.indexer.IndexerOutputFormat.getRecordWriter(IndexerOutputFormat.java:45)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:542)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:615)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

The only IndexWriter we use is SolrIndexer, and locally everything is just fine.

Any thoughts?

Thanks,
Markus