[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630967#comment-16630967 ]

Hudson commented on NUTCH-2602:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3553 (See [https://builds.apache.org/job/Nutch-trunk/3553/])

Fixes for NUTCH-2602: Index writers description in the form: (r0ann3l: [https://github.com/apache/nutch/commit/29280bf8a6da3d037f5c7daf506028194b327d62])
* (edit) src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.java
* (edit) src/plugin/indexer-dummy/src/java/org/apache/nutch/indexwriter/dummy/DummyIndexWriter.java
* (edit) src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
* (edit) src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexerOutputFormat.java
* (edit) src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchConstants.java
* (edit) src/plugin/indexer-elastic-rest/src/java/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java

Fixes for NUTCH-2602: No index writer description when commands have no (r0ann3l: [https://github.com/apache/nutch/commit/57d82ffba3c15adb924b1db705b9d89c026ac58f])
* (edit) src/java/org/apache/nutch/indexer/CleaningJob.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java

Fixes for NUTCH-2602: Description as a table with columns: KEY, (r0ann3l: [https://github.com/apache/nutch/commit/4e70af25e9181bf2234ed131c22e9aacffe1524c])
* (edit) src/plugin/indexer-dummy/src/java/org/apache/nutch/indexwriter/dummy/DummyIndexWriter.java
* (edit) src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
* (edit) src/plugin/indexer-csv/src/java/org/apache/nutch/indexwriter/csv/CSVIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexWriter.java
* (edit) src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java
* (edit) ivy/ivy.xml
* (edit) src/plugin/indexer-elastic-rest/src/java/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.java
* (edit) src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexWriters.java

> Configuration values in the description of index writers
>
> Key: NUTCH-2602
> URL: https://issues.apache.org/jira/browse/NUTCH-2602
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, plugin
> Affects Versions: 1.15
> Reporter: Roannel Fernández Hernández
> Assignee: Roannel Fernández Hernández
> Priority: Minor
> Fix For: 1.16
>
> Attachments: Nutch output.png
>
> Since [GitHub Pull Request #218|https://github.com/apache/nutch/pull/218],
> when you have 2+ different configurations of the same index writer (the same
> implementation class), the index command prints the same description several
> times. I propose that the {{describe()}} method show the values of its own
> configuration and not a generic one.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630951#comment-16630951 ]

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l closed pull request #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/ivy/ivy.xml b/ivy/ivy.xml
index 06bb9197a..112975ab9 100644
--- a/ivy/ivy.xml
+++ b/ivy/ivy.xml
@@ -151,6 +151,8 @@
+
+
diff --git a/src/java/org/apache/nutch/indexer/CleaningJob.java b/src/java/org/apache/nutch/indexer/CleaningJob.java
index 8a77a9d82..9b496535b 100644
--- a/src/java/org/apache/nutch/indexer/CleaningJob.java
+++ b/src/java/org/apache/nutch/indexer/CleaningJob.java
@@ -185,8 +185,6 @@ public int run(String[] args) throws IOException {
       String usage = "Usage: CleaningJob <crawldb> [-noCommit]";
       LOG.error("Missing crawldb. " + usage);
       System.err.println(usage);
-      IndexWriters writers = IndexWriters.get(getConf());
-      System.err.println(writers.describe());
       return 1;
     }
 
diff --git a/src/java/org/apache/nutch/indexer/IndexWriter.java b/src/java/org/apache/nutch/indexer/IndexWriter.java
index b33c5070d..78661599e 100644
--- a/src/java/org/apache/nutch/indexer/IndexWriter.java
+++ b/src/java/org/apache/nutch/indexer/IndexWriter.java
@@ -21,8 +21,10 @@
 import org.apache.nutch.plugin.Pluggable;
 
 import java.io.IOException;
+import java.util.Map;
 
 public interface IndexWriter extends Pluggable, Configurable {
+
   /**
    * The name of the extension point.
    */
@@ -53,9 +55,9 @@
   public void close() throws IOException;
 
   /**
-   * Returns a String describing the IndexWriter instance and the specific parameters it can take.
+   * Returns {@link Map} with the specific parameters the IndexWriter instance can take.
    *
-   * @return The full description.
+   * @return The values of each row. It must have the form <KEY,<DESCRIPTION,VALUE>>.
    */
-  public String describe();
+  Map<String, Map.Entry<String, Object>> describe();
 }
diff --git a/src/java/org/apache/nutch/indexer/IndexWriters.java b/src/java/org/apache/nutch/indexer/IndexWriters.java
index 3ac20bfea..9fac2e2fe 100644
--- a/src/java/org/apache/nutch/indexer/IndexWriters.java
+++ b/src/java/org/apache/nutch/indexer/IndexWriters.java
@@ -16,6 +16,10 @@
  */
 package org.apache.nutch.indexer;
 
+import de.vandermeer.asciitable.AT_ColumnWidthCalculator;
+import de.vandermeer.asciitable.AT_Row;
+import de.vandermeer.asciitable.AsciiTable;
+import de.vandermeer.skb.interfaces.document.TableRowType;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.nutch.exchange.Exchanges;
 import org.apache.nutch.plugin.Extension;
@@ -265,8 +269,52 @@ public String describe() {
     builder.append("Active IndexWriters :\n");
 
     for (IndexWriterWrapper indexWriterWrapper : this.indexWriters.values()) {
-      builder.append(indexWriterWrapper.getIndexWriter().describe())
-          .append("\n");
+      // Getting the class name
+      builder.append(
+          indexWriterWrapper.getIndexWriter().getClass().getSimpleName())
+          .append(":\n");
+
+      // Building the table
+      AsciiTable at = new AsciiTable();
+      at.getRenderer().setCWC((rows, colNumbers, tableWidth) -> {
+        int maxLengthFirstColumn = 0;
+        int maxLengthLastColumn = 0;
+        for (AT_Row row : rows) {
+          if (row.getType() == TableRowType.CONTENT) {
+            // First column
+            int lengthFirstColumn = row.getCells().get(0).toString().length();
+            if (lengthFirstColumn > maxLengthFirstColumn) {
+              maxLengthFirstColumn = lengthFirstColumn;
+            }
+
+            // Last column
+            int lengthLastColumn = row.getCells().get(2).toString().length();
+            if (lengthLastColumn > maxLengthLastColumn) {
+              maxLengthLastColumn = lengthLastColumn;
+            }
+          }
+        }
+        return new int[] { maxLengthFirstColumn,
+            tableWidth - maxLengthFirstColumn - maxLengthLastColumn,
+            maxLengthLastColumn };
+      });
+
+      // Getting the properties
+      Map<String, Map.Entry<String, Object>> properties = indexWriterWrapper
+          .getIndexWriter().describe();
+
+      // Adding the rows
+      properties.forEach((key, value) -> {
+        at.addRule();
+        at.addRow(key, value.getKey(),
+            value.getValue() != null ? value.getValue() : "");
+      });
+
+      // Last rule
+      at.addRule();
+
+      // Rendering the table
+      builder.append(at.render(150)).append("\n\n");
     }
 
     return builder.toString();
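To make the new `describe()` contract easier to picture, here is a minimal sketch of what a writer might return and how a caller could turn it into table rows. It assumes the return type is `Map<String, Map.Entry<String, Object>>`, as the `value.getKey()` and `value.getValue()` calls in the diff suggest. `ExampleWriterSketch`, its two parameters and their descriptions are hypothetical and are not taken from any Nutch plugin.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an index writer; not a class from Nutch.
final class ExampleWriterSketch {
  private final String url;      // hypothetical "url" parameter
  private final Integer timeout; // hypothetical optional parameter, may be null

  ExampleWriterSketch(String url, Integer timeout) {
    this.url = url;
    this.timeout = timeout;
  }

  // Assumed shape: key -> (description, current value).
  // LinkedHashMap keeps the rows in insertion order.
  Map<String, Map.Entry<String, Object>> describe() {
    Map<String, Map.Entry<String, Object>> properties = new LinkedHashMap<>();
    properties.put("url", new AbstractMap.SimpleEntry<String, Object>(
        "Endpoint the documents are sent to", url));
    properties.put("timeout", new AbstractMap.SimpleEntry<String, Object>(
        "Request timeout in seconds", timeout));
    return properties;
  }

  // Flattens the description into three-cell rows (key, description, value),
  // replacing a null value with an empty string as the diff does.
  static List<String[]> toRows(Map<String, Map.Entry<String, Object>> properties) {
    List<String[]> rows = new ArrayList<>();
    properties.forEach((key, value) -> rows.add(new String[] {
        key,
        value.getKey(),
        value.getValue() != null ? value.getValue().toString() : "" }));
    return rows;
  }

  public static void main(String[] args) {
    ExampleWriterSketch writer =
        new ExampleWriterSketch("http://localhost:8983/solr/nutch", null);
    for (String[] row : toRows(writer.describe())) {
      System.out.println(String.join(" | ", row));
    }
  }
}
```

Returning data instead of a preformatted string leaves the layout to the caller, which is what lets a single place render every writer the same way.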
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562443#comment-16562443 ]

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l commented on issue #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356#issuecomment-408996075

The output of the `describe()` method as a table with three columns (property, description, value) is ready. Any feedback, guys?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555795#comment-16555795 ]

Sebastian Nagel commented on NUTCH-2602:

Hi [~roannel],

looks very nice. As the describe() method was used before, it's definitely OK to continue using it. It would be great if this comprehensive view were also available for all other index writers!
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554797#comment-16554797 ]

Roannel Fernández Hernández commented on NUTCH-2602:

I prepared an example using [asciitable|https://github.com/vdmeer/asciitable] that shows the output of the {{describe()}} method. It covers only SolrIndexWriter with the default configuration. Any feedback before I continue with the other writers?

!Nutch output.png!
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540107#comment-16540107 ]

Roannel Fernández Hernández commented on NUTCH-2602:

Hi [~wastl-nagel],

I agree with you that the descriptions should not be removed from the code entirely. You proposed one of these:
* adding Javadoc comments to the constants
* adding comments to index-writers.xml.template

I would add two more options:
* a description attribute for elements, e.g. {{}}
* printing a table as the output of the {{describe()}} methods, with three columns: Property, Description, Value. We could use [asciitable|https://github.com/vdmeer/asciitable] or [j-text-utils|https://code.google.com/archive/p/j-text-utils/], or create a {{Util}} class that generates a standard, common output.

Which option do you think is best?
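As a rough illustration of the last option (a home-grown {{Util}} class rather than a library), the sketch below renders Property, Description and Value columns using only the JDK. The class name `DescriptionTableSketch`, the sample rows and the column sizing rule (each column as wide as its longest cell) are assumptions made for this example; this is not the code that was eventually merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: one way a shared "Util" class could look.
final class DescriptionTableSketch {
  private static final String[] HEADER = { "Property", "Description", "Value" };
  private final List<String[]> rows = new ArrayList<>();

  // Null descriptions and values are stored as empty strings.
  DescriptionTableSketch addRow(String property, String description, Object value) {
    rows.add(new String[] {
        property,
        description != null ? description : "",
        value != null ? value.toString() : "" });
    return this;
  }

  String render() {
    List<String[]> all = new ArrayList<>();
    all.add(HEADER);
    all.addAll(rows);

    // Each column is as wide as its longest cell, header included.
    int[] widths = new int[HEADER.length];
    for (String[] row : all) {
      for (int i = 0; i < widths.length; i++) {
        widths[i] = Math.max(widths[i], row[i].length());
      }
    }

    // Horizontal rule such as +----------+-------------+-------+
    StringBuilder rule = new StringBuilder("+");
    for (int width : widths) {
      for (int i = 0; i < width + 2; i++) {
        rule.append('-');
      }
      rule.append('+');
    }

    StringBuilder out = new StringBuilder();
    out.append(rule).append('\n');
    for (String[] row : all) {
      out.append('|');
      for (int i = 0; i < widths.length; i++) {
        // Left-align the cell and pad it to the column width.
        out.append(' ').append(String.format("%-" + widths[i] + "s", row[i])).append(" |");
      }
      out.append('\n').append(rule).append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.print(new DescriptionTableSketch()
        .addRow("type", "Hypothetical connection type", "http")
        .addRow("commitSize", "Hypothetical batch size", 1000)
        .render());
  }
}
```

A helper like this avoids a new dependency, at the cost of features such as cell wrapping that a library like asciitable provides.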
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536894#comment-16536894 ]

Sebastian Nagel commented on NUTCH-2602:

+1 to show the concrete configuration for index writer plugins instead of listing available properties.

I'm a little concerned about removing the descriptions from the code altogether (they're now only in the [wiki|https://wiki.apache.org/nutch/IndexWriters]). That's good for now, but since the wiki tends to get outdated over time and, in general, is not bound to a specific Nutch version, it would be good to document at least those options that are not self-explanatory, either by adding Javadoc comments to the constants or by adding comments to index-writers.xml.template.
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527977#comment-16527977 ]

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l commented on issue #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356#issuecomment-401418480

Reviewing it again, I see that when the index or clean command is given too few arguments, the `describe()` method is called before the `open()` method. I don't think calling `describe()` there is useful, because:
1. It shows only the default values of the attributes.
2. The error is caused by missing arguments, not by a misconfigured index writer.
[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers
[ https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527924#comment-16527924 ]

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l opened a new pull request #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356

When the index writers' description is shown, each parameter appears in the form **\<parameter\>:\<value\>**, where **\<parameter\>** is the parameter's name as used in the index-writers.xml file and **\<value\>** is the value the parameter currently has.

To accomplish this, the `describe()` method must be called after the `open()` method. For this reason I moved the line `LOG.info(writers.describe());` from `IndexingJob.java:124` to `IndexerOutputFormat.java:45`. This change is suitable because the `IndexWriters.get(conf)` method is no longer invoked twice.
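The ordering requirement in this pull request can be illustrated with a small sketch: a writer that holds default values until `open()` applies its configuration, so `describe()` reports the real values only afterwards. `OpenThenDescribeSketch`, its `host` and `port` parameters, their defaults and the `open(Map)` signature are all hypothetical simplifications for this example and do not reproduce the Nutch `IndexWriter` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical writer, not Nutch's API: fields hold defaults until open() runs.
final class OpenThenDescribeSketch {
  private String host = "localhost"; // hypothetical default
  private int port = 9300;           // hypothetical default

  // Applies this instance's own configuration; missing keys keep their defaults.
  void open(Map<String, String> parameters) {
    host = parameters.getOrDefault("host", host);
    port = Integer.parseInt(parameters.getOrDefault("port", String.valueOf(port)));
  }

  // One "name:value" line per parameter, reflecting the current state.
  String describe() {
    return "host:" + host + "\n" + "port:" + port + "\n";
  }

  public static void main(String[] args) {
    // Two configurations of the same implementation class.
    Map<String, String> first = new LinkedHashMap<>();
    first.put("host", "search-a.example.org");
    Map<String, String> second = new LinkedHashMap<>();
    second.put("host", "search-b.example.org");
    second.put("port", "9200");

    OpenThenDescribeSketch writerA = new OpenThenDescribeSketch();
    OpenThenDescribeSketch writerB = new OpenThenDescribeSketch();

    // Before open(): both instances print the same defaults.
    System.out.print(writerA.describe());

    writerA.open(first);
    writerB.open(second);

    // After open(): each instance prints its own values.
    System.out.print(writerA.describe());
    System.out.print(writerB.describe());
  }
}
```

Calling `describe()` before `open()` gives the same text for every instance of the class, which is the repeated-description symptom reported in the issue.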