[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-09-27 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630967#comment-16630967
 ] 

Hudson commented on NUTCH-2602:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3553 (See 
[https://builds.apache.org/job/Nutch-trunk/3553/])
Fixes for NUTCH-2602: Index writers description in the form: (r0ann3l: 
[https://github.com/apache/nutch/commit/29280bf8a6da3d037f5c7daf506028194b327d62])
* (edit) 
src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.java
* (edit) 
src/plugin/indexer-dummy/src/java/org/apache/nutch/indexwriter/dummy/DummyIndexWriter.java
* (edit) 
src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
* (edit) 
src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexerOutputFormat.java
* (edit) 
src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) 
src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchConstants.java
* (edit) 
src/plugin/indexer-elastic-rest/src/java/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java
Fixes for NUTCH-2602: No index writer description when commands have no 
(r0ann3l: 
[https://github.com/apache/nutch/commit/57d82ffba3c15adb924b1db705b9d89c026ac58f])
* (edit) src/java/org/apache/nutch/indexer/CleaningJob.java
* (edit) src/java/org/apache/nutch/indexer/IndexingJob.java
Fixes for NUTCH-2602: Description as a table with columns: KEY, (r0ann3l: 
[https://github.com/apache/nutch/commit/4e70af25e9181bf2234ed131c22e9aacffe1524c])
* (edit) 
src/plugin/indexer-dummy/src/java/org/apache/nutch/indexwriter/dummy/DummyIndexWriter.java
* (edit) 
src/plugin/indexer-rabbit/src/java/org/apache/nutch/indexwriter/rabbit/RabbitIndexWriter.java
* (edit) 
src/plugin/indexer-solr/src/java/org/apache/nutch/indexwriter/solr/SolrIndexWriter.java
* (edit) 
src/plugin/indexer-csv/src/java/org/apache/nutch/indexwriter/csv/CSVIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexWriter.java
* (edit) 
src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java
* (edit) ivy/ivy.xml
* (edit) 
src/plugin/indexer-elastic-rest/src/java/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.java
* (edit) 
src/plugin/indexer-cloudsearch/src/java/org/apache/nutch/indexwriter/cloudsearch/CloudSearchIndexWriter.java
* (edit) src/java/org/apache/nutch/indexer/IndexWriters.java


> Configuration values in the description of index writers
> 
>
> Key: NUTCH-2602
> URL: https://issues.apache.org/jira/browse/NUTCH-2602
> Project: Nutch
> Issue Type: Improvement
> Components: indexer, plugin
> Affects Versions: 1.15
> Reporter: Roannel Fernández Hernández
> Assignee: Roannel Fernández Hernández
> Priority: Minor
> Fix For: 1.16
>
> Attachments: Nutch output.png
>
>
> Since [GitHub Pull Request #218|https://github.com/apache/nutch/pull/218],
> when you have two or more different configurations of the same index writer
> (the same implementation class), the index command prints the same
> description several times. I propose that the {{describe()}} method show the
> values of its own configuration rather than a generic description.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-09-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630951#comment-16630951
 ] 

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l closed pull request #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/ivy/ivy.xml b/ivy/ivy.xml
index 06bb9197a..112975ab9 100644
--- a/ivy/ivy.xml
+++ b/ivy/ivy.xml
@@ -151,6 +151,8 @@


 
+   
+



diff --git a/src/java/org/apache/nutch/indexer/CleaningJob.java b/src/java/org/apache/nutch/indexer/CleaningJob.java
index 8a77a9d82..9b496535b 100644
--- a/src/java/org/apache/nutch/indexer/CleaningJob.java
+++ b/src/java/org/apache/nutch/indexer/CleaningJob.java
@@ -185,8 +185,6 @@ public int run(String[] args) throws IOException {
       String usage = "Usage: CleaningJob <crawldb> [-noCommit]";
       LOG.error("Missing crawldb. " + usage);
       System.err.println(usage);
-      IndexWriters writers = IndexWriters.get(getConf());
-      System.err.println(writers.describe());
       return 1;
     }
 
diff --git a/src/java/org/apache/nutch/indexer/IndexWriter.java b/src/java/org/apache/nutch/indexer/IndexWriter.java
index b33c5070d..78661599e 100644
--- a/src/java/org/apache/nutch/indexer/IndexWriter.java
+++ b/src/java/org/apache/nutch/indexer/IndexWriter.java
@@ -21,8 +21,10 @@
 import org.apache.nutch.plugin.Pluggable;
 
 import java.io.IOException;
+import java.util.Map;
 
 public interface IndexWriter extends Pluggable, Configurable {
+
   /**
    * The name of the extension point.
    */
@@ -53,9 +55,9 @@
   public void close() throws IOException;
 
   /**
-   * Returns a String describing the IndexWriter instance and the specific parameters it can take.
+   * Returns {@link Map} with the specific parameters the IndexWriter instance can take.
    *
-   * @return The full description.
+   * @return The values of each row. It must have the form <KEY,<DESCRIPTION,VALUE>>.
    */
-  public String describe();
+  Map<String, Map.Entry<String, Object>> describe();
 }
diff --git a/src/java/org/apache/nutch/indexer/IndexWriters.java b/src/java/org/apache/nutch/indexer/IndexWriters.java
index 3ac20bfea..9fac2e2fe 100644
--- a/src/java/org/apache/nutch/indexer/IndexWriters.java
+++ b/src/java/org/apache/nutch/indexer/IndexWriters.java
@@ -16,6 +16,10 @@
  */
 package org.apache.nutch.indexer;
 
+import de.vandermeer.asciitable.AT_ColumnWidthCalculator;
+import de.vandermeer.asciitable.AT_Row;
+import de.vandermeer.asciitable.AsciiTable;
+import de.vandermeer.skb.interfaces.document.TableRowType;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.nutch.exchange.Exchanges;
 import org.apache.nutch.plugin.Extension;
@@ -265,8 +269,52 @@ public String describe() {
       builder.append("Active IndexWriters :\n");
 
     for (IndexWriterWrapper indexWriterWrapper : this.indexWriters.values()) {
-      builder.append(indexWriterWrapper.getIndexWriter().describe())
-          .append("\n");
+      // Getting the class name
+      builder.append(
+          indexWriterWrapper.getIndexWriter().getClass().getSimpleName())
+          .append(":\n");
+
+      // Building the table
+      AsciiTable at = new AsciiTable();
+      at.getRenderer().setCWC((rows, colNumbers, tableWidth) -> {
+        int maxLengthFirstColumn = 0;
+        int maxLengthLastColumn = 0;
+        for (AT_Row row : rows) {
+          if (row.getType() == TableRowType.CONTENT) {
+            // First column
+            int lengthFirstColumn = row.getCells().get(0).toString().length();
+            if (lengthFirstColumn > maxLengthFirstColumn) {
+              maxLengthFirstColumn = lengthFirstColumn;
+            }
+
+            // Last column
+            int lengthLastColumn = row.getCells().get(2).toString().length();
+            if (lengthLastColumn > maxLengthLastColumn) {
+              maxLengthLastColumn = lengthLastColumn;
+            }
+          }
+        }
+        return new int[] { maxLengthFirstColumn,
+            tableWidth - maxLengthFirstColumn - maxLengthLastColumn,
+            maxLengthLastColumn };
+      });
+
+      // Getting the properties
+      Map<String, Map.Entry<String, Object>> properties = indexWriterWrapper
+          .getIndexWriter().describe();
+
+      // Adding the rows
+      properties.forEach((key, value) -> {
+        at.addRule();
+        at.addRow(key, value.getKey(),
+            value.getValue() != null ? value.getValue() : "");
+      });
+
+      // Last rule
+      at.addRule();
+
+      // Rendering the table
+      builder.append(at.render(150)).append("\n\n");
     }
 
     return builder.toString();
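
For readers who want the column-width rule from the `setCWC` lambda above in isolation, here is a minimal sketch in plain Java. It assumes every row has exactly three cells and ignores borders and padding, which asciitable takes care of itself. `ColumnWidthSketch` and `widths` are hypothetical names used only for illustration; they are not part of Nutch or asciitable.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnWidthSketch {

  /**
   * First and last columns are as wide as their longest cell;
   * the middle column receives whatever is left of tableWidth.
   */
  static int[] widths(List<String[]> rows, int tableWidth) {
    int first = 0;
    int last = 0;
    for (String[] row : rows) {
      first = Math.max(first, row[0].length());
      last = Math.max(last, row[2].length());
    }
    return new int[] { first, tableWidth - first - last, last };
  }

  public static void main(String[] args) {
    List<String[]> rows = Arrays.asList(
        new String[] { "type", "Connection type", "http" },
        new String[] { "commitSize", "Batch size", "" });
    // prints [10, 136, 4]
    System.out.println(Arrays.toString(widths(rows, 150)));
  }
}
```

The effect is that property names and values are never wrapped, while long descriptions wrap inside the middle column.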

[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-07-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562443#comment-16562443
 ] 

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l commented on issue #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356#issuecomment-408996075
 
 
   The output of the `describe()` method as a table with 3 columns (property, 
description, value) is ready. Any feedback?
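
To make the three-column idea concrete, here is a rough sketch of a writer reporting its settings as a map from property name to a (description, value) pair, and a plain-JDK renderer for it. It deliberately avoids asciitable so that it runs on its own. The property names, descriptions and the `DescribeTableSketch` class are hypothetical, not taken from any real index writer.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;

final class DescribeTableSketch {

  /** Hypothetical settings; a real writer would read them from its configuration. */
  static Map<String, Map.Entry<String, Object>> describe(String type, Integer commitSize) {
    Map<String, Map.Entry<String, Object>> rows = new LinkedHashMap<>();
    rows.put("type", new SimpleEntry<>("Connection type", type));
    rows.put("commitSize", new SimpleEntry<>("Documents per batch", commitSize));
    return rows;
  }

  /** Renders one line per property: name, description, value. */
  static String render(Map<String, Map.Entry<String, Object>> rows) {
    StringBuilder out = new StringBuilder();
    out.append(String.format("%-12s| %-22s| %s%n", "PROPERTY", "DESCRIPTION", "VALUE"));
    rows.forEach((key, entry) -> {
      // An unset value is shown as an empty cell rather than "null"
      Object value = entry.getValue() != null ? entry.getValue() : "";
      out.append(String.format("%-12s| %-22s| %s%n", key, entry.getKey(), value));
    });
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.print(render(describe("http", null)));
  }
}
```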


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-07-25 Thread Sebastian Nagel (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555795#comment-16555795
 ] 

Sebastian Nagel commented on NUTCH-2602:


Hi [~roannel], looks very nice. As the describe() method was used before it's 
definitely ok to continue using it. Would be great if this comprehensive view 
is also available for all other index writers!



[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-07-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554797#comment-16554797
 ] 

Roannel Fernández Hernández commented on NUTCH-2602:


I prepared an example using [asciitable|https://github.com/vdmeer/asciitable] 
where the output of {{describe()}} methods is shown. This is only for 
SolrIndexWriter with the default configuration. 

Some feedback to continue with the other writers?

!Nutch output.png!



[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-07-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540107#comment-16540107
 ] 

Roannel Fernández Hernández commented on NUTCH-2602:


Hi [~wastl-nagel], I agree that we should not remove the descriptions from the 
code entirely. You proposed one of these:
 * adding Javadoc comments to the constants
 * comments in indexwriters.xml.template

I include two more options:
 * description attribute for  elements. e.g. {{}}
 * print a table as the output of the {{describe()}} methods with three columns: 
Property, Description, Value. We can use one of these: 
[asciitable|https://github.com/vdmeer/asciitable], 
[j-text-utils|https://code.google.com/archive/p/j-text-utils/], or create a 
{{Util}} class to generate a standard, common output.

What do you think is the best option?



[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-07-09 Thread Sebastian Nagel (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536894#comment-16536894
 ] 

Sebastian Nagel commented on NUTCH-2602:


+1 to show the concrete configuration for index writer plugins instead of 
listing available properties. I'm a little concerned about removing the 
descriptions from the code entirely (they're now only in the 
[wiki|https://wiki.apache.org/nutch/IndexWriters]). That's good for now, but 
since the wiki tends to get outdated over time and, in general, is not bound 
to a specific Nutch version, it would be good to document at least those 
options which are not self-explanatory, either by adding Javadoc comments to 
the constants or comments in indexwriters.xml.template.



[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527977#comment-16527977
 ] 

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l commented on issue #356: fix for NUTCH-2602: Index writers description
URL: https://github.com/apache/nutch/pull/356#issuecomment-401418480
 
 
   Reviewing it again, I see that when the index or clean command is given too 
few arguments, the `describe()` method is called before the `open()` method. I 
think calling `describe()` here is not useful because:
   
   1. It shows only the default values of the attributes.
   2. The error is caused by missing arguments, not by a misconfigured index 
writer.
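
The ordering problem can be illustrated with a small stand-alone sketch. It assumes a writer that holds a built-in default until `open()` supplies the configured value; `DescribeOrderSketch`, `SketchWriter` and the `host` parameter are hypothetical and only mimic the behaviour described above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class DescribeOrderSketch {

  /** Hypothetical writer with a single setting and a built-in default. */
  static final class SketchWriter {
    private String host = "localhost"; // default, valid until open() is called

    /** Replaces the default with the configured value, if one is given. */
    void open(Map<String, String> params) {
      host = params.getOrDefault("host", host);
    }

    /** Reports whatever the writer currently holds. */
    Map<String, String> describe() {
      Map<String, String> values = new LinkedHashMap<>();
      values.put("host", host);
      return values;
    }
  }

  public static void main(String[] args) {
    SketchWriter writer = new SketchWriter();
    System.out.println("before open(): " + writer.describe());
    writer.open(Collections.singletonMap("host", "search.example.org"));
    System.out.println("after open():  " + writer.describe());
  }
}
```

Called before `open()`, `describe()` can only report the default, which says nothing about the user's actual configuration.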




[jira] [Commented] (NUTCH-2602) Configuration values in the description of index writers

2018-06-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527924#comment-16527924
 ] 

ASF GitHub Bot commented on NUTCH-2602:
---

r0ann3l opened a new pull request #356: fix for NUTCH-2602: Index writers 
description
URL: https://github.com/apache/nutch/pull/356
 
 
   When the index writers description is shown, it appears in the form 
**\:\** where **\** is the parameter's name used on 
index-writers.xml file and **\** is the value that the parameter has. To 
accomplish this, the `describe()` method should be called after the `open()` 
method. For this reason I moved the line `LOG.info(writers.describe());` on 
`IndexingJob.java:124` to `IndexerOutputFormat.java:45`. This change is 
suitable because now the `IndexWriters.get(conf)` method isn't invoked twice.
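
As a rough illustration of the intent, the sketch below opens two instances of the same writer class with different parameters and lets each describe itself as one name:value line per parameter. It assumes that is the intended output form; `PerInstanceDescribeSketch`, `SketchWriter` and the `url` parameter are hypothetical names, not Nutch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class PerInstanceDescribeSketch {

  /** Hypothetical writer; stands in for one index writer implementation class. */
  static final class SketchWriter {
    private final Map<String, String> params = new LinkedHashMap<>();

    /** Stores the parameters configured for this particular instance. */
    void open(Map<String, String> configured) {
      params.putAll(configured);
    }

    /** One name:value line per parameter of this instance. */
    String describe() {
      StringJoiner lines = new StringJoiner("\n");
      params.forEach((name, value) -> lines.add(name + ":" + value));
      return lines.toString();
    }
  }

  /** Opens a fresh writer with the given URL and returns its description. */
  static String describeAfterOpen(String url) {
    SketchWriter writer = new SketchWriter();
    Map<String, String> configured = new LinkedHashMap<>();
    configured.put("url", url);
    writer.open(configured);
    return writer.describe();
  }

  public static void main(String[] args) {
    System.out.println(describeAfterOpen("http://host-a.example/index"));
    System.out.println(describeAfterOpen("http://host-b.example/index"));
  }
}
```

Because each instance reports its own values, two configurations of the same class no longer produce identical descriptions.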
   

