[jira] [Commented] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815350#comment-16815350
 ] 

Hudson commented on NUTCH-2700:
---

FAILURE: Integrated in Jenkins build Nutch-trunk #3621 (See 
[https://builds.apache.org/job/Nutch-trunk/3621/])
NUTCH-2700 Indexchecker: improve command-line help - add options (snagel: 
[https://github.com/apache/nutch/commit/76c8cff1402e217049942bac88a8a005d45abf43])
* (edit) src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
* (edit) src/java/org/apache/nutch/parse/ParserChecker.java


> Indexchecker: improve command-line help
> ---
>
> Key: NUTCH-2700
> URL: https://issues.apache.org/jira/browse/NUTCH-2700
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.15
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.16
>
>
> The command-line help of the indexchecker tool is incomplete:
> {noformat}
> Usage: IndexingFiltersChecker [-normalize] [-followRedirects] [-dumpText] 
> [-md key=value] (-stdin | -listen <port> [-keepClientCnxOpen])
> {noformat}
> It does not
> - show the possibility to pass the URL as argument
> - mention the property {{-DdoIndex=true}} which makes it send the document to 
> the indexes
> It should follow the help shown by parsechecker.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (NUTCH-2700) Indexchecker: improve command-line help

2019-04-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815310#comment-16815310
 ] 

ASF GitHub Bot commented on NUTCH-2700:
---

sebastian-nagel commented on pull request #446: NUTCH-2700 Indexchecker: 
improve command-line help
URL: https://github.com/apache/nutch/pull/446
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (NUTCH-2700) Indexchecker: improve command-line help

2019-03-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/NUTCH-2700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792805#comment-16792805
 ] 

ASF GitHub Bot commented on NUTCH-2700:
---

sebastian-nagel commented on pull request #446: NUTCH-2700 Indexchecker: 
improve command-line help
URL: https://github.com/apache/nutch/pull/446
 
 
   ... and add the option `-doIndex` to pass the "checked" document to the index 
writers (the property `doIndex` is kept to ensure backward compatibility):
   
   ```
   % bin/nutch indexchecker
   Usage:
     IndexingFiltersChecker [OPTIONS] <url>
       Fetch single URL and index it
     IndexingFiltersChecker [OPTIONS] -stdin
       Read URLs to be indexed from stdin
     IndexingFiltersChecker [OPTIONS] -listen <port> [-keepClientCnxOpen]
       Listen on <port> for URLs to be indexed
   Options:
     -D<property>=<value>  set/overwrite Nutch/Hadoop properties
                           (a generic Hadoop option to be passed
                            before other command-specific options)
     -normalize            normalize URLs
     -followRedirects      follow redirects when fetching URL
     -dumpText             show the entire plain-text content,
                           not only the first 100 characters
     -doIndex              pass document to configured index writers
                           and let them index it
     -md <key>=<value>     metadata added to CrawlDatum before parsing
   ```
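
For illustration only, the backward-compatible handling of `-doIndex` described above could look roughly like the following. This is a minimal sketch and not the code from the pull request: the class `DoIndexOptionSketch` and the method `resolveDoIndex` are hypothetical names, and `java.util.Properties` stands in for the Hadoop configuration object so that the example runs without Nutch or Hadoop on the classpath.

```java
import java.util.Properties;

/** Hypothetical sketch; not the actual IndexingFiltersChecker code. */
public class DoIndexOptionSketch {

  /**
   * Decides whether the checked document should be passed to the index
   * writers: either the command-line option -doIndex is present, or the
   * legacy property doIndex is set to "true".
   */
  public static boolean resolveDoIndex(String[] args, Properties conf) {
    // Legacy behaviour: the property is set via -DdoIndex=true
    // (assumption: modelled here with java.util.Properties).
    boolean doIndex = Boolean.parseBoolean(conf.getProperty("doIndex", "false"));
    for (String arg : args) {
      if ("-doIndex".equals(arg)) {
        // New explicit option; it can only switch indexing on.
        doIndex = true;
      }
    }
    return doIndex;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println("doIndex = " + resolveDoIndex(args, conf));
  }
}
```

Reading the property first and letting the option override it means existing invocations that rely on `-DdoIndex=true` behave as before, while new invocations can use the documented option.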
 


