[jira] [Commented] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099979#comment-17099979
 ] 

Hudson commented on NUTCH-2753:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See 
[https://builds.apache.org/job/Nutch-trunk/3681/])
NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and 
(snagel: 
[https://github.com/apache/nutch/commit/c573c70d05331dcd572ddcd23831337f8208fff7])
* (edit) src/java/org/apache/nutch/crawl/CrawlDbReader.java
* (edit) src/java/org/apache/nutch/crawl/LinkDbReader.java


> Add -listen option to command-line help of CrawlDbReader and LinkDbReader
> -
>
> Key: NUTCH-2753
> URL: https://issues.apache.org/jira/browse/NUTCH-2753
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb, linkdb
>Affects Versions: 1.15
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Minor
>  Labels: easytask, help-wanted
> Fix For: 1.17
>
>
> The tools CrawlDbReader and LinkDbReader extend AbstractChecker but do not 
> show `-listen  [-keepClientCnxOpen]` as available option(s).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099977#comment-17099977
 ] 

Hudson commented on NUTCH-2002:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See 
[https://builds.apache.org/job/Nutch-trunk/3681/])
NUTCH-2002 parse and index checkers to check robots.txt - applied (snagel: 
[https://github.com/apache/nutch/commit/46db3ed71355fefda42a008ece75094f51859ab2])
* (edit) src/java/org/apache/nutch/util/AbstractChecker.java
* (edit) src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
* (edit) src/java/org/apache/nutch/parse/ParserChecker.java


> ParserChecker and IndexingFiltersChecker to check robots.txt
> 
>
> Key: NUTCH-2002
> URL: https://issues.apache.org/jira/browse/NUTCH-2002
> Project: Nutch
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.9
>Reporter: Julien Nioche
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.17
>
> Attachments: NUTCH-2002.patch
>
>
> ParserChecker could check whether a given URL is allowed by the robots.txt 
> directives.  
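Such a check matches the URL's path against the Allow and Disallow rules that apply to the crawler, and the most specific (longest) matching rule decides. The Java sketch below is not Nutch's or crawler-commons' code. `SimpleRobotsCheck` and its methods are hypothetical. It makes several simplifying assumptions:

- The robots.txt text has already been reduced to the one group that applies to our agent, so User-agent lines are ignored.
- Only plain prefix rules are supported, without `*` or `$` wildcards.
- An Allow wins a tie between equally long rules.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified robots.txt check: prefix rules only, single rule group. */
final class SimpleRobotsCheck {

  /** One Allow or Disallow rule with its path prefix. */
  record Rule(boolean allow, String prefix) {}

  /** Parses Allow/Disallow lines; User-agent lines and comments are ignored (single-group assumption). */
  static List<Rule> parse(String robotsTxt) {
    List<Rule> rules = new ArrayList<>();
    for (String rawLine : robotsTxt.split("\r?\n")) {
      int hash = rawLine.indexOf('#');
      String line = hash >= 0 ? rawLine.substring(0, hash) : rawLine; // strip comments
      int colon = line.indexOf(':');
      if (colon < 0) {
        continue;
      }
      String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).trim();
      if (value.isEmpty()) {
        continue; // an empty Disallow adds no restriction
      }
      if (field.equals("allow")) {
        rules.add(new Rule(true, value));
      } else if (field.equals("disallow")) {
        rules.add(new Rule(false, value));
      }
    }
    return rules;
  }

  /** Longest matching prefix wins; on equal length, Allow wins; no match means allowed. */
  static boolean isAllowed(List<Rule> rules, String url) {
    URI uri = URI.create(url);
    String rawPath = uri.getRawPath();
    String path = (rawPath == null || rawPath.isEmpty()) ? "/" : rawPath;
    if (uri.getRawQuery() != null) {
      path = path + "?" + uri.getRawQuery();
    }
    Rule best = null;
    for (Rule rule : rules) {
      if (!path.startsWith(rule.prefix())) {
        continue;
      }
      int len = rule.prefix().length();
      if (best == null
          || len > best.prefix().length()
          || (len == best.prefix().length() && rule.allow())) {
        best = rule;
      }
    }
    return best == null || best.allow();
  }

  public static void main(String[] args) {
    List<Rule> rules = parse(String.join("\n",
        "User-agent: *",
        "Disallow: /private/",
        "Allow: /private/public-page.html"));
    System.out.println(isAllowed(rules, "https://example.org/index.html"));
    System.out.println(isAllowed(rules, "https://example.org/private/data.html"));
    System.out.println(isAllowed(rules, "https://example.org/private/public-page.html"));
  }
}
```

Here `/private/data.html` is rejected by the Disallow rule. `/private/public-page.html` is accepted because the longer Allow rule takes precedence.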





[jira] [Commented] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099976#comment-17099976
 ] 

Hudson commented on NUTCH-2785:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See 
[https://builds.apache.org/job/Nutch-trunk/3681/])
NUTCH-2785 FreeGenerator: command-line option to define number of (snagel: 
[https://github.com/apache/nutch/commit/72f3ff20d28f2e19281a5d1c83139b152acac1e1])
* (edit) src/java/org/apache/nutch/tools/FreeGenerator.java


> FreeGenerator: command-line option to define number of generated fetch lists
> 
>
> Key: NUTCH-2785
> URL: https://issues.apache.org/jira/browse/NUTCH-2785
> Project: Nutch
>  Issue Type: Improvement
>  Components: generator
>Affects Versions: 1.16
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.17
>
>
> While Generator allows specifying the number of generated fetch lists using 
> the "-numFetchers" command-line option, FreeGenerator does not provide such 
> an option. It uses the value of "mapreduce.job.maps" (2 by default), even in 
> local mode, where Generator always creates a single fetch list.
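The decision described above can be sketched in Java, together with one way of spreading URLs over the resulting fetch lists. All names are hypothetical. The precedence is an assumption modelled on the Generator behaviour described in this issue, not FreeGenerator's actual code:

1. Local mode forces a single list.
2. Otherwise an explicit `-numFetchers` value is used.
3. Otherwise `mapreduce.job.maps` applies.

Hashing by host is only one plausible partitioning scheme. It is chosen here so that all URLs of a host land in the same list.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for choosing and filling fetch lists. */
final class FetchListSketch {

  /** numFetchers <= 0 means "not given on the command line" (assumption). */
  static int resolveNumFetchLists(int numFetchers, boolean localMode, int mapreduceJobMaps) {
    if (localMode) {
      return 1; // local mode: always a single fetch list, as Generator does
    }
    if (numFetchers > 0) {
      return numFetchers; // explicit command-line value
    }
    return Math.max(1, mapreduceJobMaps); // fallback, e.g. mapreduce.job.maps = 2
  }

  /** Assigns a URL to a fetch list by hashing its host, so one host stays in one list. */
  static int fetchListFor(String url, int numLists) {
    String host = URI.create(url).getHost();
    String key = host == null ? url : host.toLowerCase(Locale.ROOT);
    return Math.floorMod(key.hashCode(), numLists);
  }

  /** Splits the URLs into numLists lists using fetchListFor. */
  static List<List<String>> partition(List<String> urls, int numLists) {
    List<List<String>> lists = new ArrayList<>();
    for (int i = 0; i < numLists; i++) {
      lists.add(new ArrayList<>());
    }
    for (String url : urls) {
      lists.get(fetchListFor(url, numLists)).add(url);
    }
    return lists;
  }

  public static void main(String[] args) {
    int n = resolveNumFetchLists(4, false, 2);
    List<List<String>> lists = partition(List.of(
        "https://a.example.com/1", "https://a.example.com/2", "https://b.example.org/"), n);
    System.out.println(n + " fetch lists: " + lists);
  }
}
```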





[jira] [Commented] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099978#comment-17099978
 ] 

Hudson commented on NUTCH-2758:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3681 (See 
[https://builds.apache.org/job/Nutch-trunk/3681/])
NUTCH-2758 Add plugin READMEs to binary release packages (snagel: 
[https://github.com/apache/nutch/commit/90502bdae07e9e2e4d42b970e709a72ce333e440])
* (edit) build.xml


> Add plugin READMEs to binary release packages
> -
>
> Key: NUTCH-2758
> URL: https://issues.apache.org/jira/browse/NUTCH-2758
> Project: Nutch
>  Issue Type: Improvement
>  Components: build, plugin
>Affects Versions: 1.16
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.17
>
>
> Almost 20 plugins have a README (.md or .txt) which explains how to use and 
> configure the plugin. The READMEs should be included in the binary release 
> packages.





[jira] [Resolved] (NUTCH-2758) Add plugin READMEs to binary release packages

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2758.

Resolution: Fixed

> Add plugin READMEs to binary release packages
> -
>
> Key: NUTCH-2758
> URL: https://issues.apache.org/jira/browse/NUTCH-2758
> Project: Nutch
>  Issue Type: Improvement
>  Components: build, plugin
>Affects Versions: 1.16
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.17
>
>
> Almost 20 plugins have a README (.md or .txt) which explains how to use and 
> configure the plugin. The READMEs should be included in the binary release 
> packages.





[jira] [Assigned] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-2753:
--

Assignee: Sebastian Nagel

> Add -listen option to command-line help of CrawlDbReader and LinkDbReader
> -
>
> Key: NUTCH-2753
> URL: https://issues.apache.org/jira/browse/NUTCH-2753
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb, linkdb
>Affects Versions: 1.15
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Minor
>  Labels: easytask, help-wanted
> Fix For: 1.17
>
>
> The tools CrawlDbReader and LinkDbReader extend AbstractChecker but do not 
> show `-listen  [-keepClientCnxOpen]` as available option(s).





[jira] [Resolved] (NUTCH-2753) Add -listen option to command-line help of CrawlDbReader and LinkDbReader

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2753.

Resolution: Fixed

> Add -listen option to command-line help of CrawlDbReader and LinkDbReader
> -
>
> Key: NUTCH-2753
> URL: https://issues.apache.org/jira/browse/NUTCH-2753
> Project: Nutch
>  Issue Type: Bug
>  Components: crawldb, linkdb
>Affects Versions: 1.15
>Reporter: Sebastian Nagel
>Priority: Minor
>  Labels: easytask, help-wanted
> Fix For: 1.17
>
>
> The tools CrawlDbReader and LinkDbReader extend AbstractChecker but do not 
> show `-listen  [-keepClientCnxOpen]` as available option(s).





[jira] [Resolved] (NUTCH-2002) ParserChecker and IndexingFiltersChecker to check robots.txt

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2002.

Resolution: Implemented

> ParserChecker and IndexingFiltersChecker to check robots.txt
> 
>
> Key: NUTCH-2002
> URL: https://issues.apache.org/jira/browse/NUTCH-2002
> Project: Nutch
>  Issue Type: Improvement
>  Components: parser
>Affects Versions: 1.9
>Reporter: Julien Nioche
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.17
>
> Attachments: NUTCH-2002.patch
>
>
> ParserChecker could check whether a given URL is allowed by the robots.txt 
> directives.  





[jira] [Resolved] (NUTCH-2785) FreeGenerator: command-line option to define number of generated fetch lists

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2785.

Resolution: Fixed

> FreeGenerator: command-line option to define number of generated fetch lists
> 
>
> Key: NUTCH-2785
> URL: https://issues.apache.org/jira/browse/NUTCH-2785
> Project: Nutch
>  Issue Type: Improvement
>  Components: generator
>Affects Versions: 1.16
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Major
> Fix For: 1.17
>
>
> While Generator allows specifying the number of generated fetch lists using 
> the "-numFetchers" command-line option, FreeGenerator does not provide such 
> an option. It uses the value of "mapreduce.job.maps" (2 by default), even in 
> local mode, where Generator always creates a single fetch list.





[jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099852#comment-17099852
 ] 

Hudson commented on NUTCH-1194:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3680 (See 
[https://builds.apache.org/job/Nutch-trunk/3680/])
NUTCH-1194 Generator: CrawlDB lock should be released earlier - release 
(snagel: 
[https://github.com/apache/nutch/commit/11eea5aea89599e2c35e577d15623f1278ded8e4])
* (edit) src/java/org/apache/nutch/util/NutchJob.java
* (edit) src/java/org/apache/nutch/crawl/Generator.java


> Generator: CrawlDB lock should be released earlier
> --
>
> Key: NUTCH-1194
> URL: https://issues.apache.org/jira/browse/NUTCH-1194
> Project: Nutch
>  Issue Type: Improvement
>  Components: generator
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.17
>
>
> The lock on the CrawlDB is released only when everything is finished. But when 
> generating many segments, the lock remains in place although it is no longer 
> necessary. If GENERATE_UPDATE_DB is false, we can release the lock immediately 
> after the selector has finished.
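The proposed change is essentially about ordering. If the generate step will not write back to the CrawlDB, the lock can be dropped right after selection instead of after all segments have been written. The Java sketch below models that ordering with a lock file on the local filesystem. `GenerateLockSketch`, its phase names and the `.locked` file name are hypothetical. In Nutch the lock lives on the (possibly distributed) filesystem holding the CrawlDB, and the `updateDb` flag here merely stands in for GENERATE_UPDATE_DB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of when a generate step may release the CrawlDB lock. */
final class GenerateLockSketch {

  /** Runs the phases and returns the order in which phases and lock events happened. */
  static List<String> generate(Path crawlDb, boolean updateDb) throws IOException {
    List<String> events = new ArrayList<>();
    Path lock = crawlDb.resolve(".locked");
    Files.createFile(lock); // throws FileAlreadyExistsException if another job holds the lock
    events.add("lock acquired");
    boolean locked = true;
    try {
      events.add("select");
      if (!updateDb) {
        // Nothing below writes to the CrawlDB, so other jobs may use it again.
        Files.delete(lock);
        locked = false;
        events.add("lock released");
      }
      events.add("partition segments"); // can take long when generating many segments
      if (updateDb) {
        events.add("update crawldb"); // needs the lock, so it is held until here
      }
    } finally {
      if (locked) {
        Files.deleteIfExists(lock);
        events.add("lock released");
      }
    }
    return events;
  }

  public static void main(String[] args) throws IOException {
    Path crawlDb = Files.createTempDirectory("crawldb");
    System.out.println(generate(crawlDb, false));
    System.out.println(generate(crawlDb, true));
  }
}
```

With `updateDb == false`, the returned event list shows the lock being released before the partitioning phase. With `updateDb == true`, the lock is held until the CrawlDB update is done.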





[jira] [Resolved] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-1194.

Resolution: Fixed

> Generator: CrawlDB lock should be released earlier
> --
>
> Key: NUTCH-1194
> URL: https://issues.apache.org/jira/browse/NUTCH-1194
> Project: Nutch
>  Issue Type: Improvement
>  Components: generator
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.17
>
>
> The lock on the CrawlDB is released only when everything is finished. But when 
> generating many segments, the lock remains in place although it is no longer 
> necessary. If GENERATE_UPDATE_DB is false, we can release the lock immediately 
> after the selector has finished.





[jira] [Commented] (NUTCH-1194) Generator: CrawlDB lock should be released earlier

2020-05-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099805#comment-17099805
 ] 

ASF GitHub Bot commented on NUTCH-1194:
---

sebastian-nagel commented on pull request #514:
URL: https://github.com/apache/nutch/pull/514#issuecomment-624003247


   Rebased to master and squashed commits.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Generator: CrawlDB lock should be released earlier
> --
>
> Key: NUTCH-1194
> URL: https://issues.apache.org/jira/browse/NUTCH-1194
> Project: Nutch
>  Issue Type: Improvement
>  Components: generator
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Minor
> Fix For: 1.17
>
>
> The lock on the CrawlDB is released only when everything is finished. But when 
> generating many segments, the lock remains in place although it is no longer 
> necessary. If GENERATE_UPDATE_DB is false, we can release the lock immediately 
> after the selector has finished.





[GitHub] [nutch] sebastian-nagel commented on pull request #514: NUTCH-1194 Generator: CrawlDB lock should be released earlier

2020-05-05 Thread GitBox


sebastian-nagel commented on pull request #514:
URL: https://github.com/apache/nutch/pull/514#issuecomment-624003247


   Rebased to master and squashed commits.







[jira] [Updated] (NUTCH-1806) Delegate processing of URL domains to crawler commons

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1806:
---
Fix Version/s: 1.18

> Delegate processing of URL domains to crawler commons
> -
>
> Key: NUTCH-1806
> URL: https://issues.apache.org/jira/browse/NUTCH-1806
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.8
>Reporter: Julien Nioche
>Priority: Major
>  Labels: crawler-commons
> Fix For: 1.18
>
>
> We have code in src/java/org/apache/nutch/util/domain and a resource file 
> conf/domain-suffixes.xml to handle URL domains. This is used mostly from 
> URLUtil.getDomainName.
> The resource file is not necessarily up to date, and since crawler commons has 
> similar functionality, we should use it instead of having to maintain our 
> own resources.
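Domain-name extraction comes down to finding the longest known public suffix of a host and keeping one more label in front of it, so that `www.example.co.uk` maps to `example.co.uk`. The Java sketch below illustrates that idea against a tiny hard-coded suffix set. `DomainNameSketch` and its suffix list are hypothetical and stand in for the data in conf/domain-suffixes.xml or the crawler commons equivalent. Wildcard and exception rules, as found in the Public Suffix List, are deliberately not handled.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/** Hypothetical domain-name extraction against a small, fixed suffix set. */
final class DomainNameSketch {

  // Deliberately tiny suffix set; real suffix lists contain thousands of entries.
  static final Set<String> SUFFIXES = Set.of("com", "org", "uk", "co.uk", "jp", "co.jp");

  /** Returns the longest known suffix plus one label, or the host itself if none matches. */
  static String domainName(String host) {
    String h = host.toLowerCase(Locale.ROOT);
    String[] labels = h.split("\\.");
    // Check suffixes from longest to shortest: for www.example.co.uk try
    // "example.co.uk", then "co.uk" (match), so the result is "example.co.uk".
    for (int i = 1; i < labels.length; i++) {
      String candidate = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
      if (SUFFIXES.contains(candidate)) {
        return labels[i - 1] + "." + candidate;
      }
    }
    return h;
  }

  public static void main(String[] args) {
    System.out.println(domainName("www.example.co.uk"));
    System.out.println(domainName("lucene.apache.org"));
  }
}
```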





[jira] [Commented] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099801#comment-17099801
 ] 

ASF GitHub Bot commented on NUTCH-1945:
---

sebastian-nagel opened a new pull request #525:
URL: https://github.com/apache/nutch/pull/525


   - add Tika unit test for XLSX files
   - bundle instance variables and utility methods in class TikaParserTest
   - clean up javadoc comments
   
   See the patch attached to 
[NUTCH-1945](https://issues.apache.org/jira/browse/NUTCH-1945), which has been 
ported to apply to the current Nutch master.





> Test for XLSX parser
> 
>
> Key: NUTCH-1945
> URL: https://issues.apache.org/jira/browse/NUTCH-1945
> Project: Nutch
>  Issue Type: Test
>  Components: parser
>Affects Versions: 1.10, 2.3.1
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.17
>
> Attachments: NUTCH-1945-2x.patch
>
>
> Add a test for Excel spreadsheet (xlsx) files: because they are formally also 
> zip files (as are other composite file formats), MIME type detection is also 
> crucial for parsing, cf. NUTCH-1605 and NUTCH-1925.
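The detection problem can be reproduced with JDK classes alone. An xlsx file starts with the same `PK` local-file-header signature (bytes 50 4B 03 04) as every zip archive, so only the entry names inside the archive reveal that it is a spreadsheet. The Java sketch below is a simplified stand-in for content-based detection, not Tika's API. `OoxmlSniffer` is hypothetical. Its rule (any entry under `xl/` means SpreadsheetML) is an assumption that ignores `[Content_Types].xml`, which a real detector would consult.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical detector distinguishing xlsx from generic zip by entry names. */
final class OoxmlSniffer {

  static final String XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
  static final String ZIP = "application/zip";

  /** Checks the zip signature first, then looks for a spreadsheet part name. */
  static String detect(byte[] data) throws IOException {
    boolean zipMagic = data.length >= 4
        && data[0] == 0x50 && data[1] == 0x4B && data[2] == 0x03 && data[3] == 0x04;
    if (!zipMagic) {
      return "application/octet-stream";
    }
    try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
      for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
        if (e.getName().startsWith("xl/")) {
          return XLSX; // SpreadsheetML parts live under xl/ in an OOXML package
        }
      }
    }
    return ZIP; // a zip archive without spreadsheet parts
  }

  /** Builds an in-memory zip containing the given entry names with dummy content. */
  static byte[] zipWith(String... names) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(out)) {
      for (String name : names) {
        zos.putNextEntry(new ZipEntry(name));
        zos.write("<x/>".getBytes(StandardCharsets.UTF_8));
        zos.closeEntry();
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(detect(zipWith("[Content_Types].xml", "xl/workbook.xml")));
    System.out.println(detect(zipWith("readme.txt")));
  }
}
```

Both inputs pass the signature check. Only the one containing an `xl/` part is reported as a spreadsheet, which is why a test with a real xlsx file exercises detection as much as parsing.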





[GitHub] [nutch] sebastian-nagel opened a new pull request #525: NUTCH-1945 Test for XLSX parser

2020-05-05 Thread GitBox


sebastian-nagel opened a new pull request #525:
URL: https://github.com/apache/nutch/pull/525


   - add Tika unit test for XLSX files
   - bundle instance variables and utility methods in class TikaParserTest
   - clean up javadoc comments
   
   See the patch attached to 
[NUTCH-1945](https://issues.apache.org/jira/browse/NUTCH-1945), which has been 
ported to apply to the current Nutch master.







[jira] [Commented] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-05-05 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099740#comment-17099740
 ] 

Hudson commented on NUTCH-2434:
---

SUCCESS: Integrated in Jenkins build Nutch-trunk #3679 (See 
[https://builds.apache.org/job/Nutch-trunk/3679/])
NUTCH-2434 Add methods to reset parameters HTMLMetaTags (apply patch (snagel: 
[https://github.com/apache/nutch/commit/a0ed0b42c2a42be2963a43d99ebc849b71d95fa8])
* (edit) src/java/org/apache/nutch/parse/HTMLMetaTags.java


> Add methods to reset parameters HTMLMetaTags
> 
>
> Key: NUTCH-2434
> URL: https://issues.apache.org/jira/browse/NUTCH-2434
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.17
>
> Attachments: NUTCH-2434.patch
>
>






[jira] [Updated] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-1945:
---
Fix Version/s: 1.17

> Test for XLSX parser
> 
>
> Key: NUTCH-1945
> URL: https://issues.apache.org/jira/browse/NUTCH-1945
> Project: Nutch
>  Issue Type: Test
>  Components: parser
>Affects Versions: 1.10, 2.3.1
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Minor
> Fix For: 1.17
>
> Attachments: NUTCH-1945-2x.patch
>
>
> Add a test for Excel spreadsheet (xlsx) files: because they are formally also 
> zip files (as are other composite file formats), MIME type detection is also 
> crucial for parsing, cf. NUTCH-1605 and NUTCH-1925.





[jira] [Assigned] (NUTCH-1945) Test for XLSX parser

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-1945:
--

Assignee: Sebastian Nagel

> Test for XLSX parser
> 
>
> Key: NUTCH-1945
> URL: https://issues.apache.org/jira/browse/NUTCH-1945
> Project: Nutch
>  Issue Type: Test
>  Components: parser
>Affects Versions: 1.10, 2.3.1
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Minor
> Attachments: NUTCH-1945-2x.patch
>
>
> Add a test for Excel spreadsheet (xlsx) files: because they are formally also 
> zip files (as are other composite file formats), MIME type detection is also 
> crucial for parsing, cf. NUTCH-1605 and NUTCH-1925.





[jira] [Resolved] (NUTCH-1652) Avoid instanciation of MimeUtil for each Content object created

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-1652.

Resolution: Done

This has been fixed as part of NUTCH-2578 for Nutch 1.15. Thanks, [~jnioche]!

> Avoid instanciation of MimeUtil for each Content object created
> ---
>
> Key: NUTCH-1652
> URL: https://issues.apache.org/jira/browse/NUTCH-1652
> Project: Nutch
>  Issue Type: Improvement
>Affects Versions: 1.7
>Reporter: Julien Nioche
>Priority: Major
>
> Content objects instantiate and hold a MimeUtil in the constructor used by 
> the HttpBase class. This is wasteful and unnecessarily slows down the 
> creation of Content objects, as each MimeUtil creates a new Tika instance, 
> reads from the configuration, etc.
> Instead, we could create a single instance of the MimeUtil class and pass it 
> to a new Content constructor
> {code}
> public Content(String url, String base, byte[] content, String contentType,
>   Metadata metadata, MimeUtil mime)
> {code}
> and create a single instance of MimeUtil in HttpBase. We would also need to 
> make sure that synchronisation is handled properly in MimeUtil 
> (especially for the calls to Tika), as Content objects are created in 
> a multithreaded environment.
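A minimal Java sketch of the proposal builds one detector once and shares it across every content object. A `synchronized` method guards the detection call in case the underlying detector is not thread-safe. `ExpensiveDetector` stands in for MimeUtil and `ContentSketch` for Content. Both are hypothetical, as is the toy extension-based detection, and this is not the code that eventually resolved the issue. The instance counter shows that many threads creating content objects still share a single detector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of sharing one expensive detector across threads. */
final class SharedDetectorSketch {

  /** Stand-in for an expensive-to-build detector such as MimeUtil. */
  static final class ExpensiveDetector {
    static final AtomicInteger INSTANCES = new AtomicInteger();

    ExpensiveDetector() {
      INSTANCES.incrementAndGet(); // real code would set up Tika, read the configuration, ...
    }

    /** Synchronized in case the underlying detector is not thread-safe (assumption). */
    synchronized String detect(String url) {
      return url.endsWith(".pdf") ? "application/pdf" : "text/html"; // toy detection
    }
  }

  /** Stand-in for Content: receives the shared detector instead of creating its own. */
  record ContentSketch(String url, String contentType) {
    static ContentSketch of(String url, ExpensiveDetector detector) {
      return new ContentSketch(url, detector.detect(url));
    }
  }

  /** Creates content objects on a thread pool, all using one shared detector. */
  static List<ContentSketch> fetchAll(List<String> urls, int threads) throws Exception {
    ExpensiveDetector shared = new ExpensiveDetector(); // built once, as HttpBase would in the proposal
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<ContentSketch>> futures = new ArrayList<>();
      for (String url : urls) {
        futures.add(pool.submit(() -> ContentSketch.of(url, shared)));
      }
      List<ContentSketch> result = new ArrayList<>();
      for (Future<ContentSketch> f : futures) {
        result.add(f.get()); // preserves input order
      }
      return result;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<ContentSketch> contents = fetchAll(
        List.of("https://example.org/a.pdf", "https://example.org/b.html"), 4);
    System.out.println(contents);
    System.out.println("detector instances: " + ExpensiveDetector.INSTANCES.get());
  }
}
```

Synchronizing the whole `detect` call is the simplest safe choice, but it serializes detection across fetcher threads. If the underlying detector is known to be thread-safe, the lock can be dropped.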





[jira] [Resolved] (NUTCH-2434) Add methods to reset parameters HTMLMetaTags

2020-05-05 Thread Sebastian Nagel (Jira)


 [ 
https://issues.apache.org/jira/browse/NUTCH-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2434.

Resolution: Implemented

Applied [~markus17]'s patch to master in 
[a0ed0b4|https://github.com/apache/nutch/commit/a0ed0b42c2a42be2963a43d99ebc849b71d95fa8].

> Add methods to reset parameters HTMLMetaTags
> 
>
> Key: NUTCH-2434
> URL: https://issues.apache.org/jira/browse/NUTCH-2434
> Project: Nutch
>  Issue Type: Task
>  Components: parser
>Reporter: Markus Jelsma
>Assignee: Markus Jelsma
>Priority: Major
> Fix For: 1.17
>
> Attachments: NUTCH-2434.patch
>
>



