[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336441#comment-16336441 ]

ASF GitHub Bot commented on NUTCH-2503:
---

lewismc closed pull request #281: NUTCH-2503: Add option to run tests for a single plugin
URL: https://github.com/apache/nutch/pull/281

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/build.xml b/build.xml
index 85bb923de..db163c620 100644
--- a/build.xml
+++ b/build.xml
@@ -411,7 +411,7 @@
 - + + + +
diff --git a/src/plugin/build.xml b/src/plugin/build.xml
index d035d54b9..3f579e841 100755
--- a/src/plugin/build.xml
+++ b/src/plugin/build.xml
@@ -152,6 +152,13 @@
 + + + + + + +

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add option to run tests for a single plugin
> ---
>
> Key: NUTCH-2503
> URL: https://issues.apache.org/jira/browse/NUTCH-2503
> Project: Nutch
> Issue Type: Improvement
> Reporter: Moreno Feltscher
> Assignee: Moreno Feltscher
> Priority: Major
> Fix For: 1.15
>
> Sometimes it makes sense to just run tests for a single plugin instead of
> building all plugins and running all tests at once.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336242#comment-16336242 ] Hudson commented on NUTCH-2503: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3498 (See [https://builds.apache.org/job/Nutch-trunk/3498/]) NUTCH-2503: Add option to run tests for a single plugin (moreno: [https://github.com/apache/nutch/commit/ea6a5f071baae3c55be22858822b251e4c781241]) * (edit) src/plugin/build.xml * (edit) build.xml > Add option to run tests for a single plugin > --- > > Key: NUTCH-2503 > URL: https://issues.apache.org/jira/browse/NUTCH-2503 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > Fix For: 1.15 > > > Sometimes it makes sense to just run tests for a single plugin instead of > building all plugins and running all tests at once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2499) Elastic REST Indexer: Duplicate values
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336240#comment-16336240 ] Hudson commented on NUTCH-2499: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3498 (See [https://builds.apache.org/job/Nutch-trunk/3498/]) fix for NUTCH-2499: Filter duplicated field values when indexing using (moreno: [https://github.com/apache/nutch/commit/a51686446d03dd27e04c4cb77f8bf0a60895954c]) * (edit) src/plugin/indexer-elastic-rest/src/java/org/apache/nutch/indexwriter/elasticrest/ElasticRestIndexWriter.java > Elastic REST Indexer: Duplicate values > -- > > Key: NUTCH-2499 > URL: https://issues.apache.org/jira/browse/NUTCH-2499 > Project: Nutch > Issue Type: Bug >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > Due to a change in > https://github.com/apache/nutch/commit/160758023e3de83894ae4fe654c17fde62aba50e#diff-408fd2f17bc9791dcbf531ffe6574a6a > the Elastic REST indexer does not work with HashSets for values anymore but > instead saves duplicated values as arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
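The idea behind the NUTCH-2499 fix above, filtering duplicated field values before they are written to the index, can be illustrated with a minimal Java sketch. This is not the code from the linked commit: the class and method names (FieldValueDedup, dedupe) are hypothetical, and the sketch assumes that keeping values in first-seen order is acceptable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Nutch: drops repeated values of one field.
public class FieldValueDedup {

    /** Returns the values with duplicates removed, keeping first-seen order. */
    public static <T> List<T> dedupe(List<T> values) {
        // A LinkedHashSet rejects repeats and remembers insertion order.
        return new ArrayList<T>(new LinkedHashSet<T>(values));
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("news", "sports", "news");
        // Only the distinct values would be handed to the index writer.
        System.out.println(dedupe(raw));
    }
}
```

A LinkedHashSet is used here rather than a plain HashSet so that the order of the remaining values stays predictable; whether order matters for a given field is an assumption of this sketch.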
[jira] [Commented] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336241#comment-16336241 ] Hudson commented on NUTCH-2502: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3498 (See [https://builds.apache.org/job/Nutch-trunk/3498/]) NUTCH-2502: Add Content-Type filter option to Any23 plugin (moreno: [https://github.com/apache/nutch/commit/856a8abd31ac9a4d9944c1f9b494b8f94ded209f]) * (edit) src/plugin/any23/src/java/org/apache/nutch/any23/Any23ParseFilter.java * (edit) conf/nutch-default.xml * (edit) src/plugin/any23/src/test/org/apache/nutch/any23/TestAny23ParseFilter.java > Any23 Plugin: Add Content-Type filtering > > > Key: NUTCH-2502 > URL: https://issues.apache.org/jira/browse/NUTCH-2502 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > It should be possible to filter based on a document's Content-Type when using > Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
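As a rough illustration of what Content-Type filtering for NUTCH-2502 could look like, here is a minimal Java sketch. It is not the Any23ParseFilter code from the commit above: the class ContentTypeFilter, its constructor argument, and the comma-separated list format are all assumptions made for this example, and the real property added to conf/nutch-default.xml is not shown in this thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical allow-list filter, not the actual Nutch implementation.
public class ContentTypeFilter {
    private final Set<String> allowed = new HashSet<String>();

    /** @param commaSeparated assumed config format, e.g. "text/html,application/xhtml+xml" */
    public ContentTypeFilter(String commaSeparated) {
        for (String type : commaSeparated.split(",")) {
            String t = type.trim().toLowerCase(Locale.ROOT);
            if (!t.isEmpty()) {
                allowed.add(t);
            }
        }
    }

    /** True if the media type, ignoring parameters such as charset, is on the allow-list. */
    public boolean accepts(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false;
        }
        // "text/html; charset=UTF-8" -> "text/html"
        String mediaType = contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return allowed.contains(mediaType);
    }

    public static void main(String[] args) {
        ContentTypeFilter filter = new ContentTypeFilter("text/html, application/xhtml+xml");
        System.out.println(filter.accepts("text/html; charset=UTF-8"));
        System.out.println(filter.accepts("application/pdf"));
    }
}
```

In this sketch a document whose Content-Type is missing is rejected; a real filter might instead choose to let such documents through.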
[jira] [Resolved] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2502. - Resolution: Fixed Thank you [~mfeltscher] > Any23 Plugin: Add Content-Type filtering > > > Key: NUTCH-2502 > URL: https://issues.apache.org/jira/browse/NUTCH-2502 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > It should be possible to filter based on a document's Content-Type when using > Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2502: Fix Version/s: 1.15 > Any23 Plugin: Add Content-Type filtering > > > Key: NUTCH-2502 > URL: https://issues.apache.org/jira/browse/NUTCH-2502 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > It should be possible to filter based on a document's Content-Type when using > Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NUTCH-2499) Elastic REST Indexer: Duplicate values
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2499: Fix Version/s: 1.15 > Elastic REST Indexer: Duplicate values > -- > > Key: NUTCH-2499 > URL: https://issues.apache.org/jira/browse/NUTCH-2499 > Project: Nutch > Issue Type: Bug >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > Due to a change in > https://github.com/apache/nutch/commit/160758023e3de83894ae4fe654c17fde62aba50e#diff-408fd2f17bc9791dcbf531ffe6574a6a > the Elastic REST indexer does not work with HashSets for values anymore but > instead saves duplicated values as arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (NUTCH-2499) Elastic REST Indexer: Duplicate values
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2499. - Resolution: Fixed Thank you [~mfeltscher] > Elastic REST Indexer: Duplicate values > -- > > Key: NUTCH-2499 > URL: https://issues.apache.org/jira/browse/NUTCH-2499 > Project: Nutch > Issue Type: Bug >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > Fix For: 1.15 > > > Due to a change in > https://github.com/apache/nutch/commit/160758023e3de83894ae4fe654c17fde62aba50e#diff-408fd2f17bc9791dcbf531ffe6574a6a > the Elastic REST indexer does not work with HashSets for values anymore but > instead saves duplicated values as arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing
[ https://issues.apache.org/jira/browse/NUTCH-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2495: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Use -deleteGone instead of clean job in crawler script while indexing > - > > Key: NUTCH-2495 > URL: https://issues.apache.org/jira/browse/NUTCH-2495 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > > Instead of running {{bin/nutch clean}} after indexing the documents, run > {{bin/nutch index}} with the {{-deleteGone}} flag, which deletes not only gone > and duplicated documents but also redirects from the index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2502: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Any23 Plugin: Add Content-Type filtering > > > Key: NUTCH-2502 > URL: https://issues.apache.org/jira/browse/NUTCH-2502 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > > It should be possible to filter based on a document's Content-Type when using > Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2501: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Take into account $NUTCH_HEAPSIZE when crawling using crawl script > -- > > Key: NUTCH-2501 > URL: https://issues.apache.org/jira/browse/NUTCH-2501 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2503. - Resolution: Fixed Thank you [~mfeltscher] > Add option to run tests for a single plugin > --- > > Key: NUTCH-2503 > URL: https://issues.apache.org/jira/browse/NUTCH-2503 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > Fix For: 1.15 > > > Sometimes it makes sense to just run tests for a single plugin instead of > building all plugins and running all tests at once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2503: Fix Version/s: 1.15 > Add option to run tests for a single plugin > --- > > Key: NUTCH-2503 > URL: https://issues.apache.org/jira/browse/NUTCH-2503 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > Fix For: 1.15 > > > Sometimes it makes sense to just run tests for a single plugin instead of > building all plugins and running all tests at once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NUTCH-2499) Elastic REST Indexer: Duplicate values
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moreno Feltscher reassigned NUTCH-2499: --- Assignee: Lewis John McGibbney (was: Moreno Feltscher) > Elastic REST Indexer: Duplicate values > -- > > Key: NUTCH-2499 > URL: https://issues.apache.org/jira/browse/NUTCH-2499 > Project: Nutch > Issue Type: Bug >Reporter: Moreno Feltscher >Assignee: Lewis John McGibbney >Priority: Major > > Due to a change in > https://github.com/apache/nutch/commit/160758023e3de83894ae4fe654c17fde62aba50e#diff-408fd2f17bc9791dcbf531ffe6574a6a > the Elastic REST indexer does not work with HashSets for values anymore but > instead saves duplicated values as arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2501) Take into account $NUTCH_HEAPSIZE when crawling using crawl script
[ https://issues.apache.org/jira/browse/NUTCH-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335999#comment-16335999 ] Moreno Feltscher commented on NUTCH-2501: - Pull request: https://github.com/apache/nutch/pull/279 > Take into account $NUTCH_HEAPSIZE when crawling using crawl script > -- > > Key: NUTCH-2501 > URL: https://issues.apache.org/jira/browse/NUTCH-2501 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335991#comment-16335991 ] Moreno Feltscher commented on NUTCH-2503: - Pull request: https://github.com/apache/nutch/pull/281 > Add option to run tests for a single plugin > --- > > Key: NUTCH-2503 > URL: https://issues.apache.org/jira/browse/NUTCH-2503 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > > Sometimes it makes sense to just run tests for a single plugin instead of > building all plugins and running all tests at once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335994#comment-16335994 ] Moreno Feltscher commented on NUTCH-2502: - Pull request: https://github.com/apache/nutch/pull/280 > Any23 Plugin: Add Content-Type filtering > > > Key: NUTCH-2502 > URL: https://issues.apache.org/jira/browse/NUTCH-2502 > Project: Nutch > Issue Type: Improvement >Reporter: Moreno Feltscher >Assignee: Moreno Feltscher >Priority: Major > > It should be possible to filter based on a document's Content-Type when using > Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2503) Add option to run tests for a single plugin
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335984#comment-16335984 ]

Markus Jelsma commented on NUTCH-2503:
---

Hmm, in the past you could run {{ant -f src/plugin/urlfilter-suffix/build.xml test}} and it ran that specific test. Nowadays I get errors:

{code}
[javac] /home/markus/projects/apache/nutch/svn/trunk/src/plugin/urlfilter-suffix/src/test/org/apache/nutch/urlfilter/suffix/TestSuffixURLFilter.java:118: error: cannot find symbol
[javac] Assert.assertTrue(urlsModeAcceptAndPathFilter[i] == filter
[javac] ^
[javac] symbol: variable Assert
[javac] location: class TestSuffixURLFilter
{code}

> Add option to run tests for a single plugin
> ---
>
> Key: NUTCH-2503
> URL: https://issues.apache.org/jira/browse/NUTCH-2503
> Project: Nutch
> Issue Type: Improvement
> Reporter: Moreno Feltscher
> Assignee: Moreno Feltscher
> Priority: Major
>
> Sometimes it makes sense to just run tests for a single plugin instead of
> building all plugins and running all tests at once.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NUTCH-2503) Add option to run tests for a single plugin
Moreno Feltscher created NUTCH-2503: --- Summary: Add option to run tests for a single plugin Key: NUTCH-2503 URL: https://issues.apache.org/jira/browse/NUTCH-2503 Project: Nutch Issue Type: Improvement Reporter: Moreno Feltscher Assignee: Moreno Feltscher Sometimes it makes sense to just run tests for a single plugin instead of building all plugins and running all tests at once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NUTCH-2466) Sitemap processor to follow redirects
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335949#comment-16335949 ] Markus Jelsma commented on NUTCH-2466: -- First patch, making maxRedir configurable and using filterNormalize instead of just normalize. > Sitemap processor to follow redirects > - > > Key: NUTCH-2466 > URL: https://issues.apache.org/jira/browse/NUTCH-2466 > Project: Nutch > Issue Type: Bug >Affects Versions: 1.13 >Reporter: Markus Jelsma >Assignee: Markus Jelsma >Priority: Minor > Fix For: 1.15 > > Attachments: NUTCH-2466.patch, NUTCH-2466.patch > > > It does follow http > https, but not the following redirect, e.g. > sitemap_index.xml that some websites have. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
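The behaviour discussed in NUTCH-2466, following a chain of redirects up to a configurable limit, can be sketched in plain Java as follows. This is not taken from the attached patch: the class RedirectFollower, the resolve method, and the use of a Map in place of real HTTP fetches are assumptions for illustration, and the example.com URLs are made up. The filterNormalize step mentioned in the comment is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Nutch sitemap processor.
public class RedirectFollower {

    /**
     * Follows redirects starting at start, allowing at most maxRedirects hops.
     * The map stands in for the network: it maps a URL to its redirect target,
     * and a URL absent from the map is treated as a final, non-redirecting URL.
     * Returns the final URL, or null if the hop limit is exceeded.
     */
    public static String resolve(String start, Map<String, String> redirects, int maxRedirects) {
        String current = start;
        for (int hops = 0; hops <= maxRedirects; hops++) {
            String target = redirects.get(current);
            if (target == null) {
                return current; // no further redirect
            }
            current = target;
        }
        return null; // more redirects than allowed
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<String, String>();
        redirects.put("http://example.com/sitemap.xml", "https://example.com/sitemap.xml");
        redirects.put("https://example.com/sitemap.xml", "https://example.com/sitemap_index.xml");
        // The second hop is the one the issue says was not being followed.
        System.out.println(resolve("http://example.com/sitemap.xml", redirects, 2));
    }
}
```

Bounding the number of hops also keeps a redirect loop from running forever, which is one reason to make the limit configurable rather than unlimited.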
[jira] [Updated] (NUTCH-2466) Sitemap processor to follow redirects
[ https://issues.apache.org/jira/browse/NUTCH-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-2466: - Attachment: NUTCH-2466.patch > Sitemap processor to follow redirects > - > > Key: NUTCH-2466 > URL: https://issues.apache.org/jira/browse/NUTCH-2466 > Project: Nutch > Issue Type: Bug >Affects Versions: 1.13 >Reporter: Markus Jelsma >Assignee: Markus Jelsma >Priority: Minor > Fix For: 1.15 > > Attachments: NUTCH-2466.patch, NUTCH-2466.patch > > > It does follow http > https, but not the following redirect, e.g. > sitemap_index.xml that some websites have. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering
Moreno Feltscher created NUTCH-2502: --- Summary: Any23 Plugin: Add Content-Type filtering Key: NUTCH-2502 URL: https://issues.apache.org/jira/browse/NUTCH-2502 Project: Nutch Issue Type: Improvement Reporter: Moreno Feltscher Assignee: Moreno Feltscher It should be possible to filter based on a document's Content-Type when using Any23 extractors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)