[jira] [Commented] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2020-01-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015812#comment-17015812 ] Sebastian Nagel commented on NUTCH-2525: Thanks, [~jurian]! I've updated the patch again so that

[jira] [Assigned] (NUTCH-2759) bin/crawl: Rename option --num-slaves

2020-01-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2759: -- Assignee: Sebastian Nagel > bin/crawl: Rename option --num-slaves >

[jira] [Resolved] (NUTCH-2184) Enable IndexingJob to function with no crawldb

2020-01-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2184. Resolution: Fixed Merged PR #486 into master. Thanks, [~lewismc] for the initial work!

[jira] [Resolved] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header

2020-01-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2760. Assignee: Sebastian Nagel Resolution: Fixed Merged and verified that protocol

[jira] [Updated] (NUTCH-2525) Metadata indexer cannot handle uppercase parse metadata

2020-01-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2525: --- Component/s: plugin indexer > Metadata indexer cannot handle uppercase

[jira] [Resolved] (NUTCH-1863) Add JSON format dump output to readdb command

2019-12-27 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1863. Resolution: Fixed Thanks, [~balaShashanka]! > Add JSON format dump output to readdb

[jira] [Resolved] (NUTCH-2754) fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec.

2019-12-23 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2754. Resolution: Fixed > fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec. >

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000995#comment-17000995 ] Sebastian Nagel commented on NUTCH-2756: Hi [~lucasp], thanks for the notice! Ugly error and

[jira] [Resolved] (NUTCH-2745) Solr schema.xml not shipped in binary release

2019-12-20 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2745. Resolution: Fixed Merged. Thanks everybody! > Solr schema.xml not shipped in binary

[jira] [Updated] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header

2019-12-13 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2760: --- Labels: patch-available (was: ) > protocol-okhttp: properly record HTTP version in request

[jira] [Created] (NUTCH-2760) protocol-okhttp: properly record HTTP version in request message header

2019-12-13 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2760: -- Summary: protocol-okhttp: properly record HTTP version in request message header Key: NUTCH-2760 URL: https://issues.apache.org/jira/browse/NUTCH-2760 Project:

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-10 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992437#comment-16992437 ] Sebastian Nagel commented on NUTCH-2756: The killed container was one launched speculatively:

[jira] [Created] (NUTCH-2759) bin/crawl: Rename option --num-slaves

2019-12-09 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2759: -- Summary: bin/crawl: Rename option --num-slaves Key: NUTCH-2759 URL: https://issues.apache.org/jira/browse/NUTCH-2759 Project: Nutch Issue Type:

[jira] [Created] (NUTCH-2758) Add plugin READMEs to binary release packages

2019-12-09 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2758: -- Summary: Add plugin READMEs to binary release packages Key: NUTCH-2758 URL: https://issues.apache.org/jira/browse/NUTCH-2758 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-09 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991467#comment-16991467 ] Sebastian Nagel commented on NUTCH-2756: Hi [~lucasp], if always the same partition is affected,

[jira] [Comment Edited] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989947#comment-16989947 ] Sebastian Nagel edited comment on NUTCH-2756 at 12/6/19 4:34 PM: - Hi

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989947#comment-16989947 ] Sebastian Nagel commented on NUTCH-2756: Hi [~lucasp], I've also had a look into the config files

[jira] [Commented] (NUTCH-2755) Remove obsolete plugin indexer-elastic-rest

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989919#comment-16989919 ] Sebastian Nagel commented on NUTCH-2755: After a closer look: the indexer-elastic-rest plugin

[jira] [Updated] (NUTCH-2757) indexer-elastic: add authentication options

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2757: --- Fix Version/s: 1.17 > indexer-elastic: add authentication options >

[jira] [Updated] (NUTCH-2757) indexer-elastic: add authentication options

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2757: --- Affects Version/s: 1.16 > indexer-elastic: add authentication options >

[jira] [Created] (NUTCH-2757) indexer-elastic: add authentication options

2019-12-06 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2757: -- Summary: indexer-elastic: add authentication options Key: NUTCH-2757 URL: https://issues.apache.org/jira/browse/NUTCH-2757 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-06 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989868#comment-16989868 ] Sebastian Nagel commented on NUTCH-2756: Hi [~lucasp], > But today we had again the same problem

[jira] [Commented] (NUTCH-2756) Segment Part problem with HDFS on distibuted mode

2019-12-05 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988866#comment-16988866 ] Sebastian Nagel commented on NUTCH-2756: Hi [~lucasp], could you share more details? - the Hadoop

[jira] [Updated] (NUTCH-1863) Add JSON format dump output to readdb command

2019-12-04 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1863: --- Fix Version/s: 1.17 > Add JSON format dump output to readdb command >

[jira] [Resolved] (NUTCH-2748) Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb

2019-12-02 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2748. Resolution: Fixed Merged into master. The new configuration property

[jira] [Resolved] (NUTCH-2746) Basic URL normalizer to normalize Unicode domain names

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2746. Resolution: Fixed Merged/committed. Note: by default the behavior is still the old and

[jira] [Updated] (NUTCH-1971) The crawldb.url.filters property is not present in any configuration file

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1971: --- Fix Version/s: 1.17 > The crawldb.url.filters property is not present in any configuration

[jira] [Commented] (NUTCH-1971) The crawldb.url.filters property is not present in any configuration file

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980264#comment-16980264 ] Sebastian Nagel commented on NUTCH-1971: Needs better documentation. > The crawldb.url.filters

[jira] [Resolved] (NUTCH-1984) Eliminate unnecessary dependencies

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1984. Fix Version/s: 2.5 Resolution: Auto Closed Closing 2.5 issues as branch is no

[jira] [Closed] (NUTCH-1984) Eliminate unnecessary dependencies

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-1984. -- > Eliminate unnecessary dependencies > -- > > Key:

[jira] [Updated] (NUTCH-1999) Add http://nutch.apache.org/robots.txt

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1999: --- Fix Version/s: 1.17 > Add http://nutch.apache.org/robots.txt >

[jira] [Closed] (NUTCH-2003) topN is not work correctly

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2003. -- > topN is not work correctly > -- > > Key: NUTCH-2003 >

[jira] [Updated] (NUTCH-2002) ParserChecker to check robots.txt

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2002: --- Fix Version/s: 1.17 > ParserChecker to check robots.txt > -

[jira] [Updated] (NUTCH-2002) ParserChecker to check robots.txt

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2002: --- Component/s: parser > ParserChecker to check robots.txt > -

[jira] [Resolved] (NUTCH-2003) topN is not work correctly

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2003. Fix Version/s: 2.5 Resolution: Auto Closed Closing 2.5 issues as branch is no

[jira] [Resolved] (NUTCH-2024) httpcore classpath jar conflict when invoking protocol-selenium

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2024. Resolution: Cannot Reproduce Hi [~lewismc], closing this old issue for now. The selenium

[jira] [Resolved] (NUTCH-2032) Plugin to index the raw content of a readable document.

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2032. Resolution: Duplicate Thanks, [~betolink]! Closing this old issue. The functionality to

[jira] [Closed] (NUTCH-2532) Throw error if HBase is not available while running nutch commands.

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2532. -- > Throw error if HBase is not available while running nutch commands. >

[jira] [Closed] (NUTCH-2075) Generate will not choose URL without distance marker

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2075. -- > Generate will not choose URL without distance marker >

[jira] [Closed] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2076. -- > exceptions are not handled when using method waitForCompletion in a try block >

[jira] [Resolved] (NUTCH-2103) Nutch 2.3 has an old version of hbase jar in runtime/lib folder

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2103. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > Nutch 2.3

[jira] [Updated] (NUTCH-2113) Need documentation for using various Gora backends

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2113: --- Fix Version/s: 2.5 > Need documentation for using various Gora backends >

[jira] [Resolved] (NUTCH-2113) Need documentation for using various Gora backends

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2113. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > Need

[jira] [Updated] (NUTCH-2118) browser requests sometimes timeout when using the selenium grid because of port access issues

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2118: --- Affects Version/s: 1.15 > browser requests sometimes timeout when using the selenium grid

[jira] [Updated] (NUTCH-2103) Nutch 2.3 has an old version of hbase jar in runtime/lib folder

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2103: --- Fix Version/s: 2.5 > Nutch 2.3 has an old version of hbase jar in runtime/lib folder >

[jira] [Resolved] (NUTCH-2126) Use selenium protocol for specific sites

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2126. Fix Version/s: 1.16 Resolution: Duplicate This has been implemented in 1.16, see

[jira] [Resolved] (NUTCH-2131) Problem running nutch(crawl) with selenium

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2131. Resolution: Won't Do The selenium plugins have been upgraded in NUTCH-2676. Please test

[jira] [Updated] (NUTCH-2134) Redirection and cookie handling using protocol plugins

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2134: --- Fix Version/s: 1.17 > Redirection and cookie handling using protocol plugins >

[jira] [Resolved] (NUTCH-2240) ava.lang.NoSuchFieldError: INSTANCE selenium nutch

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2240. Resolution: Cannot Reproduce Without further information about the Nutch version used, it

[jira] [Updated] (NUTCH-2249) WordNet Integration for Cosine Similarity

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2249: --- Affects Version/s: 1.15 > WordNet Integration for Cosine Similarity >

[jira] [Resolved] (NUTCH-2253) ProtocolFactory still not thread-safe

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2253. Fix Version/s: 1.16 Resolution: Duplicate Thanks, [~l.misak...@gmail.com] and sorry

[jira] [Updated] (NUTCH-2265) Write A Test Package for Scoring Similarity

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2265: --- Affects Version/s: 1.15 > Write A Test Package for Scoring Similarity >

[jira] [Resolved] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2268. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. >

[jira] [Updated] (NUTCH-2268) SolrIndexerJob: java.lang.RuntimeException

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2268: --- Fix Version/s: 2.5 > SolrIndexerJob: java.lang.RuntimeException >

[jira] [Updated] (NUTCH-2274) InteractiveSelenium Plugin's DefaultHandler Returns Null

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2274: --- Fix Version/s: 1.17 > InteractiveSelenium Plugin's DefaultHandler Returns Null >

[jira] [Commented] (NUTCH-2275) MD5Signature by default doesn't take in account parse

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980217#comment-16980217 ] Sebastian Nagel commented on NUTCH-2275: The problem is that the "feed" plugin emits one document

[jira] [Updated] (NUTCH-2275) MD5Signature by default doesn't take in account parse

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2275: --- Fix Version/s: 1.17 > MD5Signature by default doesn't take in account parse >

[jira] [Updated] (NUTCH-2277) Adding goldstandard.txt default file in conf

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2277: --- Fix Version/s: 1.17 > Adding goldstandard.txt default file in conf >

[jira] [Updated] (NUTCH-2277) Adding goldstandard.txt default file in conf

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2277: --- Affects Version/s: 1.15 > Adding goldstandard.txt default file in conf >

[jira] [Updated] (NUTCH-2293) Make the unit tests which requires "plugin.folders" as integration tests

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2293: --- Affects Version/s: 1.15 > Make the unit tests which requires "plugin.folders" as integration

[jira] [Commented] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace.

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980167#comment-16980167 ] Sebastian Nagel commented on NUTCH-2318: Still a problem, also in 1.x. [~markus17] - you're

[jira] [Updated] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace.

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2318: --- Fix Version/s: 1.17 > Text extraction in HtmlParser adds too much whitespace. >

[jira] [Updated] (NUTCH-2318) Text extraction in HtmlParser adds too much whitespace.

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2318: --- Affects Version/s: 1.15 > Text extraction in HtmlParser adds too much whitespace. >

[jira] [Resolved] (NUTCH-2739) indexer-elastic: Upgrade ES and migrate to REST client

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2739. Resolution: Implemented Merged PR #484 into master. Opened NUTCH-2755 to track the removal

[jira] [Updated] (NUTCH-2755) Remove obsolete plugin indexer-elastic-rest

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2755: --- Affects Version/s: 1.17 > Remove obsolete plugin indexer-elastic-rest >

[jira] [Created] (NUTCH-2755) Remove obsolete plugin indexer-elastic-rest

2019-11-22 Thread Sebastian Nagel (Jira)
Sebastian Nagel created NUTCH-2755: -- Summary: Remove obsolete plugin indexer-elastic-rest Key: NUTCH-2755 URL: https://issues.apache.org/jira/browse/NUTCH-2755 Project: Nutch Issue Type:

[jira] [Resolved] (NUTCH-2323) ElasticSearch Indexer does not work on Nutch 2.3.1

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2323. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. >

[jira] [Updated] (NUTCH-2323) ElasticSearch Indexer does not work on Nutch 2.3.1

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2323: --- Fix Version/s: 2.5 > ElasticSearch Indexer does not work on Nutch 2.3.1 >

[jira] [Updated] (NUTCH-2331) REST API Fetch fails to retrieve HDFS path on distributed mode

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2331: --- Affects Version/s: 1.15 > REST API Fetch fails to retrieve HDFS path on distributed mode >

[jira] [Updated] (NUTCH-2075) Generate will not choose URL without distance marker

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2075: --- Fix Version/s: 2.5 > Generate will not choose URL without distance marker >

[jira] [Resolved] (NUTCH-2075) Generate will not choose URL without distance marker

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2075. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > Generate

[jira] [Resolved] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2076. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. >

[jira] [Updated] (NUTCH-2076) exceptions are not handled when using method waitForCompletion in a try block

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2076: --- Fix Version/s: 2.5 > exceptions are not handled when using method waitForCompletion in a try

[jira] [Updated] (NUTCH-2586) Add a fallback mechanism for missing meta tags

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2586: --- Component/s: plugin metadata > Add a fallback mechanism for missing meta

[jira] [Updated] (NUTCH-2586) Add a fallback mechanism for missing meta tags

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2586: --- Affects Version/s: 1.15 > Add a fallback mechanism for missing meta tags >

[jira] [Commented] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980134#comment-16980134 ] Sebastian Nagel commented on NUTCH-2681: Eventually resolved "en passant" by NUTCH-2676. Needs

[jira] [Updated] (NUTCH-2586) Add a fallback mechanism for missing meta tags

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2586: --- Fix Version/s: 1.17 > Add a fallback mechanism for missing meta tags >

[jira] [Updated] (NUTCH-2681) ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox 31.4.0

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2681: --- Fix Version/s: 1.17 > ClassCastException - Apache Nutch 1.x, Selenium v2.48.2, firefox

[jira] [Closed] (NUTCH-2270) Solr indexer Failed i

2019-11-22 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2270. -- > Solr indexer Failed i > - > > Key: NUTCH-2270 >

[jira] [Resolved] (NUTCH-2230) Nutch doesn't index all URLs found

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2230. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > Nutch

[jira] [Resolved] (NUTCH-2270) Solr indexer Failed i

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2270. Resolution: Duplicate > Solr indexer Failed i > - > >

[jira] [Updated] (NUTCH-2230) Nutch doesn't index all URLs found

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2230: --- Fix Version/s: 2.5 > Nutch doesn't index all URLs found > --

[jira] [Resolved] (NUTCH-2332) Indexer-elastic2 plugin availability timeline

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2332. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. >

[jira] [Resolved] (NUTCH-2341) bin/crawl do not fetch batchId generated by bash script

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2341. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > bin/crawl

[jira] [Updated] (NUTCH-2332) Indexer-elastic2 plugin availability timeline

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2332: --- Fix Version/s: 2.5 > Indexer-elastic2 plugin availability timeline >

[jira] [Resolved] (NUTCH-2343) Calling nutch extension points before custom plugin

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2343. Resolution: Auto Closed Closing 2.5 issues as branch is no longer maintained. > Calling

[jira] [Updated] (NUTCH-2343) Calling nutch extension points before custom plugin

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2343: --- Fix Version/s: 2.5 > Calling nutch extension points before custom plugin >

[jira] [Resolved] (NUTCH-2361) Deprecated nutch and solr integration documentation.

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2361. Resolution: Fixed The information about the managed schema is now available in the

[jira] [Updated] (NUTCH-2361) Deprecated nutch and solr integration documentation.

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2361: --- Component/s: wiki > Deprecated nutch and solr integration documentation. >

[jira] [Updated] (NUTCH-2379) crawl script dedup's crawldb update is slow

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2379: --- Fix Version/s: 1.17 > crawl script dedup's crawldb update is slow >

[jira] [Updated] (NUTCH-2385) 1.x Elasticsearch Indexer - path.home is not configured

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2385: --- Fix Version/s: 1.17 > 1.x Elasticsearch Indexer - path.home is not configured >

[jira] [Updated] (NUTCH-2396) Cannot stop or abort fetch job via REST API

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2396: --- Fix Version/s: 1.17 > Cannot stop or abort fetch job via REST API >

[jira] [Updated] (NUTCH-2407) Memory leak causing Nutch Server to run out of memory

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2407: --- Fix Version/s: 1.17 > Memory leak causing Nutch Server to run out of memory >

[jira] [Updated] (NUTCH-2407) Memory leak causing Nutch Server to run out of memory

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2407: --- Affects Version/s: 1.16 > Memory leak causing Nutch Server to run out of memory >

[jira] [Commented] (NUTCH-2425) Update GettingNutchRunningWithUbuntu wiki article

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975008#comment-16975008 ] Sebastian Nagel commented on NUTCH-2425: Maybe archive this article in favor of

[jira] [Updated] (NUTCH-2421) parse-html to prioritize HTML5 charset definitions

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2421: --- Fix Version/s: 1.17 > parse-html to prioritize HTML5 charset definitions >

[jira] [Updated] (NUTCH-2421) parse-html to prioritize HTML5 charset definitions

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2421: --- Affects Version/s: 1.15 > parse-html to prioritize HTML5 charset definitions >

[jira] [Updated] (NUTCH-2425) Update GettingNutchRunningWithUbuntu wiki article

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2425: --- Component/s: wiki > Update GettingNutchRunningWithUbuntu wiki article >

[jira] [Updated] (NUTCH-2423) Update contributor info page

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2423: --- Component/s: wiki > Update contributor info page > > >

[jira] [Updated] (NUTCH-2423) Update contributor info page

2019-11-15 Thread Sebastian Nagel (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2423: --- Labels: easytask help-wanted (was: ) > Update contributor info page >
