[jira] [Resolved] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1553. Resolution: Fixed Thanks [~alfonso.presa]! Verified the solution and committed. > Property

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: --- Attachment: NUTCH-2291-2.patch Solution 2: explicitly add mockito dependencies. (Solution 1

[jira] [Comment Edited] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356625#comment-15356625 ] Sebastian Nagel edited comment on NUTCH-1553 at 6/30/16 6:48 AM: - Thanks

[jira] [Assigned] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1553: -- Assignee: Sebastian Nagel > Property 'indexer.delete.robots.noindex' not working when

[jira] [Updated] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1553: --- Fix Version/s: 1.13 > Property 'indexer.delete.robots.noindex' not working when using

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: --- Attachment: mrunit-deps-cached.png mrunit-deps-new.png Screenshots from Ant

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: --- Attachment: NUTCH-2291-1.patch Solution 1: remove {{maven:classifier="hadoop2"}} from the

[jira] [Reopened] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-1553: Tests fail, see

[jira] [Commented] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358841#comment-15358841 ] Sebastian Nagel commented on NUTCH-1553: Ok, the test did not test anything before because no

[jira] [Commented] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358917#comment-15358917 ] Sebastian Nagel commented on NUTCH-2291: Solution 1 is correct because the classifier is only a

[jira] [Resolved] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2291. Resolution: Fixed > Fix mrunit dependencies > --- > >

[jira] [Assigned] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2291: -- Assignee: Sebastian Nagel > Fix mrunit dependencies > --- > >

[jira] [Resolved] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1553. Resolution: Fixed Fixed the unit test. Also made sure that all values are added, in case

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: --- Summary: Add main() to ZipParser (was: Unnecessary truncate content configuration, and

[jira] [Work started] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1308 started by Sebastian Nagel. -- > Unnecessary truncate content configuration, and logging in >

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: --- Fix Version/s: (was: 2.5) 1.13 > Add main() to ZipParser >

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: --- Issue Type: Improvement (was: Bug) > Add main() to ZipParser > --- > >

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: --- Priority: Minor (was: Major) > Add main() to ZipParser > --- > >

[jira] [Resolved] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1308. Resolution: Fixed Added main() to 1.x ZipParser. For 2.x parse-zip has been disabled. >

[jira] [Assigned] (NUTCH-2286) CrawlDbReader -stats to show fetch time and interval

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2286: -- Assignee: Sebastian Nagel > CrawlDbReader -stats to show fetch time and interval >

[jira] [Resolved] (NUTCH-2286) CrawlDbReader -stats to show fetch time and interval

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2286. Resolution: Fixed Committed. Thanks, [~markus17]! > CrawlDbReader -stats to show fetch

[jira] [Created] (NUTCH-2290) Update licenses of bundled libraries

2016-06-29 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2290: -- Summary: Update licenses of bundled libraries Key: NUTCH-2290 URL: https://issues.apache.org/jira/browse/NUTCH-2290 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2299. Resolution: Fixed > Remove obsolete properties protocol.plugin.check.* >

[jira] [Assigned] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2299: -- Assignee: Sebastian Nagel > Remove obsolete properties protocol.plugin.check.* >

[jira] [Work started] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2299 started by Sebastian Nagel. -- > Remove obsolete properties protocol.plugin.check.* >

[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848277#comment-15848277 ] Sebastian Nagel commented on NUTCH-2355: Hi Markus, useful for sure, e.g., if a server uses

[jira] [Resolved] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2345. Resolution: Duplicate Thanks [~Mgupta]! The fix is included in NUTCH-2352. >

[jira] [Resolved] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2347. Resolution: Fixed Merged into 2.x, thanks [~kamaci]! > Use Logger Instead of Printing

[jira] [Resolved] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2349. Resolution: Fixed Assignee: Sebastian Nagel Committed to 1.x and 2.x. >

[jira] [Commented] (NUTCH-2346) Check Types at Object Equality

2017-01-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840129#comment-15840129 ] Sebastian Nagel commented on NUTCH-2346: Hi Lewis, no problem. That's a subtle issue, looks

[jira] [Comment Edited] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837625#comment-15837625 ] Sebastian Nagel edited comment on NUTCH-2346 at 1/25/17 12:12 PM: -- Hi,

[jira] [Reopened] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2346: Hi, o.a.n.protocol.TestContent now fails in line 50 {code}

[jira] [Updated] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2357: --- Flags: Patch Patch Info: Patch Available > Index metadata throw Exception because

[jira] [Commented] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859272#comment-15859272 ] Sebastian Nagel commented on NUTCH-2357: Thanks! See also [this discussion on the user mailing

[jira] [Commented] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826198#comment-15826198 ] Sebastian Nagel commented on NUTCH-2350: +1 but isn't this just part of NUTCH-2344 (or a subtask),

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: --- Affects Version/s: 2.4 > urlnormalizer-basic NPE for ill-formed URL "http:/" >

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: --- Fix Version/s: 2.4 > urlnormalizer-basic NPE for ill-formed URL "http:/" >

[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826958#comment-15826958 ] Sebastian Nagel commented on NUTCH-2349: See also

[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826084#comment-15826084 ] Sebastian Nagel commented on NUTCH-2345: Hi Lewis, yes, that looks easier to maintain, but better

[jira] [Commented] (NUTCH-2333) Indexer for RabbitMQ

2017-01-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828234#comment-15828234 ] Sebastian Nagel commented on NUTCH-2333: +1 looks good, although I haven't tested it. Yes, there

[jira] [Resolved] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2352. Resolution: Fixed Committed to 1.x. Thanks, [~kamaci]! > Log with Generic Class Name at

[jira] [Commented] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830548#comment-15830548 ] Sebastian Nagel commented on NUTCH-2352: +1 lgtm, going to commit... > Log with Generic Class

[jira] [Created] (NUTCH-2300) Fetcher to optionally save robots.txt

2016-08-19 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2300: -- Summary: Fetcher to optionally save robots.txt Key: NUTCH-2300 URL: https://issues.apache.org/jira/browse/NUTCH-2300 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431497#comment-15431497 ] Sebastian Nagel commented on NUTCH-2246: Hi [~sujenshah], it's committed, couldn't it be resolved

[jira] [Work started] (NUTCH-2305) generate.min.score doesn't work in 2.x

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2305 started by Sebastian Nagel. -- > generate.min.score doesn't work in 2.x >

[jira] [Assigned] (NUTCH-2305) generate.min.score doesn't work in 2.x

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2305: -- Assignee: Sebastian Nagel > generate.min.score doesn't work in 2.x >

[jira] [Commented] (NUTCH-2305) generate.min.score doesn't work in 2.x

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431556#comment-15431556 ] Sebastian Nagel commented on NUTCH-2305: Thanks, [~cloudysunny14]. Good catch! >

[jira] [Updated] (NUTCH-2139) Basic plugin to index inlinks and outlinks

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2139: --- Fix Version/s: (was: 1.13) 1.11 > Basic plugin to index inlinks and

[jira] [Commented] (NUTCH-2139) Basic plugin to index inlinks and outlinks

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431575#comment-15431575 ] Sebastian Nagel commented on NUTCH-2139: Hi [~19manish90], thanks for reporting. Please, open a

[jira] [Updated] (NUTCH-2139) Basic plugin to index inlinks and outlinks

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2139: --- Assignee: Jorge Luis Betancourt Gonzalez > Basic plugin to index inlinks and outlinks >

[jira] [Resolved] (NUTCH-2139) Basic plugin to index inlinks and outlinks

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2139. Resolution: Fixed Already fixed for 1.11, resolving. > Basic plugin to index inlinks and

[jira] [Resolved] (NUTCH-2300) Fetcher to optionally save robots.txt

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2300. Resolution: Fixed Committed to 1.x (3fca1a5). Thanks everyone! > Fetcher to optionally

[jira] [Assigned] (NUTCH-2300) Fetcher to optionally save robots.txt

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2300: -- Assignee: Sebastian Nagel > Fetcher to optionally save robots.txt >

[jira] [Updated] (NUTCH-1749) Title duplicated in document body

2016-09-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1749: --- Fix Version/s: 1.13 > Title duplicated in document body > - >

[jira] [Updated] (NUTCH-1749) Optionally exclude title from content field

2016-09-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1749: --- Summary: Optionally exclude title from content field (was: Title duplicated in document

[jira] [Updated] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-15 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2315: --- Fix Version/s: (was: 2.3.1) 2.4 > UpdateDb jobs fails everytime (Nutch

[jira] [Updated] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-15 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2315: --- Attachment: NUTCH-2315-2.3.1-1.patch Normally, invalid URLs should be filtered away or fixed

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522311#comment-15522311 ] Sebastian Nagel commented on NUTCH-2315: Looks like some longer text "but when someone..." is used

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-10-05 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549245#comment-15549245 ] Sebastian Nagel commented on NUTCH-2320: Right, change logs are generated from Jira. >

[jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl

2016-10-07 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15554993#comment-15554993 ] Sebastian Nagel commented on NUTCH-2319: See the ongoing discussion in user@nutch [Issue Crawling

[jira] [Updated] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2242: --- Assignee: (was: Sebastian Nagel) > lastModified not always set >

[jira] [Resolved] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2242. Resolution: Fixed Committed (70622c3) to 1.x including NUTCH-2164. Thanks, [~jurian]!

[jira] [Resolved] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2016-08-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2164. Resolution: Fixed Committed (70622c3) to 1.x including NUTCH-2242. Thanks,

[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432283#comment-15432283 ] Sebastian Nagel commented on NUTCH-2246: Thanks! I've removed it from the 1.12 section of

[jira] [Assigned] (NUTCH-2242) lastModified not always set

2016-08-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2242: -- Assignee: Sebastian Nagel > lastModified not always set > ---

[jira] [Resolved] (NUTCH-2305) generate.min.score doesn't work in 2.x

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2305. Resolution: Fixed Fix Version/s: 2.4 Committed (5c3a38), thanks! The property

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528667#comment-15528667 ] Sebastian Nagel commented on NUTCH-2315: Thanks, for reporting that this configuration change

[jira] [Commented] (NUTCH-2315) UpdateDb jobs fails everytime (Nutch 2.3.1)

2016-09-28 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528786#comment-15528786 ] Sebastian Nagel commented on NUTCH-2315: Ev., take a lower value, according to [mongodb

[jira] [Commented] (NUTCH-2316) Library conflict with Parser-Tika Plugin and Lib Folder

2016-09-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520581#comment-15520581 ] Sebastian Nagel commented on NUTCH-2316: Plugins have the Nutch "main" classes and their

[jira] [Commented] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-09-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522329#comment-15522329 ] Sebastian Nagel commented on NUTCH-1314: Is there a reason why this issue is still open? To be

[jira] [Updated] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2328: --- Fix Version/s: 2.4 > GeneratorJob does not generate anything on second run >

[jira] [Updated] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2328: --- Affects Version/s: (was: 2.5) (was: 2.4) > GeneratorJob does

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585585#comment-15585585 ] Sebastian Nagel commented on NUTCH-2328: Thanks, [~arthur-evozon]. Good catch! In which

[jira] [Commented] (NUTCH-2328) GeneratorJob does not generate anything on second run

2016-10-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586690#comment-15586690 ] Sebastian Nagel commented on NUTCH-2328: > the only solution is to have a cluster wide propagated

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2016-11-24 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15693599#comment-15693599 ] Sebastian Nagel commented on NUTCH-2334: Hi [~roannel], what does "extension point for schedulers"

[jira] [Created] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2016-11-28 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2335: -- Summary: Injector not to filter and normalize existing URLs in CrawlDb Key: NUTCH-2335 URL: https://issues.apache.org/jira/browse/NUTCH-2335 Project: Nutch

[jira] [Created] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-09 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2337: -- Summary: urlnormalizer-basic to strip empty port Key: NUTCH-2337 URL: https://issues.apache.org/jira/browse/NUTCH-2337 Project: Nutch Issue Type: Bug

[jira] [Commented] (NUTCH-2046) The crawl script should be able to skip an initial injection.

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745073#comment-15745073 ] Sebastian Nagel commented on NUTCH-2046: A statement in change log and release notes that the

[jira] [Commented] (NUTCH-2320) URLFilterChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744691#comment-15744691 ] Sebastian Nagel commented on NUTCH-2320: Hi Markus, generally +1 - the telnet service works and

[jira] [Commented] (NUTCH-2338) URLNormalizerChecker to run as TCP Telnet service

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15744738#comment-15744738 ] Sebastian Nagel commented on NUTCH-2338: Hi Markus, thanks! See the comments on NUTCH-2320 which

[jira] [Resolved] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2337. Resolution: Fixed Fix Version/s: 2.4 Committed to trunk f351790 and 2.x 6e3c34d.

[jira] [Updated] (NUTCH-2337) urlnormalizer-basic to strip empty port

2016-12-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2337: --- Affects Version/s: 2.3.1 > urlnormalizer-basic to strip empty port >

[jira] [Commented] (NUTCH-2340) Can't install NUTCH from latest master branch. resolve-default: [ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ :: [ivy:resolve] :: lo

2016-12-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757702#comment-15757702 ] Sebastian Nagel commented on NUTCH-2340: Thanks [~rajanchandi]! However, this looks like a

[jira] [Comment Edited] (NUTCH-2340) Can't install NUTCH from latest master branch. resolve-default: [ivy:resolve] :: Apache Ivy 2.4.0 - 20141213170938 :: http://ant.apache.org/ivy/ :: [ivy:resolve]

2016-12-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15757702#comment-15757702 ] Sebastian Nagel edited comment on NUTCH-2340 at 12/17/16 10:14 PM: ---

[jira] [Created] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-11 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2349: -- Summary: urlnormalizer-basic NPE for ill-formed URL "http:/" Key: NUTCH-2349 URL: https://issues.apache.org/jira/browse/NUTCH-2349 Project: Nutch Issue

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2017-01-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15815806#comment-15815806 ] Sebastian Nagel commented on NUTCH-2334: If it's only about deciding whether a page is (re)fetched

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-11-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15708949#comment-15708949 ] Sebastian Nagel commented on NUTCH-2336: Thanks, [~VSlot]! Looks good to me. If there are no

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711816#comment-15711816 ] Sebastian Nagel commented on NUTCH-2336: The error happened in

[jira] [Commented] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15711845#comment-15711845 ] Sebastian Nagel commented on NUTCH-2336: Ok, this was a temporary failure. The next build without

[jira] [Assigned] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2336: -- Assignee: Sebastian Nagel > SegmentReader to implement Tool >

[jira] [Resolved] (NUTCH-2336) SegmentReader to implement Tool

2016-12-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2336. Resolution: Fixed Committed (6e051f2). Thanks! > SegmentReader to implement Tool >

[jira] [Commented] (NUTCH-2193) Upgrade feed parser plugin to use rome 1.5

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936536#comment-15936536 ] Sebastian Nagel commented on NUTCH-2193: Any objections to include this in 1.13? We would also

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936522#comment-15936522 ] Sebastian Nagel commented on NUTCH-2335: Rebased pull-request, tested in production. > Injector

[jira] [Commented] (NUTCH-2247) Protocol resolver

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936622#comment-15936622 ] Sebastian Nagel commented on NUTCH-2247: Hi Markus, interesting tool! - although ant is able to

[jira] [Commented] (NUTCH-2212) Decrease memory consumption by tuning stack size

2017-03-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936581#comment-15936581 ] Sebastian Nagel commented on NUTCH-2212: Hi Markus, is this really a problem of Nutch? The stack

[jira] [Commented] (NUTCH-2334) Extension point for schedulers

2017-03-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946865#comment-15946865 ] Sebastian Nagel commented on NUTCH-2334: Hi [~roannel], see

[jira] [Commented] (NUTCH-2372) Javadocs build failing.

2017-04-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963051#comment-15963051 ] Sebastian Nagel commented on NUTCH-2372: Hi [~omkar20895], great! Patch looks good, and all

[jira] [Updated] (NUTCH-2372) Javadocs build failing.

2017-04-10 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2372: --- Fix Version/s: 1.14 2.4 > Javadocs build failing. >

[jira] [Commented] (NUTCH-2335) Injector not to filter and normalize existing URLs in CrawlDb

2017-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964389#comment-15964389 ] Sebastian Nagel commented on NUTCH-2335: Hi Markus, I cannot see what's going on, it's a

[jira] [Resolved] (NUTCH-2269) Clean not working after crawl

2017-04-06 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2269. Resolution: Fixed > Clean not working after crawl > - > >
