[jira] [Work started] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-01-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1712 started by Sebastian Nagel.

[jira] [Commented] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-01-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092924#comment-15092924 ] Sebastian Nagel commented on NUTCH-1712: The merging is done together with minor improvements

[jira] [Commented] (NUTCH-2272) Index checker server to optionally keep client connection open

2016-06-15 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331434#comment-15331434 ] Sebastian Nagel commented on NUTCH-2272: Not included in [1.12 release

[jira] [Commented] (NUTCH-827) HTTP POST Authentication

2016-06-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328204#comment-15328204 ] Sebastian Nagel commented on NUTCH-827: Hi [~stevegy], would you mind opening a new Jira for this

[jira] [Created] (NUTCH-2281) Support non-default FileSystem

2016-06-17 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2281: Summary: Support non-default FileSystem; Key: NUTCH-2281; URL: https://issues.apache.org/jira/browse/NUTCH-2281; Project: Nutch; Issue Type: Improvement

[jira] [Commented] (NUTCH-2281) Support non-default FileSystem

2016-06-21 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341680#comment-15341680 ] Sebastian Nagel commented on NUTCH-2281: I tried to fix all tools but haven't tested all of them

[jira] [Created] (NUTCH-2286) CrawlDbReader -stats fetch time and interval

2016-06-23 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2286: Summary: CrawlDbReader -stats fetch time and interval; Key: NUTCH-2286; URL: https://issues.apache.org/jira/browse/NUTCH-2286; Project: Nutch; Issue Type:

[jira] [Updated] (NUTCH-2272) Index checker server to optionally keep client connection open

2016-06-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2272: Fix Version/s: 1.13 (was: 1.12)

[jira] [Updated] (NUTCH-2286) CrawlDbReader -stats to show fetch time and interval

2016-06-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2286: Summary: CrawlDbReader -stats to show fetch time and interval (was: CrawlDbReader -stats

[jira] [Commented] (NUTCH-2272) Index checker server to optionally keep client connection open

2016-06-23 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346585#comment-15346585 ] Sebastian Nagel commented on NUTCH-2272: Not included in released 1.12: removed from CHANGES.txt,

[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-06-27 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15351824#comment-15351824 ] Sebastian Nagel commented on NUTCH-2269: Thanks for reporting the problems. Afaics, they can be

[jira] [Issue Comment Deleted] (NUTCH-2269) Clean not working after crawl

2016-06-27 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2269: Comment: was deleted (was: The message {noformat} WARN output.FileOutputCommitter - Output

[jira] [Updated] (NUTCH-1314) Impose a limit on the length of outlink target urls

2016-02-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1314: Fix Version/s: 1.12

[jira] [Commented] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157655#comment-15157655 ] Sebastian Nagel commented on NUTCH-2228: The name of the failing test "testInvalidPatterns"

[jira] [Updated] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2228: Attachment: NUTCH-2228.patch

[jira] [Comment Edited] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157655#comment-15157655 ] Sebastian Nagel edited comment on NUTCH-2228 at 2/22/16 8:38 PM: The name

[jira] [Updated] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2228: Patch Info: Patch Available

[jira] [Commented] (NUTCH-2228) index-replace unit test fails

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157632#comment-15157632 ] Sebastian Nagel commented on NUTCH-2228: That's only a problem if Nutch is built with Java 8.

[jira] [Commented] (NUTCH-2220) Rename db.* options used only by the linkdb to linkdb.*

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157831#comment-15157831 ] Sebastian Nagel commented on NUTCH-2220: 0 / +1 Since this breaks existing crawl configurations: a

[jira] [Commented] (NUTCH-2221) Introduce db.ignore.internal.links to FetcherThread

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157816#comment-15157816 ] Sebastian Nagel commented on NUTCH-2221: +1 Just to consider: the additional argument to

[jira] [Commented] (NUTCH-2216) db.ignore.*.links to optionally follow internal redirects

2016-02-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1515#comment-1515 ] Sebastian Nagel commented on NUTCH-2216: * this was the case before, but shouldn't

[jira] [Resolved] (NUTCH-1712) Use MultipleInputs in Injector to make it a single mapreduce job

2016-02-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1712. Resolution: Fixed; Fix Version/s: 1.12. Committed to trunk (f5e430e).

[jira] [Updated] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2204: Attachment: NUTCH-2204.patch

[jira] [Created] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2204: Summary: remove junit lib from runtime; Key: NUTCH-2204; URL: https://issues.apache.org/jira/browse/NUTCH-2204; Project: Nutch; Issue Type: Improvement

[jira] [Updated] (NUTCH-2204) Remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2204: Summary: Remove junit lib from runtime (was: remove junit lib from runtime)

[jira] [Resolved] (NUTCH-2204) remove junit lib from runtime

2016-01-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2204. Resolution: Fixed. Committed to trunk, r1726318.

[jira] [Commented] (NUTCH-2144) Plugin to override db.ignore.external to exempt interesting external domain URLs

2016-02-14 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146685#comment-15146685 ] Sebastian Nagel commented on NUTCH-2144: Hi [~thammegowda], thanks! Everything looks good with the

[jira] [Commented] (NUTCH-2060) dedup is removing entries with status db_gone

2016-03-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174628#comment-15174628 ] Sebastian Nagel commented on NUTCH-2060: Afaics from the mentioned thread on the user mailing

[jira] [Commented] (NUTCH-2242) lastModified not always set

2016-03-24 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210136#comment-15210136 ] Sebastian Nagel commented on NUTCH-2242: Hi Jurian, thanks for reporting this problem. This is

[jira] [Commented] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178587#comment-15178587 ] Sebastian Nagel commented on NUTCH-2237: Good idea! Nice patch, including unit tests. A few

[jira] [Updated] (NUTCH-2237) DeduplicationJob: Add extra order criteria based on slug

2016-03-03 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2237: Fix Version/s: 1.12

[jira] [Assigned] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2256: Assignee: Sebastian Nagel

[jira] [Commented] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264274#comment-15264274 ] Sebastian Nagel commented on NUTCH-2256: Good catch, will fix right now. Thanks, [~songwang]!

[jira] [Updated] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2256: Fix Version/s: 2.3.2, 1.12, 2.4

[jira] [Updated] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2256: Affects Version/s: 1.11

[jira] [Resolved] (NUTCH-2254) Charset issues when using -addBinaryContent and -base64 options

2016-04-27 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2254. Resolution: Fixed. Committed, r6d2bfa9. Thanks, [~fedechicco]!

[jira] [Commented] (NUTCH-2254) Charset issues when using -addBinaryContent and -base64 options

2016-04-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256225#comment-15256225 ] Sebastian Nagel commented on NUTCH-2254: Hi [~fedechicco], the patch should work. Thanks! I'll add

[jira] [Assigned] (NUTCH-2254) Charset issues when using -addBinaryContent and -base64 options

2016-04-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2254: Assignee: Sebastian Nagel

[jira] [Resolved] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2256. Resolution: Fixed; Fix Version/s: (was: 2.3.2). Fixed and committed to 1.x

[jira] [Closed] (NUTCH-2256) Inconsistent log level practice

2016-04-29 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel closed NUTCH-2256. Also did a grep on all Java files for errors of the same kind: nothing found. Thanks,

[jira] [Updated] (NUTCH-2164) Inconsistent 'Modified Time' in crawl db

2016-05-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2164: Fix Version/s: 1.13

[jira] [Commented] (NUTCH-1858) Migrate Nutch documentation from Moin Moin to Confluence

2016-05-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15291591#comment-15291591 ] Sebastian Nagel commented on NUTCH-1858: It's hardly a job for a single person. First steps could

[jira] [Reopened] (NUTCH-2252) Allow phantomjs as a browser for selenium options

2016-05-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2252: Tests fail to compile [[1|https://builds.apache.org/job/Nutch-trunk/3365/console]]: {noformat}

[jira] [Commented] (NUTCH-2242) lastModified not always set

2016-05-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280076#comment-15280076 ] Sebastian Nagel commented on NUTCH-2242: Opened pull request

[jira] [Commented] (NUTCH-2242) lastModified not always set

2016-05-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279942#comment-15279942 ] Sebastian Nagel commented on NUTCH-2242: [~markus17]: Sorry, I didn't upload a final patch, simply

[jira] [Commented] (NUTCH-1785) Ability to index raw content

2016-04-20 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250812#comment-15250812 ] Sebastian Nagel commented on NUTCH-1785: The class o.a.n.indexer.NutchField supports only a couple

[jira] [Resolved] (NUTCH-2191) Add protocol-htmlunit

2016-04-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2191. Resolution: Fixed. Merged pull request #105. Build should succeed now. Thanks,

[jira] [Reopened] (NUTCH-2191) Add protocol-htmlunit

2016-04-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2191: Build fails because protocol-htmlunit's build.xml claims to have unit tests but there aren't

[jira] [Created] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-15 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2299: Summary: Remove obsolete properties protocol.plugin.check.*; Key: NUTCH-2299; URL: https://issues.apache.org/jira/browse/NUTCH-2299; Project: Nutch; Issue

[jira] [Commented] (NUTCH-2297) CrawlDbReader -stats wrong values for earliest fetch time and shortest interval

2016-08-08 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411716#comment-15411716 ] Sebastian Nagel commented on NUTCH-2297: The wrong values are already in the temporary output of

[jira] [Created] (NUTCH-2297) CrawlDbReader -stats wrong values for earliest fetch time and shortest interval

2016-08-08 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2297: Summary: CrawlDbReader -stats wrong values for earliest fetch time and shortest interval; Key: NUTCH-2297; URL: https://issues.apache.org/jira/browse/NUTCH-2297

[jira] [Created] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2291: Summary: Fix mrunit dependencies; Key: NUTCH-2291; URL: https://issues.apache.org/jira/browse/NUTCH-2291; Project: Nutch; Issue Type: Bug

[jira] [Resolved] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1553. Resolution: Fixed. Thanks, [~alfonso.presa]! Verified the solution and committed.

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: Attachment: NUTCH-2291-2.patch. Solution 2: explicitly add mockito dependencies. (Solution 1

[jira] [Comment Edited] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356625#comment-15356625 ] Sebastian Nagel edited comment on NUTCH-1553 at 6/30/16 6:48 AM: Thanks

[jira] [Assigned] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-1553: Assignee: Sebastian Nagel

[jira] [Updated] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1553: Fix Version/s: 1.13

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: Attachment: mrunit-deps-cached.png, mrunit-deps-new.png. Screenshots from Ant

[jira] [Updated] (NUTCH-2291) Fix mrunit dependencies

2016-06-30 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2291: Attachment: NUTCH-2291-1.patch. Solution 1: remove {{maven:classifier="hadoop2"}} from the

[jira] [Reopened] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-1553: Tests fail, see

[jira] [Commented] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358841#comment-15358841 ] Sebastian Nagel commented on NUTCH-1553: Ok, the test did not test anything before because no

[jira] [Commented] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358917#comment-15358917 ] Sebastian Nagel commented on NUTCH-2291: Solution 1 is correct because the classifier is only a

[jira] [Resolved] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2291. Resolution: Fixed

[jira] [Assigned] (NUTCH-2291) Fix mrunit dependencies

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2291: Assignee: Sebastian Nagel

[jira] [Resolved] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.

2016-07-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1553. Resolution: Fixed. Fixed the unit test. Also made sure that all values are added, in case

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: Summary: Add main() to ZipParser (was: Unnecessary truncate content configuration, and

[jira] [Work started] (NUTCH-1308) Unnecessary truncate content configuration, and logging in parse-zip/ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1308 started by Sebastian Nagel.

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: Fix Version/s: 1.13 (was: 2.5)

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: Issue Type: Improvement (was: Bug)

[jira] [Updated] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1308: Priority: Minor (was: Major)

[jira] [Resolved] (NUTCH-1308) Add main() to ZipParser

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1308. Resolution: Fixed. Added main() to the 1.x ZipParser. For 2.x, parse-zip has been disabled.

[jira] [Assigned] (NUTCH-2286) CrawlDbReader -stats to show fetch time and interval

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2286: Assignee: Sebastian Nagel

[jira] [Resolved] (NUTCH-2286) CrawlDbReader -stats to show fetch time and interval

2016-07-02 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2286. Resolution: Fixed. Committed. Thanks, [~markus17]!

[jira] [Created] (NUTCH-2290) Update licenses of bundled libraries

2016-06-29 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2290: Summary: Update licenses of bundled libraries; Key: NUTCH-2290; URL: https://issues.apache.org/jira/browse/NUTCH-2290; Project: Nutch; Issue Type: Bug

[jira] [Resolved] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2299. Resolution: Fixed

[jira] [Assigned] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2299: Assignee: Sebastian Nagel

[jira] [Work started] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2299 started by Sebastian Nagel.

[jira] [Commented] (NUTCH-2355) Protocol plugins to set cookie if Cookie metadata field is present

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848277#comment-15848277 ] Sebastian Nagel commented on NUTCH-2355: Hi Markus, useful for sure, e.g., if a server uses

[jira] [Resolved] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2345. Resolution: Duplicate. Thanks, [~Mgupta]! The fix is included in NUTCH-2352.

[jira] [Resolved] (NUTCH-2347) Use Logger Instead of Printing Throwable

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2347. Resolution: Fixed. Merged into 2.x, thanks [~kamaci]!

[jira] [Resolved] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-02-01 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2349. Resolution: Fixed; Assignee: Sebastian Nagel. Committed to 1.x and 2.x.

[jira] [Commented] (NUTCH-2346) Check Types at Object Equality

2017-01-26 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840129#comment-15840129 ] Sebastian Nagel commented on NUTCH-2346: Hi Lewis, no problem. That's a subtle issue, looks

[jira] [Comment Edited] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837625#comment-15837625 ] Sebastian Nagel edited comment on NUTCH-2346 at 1/25/17 12:12 PM: Hi,

[jira] [Reopened] (NUTCH-2346) Check Types at Object Equality

2017-01-25 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reopened NUTCH-2346: Hi, o.a.n.protocol.TestContent now fails in line 50 {code}

[jira] [Updated] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2357: Flags: Patch; Patch Info: Patch Available

[jira] [Commented] (NUTCH-2357) Index metadata throw Exception because writable object cannot be cast to Text

2017-02-09 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15859272#comment-15859272 ] Sebastian Nagel commented on NUTCH-2357: Thanks! See also [this discussion on the user mailing

[jira] [Commented] (NUTCH-2350) Add Missing activeConfId Field to NutchStatus Object

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826198#comment-15826198 ] Sebastian Nagel commented on NUTCH-2350: +1 but isn't this just part of NUTCH-2344 (or a subtask),

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: Affects Version/s: 2.4

[jira] [Updated] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2349: Fix Version/s: 2.4

[jira] [Commented] (NUTCH-2349) urlnormalizer-basic NPE for ill-formed URL "http:/"

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826958#comment-15826958 ] Sebastian Nagel commented on NUTCH-2349: See also

[jira] [Commented] (NUTCH-2345) FetchItemQueue logs are logged with wrong class name

2017-01-17 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826084#comment-15826084 ] Sebastian Nagel commented on NUTCH-2345: Hi Lewis, yes, that looks easier to maintain, but better

[jira] [Commented] (NUTCH-2333) Indexer for RabbitMQ

2017-01-18 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828234#comment-15828234 ] Sebastian Nagel commented on NUTCH-2333: +1 looks good, although I haven't tested it. Yes, there

[jira] [Resolved] (NUTCH-2351) Log with Generic Class Name at Nutch 2.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2351. Resolution: Fixed. Committed to 2.x, thanks [~kamaci]!

[jira] [Resolved] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2352. Resolution: Fixed. Committed to 1.x. Thanks, [~kamaci]!

[jira] [Commented] (NUTCH-2352) Log with Generic Class Name at Nutch 1.x

2017-01-19 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830548#comment-15830548 ] Sebastian Nagel commented on NUTCH-2352: +1 lgtm, going to commit...

[jira] [Created] (NUTCH-2300) Fetcher to optionally save robots.txt

2016-08-19 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2300: Summary: Fetcher to optionally save robots.txt; Key: NUTCH-2300; URL: https://issues.apache.org/jira/browse/NUTCH-2300; Project: Nutch; Issue Type:

[jira] [Commented] (NUTCH-2246) Refactor /seed endpoint for backward compatibility

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15431497#comment-15431497 ] Sebastian Nagel commented on NUTCH-2246: Hi [~sujenshah], it's committed, couldn't it be resolved

[jira] [Work started] (NUTCH-2305) generate.min.score doesn't work in 2.x

2016-08-22 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2305 started by Sebastian Nagel.
