[jira] [Assigned] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2533: -- Assignee: Sebastian Nagel > Injector: NullPointerException if seed URL dir contains

[jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2533: --- Summary: Injector: NullPointerException if seed URL dir contains non-file entries (was:

[jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2533: --- Component/s: injector > Injector: NullPointerException if seed URL dir contains non-file

[jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2533: --- Priority: Minor (was: Blocker) > Injector: NullPointerException if seed URL dir contains

[jira] [Commented] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433638#comment-16433638 ] Sebastian Nagel commented on NUTCH-2533: Nutch resp. the Hadoop

[jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2533: --- Fix Version/s: 1.15 2.4 > Injector: NullPointerException if seed URL dir

[jira] [Resolved] (NUTCH-1224) Migrate FreeGenerator to MapReduce API

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1224. Resolution: Fixed Fix Version/s: 1.15 This has been done by NUTCH-2375 for 1.15. >

[jira] [Resolved] (NUTCH-1223) Migrate WebGraph to MapReduce API

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1223. Resolution: Fixed Fix Version/s: 1.15 This has been done by NUTCH-2375 for 1.15. >

[jira] [Commented] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433656#comment-16433656 ] ASF GitHub Bot commented on NUTCH-2533: --- sebastian-nagel opened a new pull request #312: NUTCH-2533

[jira] [Assigned] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel reassigned NUTCH-2566: -- Assignee: Sebastian Nagel > Fix exception log messages > -- >

[jira] [Created] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-2566: -- Summary: Fix exception log messages Key: NUTCH-2566 URL: https://issues.apache.org/jira/browse/NUTCH-2566 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-1226: --- Fix Version/s: 1.15 > Migrate CrawlDbReader to MapReduce API >

[jira] [Resolved] (NUTCH-1226) Migrate CrawlDbReader to MapReduce API

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1226. Resolution: Implemented This has been done by NUTCH-2375. > Migrate CrawlDbReader to

[jira] [Updated] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2533: --- Affects Version/s: 1.14 > Injector: NullPointerException if seed URL dir contains non-file

[jira] [Resolved] (NUTCH-2384) nutch 2.3.1 job not properly interacting with hadoop 2.7.1

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2384. Resolution: Incomplete Fix Version/s: (was: 2.4) Hi [~shubham.gupta], this can

[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433692#comment-16433692 ] ASF GitHub Bot commented on NUTCH-2518: --- sebastian-nagel closed pull request #307: NUTCH-2518

[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433691#comment-16433691 ] ASF GitHub Bot commented on NUTCH-2518: --- sebastian-nagel commented on issue #307: NUTCH-2518

[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433693#comment-16433693 ] ASF GitHub Bot commented on NUTCH-2518: --- sebastian-nagel commented on issue #307: NUTCH-2518

[jira] [Commented] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433698#comment-16433698 ] ASF GitHub Bot commented on NUTCH-2566: --- sebastian-nagel opened a new pull request #314: NUTCH-2566

[jira] [Commented] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433660#comment-16433660 ] ASF GitHub Bot commented on NUTCH-2533: --- sebastian-nagel opened a new pull request #313: NUTCH-2533

[jira] [Commented] (NUTCH-2551) NullPointerException in generator

2018-04-11 Thread Omkar Reddy (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433705#comment-16433705 ] Omkar Reddy commented on NUTCH-2551: [~wastl-nagel], [~HansBrende], [~lewi...@apache.org] please let

[jira] [Resolved] (NUTCH-1219) Upgrade all jobs to new MapReduce API

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-1219. Resolution: Implemented Fix Version/s: 1.15 This has been done by NUTCH-2375 for

[jira] [Commented] (NUTCH-2012) Merge parsechecker and indexchecker

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433812#comment-16433812 ] ASF GitHub Bot commented on NUTCH-2012: --- sebastian-nagel closed pull request #310: NUTCH-2012 Merge

[jira] [Commented] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433870#comment-16433870 ] Hudson commented on NUTCH-2566: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3517 (See

[jira] [Commented] (NUTCH-2554) parserchecker can't fetch some URLs

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433822#comment-16433822 ] Sebastian Nagel commented on NUTCH-2554: Thanks for the review! PR is merged, see NUTCH-2012. >

[jira] [Updated] (NUTCH-2554) parserchecker can't fetch some URLs

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2554: --- Fix Version/s: 1.15 > parserchecker can't fetch some URLs >

[jira] [Resolved] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2566. Resolution: Fixed > Fix exception log messages > -- > >

[jira] [Commented] (NUTCH-2012) Merge parsechecker and indexchecker

2018-04-11 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433869#comment-16433869 ] Hudson commented on NUTCH-2012: --- SUCCESS: Integrated in Jenkins build Nutch-trunk #3517 (See

[jira] [Resolved] (NUTCH-2012) Merge parsechecker and indexchecker

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2012. Resolution: Fixed Assignee: Sebastian Nagel Fix Version/s: 1.15

[jira] [Resolved] (NUTCH-2145) parse/index checker fail to fetch valid percent-encoded URLs

2018-04-11 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel resolved NUTCH-2145. Resolution: Fixed Fix Version/s: 1.15 > parse/index checker fail to fetch valid

[jira] [Commented] (NUTCH-2566) Fix exception log messages

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433824#comment-16433824 ] ASF GitHub Bot commented on NUTCH-2566: --- sebastian-nagel closed pull request #314: NUTCH-2566 Fix

[jira] [Commented] (NUTCH-2552) CrawlDbReader -topN fails

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433789#comment-16433789 ] ASF GitHub Bot commented on NUTCH-2552: --- sebastian-nagel opened a new pull request #315: NUTCH-2552

[jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerard Bouchar updated NUTCH-2567: -- Description: Using nutch with the following configuration, MetaTagsParser writes HTML meta

[jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerard Bouchar updated NUTCH-2567: -- Description: Using nutch with the following configuration, MetaTagsParser writes HTML meta

[jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerard Bouchar updated NUTCH-2567: -- Description: Using nutch with the following configuration, MetaTagsParser writes HTML meta

[jira] [Created] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
Gerard Bouchar created NUTCH-2567: - Summary: parse-metatags writes every meta tags twice Key: NUTCH-2567 URL: https://issues.apache.org/jira/browse/NUTCH-2567 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerard Bouchar updated NUTCH-2567: -- Description: Using nutch with the following configuration, MetaTagsParser writes HTML meta

[jira] [Updated] (NUTCH-2567) parse-metatags writes every meta tags twice

2018-04-11 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerard Bouchar updated NUTCH-2567: -- Description: Using nutch with the following configuration, MetaTagsParser writes HTML meta

[jira] [Commented] (NUTCH-2533) Injector: NullPointerException if seed URL dir contains non-file entries

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434305#comment-16434305 ] ASF GitHub Bot commented on NUTCH-2533: --- lewismc commented on issue #312: NUTCH-2533 Injector:

[jira] [Commented] (NUTCH-2551) NullPointerException in generator

2018-04-11 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434492#comment-16434492 ] Hans Brende commented on NUTCH-2551: [~omkar20895] regarding your "quick fix", no, that shouldn't make

[jira] [Commented] (NUTCH-2551) NullPointerException in generator

2018-04-11 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434531#comment-16434531 ] ASF GitHub Bot commented on NUTCH-2551: --- HansBrende opened a new pull request #316: fix for

[jira] [Commented] (NUTCH-2551) NullPointerException in generator

2018-04-11 Thread Hans Brende (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434473#comment-16434473 ] Hans Brende commented on NUTCH-2551: [~omkar20895] I only got the error when running on a full-blown