Interest in static source code analysis with sonarcloud.io

2019-12-12 Thread lewis john mcgibbney
Hi dev@, I posted on this topic previously but cannot find the thread. Well, it turns out that we have made a slight bit of progress. See https://issues.apache.org/jira/browse/INFRA-19474 for context. Is anyone else registered in sonarcloud.io? If so, can you please update INFRA-19474 as follows

[jira] [Assigned] (NUTCH-1863) Add JSON format dump output to readdb command

2019-12-03 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1863: --- Assignee: Shashanka Balakuntala Srinivasa > Add JSON format dump out

[jira] [Commented] (NUTCH-1863) Add JSON format dump output to readdb command

2019-12-03 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987208#comment-16987208 ] Lewis John McGibbney commented on NUTCH-1863: - +1, please go ahead [~balaShashanka] >

[jira] [Updated] (NUTCH-1863) Add JSON format dump output to readdb command

2019-12-03 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1863: Description: Opening up the ability for third parties to consume Nutch crawldb

Static source code analysis via sonarcloud.io

2019-11-08 Thread lewis john mcgibbney
Hi dev@, Quick heads up, I am working on sonarcloud.io analysis for Nutch master branch. The reason is that I did this previously whilst we hosted SonarQube internally at Apache... but didn't really do anything about it. This is a renewed attempt to study the improvements which can be made on

[jira] [Comment Edited] (NUTCH-2677) Update Jest client in indexer-elastic-rest plugin

2019-10-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963221#comment-16963221 ] Lewis John McGibbney edited comment on NUTCH-2677 at 10/30/19 4:58 PM

[jira] [Commented] (NUTCH-2677) Update Jest client in indexer-elastic-rest plugin

2019-10-30 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963221#comment-16963221 ] Lewis John McGibbney commented on NUTCH-2677: - [~balaShashanka] bq. can i work on this issue

[SECURITY] Nutch 2.3.1 affected by downstream dependency CVE-2016-6809

2019-10-14 Thread lewis john mcgibbney
Title: Nutch 2.3.1 affected by downstream dependency CVE-2016-6809 Vulnerable Versions: 2.3.1 (1.16 is not vulnerable) Disclosure date: 2018-10-22 Credit: Pierre Ernst, Salesforce Summary: Remote Code Execution in Apache Nutch 2.3.1 when crawling web site containing malicious content

[jira] [Work stopped] (NUTCH-2307) Implement Missing NutchServer REST API Tests

2019-10-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2307 stopped by Lewis John McGibbney. --- > Implement Missing NutchServer REST API Te

[jira] [Assigned] (NUTCH-2307) Implement Missing NutchServer REST API Tests

2019-10-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2307: --- Assignee: Lewis John McGibbney > Implement Missing NutchServer REST

[jira] [Work stopped] (NUTCH-1709) Generated classes o.a.n.storage.Host and o.a.n.storage.ProtocolStatus contain methods not defined in source .avsc

2019-10-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1709 stopped by Lewis John McGibbney. --- > Generated classes o.a.n.storage.H

[jira] [Updated] (NUTCH-2722) Fetch dependencies via https

2019-10-13 Thread Lewis John McGibbney (Jira)
[ https://issues.apache.org/jira/browse/NUTCH-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2722: Fix Version/s: (was: 2.5) > Fetch dependencies via ht

Re: [VOTE] Release Apache Nutch 1.16 RC#1

2019-10-03 Thread Lewis John McGibbney
Hi Seb, Sigs check out fine gpg --verify apache-nutch-1.16-src.tar.gz.asc apache-nutch-1.16-src.tar.gz gpg: Signature made Wed Oct 2 08:07:47 2019 PDT gpg: using RSA key FF82A487F92D70E52FF77E0AC66EA7B7DB0A9C6D gpg: Good signature from "Sebastian Nagel" [unknown] gpg: WARNING:

[jira] [Comment Edited] (NUTCH-2669) Reliable solution for javax.ws packaging.type

2019-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792267#comment-16792267 ] Lewis John McGibbney edited comment on NUTCH-2669 at 3/14/19 2:32 AM

[jira] [Updated] (NUTCH-2669) Reliable solution for javax.ws packaging.type

2019-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2669: Priority: Blocker (was: Major) > Reliable solution for javax.ws packaging.t

[jira] [Updated] (NUTCH-2669) Reliable solution for javax.ws packaging.type

2019-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2669: Fix Version/s: (was: 2.5) 2.4 > Reliable solut

[jira] [Commented] (NUTCH-2669) Reliable solution for javax.ws packaging.type

2019-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792267#comment-16792267 ] Lewis John McGibbney commented on NUTCH-2669: - [~wastl-nagel] has become a major pain whilst

[jira] [Commented] (NUTCH-2498) Docker files are outdated

2019-03-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16788812#comment-16788812 ] Lewis John McGibbney commented on NUTCH-2498: - [~dhirajforyou] thank you for reporting

Mavenize Nutch Build as Google Summer of Code

2019-03-09 Thread lewis john mcgibbney
Hi user@ and dev@, If you are a student and would like to tackle the task of Mavenizing the Nutch master build please get in touch with me here, directly or comment on the following issue https://issues.apache.org/jira/plugins/servlet/mobile#issue/NUTCH-2292 Thank you Lewis --

[jira] [Resolved] (NUTCH-2698) Remove sonar build task from build.xml

2019-03-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2698. - Resolution: Fixed Thanks [~wastl-nagel] for review. > Remove sonar build t

[jira] [Updated] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2019-03-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2292: Labels: gsoc2019 (was: ) > Mavenize the build for nutch-core and nutch-plug

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2019-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782482#comment-16782482 ] Lewis John McGibbney commented on NUTCH-2292: - Hi [~wastl-nagel] long story short... we need

[jira] [Created] (NUTCH-2698) Remove sonar build task from build.xml

2019-03-02 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2698: --- Summary: Remove sonar build task from build.xml Key: NUTCH-2698 URL: https://issues.apache.org/jira/browse/NUTCH-2698 Project: Nutch Issue

[jira] [Commented] (NUTCH-2697) Upgrade Ivy to fix the issue of an unset packaging.type property.

2019-03-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782477#comment-16782477 ] Lewis John McGibbney commented on NUTCH-2697: - Apologies folks, thank you [~wastl-nagel

[jira] [Commented] (NUTCH-2679) "ant eclipse" failed as eclipse binary is moved

2018-12-12 Thread lewis john mcgibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718828#comment-16718828 ] lewis john mcgibbney commented on NUTCH-2679: - Can we use https://search.maven.org/artifact

[jira] [Commented] (NUTCH-2292) Mavenize the build for nutch-core and nutch-plugins

2018-11-30 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704793#comment-16704793 ] Lewis John McGibbney commented on NUTCH-2292: - I'm going to take this work on from where

Maven vs Gradle for Nutch Build System

2018-11-29 Thread lewis john mcgibbney
Hi Folks, Seb and I were talking build systems this week. I wanted to get a feel for what we as a PMC would rather use for the next Nutch build lifecycle. Personally I've used Maven for many of my Java projects however I have also really enjoyed working with Gradle. I would like to start working on

[jira] [Created] (NUTCH-2677) Update Jest client in indexer-elastic-rest plugin

2018-11-28 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2677: --- Summary: Update Jest client in indexer-elastic-rest plugin Key: NUTCH-2677 URL: https://issues.apache.org/jira/browse/NUTCH-2677 Project: Nutch

[jira] [Updated] (NUTCH-2667) Update Tika and Commons Collections 4

2018-10-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2667: Description: Tika and Commons Collections 4 need to be updated. This issue needs

[jira] [Created] (NUTCH-2667) Update Tika and Commons Collections 4

2018-10-23 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2667: --- Summary: Update Tika and Commons Collections 4 Key: NUTCH-2667 URL: https://issues.apache.org/jira/browse/NUTCH-2667 Project: Nutch Issue Type

[jira] [Resolved] (NUTCH-2199) Documentation for Nutch 2.X REST API

2018-10-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2199. - Resolution: Fixed > Documentation for Nutch 2.X REST

[jira] [Commented] (NUTCH-1861) Implement POP3 Protocol

2018-08-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594347#comment-16594347 ] Lewis John McGibbney commented on NUTCH-1861: - [~yossi] the existing JavaMail license

[jira] [Commented] (NUTCH-1861) Implement POP3 Protocol

2018-08-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594141#comment-16594141 ] Lewis John McGibbney commented on NUTCH-1861: - Hi [~yossi] thanks for response bq. Isn't

[jira] [Commented] (NUTCH-1861) Implement POP3 Protocol

2018-08-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16593875#comment-16593875 ] Lewis John McGibbney commented on NUTCH-1861: - Hi Folks, using commons-net I was thinking

[jira] [Assigned] (NUTCH-1861) Implement POP3 Protocol

2018-08-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1861: --- Assignee: Lewis John McGibbney > Implement POP3 Proto

[jira] [Resolved] (NUTCH-2633) Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13

2018-08-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2633. - Resolution: Fixed We can address Ivy issues in a separate patch. >

[jira] [Created] (NUTCH-2633) Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13

2018-08-09 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2633: --- Summary: Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13 Key: NUTCH-2633 URL: https://issues.apache.org/jira/browse/NUTCH-2633

[jira] [Resolved] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-08-01 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-. - Resolution: Fixed Thank you [~alaffet] and everyone else for attempting to fix

[jira] [Commented] (NUTCH-2222) re-fetch deletes all metadata except _csh_ and _rs_

2018-07-31 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564745#comment-16564745 ] Lewis John McGibbney commented on NUTCH-: - [~alaffet] thank you, can you please provide

[jira] [Commented] (NUTCH-2512) Nutch 1.14 does not work under JDK9

2018-05-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484563#comment-16484563 ] Lewis John McGibbney commented on NUTCH-2512: - See my comment above... > Nutch 1.14 d

[jira] [Updated] (NUTCH-2539) Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml

2018-04-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2539: Fix Version/s: 1.15 > Not correct naming of db.url.filters and db.url.normaliz

[jira] [Resolved] (NUTCH-2539) Not correct naming of db.url.filters and db.url.normalizers in nutch-default.xml

2018-04-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2539. - Resolution: Fixed > Not correct naming of db.url.filters and db.url.normaliz

[jira] [Resolved] (NUTCH-2550) Fetcher fails to follow redirects

2018-04-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2550. - Resolution: Fixed > Fetcher fails to follow redire

[jira] [Closed] (NUTCH-2545) Upgrade to Any23 2.2

2018-04-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-2545. --- > Upgrade to Any23 2.2 > > > Key

[jira] [Resolved] (NUTCH-2545) Upgrade to Any23 2.2

2018-04-02 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2545. - Resolution: Fixed > Upgrade to Any23

[jira] [Resolved] (NUTCH-2536) GeneratorReducer.count is a static variable

2018-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2536. - Resolution: Fixed > GeneratorReducer.count is a static varia

[jira] [Created] (NUTCH-2545) Upgrade to Any23 2.2

2018-03-27 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2545: --- Summary: Upgrade to Any23 2.2 Key: NUTCH-2545 URL: https://issues.apache.org/jira/browse/NUTCH-2545 Project: Nutch Issue Type: Improvement

[jira] [Work stopped] (NUTCH-2516) Hadoop imports use wildcards

2018-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2516 stopped by Lewis John McGibbney. --- > Hadoop imports use wildca

[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

2018-03-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415429#comment-16415429 ] Lewis John McGibbney commented on NUTCH-2518: - Yes please do [~omkar20895] Thank you > M

[jira] [Work started] (NUTCH-2516) Hadoop imports use wildcards

2018-03-14 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-2516 started by Lewis John McGibbney. --- > Hadoop imports use wildca

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398025#comment-16398025 ] Lewis John McGibbney commented on NUTCH-2517: - Correct [~wastl-nagel] > mergesegs corru

[jira] [Resolved] (NUTCH-2427) Remove all the Hadoop wildcard imports.

2018-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2427. - Resolution: Duplicate > Remove all the Hadoop wildcard impo

Re: Upgrade to Hadoop 3

2018-03-13 Thread Lewis John McGibbney
Hi Seb, On 2018/03/12 11:00:52, Sebastian Nagel wrote: > Hi, > > > seeing as we have just merged in the 'new' MR patch > > yep, but there's still something to do (NUTCH-2517, ACK, this needs more testing. > NUTCH-2518). I honestly didn't see this come through

[jira] [Commented] (NUTCH-2518) Must check return value of job.waitForCompletion()

2018-03-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397808#comment-16397808 ] Lewis John McGibbney commented on NUTCH-2518: - Hi [~wastl-nagel] I think we just overwrote

Re: Upgrade to Hadoop 3

2018-03-13 Thread Lewis John McGibbney
Hi RRK, Response inline On 2018/03/08 01:46:18, BlackIce wrote: > > Why do you say "Is it too early"? Could you please elaborate on this, thnx. > What I mean is that maybe a lot of people have not upgraded existing infrastructure to Hadoop 3 yet. People don't usually

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391440#comment-16391440 ] Lewis John McGibbney commented on NUTCH-2517: - Can anyone else confirm the above

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391430#comment-16391430 ] Lewis John McGibbney commented on NUTCH-2517: - Hi [~mebbinghaus] I ran it from the Docker

Upgrade to Hadoop 3

2018-03-07 Thread lewis john mcgibbney
Hi Folks, Before we get started with GSoC again, and seeing as we have just merged in the 'new' MR patch, I wonder if folks are partial to migration to Hadoop 3? Is it too early? Comments? Lewis -- http://home.apache.org/~lewismc/ http://people.apache.org/keys/committer/lewismc

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388718#comment-16388718 ] Lewis John McGibbney commented on NUTCH-2517: - Should be noted that I didn't run this from

[jira] [Assigned] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2517: --- Assignee: Lewis John McGibbney > mergesegs corrupts segment d

[jira] [Comment Edited] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388650#comment-16388650 ] Lewis John McGibbney edited comment on NUTCH-2517 at 3/6/18 11:09 PM

[jira] [Comment Edited] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388650#comment-16388650 ] Lewis John McGibbney edited comment on NUTCH-2517 at 3/6/18 10:50 PM

[jira] [Comment Edited] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388650#comment-16388650 ] Lewis John McGibbney edited comment on NUTCH-2517 at 3/6/18 10:49 PM

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388650#comment-16388650 ] Lewis John McGibbney commented on NUTCH-2517: - I cannot reproduce this... see below for tests

[jira] [Commented] (NUTCH-2517) mergesegs corrupts segment data

2018-03-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386469#comment-16386469 ] Lewis John McGibbney commented on NUTCH-2517: - Thank you [~mebbinghaus] for reporting

[jira] [Updated] (NUTCH-2517) mergesegs corrupts segment data

2018-03-05 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2517: Priority: Blocker (was: Major) > mergesegs corrupts segment d

[jira] [Updated] (NUTCH-2517) mergesegs corrupts segment data

2018-03-04 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2517: Fix Version/s: 1.15 > mergesegs corrupts segment d

Re: Apache Nutch - Exception since last commit

2018-03-03 Thread lewis john mcgibbney
Hello Marco, Thank you very much for the information. Please register the issue on jira. I will personally look into it and make best efforts to fix the bug if one exists. Please provide details as to how I can reproduce. Thanks, Lewis On Sat, Mar 3, 2018 at 09:14 Marco Ebbinghaus

[jira] [Updated] (NUTCH-2516) Hadoop imports use wildcards

2018-02-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2516: Description: Right now the Hadoop imports use wildcards all over the place. We

[jira] [Created] (NUTCH-2516) Hadoop imports use wildcards

2018-02-27 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-2516: --- Summary: Hadoop imports use wildcards Key: NUTCH-2516 URL: https://issues.apache.org/jira/browse/NUTCH-2516 Project: Nutch Issue Type

[jira] [Assigned] (NUTCH-2375) Upgrade the code base from org.apache.hadoop.mapred to org.apache.hadoop.mapreduce

2018-02-27 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-2375: --- Assignee: Lewis John McGibbney > Upgrade the code base f

[jira] [Commented] (NUTCH-2512) Nutch 1.14 does not work under JDK9

2018-02-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373173#comment-16373173 ] Lewis John McGibbney commented on NUTCH-2512: - Hi [~Bl4ck1c3] thanks for logging the issue

[jira] [Updated] (NUTCH-2512) Nutch 1.14 does not work under JDK9

2018-02-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2512: Fix Version/s: 1.15 > Nutch 1.14 does not work under J

[jira] [Resolved] (NUTCH-2489) Dependency collision with lucene-analyzers-common in scoring-similarity plugin

2018-02-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2489. - Resolution: Fixed Thank you [~yossi] > Dependency collision with luc

[jira] [Updated] (NUTCH-2489) Dependency collision with lucene-analyzers-common in scoring-similarity plugin

2018-02-07 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2489: Fix Version/s: 1.15 > Dependency collision with lucene-analyzers-common in scor

[jira] [Resolved] (NUTCH-2508) Misleading documentation about http.proxy.exception.list

2018-01-31 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2508. - Resolution: Fixed Thank you [~mfeltscher] > Misleading documentation ab

[jira] [Updated] (NUTCH-2508) Misleading documentation about http.proxy.exception.list

2018-01-31 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2508: Fix Version/s: 1.15 > Misleading documentation about http.proxy.exception.l

[jira] [Commented] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2018-01-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341326#comment-16341326 ] Lewis John McGibbney commented on NUTCH-2369: - Hi [~markus17] the idea here was to export full

[jira] [Updated] (NUTCH-2369) Create a new GraphGenerator Tool for writing Nutch Records as a Full Web Graph

2018-01-24 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2369: Labels: gsoc2017 gsoc2018 (was: gsoc2017) > Create a new GraphGenerator T

[jira] [Resolved] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2502. - Resolution: Fixed Thank you [~mfeltscher] > Any23 Plugin: Add Content-T

[jira] [Updated] (NUTCH-2502) Any23 Plugin: Add Content-Type filtering

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2502: Fix Version/s: 1.15 > Any23 Plugin: Add Content-Type filter

[jira] [Updated] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2499: Fix Version/s: 1.15 > Elastic REST Indexer: Duplicate val

[jira] [Resolved] (NUTCH-2499) Elastic REST Indexer: Duplicate values

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2499. - Resolution: Fixed Thank you [~mfeltscher] > Elastic REST Indexer: Duplic

[jira] [Resolved] (NUTCH-2503) Add option to run tests for a single plugin

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2503. - Resolution: Fixed Thank you [~mfeltscher] > Add option to run tests for a sin

[jira] [Updated] (NUTCH-2503) Add option to run tests for a single plugin

2018-01-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2503: Fix Version/s: 1.15 > Add option to run tests for a single plu

[jira] [Resolved] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts

2018-01-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2497. - Resolution: Fixed Thank you [~mfeltscher] > Elastic REST Indexer: Allow multi

[jira] [Updated] (NUTCH-2497) Elastic REST Indexer: Allow multiple hosts

2018-01-18 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2497: Fix Version/s: 1.15 > Elastic REST Indexer: Allow multiple ho

[jira] [Resolved] (NUTCH-2461) Generate passes the data to when maxCount == 0

2018-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2461. - Resolution: Fixed Thank you [~semyon.semyo...@mail.com] > Generate pas

[jira] [Commented] (NUTCH-2321) Indexing filter checker leaks threads

2018-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16326453#comment-16326453 ] Lewis John McGibbney commented on NUTCH-2321: - Thank you [~jurian] > Indexing filter chec

[jira] [Resolved] (NUTCH-2321) Indexing filter checker leaks threads

2018-01-15 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2321. - Resolution: Fixed > Indexing filter checker leaks thre

[jira] [Updated] (NUTCH-1129) Any23 Nutch plugin

2018-01-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1129: Fix Version/s: (was: 2.5) 1.15 > Any23 Nutch plu

[jira] [Resolved] (NUTCH-1129) Any23 Nutch plugin

2018-01-11 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1129. - Resolution: Fixed Thank you [~mfeltscher] this is great > Any23 Nutch plu

[jira] [Resolved] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script

2018-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2493. - Resolution: Fixed Thank you [~mfeltscher] > Add configuration parame

[jira] [Updated] (NUTCH-2493) Add configuration parameter for sitemap processing to crawler script

2018-01-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2493: Fix Version/s: 1.15 > Add configuration parameter for sitemap processing to craw

[jira] [Resolved] (NUTCH-2324) Issue in setting default linkdb path

2018-01-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2324. - Resolution: Fixed Thank you [~sachin] > Issue in setting default linkdb p

[jira] [Updated] (NUTCH-2324) Issue in setting default linkdb path

2018-01-09 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2324: Fix Version/s: 1.15 > Issue in setting default linkdb p

[jira] [Resolved] (NUTCH-2492) Add more configuration parameters to crawl script

2018-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2492. - Resolution: Fixed Thank you [~mfeltscher] > Add more configuration paramet

[jira] [Updated] (NUTCH-2492) Add more configuration parameters to crawl script

2018-01-08 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2492: Fix Version/s: 1.15 > Add more configuration parameters to crawl scr

[jira] [Updated] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2490: Fix Version/s: 1.15 > Sitemap processing: Sitemap index files not work

[jira] [Resolved] (NUTCH-2490) Sitemap processing: Sitemap index files not working

2018-01-03 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2490. - Resolution: Fixed Thank you [~mfeltscher] > Sitemap processing: Sitemap in
