[jira] [Created] (NUTCH-1590) [SECURITY] Frame injection vulnerability in published Javadoc

2013-06-20 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1590: --- Summary: [SECURITY] Frame injection vulnerability in published Javadoc Key: NUTCH-1590 URL: https://issues.apache.org/jira/browse/NUTCH-1590 Project: Nu

[VOTE] Apache Nutch 1.7 Release Candidate

2013-06-20 Thread lewis john mcgibbney
Hi, Please VOTE on the release of the Apache Nutch 1.7 artifacts. As always, we solved a bunch of issues: http://s.apache.org/1zE SVN source tag: http://svn.apache.org/repos/asf/nutch/tags/release-1.7/ Staging repo: https://repository.apache.org/content/repositories/orgapachenutch-044/ Release

[Nutch Wiki] Trivial Update of "Release_HOWTO" by LewisJohnMcgibbney

2013-06-20 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Release_HOWTO" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=21&rev2=22 1. Create a new release in JIRA. If you

[Nutch Wiki] Trivial Update of "Release_HOWTO" by LewisJohnMcgibbney

2013-06-20 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Release_HOWTO" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=20&rev2=21 1. Tag it. {{{svn copy h

[jira] [Commented] (NUTCH-1585) Ensure duplicate tags do not exist in microformat-reltag tag set.

2013-06-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689675#comment-13689675 ] Hudson commented on NUTCH-1585: --- Integrated in Nutch-nutchgora #654 (See [https://builds.ap

[jira] [Resolved] (NUTCH-1475) Index-More Plugin -- A better fall back value for date field

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1475. - Resolution: Fixed Resolving this issue to clear up Jira report. Final parts of po

[jira] [Created] (NUTCH-1589) Port NUTCH-1475 Index-More Plugin -- A better fall back value for date field to 2.x

2013-06-20 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1589: --- Summary: Port NUTCH-1475 Index-More Plugin -- A better fall back value for date field to 2.x Key: NUTCH-1589 URL: https://issues.apache.org/jira/browse/NUTCH-1589

[jira] [Updated] (NUTCH-1475) Index-More Plugin -- A better fall back value for date field

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1475: Fix Version/s: (was: 2.3) > Index-More Plugin -- A better fall back value f

[jira] [Resolved] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1245. - Resolution: Fixed Resolving to clear up Jira for 1.7 release report. Please see N

[jira] [Created] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x

2013-06-20 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1588: --- Summary: Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x Key: NUTCH-1588 URL

[jira] [Updated] (NUTCH-1588) Port NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again to 2.x

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1588: Description: A document gone with 404 after db.fetch.interval.max (90 days) has pas

[jira] [Updated] (NUTCH-1585) Ensure duplicate tags do not exist in microformat-reltag tag set.

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1585: Fix Version/s: (was: 1.8) 1.7 > Ensure duplicate tags do

[jira] [Closed] (NUTCH-1585) Ensure duplicate tags do not exist in microformat-reltag tag set.

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney closed NUTCH-1585. --- > Ensure duplicate tags do not exist in microformat-reltag tag set. > ---

[jira] [Resolved] (NUTCH-1585) Ensure duplicate tags do not exist in microformat-reltag tag set.

2013-06-20 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1585. - Resolution: Fixed Committed @revisions 1495171 and 1495173 in trunk Committed @re

Nutch 2.x - Adding extra job in DbUpdateJob

2013-06-20 Thread Ahmet Emre Aladağ
Hi, My problem: integrating Giraph LinkRank implementation in Nutch 2.x DbUpdate stage. For Nutch 2.x, in DbUpdateJob, DbUpdateMapper and DbUpdateReducer are applied to the WebPage objects. Some scoring filters are activated as well. I want to export this data as nodes: http://www.google.c

[jira] [Commented] (NUTCH-1564) AdaptiveFetchSchedule: sync_delta forces immediate refetch for documents not modified

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689067#comment-13689067 ] Markus Jelsma commented on NUTCH-1564: -- Yes, we've noticed this as well and disabled

[jira] [Updated] (NUTCH-1342) Read time out protocol-http

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1342: - Priority: Major (was: Critical) > Read time out protocol-http > ---

[jira] [Updated] (NUTCH-1475) Index-More Plugin -- A better fall back value for date field

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1475: - Fix Version/s: (was: 1.8) 1.7 > Index-More Plugin -- A better fall bac

[jira] [Updated] (NUTCH-1583) Headings does not support multiValued headings

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1583: - Affects Version/s: (was: 1.7) 1.6 Fix Version/s: (was: 1.8)

[jira] [Resolved] (NUTCH-1583) Headings does not support multiValued headings

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1583. -- Resolution: Fixed Closed. There is no patch for 2.x because the plugin isn't there. Anyone can

[jira] [Updated] (NUTCH-1245) URL gone with 404 after db.fetch.interval.max stays db_unfetched in CrawlDb and is generated over and over again

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1245: - Fix Version/s: (was: 1.8) 1.7 > URL gone with 404 after db.fetch.inter

[jira] [Commented] (NUTCH-1586) Non-db_success records should have interval.max

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689050#comment-13689050 ] Markus Jelsma commented on NUTCH-1586: -- Btw: if using an adaptive fetch schedule you

[jira] [Updated] (NUTCH-1586) Non-db_success records should have interval.max

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1586: - Attachment: NUTCH-1586.patch Patch for trunk. This introduces two new configuration directives (

[jira] [Commented] (NUTCH-1583) Headings does not support multiValued headings

2013-06-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689038#comment-13689038 ] Markus Jelsma commented on NUTCH-1583: -- Committed for trunk in rev. 1494894.

[jira] [Created] (NUTCH-1587) misspelled property "threshold" in conf/log4j.properties

2013-06-20 Thread Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1587: -- Summary: misspelled property "threshold" in conf/log4j.properties Key: NUTCH-1587 URL: https://issues.apache.org/jira/browse/NUTCH-1587 Project: Nutch Is