[jira] [Resolved] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1024. -- Resolution: Fixed Committed for 1.6 in rev. 1349226. Thanks! Dynamically set

[jira] [Resolved] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1356. -- Resolution: Fixed Committed for 1.6 in rev. 1349230. Thanks Ferdy. ParseUtil
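NUTCH-1356 replaces hand-rolled thread management in ParseUtil with an ExecutorService. The archived notification does not show the patch, so what follows is only a minimal sketch of the general pattern, assuming Java 8+: submit the parse as a task, wait for it with Future.get(timeout), and cancel the task if the time limit expires. The class TimedParseSketch and its method parseWithTimeout are hypothetical illustrations, not Nutch's actual ParseUtil code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of running parse work on an ExecutorService with a time limit.
public class TimedParseSketch {

  // Daemon worker threads, so a stuck parse task cannot keep the JVM alive at exit.
  private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r, "parse-worker");
    t.setDaemon(true);
    return t;
  });

  /**
   * Runs a parse task with a time limit. Returns the task's result, or null if
   * the task threw or did not finish in time. On timeout the task is cancelled,
   * which interrupts its worker thread.
   */
  public static <T> T parseWithTimeout(Callable<T> parseTask, long timeoutMillis) {
    Future<T> future = EXECUTOR.submit(parseTask);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // sends an interrupt; only effective if the task honors it
      return null;
    } catch (ExecutionException e) {
      return null; // the parse task itself threw an exception
    } catch (InterruptedException e) {
      future.cancel(true);
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
      return null;
    }
  }

  public static void main(String[] args) {
    String fast = parseWithTimeout(() -> "parsed", 1000);
    String slow = parseWithTimeout(() -> { Thread.sleep(5000); return "late"; }, 100);
    System.out.println(fast + " " + slow); // parsed null
  }
}
```

Note that cancel(true) only sends an interrupt: a parser that never checks its interrupt status keeps running in the background, which is the concern raised by NUTCH-1387 further down this list.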

[jira] [Resolved] (NUTCH-1386) Headings filter not to add empty values

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1386. -- Resolution: Fixed Committed for 1.6 in rev. 1349233. Headings filter not to

[jira] [Created] (NUTCH-1386) Headings filter not to add empty values

2012-06-12 Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1386: Summary: Headings filter not to add empty values Key: NUTCH-1386 URL: https://issues.apache.org/jira/browse/NUTCH-1386 Project: Nutch Issue Type: Bug

[jira] [Resolved] (NUTCH-1319) HostNormalizer

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1319. -- Resolution: Fixed Committed for 1.6 in rev. 1349236. HostNormalizer

[jira] [Resolved] (NUTCH-1330) OutlinkDB to preserve back up

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1330. -- Resolution: Fixed Committed for 1.6 in rev. 1349240. OutlinkDB to preserve

[jira] [Commented] (NUTCH-1330) OutlinkDB to preserve back up

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293502#comment-13293502 ] Markus Jelsma commented on NUTCH-1330: -- Thanks Lewis! OutlinkDB to

[Nutch Wiki] Update of bin/nutch solrindex by MarkusJelsma

2012-06-12 Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The bin/nutch solrindex page has been changed by MarkusJelsma: http://wiki.apache.org/nutch/bin/nutch%20solrindex?action=diff&rev1=4&rev2=5 Usage: {{{ - bin/nutch solrindex solr url

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293510#comment-13293510 ] Ferdy Galema commented on NUTCH-1356: - Thanks. The parser threads you refer to, is

[jira] [Created] (NUTCH-1387) All parsers should respond to cancellation.

2012-06-12 Ferdy Galema (JIRA)
Ferdy Galema created NUTCH-1387: --- Summary: All parsers should respond to cancellation. Key: NUTCH-1387 URL: https://issues.apache.org/jira/browse/NUTCH-1387 Project: Nutch Issue Type: Bug

[jira] [Updated] (NUTCH-1387) All parsers should respond to cancellation / interrupts.

2012-06-12 Ferdy Galema (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdy Galema updated NUTCH-1387: Component/s: parser Summary: All parsers should respond to cancellation / interrupts.
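NUTCH-1387 asks that all parsers respond to cancellation and interrupts. As a hedged illustration (not code from the issue or from any Nutch parser), the sketch below shows what "responding to interrupts" means for a CPU-bound parser: it polls Thread.currentThread().isInterrupted() between units of work and throws InterruptedException, so calling cancel(true) on its Future actually stops it. The names InterruptibleParserSketch, countTokens and cancelStopsParser are hypothetical; Java 8+ is assumed.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a parser that honors thread interrupts.
public class InterruptibleParserSketch {

  /**
   * Hypothetical CPU-bound "parser": counts whitespace-separated tokens,
   * scanning the content `passes` times. It polls the interrupt flag between
   * units of work so that Future.cancel(true) can actually stop it.
   */
  public static long countTokens(String content, int passes) throws InterruptedException {
    long tokens = 0;
    for (int pass = 0; pass < passes; pass++) {
      boolean inToken = false;
      for (int i = 0; i < content.length(); i++) {
        // Poll every 4096 characters rather than on every character.
        if ((i & 0xFFF) == 0 && Thread.currentThread().isInterrupted()) {
          throw new InterruptedException("parse cancelled");
        }
        boolean whitespace = Character.isWhitespace(content.charAt(i));
        if (!whitespace && !inToken) {
          tokens++;
        }
        inToken = !whitespace;
      }
    }
    return tokens;
  }

  /** Returns true if a runaway parse stops promptly once it is cancelled. */
  public static boolean cancelStopsParser() {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1000; i++) {
      sb.append("lorem ipsum dolor sit amet ");
    }
    final String doc = sb.toString();
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // Integer.MAX_VALUE passes: effectively never finishes on its own.
    Future<Long> future = executor.submit(() -> countTokens(doc, Integer.MAX_VALUE));
    try {
      future.get(100, TimeUnit.MILLISECONDS);
      return false; // a runaway parse should not complete this quickly
    } catch (TimeoutException e) {
      future.cancel(true); // interrupts the worker thread
    } catch (InterruptedException | ExecutionException e) {
      return false;
    } finally {
      executor.shutdownNow(); // accept no new tasks; also interrupts any worker still running
    }
    try {
      // The pool terminates only if the parser actually noticed the interrupt.
      return executor.awaitTermination(2, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(countTokens("a bb  ccc\n d", 1)); // 4
    System.out.println(cancelStopsParser());
  }
}
```

Polling every few thousand characters keeps the check cheap while still bounding how long a cancelled parse can keep running; a parser without such a check would leave the worker thread busy after a timeout.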

[jira] [Resolved] (NUTCH-1300) Indexer to normalize URL's

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1300. -- Resolution: Fixed Committed for 1.6 in rev. 1349262. The -filter and -normalize options are

[jira] [Resolved] (NUTCH-1318) Parse time outs crash parsing fetcher

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1318. -- Resolution: Duplicate Closing issue in favor of NUTCH-1387. Parse time outs

[jira] [Commented] (NUTCH-1330) OutlinkDB to preserve back up

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293545#comment-13293545 ] Hudson commented on NUTCH-1330: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1024) Dynamically set fetchInterval by MIME-type

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293543#comment-13293543 ] Hudson commented on NUTCH-1024: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1352) Improve regex urlfilters/normalizers synchronization

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293546#comment-13293546 ] Hudson commented on NUTCH-1352: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1300) Indexer to normalize URL's

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293544#comment-13293544 ] Hudson commented on NUTCH-1300: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1386) Headings filter not to add empty values

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293547#comment-13293547 ] Hudson commented on NUTCH-1386: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1356) ParseUtil use ExecutorService instead of manually thread handling.

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293548#comment-13293548 ] Hudson commented on NUTCH-1356: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Commented] (NUTCH-1319) HostNormalizer

2012-06-12 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293549#comment-13293549 ] Hudson commented on NUTCH-1319: --- Integrated in nutch-trunk-maven #310 (See

[jira] [Created] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-06-12 Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1388: Summary: Optionally maintain custom fetch interval despite AdaptiveFetchSchedule Key: NUTCH-1388 URL: https://issues.apache.org/jira/browse/NUTCH-1388 Project: Nutch

[jira] [Updated] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-06-12 Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1388: - Attachment: NUTCH-1388-1.6-1.patch Patch for 1.6 correcting the key in Nutch.java for this patch

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Lewis John Mcgibbney
Hi Everyone, I appreciate that most of the core devs are using trunk; however, I would appeal to you guys to at least check out the artifacts and check sigs, tests and license headers if possible. Although this does not fully satisfy the requirements of a thoroughly reviewed RC, hopefully the

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Mattmann, Chris A (388J)
Hey Lewis, I will get to this tonight, for sure. Thanks! Cheers, Chris On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote: Hi Everyone, I appreciate that most of the core devs are using trunk; however, I would appeal to you guys to at least check out the artifacts and check sigs,

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Lewis John Mcgibbney
Thank you On Tue, Jun 12, 2012 at 9:19 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Lewis, I will get to this tonight, for sure. Thanks! Cheers, Chris On Jun 12, 2012, at 1:16 PM, Lewis John Mcgibbney wrote: Hi Everyone, I appreciate that most of the core

[jira] [Created] (NUTCH-1389) parsechecker and indexchecker to report truncated content

2012-06-12 Sebastian Nagel (JIRA)
Sebastian Nagel created NUTCH-1389: -- Summary: parsechecker and indexchecker to report truncated content Key: NUTCH-1389 URL: https://issues.apache.org/jira/browse/NUTCH-1389 Project: Nutch

[jira] [Updated] (NUTCH-1389) parsechecker and indexchecker to report truncated content

2012-06-12 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1389: Fix Version/s: 2.1 1.6 Aye, this is a nice ticket, Sebastian.

Re: bin/nutch -core

2012-06-12 Lewis John Mcgibbney
Hi Sebastian, Looking at the patch on NUTCH-843 (what an issue, btw; this has got to be one of the most fundamentally important core restructuring exercises Nutch has experienced), I see that the -core option seems to have been removed altogether, with the replacement being to run in local mode

Nutch and IPv6

2012-06-12 Lewis John Mcgibbney
Hi Guys, Can anyone please provide insight into what effect the recent rollout of IPv6 is having, or will have, on Nutch? I have done minimal reading on this, but it has been on my mind for some time; I honestly don't know the answer. Best Lewis -- Lewis

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Sebastian Nagel
Hi Lewis, my first steps with 2.0 (to be continued, still struggling). Two points (I'll try to give a final vote tomorrow): 1. Some guidance would be nice. README.txt points to http://wiki.apache.org/nutch/NutchTutorial, which refers to 1.x (I'm using

Re: VOTE Apache Nutch 2.0 RC1

2012-06-12 Mattmann, Chris A (388J)
Hey Guys, #2 is probably reason enough for a respin. Lewis, if you don't have time to do it before Thursday, I could probably give it a whack. Let me know. Cheers, Chris On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote: Hi Lewis, my first steps with 2.0 (to be continued, still