[jira] Created: (NUTCH-296) Image Search

2006-06-03 Thread Thomas Delnoij (JIRA)
Image Search Key: NUTCH-296 URL: http://issues.apache.org/jira/browse/NUTCH-296 Project: Nutch Type: New Feature Reporter: Thomas Delnoij Priority: Minor Per the discussion in the Nutch-User mailing list, there is a wish for an Image Search

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414598 ] Chris A. Mattmann commented on NUTCH-258: Hi there, I believe that the fetcher halting on a LOG.Severe is the intended behavior of the system. The use of this

[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12414599 ] Chris A. Mattmann commented on NUTCH-236: Hi Jason, I'll have a patch prepared for this issue shortly, and I'll attach it to JIRA by this Sunday night. Thanks,

[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann updated NUTCH-236: Due Date: 05/Jun/06 PdfParser and RSSParser Log4j appender redirection Key: NUTCH-236

[jira] Updated: (NUTCH-187) Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ] Chris A. Mattmann updated NUTCH-187: Summary: Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF (was: Run Nutch on Windows without Cygwin) Update

[jira] Created: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host

2006-06-03 Thread Stefan Groschupf (JIRA)
if a 404 for a robots.txt is returned no page is fetched at all from the host Key: NUTCH-298 URL: http://issues.apache.org/jira/browse/NUTCH-298 Project: Nutch Type: Bug Reporter:

[jira] Updated: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host

2006-06-03 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ] Stefan Groschupf updated NUTCH-298: Attachment: fixNpeRobotRuleSet.patch Fixes the NPE in RobotRuleSet that happens when an empty RuleSet is used. if a 404 for a robots.txt is returned no page is

RobotRuleSet

2006-06-03 Thread Stefan Groschupf
Hi, I just posted a fix for an NPE that occurs when an empty RobotRuleSet is used. The patch contains only a two-line fix, since I learned that this is the best way to get things committed sooner. :) However, I really don't like the RobotRuleSet implementation, since entries are copied between an ArrayList and a

[jira] Created: (NUTCH-299) Bittorrent Parser

2006-06-03 Thread Hasan Diwan (JIRA)
Bittorrent Parser Key: NUTCH-299 URL: http://issues.apache.org/jira/browse/NUTCH-299 Project: Nutch Type: New Feature Reporter: Hasan Diwan Priority: Minor BitTorrent information file parser

[jira] Updated: (NUTCH-299) Bittorrent Parser

2006-06-03 Thread Hasan Diwan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-299?page=all ] Hasan Diwan updated NUTCH-299: Attachment: BitTorrent.jar The Parser plugin code Bittorrent Parser Key: NUTCH-299 URL: