[jira] Commented: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-09-08 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-88?page=comments#action_12322997 ] Chris A. Mattmann commented on NUTCH-88: I'm currently working on writing a proposal for addressing this issue. The proposal will include the following information: *

[jira] Created: (NUTCH-112) Link in cached.jsp page to cached content is an absolute link

2005-10-15 Thread Chris A. Mattmann (JIRA)
Link in cached.jsp page to cached content is an absolute link - Key: NUTCH-112 URL: http://issues.apache.org/jira/browse/NUTCH-112 Project: Nutch Type: Bug Components: web gui Versions: 0.7.1, 0.7,

[jira] Updated: (NUTCH-112) Link in cached.jsp page to cached content is an absolute link

2005-10-15 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-112?page=all ] Chris A. Mattmann updated NUTCH-112: Attachment: NUTCH-112.Mattmann.patch.txt The small patch to fix NUTCH-112 Link in cached.jsp page to cached content is an absolute link

[jira] Commented: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-10-19 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-88?page=comments#action_12332535 ] Chris A. Mattmann commented on NUTCH-88: * Add a static method in MimeType that parses a content-type string and removes all its parameters (keeping only primary type

[jira] Commented: (NUTCH-88) Enhance ParserFactory plugin selection policy

2005-10-19 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-88?page=comments#action_12332550 ] Chris A. Mattmann commented on NUTCH-88: Hey Doug, Yeah I think you're right on this one. I'll work with Jerome (if he needs any help) to get this fixed ASAP.

[jira] Commented: (NUTCH-133) ParserFactory does not work as expected

2005-12-06 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359502 ] Chris A. Mattmann commented on NUTCH-133: - From an initial quick glance at much of the patch, I see that many of the existing working classes are just rewritten, given

[jira] Commented: (NUTCH-133) ParserFactory does not work as expected

2005-12-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359603 ] Chris A. Mattmann commented on NUTCH-133: - Just another comment on the issue. The reported bug listed as the following: Problem: Actually the configuration of parser

[jira] Commented: (NUTCH-133) ParserFactory does not work as expected

2005-12-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359645 ] Chris A. Mattmann commented on NUTCH-133: - Hi Stefan, Thanks for your reply. I actually like a lot of your proposed changes having to do with the MimeType cleansing

[jira] Commented: (NUTCH-133) ParserFactory does not work as expected

2005-12-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-133?page=comments#action_12359679 ] Chris A. Mattmann commented on NUTCH-133: - Hi Doug, I like this idea for the getContentType method. In general, I completely agree that the server provided content

[jira] Commented: (NUTCH-34) Parsing different content formats

2005-12-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_12360147 ] Chris A. Mattmann commented on NUTCH-34: Hi Folks, Just wondering: is this issue taken care of by NUTCH-88? It would seem at least some elements of it were (i.e., the

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360389 ] Chris A. Mattmann commented on NUTCH-139: - According to Andrzej: I agree, too. Perhaps we should use the names as they appear in the Dublin Core for those properties

[jira] Updated: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-13 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=all ] Chris A. Mattmann updated NUTCH-139: Priority: Minor (was: Major) Standard metadata property names in the ParseData metadata --

[jira] Created: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2005-12-13 Thread Chris A. Mattmann (JIRA)
Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping Key: NUTCH-140 URL: http://issues.apache.org/jira/browse/NUTCH-140 Project: Nutch Type:

[jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2005-12-16 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=comments#action_12360643 ] Chris A. Mattmann commented on NUTCH-140: - Hey Stefan, Mainly, it would be to make them more human readable. Also, if I go in there and define all the aliases for

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-17 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360681 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, Jerome, I'm confused as to why all of the constant names have X_nutch in them. I'd expect to see something like

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360929 ] Chris A. Mattmann commented on NUTCH-139: - Hi Andrzej, I have an objection, in fact I think the patches miss the main point of using prefixed property names.

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2005-12-20 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12360931 ] Chris A. Mattmann commented on NUTCH-139: - Hmm, Okay, I just finished reading the rest of the comments :-) Sorry, just woke up out here in Los Angeles. Okay, I

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361923 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361924 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361925 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361926 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12361927 ] Chris A. Mattmann commented on NUTCH-139: - Hi Doug, While it's true that content-length can be computed from the Content's data, wouldn't it also be nice to have it

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-19 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12363352 ] Chris A. Mattmann commented on NUTCH-139: - Hi Jerome, org.apache.nutch.parse.ParseData * The constructor becomes ParseData(ParseStatus, String, Outlink[],

[jira] Commented: (NUTCH-139) Standard metadata property names in the ParseData metadata

2006-01-26 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-139?page=comments#action_12364116 ] Chris A. Mattmann commented on NUTCH-139: - Just to add to Jerome's last comment, I think the key here is simplicity. As a software developer, and ultimately as an end

[jira] Commented: (NUTCH-190) ParseUtil drops reason for failed parse

2006-01-26 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-190?page=comments#action_12364151 ] Chris A. Mattmann commented on NUTCH-190: - +1 I think that this is a needed patch. ParseUtil drops reason for failed parse ---

[jira] Resolved: (NUTCH-149) outlinks not shown properly in cached.jsp

2006-02-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-149?page=all ] Chris A. Mattmann resolved NUTCH-149: - Resolution: Invalid Closed at request of the reporter: not a bug. outlinks not shown properly in cached.jsp

[jira] Commented: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2006-02-14 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=comments#action_12366376 ] Chris A. Mattmann commented on NUTCH-140: - Hi Folks, I've gone ahead and created an initial patch for this issue. I'll be attaching it to JIRA within the next day

[jira] Updated: (NUTCH-140) Add alias capability in parse-plugins.xml file that allows mimeType-extensionId mapping

2006-02-15 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-140?page=all ] Chris A. Mattmann updated NUTCH-140: Attachment: NUTCH-140.20051502.patch.txt An initial patch for NUTCH-140 for everyone's review. Add alias capability in parse-plugins.xml file that

[jira] Updated: (NUTCH-210) Context.xml file for Nutch web application

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-210?page=all ] Chris A. Mattmann updated NUTCH-210: Attachment: NUTCH-210.Mattmann.patch.txt Initial NUTCH-210 patch. Uses an XSL stylesheet to read searcher., plugin., extension.clustering and

[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12371664 ] Chris A. Mattmann commented on NUTCH-236: - I'd be happy to make these changes and submit a patch, but I wanted to know if the change would be welcome first. I think

[jira] Commented: (NUTCH-220) PDF Box can't parse document: java.lang.NullPointerException

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-220?page=comments#action_12371669 ] Chris A. Mattmann commented on NUTCH-220: - Could you provide some more detail on this issue? For instance, a stack trace here would be quite helpful in trying to debug

[jira] Commented: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=comments#action_12371671 ] Chris A. Mattmann commented on NUTCH-185: - I propose that either this issue be closed and the patch files moved to NUTCH-23, or that NUTCH-23 be closed, as the two are

[jira] Resolved: (NUTCH-34) Parsing different content formats

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=all ] Chris A. Mattmann resolved NUTCH-34: Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed This issue was addressed via the application of NUTCH-88 applied to Nutch

[jira] Closed: (NUTCH-34) Parsing different content formats

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-34?page=all ] Chris A. Mattmann closed NUTCH-34: -- Issue addressed by NUTCH-88. Parsing different content formats - Key: NUTCH-34 URL:

[jira] Closed: (NUTCH-24) Cannot handle incorrectly cased Content-Type

2006-03-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-24?page=all ] Chris A. Mattmann closed NUTCH-24: -- issue addressed by NUTCH-139. Cannot handle incorrectly cased Content-Type Key: NUTCH-24

[jira] Commented: (NUTCH-210) Context.xml file for Nutch web application

2006-03-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-210?page=comments#action_12371849 ] Chris A. Mattmann commented on NUTCH-210: - Hi Jerome, The updates look fine. No objections from my end. I hope people find the patch useful. Cheers, Chris

[jira] Resolved: (NUTCH-23) content text/xml parser

2006-03-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-23?page=all ] Chris A. Mattmann resolved NUTCH-23: Resolution: Duplicate Duplicate of NUTCH-185. content text/xml parser --- Key: NUTCH-23 URL:

[jira] Created: (NUTCH-245) XML Schemas for xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
XML Schemas for xml configuration files in conf directory - Key: NUTCH-245 URL: http://issues.apache.org/jira/browse/NUTCH-245 Project: Nutch Type: New Feature Components: fetcher, indexer, ndfs, searcher,

[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-07 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Summary: DTD Schemas for plugin.xml configuration files in conf directory (was: XML Schemas for xml configuration files in conf directory) DTD

[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Description: Currently, the plugin.xml file does not have a DTD or XML Schema associated with it, and most people just go look at an existing plugin's

[jira] Updated: (NUTCH-245) DTD Schemas for plugin.xml configuration files in conf directory

2006-04-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-245?page=all ] Chris A. Mattmann updated NUTCH-245: Attachment: NUTCH-245.Mattmann.patch.txt Here's the patch for the plugin DTD file. I got a lot of info from:

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12414598 ] Chris A. Mattmann commented on NUTCH-258: - Hi there, I believe that the fetcher halting on a LOG.Severe is the intended behavior of the system. The use of this

[jira] Commented: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=comments#action_12414599 ] Chris A. Mattmann commented on NUTCH-236: - Hi Jason, I'll have a patch prepared for this issue shortly, and I'll attach it to JIRA by this Sunday night. Thanks,

[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann updated NUTCH-236: Due Date: 05/Jun/06 PdfParser and RSSParser Log4j appender redirection -- Key: NUTCH-236

[jira] Updated: (NUTCH-187) Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF

2006-06-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-187?page=all ] Chris A. Mattmann updated NUTCH-187: Summary: Cannot start Nutch datanodes on Windows outside of a cygwin environment because of DF (was: Run Nutch on Windows without Cygwin) Update

[jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann resolved NUTCH-258: - Resolution: Won't Fix The use of LOG.severe in the fetcher indicates an unrecoverable error: thus, this issue is not a bug, and in fact

[jira] Closed: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann closed NUTCH-258: --- Won't fix: issue describes intended behavior of system (fetcher component). Once Nutch logs a SEVERE log item, Nutch fails forevermore

[jira] Reopened: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-05 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann reopened NUTCH-258: - Assign To: Chris A. Mattmann Issue found to in fact be a real issue with the Fetcher: here's the proposed solution: * add flag field

[jira] Updated: (NUTCH-236) PdfParser and RSSParser Log4j appender redirection

2006-06-08 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-236?page=all ] Chris A. Mattmann updated NUTCH-236: Attachment: NUTCH-236.Mattmann.060806.patch.txt Okay a bit late, but as usual with me :-) This patch implements Jason's suggestion for the following

[jira] Created: (NUTCH-304) Change JIRA email address for nutch issues from apache incubator

2006-06-08 Thread Chris A. Mattmann (JIRA)
Change JIRA email address for nutch issues from apache incubator Key: NUTCH-304 URL: http://issues.apache.org/jira/browse/NUTCH-304 Project: Nutch Type: Task Environment: Dell Pentium M mobile 1.4

[jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-09 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Attachment: NUTCH-258.Mattmann.060906.patch.txt Hi Folks, Attached is a patch that implements the suggested two fixes to this issue. I had to go

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-15 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12416379 ] Chris A. Mattmann commented on NUTCH-258: - Thanks for this patch Chris - even if it is now outdated by NUTCH-303 :-( Since Nutch no longer uses the deprecated Hadoop

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-07-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12422962 ] Chris A. Mattmann commented on NUTCH-258: - Guys, This issue slipped off my radar for a bit, but I'll have some free time this week to work on it. If there

[jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-07-25 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Fix Version/s: 0.8-dev Once Nutch logs a SEVERE log item, Nutch fails forevermore --

[jira] Created: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-03 Thread Chris A. Mattmann (JIRA)
Remove the text parser as an option for parsing PDF files in parse-plugins.xml -- Key: NUTCH-338 URL: http://issues.apache.org/jira/browse/NUTCH-338 Project: Nutch

[jira] Updated: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-03 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=all ] Chris A. Mattmann updated NUTCH-338: Attachment: NUTCH-338.Mattmann.patch.txt simple patch for removing the parse-text plugin from being mapped to PDF content type in parse-plugins.xml.

[jira] Updated: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-08-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann updated NUTCH-258: Attachment: NUTCH-258.Mattmann.080406.patch.txt Hi Folks, Sorry I'm a little later than I expected on this one. Attached is a patch that implements

[jira] Commented: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429033 ] Chris A. Mattmann commented on NUTCH-338: - Hi Andrzej, A patch is available that you can apply quickly to remove the text parser as an option for pdf.

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=comments#action_12429035 ] Chris A. Mattmann commented on NUTCH-258: - Hi Folks, A patch is available on this issue. Has anyone who was experiencing the original problem tried out

[jira] Commented: (NUTCH-338) Remove the text parser as an option for parsing PDF files in parse-plugins.xml

2006-08-18 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-338?page=comments#action_12429042 ] Chris A. Mattmann commented on NUTCH-338: - Hi Sami, Thanks much. It's weird that it was broken seeing as it was a one line patch, however, I tried it

[jira] Commented: (NUTCH-356) Plugin repository cache can lead to memory leak

2006-08-21 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-356?page=comments#action_12429548 ] Chris A. Mattmann commented on NUTCH-356: - -1 for closing this issue. If there is a demonstrable memory leak in the plugin system, then I think it should

[jira] Created: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
ParseUtil does not pass through the content's URL to the ParserFactory -- Key: NUTCH-379 URL: http://issues.apache.org/jira/browse/NUTCH-379 Project: Nutch Issue Type: Bug

[jira] Work started: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ] Work on NUTCH-379 started by Chris A. Mattmann. ParseUtil does not pass through the content's URL to the ParserFactory -- Key: NUTCH-379

[jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ] Chris A. Mattmann updated NUTCH-379: Attachment: NUTCH-379.Mattmann.100406.patch.txt Small patch that at least gets started on fixing the larger issue of content urls and parser mapping,

[jira] Updated: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2006-10-11 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-384?page=all ] Chris A. Mattmann updated NUTCH-384: Summary: Protocol-file plugin does not allow the parse plugins framework to operate properly (was: When using the file protocol one can not map a

[jira] Updated: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Chris A. Mattmann updated NUTCH-406: Assignee: Chris A. Mattmann Metadata tries to write null values --- Key: NUTCH-406

[jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452275 ] Chris A. Mattmann commented on NUTCH-406: - Hi Andrzej, Doğacan, +1. I think it makes a lot of sense to just not include the null key in the Met

[jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452285 ] Chris A. Mattmann commented on NUTCH-406: - Hi Doğacan, Looking at your latest patch, I'm not sure that it completely does the right behavior. For

[jira] Commented: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=comments#action_12452286 ] Chris A. Mattmann commented on NUTCH-406: - Hi Andrzej, Yup, you caught the same thing as me. +1 for your solution. I will extend my above patch by

[jira] Resolved: (NUTCH-406) Metadata tries to write null values

2006-11-23 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ] Chris A. Mattmann resolved NUTCH-406. - Fix Version/s: 0.9.0 Resolution: Fixed Fix applied and tested in trunk. Metadata tries to write null values

[jira] Assigned: (NUTCH-390) Javadoc warnings

2006-11-24 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-390?page=all ] Chris A. Mattmann reassigned NUTCH-390: --- Assignee: Chris A. Mattmann Javadoc warnings Key: NUTCH-390 URL:

[jira] Assigned: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-11-24 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ] Chris A. Mattmann reassigned NUTCH-185: --- Assignee: Chris A. Mattmann XMLParser is configurable xml parser plugin. Key:

[jira] Commented: (NUTCH-407) Make Nutch crawling parent directories for file protocol configurable

2006-11-28 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-407?page=comments#action_12453934 ] Chris A. Mattmann commented on NUTCH-407: - I'm not entirely sure what the right answer to this is. One thing that I do know is that a colleague at my own

[jira] Created: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins

2007-01-20 Thread Chris A. Mattmann (JIRA)
Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins -- Key: NUTCH-431 URL: https://issues.apache.org/jira/browse/NUTCH-431

[jira] Commented: (NUTCH-353) pages that serverside forwards will be refetched every time

2007-01-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466285 ] Chris A. Mattmann commented on NUTCH-353: - Doug, Let's see what you got. I'd be happy to take a look at

[jira] Assigned: (NUTCH-431) Move plugin specific properties out of nutch-site.xml and into specific conf files for plugins

2007-01-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-431: --- Assignee: Chris A. Mattmann Move plugin specific properties out of nutch-site.xml

[jira] Commented: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-01-26 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467887 ] Chris A. Mattmann commented on NUTCH-258: - Guys, From recent conversations on the mailing list where Doug

[jira] Work started: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-390 started by Chris A. Mattmann. Javadoc warnings Key: NUTCH-390 URL:

[jira] Resolved: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-390. - Resolution: Fixed Fix Version/s: 0.9.0 I've fixed this issue in the trunk. I had

[jira] Closed: (NUTCH-390) Javadoc warnings

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann closed NUTCH-390. --- Fixed in the trunk: http://svn.apache.org/viewvc?view=rev&revision=501315 Javadoc warnings

[jira] Work started: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-01-29 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-384 started by Chris A. Mattmann. Protocol-file plugin does not allow the parse plugins framework to operate properly

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471780 ] Chris A. Mattmann commented on NUTCH-443: - Nutch Newbie, What exactly do you mean when you mention Apache

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471955 ] Chris A. Mattmann commented on NUTCH-444: - Hi Renaud, In fact, Rome does appear to be quite easy to use,

[jira] Assigned: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-443: --- Assignee: Chris A. Mattmann allow parsers to return multiple Parse object, this will

[jira] Assigned: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-444: --- Assignee: Chris A. Mattmann Possibly use a different library to parse RSS feed for

[jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-258. - Resolution: Cannot Reproduce With recent API changes to Hadoop, and with the note from

[jira] Assigned: (NUTCH-309) Uses commons logging Code Guards

2007-02-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-309: --- Assignee: Chris A. Mattmann (was: Jerome Charron) Uses commons logging Code Guards

[jira] Work started: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-443 started by Chris A. Mattmann. allow parsers to return multiple Parse object, this will speed up the rss parser

[jira] Updated: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-02-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated NUTCH-443: Attachment: NUTCH-443.022507.patch.txt Hi Folks, Attached is a candidate patch for

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-02-25 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475794 ] Chris A. Mattmann commented on NUTCH-444: - Hi Nick, Thanks for your insightful comments on this issue. I

[jira] Commented: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-08 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479430 ] Chris A. Mattmann commented on NUTCH-384: - Thanks for your patch Heiko! I am looking at this right now. If

[jira] Resolved: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-384. - Resolution: Fixed Fix Version/s: 0.9.0 Fixed tested in local crawl, works

[jira] Closed: (NUTCH-384) Protocol-file plugin does not allow the parse plugins framework to operate properly

2007-03-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann closed NUTCH-384. --- Patch applied, with whitespace changes, and unit test (contributed by yours truly):

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-10 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494764 ] Chris A. Mattmann commented on NUTCH-444: - Hi Doğacan, Well I must say, with all the discussion that's

[jira] Reopened: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-05-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reopened NUTCH-443: - Assignee: Chris A. Mattmann (was: Andrzej Bialecki) Per Doğacan's comment, we need to

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-13 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495381 ] Chris A. Mattmann commented on NUTCH-444: - Doğacan -- I will check this out tomorrow (Monday) night, latest

[jira] Commented: (NUTCH-444) Possibly use a different library to parse RSS feed for improved performance and compatibility

2007-05-30 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500133 ] Chris A. Mattmann commented on NUTCH-444: - Hi Guys, Okay, here is the way that I currently see this issue,

[jira] Commented: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505501 ] Chris A. Mattmann commented on NUTCH-443: - Doğacan, Whoops :) This one kind of fell off the radar screen.

[jira] Commented: (NUTCH-485) Change HtmlParseFilter 's to return ParseResult object instead of Parse object

2007-06-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505502 ] Chris A. Mattmann commented on NUTCH-485: - Doğacan, +1. As for your question, IMO, these type of minor

[jira] Resolved: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser

2007-06-17 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved NUTCH-443. - Resolution: Fixed Patch tested and contributed by Dogacan. This update is a fix and
