Re: [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory
Hi Guys,

Can we disable the selection of released versions within JIRA for issues so that people like me don't continue to get confused?

Thanks!

Cheers,
Chris

On 10/13/06 9:32 AM, Sami Siren (JIRA) [EMAIL PROTECTED] wrote:

[ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]

Sami Siren updated NUTCH-379:

Fix Version/s: (was: 0.8.1)
               (was: 0.8)

cannot fix released versions

ParseUtil does not pass through the content's URL to the ParserFactory

Key: NUTCH-379
URL: http://issues.apache.org/jira/browse/NUTCH-379
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.8.1, 0.8, 0.9.0
Environment: Power Mac Dual G5, 2.0 GHz, although fix is independent of environment
Reporter: Chris A. Mattmann
Assigned To: Chris A. Mattmann
Fix For: 0.8.2, 0.9.0
Attachments: NUTCH-379.Mattmann.100406.patch.txt

Currently the ParseUtil class, which the Fetcher calls to actually perform the parsing of content, does not forward the content's URL through for use in the ParserFactory. A bigger issue, however, is that the URL (and for that matter, the pathSuffix) is no longer used to determine which parsing plugin should be called. My colleague at JPL discovered that larger bug and will soon file a JIRA issue for it. However, in the meantime, this small patch at least sets up the forwarding of the content's URL to the ParserFactory.
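As a rough illustration of the forwarding described above, here is a minimal, self-contained Java sketch. It is not the Nutch code or the attached patch: the class names, method names, parser names and the "content type first, then path suffix" selection order are all assumptions made for the example. It only shows why a factory that never receives the URL cannot fall back on the path suffix.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: every class, method and parser name here is
// hypothetical and does not reproduce the actual Nutch API.
class UrlForwardingSketch {

    /** Minimal stand-in for fetched content: its URL and reported content type. */
    public static final class Content {
        public final String url;
        public final String contentType;

        public Content(String url, String contentType) {
            this.url = url;
            this.contentType = contentType;
        }
    }

    /** Hypothetical factory mapping a content type or URL path suffix to a parser name. */
    public static final class ParserFactory {
        private final Map<String, String> byContentType = new HashMap<>();
        private final Map<String, String> byPathSuffix = new HashMap<>();

        public void registerContentType(String contentType, String parserName) {
            byContentType.put(contentType, parserName);
        }

        public void registerSuffix(String suffix, String parserName) {
            byPathSuffix.put(suffix.toLowerCase(Locale.ROOT), parserName);
        }

        /** Assumed order: exact content type first, then the URL's path suffix. */
        public String selectParser(String contentType, String url) {
            String parser = byContentType.get(contentType);
            if (parser != null) {
                return parser;
            }
            parser = byPathSuffix.get(pathSuffix(url));
            return parser != null ? parser : "parse-default";
        }

        /** Returns the lower-cased extension of the URL path, or "" if there is none. */
        static String pathSuffix(String url) {
            if (url == null || url.isEmpty()) {
                return "";
            }
            String path;
            try {
                path = new URI(url).getPath();
            } catch (URISyntaxException e) {
                return "";
            }
            if (path == null) {
                return "";
            }
            int slash = path.lastIndexOf('/');
            int dot = path.lastIndexOf('.');
            return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        }
    }

    /** Caller that drops the URL, so the suffix fallback can never match. */
    public static String chooseParserWithoutUrl(ParserFactory factory, Content content) {
        return factory.selectParser(content.contentType, "");
    }

    /** Caller that forwards the content's URL to the factory. */
    public static String chooseParser(ParserFactory factory, Content content) {
        return factory.selectParser(content.contentType, content.url);
    }

    public static void main(String[] args) {
        ParserFactory factory = new ParserFactory();
        factory.registerSuffix("pdf", "parse-pdf");
        Content content = new Content(
                "http://example.com/docs/report.pdf", "application/octet-stream");
        System.out.println("without URL: " + chooseParserWithoutUrl(factory, content));
        System.out.println("with URL:    " + chooseParser(factory, content));
    }
}
```

In this sketch the only difference between the two callers is whether `content.url` reaches `selectParser`; the suffix rule registered for `pdf` can only take effect in the second one.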
[jira] Updated: (NUTCH-383) Upgrade Nutch to Hadoop 0.7
[ http://issues.apache.org/jira/browse/NUTCH-383?page=all ]

Andrzej Bialecki updated NUTCH-383:

Attachment: patch-v3.txt

Clean up the patch by removing accidental changes. If there are no further objections I'd like to commit this.

Upgrade Nutch to Hadoop 0.7

Key: NUTCH-383
URL: http://issues.apache.org/jira/browse/NUTCH-383
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Andrzej Bialecki
Assigned To: Andrzej Bialecki
Attachments: patch-v2.txt, patch-v3.txt, patch.txt

Upgrade Nutch to Hadoop 0.7, and replace all occurrences of UTF8 with Text. UTF8 is deprecated and its use is discouraged due to its limitations.

This change will break the API, in the sense that all third-party additions will have to be updated to use new APIs that take Text instead of UTF8 in method parameters.

This change also breaks backward compatibility of data in CrawlDb, LinkDb and segments. A tool to upgrade CrawlDb, LinkDb and segments can be created to facilitate the upgrade path.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-386) Plugin to index categories by url rules
[ http://issues.apache.org/jira/browse/NUTCH-386?page=all ]

Ernesto De Santis updated NUTCH-386:

Attachment: index-url-category-0.1.zip

Plugin to index categories by url rules

Key: NUTCH-386
URL: http://issues.apache.org/jira/browse/NUTCH-386
Project: Nutch
Issue Type: New Feature
Components: indexer, searcher
Reporter: Ernesto De Santis
Priority: Minor
Attachments: index-url-category-0.1.zip

The zip archive contains an install_notes.txt file with instructions.