Re: [jira] Updated: (NUTCH-379) ParseUtil does not pass through the content's URL to the ParserFactory

2006-10-14 Thread Chris Mattmann
Hi Guys,

 Can we disable the selection of released versions within JIRA for issues
so that people like me don't continue to get confused?

Thanks!

Cheers,
  Chris



On 10/13/06 9:32 AM, Sami Siren (JIRA) [EMAIL PROTECTED] wrote:

  [ http://issues.apache.org/jira/browse/NUTCH-379?page=all ]
 
 Sami Siren updated NUTCH-379:
 -
 
 Fix Version/s: (was: 0.8.1)
                (was: 0.8)
 
 cannot fix released versions
 
 ParseUtil does not pass through the content's URL to the ParserFactory
 --
 
 Key: NUTCH-379
 URL: http://issues.apache.org/jira/browse/NUTCH-379
 Project: Nutch
  Issue Type: Bug
  Components: fetcher
Affects Versions: 0.8.1, 0.8, 0.9.0
 Environment: Power Mac Dual G5, 2.0 GHz, although fix is independent
 of environment
Reporter: Chris A. Mattmann
 Assigned To: Chris A. Mattmann
 Fix For: 0.8.2, 0.9.0
 
 Attachments: NUTCH-379.Mattmann.100406.patch.txt
 
 
 Currently the ParseUtil class that is called by the Fetcher to actually
 perform the parsing of content does not forward the content's url through for
 use in the ParserFactory. A bigger issue, however, is that the url (and for
 that matter, the pathSuffix) is no longer used to determine which parsing
 plugin should be called. My colleague at JPL discovered that larger bug
 and will soon file a JIRA issue for it. However, in the meantime, this small
 patch at least sets up the forwarding of the content's URL to the
 ParserFactory.
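
 To make the forwarding described above concrete, here is a minimal,
 self-contained sketch. It does not use the real Nutch classes and is not the
 attached patch: Content, SketchParserFactory, parse() and the parser names
 are all hypothetical stand-ins, and the suffix lookup is a simplifying
 assumption about how a factory might use the URL once it receives it.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: none of these types are the real Nutch classes.
class UrlForwardingSketch {

    /** Hypothetical stand-in for fetched content: a URL plus a content type. */
    static final class Content {
        final String url;
        final String contentType;

        Content(String url, String contentType) {
            this.url = url;
            this.contentType = contentType;
        }
    }

    /** Hypothetical factory: looks up a parser by content type, then by URL suffix. */
    static final class SketchParserFactory {
        private final Map<String, String> byType = new HashMap<>();
        private final Map<String, String> bySuffix = new HashMap<>();

        void registerType(String contentType, String parser) {
            byType.put(contentType, parser);
        }

        void registerSuffix(String suffix, String parser) {
            bySuffix.put(suffix, parser);
        }

        String getParser(String contentType, String url) {
            String parser = byType.get(contentType);
            if (parser == null && url != null) {
                // Naive suffix extraction (assumption): text after the last dot.
                int dot = url.lastIndexOf('.');
                if (dot >= 0) {
                    String suffix = url.substring(dot + 1).toLowerCase(Locale.ROOT);
                    parser = bySuffix.get(suffix);
                }
            }
            return parser != null ? parser : "default-parser";
        }
    }

    /** Builds a factory with a few made-up registrations. */
    static SketchParserFactory defaultFactory() {
        SketchParserFactory factory = new SketchParserFactory();
        factory.registerType("text/html", "html-parser");
        factory.registerSuffix("pdf", "pdf-parser");
        return factory;
    }

    /** The point of the sketch: the content's URL is forwarded, not dropped. */
    static String parse(SketchParserFactory factory, Content content) {
        return factory.getParser(content.contentType, content.url);
    }

    public static void main(String[] args) {
        SketchParserFactory factory = defaultFactory();
        Content content = new Content(
            "http://example.org/docs/report.pdf", "application/octet-stream");
        System.out.println(parse(factory, content));
    }
}
```

 In this sketch, if parse() passed null instead of the URL, the factory could
 only select by content type and the example content above would fall back to
 the default parser.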




[jira] Updated: (NUTCH-383) Upgrade Nutch to Hadoop 0.7

2006-10-14 Thread Andrzej Bialecki (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-383?page=all ]

Andrzej Bialecki  updated NUTCH-383:


Attachment: patch-v3.txt

Cleaned up the patch by removing accidental changes.

If there are no further objections I'd like to commit this.

 Upgrade Nutch to Hadoop 0.7
 ---

 Key: NUTCH-383
 URL: http://issues.apache.org/jira/browse/NUTCH-383
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Andrzej Bialecki 
 Assigned To: Andrzej Bialecki 
 Attachments: patch-v2.txt, patch-v3.txt, patch.txt


 Upgrade Nutch to Hadoop 0.7, and replace all occurrences of UTF8 with Text. 
 UTF8 is deprecated and its use is discouraged due to its limitations.
 This change will break API, in the sense that all third-party additions will 
 have to be updated to use new APIs that use Text instead of UTF8 in method 
 parameters.
 This change also breaks backward compatibility of data in CrawlDb, LinkDb and 
 segments. A tool to upgrade CrawlDb, LinkDb and segments can be created to 
 facilitate the upgrade path.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (NUTCH-386) Plugin to index categories by url rules

2006-10-14 Thread Ernesto De Santis (JIRA)
 [ http://issues.apache.org/jira/browse/NUTCH-386?page=all ]

Ernesto De Santis updated NUTCH-386:


Attachment: index-url-category-0.1.zip

 Plugin to index categories by url rules
 ---

 Key: NUTCH-386
 URL: http://issues.apache.org/jira/browse/NUTCH-386
 Project: Nutch
  Issue Type: New Feature
  Components: indexer, searcher
Reporter: Ernesto De Santis
Priority: Minor
 Attachments: index-url-category-0.1.zip


 The compressed zip has an install_notes.txt file with instructions.
