[ 
http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365130 ] 

Doug Cutting commented on NUTCH-193:
------------------------------------

Okay, I've moved the code from Nutch to Hadoop.  Now I need to repair Nutch so 
that it still works!

One remaining problem is the need to separate nutch config files from hadoop 
config files.  There's now a hadoop-default.xml and hadoop-site.xml, which are 
separate from the similarly-named nutch files.  For now, I'll fix this by 
adding the following methods to Hadoop's Configuration class:

void addDefaultResource(String name);
void addFinalResource(String name);

Then add a Nutch utility class like:

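import org.apache.hadoop.conf.Configuration;

public class NutchConfiguration {
  /** Creates a Configuration with the Nutch resources added. */
  public static Configuration create() {
    Configuration conf = new Configuration();
    addNutchResources(conf);
    return conf;
  }
  /** Adds the Nutch config files to conf and returns it. */
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}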

Then all of the places which currently call 'new NutchConf()' can be replaced 
with 'NutchConfiguration.create()'.
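
To make the intended layering concrete, here is a minimal, self-contained sketch. 
It does not use the real Hadoop class: SketchConfiguration is a hypothetical 
stand-in, and the load order (all default resources first, then all final 
resources, with later entries overriding earlier ones) is my assumption about 
how addDefaultResource and addFinalResource would behave. The in-memory "files" 
map and the property values are purely illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConfigLayeringSketch {

  /** Hypothetical stand-in for Hadoop's Configuration; not the real class. */
  static class SketchConfiguration {
    private final List<String> defaultResources = new ArrayList<>();
    private final List<String> finalResources = new ArrayList<>();

    SketchConfiguration() {
      // Hadoop's own files are registered first.
      defaultResources.add("hadoop-default.xml");
      finalResources.add("hadoop-site.xml");
    }

    void addDefaultResource(String name) { defaultResources.add(name); }

    void addFinalResource(String name) { finalResources.add(name); }

    /** Assumed load order: all defaults, then all finals. */
    List<String> loadOrder() {
      List<String> order = new ArrayList<>(defaultResources);
      order.addAll(finalResources);
      return order;
    }

    /** Resolves a key against in-memory "files"; later resources win. */
    String get(String key, Map<String, Map<String, String>> files) {
      String value = null;
      for (String resource : loadOrder()) {
        Map<String, String> props = files.get(resource);
        if (props != null && props.containsKey(key)) {
          value = props.get(key);
        }
      }
      return value;
    }
  }

  /** Mirrors the proposed Nutch utility class, using the stand-in above. */
  static class NutchConfiguration {
    static SketchConfiguration create() {
      return addNutchResources(new SketchConfiguration());
    }

    static SketchConfiguration addNutchResources(SketchConfiguration conf) {
      conf.addDefaultResource("nutch-default.xml");
      conf.addFinalResource("nutch-site.xml");
      return conf;
    }
  }

  public static void main(String[] args) {
    // Illustrative file contents only; the values are made up.
    Map<String, Map<String, String>> files = new HashMap<>();
    Map<String, String> nutchDefault = new HashMap<>();
    nutchDefault.put("http.agent.name", "default-agent");
    files.put("nutch-default.xml", nutchDefault);
    Map<String, String> nutchSite = new HashMap<>();
    nutchSite.put("http.agent.name", "my-crawler");
    files.put("nutch-site.xml", nutchSite);

    SketchConfiguration conf = NutchConfiguration.create();
    System.out.println(conf.loadOrder());
    System.out.println(conf.get("http.agent.name", files));
  }
}
```

Under that assumed order, a value set in nutch-site.xml overrides the same key 
in nutch-default.xml, which is the behavior the old NutchConf callers would 
presumably expect.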

Longer-term we might consider a more radical re-design of the configuration 
API.  But first we need to get Hadoop and Nutch split.





> move NDFS and MapReduce to a separate project
> ---------------------------------------------
>
>          Key: NUTCH-193
>          URL: http://issues.apache.org/jira/browse/NUTCH-193
>      Project: Nutch
>         Type: Task
>   Components: ndfs
>     Versions: 0.8-dev
>     Reporter: Doug Cutting
>     Assignee: Doug Cutting
>      Fix For: 0.8-dev

>
> The NDFS and MapReduce code should move from Nutch to a new Lucene 
> sub-project named Hadoop.
> My plan is to do this as follows:
> 1. Move all code in the following packages from Nutch to Hadoop:
> org.apache.nutch.fs
> org.apache.nutch.io
> org.apache.nutch.ipc
> org.apache.nutch.mapred
> org.apache.nutch.ndfs
> These packages will all be renamed to org.apache.hadoop, and Nutch code will 
> be updated to reflect this.
> 2. Move selected classes from Nutch to Hadoop, as follows:
> org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
> org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable 
> org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured
> org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
> org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
> org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon
> 3. Add a jar containing all of the above to Nutch's lib directory.
> Does this plan sound reasonable?
