[ http://issues.apache.org/jira/browse/NUTCH-193?page=comments#action_12365130 ]
Doug Cutting commented on NUTCH-193:
------------------------------------
Okay, I've moved the code from Nutch to Hadoop. Now I need to repair Nutch so
that it still works!
One remaining problem is the need to separate Nutch config files from Hadoop
config files. There are now hadoop-default.xml and hadoop-site.xml files, which
are separate from the similarly named Nutch files. For now, I'll fix this by
adding the following methods to Hadoop's Configuration class:
void addDefaultResource(String name);
void addFinalResource(String name);
Then add a Nutch utility class like:
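import org.apache.hadoop.conf.Configuration;

public class NutchConfiguration {
  /** Creates a new Configuration with the Nutch resources added. */
  public static Configuration create() {
    Configuration conf = new Configuration();
    return addNutchResources(conf);
  }

  /** Adds nutch-default.xml and nutch-site.xml to an existing Configuration. */
  public static Configuration addNutchResources(Configuration conf) {
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}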
Then all of the places that currently call 'new NutchConf()' can be replaced
with 'NutchConfiguration.create()'.
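The layering that addDefaultResource and addFinalResource are meant to provide can be modeled in a few lines. The sketch below is a hypothetical toy class (LayeredConf), not Hadoop's Configuration: "resources" are in-memory java.util.Properties looked up by file name instead of parsed XML, and it assumes, for illustration only, that default resources are applied first and final resources last, each list in the order added, and that the Hadoop files are registered before the Nutch ones.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// HYPOTHETICAL toy model of a configuration built from layered resources.
// Resource contents come from an in-memory map (name -> Properties) that
// stands in for XML files such as nutch-default.xml and nutch-site.xml.
final class LayeredConf {
  private final Map<String, Properties> available;
  private final List<String> defaultResources = new ArrayList<>();
  private final List<String> finalResources = new ArrayList<>();

  LayeredConf(Map<String, Properties> available) {
    this.available = available;
  }

  void addDefaultResource(String name) { defaultResources.add(name); }

  void addFinalResource(String name) { finalResources.add(name); }

  // Assumed semantics: defaults are applied first, then finals, so a value
  // in a final (site) resource overrides the same key in a default resource.
  String get(String key) {
    Properties merged = new Properties();
    for (String name : defaultResources) load(merged, name);
    for (String name : finalResources) load(merged, name);
    return merged.getProperty(key);
  }

  // Copies one resource into the merged view; unknown names are skipped.
  private void load(Properties into, String name) {
    Properties p = available.get(name);
    if (p != null) into.putAll(p);
  }

  // Mirrors the proposed Nutch factory: Hadoop's pair of files plus Nutch's pair.
  static LayeredConf create(Map<String, Properties> available) {
    LayeredConf conf = new LayeredConf(available);
    conf.addDefaultResource("hadoop-default.xml");
    conf.addFinalResource("hadoop-site.xml");
    conf.addDefaultResource("nutch-default.xml");
    conf.addFinalResource("nutch-site.xml");
    return conf;
  }
}
```

Because each project keeps its own pair of file names, a site-specific Nutch setting lives in nutch-site.xml and never needs to be merged into hadoop-site.xml; in this toy, it still wins over nutch-default.xml because final resources are applied last.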
Longer-term we might consider a more radical re-design of the configuration
API. But first we need to get Hadoop and Nutch split.
> move NDFS and MapReduce to a separate project
> ---------------------------------------------
>
> Key: NUTCH-193
> URL: http://issues.apache.org/jira/browse/NUTCH-193
> Project: Nutch
> Type: Task
> Components: ndfs
> Versions: 0.8-dev
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 0.8-dev
>
> The NDFS and MapReduce code should move from Nutch to a new Lucene
> sub-project named Hadoop.
> My plan is to do this as follows:
> 1. Move all code in the following packages from Nutch to Hadoop:
> org.apache.nutch.fs
> org.apache.nutch.io
> org.apache.nutch.ipc
> org.apache.nutch.mapred
> org.apache.nutch.ndfs
> These packages will all be renamed to org.apache.hadoop, and Nutch code will
> be updated to reflect this.
> 2. Move selected classes from Nutch to Hadoop, as follows:
> org.apache.nutch.util.NutchConf -> org.apache.hadoop.conf.Configuration
> org.apache.nutch.util.NutchConfigurable -> org.apache.hadoop.Configurable
> org.apache.nutch.util.NutchConfigured -> org.apache.hadoop.Configured
> org.apache.nutch.util.Progress -> org.apache.hadoop.util.Progress
> org.apache.nutch.util.LogFormatter -> org.apache.hadoop.util.LogFormatter
> org.apache.nutch.util.Daemon -> org.apache.hadoop.util.Daemon
> 3. Add a jar containing all of the above to Nutch's lib directory.
> Does this plan sound reasonable?
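The renaming in steps 1 and 2 amounts to a lookup applied to fully qualified class names when updating Nutch code. The sketch below (a hypothetical helper, HadoopRename) encodes the plan as written. It assumes that a moved package keeps its sub-package name under org.apache.hadoop (for example, org.apache.nutch.ndfs becoming org.apache.hadoop.ndfs), since the plan says only that the packages "will all be renamed to org.apache.hadoop".

```java
import java.util.HashMap;
import java.util.Map;

// HYPOTHETICAL helper mapping pre-move class names to post-move names,
// following the plan above. Explicit class moves (step 2) take precedence
// over whole-package moves (step 1).
final class HadoopRename {
  // Step 1: whole packages; ASSUMED to keep their last name component.
  private static final String[] MOVED_PACKAGES = {"fs", "io", "ipc", "mapred", "ndfs"};

  // Step 2: individual classes, copied from the plan.
  private static final Map<String, String> MOVED_CLASSES = new HashMap<>();
  static {
    MOVED_CLASSES.put("org.apache.nutch.util.NutchConf", "org.apache.hadoop.conf.Configuration");
    MOVED_CLASSES.put("org.apache.nutch.util.NutchConfigurable", "org.apache.hadoop.Configurable");
    MOVED_CLASSES.put("org.apache.nutch.util.NutchConfigured", "org.apache.hadoop.Configured");
    MOVED_CLASSES.put("org.apache.nutch.util.Progress", "org.apache.hadoop.util.Progress");
    MOVED_CLASSES.put("org.apache.nutch.util.LogFormatter", "org.apache.hadoop.util.LogFormatter");
    MOVED_CLASSES.put("org.apache.nutch.util.Daemon", "org.apache.hadoop.util.Daemon");
  }

  // Returns the new name, or the input unchanged if the class stays in Nutch.
  static String rename(String className) {
    String moved = MOVED_CLASSES.get(className);
    if (moved != null) return moved;
    for (String pkg : MOVED_PACKAGES) {
      // Trailing dot: matches the package and its sub-packages, not e.g. "fsx".
      String oldPrefix = "org.apache.nutch." + pkg + ".";
      if (className.startsWith(oldPrefix)) {
        return "org.apache.hadoop." + pkg + "." + className.substring(oldPrefix.length());
      }
    }
    return className;
  }
}
```

Classes that the plan leaves in Nutch, such as anything under org.apache.nutch.crawl, pass through unchanged, which is what step 3's jar in Nutch's lib directory relies on: Nutch keeps its own code and depends on the moved classes through the jar.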
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers