svn commit: r378108 - /lucene/nutch/trunk/
Author: cutting Date: Wed Feb 15 14:47:00 2006 New Revision: 378108 URL: http://svn.apache.org/viewcvs?rev=378108&view=rev Log: Ignore logs directory. Modified: lucene/nutch/trunk/ (props changed) Propchange: lucene/nutch/trunk/ -- --- svn:ignore (original) +++ svn:ignore Wed Feb 15 14:47:00 2006 @@ -1,4 +1,5 @@ build +logs nutch.jar .classpath .project
svn commit: r378107 - in /lucene/nutch/trunk: conf/ conf/hadoop-env.sh.template conf/slaves.template lib/hadoop-0.1-dev.jar src/java/org/apache/nutch/fetcher/Fetcher.java
Author: cutting Date: Wed Feb 15 14:45:31 2006 New Revision: 378107 URL: http://svn.apache.org/viewcvs?rev=378107&view=rev Log: Fix Fetcher to disable speculative exexution, to keep it polite. Also upgrade to latest hadoop jar that supports this feature. Note that Hadoop's environment specification has changed, with all environment variables settable from conf/hadoop-env.sh, and the slaves file is now in conf/, rather than in one's home directory. Added: lucene/nutch/trunk/conf/hadoop-env.sh.template lucene/nutch/trunk/conf/slaves.template Modified: lucene/nutch/trunk/conf/ (props changed) lucene/nutch/trunk/lib/hadoop-0.1-dev.jar lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Propchange: lucene/nutch/trunk/conf/ -- --- svn:ignore (original) +++ svn:ignore Wed Feb 15 14:45:31 2006 @@ -1,5 +1,4 @@ -nutch-site.xml -regex-normalize.xml -crawl-urlfilter.txt -regex-urlfilter.txt -mapred-default.xml +*.xml +*.txt +*.sh +slaves Added: lucene/nutch/trunk/conf/hadoop-env.sh.template URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/conf/hadoop-env.sh.template?rev=378107&view=auto == --- lucene/nutch/trunk/conf/hadoop-env.sh.template (added) +++ lucene/nutch/trunk/conf/hadoop-env.sh.template Wed Feb 15 14:45:31 2006 @@ -0,0 +1,25 @@ +# Set Hadoop-specific environment variables here. + +# The java implementation to use. +# export JAVA_HOME=/usr/bin/java + +# The maximum amount of heap to use, in MB. Default is 1000. +# export HADOOP_HEAPSIZE=2000 + +# Extra Java runtime options. Empty by default. +# export HADOOP_OPTS=-server + +# Where log files are stored. $HADOOP_HOME/logs by default. +# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs + +# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default. +# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves + +# host:path where hadoop code should be rsync'd from. Unset by default. +# export HADOOP_MASTER=master:/home/$USER/src/hadoop + +# The directory where pid files are stored. /tmp by default. +# export HADOOP_PID_DIR=/var/hadoop/pids + +# A string representing this instance of hadoop. $USER by default. +# export HADOOP_IDENT_STRING=$USER Added: lucene/nutch/trunk/conf/slaves.template URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/conf/slaves.template?rev=378107&view=auto == --- lucene/nutch/trunk/conf/slaves.template (added) +++ lucene/nutch/trunk/conf/slaves.template Wed Feb 15 14:45:31 2006 @@ -0,0 +1 @@ +localhost Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/lib/hadoop-0.1-dev.jar?rev=378107&r1=378106&r2=378107&view=diff == Binary files - no diff available. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java?rev=378107&r1=378106&r2=378107&view=diff == --- lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java (original) +++ lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Wed Feb 15 14:45:31 2006 @@ -348,6 +348,9 @@ job.set(SEGMENT_NAME_KEY, segment.getName()); job.setBoolean("fetcher.parse", parsing); +// for politeness, don't permit parallel execution of a single task +job.setBoolean("mapred.speculative.execution", false); + job.setInputDir(new File(segment, CrawlDatum.GENERATE_DIR_NAME)); job.setInputFormat(InputFormat.class); job.setInputKeyClass(UTF8.class);
svn commit: r378044 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Author: cutting Date: Wed Feb 15 09:56:54 2006 New Revision: 378044 URL: http://svn.apache.org/viewcvs?rev=378044&view=rev Log: Upgrade to latest version of Hadoop. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/lib/hadoop-0.1-dev.jar?rev=378044&r1=378043&r2=378044&view=diff == Binary files - no diff available.
svn commit: r378011 - in /lucene/nutch/trunk/src/plugin: ./ clustering-carrot2/ clustering-carrot2/lib/ lib-log4j/ lib-log4j/lib/ parse-pdf/ parse-pdf/lib/ parse-rss/ parse-rss/lib/
Author: jerome Date: Wed Feb 15 06:24:56 2006 New Revision: 378011 URL: http://svn.apache.org/viewcvs?rev=378011&view=rev Log: Add a log4j library plugin (lib-log4j) Added: lucene/nutch/trunk/src/plugin/lib-log4j/ lucene/nutch/trunk/src/plugin/lib-log4j/build.xml (with props) lucene/nutch/trunk/src/plugin/lib-log4j/lib/ lucene/nutch/trunk/src/plugin/lib-log4j/lib/log4j-1.2.11.jar (with props) lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml (with props) Removed: lucene/nutch/trunk/src/plugin/clustering-carrot2/lib/log4j-1.2.11.jar lucene/nutch/trunk/src/plugin/clustering-carrot2/lib/log4j.LICENSE lucene/nutch/trunk/src/plugin/parse-pdf/lib/log4j-1.2.9.jar lucene/nutch/trunk/src/plugin/parse-pdf/lib/log4j-LICENSE.txt lucene/nutch/trunk/src/plugin/parse-rss/lib/log4j-1.2.6.jar Modified: lucene/nutch/trunk/src/plugin/build.xml lucene/nutch/trunk/src/plugin/clustering-carrot2/build.xml lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml lucene/nutch/trunk/src/plugin/parse-pdf/build.xml lucene/nutch/trunk/src/plugin/parse-pdf/plugin.xml lucene/nutch/trunk/src/plugin/parse-rss/build.xml lucene/nutch/trunk/src/plugin/parse-rss/plugin.xml Modified: lucene/nutch/trunk/src/plugin/build.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/build.xml?rev=378011&r1=378010&r2=378011&view=diff == --- lucene/nutch/trunk/src/plugin/build.xml (original) +++ lucene/nutch/trunk/src/plugin/build.xml Wed Feb 15 06:24:56 2006 @@ -13,6 +13,7 @@ + @@ -78,6 +79,7 @@ + Modified: lucene/nutch/trunk/src/plugin/clustering-carrot2/build.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/clustering-carrot2/build.xml?rev=378011&r1=378010&r2=378011&view=diff == --- lucene/nutch/trunk/src/plugin/clustering-carrot2/build.xml (original) +++ lucene/nutch/trunk/src/plugin/clustering-carrot2/build.xml Wed Feb 15 06:24:56 2006 @@ -4,4 +4,10 @@ + + + + + + Modified: lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml?rev=378011&r1=378010&r2=378011&view=diff == --- lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml (original) +++ lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml Wed Feb 15 06:24:56 2006 @@ -29,6 +29,7 @@ + http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-log4j/build.xml?rev=378011&view=auto == --- lucene/nutch/trunk/src/plugin/lib-log4j/build.xml (added) +++ lucene/nutch/trunk/src/plugin/lib-log4j/build.xml Wed Feb 15 06:24:56 2006 @@ -0,0 +1,17 @@ + + + + + + + + + + + + + + Propchange: lucene/nutch/trunk/src/plugin/lib-log4j/build.xml -- svn:eol-style = native Added: lucene/nutch/trunk/src/plugin/lib-log4j/lib/log4j-1.2.11.jar URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-log4j/lib/log4j-1.2.11.jar?rev=378011&view=auto == Binary file - no diff available. Propchange: lucene/nutch/trunk/src/plugin/lib-log4j/lib/log4j-1.2.11.jar -- svn:mime-type = application/octet-stream Added: lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml?rev=378011&view=auto == --- lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml (added) +++ lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml Wed Feb 15 06:24:56 2006 @@ -0,0 +1,21 @@ + + + + + + + + + + + Propchange: lucene/nutch/trunk/src/plugin/lib-log4j/plugin.xml -- svn:eol-style = native Modified: lucene/nutch/trunk/src/plugin/parse-pdf/build.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/parse-pdf/build.xml?rev=378011&r1=378010&r2=378011&view=diff == --- lucene/nutch/trunk/src/plugin/parse-pdf/build.xml (original) +++ lucene/nutch/trunk/src/plugin/parse-pdf/build.xml Wed Feb 15 06:24:56 2006 @@ -4,6 +4,12 @@ + + + + + + Modified: lucene/nutch/trunk/src/plugin/parse-pdf/plugin.xml URL: http://svn.apache.org/viewcvs/lucene/nutch/