svn commit: r798304 [3/3] - in /lucene/nutch/logos: ./ character-hand-big.png character.eps nutch_logo.eps nutch_logo.png

2009-07-27 Thread cutting
Added: lucene/nutch/logos/nutch_logo.eps URL: http://svn.apache.org/viewvc/lucene/nutch/logos/nutch_logo.eps?rev=798304&view=auto Binary file - no diff available. Propchange: lucene/nutch/logos/nutch_logo.eps

svn commit: r475926 - /lucene/nutch/nightly/nightly.sh

2006-11-17 Thread cutting
Author: cutting Date: Thu Nov 16 13:03:26 2006 New Revision: 475926 URL: http://svn.apache.org/viewvc?view=rev&rev=475926 Log: Update nightly build location. Modified: lucene/nutch/nightly/nightly.sh Modified: lucene/nutch/nightly/nightly.sh URL: http://svn.apache.org/viewvc/lucene/nutch

svn commit: r421185 - /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java

2006-07-12 Thread cutting
Author: cutting Date: Wed Jul 12 01:16:37 2006 New Revision: 421185 URL: http://svn.apache.org/viewvc?rev=421185&view=rev Log: Patch a bug introduced by Hadoop 0.4.0, which requires specified input directories to exist. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java

svn commit: r417884 - in /lucene/nutch/trunk: lib/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/segment/

2006-06-28 Thread cutting
Author: cutting Date: Wed Jun 28 14:54:53 2006 New Revision: 417884 URL: http://svn.apache.org/viewvc?rev=417884&view=rev Log: NUTCH-312. Upgrade to Hadoop 0.4.0. Added: lucene/nutch/trunk/lib/commons-cli-2.0-SNAPSHOT.jar (with props) lucene/nutch/trunk/lib/hadoop-0.4.0.jar

svn commit: r413175 - in /lucene/nutch/trunk/lib: hadoop-0.3.1.jar hadoop-0.3.2.jar

2006-06-09 Thread cutting
Author: cutting Date: Fri Jun 9 14:48:23 2006 New Revision: 413175 URL: http://svn.apache.org/viewvc?rev=413175&view=rev Log: Upgrading to Hadoop 0.3.2 release. Added: lucene/nutch/trunk/lib/hadoop-0.3.2.jar (with props) Removed: lucene/nutch/trunk/lib/hadoop-0.3.1.jar Added: lucene

svn commit: r405861 - in /lucene/nutch/trunk/lib: hadoop-0.2.0.jar hadoop-0.2.1.jar

2006-05-12 Thread cutting
Author: cutting Date: Fri May 12 13:31:59 2006 New Revision: 405861 URL: http://svn.apache.org/viewcvs?rev=405861&view=rev Log: Upgrading to Hadoop 0.2.1. Added: lucene/nutch/trunk/lib/hadoop-0.2.1.jar (with props) Removed: lucene/nutch/trunk/lib/hadoop-0.2.0.jar Added: lucene/nutch

svn commit: r400159 - /lucene/nutch/trunk/bin/

2006-05-05 Thread cutting
Author: cutting Date: Fri May 5 13:01:44 2006 New Revision: 400159 URL: http://svn.apache.org/viewcvs?rev=400159&view=rev Log: Ignore bin/rcc (from Hadoop). Modified: lucene/nutch/trunk/bin/ (props changed) Propchange: lucene/nutch/trunk/bin

svn commit: r400199 - in /lucene/nutch/trunk/lib: hadoop-0.1.1.jar hadoop-0.2.0.jar

2006-05-05 Thread cutting
Author: cutting Date: Fri May 5 15:44:04 2006 New Revision: 400199 URL: http://svn.apache.org/viewcvs?rev=400199&view=rev Log: Upgrading to Hadoop 0.2.0. Added: lucene/nutch/trunk/lib/hadoop-0.2.0.jar (with props) Removed: lucene/nutch/trunk/lib/hadoop-0.1.1.jar Added: lucene/nutch

svn commit: r394781 - /lucene/nutch/trunk/bin/

2006-04-17 Thread cutting
Author: cutting Date: Mon Apr 17 14:40:58 2006 New Revision: 394781 URL: http://svn.apache.org/viewcvs?rev=394781&view=rev Log: Ignore more bin files. Modified: lucene/nutch/trunk/bin/ (props changed) Propchange: lucene/nutch/trunk/bin

svn commit: r392458 - in /lucene/nutch/trunk/lib: hadoop-0.1.0.jar hadoop-0.1.1.jar

2006-04-07 Thread cutting
Author: cutting Date: Fri Apr 7 16:48:10 2006 New Revision: 392458 URL: http://svn.apache.org/viewcvs?rev=392458&view=rev Log: Upgrading to Hadoop release 0.1.1. Added: lucene/nutch/trunk/lib/hadoop-0.1.1.jar (with props) Removed: lucene/nutch/trunk/lib/hadoop-0.1.0.jar Added: lucene

svn commit: r390745 - in /lucene/nutch/trunk/lib: hadoop-0.1-dev.jar hadoop-0.1.0.jar

2006-04-01 Thread cutting
Author: cutting Date: Sat Apr 1 12:16:22 2006 New Revision: 390745 URL: http://svn.apache.org/viewcvs?rev=390745&view=rev Log: Update to Hadoop 0.1.0 release. Added: lucene/nutch/trunk/lib/hadoop-0.1.0.jar (with props) Removed: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Added: lucene

svn commit: r388310 - in /lucene/nutch/trunk: lib/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/

2006-03-23 Thread cutting
Author: cutting Date: Thu Mar 23 16:57:56 2006 New Revision: 388310 URL: http://svn.apache.org/viewcvs?rev=388310&view=rev Log: Upgrade to latest Hadoop jar. Add job names to Nutch mapred jobs. Update OutputFormat implementations to implement new checkOutputSpecs() method. Modified: lucene

svn commit: r387310 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-03-20 Thread cutting
Author: cutting Date: Mon Mar 20 13:08:15 2006 New Revision: 387310 URL: http://svn.apache.org/viewcvs?rev=387310&view=rev Log: Upgrade to current Hadoop. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http://svn.apache.org

svn commit: r386181 - in /lucene/nutch/branches/branch-0.7: site/issue_tracking.html site/issue_tracking.pdf src/site/src/documentation/content/xdocs/issue_tracking.xml

2006-03-15 Thread cutting
Author: cutting Date: Wed Mar 15 14:20:40 2006 New Revision: 386181 URL: http://svn.apache.org/viewcvs?rev=386181&view=rev Log: Updated link to jira. Modified: lucene/nutch/branches/branch-0.7/site/issue_tracking.html lucene/nutch/branches/branch-0.7/site/issue_tracking.pdf lucene

svn commit: r383698 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-03-06 Thread cutting
Author: cutting Date: Mon Mar 6 14:54:20 2006 New Revision: 383698 URL: http://svn.apache.org/viewcvs?rev=383698&view=rev Log: Upgrade to latest version of Hadoop. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http

svn commit: r382912 - in /lucene/nutch/trunk/src/java/org/apache/nutch: crawl/ fetcher/ indexer/ parse/ plugin/ searcher/ segment/

2006-03-03 Thread cutting
Author: cutting Date: Fri Mar 3 11:05:41 2006 New Revision: 382912 URL: http://svn.apache.org/viewcvs?rev=382912&view=rev Log: Undo unintentional changes made in r381751. Thanks, Jerome, for catching this! Modified: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java lucene

svn commit: r382939 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-03-03 Thread cutting
Author: cutting Date: Fri Mar 3 13:46:21 2006 New Revision: 382939 URL: http://svn.apache.org/viewcvs?rev=382939&view=rev Log: Upgrade hadoop to latest version with some important mapred bug fixes. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop

svn commit: r382512 - in /lucene/nutch/trunk/lib: lucene-core-1.9-final.jar lucene-core-1.9.1.jar lucene-misc-1.9-final.jar lucene-misc-1.9.1.jar

2006-03-02 Thread cutting
Author: cutting Date: Thu Mar 2 12:59:09 2006 New Revision: 382512 URL: http://svn.apache.org/viewcvs?rev=382512&view=rev Log: Upgrade to Lucene 1.9.1. Added: lucene/nutch/trunk/lib/lucene-core-1.9.1.jar (with props) lucene/nutch/trunk/lib/lucene-misc-1.9.1.jar (with props) Removed

svn commit: r382573 - in /lucene/nutch/trunk: conf/hadoop-env.sh.template lib/hadoop-0.1-dev.jar

2006-03-02 Thread cutting
Author: cutting Date: Thu Mar 2 15:59:24 2006 New Revision: 382573 URL: http://svn.apache.org/viewcvs?rev=382573&view=rev Log: Update to latest Hadoop code. Modified: lucene/nutch/trunk/conf/hadoop-env.sh.template lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk

svn commit: r382579 - /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/DeleteDuplicates.java

2006-03-02 Thread cutting
Author: cutting Date: Thu Mar 2 16:06:59 2006 New Revision: 382579 URL: http://svn.apache.org/viewcvs?rev=382579&view=rev Log: Disable speculative execution, since input format has side effects. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/DeleteDuplicates.java Modified

svn commit: r381721 - in /lucene/nutch/trunk/lib: lucene-core-1.9-final.jar lucene-core-1.9-rc1-dev.jar lucene-misc-1.9-final.jar lucene-misc-1.9-rc1-dev.jar

2006-02-28 Thread cutting
Author: cutting Date: Tue Feb 28 10:00:43 2006 New Revision: 381721 URL: http://svn.apache.org/viewcvs?rev=381721&view=rev Log: Upgrade lucene version to final release. Added: lucene/nutch/trunk/lib/lucene-core-1.9-final.jar (with props) lucene/nutch/trunk/lib/lucene-misc-1.9-final.jar

svn commit: r381824 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-02-28 Thread cutting
Author: cutting Date: Tue Feb 28 15:30:02 2006 New Revision: 381824 URL: http://svn.apache.org/viewcvs?rev=381824&view=rev Log: Updating hadoop jar. Includes fixes for Windows. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http

svn commit: r380789 - /lucene/nutch/trunk/build.xml

2006-02-24 Thread cutting
Author: cutting Date: Fri Feb 24 11:11:44 2006 New Revision: 380789 URL: http://svn.apache.org/viewcvs?rev=380789&view=rev Log: Fix to not use 'exec', but rather 'untar' and 'chmod', which are more portable. Modified: lucene/nutch/trunk/build.xml Modified: lucene/nutch/trunk/build.xml URL

svn commit: r380840 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-02-24 Thread cutting
Author: cutting Date: Fri Feb 24 14:38:06 2006 New Revision: 380840 URL: http://svn.apache.org/viewcvs?rev=380840&view=rev Log: Update hadoop jar, to get recent fixes from that project. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

svn commit: r378381 - /lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tabs.xml

2006-02-16 Thread cutting
Author: cutting Date: Thu Feb 16 14:24:47 2006 New Revision: 378381 URL: http://svn.apache.org/viewcvs?rev=378381&view=rev Log: Fix to work with Forrest 0.7, where ext: links seem to no longer work in tabs.xml. Modified: lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tabs.xml

svn commit: r378044 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-02-15 Thread cutting
Author: cutting Date: Wed Feb 15 09:56:54 2006 New Revision: 378044 URL: http://svn.apache.org/viewcvs?rev=378044&view=rev Log: Upgrade to latest version of Hadoop. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http

svn commit: r378107 - in /lucene/nutch/trunk: conf/ conf/hadoop-env.sh.template conf/slaves.template lib/hadoop-0.1-dev.jar src/java/org/apache/nutch/fetcher/Fetcher.java

2006-02-15 Thread cutting
Author: cutting Date: Wed Feb 15 14:45:31 2006 New Revision: 378107 URL: http://svn.apache.org/viewcvs?rev=378107&view=rev Log: Fix Fetcher to disable speculative execution, to keep it polite. Also upgrade to latest hadoop jar that supports this feature. Note that Hadoop's environment

svn commit: r376815 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-02-10 Thread cutting
Author: cutting Date: Fri Feb 10 11:44:47 2006 New Revision: 376815 URL: http://svn.apache.org/viewcvs?rev=376815&view=rev Log: Update Hadoop jar. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar URL: http://svn.apache.org/viewcvs

svn commit: r376435 - in /lucene/nutch/trunk: lib/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/java/org/

2006-02-09 Thread cutting
Author: cutting Date: Thu Feb 9 12:57:44 2006 New Revision: 376435 URL: http://svn.apache.org/viewcvs?rev=376435&view=rev Log: Updating to latest Hadoop jar, adding now-required close() methods to mapper and reducer implementations. Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

svn commit: r376485 - in /lucene/nutch/trunk: ./ bin/ lib/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/fetcher/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/parse/ src/j

2006-02-09 Thread cutting
Author: cutting Date: Thu Feb 9 15:20:28 2006 New Revision: 376485 URL: http://svn.apache.org/viewcvs?rev=376485&view=rev Log: Fix for NUTCH-209. Nutch now supplies all code to remote MapReduce daemons through a job jar file. So Hadoop daemons no longer need to be restarted when Nutch code

svn commit: r376072 - /lucene/nutch/trunk/conf/nutch-default.xml

2006-02-08 Thread cutting
Author: cutting Date: Wed Feb 8 13:25:30 2006 New Revision: 376072 URL: http://svn.apache.org/viewcvs?rev=376072&view=rev Log: Restore accidentally removed file defaults. Modified: lucene/nutch/trunk/conf/nutch-default.xml Modified: lucene/nutch/trunk/conf/nutch-default.xml URL: http

svn commit: r375704 - in /lucene/nutch/trunk/lib: jetty-5.1.4.LICENSE.txt jetty-5.1.4.jar jetty-ext/

2006-02-07 Thread cutting
Author: cutting Date: Tue Feb 7 13:02:46 2006 New Revision: 375704 URL: http://svn.apache.org/viewcvs?rev=375704&view=rev Log: Restoring jetty to Nutch lib: removed by mistake. Added: lucene/nutch/trunk/lib/jetty-5.1.4.LICENSE.txt - copied unchanged from r374759, lucene/hadoop/trunk

svn commit: r375333 - /lucene/nutch/nightly/nightly.properties

2006-02-06 Thread cutting
Author: cutting Date: Mon Feb 6 10:57:09 2006 New Revision: 375333 URL: http://svn.apache.org/viewcvs?rev=375333&view=rev Log: Updated email parameters. Modified: lucene/nutch/nightly/nightly.properties Modified: lucene/nutch/nightly/nightly.properties URL: http://svn.apache.org/viewcvs

svn commit: r372315 - /lucene/nutch/nightly/nightly.sh

2006-01-25 Thread cutting
Author: cutting Date: Wed Jan 25 13:12:13 2006 New Revision: 372315 URL: http://svn.apache.org/viewcvs?rev=372315&view=rev Log: Fix deletion of old versions. Modified: lucene/nutch/nightly/nightly.sh Modified: lucene/nutch/nightly/nightly.sh URL: http://svn.apache.org/viewcvs/lucene/nutch

svn commit: r372342 - /lucene/nutch/nightly/nightly.sh

2006-01-25 Thread cutting
Author: cutting Date: Wed Jan 25 14:20:06 2006 New Revision: 372342 URL: http://svn.apache.org/viewcvs?rev=372342&view=rev Log: Fix remove command. Modified: lucene/nutch/nightly/nightly.sh Modified: lucene/nutch/nightly/nightly.sh URL: http://svn.apache.org/viewcvs/lucene/nutch/nightly

svn commit: r370632 - /lucene/nutch/trunk/conf/nutch-default.xml

2006-01-19 Thread cutting
Author: cutting Date: Thu Jan 19 12:58:54 2006 New Revision: 370632 URL: http://svn.apache.org/viewcvs?rev=370632&view=rev Log: Switch default to protocol-http, since it seems more reliable than protocol-httpclient. Modified: lucene/nutch/trunk/conf/nutch-default.xml Modified: lucene/nutch

svn commit: r370638 - /lucene/nutch/trunk/conf/nutch-default.xml

2006-01-19 Thread cutting
Author: cutting Date: Thu Jan 19 13:24:58 2006 New Revision: 370638 URL: http://svn.apache.org/viewcvs?rev=370638&view=rev Log: Document a few more properties. Contributed by Dominik Friedrich. Modified: lucene/nutch/trunk/conf/nutch-default.xml Modified: lucene/nutch/trunk/conf/nutch

svn commit: r370657 - in /lucene/nutch/nightly: nightly.cron nightly.properties nightly.sh

2006-01-19 Thread cutting
Author: cutting Date: Thu Jan 19 14:46:28 2006 New Revision: 370657 URL: http://svn.apache.org/viewcvs?rev=370657&view=rev Log: Moving nightly build to lucene.zones.apache.org. Modified: lucene/nutch/nightly/nightly.cron lucene/nutch/nightly/nightly.properties lucene/nutch/nightly

svn commit: r370281 - /lucene/nutch/trunk/build.xml

2006-01-18 Thread cutting
Author: cutting Date: Wed Jan 18 14:03:28 2006 New Revision: 370281 URL: http://svn.apache.org/viewcvs?rev=370281&view=rev Log: Fix NUTCH-102: include webapps in packaged releases. Modified: lucene/nutch/trunk/build.xml Modified: lucene/nutch/trunk/build.xml URL: http://svn.apache.org

svn commit: r367406 - in /lucene/nutch/trunk/src: java/org/apache/nutch/ipc/RPC.java test/org/apache/nutch/ipc/TestRPC.java

2006-01-09 Thread cutting
Author: cutting Date: Mon Jan 9 13:50:48 2006 New Revision: 367406 URL: http://svn.apache.org/viewcvs?rev=367406&view=rev Log: Fix parallel RPC calls to work correctly with methods that return void. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/ipc/RPC.java lucene/nutch/trunk

svn commit: r367408 - /lucene/nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch/net/RegexURLFilter.java

2006-01-09 Thread cutting
Author: cutting Date: Mon Jan 9 13:55:31 2006 New Revision: 367408 URL: http://svn.apache.org/viewcvs?rev=367408&view=rev Log: NUTCH-160: Switch RegexURLFilter to use Java regexes rather than oro, since Java's seem to be faster and more reliable. By Rod Taylor. Modified: lucene/nutch/trunk
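For readers unfamiliar with this filter style, here is a minimal sketch of a prefix-signed regex URL filter built on java.util.regex. It follows the convention used by Nutch's regex-urlfilter.txt. Each rule is '+' (accept) or '-' (reject) followed by a pattern. The first rule whose pattern is found in the URL decides, and a URL that matches no rule is rejected. It also returns null to reject, which is the URLFilter convention. The class name SimpleRegexUrlFilter is hypothetical, and this is not the RegexURLFilter source from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a '+'/'-' regex URL filter using java.util.regex.
 * Not the actual RegexURLFilter code.
 */
public class SimpleRegexUrlFilter {
    private final List<Boolean> accept = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    SimpleRegexUrlFilter(String... rules) {
        for (String rule : rules) {
            String r = rule.trim();
            if (r.isEmpty() || r.startsWith("#")) continue; // skip blanks and comments
            char sign = r.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("bad rule: " + rule);
            }
            accept.add(sign == '+');
            patterns.add(Pattern.compile(r.substring(1))); // compile once, reuse per URL
        }
    }

    /** Returns the URL if accepted, or null if rejected; the first matching rule wins. */
    String filter(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return accept.get(i) ? url : null;
            }
        }
        return null; // no rule matched: reject
    }

    public static void main(String[] args) {
        SimpleRegexUrlFilter f = new SimpleRegexUrlFilter(
            "-\\.(gif|jpg)$",                         // skip images
            "+^http://([a-z0-9]*\\.)*apache\\.org/",  // stay on apache.org
            "-.");                                    // reject everything else
        System.out.println(f.filter("http://lucene.apache.org/nutch/"));
        System.out.println(f.filter("http://example.com/"));
    }
}
```

Rule order matters here. Because the image rule comes first, an apache.org URL ending in .gif is still rejected.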

svn commit: r366550 - /lucene/nutch/trunk/src/java/org/apache/nutch/ipc/Client.java

2006-01-06 Thread cutting
Author: cutting Date: Fri Jan 6 11:14:46 2006 New Revision: 366550 URL: http://svn.apache.org/viewcvs?rev=366550&view=rev Log: Make it clearer why this optimization is valid. For Stefan. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/ipc/Client.java Modified: lucene/nutch/trunk/src

svn commit: r366242 - in /lucene/nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java

2006-01-05 Thread cutting
Author: cutting Date: Thu Jan 5 10:38:44 2006 New Revision: 366242 URL: http://svn.apache.org/viewcvs?rev=366242&view=rev Log: Fix NegativeArraySizeException. Modified: lucene/nutch/trunk/conf/nutch-default.xml lucene/nutch/trunk/src/java/org/apache/nutch/searcher

svn commit: r366271 - /lucene/nutch/trunk/src/java/org/apache/nutch/mapred/TaskTracker.java

2006-01-05 Thread cutting
Author: cutting Date: Thu Jan 5 12:13:43 2006 New Revision: 366271 URL: http://svn.apache.org/viewcvs?rev=366271&view=rev Log: Fix for NUTCH-108: eliminate voluminous messages when reconnecting. From Paul Baclace. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/mapred/TaskTracker.java

svn commit: r366322 - /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java

2006-01-05 Thread cutting
Author: cutting Date: Thu Jan 5 14:37:19 2006 New Revision: 366322 URL: http://svn.apache.org/viewcvs?rev=366322&view=rev Log: Fix a bug in LimitedCollector. Modified: lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java Modified: lucene/nutch/trunk/src/java/org

svn commit: r357197 [5/5] - in /lucene/nutch: branches/mapred/ trunk/ trunk/bin/ trunk/conf/ trunk/lib/ trunk/lib/jetty-ext/ trunk/site/ trunk/src/java/org/apache/nutch/crawl/ trunk/src/java/org/apach

2005-12-16 Thread cutting
Modified: lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/RobotRulesParser.java URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/RobotRulesParser.java?rev=357197&r1=357196&r2=357197&view=diff

svn commit: r348210 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NFSDataInputStream.java

2005-11-22 Thread cutting
Author: cutting Date: Tue Nov 22 10:46:43 2005 New Revision: 348210 URL: http://svn.apache.org/viewcvs?rev=348210&view=rev Log: Silently ignore missing checksum files. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NFSDataInputStream.java Modified: lucene/nutch

svn commit: r348212 - in /lucene/nutch/branches/mapred/conf: crawl-tool.xml nutch-default.xml

2005-11-22 Thread cutting
Author: cutting Date: Tue Nov 22 10:55:26 2005 New Revision: 348212 URL: http://svn.apache.org/viewcvs?rev=348212&view=rev Log: Increase defaults for http.max.delays, since, with MapReduce's partitioning of fetchlists, delays are more likely. Modified: lucene/nutch/branches/mapred/conf/crawl

svn commit: r332371 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Generator.java

2005-11-10 Thread cutting
Author: cutting Date: Thu Nov 10 13:03:16 2005 New Revision: 332371 URL: http://svn.apache.org/viewcvs?rev=332371&view=rev Log: Fix to not increment count of urls when urls are filtered by maxPerHost limit. Patch contributed by Rod Taylor. Modified: lucene/nutch/branches/mapred/src/java/org

svn commit: r328414 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/ParseOutputFormat.java

2005-10-25 Thread cutting
Author: cutting Date: Tue Oct 25 09:57:51 2005 New Revision: 328414 URL: http://svn.apache.org/viewcvs?rev=328414&view=rev Log: Fix a type error for JDK 1.4. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/ParseOutputFormat.java Modified: lucene/nutch/branches/mapred

svn commit: r327572 - /lucene/nutch/branches/mapred/bin/slaves.sh

2005-10-21 Thread cutting
Author: cutting Date: Fri Oct 21 13:45:32 2005 New Revision: 327572 URL: http://svn.apache.org/viewcvs?rev=327572&view=rev Log: Tag standard error with hostname too. Modified: lucene/nutch/branches/mapred/bin/slaves.sh Modified: lucene/nutch/branches/mapred/bin/slaves.sh URL: http

svn commit: r327581 - in /lucene/nutch/branches/mapred/src/plugin/parse-html/src: java/org/apache/nutch/parse/html/DOMContentUtils.java test/org/apache/nutch/parse/html/TestDOMContentUtils.java

2005-10-21 Thread cutting
Author: cutting Date: Fri Oct 21 14:04:54 2005 New Revision: 327581 URL: http://svn.apache.org/viewcvs?rev=327581&view=rev Log: Ignore rel=nofollow links. Modified: lucene/nutch/branches/mapred/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java lucene/nutch
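The following is a minimal sketch of the check behind "ignore rel=nofollow links". In HTML, the rel attribute is a space-separated list of keywords that are compared case-insensitively. So "nofollow" should be matched as a whole token, not as a substring. The class and method names are hypothetical, and this is not the code added to DOMContentUtils.

```java
/**
 * Hypothetical sketch: does an anchor's rel attribute contain the
 * "nofollow" keyword? Not the actual DOMContentUtils implementation.
 */
public class NoFollow {
    static boolean isNoFollow(String rel) {
        if (rel == null) return false; // no rel attribute at all
        // rel is a whitespace-separated token list; compare whole tokens only
        for (String token : rel.trim().split("\\s+")) {
            if (token.equalsIgnoreCase("nofollow")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNoFollow("external NoFollow"));
        System.out.println(isNoFollow("nofollower"));
        System.out.println(isNoFollow(null));
    }
}
```

Matching whole tokens avoids false positives on unrelated values that merely contain the substring.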

svn commit: r327593 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java

2005-10-21 Thread cutting
Author: cutting Date: Fri Oct 21 15:07:00 2005 New Revision: 327593 URL: http://svn.apache.org/viewcvs?rev=327593&view=rev Log: Always create workdir so child can connect to it. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java Modified: lucene/nutch

svn commit: r326007 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Indexer.java

2005-10-17 Thread cutting
Author: cutting Date: Mon Oct 17 18:08:07 2005 New Revision: 326007 URL: http://svn.apache.org/viewcvs?rev=326007&view=rev Log: Fix bogus javadoc. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Indexer.java Modified: lucene/nutch/branches/mapred/src/java/org/apache

svn commit: r320835 - in /lucene/nutch/branches/mapred/src: java/org/apache/nutch/db/ java/org/apache/nutch/fs/ java/org/apache/nutch/indexer/ java/org/apache/nutch/io/ java/org/apache/nutch/mapred/ j

2005-10-13 Thread cutting
Author: cutting Date: Thu Oct 13 10:59:30 2005 New Revision: 320835 URL: http://svn.apache.org/viewcvs?rev=320835&view=rev Log: Store checksums for all files written and verify them on read. CRCs are stored for every 512 bytes of data, so that randomly accessed data may be verified. Errors
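The log message explains why one CRC is stored per 512 bytes rather than one per file. After a seek, only the chunk that contains the requested offset has to be re-read and checked. The sketch below shows that idea with java.util.zip.CRC32. The class name and in-memory layout are hypothetical, and it is not the NFSDataInputStream/NFSDataOutputStream code or its checksum file format.

```java
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of per-chunk checksums (one CRC32 per 512 bytes),
 * so any chunk can be verified independently after a random seek.
 */
public class ChunkedChecksum {
    static final int BYTES_PER_SUM = 512; // chunk size from the commit log

    /** Returns one CRC32 value per chunk; the last chunk may be short. */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_SUM;
            int len = Math.min(BYTES_PER_SUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Verifies only the chunk containing byte offset pos. */
    static boolean verifyAt(byte[] data, long[] sums, int pos) {
        int i = pos / BYTES_PER_SUM;
        int off = i * BYTES_PER_SUM;
        int len = Math.min(BYTES_PER_SUM, data.length - off);
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue() == sums[i];
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // three chunks: 512 + 512 + 276 bytes
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        long[] sums = checksums(data);
        data[700] ^= 1; // corrupt one bit in the second chunk
        System.out.println(sums.length + " " + verifyAt(data, sums, 100) + " " + verifyAt(data, sums, 700));
    }
}
```

Running main prints "3 true false". The untouched first chunk still verifies, and only the corrupted second chunk fails. Smaller chunks make random reads cheaper to verify, at the cost of storing more checksum bytes.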

svn commit: r320893 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/Seekable.java

2005-10-13 Thread cutting
Author: cutting Date: Thu Oct 13 12:42:21 2005 New Revision: 320893 URL: http://svn.apache.org/viewcvs?rev=320893&view=rev Log: Add new file. Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/Seekable.java Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs

svn commit: r320899 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/DeleteDuplicates.java

2005-10-13 Thread cutting
Author: cutting Date: Thu Oct 13 12:57:03 2005 New Revision: 320899 URL: http://svn.apache.org/viewcvs?rev=320899&view=rev Log: Fix progress reporting for dedup. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/DeleteDuplicates.java Modified: lucene/nutch/branches

svn commit: r320931 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NutchFileSystem.java

2005-10-13 Thread cutting
Author: cutting Date: Thu Oct 13 14:43:23 2005 New Revision: 320931 URL: http://svn.apache.org/viewcvs?rev=320931&view=rev Log: Fix a NullPointerException. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NutchFileSystem.java Modified: lucene/nutch/branches/mapred/src

svn commit: r314958 - in /lucene/nutch/trunk/site: about.html bot.html credits.html i18n.html index.html index.pdf issue_tracking.html linkmap.html mailing_lists.html tutorial.html version_control.htm

2005-10-12 Thread cutting
Author: cutting Date: Wed Oct 12 09:31:33 2005 New Revision: 314958 URL: http://svn.apache.org/viewcvs?rev=314958&view=rev Log: Use mirrors for downloads. Modified: lucene/nutch/trunk/site/about.html lucene/nutch/trunk/site/bot.html lucene/nutch/trunk/site/credits.html lucene

svn commit: r314991 - /lucene/nutch/nightly/nightly.sh

2005-10-12 Thread cutting
Author: cutting Date: Wed Oct 12 11:33:47 2005 New Revision: 314991 URL: http://svn.apache.org/viewcvs?rev=314991&view=rev Log: Put nightly releases on cvs.apache.org, not www, per Apache policy. Modified: lucene/nutch/nightly/nightly.sh Modified: lucene/nutch/nightly/nightly.sh URL: http

svn commit: r312693 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/MapFile.java

2005-10-10 Thread cutting
Author: cutting Date: Mon Oct 10 10:40:21 2005 New Revision: 312693 URL: http://svn.apache.org/viewcvs?rev=312693&view=rev Log: Fix to permit non-one-to-one mappings in index. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/MapFile.java Modified: lucene/nutch/branches

svn commit: r307445 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java

2005-10-09 Thread cutting
Author: cutting Date: Sun Oct 9 08:15:34 2005 New Revision: 307445 URL: http://svn.apache.org/viewcvs?rev=307445&view=rev Log: Overwrite should be default now. Use super's implementation. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java Modified

svn commit: r307203 - in /lucene/nutch/branches/mapred: bin/nutch src/java/org/apache/nutch/crawl/Crawl.java src/java/org/apache/nutch/crawl/DeleteDuplicates.java src/java/org/apache/nutch/indexer/NdfsDirectory.java

2005-10-07 Thread cutting
Author: cutting Date: Fri Oct 7 15:16:27 2005 New Revision: 307203 URL: http://svn.apache.org/viewcvs?rev=307203&view=rev Log: First working version of MapReduce-based dedup. Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/DeleteDuplicates.java Modified: lucene/nutch

svn commit: r306808 - /lucene/nutch/trunk/conf/parse-plugins.xml

2005-10-06 Thread cutting
Author: cutting Date: Thu Oct 6 10:02:03 2005 New Revision: 306808 URL: http://svn.apache.org/viewcvs?rev=306808&view=rev Log: Add parse-ext content-types so that unit tests pass. Modified: lucene/nutch/trunk/conf/parse-plugins.xml Modified: lucene/nutch/trunk/conf/parse-plugins.xml URL

svn commit: r306812 - /lucene/nutch/nightly/nightly.properties

2005-10-06 Thread cutting
Author: cutting Date: Thu Oct 6 10:18:01 2005 New Revision: 306812 URL: http://svn.apache.org/viewcvs?rev=306812&view=rev Log: Update mailhost, since I moved and have a different ISP at home. Modified: lucene/nutch/nightly/nightly.properties Modified: lucene/nutch/nightly/nightly.properties

svn commit: r306813 - /lucene/nutch/nightly/nightly.sh

2005-10-06 Thread cutting
Author: cutting Date: Thu Oct 6 10:18:46 2005 New Revision: 306813 URL: http://svn.apache.org/viewcvs?rev=306813&view=rev Log: Use /tmp/nutch-nightly instead of /tmp/nutch to avoid conflicts with mapred. Modified: lucene/nutch/nightly/nightly.sh Modified: lucene/nutch/nightly/nightly.sh URL

svn commit: r294928 - in /lucene/nutch/branches/mapred: site/tutorial.html site/tutorial.pdf src/site/src/documentation/content/xdocs/tutorial.xml

2005-10-04 Thread cutting
Author: cutting Date: Tue Oct 4 14:58:53 2005 New Revision: 294928 URL: http://svn.apache.org/viewcvs?rev=294928&view=rev Log: Update tutorial for mapred changes. Still does not describe mapred or NDFS configuration. Modified: lucene/nutch/branches/mapred/site/tutorial.html lucene

svn commit: r293404 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskTracker.java

2005-10-03 Thread cutting
Author: cutting Date: Mon Oct 3 10:33:32 2005 New Revision: 293404 URL: http://svn.apache.org/viewcvs?rev=293404&view=rev Log: Remove redundant call to done(), observed by Stefan. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskTracker.java Modified: lucene

svn commit: r292509 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java

2005-09-29 Thread cutting
Author: cutting Date: Thu Sep 29 11:57:35 2005 New Revision: 292509 URL: http://svn.apache.org/viewcvs?rev=292509&view=rev Log: Use a more reasonable value when timing out hung fetcher threads. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java Modified

svn commit: r292532 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/MRConstants.java

2005-09-29 Thread cutting
Author: cutting Date: Thu Sep 29 13:30:11 2005 New Revision: 292532 URL: http://svn.apache.org/viewcvs?rev=292532&view=rev Log: Increase timeout, as launching large jobs can sometimes cause the jobtracker to not see heartbeats for a bit. Modified: lucene/nutch/branches/mapred/src/java/org

svn commit: r292539 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch: fs/LocalFileSystem.java fs/NutchFileSystem.java ndfs/NDFSClient.java

2005-09-29 Thread cutting
Author: cutting Date: Thu Sep 29 13:43:53 2005 New Revision: 292539 URL: http://svn.apache.org/viewcvs?rev=292539&view=rev Log: Change so that default is to overwrite existing files, as this is normal under MapReduce, when tasks may be re-executed. Modified: lucene/nutch/branches/mapred/src

svn commit: r292556 - /lucene/nutch/branches/mapred/conf/nutch-default.xml

2005-09-29 Thread cutting
Author: cutting Date: Thu Sep 29 14:27:49 2005 New Revision: 292556 URL: http://svn.apache.org/viewcvs?rev=292556&view=rev Log: Document mapred.tasktracker.tasks.maximum and provide a default. Modified: lucene/nutch/branches/mapred/conf/nutch-default.xml Modified: lucene/nutch/branches

svn commit: r290602 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/ndfs/DF.java

2005-09-20 Thread cutting
Author: cutting Date: Tue Sep 20 19:38:56 2005 New Revision: 290602 URL: http://svn.apache.org/viewcvs?rev=290602&view=rev Log: Fix NUTCH-93: long filesystem names can wrap to a new line and were not parsed correctly. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/ndfs
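NUTCH-93 concerns `df` output in which a long filesystem name is printed on its own line, pushing the numeric columns onto the next line. The sketch below shows one way to make a parser robust to that. It skips the header, then treats the remaining lines as a single whitespace-separated record, so a wrap does not shift the columns. It assumes the common `df -k` column order (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on) and a mount point without spaces. The class name DfParse is hypothetical, and this is not the fix made in DF.java.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: parse single-filesystem `df -k` output whose data
 * row may wrap after a long filesystem name. Not the actual DF.java code.
 */
public class DfParse {
    final String filesystem, mount;
    final long capacityKb, usedKb, availableKb;

    DfParse(String output) {
        String[] lines = output.split("\n");
        // Drop the header, then join the rest so a wrapped row becomes one record.
        String body = String.join(" ", Arrays.copyOfRange(lines, 1, lines.length));
        String[] t = body.trim().split("\\s+");
        filesystem = t[0];
        capacityKb = Long.parseLong(t[1]);
        usedKb = Long.parseLong(t[2]);
        availableKb = Long.parseLong(t[3]);
        // t[4] is the Use% column, e.g. "37%"
        mount = t[5];
    }

    public static void main(String[] args) {
        String wrapped = "Filesystem           1K-blocks      Used Available Use% Mounted on\n"
            + "/dev/mapper/VolGroup00-LogVol00\n"
            + "                      10079084   3456784   6100312  37% /\n";
        DfParse p = new DfParse(wrapped);
        System.out.println(p.filesystem + " " + p.availableKb + " " + p.mount);
    }
}
```

The same constructor handles the unwrapped case. Joining the lines is a no-op when the whole data row is already on one line.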

svn commit: r290067 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch: mapred/InputFormatBase.java util/NutchConf.java

2005-09-19 Thread cutting
Author: cutting Date: Sun Sep 18 23:08:19 2005 New Revision: 290067 URL: http://svn.apache.org/viewcvs?rev=290067&view=rev Log: Improved error string javadoc. Contributed by Paul Baclace. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/InputFormatBase.java

svn commit: r289281 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred: LocalJobRunner.java MapTask.java ReduceTask.java Task.java

2005-09-15 Thread cutting
Author: cutting Date: Thu Sep 15 10:12:36 2005 New Revision: 289281 URL: http://svn.apache.org/viewcvs?rev=289281&view=rev Log: Improve status reports: Always send final status when done; Have LocalJobRunner log status. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred

svn commit: r289282 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java

2005-09-15 Thread cutting
Author: cutting Date: Thu Sep 15 10:15:16 2005 New Revision: 289282 URL: http://svn.apache.org/viewcvs?rev=289282&view=rev Log: Finish even when some threads hang. Improve status reports. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java Modified: lucene

svn commit: r289286 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java

2005-09-15 Thread cutting
Author: cutting Date: Thu Sep 15 11:11:39 2005 New Revision: 289286 URL: http://svn.apache.org/viewcvs?rev=289286&view=rev Log: Don't synchronize while making setStatus() RPC. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java Modified: lucene/nutch/branches

svn commit: r280911 - in /lucene/nutch/branches/mapred/bin: nutch-daemons.sh slaves.sh

2005-09-14 Thread cutting
Author: cutting Date: Wed Sep 14 12:04:07 2005 New Revision: 280911 URL: http://svn.apache.org/viewcvs?rev=280911view=rev Log: Change scripts to pass environment, so that shared home directory is not required. Modified: lucene/nutch/branches/mapred/bin/nutch-daemons.sh lucene/nutch

svn commit: r280912 - /lucene/nutch/branches/mapred/bin/stop-all.sh

2005-09-14 Thread cutting
Author: cutting Date: Wed Sep 14 12:04:41 2005 New Revision: 280912 URL: http://svn.apache.org/viewcvs?rev=280912view=rev Log: Stop jobtracker first, to stop tasks faster. Modified: lucene/nutch/branches/mapred/bin/stop-all.sh Modified: lucene/nutch/branches/mapred/bin/stop-all.sh URL

svn commit: r280913 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/ReduceTaskRunner.java

2005-09-14 Thread cutting
Author: cutting Date: Wed Sep 14 12:05:08 2005 New Revision: 280913 URL: http://svn.apache.org/viewcvs?rev=280913view=rev Log: Log the stack trace, so we can debug this one better. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/ReduceTaskRunner.java Modified

svn commit: r280368 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/TestClient.java

2005-09-12 Thread cutting
Author: cutting Date: Mon Sep 12 10:03:00 2005 New Revision: 280368 URL: http://svn.apache.org/viewcvs?rev=280368view=rev Log: Change so that -du and -ls commands work with zero arguments. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/TestClient.java Modified: lucene

svn commit: r280370 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java

2005-09-12 Thread cutting
Author: cutting Date: Mon Sep 12 10:04:33 2005 New Revision: 280370 URL: http://svn.apache.org/viewcvs?rev=280370view=rev Log: Fix to correctly convert empty path to home directory rather than root. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java

svn commit: r279945 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Crawl.java

2005-09-09 Thread cutting
Author: cutting Date: Fri Sep 9 21:29:27 2005 New Revision: 279945 URL: http://svn.apache.org/viewcvs?rev=279945view=rev Log: Fix a typo. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Crawl.java Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch

svn commit: r279596 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/JobTracker.java

2005-09-08 Thread cutting
Author: cutting Date: Thu Sep 8 11:09:28 2005 New Revision: 279596 URL: http://svn.apache.org/viewcvs?rev=279596view=rev Log: Fix so that input splitting errors don't leave job hung. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/JobTracker.java Modified: lucene

svn commit: r279397 - /lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java

2005-09-07 Thread cutting
Author: cutting Date: Wed Sep 7 11:42:11 2005 New Revision: 279397 URL: http://svn.apache.org/viewcvs?rev=279397view=rev Log: Add seek test. Modified: lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java Modified: lucene/nutch/branches/mapred/src/test/org

svn commit: r279417 - /lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java

2005-09-07 Thread cutting
Author: cutting Date: Wed Sep 7 13:34:00 2005 New Revision: 279417 URL: http://svn.apache.org/viewcvs?rev=279417view=rev Log: Run seek test as unit test; add -noseek command line option. Modified: lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java Modified

svn commit: r265762 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/searcher/FetchedSegments.java

2005-09-01 Thread cutting
Author: cutting Date: Thu Sep 1 11:35:15 2005 New Revision: 265762 URL: http://svn.apache.org/viewcvs?rev=265762view=rev Log: Use partitioner to get partition. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/searcher/FetchedSegments.java Modified: lucene/nutch/branches

svn commit: r265778 - in /lucene/nutch/branches/mapred/src: java/org/apache/nutch/crawl/ java/org/apache/nutch/mapred/ java/org/apache/nutch/searcher/ web/jsp/

2005-09-01 Thread cutting
Author: cutting Date: Thu Sep 1 14:03:51 2005 New Revision: 265778 URL: http://svn.apache.org/viewcvs?rev=265778view=rev Log: Fix anchor inlink access. Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/searcher/HitInlinks.java lucene/nutch/branches/mapred/src/java/org

svn commit: r264880 - /lucene/nutch/branches/mapred/bin/slaves.sh

2005-08-30 Thread cutting
Author: cutting Date: Tue Aug 30 15:18:55 2005 New Revision: 264880 URL: http://svn.apache.org/viewcvs?rev=264880view=rev Log: Always put a newline after host name. Modified: lucene/nutch/branches/mapred/bin/slaves.sh Modified: lucene/nutch/branches/mapred/bin/slaves.sh URL: http

svn commit: r264685 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred: InterTrackerProtocol.java JobTracker.java TaskTracker.java

2005-08-29 Thread cutting
Author: cutting Date: Mon Aug 29 20:08:46 2005 New Revision: 264685 URL: http://svn.apache.org/viewcvs?rev=264685view=rev Log: Synchronize things in TaskTracker.offerService() loop. Also remove boxing in the heartbeat RPC. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch

svn commit: r240279 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred: MapFileOutputFormat.java MapTask.java RecordWriter.java ReduceTask.java SequenceFileOutputFormat.java TaskTracker.java TextOutputFormat.java

2005-08-26 Thread cutting
Author: cutting Date: Fri Aug 26 09:37:55 2005 New Revision: 240279 URL: http://svn.apache.org/viewcvs?rev=240279view=rev Log: Always call done() on tasks, setting final progress to 1.0. Also permit RecordWriter.close() to emit progress reports to avoid task timeouts when closing is lengthy

svn commit: r240280 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/LinkDb.java

2005-08-26 Thread cutting
Author: cutting Date: Fri Aug 26 09:39:11 2005 New Revision: 240280 URL: http://svn.apache.org/viewcvs?rev=240280view=rev Log: Limit to 10,000 inlinks by default. Also optimize a common case. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/LinkDb.java Modified

svn commit: r240346 - /lucene/nutch/branches/mapred/conf/nutch-default.xml

2005-08-26 Thread cutting
Author: cutting Date: Fri Aug 26 14:21:08 2005 New Revision: 240346 URL: http://svn.apache.org/viewcvs?rev=240346view=rev Log: Fix a crazy default. This made indexing rather slow... Modified: lucene/nutch/branches/mapred/conf/nutch-default.xml Modified: lucene/nutch/branches/mapred/conf

svn commit: r235756 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java

2005-08-22 Thread cutting
Author: cutting Date: Mon Aug 22 10:08:17 2005 New Revision: 235756 URL: http://svn.apache.org/viewcvs?rev=235756view=rev Log: Always kill forked child so that it doesn't consume file handles. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java Modified

svn commit: r233569 - /lucene/nutch/branches/mapred/bin/nutch-daemon.sh

2005-08-19 Thread cutting
Author: cutting Date: Fri Aug 19 15:54:04 2005 New Revision: 233569 URL: http://svn.apache.org/viewcvs?rev=233569view=rev Log: Fix to sync whole tree. Modified: lucene/nutch/branches/mapred/bin/nutch-daemon.sh Modified: lucene/nutch/branches/mapred/bin/nutch-daemon.sh URL: http

svn commit: r233360 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/protocol/Content.java

2005-08-18 Thread cutting
Author: cutting Date: Thu Aug 18 12:19:05 2005 New Revision: 233360 URL: http://svn.apache.org/viewcvs?rev=233360view=rev Log: Fix a bug in equals(), where the other object may still be deflated. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/protocol/Content.java Modified

svn commit: r232841 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch: io/CompressedWritable.java protocol/Content.java

2005-08-15 Thread cutting
Author: cutting Date: Mon Aug 15 11:10:23 2005 New Revision: 232841 URL: http://svn.apache.org/viewcvs?rev=232841view=rev Log: Lazily decompress content. Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/CompressedWritable.java Modified: lucene/nutch/branches/mapred/src

svn commit: r225344 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/SequenceFile.java

2005-07-26 Thread cutting
Author: cutting Date: Tue Jul 26 09:40:00 2005 New Revision: 225344 URL: http://svn.apache.org/viewcvs?rev=225344view=rev Log: Fix bug with syncs in large merges. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/SequenceFile.java Modified: lucene/nutch/branches/mapred

svn commit: r219566 - in /lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred: TaskRunner.java TaskTracker.java

2005-07-18 Thread cutting
Author: cutting Date: Mon Jul 18 13:57:34 2005 New Revision: 219566 URL: http://svn.apache.org/viewcvs?rev=219566view=rev Log: Catch Throwable, not just Exception, and always log and report it to tracker. Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred

svn commit: r219563 - in /lucene/nutch/branches/mapred/conf: crawl-urlfilter.txt.template regex-urlfilter.txt.template

2005-07-18 Thread cutting
Author: cutting Date: Mon Jul 18 13:42:37 2005 New Revision: 219563 URL: http://svn.apache.org/viewcvs?rev=219563view=rev Log: Skip URLs with repeating segments. Modified: lucene/nutch/branches/mapred/conf/crawl-urlfilter.txt.template lucene/nutch/branches/mapred/conf/regex
