Added: lucene/nutch/logos/nutch_logo.eps
URL:
http://svn.apache.org/viewvc/lucene/nutch/logos/nutch_logo.eps?rev=798304&view=auto
==
Binary file - no diff available.
Propchange: lucene/nutch/logos/nutch_logo.eps
Author: cutting
Date: Thu Nov 16 13:03:26 2006
New Revision: 475926
URL: http://svn.apache.org/viewvc?view=rev&rev=475926
Log:
Update nightly build location.
Modified:
lucene/nutch/nightly/nightly.sh
Modified: lucene/nutch/nightly/nightly.sh
URL:
http://svn.apache.org/viewvc/lucene/nutch
Author: cutting
Date: Wed Jul 12 01:16:37 2006
New Revision: 421185
URL: http://svn.apache.org/viewvc?rev=421185&view=rev
Log:
Patch a bug introduced by Hadoop 0.4.0, which requires specified input
directories to exist.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java
Author: cutting
Date: Wed Jun 28 14:54:53 2006
New Revision: 417884
URL: http://svn.apache.org/viewvc?rev=417884&view=rev
Log:
NUTCH-312. Upgrade to Hadoop 0.4.0.
Added:
lucene/nutch/trunk/lib/commons-cli-2.0-SNAPSHOT.jar (with props)
lucene/nutch/trunk/lib/hadoop-0.4.0.jar
Author: cutting
Date: Fri Jun 9 14:48:23 2006
New Revision: 413175
URL: http://svn.apache.org/viewvc?rev=413175&view=rev
Log:
Upgrading to Hadoop 0.3.2 release.
Added:
lucene/nutch/trunk/lib/hadoop-0.3.2.jar (with props)
Removed:
lucene/nutch/trunk/lib/hadoop-0.3.1.jar
Added: lucene
Author: cutting
Date: Fri May 12 13:31:59 2006
New Revision: 405861
URL: http://svn.apache.org/viewcvs?rev=405861&view=rev
Log:
Upgrading to Hadoop 0.2.1.
Added:
lucene/nutch/trunk/lib/hadoop-0.2.1.jar (with props)
Removed:
lucene/nutch/trunk/lib/hadoop-0.2.0.jar
Added: lucene/nutch
Author: cutting
Date: Fri May 5 13:01:44 2006
New Revision: 400159
URL: http://svn.apache.org/viewcvs?rev=400159&view=rev
Log:
Ignore bin/rcc (from Hadoop).
Modified:
lucene/nutch/trunk/bin/ (props changed)
Propchange: lucene/nutch/trunk/bin
Author: cutting
Date: Fri May 5 15:44:04 2006
New Revision: 400199
URL: http://svn.apache.org/viewcvs?rev=400199&view=rev
Log:
Upgrading to Hadoop 0.2.0.
Added:
lucene/nutch/trunk/lib/hadoop-0.2.0.jar (with props)
Removed:
lucene/nutch/trunk/lib/hadoop-0.1.1.jar
Added: lucene/nutch
Author: cutting
Date: Mon Apr 17 14:40:58 2006
New Revision: 394781
URL: http://svn.apache.org/viewcvs?rev=394781&view=rev
Log:
Ignore more bin files.
Modified:
lucene/nutch/trunk/bin/ (props changed)
Propchange: lucene/nutch/trunk/bin
Author: cutting
Date: Fri Apr 7 16:48:10 2006
New Revision: 392458
URL: http://svn.apache.org/viewcvs?rev=392458&view=rev
Log:
Upgrading to Hadoop release 0.1.1.
Added:
lucene/nutch/trunk/lib/hadoop-0.1.1.jar (with props)
Removed:
lucene/nutch/trunk/lib/hadoop-0.1.0.jar
Added: lucene
Author: cutting
Date: Sat Apr 1 12:16:22 2006
New Revision: 390745
URL: http://svn.apache.org/viewcvs?rev=390745&view=rev
Log:
Update to Hadoop 0.1.0 release.
Added:
lucene/nutch/trunk/lib/hadoop-0.1.0.jar (with props)
Removed:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Added: lucene
Author: cutting
Date: Mon Mar 20 13:08:15 2006
New Revision: 387310
URL: http://svn.apache.org/viewcvs?rev=387310&view=rev
Log:
Upgrade to current Hadoop.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL:
http://svn.apache.org
Author: cutting
Date: Wed Mar 15 14:20:40 2006
New Revision: 386181
URL: http://svn.apache.org/viewcvs?rev=386181&view=rev
Log:
Updated link to jira.
Modified:
lucene/nutch/branches/branch-0.7/site/issue_tracking.html
lucene/nutch/branches/branch-0.7/site/issue_tracking.pdf
lucene
Author: cutting
Date: Mon Mar 6 14:54:20 2006
New Revision: 383698
URL: http://svn.apache.org/viewcvs?rev=383698&view=rev
Log:
Upgrade to latest version of Hadoop.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL:
http
Author: cutting
Date: Fri Mar 3 11:05:41 2006
New Revision: 382912
URL: http://svn.apache.org/viewcvs?rev=382912&view=rev
Log:
Undo unintentional changes made in r381751. Thanks, Jerome, for catching this!
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Crawl.java
lucene
Author: cutting
Date: Fri Mar 3 13:46:21 2006
New Revision: 382939
URL: http://svn.apache.org/viewcvs?rev=382939&view=rev
Log:
Upgrade hadoop to latest version with some important mapred bug fixes.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop
Author: cutting
Date: Thu Mar 2 12:59:09 2006
New Revision: 382512
URL: http://svn.apache.org/viewcvs?rev=382512&view=rev
Log:
Upgrade to Lucene 1.9.1.
Added:
lucene/nutch/trunk/lib/lucene-core-1.9.1.jar (with props)
lucene/nutch/trunk/lib/lucene-misc-1.9.1.jar (with props)
Removed
Author: cutting
Date: Thu Mar 2 15:59:24 2006
New Revision: 382573
URL: http://svn.apache.org/viewcvs?rev=382573&view=rev
Log:
Update to latest Hadoop code.
Modified:
lucene/nutch/trunk/conf/hadoop-env.sh.template
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk
Author: cutting
Date: Thu Mar 2 16:06:59 2006
New Revision: 382579
URL: http://svn.apache.org/viewcvs?rev=382579&view=rev
Log:
Disable speculative execution, since input format has side effects.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/indexer/DeleteDuplicates.java
Modified
Author: cutting
Date: Tue Feb 28 10:00:43 2006
New Revision: 381721
URL: http://svn.apache.org/viewcvs?rev=381721&view=rev
Log:
Upgrade lucene version to final release.
Added:
lucene/nutch/trunk/lib/lucene-core-1.9-final.jar (with props)
lucene/nutch/trunk/lib/lucene-misc-1.9-final.jar
Author: cutting
Date: Tue Feb 28 15:30:02 2006
New Revision: 381824
URL: http://svn.apache.org/viewcvs?rev=381824&view=rev
Log:
Updating hadoop jar. Includes fixes for Windows.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL:
http
Author: cutting
Date: Fri Feb 24 11:11:44 2006
New Revision: 380789
URL: http://svn.apache.org/viewcvs?rev=380789&view=rev
Log:
Fix to not use 'exec', but rather 'untar' and 'chmod' which are more portable.
Modified:
lucene/nutch/trunk/build.xml
Modified: lucene/nutch/trunk/build.xml
URL
Author: cutting
Date: Fri Feb 24 14:38:06 2006
New Revision: 380840
URL: http://svn.apache.org/viewcvs?rev=380840&view=rev
Log:
Update hadoop jar, to get recent fixes from that project.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Author: cutting
Date: Thu Feb 16 14:24:47 2006
New Revision: 378381
URL: http://svn.apache.org/viewcvs?rev=378381&view=rev
Log:
Fix to work with Forrest 0.7, where ext: links seem to no longer work
in tabs.xml.
Modified:
lucene/nutch/trunk/src/site/src/documentation/content/xdocs/tabs.xml
Author: cutting
Date: Wed Feb 15 09:56:54 2006
New Revision: 378044
URL: http://svn.apache.org/viewcvs?rev=378044&view=rev
Log:
Upgrade to latest version of Hadoop.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL:
http
Author: cutting
Date: Wed Feb 15 14:45:31 2006
New Revision: 378107
URL: http://svn.apache.org/viewcvs?rev=378107&view=rev
Log:
Fix Fetcher to disable speculative execution, to keep it polite. Also upgrade
to latest hadoop jar that supports this feature. Note that Hadoop's
environment
Author: cutting
Date: Fri Feb 10 11:44:47 2006
New Revision: 376815
URL: http://svn.apache.org/viewcvs?rev=376815&view=rev
Log:
Update Hadoop jar.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL:
http://svn.apache.org/viewcvs
Author: cutting
Date: Thu Feb 9 12:57:44 2006
New Revision: 376435
URL: http://svn.apache.org/viewcvs?rev=376435&view=rev
Log:
Updating to latest Hadoop jar, adding now-required close() methods to mapper
and reducer implementations.
Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
Author: cutting
Date: Thu Feb 9 15:20:28 2006
New Revision: 376485
URL: http://svn.apache.org/viewcvs?rev=376485&view=rev
Log:
Fix for NUTCH-209. Nutch now supplies all code to remote MapReduce daemons
through a job jar file. So Hadoop daemons no longer need to be restarted when
Nutch code
Author: cutting
Date: Wed Feb 8 13:25:30 2006
New Revision: 376072
URL: http://svn.apache.org/viewcvs?rev=376072&view=rev
Log:
Restore accidentally removed file defaults.
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
Modified: lucene/nutch/trunk/conf/nutch-default.xml
URL:
http
Author: cutting
Date: Tue Feb 7 13:02:46 2006
New Revision: 375704
URL: http://svn.apache.org/viewcvs?rev=375704&view=rev
Log:
Restoring jetty to Nutch lib: removed by mistake.
Added:
lucene/nutch/trunk/lib/jetty-5.1.4.LICENSE.txt
- copied unchanged from r374759,
lucene/hadoop/trunk
Author: cutting
Date: Mon Feb 6 10:57:09 2006
New Revision: 375333
URL: http://svn.apache.org/viewcvs?rev=375333&view=rev
Log:
Updated email parameters.
Modified:
lucene/nutch/nightly/nightly.properties
Modified: lucene/nutch/nightly/nightly.properties
URL:
http://svn.apache.org/viewcvs
Author: cutting
Date: Wed Jan 25 14:20:06 2006
New Revision: 372342
URL: http://svn.apache.org/viewcvs?rev=372342&view=rev
Log:
Fix remove command.
Modified:
lucene/nutch/nightly/nightly.sh
Modified: lucene/nutch/nightly/nightly.sh
URL:
http://svn.apache.org/viewcvs/lucene/nutch/nightly
Author: cutting
Date: Thu Jan 19 12:58:54 2006
New Revision: 370632
URL: http://svn.apache.org/viewcvs?rev=370632&view=rev
Log:
Switch default to protocol-http, since it seems more reliable than
protocol-httpclient.
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
Modified: lucene/nutch
Author: cutting
Date: Thu Jan 19 13:24:58 2006
New Revision: 370638
URL: http://svn.apache.org/viewcvs?rev=370638&view=rev
Log:
Document a few more properties. Contributed by Dominik Friedrich.
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
Modified: lucene/nutch/trunk/conf/nutch
Author: cutting
Date: Thu Jan 19 14:46:28 2006
New Revision: 370657
URL: http://svn.apache.org/viewcvs?rev=370657&view=rev
Log:
Moving nightly build to lucene.zones.apache.org.
Modified:
lucene/nutch/nightly/nightly.cron
lucene/nutch/nightly/nightly.properties
lucene/nutch/nightly
Author: cutting
Date: Wed Jan 18 14:03:28 2006
New Revision: 370281
URL: http://svn.apache.org/viewcvs?rev=370281&view=rev
Log:
Fix NUTCH-102: include webapps in packaged releases.
Modified:
lucene/nutch/trunk/build.xml
Modified: lucene/nutch/trunk/build.xml
URL:
http://svn.apache.org
Author: cutting
Date: Mon Jan 9 13:50:48 2006
New Revision: 367406
URL: http://svn.apache.org/viewcvs?rev=367406&view=rev
Log:
Fix parallel RPC calls to work correctly with methods that return void.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/ipc/RPC.java
lucene/nutch/trunk
Author: cutting
Date: Mon Jan 9 13:55:31 2006
New Revision: 367408
URL: http://svn.apache.org/viewcvs?rev=367408&view=rev
Log:
NUTCH-160: Switch RegexURLFilter to use Java regex's rather than oro, since
Java's seem to be faster and more reliable. By Rod Taylor.
Modified:
lucene/nutch/trunk
Author: cutting
Date: Fri Jan 6 11:14:46 2006
New Revision: 366550
URL: http://svn.apache.org/viewcvs?rev=366550&view=rev
Log:
Make it clearer why this optimization is valid. For Stefan.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/ipc/Client.java
Modified: lucene/nutch/trunk/src
Author: cutting
Date: Thu Jan 5 10:38:44 2006
New Revision: 366242
URL: http://svn.apache.org/viewcvs?rev=366242&view=rev
Log:
Fix NegativeArraySizeException.
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
lucene/nutch/trunk/src/java/org/apache/nutch/searcher
Author: cutting
Date: Thu Jan 5 12:13:43 2006
New Revision: 366271
URL: http://svn.apache.org/viewcvs?rev=366271&view=rev
Log:
Fix for NUTCH-108: eliminate voluminous messages when reconnecting.
From Paul Baclace.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/mapred/TaskTracker.java
Author: cutting
Date: Thu Jan 5 14:37:19 2006
New Revision: 366322
URL: http://svn.apache.org/viewcvs?rev=366322&view=rev
Log:
Fix a bug in LimitedCollector.
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/searcher/LuceneQueryOptimizer.java
Modified:
lucene/nutch/trunk/src/java/org
Modified:
lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/RobotRulesParser.java
URL:
http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/RobotRulesParser.java?rev=357197&r1=357196&r2=357197&view=diff
Author: cutting
Date: Tue Nov 22 10:46:43 2005
New Revision: 348210
URL: http://svn.apache.org/viewcvs?rev=348210&view=rev
Log:
Silently ignore missing checksum files.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NFSDataInputStream.java
Modified:
lucene/nutch
Author: cutting
Date: Tue Nov 22 10:55:26 2005
New Revision: 348212
URL: http://svn.apache.org/viewcvs?rev=348212&view=rev
Log:
Increase defaults for http.max.delays, since, with MapReduce's partitioning of
fetchlists, delays are more likely.
Modified:
lucene/nutch/branches/mapred/conf/crawl
Author: cutting
Date: Thu Nov 10 13:03:16 2005
New Revision: 332371
URL: http://svn.apache.org/viewcvs?rev=332371&view=rev
Log:
Fix to not increment count of urls when urls are filtered by
maxPerHost limit. Patch contributed by Rod Taylor.
Modified:
lucene/nutch/branches/mapred/src/java/org
Author: cutting
Date: Tue Oct 25 09:57:51 2005
New Revision: 328414
URL: http://svn.apache.org/viewcvs?rev=328414&view=rev
Log:
Fix a type error for JDK 1.4.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/ParseOutputFormat.java
Modified:
lucene/nutch/branches/mapred
Author: cutting
Date: Fri Oct 21 13:45:32 2005
New Revision: 327572
URL: http://svn.apache.org/viewcvs?rev=327572&view=rev
Log:
Tag standard error with hostname too.
Modified:
lucene/nutch/branches/mapred/bin/slaves.sh
Modified: lucene/nutch/branches/mapred/bin/slaves.sh
URL:
http
Author: cutting
Date: Fri Oct 21 14:04:54 2005
New Revision: 327581
URL: http://svn.apache.org/viewcvs?rev=327581&view=rev
Log:
Ignore rel=nofollow links.
Modified:
lucene/nutch/branches/mapred/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java
lucene/nutch
Author: cutting
Date: Fri Oct 21 15:07:00 2005
New Revision: 327593
URL: http://svn.apache.org/viewcvs?rev=327593&view=rev
Log:
Always create workdir so child can connect to it.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java
Modified:
lucene/nutch
Author: cutting
Date: Mon Oct 17 18:08:07 2005
New Revision: 326007
URL: http://svn.apache.org/viewcvs?rev=326007&view=rev
Log:
Fix bogus javadoc.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Indexer.java
Modified:
lucene/nutch/branches/mapred/src/java/org/apache
Author: cutting
Date: Thu Oct 13 10:59:30 2005
New Revision: 320835
URL: http://svn.apache.org/viewcvs?rev=320835&view=rev
Log:
Store checksums for all files written and verify them on read. CRCs are stored
for every 512 bytes of data, so that randomly accessed data may be verified.
Errors
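
The per-chunk checksum idea in the log message above can be sketched roughly
as follows. This is a minimal illustration only, not the code from this
commit: the class name ChunkedChecksum, its method names, and the choice of
java.util.zip.CRC32 are assumptions made for the example. Only the 512-byte
chunk size comes from the message.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not the Nutch implementation.
class ChunkedChecksum {
    // Chunk size taken from the log message ("every 512 bytes of data").
    static final int BYTES_PER_SUM = 512;

    // Compute one CRC32 value per 512-byte chunk; the last chunk may be shorter.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_SUM - 1) / BYTES_PER_SUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_SUM;
            int len = Math.min(BYTES_PER_SUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verify only the chunk that contains the given byte position, which is
    // what makes verification possible after a random-access seek.
    static boolean verifyAt(byte[] data, long[] sums, int position) {
        int i = position / BYTES_PER_SUM;
        int off = i * BYTES_PER_SUM;
        int len = Math.min(BYTES_PER_SUM, data.length - off);
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue() == sums[i];
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        Arrays.fill(data, (byte) 7);
        long[] sums = checksums(data);
        System.out.println(sums.length + " checksums stored");
        System.out.println("chunk at 600 valid: " + verifyAt(data, sums, 600));
        data[600] ^= 1; // simulate corruption of a single bit
        System.out.println("chunk at 600 valid: " + verifyAt(data, sums, 600));
        System.out.println("chunk at 100 valid: " + verifyAt(data, sums, 100));
    }
}
```

Because each checksum covers a fixed-size chunk, a reader that seeks to an
arbitrary offset only needs to re-read and check the one chunk containing
that offset, rather than the whole file.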
Author: cutting
Date: Thu Oct 13 12:42:21 2005
New Revision: 320893
URL: http://svn.apache.org/viewcvs?rev=320893&view=rev
Log:
Add new file.
Added:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/Seekable.java
Added: lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs
Author: cutting
Date: Thu Oct 13 12:57:03 2005
New Revision: 320899
URL: http://svn.apache.org/viewcvs?rev=320899&view=rev
Log:
Fix progress reporting for dedup.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/DeleteDuplicates.java
Modified:
lucene/nutch/branches
Author: cutting
Date: Thu Oct 13 14:43:23 2005
New Revision: 320931
URL: http://svn.apache.org/viewcvs?rev=320931&view=rev
Log:
Fix a NullPointerException.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NutchFileSystem.java
Modified:
lucene/nutch/branches/mapred/src
Author: cutting
Date: Wed Oct 12 09:31:33 2005
New Revision: 314958
URL: http://svn.apache.org/viewcvs?rev=314958&view=rev
Log:
Use mirrors for downloads.
Modified:
lucene/nutch/trunk/site/about.html
lucene/nutch/trunk/site/bot.html
lucene/nutch/trunk/site/credits.html
lucene
Author: cutting
Date: Wed Oct 12 11:33:47 2005
New Revision: 314991
URL: http://svn.apache.org/viewcvs?rev=314991&view=rev
Log:
Put nightly releases on cvs.apache.org, not www, per Apache policy.
Modified:
lucene/nutch/nightly/nightly.sh
Modified: lucene/nutch/nightly/nightly.sh
URL:
http
Author: cutting
Date: Mon Oct 10 10:40:21 2005
New Revision: 312693
URL: http://svn.apache.org/viewcvs?rev=312693&view=rev
Log:
Fix to permit non one-to-one mappings in index.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/MapFile.java
Modified: lucene/nutch/branches
Author: cutting
Date: Sun Oct 9 08:15:34 2005
New Revision: 307445
URL: http://svn.apache.org/viewcvs?rev=307445&view=rev
Log:
Overwrite should be default now. Use super's implementation.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java
Modified
Author: cutting
Date: Fri Oct 7 15:16:27 2005
New Revision: 307203
URL: http://svn.apache.org/viewcvs?rev=307203&view=rev
Log:
First working version of MapReduce-based dedup.
Added:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/DeleteDuplicates.java
Modified:
lucene/nutch
Author: cutting
Date: Thu Oct 6 10:02:03 2005
New Revision: 306808
URL: http://svn.apache.org/viewcvs?rev=306808&view=rev
Log:
Add parse-ext content-types so that unit tests pass.
Modified:
lucene/nutch/trunk/conf/parse-plugins.xml
Modified: lucene/nutch/trunk/conf/parse-plugins.xml
URL
Author: cutting
Date: Thu Oct 6 10:18:01 2005
New Revision: 306812
URL: http://svn.apache.org/viewcvs?rev=306812&view=rev
Log:
Update mailhost, since I moved and have a different ISP at home.
Modified:
lucene/nutch/nightly/nightly.properties
Modified: lucene/nutch/nightly/nightly.properties
Author: cutting
Date: Thu Oct 6 10:18:46 2005
New Revision: 306813
URL: http://svn.apache.org/viewcvs?rev=306813&view=rev
Log:
Use /tmp/nutch-nightly instead of /tmp/nutch to avoid conflicts with mapred.
Modified:
lucene/nutch/nightly/nightly.sh
Modified: lucene/nutch/nightly/nightly.sh
URL
Author: cutting
Date: Tue Oct 4 14:58:53 2005
New Revision: 294928
URL: http://svn.apache.org/viewcvs?rev=294928&view=rev
Log:
Update tutorial for mapred changes. Still does not describe mapred or NDFS
configuration.
Modified:
lucene/nutch/branches/mapred/site/tutorial.html
lucene
Author: cutting
Date: Mon Oct 3 10:33:32 2005
New Revision: 293404
URL: http://svn.apache.org/viewcvs?rev=293404&view=rev
Log:
Remove redundant call to done(), observed by Stefan.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskTracker.java
Modified:
lucene
Author: cutting
Date: Thu Sep 29 11:57:35 2005
New Revision: 292509
URL: http://svn.apache.org/viewcvs?rev=292509&view=rev
Log:
Use a more reasonable value when timing out hung fetcher threads.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java
Modified
Author: cutting
Date: Thu Sep 29 13:30:11 2005
New Revision: 292532
URL: http://svn.apache.org/viewcvs?rev=292532&view=rev
Log:
Increase timeout, as launching large jobs can sometimes cause the jobtracker to
not see heartbeats for a bit.
Modified:
lucene/nutch/branches/mapred/src/java/org
Author: cutting
Date: Thu Sep 29 13:43:53 2005
New Revision: 292539
URL: http://svn.apache.org/viewcvs?rev=292539&view=rev
Log:
Change so that default is to overwrite existing files, as this is normal under
MapReduce, when tasks may be re-executed.
Modified:
lucene/nutch/branches/mapred/src
Author: cutting
Date: Thu Sep 29 14:27:49 2005
New Revision: 292556
URL: http://svn.apache.org/viewcvs?rev=292556&view=rev
Log:
Document mapred.tasktracker.tasks.maximum and provide a default.
Modified:
lucene/nutch/branches/mapred/conf/nutch-default.xml
Modified: lucene/nutch/branches
Author: cutting
Date: Tue Sep 20 19:38:56 2005
New Revision: 290602
URL: http://svn.apache.org/viewcvs?rev=290602&view=rev
Log:
Fix NUTCH-93: long filesystem names can wrap to a new line and were
not parsed correctly.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/ndfs
Author: cutting
Date: Sun Sep 18 23:08:19 2005
New Revision: 290067
URL: http://svn.apache.org/viewcvs?rev=290067&view=rev
Log:
Improved error string javadoc. Contributed by Paul Baclace.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/InputFormatBase.java
Author: cutting
Date: Thu Sep 15 10:12:36 2005
New Revision: 289281
URL: http://svn.apache.org/viewcvs?rev=289281&view=rev
Log:
Improve status reports: Always send final status when done; Have LocalJobRunner
log status.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred
Author: cutting
Date: Thu Sep 15 10:15:16 2005
New Revision: 289282
URL: http://svn.apache.org/viewcvs?rev=289282&view=rev
Log:
Finish even when some threads hung. Improve status reports.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java
Modified:
lucene
Author: cutting
Date: Thu Sep 15 11:11:39 2005
New Revision: 289286
URL: http://svn.apache.org/viewcvs?rev=289286&view=rev
Log:
Don't synchronize while making setStatus() RPC.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java
Modified:
lucene/nutch/branches
Author: cutting
Date: Wed Sep 14 12:04:07 2005
New Revision: 280911
URL: http://svn.apache.org/viewcvs?rev=280911&view=rev
Log:
Change scripts to pass environment, so that shared home directory is not
required.
Modified:
lucene/nutch/branches/mapred/bin/nutch-daemons.sh
lucene/nutch
Author: cutting
Date: Wed Sep 14 12:04:41 2005
New Revision: 280912
URL: http://svn.apache.org/viewcvs?rev=280912&view=rev
Log:
Stop jobtracker first, to stop tasks faster.
Modified:
lucene/nutch/branches/mapred/bin/stop-all.sh
Modified: lucene/nutch/branches/mapred/bin/stop-all.sh
URL
Author: cutting
Date: Wed Sep 14 12:05:08 2005
New Revision: 280913
URL: http://svn.apache.org/viewcvs?rev=280913&view=rev
Log:
Log the stack trace, so we can debug this one better.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/ReduceTaskRunner.java
Modified
Author: cutting
Date: Mon Sep 12 10:03:00 2005
New Revision: 280368
URL: http://svn.apache.org/viewcvs?rev=280368&view=rev
Log:
Change so that -du and -ls commands work with zero arguments.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/TestClient.java
Modified:
lucene
Author: cutting
Date: Mon Sep 12 10:04:33 2005
New Revision: 280370
URL: http://svn.apache.org/viewcvs?rev=280370&view=rev
Log:
Fix to correctly convert empty path to home directory rather than root.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/fs/NDFSFileSystem.java
Author: cutting
Date: Thu Sep 8 11:09:28 2005
New Revision: 279596
URL: http://svn.apache.org/viewcvs?rev=279596&view=rev
Log:
Fix so that input splitting errors don't leave job hung.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/JobTracker.java
Modified:
lucene
Author: cutting
Date: Wed Sep 7 11:42:11 2005
New Revision: 279397
URL: http://svn.apache.org/viewcvs?rev=279397&view=rev
Log:
Add seek test.
Modified:
lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java
Modified:
lucene/nutch/branches/mapred/src/test/org
Author: cutting
Date: Wed Sep 7 13:34:00 2005
New Revision: 279417
URL: http://svn.apache.org/viewcvs?rev=279417&view=rev
Log:
Run seek test as unit test; add -noseek command line option.
Modified:
lucene/nutch/branches/mapred/src/test/org/apache/nutch/fs/TestNutchFileSystem.java
Modified
Author: cutting
Date: Thu Sep 1 11:35:15 2005
New Revision: 265762
URL: http://svn.apache.org/viewcvs?rev=265762&view=rev
Log:
Use partitioner to get partition.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/searcher/FetchedSegments.java
Modified:
lucene/nutch/branches
Author: cutting
Date: Thu Sep 1 14:03:51 2005
New Revision: 265778
URL: http://svn.apache.org/viewcvs?rev=265778&view=rev
Log:
Fix anchor inlink access.
Added:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/searcher/HitInlinks.java
lucene/nutch/branches/mapred/src/java/org
Author: cutting
Date: Tue Aug 30 15:18:55 2005
New Revision: 264880
URL: http://svn.apache.org/viewcvs?rev=264880&view=rev
Log:
Always put a newline after host name.
Modified:
lucene/nutch/branches/mapred/bin/slaves.sh
Modified: lucene/nutch/branches/mapred/bin/slaves.sh
URL:
http
Author: cutting
Date: Mon Aug 29 20:08:46 2005
New Revision: 264685
URL: http://svn.apache.org/viewcvs?rev=264685&view=rev
Log:
Synchronize things in TaskTracker.offerService() loop. Also remove boxing in
the heartbeat RPC.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch
Author: cutting
Date: Fri Aug 26 09:37:55 2005
New Revision: 240279
URL: http://svn.apache.org/viewcvs?rev=240279&view=rev
Log:
Always call done() on tasks, setting final progress to 1.0. Also permit
RecordWriter.close() to emit progress reports to avoid task timeouts when
closing is lengthy
Author: cutting
Date: Fri Aug 26 09:39:11 2005
New Revision: 240280
URL: http://svn.apache.org/viewcvs?rev=240280&view=rev
Log:
Limit to 10,000 inlinks by default. Also optimize a common case.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/LinkDb.java
Modified
Author: cutting
Date: Fri Aug 26 14:21:08 2005
New Revision: 240346
URL: http://svn.apache.org/viewcvs?rev=240346&view=rev
Log:
Fix a crazy default. This made indexing rather slow...
Modified:
lucene/nutch/branches/mapred/conf/nutch-default.xml
Modified: lucene/nutch/branches/mapred/conf
Author: cutting
Date: Mon Aug 22 10:08:17 2005
New Revision: 235756
URL: http://svn.apache.org/viewcvs?rev=235756&view=rev
Log:
Always kill forked child so that it doesn't consume file handles.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred/TaskRunner.java
Modified
Author: cutting
Date: Fri Aug 19 15:54:04 2005
New Revision: 233569
URL: http://svn.apache.org/viewcvs?rev=233569&view=rev
Log:
Fix to sync whole tree.
Modified:
lucene/nutch/branches/mapred/bin/nutch-daemon.sh
Modified: lucene/nutch/branches/mapred/bin/nutch-daemon.sh
URL:
http
Author: cutting
Date: Thu Aug 18 12:19:05 2005
New Revision: 233360
URL: http://svn.apache.org/viewcvs?rev=233360&view=rev
Log:
Fix a bug in equals(), where the other object may still be deflated.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/protocol/Content.java
Modified
Author: cutting
Date: Mon Aug 15 11:10:23 2005
New Revision: 232841
URL: http://svn.apache.org/viewcvs?rev=232841&view=rev
Log:
Lazily decompress content.
Added:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/CompressedWritable.java
Modified:
lucene/nutch/branches/mapred/src
Author: cutting
Date: Tue Jul 26 09:40:00 2005
New Revision: 225344
URL: http://svn.apache.org/viewcvs?rev=225344&view=rev
Log:
Fix bug with syncs in large merges.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/io/SequenceFile.java
Modified:
lucene/nutch/branches/mapred
Author: cutting
Date: Mon Jul 18 13:57:34 2005
New Revision: 219566
URL: http://svn.apache.org/viewcvs?rev=219566&view=rev
Log:
Catch Throwable, not just Exception, and always log and report it to tracker.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/mapred
Author: cutting
Date: Mon Jul 18 13:42:37 2005
New Revision: 219563
URL: http://svn.apache.org/viewcvs?rev=219563&view=rev
Log:
Skip URLs with repeating segments.
Modified:
lucene/nutch/branches/mapred/conf/crawl-urlfilter.txt.template
lucene/nutch/branches/mapred/conf/regex
Author: cutting
Date: Mon Jul 11 13:05:28 2005
New Revision: 210201
URL: http://svn.apache.org/viewcvs?rev=210201&view=rev
Log:
Store indexes in indexes directory. Use correct FS to list segments.
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Crawl.java
Modified
Author: cutting
Date: Mon Jul 11 14:30:22 2005
New Revision: 213607
URL: http://svn.apache.org/viewcvs?rev=213607&view=rev
Log:
Get search working on NDFS-resident, MapReduce-created crawl.
Modified:
lucene/nutch/branches/mapred/build.xml
lucene/nutch/branches/mapred/conf/nutch
Author: cutting
Date: Sun Jul 10 14:20:46 2005
New Revision: 210036
URL: http://svn.apache.org/viewcvs?rev=210036&view=rev
Log:
Actually use the new InputFormat!
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/crawl/Fetcher.java
Modified:
lucene/nutch/branches/mapred/src