svn commit: r894716 - in /lucene/nutch/trunk: site/credits.html site/credits.pdf src/site/src/documentation/content/xdocs/credits.xml

2009-12-30 Thread jnioche
Author: jnioche Date: Wed Dec 30 21:34:28 2009 New Revision: 894716 URL: http://svn.apache.org/viewvc?rev=894716&view=rev Log: Adding J. Nioche to the list of committers Modified: lucene/nutch/trunk/site/credits.html lucene/nutch/trunk/site/credits.pdf lucene/nutch/trunk/src/site/src

svn commit: r895972 - in /lucene/nutch/trunk: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java src/java/org/apache/nutch/parse/ParseSegment.java src/java/org/apache/nutch/protocol/ProtocolSt

2010-01-05 Thread jnioche
Author: jnioche Date: Tue Jan 5 10:14:49 2010 New Revision: 895972 URL: http://svn.apache.org/viewvc?rev=895972&view=rev Log: NUTCH-658 : Add Counter for # of doc fetched in Reporter Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/src/java/org/apache/nutch/fetcher

svn commit: r896539 - in /lucene/nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Injector.java

2010-01-06 Thread jnioche
Author: jnioche Date: Wed Jan 6 17:01:51 2010 New Revision: 896539 URL: http://svn.apache.org/viewvc?rev=896539&view=rev Log: NUTCH-655 : Injecting Crawl metadata (jnioche) Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java Modified

svn commit: r897180 - in /lucene/nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/CrawlDbReducer.java

2010-01-08 Thread jnioche
Author: jnioche Date: Fri Jan 8 12:01:46 2010 New Revision: 897180 URL: http://svn.apache.org/viewvc?rev=897180&view=rev Log: NUTCH-269 : OOME because no upper-bound on inlinks count (stack + jnioche) Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/conf/nutch-default.xml

svn commit: r897825 - in /lucene/nutch/trunk/src: java/org/apache/nutch/util/MimeUtil.java test/org/apache/nutch/protocol/TestContent.java

2010-01-11 Thread jnioche
Author: jnioche Date: Mon Jan 11 10:13:21 2010 New Revision: 897825 URL: http://svn.apache.org/viewvc?rev=897825&view=rev Log: fix for NUTCH-767 : reverted original expected values for test + treat text/plain as a default mime-type from Tika Modified: lucene/nutch/trunk/src/java/org/apache

svn commit: r905228 - in /lucene/nutch/trunk/lib: tika-core-0.5.jar tika-core-0.6.jar

2010-02-01 Thread jnioche
Author: jnioche Date: Mon Feb 1 09:59:50 2010 New Revision: 905228 URL: http://svn.apache.org/viewvc?rev=905228&view=rev Log: NUTCH-781: upgrade tika to version 0.6 Added: lucene/nutch/trunk/lib/tika-core-0.6.jar (with props) Removed: lucene/nutch/trunk/lib/tika-core-0.5.jar Added

svn commit: r905229 - /lucene/nutch/trunk/CHANGES.txt

2010-02-01 Thread jnioche
Author: jnioche Date: Mon Feb 1 10:03:07 2010 New Revision: 905229 URL: http://svn.apache.org/viewvc?rev=905229&view=rev Log: NUTCH-781: upgrade tika to version 0.6 Modified: lucene/nutch/trunk/CHANGES.txt Modified: lucene/nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/lucene

svn commit: r905550 [1/2] - /lucene/nutch/trunk/conf/tika-mimetypes.xml

2010-02-02 Thread jnioche
Author: jnioche Date: Tue Feb 2 09:31:19 2010 New Revision: 905550 URL: http://svn.apache.org/viewvc?rev=905550&view=rev Log: NUTCH-781 : updated tika-mimetypes.xml Modified: lucene/nutch/trunk/conf/tika-mimetypes.xml

svn commit: r906907 - in /lucene/nutch/trunk: CHANGES.txt conf/domain-suffixes.xml

2010-02-05 Thread jnioche
Author: jnioche Date: Fri Feb 5 11:52:57 2010 New Revision: 906907 URL: http://svn.apache.org/viewvc?rev=906907&view=rev Log: NUTCH-786 Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/conf/domain-suffixes.xml Modified: lucene/nutch/trunk/CHANGES.txt URL: http

svn commit: r910454 - in /lucene/nutch/trunk/src/plugin/languageidentifier/src: java/org/apache/nutch/analysis/lang/HTMLLanguageParser.java test/org/apache/nutch/analysis/lang/TestHTMLLanguageParser.j

2010-02-16 Thread jnioche
Author: jnioche Date: Tue Feb 16 10:20:22 2010 New Revision: 910454 URL: http://svn.apache.org/viewvc?rev=910454&view=rev Log: NUTCH-794 : Language Identification must use check the parse metadata for language values Modified: lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org

svn commit: r917557 - in /lucene/nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/parse/HtmlParseFilters.java

2010-03-01 Thread jnioche
Author: jnioche Date: Mon Mar 1 15:08:05 2010 New Revision: 917557 URL: http://svn.apache.org/viewvc?rev=917557&view=rev Log: NUTCH-782: Ability to order htmlparsefilters Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/conf/nutch-default.xml lucene/nutch/trunk/src/java

svn commit: r919358 - in /lucene/nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrIndexer.java src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2010-03-05 Thread jnioche
Author: jnioche Date: Fri Mar 5 10:09:08 2010 New Revision: 919358 URL: http://svn.apache.org/viewvc?rev=919358&view=rev Log: NUTCH-799 SOLRIndexer to commit once all reducers have finished Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/src/java/org/apache/nutch/indexer/solr

svn commit: r921831 - in /lucene/nutch/trunk: ./ lib/

2010-03-11 Thread jnioche
Author: jnioche Date: Thu Mar 11 13:06:12 2010 New Revision: 921831 URL: http://svn.apache.org/viewvc?rev=921831&view=rev Log: NUTCH-798 : Upgrade to SOLR1.4 and its dependencies Added: lucene/nutch/trunk/lib/apache-solr-core-1.4.0.jar (with props) lucene/nutch/trunk/lib/apache-solr

svn commit: r921840 - in /lucene/nutch/trunk: CHANGES.txt conf/parse-plugins.xml src/plugin/build.xml src/plugin/parse-mp3/ src/plugin/parse-rtf/

2010-03-11 Thread jnioche
Author: jnioche Date: Thu Mar 11 13:25:44 2010 New Revision: 921840 URL: http://svn.apache.org/viewvc?rev=921840&view=rev Log: NUTCH-801 Remove RTF and MP3 parse plugins Removed: lucene/nutch/trunk/src/plugin/parse-mp3/ lucene/nutch/trunk/src/plugin/parse-rtf/ Modified: lucene/nutch

svn commit: r926003 - in /lucene/nutch/trunk: ./ conf/ src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/ src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/ src/plugin/pro

2010-03-22 Thread jnioche
Author: jnioche Date: Mon Mar 22 09:00:11 2010 New Revision: 926003 URL: http://svn.apache.org/viewvc?rev=926003&view=rev Log: NUTCH-740 Configuration option to override default language for fetched pages Modified: lucene/nutch/trunk/CHANGES.txt lucene/nutch/trunk/conf/nutch-default.xml

svn commit: r926155 - in /lucene/nutch/trunk: ./ conf/ src/java/org/apache/nutch/crawl/ src/java/org/apache/nutch/net/ src/java/org/apache/nutch/tools/ src/test/org/apache/nutch/crawl/ src/test/org/ap

2010-03-22 Thread jnioche
Author: jnioche Date: Mon Mar 22 16:19:12 2010 New Revision: 926155 URL: http://svn.apache.org/viewvc?rev=926155&view=rev Log: NUTCH-762 : Generator can generate several segments in one parse of the crawlDB Added: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/URLPartitioner.java Removed

svn commit: r926163 - /lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java

2010-03-22 Thread jnioche
Author: jnioche Date: Mon Mar 22 16:29:30 2010 New Revision: 926163 URL: http://svn.apache.org/viewvc?rev=926163&view=rev Log: fixed NPE introduced in NUTCH-762 Modified: lucene/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java Modified: lucene/nutch/trunk/src/java/org/apache/nutch

svn commit: r931098 - in /lucene/nutch/trunk: ./ conf/ lib/ src/plugin/ src/plugin/parse-tika/ src/plugin/parse-tika/lib/ src/plugin/parse-tika/src/java/org/apache/nutch/parse/tika/

2010-04-06 Thread jnioche
Author: jnioche Date: Tue Apr 6 11:38:26 2010 New Revision: 931098 URL: http://svn.apache.org/viewvc?rev=931098&view=rev Log: NUTCH-810 Upgraded to Tika 0.7 Added: lucene/nutch/trunk/lib/tika-core-0.7.jar (with props) lucene/nutch/trunk/src/plugin/parse-tika/lib/bcmail-jdk15-1.45.jar