svn commit: r1496025 - in /nutch/trunk: ./ src/plugin/ src/plugin/urlfilter-prefix/src/test/ src/plugin/urlfilter-prefix/src/test/org/ src/plugin/urlfilter-prefix/src/test/org/apache/ src/plugin/urlfi

2013-06-24 Thread markus
Author: markus Date: Mon Jun 24 13:14:14 2013 New Revision: 1496025 URL: http://svn.apache.org/r1496025 Log: NUTCH-1126 JUnit test for urlfilter-prefix Added: nutch/trunk/src/plugin/urlfilter-prefix/src/test/ nutch/trunk/src/plugin/urlfilter-prefix/src/test/org/ nutch/trunk/src

svn commit: r1494893 - /nutch/trunk/ivy/ivy.xml

2013-06-20 Thread markus
Author: markus Date: Thu Jun 20 09:06:34 2013 New Revision: 1494893 URL: http://svn.apache.org/r1494893 Log: NUTCH-1527 ES dep in ivy missing Modified: nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/ivy/ivy.xml URL: http://svn.apache.org/viewvc/nutch/trunk/ivy/ivy.xml?rev=1494893&r1

svn commit: r1494894 - /nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

2013-06-20 Thread markus
Author: markus Date: Thu Jun 20 09:07:12 2013 New Revision: 1494894 URL: http://svn.apache.org/r1494894 Log: NUTCH-1583 Headings plugin to support multivalued headings Modified: nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java Modified

svn commit: r1494496 - in /nutch/trunk: ./ conf/ src/plugin/ src/plugin/indexer-elastic/ src/plugin/indexer-elastic/src/ src/plugin/indexer-elastic/src/java/ src/plugin/indexer-elastic/src/java/org/ s

2013-06-19 Thread markus
Author: markus Date: Wed Jun 19 08:31:28 2013 New Revision: 1494496 URL: http://svn.apache.org/r1494496 Log: NUTCH-1527 Elasticsearch indexer Added: nutch/trunk/src/plugin/indexer-elastic/ nutch/trunk/src/plugin/indexer-elastic/build.xml nutch/trunk/src/plugin/indexer-elastic/ivy.xml

svn commit: r1492639 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java src/java/org/apache/nutch/tools/FreeGenerator.java

2013-06-13 Thread markus
Author: markus Date: Thu Jun 13 12:10:37 2013 New Revision: 1492639 URL: http://svn.apache.org/r1492639 Log: NUTCH-1430 Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1488879 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2013-06-03 Thread markus
Author: markus Date: Mon Jun 3 08:02:35 2013 New Revision: 1488879 URL: http://svn.apache.org/r1488879 Log: NUTCH-1578 Upgrade to Hadoop 1.2.0 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk

svn commit: r1433900 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/NutchField.java

2013-01-16 Thread markus
Author: markus Date: Wed Jan 16 11:10:09 2013 New Revision: 1433900 URL: http://svn.apache.org/viewvc?rev=1433900&view=rev Log: Implement read/write in NutchField Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/NutchField.java Modified: nutch/trunk

svn commit: r1406077 - in /nutch/branches/2.x: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2012-11-06 Thread markus
Author: markus Date: Tue Nov 6 09:17:38 2012 New Revision: 1406077 URL: http://svn.apache.org/viewvc?rev=1406077&view=rev Log: NUTCH-1491 Strip UTF-8 non-character codepoints in title Modified: nutch/branches/2.x/CHANGES.txt nutch/branches/2.x/src/java/org/apache/nutch/indexer/solr

svn commit: r1401225 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDb.java

2012-10-23 Thread markus
Author: markus Date: Tue Oct 23 09:45:29 2012 New Revision: 1401225 URL: http://svn.apache.org/viewvc?rev=1401225&view=rev Log: NUTCH-1215 UpdateDB should not require segment as input Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java Modified

svn commit: r1363793 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java src/java/org/apache/nutch/crawl/Injector.java src/java/org/apache/nutch/metadata/Nutch.ja

2012-07-20 Thread markus
Author: markus Date: Fri Jul 20 14:22:19 2012 New Revision: 1363793 URL: http://svn.apache.org/viewvc?rev=1363793&view=rev Log: NUTCH-1388 Optionally maintain custom fetch interval despite AdaptiveFetchSchedule Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1353582 - in /nutch/trunk: CHANGES.txt src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java

2012-06-25 Thread markus
Author: markus Date: Mon Jun 25 14:42:05 2012 New Revision: 1353582 URL: http://svn.apache.org/viewvc?rev=1353582&view=rev Log: NUTCH-1408 RobotRulesParser main doesn't take URL's Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http

svn commit: r1353118 - /nutch/trunk/src/plugin/build.xml

2012-06-23 Thread markus
Author: markus Date: Sat Jun 23 12:15:04 2012 New Revision: 1353118 URL: http://svn.apache.org/viewvc?rev=1353118&view=rev Log: NUTCH-1319 HostNormalizer was not properly added to plugins/build.xml Modified: nutch/trunk/src/plugin/build.xml Modified: nutch/trunk/src/plugin/build.xml URL

svn commit: r1349236 - in /nutch/trunk: ./ conf/ src/plugin/urlnormalizer-host/ src/plugin/urlnormalizer-host/data/ src/plugin/urlnormalizer-host/src/ src/plugin/urlnormalizer-host/src/java/ src/plugi

2012-06-12 Thread markus
Author: markus Date: Tue Jun 12 10:33:18 2012 New Revision: 1349236 URL: http://svn.apache.org/viewvc?rev=1349236&view=rev Log: NUTCH-1319 HostNormalizer plugin Added: nutch/trunk/conf/host-urlnormalizer.txt nutch/trunk/src/plugin/urlnormalizer-host/ nutch/trunk/src/plugin

svn commit: r1348764 - in /nutch/trunk: ./ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/net/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/scoring/

2012-06-11 Thread markus
Author: markus Date: Mon Jun 11 09:28:14 2012 New Revision: 1348764 URL: http://svn.apache.org/viewvc?rev=1348764&view=rev Log: NUTCH-1385 More robust plug-in order properties in nutch-site.xml Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer

svn commit: r1348766 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2012-06-11 Thread markus
Author: markus Date: Mon Jun 11 09:30:26 2012 New Revision: 1348766 URL: http://svn.apache.org/viewvc?rev=1348766&view=rev Log: NUTCH-1384 Typo in ParseSegments's run-method Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java Modified: nutch

svn commit: r1347897 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2012-06-08 Thread markus
Author: markus Date: Fri Jun 8 07:03:38 2012 New Revision: 1347897 URL: http://svn.apache.org/viewvc?rev=1347897&view=rev Log: NUTCH-1346 Follow outlinks to ignore external Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/nutch

svn commit: r1347909 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/indexer/IndexerMapReduce.java

2012-06-08 Thread markus
Author: markus Date: Fri Jun 8 07:37:42 2012 New Revision: 1347909 URL: http://svn.apache.org/viewvc?rev=1347909&view=rev Log: NUTCH-1336 Optionally not index db_notmodified pages Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache

svn commit: r1347747 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/URLUtil.java src/java/org/apache/nutch/util/domain/DomainStatistics.java

2012-06-07 Thread markus
Author: markus Date: Thu Jun 7 18:21:39 2012 New Revision: 1347747 URL: http://svn.apache.org/viewvc?rev=1347747&view=rev Log: NUTCH-1351 DomainStatistics to aggregate by TLD Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/URLUtil.java nutch/trunk/src

svn commit: r1347755 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java src/java/org/apache/nutch/parse/ParserChecker.java src/java/org/apache/nutch/util/URLU

2012-06-07 Thread markus
Author: markus Date: Thu Jun 7 18:48:58 2012 New Revision: 1347755 URL: http://svn.apache.org/viewvc?rev=1347755&view=rev Log: NUTCH-1320 IndexChecker and ParseChecker choke on IDN's Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer

svn commit: r1306440 - /nutch/trunk/src/plugin/parse-tika/ivy.xml

2012-03-28 Thread markus
Author: markus Date: Wed Mar 28 15:48:23 2012 New Revision: 1306440 URL: http://svn.apache.org/viewvc?rev=1306440&view=rev Log: Upgrade to Tika 1.1 Modified: nutch/trunk/src/plugin/parse-tika/ivy.xml Modified: nutch/trunk/src/plugin/parse-tika/ivy.xml URL: http://svn.apache.org/viewvc/nutch

svn commit: r1305355 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2012-03-26 Thread markus
Author: markus Date: Mon Mar 26 13:44:57 2012 New Revision: 1305355 URL: http://svn.apache.org/viewvc?rev=1305355&view=rev Log: NUTCH-1234 Upgrade to Tika 1.1 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc

svn commit: r1305381 - /nutch/trunk/src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/DomainURLFilter.java

2012-03-26 Thread markus
Author: markus Date: Mon Mar 26 14:51:02 2012 New Revision: 1305381 URL: http://svn.apache.org/viewvc?rev=1305381&view=rev Log: NUTCH-1305 DomainFilter missing Modified: nutch/trunk/src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/DomainURLFilter.java Modified: nutch

svn commit: r1298394 - /nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.java

2012-03-08 Thread markus
Author: markus Date: Thu Mar 8 13:52:54 2012 New Revision: 1298394 URL: http://svn.apache.org/viewvc?rev=1298394&view=rev Log: NUTCH-1305 Domain(blacklist)URLFilter to trim entries Modified: nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist

svn commit: r1231168 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2012-01-13 Thread markus
Author: markus Date: Fri Jan 13 16:43:42 2012 New Revision: 1231168 URL: http://svn.apache.org/viewvc?rev=1231168&view=rev Log: NUTCH-1248 Generator to select on status Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java Modified: nutch/trunk

svn commit: r1226406 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/NutchWritable.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:11:50 2012 New Revision: 1226406 URL: http://svn.apache.org/viewvc?rev=1226406&view=rev Log: NUTCH-1239 Webgraph should remove deleted pages from segment input Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1226409 - in /nutch/trunk: CHANGES.txt conf/schema-solr4.xml conf/schema.xml conf/solrindex-mapping.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:16:59 2012 New Revision: 1226409 URL: http://svn.apache.org/viewvc?rev=1226409&view=rev Log: NUTCH-1232 Remove site field from index-basic Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema-solr4.xml nutch/trunk/conf/schema.xml nutch/trunk

svn commit: r1225544 - /nutch/trunk/CHANGES.txt

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:35:31 2011 New Revision: 1225544 URL: http://svn.apache.org/viewvc?rev=1225544&view=rev Log: NUTCH-1238 Missed changes.txt Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev

svn commit: r1225543 - in /nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:32:50 2011 New Revision: 1225543 URL: http://svn.apache.org/viewvc?rev=1225543&view=rev Log: NUTCH-1238 Fetcher throughput threshold must start before feeder finished Modified: nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/nutch

svn commit: r1224912 - /nutch/trunk/ivy/ivy.xml

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 14:08:22 2011 New Revision: 1224912 URL: http://svn.apache.org/viewvc?rev=1224912&view=rev Log: NUTCH-1235, added Jackson ASL mapper as dep Modified: nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/ivy/ivy.xml URL: http://svn.apache.org/viewvc/nutch/trunk/ivy

svn commit: r1200834 - in /nutch/branches/nutchgora: CHANGES.txt conf/nutch-default.xml

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 12:00:20 2011 New Revision: 1200834 URL: http://svn.apache.org/viewvc?rev=1200834&view=rev Log: NUTCH-1185 Decrease solr.commit.size to 250 Modified: nutch/branches/nutchgora/CHANGES.txt nutch/branches/nutchgora/conf/nutch-default.xml Modified: nutch

svn commit: r1200912 - /nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:01:05 2011 New Revision: 1200912 URL: http://svn.apache.org/viewvc?rev=1200912&view=rev Log: NUTCH-1155 Fixes failing test Modified: nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java Modified: nutch/trunk/src/test/org/apache/nutch/crawl

svn commit: r1178376 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDb.java

2011-10-03 Thread markus
Author: markus Date: Mon Oct 3 10:57:33 2011 New Revision: 1178376 URL: http://svn.apache.org/viewvc?rev=1178376&view=rev Log: NUTCH-1137 LinkDB other options ignored with -dir Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java Modified: nutch

svn commit: r1174689 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2011-09-23 Thread markus
Author: markus Date: Fri Sep 23 12:09:35 2011 New Revision: 1174689 URL: http://svn.apache.org/viewvc?rev=1174689&view=rev Log: NUTCH-1074 topN is ignored with maxNumSegments and generate.max.count Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org

svn commit: r1174147 - in /nutch/branches/branch-1.4: conf/nutch-default.xml src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java src/plugin/parse-tika/src/java/org/apache/n

2011-09-22 Thread markus
Author: markus Date: Thu Sep 22 14:02:51 2011 New Revision: 1174147 URL: http://svn.apache.org/viewvc?rev=1174147&view=rev Log: NUTCH-1115 Option to disable fixing of URL embedded parameters in DomContentUtils Modified: nutch/branches/branch-1.4/conf/nutch-default.xml nutch/branches

svn commit: r1172585 - /nutch/branches/branch-1.4/conf/nutch-default.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 12:12:17 2011 New Revision: 1172585 URL: http://svn.apache.org/viewvc?rev=1172585&view=rev Log: NUTCH-1067 Nutch-default configuration directives missing Modified: nutch/branches/branch-1.4/conf/nutch-default.xml Modified: nutch/branches/branch-1.4/conf/nutch

svn commit: r1172637 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlfilter-domain/plugin.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 14:14:05 2011 New Revision: 1172637 URL: http://svn.apache.org/viewvc?rev=1172637&view=rev Log: NUTCH-1114 Attr file missing in domain filter Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/urlfilter-domain/plugin.xml

svn commit: r1170526 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 10:59:24 2011 New Revision: 1170526 URL: http://svn.apache.org/viewvc?rev=1170526&view=rev Log: NUTCH-1067 Configure minimum throughput for fetcher and NUTCH-1102 Fetcher to rely on fetcher.parse directive Modified: nutch/branches/branch-1.4/CHANGES.txt

svn commit: r1170557 - in /nutch/branches/branch-1.4/src: java/org/apache/nutch/crawl/Crawl.java java/org/apache/nutch/tools/Benchmark.java test/org/apache/nutch/fetcher/TestFetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 12:13:42 2011 New Revision: 1170557 URL: http://svn.apache.org/viewvc?rev=1170557&view=rev Log: NUTCH-1067,NUTCH-1102 Fixes for Benchmark, Crawl and TestFetcher Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl/Crawl.java nutch/branches

svn commit: r1170282 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDb.java

2011-09-13 Thread markus
Author: markus Date: Tue Sep 13 18:15:17 2011 New Revision: 1170282 URL: http://svn.apache.org/viewvc?rev=1170282&view=rev Log: NUTCH-1110 UpdateDB must not write _success file Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1169707 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-09-12 Thread markus
Author: markus Date: Mon Sep 12 12:16:53 2011 New Revision: 1169707 URL: http://svn.apache.org/viewvc?rev=1169707&view=rev Log: NUTCH-1105 Max content length option for index-basic Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml nutch

svn commit: r1167096 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/CrawlDb.java src/java/org/apache/nutch/crawl/CrawlDbFilter.java

2011-09-09 Thread markus
Author: markus Date: Fri Sep 9 11:13:54 2011 New Revision: 1167096 URL: http://svn.apache.org/viewvc?rev=1167096&view=rev Log: NUTCH-1101 Option to purge db_gone records from CrawlDB Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml

svn commit: r1159207 - in /nutch/branches/branch-1.4: CHANGES.txt src/bin/nutch

2011-08-18 Thread markus
Author: markus Date: Thu Aug 18 13:25:13 2011 New Revision: 1159207 URL: http://svn.apache.org/viewvc?rev=1159207&view=rev Log: NUTCH-1049 Add classes to bin/nutch script Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/bin/nutch Modified: nutch/branches

svn commit: r1158215 - in /nutch/trunk: CHANGES.txt src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 11:59:01 2011 New Revision: 1158215 URL: http://svn.apache.org/viewvc?rev=1158215&view=rev Log: NUTCH-1004 Do not index empty values for title field Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer

svn commit: r1158218 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 12:03:30 2011 New Revision: 1158218 URL: http://svn.apache.org/viewvc?rev=1158218&view=rev Log: NUTCH-1082 IndexingFiltersChecker does not list multi valued fields Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org

svn commit: r1158357 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/NodeDumper.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 16:28:43 2011 New Revision: 1158357 URL: http://svn.apache.org/viewvc?rev=1158357&view=rev Log: NUTCH-1051 Export WebGraph node scores for Solr.ExternalFileField Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache

svn commit: r1156665 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDbMerger.java

2011-08-11 Thread markus
Author: markus Date: Thu Aug 11 16:38:58 2011 New Revision: 1156665 URL: http://svn.apache.org/viewvc?rev=1156665&view=rev Log: NUTCH-1069 Readlinkdb broken on Hadoop 0.20 Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1148301 - in /nutch/branches/branch-1.4: conf/log4j.properties src/java/org/apache/nutch/scoring/webgraph/LinkRank.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 12:49:58 2011 New Revision: 1148301 URL: http://svn.apache.org/viewvc?rev=1148301&view=rev Log: NUTCH-1050 Add segmentDir to WebGraph Modified: nutch/branches/branch-1.4/conf/log4j.properties nutch/branches/branch-1.4/src/java/org/apache/nutch/scoring

svn commit: r1148308 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 13:12:43 2011 New Revision: 1148308 URL: http://svn.apache.org/viewvc?rev=1148308&view=rev Log: NUTCH-1037 Option to deduplicate anchors prior to indexing Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/plugin/index

svn commit: r1147615 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2011-07-17 Thread markus
Author: markus Date: Sun Jul 17 14:01:51 2011 New Revision: 1147615 URL: http://svn.apache.org/viewvc?rev=1147615&view=rev Log: NUTCH-1029 ReadDB throws EOFException Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1146035 - in /nutch/branches/branch-1.4: ./ conf/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/indexer/solr/

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 13:59:11 2011 New Revision: 1146035 URL: http://svn.apache.org/viewvc?rev=1146035&view=rev Log: NUTCH-987, NUTCH-1036 Solr HTTP auth support and Hadoop reporter counter increments Added: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr

svn commit: r1146043 - /nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 14:05:47 2011 New Revision: 1146043 URL: http://svn.apache.org/viewvc?rev=1146043&view=rev Log: NUTCH-987 Constants for HTTP auth for Solr Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java Modified: nutch/branches

svn commit: r1143467 - in /nutch/branches/branch-1.4: CHANGES.txt conf/regex-normalize.xml.template src/test/org/apache/nutch/net/TestURLNormalizers.java

2011-07-06 Thread markus
Author: markus Date: Wed Jul 6 15:34:43 2011 New Revision: 1143467 URL: http://svn.apache.org/viewvc?rev=1143467&view=rev Log: NUTCH-1011 Remove duplicate slashes from URLs Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/regex-normalize.xml.template

svn commit: r1142664 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 13:44:57 2011 New Revision: 1142664 URL: http://svn.apache.org/viewvc?rev=1142664&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin

svn commit: r1142687 - in /nutch/trunk: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 14:28:17 2011 New Revision: 1142687 URL: http://svn.apache.org/viewvc?rev=1142687&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/urlnormalizer-regex/src/java

svn commit: r1141500 - in /nutch/branches/branch-1.4: CHANGES.txt conf/log4j.properties src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-06-30 Thread markus
Author: markus Date: Thu Jun 30 12:13:26 2011 New Revision: 1141500 URL: http://svn.apache.org/viewvc?rev=1141500&view=rev Log: NUTCH-1016 Strip UTF-8 non-character codepoints Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/log4j.properties nutch

svn commit: r1140696 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/EncodingDetector.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 16:03:28 2011 New Revision: 1140696 URL: http://svn.apache.org/viewvc?rev=1140696&view=rev Log: NUTCH-1012 Cannot handle illegal charset Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/EncodingDetector.java Modified: nutch

svn commit: r1140117 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml

2011-06-27 Thread markus
Author: markus Date: Mon Jun 27 11:41:22 2011 New Revision: 1140117 URL: http://svn.apache.org/viewvc?rev=1140117&view=rev Log: NUTCH-295 Description for fetcher.threads.fetch property Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml Modified: nutch/trunk/CHANGES.txt

svn commit: r1139331 - in /nutch/trunk: CHANGES.txt src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 14:38:44 2011 New Revision: 1139331 URL: http://svn.apache.org/viewvc?rev=1139331&view=rev Log: NUTCH-1006 MetaEquiv with single quotes not accepted Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html

svn commit: r1126417 - in /nutch/branches/branch-1.3: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:17:14 2011 New Revision: 1126417 URL: http://svn.apache.org/viewvc?rev=1126417&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/conf/schema.xml Modified: nutch/branches/branch-1.3

svn commit: r1126425 - in /nutch/trunk: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:48:59 2011 New Revision: 1126425 URL: http://svn.apache.org/viewvc?rev=1126425&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org

svn commit: r1101279 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:44:42 2011 New Revision: 1101279 URL: http://svn.apache.org/viewvc?rev=1101279&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java Modified

svn commit: r1101280 - in /nutch/branches/branch-1.3: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:46:04 2011 New Revision: 1101280 URL: http://svn.apache.org/viewvc?rev=1101280&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/src/java/org/apache/nutch/indexer/solr

svn commit: r1092084 - in /nutch/branches/branch-1.3: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/indexer/solr/SolrConstants.java

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 09:59:11 2011 New Revision: 1092084 URL: http://svn.apache.org/viewvc?rev=1092084&view=rev Log: NUTCH-976 Rename properties solrindex.* to solr.* Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/conf/nutch-default.xml nutch

svn commit: r1092090 - /nutch/trunk/CHANGES.txt

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 10:05:47 2011 New Revision: 1092090 URL: http://svn.apache.org/viewvc?rev=1092090&view=rev Log: NUTCH-977 SolrMappingReader uses hardcoded configuration parameter name for mapping file Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL

svn commit: r1092091 - in /nutch/branches/branch-1.3: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrMappingReader.java

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 10:06:06 2011 New Revision: 1092091 URL: http://svn.apache.org/viewvc?rev=1092091&view=rev Log: NUTCH-977 SolrMappingReader uses hardcoded configuration parameter name for mapping file Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch

svn commit: r1091895 - in /nutch/trunk: CHANGES.txt conf/solrindex-mapping.xml

2011-04-13 Thread markus
Author: markus Date: Wed Apr 13 19:34:53 2011 New Revision: 1091895 URL: http://svn.apache.org/viewvc?rev=1091895&view=rev Log: NUTCH-982 Remove copying of ID and URL field in solrmapping Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/solrindex-mapping.xml Modified: nutch/trunk

svn commit: r1089866 - /nutch/branches/branch-1.3/src/bin/nutch

2011-04-07 Thread markus
Author: markus Date: Thu Apr 7 13:08:05 2011 New Revision: 1089866 URL: http://svn.apache.org/viewvc?rev=1089866&view=rev Log: NUTCH-975 Fix missing and wrong headers in source files (src/bin/nutch) Modified: nutch/branches/branch-1.3/src/bin/nutch Modified: nutch/branches/branch-1.3/src

svn commit: r1082943 - in /nutch/branches/branch-1.3: CHANGES.txt conf/log4j.properties src/bin/nutch

2011-03-18 Thread markus
Author: markus Date: Fri Mar 18 15:05:34 2011 New Revision: 1082943 URL: http://svn.apache.org/viewvc?rev=1082943&view=rev Log: NUTCH-963 Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (Claudio Martella, markus) Modified: nutch/branches/branch-1.3/CHANGES.txt

svn commit: r1037742 - in /nutch/branches/branch-1.3/src/plugin/urlnormalizer-basic/src: java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java test/org/apache/nutch/net/urlnormalizer/b

2010-11-22 Thread markus
Author: markus Date: Mon Nov 22 14:56:40 2010 New Revision: 1037742 URL: http://svn.apache.org/viewvc?rev=1037742&view=rev Log: NUTCH-935 - remove unnecessary /./ in basic urlnormalizer (via Stondubleyt) Modified: nutch/branches/branch-1.3/src/plugin/urlnormalizer-basic/src/java/org/apache
