svn commit: r1099802 - in /nutch/trunk: CHANGES.txt conf/schema.xml

2011-05-05 Thread markus
Author: markus Date: Thu May 5 13:48:56 2011 New Revision: 1099802 URL: http://svn.apache.org/viewvc?rev=1099802&view=rev Log: NUTCH-989 Index-basic plugin and Solr schema now use date fieldType for tstamp field Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema.xml Modi

svn commit: r1101279 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:44:42 2011 New Revision: 1101279 URL: http://svn.apache.org/viewvc?rev=1101279&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java Modi

svn commit: r1101280 - in /nutch/branches/branch-1.3: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:46:04 2011 New Revision: 1101280 URL: http://svn.apache.org/viewvc?rev=1101280&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/src/java/org/apache/nutch/indexer/
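The NUTCH-996 log line names only the symptom: each batch sent to Solr held `solr.commit.size` + 1 documents instead of `solr.commit.size`. The sketch below is a hypothetical illustration of that kind of off-by-one, not Nutch's `SolrWriter`. The class name `BatchingWriter` and its in-memory `sentBatches` list are assumptions that stand in for real Solr requests.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batching writer illustrating NUTCH-996; not the real SolrWriter. */
final class BatchingWriter {
  private final int commitSize;
  private final List<String> buffer = new ArrayList<>();
  // Stand-in for Solr requests: records every batch that would have been sent.
  final List<List<String>> sentBatches = new ArrayList<>();

  BatchingWriter(int commitSize) {
    if (commitSize < 1) {
      throw new IllegalArgumentException("commitSize must be >= 1");
    }
    this.commitSize = commitSize;
  }

  void write(String doc) {
    buffer.add(doc);
    // Flush as soon as the buffer holds exactly commitSize documents.
    // Testing "buffer.size() > commitSize" here instead would send
    // commitSize + 1 documents per batch.
    if (buffer.size() >= commitSize) {
      flush();
    }
  }

  /** Sends whatever is buffered, e.g. the final partial batch. */
  void flush() {
    if (!buffer.isEmpty()) {
      sentBatches.add(new ArrayList<>(buffer));
      buffer.clear();
    }
  }
}
```

With a commit size of 3, writing seven documents and then calling `flush()` yields batches of 3, 3 and 1.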

svn commit: r1126417 - in /nutch/branches/branch-1.3: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:17:14 2011 New Revision: 1126417 URL: http://svn.apache.org/viewvc?rev=1126417&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/conf/schema.xml Modified: nutch/branches/branch

svn commit: r1126425 - in /nutch/trunk: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:48:59 2011 New Revision: 1126425 URL: http://svn.apache.org/viewvc?rev=1126425&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache

svn commit: r1139307 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 13:56:27 2011 New Revision: 1139307 URL: http://svn.apache.org/viewvc?rev=1139307&view=rev Log: NUTCH-1010 ContentLength not trimmed Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/index-more/src/java/org/apache/n

svn commit: r1139308 - in /nutch/trunk: CHANGES.txt src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 13:56:42 2011 New Revision: 1139308 URL: http://svn.apache.org/viewvc?rev=1139308&view=rev Log: NUTCH-1010 ContentLength not trimmed Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/

svn commit: r1139329 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 14:37:57 2011 New Revision: 1139329 URL: http://svn.apache.org/viewvc?rev=1139329&view=rev Log: NUTCH-1006 MetaEquiv with single quotes not accepted Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/parse-html/src/

svn commit: r1139331 - in /nutch/trunk: CHANGES.txt src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 14:38:44 2011 New Revision: 1139331 URL: http://svn.apache.org/viewvc?rev=1139331&view=rev Log: NUTCH-1006 MetaEquiv with single quotes not accepted Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/
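NUTCH-1006 concerns `<meta http-equiv=...>` declarations written with single quotes. The commit does not show the pattern that `HtmlParser` actually uses. As an illustration only, a regex can accept double quotes, single quotes or no quotes by capturing the opening quote and requiring the same character after the value through a backreference. `MetaCharsetSniffer` and its patterns are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sniffer for the charset declared in a meta http-equiv tag. */
final class MetaCharsetSniffer {
  // Group 1 captures the opening quote (double, single or none);
  // the backreference \1 requires the same quote to close the value.
  private static final Pattern META = Pattern.compile(
      "<meta\\s+[^>]*http-equiv\\s*=\\s*([\"']?)content-type\\1[^>]*>",
      Pattern.CASE_INSENSITIVE);
  private static final Pattern CHARSET = Pattern.compile(
      "charset\\s*=\\s*([a-z][_\\-0-9a-z]*)", Pattern.CASE_INSENSITIVE);

  /** Returns the declared charset name, or null if none is found. */
  static String sniff(String html) {
    Matcher meta = META.matcher(html);
    if (!meta.find()) {
      return null;
    }
    // Look for charset=... only inside the matched meta tag.
    Matcher cs = CHARSET.matcher(meta.group());
    return cs.find() ? cs.group(1) : null;
  }
}
```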

svn commit: r1139357 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrClean.java src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java src/java/org/ap

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 15:35:12 2011 New Revision: 1139357 URL: http://svn.apache.org/viewvc?rev=1139357&view=rev Log: NUTCH-1000 Add option not to commit to Solr Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/

svn commit: r1140116 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml

2011-06-27 Thread markus
Author: markus Date: Mon Jun 27 11:40:53 2011 New Revision: 1140116 URL: http://svn.apache.org/viewvc?rev=1140116&view=rev Log: NUTCH-295 Description for fetcher.threads.fetch property Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default

svn commit: r1140117 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml

2011-06-27 Thread markus
Author: markus Date: Mon Jun 27 11:41:22 2011 New Revision: 1140117 URL: http://svn.apache.org/viewvc?rev=1140117&view=rev Log: NUTCH-295 Description for fetcher.threads.fetch property Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml Modified: nutch/trunk/CHANGES

svn commit: r1140619 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 13:54:21 2011 New Revision: 1140619 URL: http://svn.apache.org/viewvc?rev=1140619&view=rev Log: NUTCH-1022 Upgrade version number of Nutch agent in conf Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default

svn commit: r1140685 - in /nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr: SolrDeleteDuplicates.java SolrIndexer.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 15:26:20 2011 New Revision: 1140685 URL: http://svn.apache.org/viewvc?rev=1140685&view=rev Log: NUTCH-1000 Method overrides for indexer and dedup Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java n

svn commit: r1140695 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/util/EncodingDetector.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 15:59:47 2011 New Revision: 1140695 URL: http://svn.apache.org/viewvc?rev=1140695&view=rev Log: NUTCH-1012 Cannot handle illegal charset Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/

svn commit: r1140696 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/EncodingDetector.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 16:03:28 2011 New Revision: 1140696 URL: http://svn.apache.org/viewvc?rev=1140696&view=rev Log: NUTCH-1012 Cannot handle illegal charset Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/EncodingDetector.java Modified: n
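The JDK behaviour relevant to NUTCH-1012 is that `java.nio.charset.Charset.isSupported(String)` does not simply return `false` for a malformed name: it throws `IllegalCharsetNameException`. A minimal sketch of a defensive lookup follows. The `CharsetCheck.resolve` helper and its fallback parameter are hypothetical; this is not the `EncodingDetector` change from the commit.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

/** Hypothetical helper: maps a declared charset name to a usable one. */
final class CharsetCheck {
  /** Returns the canonical name of the declared charset, or fallback if it is missing, illegal or unsupported. */
  static String resolve(String declared, String fallback) {
    if (declared == null) {
      return fallback; // isSupported(null) would throw IllegalArgumentException
    }
    String name = declared.trim();
    try {
      if (Charset.isSupported(name)) {
        return Charset.forName(name).name();
      }
    } catch (IllegalCharsetNameException e) {
      // Thrown for names that break the charset naming rules,
      // e.g. names containing spaces, or the empty string.
    }
    return fallback;
  }
}
```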

svn commit: r1141500 - in /nutch/branches/branch-1.4: CHANGES.txt conf/log4j.properties src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-06-30 Thread markus
Author: markus Date: Thu Jun 30 12:13:26 2011 New Revision: 1141500 URL: http://svn.apache.org/viewvc?rev=1141500&view=rev Log: NUTCH-1016 Strip UTF-8 non-character codepoints Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/log4j.properties n
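Unicode defines 66 noncharacters: U+FDD0 to U+FDEF, plus the last two code points (U+nFFFE and U+nFFFF) of each of the 17 planes. They are a property of the code point, not of UTF-8 specifically. A hypothetical stripper that removes them before a document is sent to Solr could look like the sketch below. It is not the `SolrWriter` change from NUTCH-1016.

```java
/** Hypothetical helper that removes Unicode noncharacters from a string. */
final class NonCharacterStripper {
  /** True for U+FDD0..U+FDEF and for any code point ending in FFFE or FFFF. */
  static boolean isNonCharacter(int cp) {
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
  }

  static String strip(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    // Iterate by code point so supplementary characters (surrogate pairs) stay intact.
    s.codePoints().filter(cp -> !isNonCharacter(cp)).forEach(sb::appendCodePoint);
    return sb.toString();
  }
}
```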

svn commit: r1142664 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 13:44:57 2011 New Revision: 1142664 URL: http://svn.apache.org/viewvc?rev=1142664&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/pl

svn commit: r1142687 - in /nutch/trunk: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 14:28:17 2011 New Revision: 1142687 URL: http://svn.apache.org/viewvc?rev=1142687&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/urlnormalizer-regex/src/

svn commit: r1143467 - in /nutch/branches/branch-1.4: CHANGES.txt conf/regex-normalize.xml.template src/test/org/apache/nutch/net/TestURLNormalizers.java

2011-07-06 Thread markus
Author: markus Date: Wed Jul 6 15:34:43 2011 New Revision: 1143467 URL: http://svn.apache.org/viewvc?rev=1143467&view=rev Log: NUTCH-1011 Remove duplicate slashes from URLs Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/regex-normalize.xml.temp

svn commit: r1143468 - in /nutch/trunk: CHANGES.txt conf/regex-normalize.xml.template src/test/org/apache/nutch/net/TestURLNormalizers.java

2011-07-06 Thread markus
Author: markus Date: Wed Jul 6 15:35:51 2011 New Revision: 1143468 URL: http://svn.apache.org/viewvc?rev=1143468&view=rev Log: NUTCH-1011 Remove duplicate slashes from URLs Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/regex-normalize.xml.template nutch/trunk/src/test
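NUTCH-1011 adds a rule to `conf/regex-normalize.xml.template`, whose contents are not shown here. The Java sketch below only illustrates the intended effect under an assumption the log does not state: collapsing is limited to the path, so the `scheme://` separator, the query string and the fragment are left untouched. `DuplicateSlashRemover` is hypothetical.

```java
/** Hypothetical helper: collapses runs of '/' in the path of a URL only. */
final class DuplicateSlashRemover {
  static String normalize(String url) {
    int schemeEnd = url.indexOf("://");
    // The authority (host[:port]) ends at the first '/', '?' or '#'.
    int i = schemeEnd < 0 ? 0 : schemeEnd + 3;
    while (i < url.length() && "/?#".indexOf(url.charAt(i)) < 0) {
      i++;
    }
    if (i == url.length() || url.charAt(i) != '/') {
      return url; // no path component, nothing to collapse
    }
    int pathStart = i;
    // The path ends at the first '?' (query) or '#' (fragment), if any.
    int end = pathStart;
    while (end < url.length() && url.charAt(end) != '?' && url.charAt(end) != '#') {
      end++;
    }
    String path = url.substring(pathStart, end).replaceAll("/{2,}", "/");
    return url.substring(0, pathStart) + path + url.substring(end);
  }
}
```

For example, `http://example.com//a///b/c?x=//y` becomes `http://example.com/a/b/c?x=//y`.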

svn commit: r1145109 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 10:22:37 2011 New Revision: 1145109 URL: http://svn.apache.org/viewvc?rev=1145109&view=rev Log: NUTCH-1030 WebgraphDB program requires manually added directories Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/ap

svn commit: r1145110 - /nutch/branches/branch-1.4/conf/log4j.properties

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 10:30:20 2011 New Revision: 1145110 URL: http://svn.apache.org/viewvc?rev=1145110&view=rev Log: NUTCH-1030 Updating log4j.properties as well Modified: nutch/branches/branch-1.4/conf/log4j.properties Modified: nutch/branches/branch-1.4/conf/log4j.proper

svn commit: r1145117 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 10:44:56 2011 New Revision: 1145117 URL: http://svn.apache.org/viewvc?rev=1145117&view=rev Log: NUTCH-783 IndexingFiltersChecker utility added Added: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java Modified: n

svn commit: r1145130 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 11:57:47 2011 New Revision: 1145130 URL: http://svn.apache.org/viewvc?rev=1145130&view=rev Log: NUTCH-1027 Degrade log level of 'can't find rules for scope' Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/p

svn commit: r1145131 - in /nutch/trunk: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 11:58:00 2011 New Revision: 1145131 URL: http://svn.apache.org/viewvc?rev=1145131&view=rev Log: NUTCH-1027 Degrade log level of 'can't find rules for scope' Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/urlnormalizer-regex/

svn commit: r1146035 - in /nutch/branches/branch-1.4: ./ conf/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/indexer/solr/

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 13:59:11 2011 New Revision: 1146035 URL: http://svn.apache.org/viewvc?rev=1146035&view=rev Log: NUTCH-987, NUTCH-1036 Solr HTTP auth support and Hadoop reporter counter increments Added: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/

svn commit: r1146043 - /nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 14:05:47 2011 New Revision: 1146043 URL: http://svn.apache.org/viewvc?rev=1146043&view=rev Log: NUTCH-987 Constants for HTTP auth for Solr Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java Modified: nutch/bran

svn commit: r1147615 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2011-07-17 Thread markus
Author: markus Date: Sun Jul 17 14:01:51 2011 New Revision: 1147615 URL: http://svn.apache.org/viewvc?rev=1147615&view=rev Log: NUTCH-1029 ReadDB throws EOFException Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/c

svn commit: r1148301 - in /nutch/branches/branch-1.4: conf/log4j.properties src/java/org/apache/nutch/scoring/webgraph/LinkRank.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 12:49:58 2011 New Revision: 1148301 URL: http://svn.apache.org/viewvc?rev=1148301&view=rev Log: NUTCH-1050 Add segmentDir to WebGraph Modified: nutch/branches/branch-1.4/conf/log4j.properties nutch/branches/branch-1.4/src/java/org/apache/nutch/sco

svn commit: r1148305 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 13:01:45 2011 New Revision: 1148305 URL: http://svn.apache.org/viewvc?rev=1148305&view=rev Log: NUTCH-1037 Option to deduplicate anchors prior to indexing Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default

svn commit: r1148308 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 13:12:43 2011 New Revision: 1148308 URL: http://svn.apache.org/viewvc?rev=1148308&view=rev Log: NUTCH-1037 Option to deduplicate anchors prior to indexing Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/pl
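NUTCH-1037 adds an option to drop duplicate anchors before indexing. Assume that "duplicate" means an exact string match; the log does not say whether the comparison ignores case. Under that assumption, an order-preserving deduplication can be sketched with a `LinkedHashSet`. `AnchorDeduper` is a hypothetical name, not the `AnchorIndexingFilter` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper: removes exact duplicate anchor texts. */
final class AnchorDeduper {
  static List<String> dedup(List<String> anchors) {
    // LinkedHashSet drops repeated elements and iterates in first-seen order.
    return new ArrayList<>(new LinkedHashSet<>(anchors));
  }
}
```

Because the comparison is exact, `"home"` and `"Home"` are both kept.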

svn commit: r1148406 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 15:40:34 2011 New Revision: 1148406 URL: http://svn.apache.org/viewvc?rev=1148406&view=rev Log: NUTCH-1057 Fetcher thread time out configurable Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml n

svn commit: r1156132 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2011-08-10 Thread markus
Author: markus Date: Wed Aug 10 12:26:49 2011 New Revision: 1156132 URL: http://svn.apache.org/viewvc?rev=1156132&view=rev Log: NUTCH-1028 Log urls when parsing Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/parse/ParseSegment.

svn commit: r1156665 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDbMerger.java

2011-08-11 Thread markus
Author: markus Date: Thu Aug 11 16:38:58 2011 New Revision: 1156665 URL: http://svn.apache.org/viewvc?rev=1156665&view=rev Log: NUTCH-1069 Readlinkdb broken on Hadoop > 0.20 Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutc

svn commit: r1158214 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 11:58:12 2011 New Revision: 1158214 URL: http://svn.apache.org/viewvc?rev=1158214&view=rev Log: NUTCH-1004 Do not index empty values for title field Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/index-basic/src/

svn commit: r1158215 - in /nutch/trunk: CHANGES.txt src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 11:59:01 2011 New Revision: 1158215 URL: http://svn.apache.org/viewvc?rev=1158215&view=rev Log: NUTCH-1004 Do not index empty values for title field Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/ind

svn commit: r1158218 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 12:03:30 2011 New Revision: 1158218 URL: http://svn.apache.org/viewvc?rev=1158218&view=rev Log: NUTCH-1082 IndexingFiltersChecker does not list multi valued fields Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java

svn commit: r1158357 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/NodeDumper.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 16:28:43 2011 New Revision: 1158357 URL: http://svn.apache.org/viewvc?rev=1158357&view=rev Log: NUTCH-1051 Export WebGraph node scores for Solr.ExternalFileField Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/ap

svn commit: r1159207 - in /nutch/branches/branch-1.4: CHANGES.txt src/bin/nutch

2011-08-18 Thread markus
Author: markus Date: Thu Aug 18 13:25:13 2011 New Revision: 1159207 URL: http://svn.apache.org/viewvc?rev=1159207&view=rev Log: NUTCH-1049 Add classes to bin/nutch script Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/bin/nutch Modified: nutch/bran

svn commit: r1167096 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/CrawlDb.java src/java/org/apache/nutch/crawl/CrawlDbFilter.java

2011-09-09 Thread markus
Author: markus Date: Fri Sep 9 11:13:54 2011 New Revision: 1167096 URL: http://svn.apache.org/viewvc?rev=1167096&view=rev Log: NUTCH-1101 Option to purge db_gone records from CrawlDB Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default

svn commit: r1169707 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-09-12 Thread markus
Author: markus Date: Mon Sep 12 12:16:53 2011 New Revision: 1169707 URL: http://svn.apache.org/viewvc?rev=1169707&view=rev Log: NUTCH-1105 Max content length option for index-basic Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml n

svn commit: r1170282 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDb.java

2011-09-13 Thread markus
Author: markus Date: Tue Sep 13 18:15:17 2011 New Revision: 1170282 URL: http://svn.apache.org/viewvc?rev=1170282&view=rev Log: NUTCH-1110 UpdateDB must not write _success file Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/c

svn commit: r1170526 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 10:59:24 2011 New Revision: 1170526 URL: http://svn.apache.org/viewvc?rev=1170526&view=rev Log: NUTCH-1067 Configure minimum throughput for fetcher and NUTCH-1102 Fetcher to rely on fetcher.parse directive Modified: nutch/branches/branch-1.4/CHANGES

svn commit: r1170557 - in /nutch/branches/branch-1.4/src: java/org/apache/nutch/crawl/Crawl.java java/org/apache/nutch/tools/Benchmark.java test/org/apache/nutch/fetcher/TestFetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 12:13:42 2011 New Revision: 1170557 URL: http://svn.apache.org/viewvc?rev=1170557&view=rev Log: NUTCH-1067,NUTCH-1102 Fixes for Benchmark, Crawl and TestFetcher Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl/Crawl.java nutch/bran

svn commit: r1172585 - /nutch/branches/branch-1.4/conf/nutch-default.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 12:12:17 2011 New Revision: 1172585 URL: http://svn.apache.org/viewvc?rev=1172585&view=rev Log: NUTCH-1067 Nutch-default configuration directives missing Modified: nutch/branches/branch-1.4/conf/nutch-default.xml Modified: nutch/branches/branch-1.4/conf/n

svn commit: r1172637 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlfilter-domain/plugin.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 14:14:05 2011 New Revision: 1172637 URL: http://svn.apache.org/viewvc?rev=1172637&view=rev Log: NUTCH-1114 Attr file missing in domain filter Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/urlfilter-domain/plugin

svn commit: r1174147 - in /nutch/branches/branch-1.4: conf/nutch-default.xml src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java src/plugin/parse-tika/src/java/org/apache/n

2011-09-22 Thread markus
Author: markus Date: Thu Sep 22 14:02:51 2011 New Revision: 1174147 URL: http://svn.apache.org/viewvc?rev=1174147&view=rev Log: NUTCH-1115 Option to disable fixing of URL embedded parameters in DomContentUtils Modified: nutch/branches/branch-1.4/conf/nutch-default.xml nutch/bran

svn commit: r1174222 - /nutch/branches/branch-1.4/CHANGES.txt

2011-09-22 Thread markus
Author: markus Date: Thu Sep 22 15:45:25 2011 New Revision: 1174222 URL: http://svn.apache.org/viewvc?rev=1174222&view=rev Log: Recommitted CHANGELOG entry for NUTCH-1115. Was overwritten by NUTCH-1078 commit Modified: nutch/branches/branch-1.4/CHANGES.txt Modified: nutch/branches/br

svn commit: r1174689 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2011-09-23 Thread markus
Author: markus Date: Fri Sep 23 12:09:35 2011 New Revision: 1174689 URL: http://svn.apache.org/viewvc?rev=1174689&view=rev Log: NUTCH-1074 topN is ignored with maxNumSegments and generate.max.count Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java

svn commit: r1178376 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDb.java

2011-10-03 Thread markus
Author: markus Date: Mon Oct 3 10:57:33 2011 New Revision: 1178376 URL: http://svn.apache.org/viewvc?rev=1178376&view=rev Log: NUTCH-1137 LinkDB other options ignored with -dir Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java Modified: n

svn commit: r1178409 - in /nutch/trunk: CHANGES.txt conf/schema.xml

2011-10-03 Thread markus
Author: markus Date: Mon Oct 3 13:25:18 2011 New Revision: 1178409 URL: http://svn.apache.org/viewvc?rev=1178409&view=rev Log: NUTCH-1058 Upgrade Solr schema to version 1.4 Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema.xml Modified: nutch/trunk/CHANGES.txt URL:

svn commit: r1178410 - in /nutch/branches/nutchgora: CHANGES.txt conf/schema.xml

2011-10-03 Thread markus
Author: markus Date: Mon Oct 3 13:25:49 2011 New Revision: 1178410 URL: http://svn.apache.org/viewvc?rev=1178410&view=rev Log: NUTCH-1058 Upgrade Solr schema to version 1.4 Modified: nutch/branches/nutchgora/CHANGES.txt nutch/branches/nutchgora/conf/schema.xml Modified: nutch/bran

svn commit: r1200344 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/LinkRank.java

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 14:27:53 2011 New Revision: 1200344 URL: http://svn.apache.org/viewvc?rev=1200344&view=rev Log: NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/sco

svn commit: r1200346 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 14:29:45 2011 New Revision: 1200346 URL: http://svn.apache.org/viewvc?rev=1200346&view=rev Log: NUTCH-1142 Normalization and filtering in WebGraph Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/scoring/webgraph/WebGraph.

svn commit: r1200347 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 14:31:33 2011 New Revision: 1200347 URL: http://svn.apache.org/viewvc?rev=1200347&view=rev Log: NUTCH-1178 Incorrect CSV header CrawlDatumCsvOutputFormat Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.

svn commit: r1200360 - in /nutch/trunk: ./ src/plugin/index-more/ src/plugin/index-more/src/java/org/apache/nutch/indexer/more/ src/plugin/index-more/src/test/org/apache/nutch/indexer/more/

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 15:02:04 2011 New Revision: 1200360 URL: http://svn.apache.org/viewvc?rev=1200360&view=rev Log: NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-more/plugin

svn commit: r1200370 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 15:16:23 2011 New Revision: 1200370 URL: http://svn.apache.org/viewvc?rev=1200370&view=rev Log: NUTCH-1155 Host/domain limit in generator is generate.max.count+1 Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/Generator.

svn commit: r1200377 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/domain/DomainStatistics.java

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 15:24:30 2011 New Revision: 1200377 URL: http://svn.apache.org/viewvc?rev=1200377&view=rev Log: NUTCH-1173 DomainStats doesn't count db_not_modified Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/domain/DomainStatis

svn commit: r1200830 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/CrawlDb.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 11:55:21 2011 New Revision: 1200830 URL: http://svn.apache.org/viewvc?rev=1200830&view=rev Log: NUTCH-1180 UpdateDB to backup previous CrawlDB Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/n

svn commit: r1200833 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 11:59:49 2011 New Revision: 1200833 URL: http://svn.apache.org/viewvc?rev=1200833&view=rev Log: NUTCH-1185 Decrease solr.commit.size to 250 Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml Modified: nutch/trunk/CHANGES.txt URL:

svn commit: r1200834 - in /nutch/branches/nutchgora: CHANGES.txt conf/nutch-default.xml

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 12:00:20 2011 New Revision: 1200834 URL: http://svn.apache.org/viewvc?rev=1200834&view=rev Log: NUTCH-1185 Decrease solr.commit.size to 250 Modified: nutch/branches/nutchgora/CHANGES.txt nutch/branches/nutchgora/conf/nutch-default.xml Modified: n

svn commit: r1200912 - /nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:01:05 2011 New Revision: 1200912 URL: http://svn.apache.org/viewvc?rev=1200912&view=rev Log: NUTCH-1155 Fixes failing test Modified: nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java Modified: nutch/trunk/src/test/org/apache/nutch/c

svn commit: r1200915 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:16:49 2011 New Revision: 1200915 URL: http://svn.apache.org/viewvc?rev=1200915&view=rev Log: NUTCH-1203 ParseSegment to show number of milliseconds per parse Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.

svn commit: r1200917 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/Outlink.java src/java/org/apache/nutch/parse/ParseOutputFormat.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:19:28 2011 New Revision: 1200917 URL: http://svn.apache.org/viewvc?rev=1200917&view=rev Log: NUTCH-1174 Outlinks are not properly normalized Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/Outlink.java nutch/trunk

svn commit: r1202143 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDb.java

2011-11-15 Thread markus
Author: markus Date: Tue Nov 15 11:56:30 2011 New Revision: 1202143 URL: http://svn.apache.org/viewvc?rev=1202143&view=rev Log: NUTCH-1090 InvertLinks should inform when ignoring internal links Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.

svn commit: r1204492 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/ParserChecker.java

2011-11-21 Thread markus
Author: markus Date: Mon Nov 21 13:42:16 2011 New Revision: 1204492 URL: http://svn.apache.org/viewvc?rev=1204492&view=rev Log: NUTCH-1207 ParserChecker to output signature Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/ParserChecker.java Modified: n

svn commit: r1207967 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/domain/DomainStatistics.java

2011-11-29 Thread markus
Author: markus Date: Tue Nov 29 16:56:45 2011 New Revision: 1207967 URL: http://svn.apache.org/viewvc?rev=1207967&view=rev Log: NUTCH-1214 DomainStats tool should be named for what it's doing Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/ut

svn commit: r1215090 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/domain/DomainStatistics.java

2011-12-16 Thread markus
Author: markus Date: Fri Dec 16 11:17:10 2011 New Revision: 1215090 URL: http://svn.apache.org/viewvc?rev=1215090&view=rev Log: NUTCH-1221 Migrate DomainStatistics to MapReduce API Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/domain/DomainStatistics.

svn commit: r1220786 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2011-12-19 Thread markus
Author: markus Date: Mon Dec 19 15:12:53 2011 New Revision: 1220786 URL: http://svn.apache.org/viewvc?rev=1220786&view=rev Log: NUTCH-1222 Upgrade to new Hadoop 0.22.0 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache

svn commit: r1220788 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/tools/CrawlDBScanner.java

2011-12-19 Thread markus
Author: markus Date: Mon Dec 19 15:15:43 2011 New Revision: 1220788 URL: http://svn.apache.org/viewvc?rev=1220788&view=rev Log: NUTCH-1225 Migrate CrawlDBScanner to MapReduce API Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java Modi

svn commit: r1221181 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java src/java/org/apache/nutch/parse/ParseData.java src/java/org/apache/nutch/parse

2011-12-20 Thread markus
Author: markus Date: Tue Dec 20 10:11:09 2011 New Revision: 1221181 URL: http://svn.apache.org/viewvc?rev=1221181&view=rev Log: NUTCH-1184 Fetcher to parse and follow Nth degree outlinks Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java

svn commit: r1221185 - in /nutch/trunk: CHANGES.txt conf/log4j.properties

2011-12-20 Thread markus
Author: markus Date: Tue Dec 20 10:22:06 2011 New Revision: 1221185 URL: http://svn.apache.org/viewvc?rev=1221185&view=rev Log: NUTCH-1129 Add freegenerator, domainstats and crawldbscanner to log4j Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/log4j.properties Modified: nutch/t

svn commit: r1221194 - /nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java

2011-12-20 Thread markus
Author: markus Date: Tue Dec 20 10:50:31 2011 New Revision: 1221194 URL: http://svn.apache.org/viewvc?rev=1221194&view=rev Log: Renamed FetcherStatus to FetcherOutlinks for the new outlinks section of NUTCH-1184 Modified: nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Modi

svn commit: r1222627 - /nutch/trunk/pom.xml

2011-12-23 Thread markus
Author: markus Date: Fri Dec 23 10:11:08 2011 New Revision: 1222627 URL: http://svn.apache.org/viewvc?rev=1222627&view=rev Log: Updated pom to reflect Hadoop upgrade Modified: nutch/trunk/pom.xml Modified: nutch/trunk/pom.xml URL: http://svn.apache.org/viewvc/nutch/trunk/pom.xml

svn commit: r1224916 - in /nutch/trunk: ./ ivy/ src/java/org/apache/nutch/util/ src/plugin/index-more/src/java/org/apache/nutch/indexer/more/ src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip/

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 14:36:27 2011 New Revision: 1224916 URL: http://svn.apache.org/viewvc?rev=1224916&view=rev Log: NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml nutch/trunk/src/

svn commit: r1224905 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/tools/CrawlDBScanner.java

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 13:22:50 2011 New Revision: 1224905 URL: http://svn.apache.org/viewvc?rev=1224905&view=rev Log: Reverting Nutch-1125 CrawlDBScanner Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java Modified: nutch/t

svn commit: r1224906 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 13:28:44 2011 New Revision: 1224906 URL: http://svn.apache.org/viewvc?rev=1224906&view=rev Log: NUTCH-1235 Upgrade to new Hadoop 0.20.205.0 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL:

svn commit: r1224912 - /nutch/trunk/ivy/ivy.xml

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 14:08:22 2011 New Revision: 1224912 URL: http://svn.apache.org/viewvc?rev=1224912&view=rev Log: NUTCH-1235, added Jackson ASL mapper as dep Modified: nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/ivy/ivy.xml URL: http://svn.apache.org/viewvc/nutch/trunk

svn commit: r1225544 - /nutch/trunk/CHANGES.txt

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:35:31 2011 New Revision: 1225544 URL: http://svn.apache.org/viewvc?rev=1225544&view=rev Log: NUTCH-1238 Missed changes.txt Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt

svn commit: r1225543 - in /nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:32:50 2011 New Revision: 1225543 URL: http://svn.apache.org/viewvc?rev=1225543&view=rev Log: NUTCH-1238 Fetcher throughput threshold must start before feeder finished Modified: nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/n

svn commit: r1226406 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/NutchWritable.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:11:50 2012 New Revision: 1226406 URL: http://svn.apache.org/viewvc?rev=1226406&view=rev Log: NUTCH-1239 Webgraph should remove deleted pages from segment input Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/c

svn commit: r1226409 - in /nutch/trunk: CHANGES.txt conf/schema-solr4.xml conf/schema.xml conf/solrindex-mapping.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:16:59 2012 New Revision: 1226409 URL: http://svn.apache.org/viewvc?rev=1226409&view=rev Log: NUTCH-1232 Remove site field from index-basic Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema-solr4.xml nutch/trunk/conf/schema.xml n

svn commit: r1229226 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2012-01-09 Thread markus
Author: markus Date: Mon Jan 9 16:01:27 2012 New Revision: 1229226 URL: http://svn.apache.org/viewvc?rev=1229226&view=rev Log: NUTCH-1244 CrawlDBDumper to filter by regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java Modified: n

svn commit: r1229544 - in /nutch/trunk: ./ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/indexer/solr/

2012-01-10 Thread markus
Author: markus Date: Tue Jan 10 13:57:29 2012 New Revision: 1229544 URL: http://svn.apache.org/viewvc?rev=1229544&view=rev Log: NUTCH-1139 Indexer to delete gone documents Added: nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexAction.java Modified: nutch/trunk/CHANGES

svn commit: r1231090 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/Generator.java

2012-01-13 Thread markus
Author: markus Date: Fri Jan 13 14:31:22 2012 New Revision: 1231090 URL: http://svn.apache.org/viewvc?rev=1231090&view=rev Log: NUTCH-1177 Generator to select on retry interval Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/n

svn commit: r1231168 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2012-01-13 Thread markus
Author: markus Date: Fri Jan 13 16:43:42 2012 New Revision: 1231168 URL: http://svn.apache.org/viewvc?rev=1231168&view=rev Log: NUTCH-1248 Generator to select on status Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java Modified: nutch/t

svn commit: r1236674 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2012-01-27 Thread markus
Author: markus Date: Fri Jan 27 13:11:47 2012 New Revision: 1236674 URL: http://svn.apache.org/viewvc?rev=1236674&view=rev Log: NUTCH-1260 Fetcher should log fetching of redirects Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Modi

svn commit: r1238590 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/scoring/webgraph/NodeDumper.java

2012-01-31 Thread markus
Author: markus Date: Tue Jan 31 14:17:27 2012 New Revision: 1238590 URL: http://svn.apache.org/viewvc?rev=1238590&view=rev Log: NUTCH-1256 WebGraph to dump host + score. Most if not all WebGraph options have been added to nutch-default as well. Modified: nutch/trunk/CHANGES.txt n

svn commit: r1238663 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/parse/ParseOutputFormat.java src/java/org/apache/nutch/parse/ParseSegment.java

2012-01-31 Thread markus
Author: markus Date: Tue Jan 31 15:24:37 2012 New Revision: 1238663 URL: http://svn.apache.org/viewvc?rev=1238663&view=rev Log: NUTCH-1242 Allow disabling of URL Filters in ParseSegment Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java

svn commit: r1241460 - in /nutch/trunk: ./ src/plugin/ src/plugin/headings/ src/plugin/headings/src/ src/plugin/headings/src/java/ src/plugin/headings/src/java/org/ src/plugin/headings/src/java/org/ap

2012-02-07 Thread markus
Author: markus Date: Tue Feb 7 13:25:46 2012 New Revision: 1241460 URL: http://svn.apache.org/viewvc?rev=1241460&view=rev Log: NUTCH-1005 Parse headings plugin Added: nutch/trunk/src/plugin/headings/ nutch/trunk/src/plugin/headings/build.xml nutch/trunk/src/plugin/headings/ivy

svn commit: r1242255 - in /nutch/trunk: ./ src/plugin/subcollection/src/java/org/apache/nutch/collection/ src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/

2012-02-09 Thread markus
Author: markus Date: Thu Feb 9 09:55:08 2012 New Revision: 1242255 URL: http://svn.apache.org/viewvc?rev=1242255&view=rev Log: NUTCH-1266 Subcollection to optionally write to configured fields Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/subcollection/src/java/org/ap

svn commit: r1292764 - in /nutch/trunk: ./ conf/ src/plugin/ src/plugin/urlfilter-domainblacklist/ src/plugin/urlfilter-domainblacklist/data/ src/plugin/urlfilter-domainblacklist/src/ src/plugin/urlfi

2012-02-23 Thread markus
Author: markus Date: Thu Feb 23 12:32:49 2012 New Revision: 1292764 URL: http://svn.apache.org/viewvc?rev=1292764&view=rev Log: NUTCH-1210 Domain Blacklist Filter Added: nutch/trunk/conf/domainblacklist-urlfilter.txt nutch/trunk/src/plugin/urlfilter-domainblacklist/ nutch/trunk

svn commit: r1292790 - /nutch/trunk/src/plugin/build.xml

2012-02-23 Thread markus
Author: markus Date: Thu Feb 23 13:14:50 2012 New Revision: 1292790 URL: http://svn.apache.org/viewvc?rev=1292790&view=rev Log: NUTCH-1210 Domain Blacklist Filter added test to plugin/build.xml Modified: nutch/trunk/src/plugin/build.xml Modified: nutch/trunk/src/plugin/build.xml URL:

svn commit: r1295119 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2012-02-29 Thread markus
Author: markus Date: Wed Feb 29 14:12:36 2012 New Revision: 1295119 URL: http://svn.apache.org/viewvc?rev=1295119&view=rev Log: NUTCH-1291 Fetcher to stringify exception on // unexpected exception Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/fet

svn commit: r1295614 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2012-03-01 Thread markus
Author: markus Date: Thu Mar 1 15:24:02 2012 New Revision: 1295614 URL: http://svn.apache.org/viewvc?rev=1295614&view=rev Log: NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum metadata Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/n

svn commit: r1295624 - in /nutch/trunk: CHANGES.txt src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java

2012-03-01 Thread markus
Author: markus Date: Thu Mar 1 15:37:56 2012 New Revision: 1295624 URL: http://svn.apache.org/viewvc?rev=1295624&view=rev Log: NUTCH-1258 MoreIndexingFilter should be able to read Content-Type from both parse metadata and content metadata Modified: nutch/trunk/CHANGES.txt nutch/t

svn commit: r1297586 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/LinkRank.java

2012-03-06 Thread markus
Author: markus Date: Tue Mar 6 17:31:39 2012 New Revision: 1297586 URL: http://svn.apache.org/viewvc?rev=1297586&view=rev Log: NUTCH-1299 LinkRank inverter to ignore records without Node Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/scoring/webg

svn commit: r1298394 - /nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.java

2012-03-08 Thread markus
Author: markus Date: Thu Mar 8 13:52:54 2012 New Revision: 1298394 URL: http://svn.apache.org/viewvc?rev=1298394&view=rev Log: NUTCH-1305 Domain(blacklist)URLFilter to trim entries Modified: nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfi

svn commit: r1300871 - /nutch/trunk/CHANGES.txt

2012-03-15 Thread markus
Author: markus Date: Thu Mar 15 09:53:49 2012 New Revision: 1300871 URL: http://svn.apache.org/viewvc?rev=1300871&view=rev Log: NUTCH-1305 missing in CHANGES Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt
