svn commit: r1037742 - in /nutch/branches/branch-1.3/src/plugin/urlnormalizer-basic/src: java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java test/org/apache/nutch/net/urlnormalizer/b

2010-11-22 Thread markus
Author: markus Date: Mon Nov 22 14:56:40 2010 New Revision: 1037742 URL: http://svn.apache.org/viewvc?rev=1037742&view=rev Log: NUTCH-935 - remove unnecessary /./ in basic urlnormalizer (via Stondubleyt) Modified: nutch/branches/branch-1.3/src/plugin/urlnormalizer-basic/src/java/org/apache
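NUTCH-935 is about dropping redundant `/./` segments during basic URL normalization. The sketch below only illustrates that idea. It is not the code committed to BasicURLNormalizer. The class and method names (`DotSlashSketch`, `removeDotSlash`) are hypothetical. It assumes http(s)-style URLs and rewrites only the path, so the query and fragment are left untouched.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration of the NUTCH-935 idea; not Nutch's BasicURLNormalizer code.
final class DotSlashSketch {

    /** Collapses every "/./" in the URL path to "/"; query and fragment are left as-is. */
    static String removeDotSlash(String urlString) {
        URL url;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + urlString, e);
        }
        String path = url.getPath();
        // String.replace handles only non-overlapping matches, so "/././" needs repeated passes.
        while (path.contains("/./")) {
            path = path.replace("/./", "/");
        }
        StringBuilder sb = new StringBuilder();
        sb.append(url.getProtocol()).append("://").append(url.getAuthority()).append(path);
        if (url.getQuery() != null) {
            sb.append('?').append(url.getQuery());
        }
        if (url.getRef() != null) {
            sb.append('#').append(url.getRef());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints http://example.com/a/b/c?x=./y
        System.out.println(removeDotSlash("http://example.com/a/./b/././c?x=./y"));
    }
}
```

The loop is needed because adjacent segments such as `/././` overlap, and a single `replace` pass would leave one `/./` behind.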

svn commit: r1082943 - in /nutch/branches/branch-1.3: CHANGES.txt conf/log4j.properties src/bin/nutch

2011-03-18 Thread markus
Author: markus Date: Fri Mar 18 15:05:34 2011 New Revision: 1082943 URL: http://svn.apache.org/viewvc?rev=1082943&view=rev Log: NUTCH-963 Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (Claudio Martella, markus) Modified: nutch/branches/branch-1.3/CHANGES.txt

svn commit: r1089866 - /nutch/branches/branch-1.3/src/bin/nutch

2011-04-07 Thread markus
Author: markus Date: Thu Apr 7 13:08:05 2011 New Revision: 1089866 URL: http://svn.apache.org/viewvc?rev=1089866&view=rev Log: NUTCH-975 Fix missing and wrong headers in source files (src/bin/nutch) Modified: nutch/branches/branch-1.3/src/bin/nutch Modified: nutch/branches/branch-1.3/src

svn commit: r1091895 - in /nutch/trunk: CHANGES.txt conf/solrindex-mapping.xml

2011-04-13 Thread markus
Author: markus Date: Wed Apr 13 19:34:53 2011 New Revision: 1091895 URL: http://svn.apache.org/viewvc?rev=1091895&view=rev Log: NUTCH-982 Remove copying of ID and URL field in solrmapping Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/solrindex-mapping.xml Modified: nutch/trunk

svn commit: r1092084 - in /nutch/branches/branch-1.3: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/indexer/solr/SolrConstants.java

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 09:59:11 2011 New Revision: 1092084 URL: http://svn.apache.org/viewvc?rev=1092084&view=rev Log: NUTCH-976 Rename properties solrindex.* to solr.* Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/conf/nutch-default.xml nutch

svn commit: r1092090 - /nutch/trunk/CHANGES.txt

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 10:05:47 2011 New Revision: 1092090 URL: http://svn.apache.org/viewvc?rev=1092090&view=rev Log: NUTCH-977 SolrMappingReader uses hardcoded configuration parameter name for mapping file Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL

svn commit: r1092091 - in /nutch/branches/branch-1.3: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrMappingReader.java

2011-04-14 Thread markus
Author: markus Date: Thu Apr 14 10:06:06 2011 New Revision: 1092091 URL: http://svn.apache.org/viewvc?rev=1092091&view=rev Log: NUTCH-977 SolrMappingReader uses hardcoded configuration parameter name for mapping file Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch

svn commit: r1101279 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:44:42 2011 New Revision: 1101279 URL: http://svn.apache.org/viewvc?rev=1101279&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java Modified

svn commit: r1101280 - in /nutch/branches/branch-1.3: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-05-09 Thread markus
Author: markus Date: Tue May 10 00:46:04 2011 New Revision: 1101280 URL: http://svn.apache.org/viewvc?rev=1101280&view=rev Log: NUTCH-996 Indexer adds solr.commit.size+1 docs Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/src/java/org/apache/nutch/indexer/solr
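NUTCH-996 reports that the indexer sent `solr.commit.size` + 1 documents per batch. That looks like an off-by-one in the flush condition. Below is a hedged sketch of a buffer that flushes at exactly the configured size. `BatchBuffer` and its sink callback are hypothetical stand-ins. They do not reproduce SolrWriter or any SolrJ API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batching buffer illustrating the NUTCH-996 off-by-one; not SolrWriter itself.
final class BatchBuffer<T> {
    private final int commitSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int commitSize, Consumer<List<T>> sink) {
        if (commitSize < 1) {
            throw new IllegalArgumentException("commitSize must be >= 1");
        }
        this.commitSize = commitSize;
        this.sink = sink;
    }

    void add(T doc) {
        pending.add(doc);
        // Flush as soon as the buffer holds commitSize items.
        // Testing "pending.size() > commitSize" instead would send commitSize + 1 per batch.
        if (pending.size() >= commitSize) {
            flush();
        }
    }

    /** Sends whatever is buffered (also used for the final, possibly partial batch). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        BatchBuffer<String> buf = new BatchBuffer<>(3, batch -> System.out.println("batch of " + batch.size()));
        for (int i = 0; i < 7; i++) {
            buf.add("doc" + i);
        }
        buf.flush(); // prints batch of 3, batch of 3, batch of 1
    }
}
```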

svn commit: r1126417 - in /nutch/branches/branch-1.3: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:17:14 2011 New Revision: 1126417 URL: http://svn.apache.org/viewvc?rev=1126417&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/branches/branch-1.3/CHANGES.txt nutch/branches/branch-1.3/conf/schema.xml Modified: nutch/branches/branch-1.3

svn commit: r1126425 - in /nutch/trunk: CHANGES.txt conf/schema.xml

2011-05-23 Thread markus
Author: markus Date: Mon May 23 10:48:59 2011 New Revision: 1126425 URL: http://svn.apache.org/viewvc?rev=1126425&view=rev Log: NUTCH-994 Fine tune Solr schema Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org

svn commit: r1139331 - in /nutch/trunk: CHANGES.txt src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

2011-06-24 Thread markus
Author: markus Date: Fri Jun 24 14:38:44 2011 New Revision: 1139331 URL: http://svn.apache.org/viewvc?rev=1139331&view=rev Log: NUTCH-1006 MetaEquiv with single quotes not accepted Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html

svn commit: r1140117 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml

2011-06-27 Thread markus
Author: markus Date: Mon Jun 27 11:41:22 2011 New Revision: 1140117 URL: http://svn.apache.org/viewvc?rev=1140117&view=rev Log: NUTCH-295 Description for fetcher.threads.fetch property Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml Modified: nutch/trunk/CHANGES.txt

svn commit: r1140695 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/util/EncodingDetector.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 15:59:47 2011 New Revision: 1140695 URL: http://svn.apache.org/viewvc?rev=1140695&view=rev Log: NUTCH-1012 Cannot handle illegal charset Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/util

svn commit: r1140696 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/EncodingDetector.java

2011-06-28 Thread markus
Author: markus Date: Tue Jun 28 16:03:28 2011 New Revision: 1140696 URL: http://svn.apache.org/viewvc?rev=1140696&view=rev Log: NUTCH-1012 Cannot handle illegal charset Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/EncodingDetector.java Modified: nutch

svn commit: r1141500 - in /nutch/branches/branch-1.4: CHANGES.txt conf/log4j.properties src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2011-06-30 Thread markus
Author: markus Date: Thu Jun 30 12:13:26 2011 New Revision: 1141500 URL: http://svn.apache.org/viewvc?rev=1141500&view=rev Log: NUTCH-1016 Strip UTF-8 non-character codepoints Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/log4j.properties nutch
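NUTCH-1016, which touched SolrWriter, strips non-character code points from the text that is written out. The committed code is not shown in this log. The sketch below therefore applies the Unicode definition of noncharacters: U+FDD0 through U+FDEF, plus the last two code points of every plane, such as U+FFFE and U+FFFF. Whether Nutch strips exactly this set is an assumption. `NonCharStripper` is a hypothetical name.

```java
// Hypothetical sketch of stripping Unicode noncharacters; the exact set Nutch removes may differ.
final class NonCharStripper {

    /** True for U+FDD0..U+FDEF and for U+xxFFFE / U+xxFFFF in every plane (66 code points). */
    static boolean isNonCharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    /** Returns a copy of s with all noncharacter code points removed. */
    static String strip(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Iterate by code point so supplementary-plane noncharacters (e.g. U+1FFFF) are caught.
        s.codePoints().filter(cp -> !isNonCharacter(cp)).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("a\uFFFEb\uFDD0c")); // prints abc
    }
}
```

The mask test `(cp & 0xFFFE) == 0xFFFE` covers both U+xxFFFE and U+xxFFFF in one comparison. That works because every valid code point is at most U+10FFFF.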

svn commit: r1142664 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 13:44:57 2011 New Revision: 1142664 URL: http://svn.apache.org/viewvc?rev=1142664&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO java.util.regex Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin

svn commit: r1142687 - in /nutch/trunk: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-04 Thread markus
Author: markus Date: Mon Jul 4 14:28:17 2011 New Revision: 1142687 URL: http://svn.apache.org/viewvc?rev=1142687&view=rev Log: NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/urlnormalizer-regex/src/java

svn commit: r1143467 - in /nutch/branches/branch-1.4: CHANGES.txt conf/regex-normalize.xml.template src/test/org/apache/nutch/net/TestURLNormalizers.java

2011-07-06 Thread markus
Author: markus Date: Wed Jul 6 15:34:43 2011 New Revision: 1143467 URL: http://svn.apache.org/viewvc?rev=1143467&view=rev Log: NUTCH-1011 Remove duplicate slashes from URLs Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/regex-normalize.xml.template
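NUTCH-1011 adds duplicate-slash removal to `regex-normalize.xml.template`. The actual rule text is not reproduced in this log, so the following `java.util.regex` sketch is an assumption about the intent. It collapses runs of two or more slashes in the part of the URL before any query or fragment. A negative lookbehind on `:` keeps the `//` after the scheme intact. `DuplicateSlashSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of duplicate-slash collapsing; not the rule shipped in regex-normalize.xml.template.
final class DuplicateSlashSketch {

    // Two or more consecutive slashes not directly preceded by ':' (so "http://" survives).
    private static final Pattern MULTI_SLASH = Pattern.compile("(?<!:)/{2,}");

    static String collapse(String url) {
        // Only rewrite scheme/authority/path; leave the query and fragment as they are.
        int q = url.indexOf('?');
        int h = url.indexOf('#');
        int cut = (q < 0) ? h : (h < 0 ? q : Math.min(q, h));
        String head = (cut < 0) ? url : url.substring(0, cut);
        String tail = (cut < 0) ? "" : url.substring(cut);
        return MULTI_SLASH.matcher(head).replaceAll("/") + tail;
    }

    public static void main(String[] args) {
        // Prints http://example.com/a/b/c?u=http://x//y
        System.out.println(collapse("http://example.com//a///b/c?u=http://x//y"));
    }
}
```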

svn commit: r1145110 - /nutch/branches/branch-1.4/conf/log4j.properties

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 10:30:20 2011 New Revision: 1145110 URL: http://svn.apache.org/viewvc?rev=1145110&view=rev Log: NUTCH-1030 Updating log4j.properties as well Modified: nutch/branches/branch-1.4/conf/log4j.properties Modified: nutch/branches/branch-1.4/conf/log4j.properties URL

svn commit: r1145117 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 10:44:56 2011 New Revision: 1145117 URL: http://svn.apache.org/viewvc?rev=1145117&view=rev Log: NUTCH-783 IndexingFiltersChecker utility added Added: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java Modified: nutch

svn commit: r1145131 - in /nutch/trunk: CHANGES.txt src/plugin/urlnormalizer-regex/src/java/org/apache/nutch/net/urlnormalizer/regex/RegexURLNormalizer.java

2011-07-11 Thread markus
Author: markus Date: Mon Jul 11 11:58:00 2011 New Revision: 1145131 URL: http://svn.apache.org/viewvc?rev=1145131&view=rev Log: NUTCH-1027 Degrade log level of 'can't find rules for scope' Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/urlnormalizer-regex/src/java/org/apache

svn commit: r1146035 - in /nutch/branches/branch-1.4: ./ conf/ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/indexer/solr/

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 13:59:11 2011 New Revision: 1146035 URL: http://svn.apache.org/viewvc?rev=1146035&view=rev Log: NUTCH-987, NUTCH-1036 Solr HTTP auth support and Hadoop reporter counter increments Added: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr

svn commit: r1146043 - /nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java

2011-07-13 Thread markus
Author: markus Date: Wed Jul 13 14:05:47 2011 New Revision: 1146043 URL: http://svn.apache.org/viewvc?rev=1146043&view=rev Log: NUTCH-987 Constants for HTTP auth for Solr Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java Modified: nutch/branches

svn commit: r1147615 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2011-07-17 Thread markus
Author: markus Date: Sun Jul 17 14:01:51 2011 New Revision: 1147615 URL: http://svn.apache.org/viewvc?rev=1147615&view=rev Log: NUTCH-1029 ReadDB throws EOFException Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1148301 - in /nutch/branches/branch-1.4: conf/log4j.properties src/java/org/apache/nutch/scoring/webgraph/LinkRank.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 12:49:58 2011 New Revision: 1148301 URL: http://svn.apache.org/viewvc?rev=1148301&view=rev Log: NUTCH-1050 Add segmentDir to WebGraph Modified: nutch/branches/branch-1.4/conf/log4j.properties nutch/branches/branch-1.4/src/java/org/apache/nutch/scoring

svn commit: r1148305 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 13:01:45 2011 New Revision: 1148305 URL: http://svn.apache.org/viewvc?rev=1148305&view=rev Log: NUTCH-1037 Option to deduplicate anchors prior to indexing Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml

svn commit: r1148308 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/plugin/index-anchor/src/java/org/apache/nutch/indexer/anchor/AnchorIndexingFilter.java

2011-07-19 Thread markus
Author: markus Date: Tue Jul 19 13:12:43 2011 New Revision: 1148308 URL: http://svn.apache.org/viewvc?rev=1148308&view=rev Log: NUTCH-1037 Option to deduplicate anchors prior to indexing Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/plugin/index

svn commit: r1156132 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2011-08-10 Thread markus
Author: markus Date: Wed Aug 10 12:26:49 2011 New Revision: 1156132 URL: http://svn.apache.org/viewvc?rev=1156132&view=rev Log: NUTCH-1028 Log urls when parsing Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/parse/ParseSegment.java

svn commit: r1156665 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDbMerger.java

2011-08-11 Thread markus
Author: markus Date: Thu Aug 11 16:38:58 2011 New Revision: 1156665 URL: http://svn.apache.org/viewvc?rev=1156665&view=rev Log: NUTCH-1069 Readlinkdb broken on Hadoop 0.20 Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1158214 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 11:58:12 2011 New Revision: 1158214 URL: http://svn.apache.org/viewvc?rev=1158214&view=rev Log: NUTCH-1004 Do not index empty values for title field Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/index-basic/src/java

svn commit: r1158215 - in /nutch/trunk: CHANGES.txt src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 11:59:01 2011 New Revision: 1158215 URL: http://svn.apache.org/viewvc?rev=1158215&view=rev Log: NUTCH-1004 Do not index empty values for title field Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer

svn commit: r1158218 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 12:03:30 2011 New Revision: 1158218 URL: http://svn.apache.org/viewvc?rev=1158218&view=rev Log: NUTCH-1082 IndexingFiltersChecker does not list multi valued fields Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org

svn commit: r1158357 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/scoring/webgraph/NodeDumper.java

2011-08-16 Thread markus
Author: markus Date: Tue Aug 16 16:28:43 2011 New Revision: 1158357 URL: http://svn.apache.org/viewvc?rev=1158357&view=rev Log: NUTCH-1051 Export WebGraph node scores for Solr.ExternalFileField Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache

svn commit: r1159207 - in /nutch/branches/branch-1.4: CHANGES.txt src/bin/nutch

2011-08-18 Thread markus
Author: markus Date: Thu Aug 18 13:25:13 2011 New Revision: 1159207 URL: http://svn.apache.org/viewvc?rev=1159207&view=rev Log: NUTCH-1049 Add classes to bin/nutch script Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/bin/nutch Modified: nutch/branches

svn commit: r1167096 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/crawl/CrawlDb.java src/java/org/apache/nutch/crawl/CrawlDbFilter.java

2011-09-09 Thread markus
Author: markus Date: Fri Sep 9 11:13:54 2011 New Revision: 1167096 URL: http://svn.apache.org/viewvc?rev=1167096&view=rev Log: NUTCH-1101 Option to purge db_gone records from CrawlDB Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml

svn commit: r1169707 - in /nutch/branches/branch-1.4: CHANGES.txt conf/nutch-default.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2011-09-12 Thread markus
Author: markus Date: Mon Sep 12 12:16:53 2011 New Revision: 1169707 URL: http://svn.apache.org/viewvc?rev=1169707&view=rev Log: NUTCH-1105 Max content length option for index-basic Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/conf/nutch-default.xml nutch

svn commit: r1170282 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDb.java

2011-09-13 Thread markus
Author: markus Date: Tue Sep 13 18:15:17 2011 New Revision: 1170282 URL: http://svn.apache.org/viewvc?rev=1170282&view=rev Log: NUTCH-1110 UpdateDB must not write _success file Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl

svn commit: r1170526 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 10:59:24 2011 New Revision: 1170526 URL: http://svn.apache.org/viewvc?rev=1170526&view=rev Log: NUTCH-1067 Configure minimum throughput for fetcher and NUTCH-1102 Fetcher to rely on fetcher.parse directive Modified: nutch/branches/branch-1.4/CHANGES.txt

svn commit: r1170557 - in /nutch/branches/branch-1.4/src: java/org/apache/nutch/crawl/Crawl.java java/org/apache/nutch/tools/Benchmark.java test/org/apache/nutch/fetcher/TestFetcher.java

2011-09-14 Thread markus
Author: markus Date: Wed Sep 14 12:13:42 2011 New Revision: 1170557 URL: http://svn.apache.org/viewvc?rev=1170557&view=rev Log: NUTCH-1067,NUTCH-1102 Fixes for Benchmark, Crawl and TestFetcher Modified: nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl/Crawl.java nutch/branches

svn commit: r1172585 - /nutch/branches/branch-1.4/conf/nutch-default.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 12:12:17 2011 New Revision: 1172585 URL: http://svn.apache.org/viewvc?rev=1172585&view=rev Log: NUTCH-1067 Nutch-default configuration directives missing Modified: nutch/branches/branch-1.4/conf/nutch-default.xml Modified: nutch/branches/branch-1.4/conf/nutch

svn commit: r1172637 - in /nutch/branches/branch-1.4: CHANGES.txt src/plugin/urlfilter-domain/plugin.xml

2011-09-19 Thread markus
Author: markus Date: Mon Sep 19 14:14:05 2011 New Revision: 1172637 URL: http://svn.apache.org/viewvc?rev=1172637&view=rev Log: NUTCH-1114 Attr file missing in domain filter Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/plugin/urlfilter-domain/plugin.xml

svn commit: r1174147 - in /nutch/branches/branch-1.4: conf/nutch-default.xml src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMContentUtils.java src/plugin/parse-tika/src/java/org/apache/n

2011-09-22 Thread markus
Author: markus Date: Thu Sep 22 14:02:51 2011 New Revision: 1174147 URL: http://svn.apache.org/viewvc?rev=1174147&view=rev Log: NUTCH-1115 Option to disable fixing of URL embedded parameters in DomContentUtils Modified: nutch/branches/branch-1.4/conf/nutch-default.xml nutch/branches

svn commit: r1174689 - in /nutch/branches/branch-1.4: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2011-09-23 Thread markus
Author: markus Date: Fri Sep 23 12:09:35 2011 New Revision: 1174689 URL: http://svn.apache.org/viewvc?rev=1174689&view=rev Log: NUTCH-1074 topN is ignored with maxNumSegments and generate.max.count Modified: nutch/branches/branch-1.4/CHANGES.txt nutch/branches/branch-1.4/src/java/org

svn commit: r1178376 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/LinkDb.java

2011-10-03 Thread markus
Author: markus Date: Mon Oct 3 10:57:33 2011 New Revision: 1178376 URL: http://svn.apache.org/viewvc?rev=1178376&view=rev Log: NUTCH-1137 LinkDB other options ignored with -dir Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java Modified: nutch

svn commit: r1200360 - in /nutch/trunk: ./ src/plugin/index-more/ src/plugin/index-more/src/java/org/apache/nutch/indexer/more/ src/plugin/index-more/src/test/org/apache/nutch/indexer/more/

2011-11-10 Thread markus
Author: markus Date: Thu Nov 10 15:02:04 2011 New Revision: 1200360 URL: http://svn.apache.org/viewvc?rev=1200360&view=rev Log: NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/index-more/plugin.xml

svn commit: r1200834 - in /nutch/branches/nutchgora: CHANGES.txt conf/nutch-default.xml

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 12:00:20 2011 New Revision: 1200834 URL: http://svn.apache.org/viewvc?rev=1200834&view=rev Log: NUTCH-1185 Decrease solr.commit.size to 250 Modified: nutch/branches/nutchgora/CHANGES.txt nutch/branches/nutchgora/conf/nutch-default.xml Modified: nutch

svn commit: r1200912 - /nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:01:05 2011 New Revision: 1200912 URL: http://svn.apache.org/viewvc?rev=1200912&view=rev Log: NUTCH-1155 Fixes failing test Modified: nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java Modified: nutch/trunk/src/test/org/apache/nutch/crawl

svn commit: r1200915 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2011-11-11 Thread markus
Author: markus Date: Fri Nov 11 15:16:49 2011 New Revision: 1200915 URL: http://svn.apache.org/viewvc?rev=1200915&view=rev Log: NUTCH-1203 ParseSegment to show number of milliseconds per parse Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java

svn commit: r1220788 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/tools/CrawlDBScanner.java

2011-12-19 Thread markus
Author: markus Date: Mon Dec 19 15:15:43 2011 New Revision: 1220788 URL: http://svn.apache.org/viewvc?rev=1220788&view=rev Log: NUTCH-1225 Migrate CrawlDBScanner to MapReduce API Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java Modified

svn commit: r1221181 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java src/java/org/apache/nutch/parse/ParseData.java src/java/org/apache/nutch/parse

2011-12-20 Thread markus
Author: markus Date: Tue Dec 20 10:11:09 2011 New Revision: 1221181 URL: http://svn.apache.org/viewvc?rev=1221181&view=rev Log: NUTCH-1184 Fetcher to parse and follow Nth degree outlinks Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org

svn commit: r1224906 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 13:28:44 2011 New Revision: 1224906 URL: http://svn.apache.org/viewvc?rev=1224906&view=rev Log: NUTCH-1235 Upgrade to new Hadoop 0.20.205.0 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http

svn commit: r1224912 - /nutch/trunk/ivy/ivy.xml

2011-12-27 Thread markus
Author: markus Date: Tue Dec 27 14:08:22 2011 New Revision: 1224912 URL: http://svn.apache.org/viewvc?rev=1224912&view=rev Log: NUTCH-1235, added Jackson ASL mapper as dep Modified: nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/ivy/ivy.xml URL: http://svn.apache.org/viewvc/nutch/trunk/ivy

svn commit: r1225544 - /nutch/trunk/CHANGES.txt

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:35:31 2011 New Revision: 1225544 URL: http://svn.apache.org/viewvc?rev=1225544&view=rev Log: NUTCH-1238 Missed changes.txt Modified: nutch/trunk/CHANGES.txt Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev

svn commit: r1225543 - in /nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2011-12-29 Thread markus
Author: markus Date: Thu Dec 29 14:32:50 2011 New Revision: 1225543 URL: http://svn.apache.org/viewvc?rev=1225543&view=rev Log: NUTCH-1238 Fetcher throughput threshold must start before feeder finished Modified: nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/nutch

svn commit: r1226406 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/NutchWritable.java src/java/org/apache/nutch/scoring/webgraph/WebGraph.java

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:11:50 2012 New Revision: 1226406 URL: http://svn.apache.org/viewvc?rev=1226406&view=rev Log: NUTCH-1239 Webgraph should remove deleted pages from segment input Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1226409 - in /nutch/trunk: CHANGES.txt conf/schema-solr4.xml conf/schema.xml conf/solrindex-mapping.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.

2012-01-02 Thread markus
Author: markus Date: Mon Jan 2 13:16:59 2012 New Revision: 1226409 URL: http://svn.apache.org/viewvc?rev=1226409&view=rev Log: NUTCH-1232 Remove site field from index-basic Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/schema-solr4.xml nutch/trunk/conf/schema.xml nutch/trunk

svn commit: r1229544 - in /nutch/trunk: ./ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/indexer/solr/

2012-01-10 Thread markus
Author: markus Date: Tue Jan 10 13:57:29 2012 New Revision: 1229544 URL: http://svn.apache.org/viewvc?rev=1229544&view=rev Log: NUTCH-1139 Indexer to delete gone documents Added: nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexAction.java Modified: nutch/trunk/CHANGES.txt

svn commit: r1231168 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Generator.java

2012-01-13 Thread markus
Author: markus Date: Fri Jan 13 16:43:42 2012 New Revision: 1231168 URL: http://svn.apache.org/viewvc?rev=1231168&view=rev Log: NUTCH-1248 Generator to select on status Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java Modified: nutch/trunk

svn commit: r1236674 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/fetcher/Fetcher.java

2012-01-27 Thread markus
Author: markus Date: Fri Jan 27 13:11:47 2012 New Revision: 1236674 URL: http://svn.apache.org/viewvc?rev=1236674&view=rev Log: NUTCH-1260 Fetcher should log fetching of redirects Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Modified: nutch

svn commit: r1238663 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/parse/ParseOutputFormat.java src/java/org/apache/nutch/parse/ParseSegment.java

2012-01-31 Thread markus
Author: markus Date: Tue Jan 31 15:24:37 2012 New Revision: 1238663 URL: http://svn.apache.org/viewvc?rev=1238663&view=rev Log: NUTCH-1242 Allow disabling of URL Filters in ParseSegment Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org

svn commit: r1298394 - /nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist/DomainBlacklistURLFilter.java

2012-03-08 Thread markus
Author: markus Date: Thu Mar 8 13:52:54 2012 New Revision: 1298394 URL: http://svn.apache.org/viewvc?rev=1298394&view=rev Log: NUTCH-1305 Domain(blacklist)URLFilter to trim entries Modified: nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfilter/domainblacklist

svn commit: r1305355 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2012-03-26 Thread markus
Author: markus Date: Mon Mar 26 13:44:57 2012 New Revision: 1305355 URL: http://svn.apache.org/viewvc?rev=1305355&view=rev Log: NUTCH-1234 Upgrade to Tika 1.1 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc

svn commit: r1305381 - /nutch/trunk/src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/DomainURLFilter.java

2012-03-26 Thread markus
Author: markus Date: Mon Mar 26 14:51:02 2012 New Revision: 1305381 URL: http://svn.apache.org/viewvc?rev=1305381&view=rev Log: NUTCH-1305 DomainFilter missing Modified: nutch/trunk/src/plugin/urlfilter-domain/src/java/org/apache/nutch/urlfilter/domain/DomainURLFilter.java Modified: nutch

svn commit: r1306440 - /nutch/trunk/src/plugin/parse-tika/ivy.xml

2012-03-28 Thread markus
Author: markus Date: Wed Mar 28 15:48:23 2012 New Revision: 1306440 URL: http://svn.apache.org/viewvc?rev=1306440&view=rev Log: Upgrade to Tika 1.1 Modified: nutch/trunk/src/plugin/parse-tika/ivy.xml Modified: nutch/trunk/src/plugin/parse-tika/ivy.xml URL: http://svn.apache.org/viewvc/nutch

svn commit: r1347747 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/util/URLUtil.java src/java/org/apache/nutch/util/domain/DomainStatistics.java

2012-06-07 Thread markus
Author: markus Date: Thu Jun 7 18:21:39 2012 New Revision: 1347747 URL: http://svn.apache.org/viewvc?rev=1347747&view=rev Log: NUTCH-1351 DomainStatistics to aggregate by TLD Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/util/URLUtil.java nutch/trunk/src

svn commit: r1347755 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java src/java/org/apache/nutch/parse/ParserChecker.java src/java/org/apache/nutch/util/URLU

2012-06-07 Thread markus
Author: markus Date: Thu Jun 7 18:48:58 2012 New Revision: 1347755 URL: http://svn.apache.org/viewvc?rev=1347755&view=rev Log: NUTCH-1320 IndexChecker and ParseChecker choke on IDN's Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer
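NUTCH-1320 fixes IndexChecker and ParseChecker choking on internationalized domain names (IDNs). One common remedy is to convert the host to its ASCII (Punycode) form with the JDK's `java.net.IDN.toASCII` before further processing. It is shown here as a sketch, not as the change made to URLUtil. `IdnSketch` and `toAsciiUrl` are hypothetical names. For simplicity the host is located by hand, assuming a `scheme://host[:port]/...` URL with no user-info part and no IPv6 literal.

```java
import java.net.IDN;

// Hypothetical sketch of IDN-to-ASCII host conversion; not Nutch's URLUtil change.
final class IdnSketch {

    /** Rewrites the host of a scheme://host[:port]/... URL to its ASCII (Punycode) form. */
    static String toAsciiUrl(String url) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Expected scheme://host...: " + url);
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = hostStart;
        // Assumption: the host ends at the first '/', '?', '#' or ':' (port separator).
        while (hostEnd < url.length() && "/?#:".indexOf(url.charAt(hostEnd)) < 0) {
            hostEnd++;
        }
        String host = url.substring(hostStart, hostEnd);
        // IDN.toASCII leaves pure-ASCII labels alone and Punycode-encodes the rest.
        return url.substring(0, hostStart) + IDN.toASCII(host) + url.substring(hostEnd);
    }

    public static void main(String[] args) {
        // Prints http://xn--bcher-kva.example/a?b=1
        System.out.println(toAsciiUrl("http://b\u00fccher.example/a?b=1"));
    }
}
```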

svn commit: r1347897 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/fetcher/Fetcher.java

2012-06-08 Thread markus
Author: markus Date: Fri Jun 8 07:03:38 2012 New Revision: 1347897 URL: http://svn.apache.org/viewvc?rev=1347897&view=rev Log: NUTCH-1346 Follow outlinks to ignore external Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache/nutch

svn commit: r1347909 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/java/org/apache/nutch/indexer/IndexerMapReduce.java

2012-06-08 Thread markus
Author: markus Date: Fri Jun 8 07:37:42 2012 New Revision: 1347909 URL: http://svn.apache.org/viewvc?rev=1347909&view=rev Log: NUTCH-1336 Optionally not index db_notmodified pages Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/java/org/apache

svn commit: r1348764 - in /nutch/trunk: ./ src/java/org/apache/nutch/indexer/ src/java/org/apache/nutch/net/ src/java/org/apache/nutch/parse/ src/java/org/apache/nutch/scoring/

2012-06-11 Thread markus
Author: markus Date: Mon Jun 11 09:28:14 2012 New Revision: 1348764 URL: http://svn.apache.org/viewvc?rev=1348764&view=rev Log: NUTCH-1385 More robust plug-in order properties in nutch-site.xml Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer

svn commit: r1348766 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/parse/ParseSegment.java

2012-06-11 Thread markus
Author: markus Date: Mon Jun 11 09:30:26 2012 New Revision: 1348766 URL: http://svn.apache.org/viewvc?rev=1348766&view=rev Log: NUTCH-1384 Typo in ParseSegments's run-method Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java Modified: nutch

svn commit: r1349236 - in /nutch/trunk: ./ conf/ src/plugin/urlnormalizer-host/ src/plugin/urlnormalizer-host/data/ src/plugin/urlnormalizer-host/src/ src/plugin/urlnormalizer-host/src/java/ src/plugi

2012-06-12 Thread markus
Author: markus Date: Tue Jun 12 10:33:18 2012 New Revision: 1349236 URL: http://svn.apache.org/viewvc?rev=1349236&view=rev Log: NUTCH-1319 HostNormalizer plugin Added: nutch/trunk/conf/host-urlnormalizer.txt nutch/trunk/src/plugin/urlnormalizer-host/ nutch/trunk/src/plugin

svn commit: r1353118 - /nutch/trunk/src/plugin/build.xml

2012-06-23 Thread markus
Author: markus Date: Sat Jun 23 12:15:04 2012 New Revision: 1353118 URL: http://svn.apache.org/viewvc?rev=1353118&view=rev Log: NUTCH-1319 HostNormalizer was not properly added to plugins/build.xml Modified: nutch/trunk/src/plugin/build.xml Modified: nutch/trunk/src/plugin/build.xml URL

svn commit: r1353582 - in /nutch/trunk: CHANGES.txt src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/RobotRulesParser.java

2012-06-25 Thread markus
Author: markus Date: Mon Jun 25 14:42:05 2012 New Revision: 1353582 URL: http://svn.apache.org/viewvc?rev=1353582&view=rev Log: NUTCH-1408 RobotRulesParser main doesn't take URL's Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http

svn commit: r1353585 - in /nutch/trunk: CHANGES.txt conf/nutch-default.xml src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java

2012-06-25 Thread markus
Author: markus Date: Mon Jun 25 14:49:03 2012 New Revision: 1353585 URL: http://svn.apache.org/viewvc?rev=1353585&view=rev Log: NUTCH-1407 BasicIndexingFilter to optionally add domain field Modified: nutch/trunk/CHANGES.txt nutch/trunk/conf/nutch-default.xml nutch/trunk/src/plugin

svn commit: r1353857 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java

2012-06-26 Thread markus
Author: markus Date: Tue Jun 26 08:21:58 2012 New Revision: 1353857 URL: http://svn.apache.org/viewvc?rev=1353857&view=rev Log: NUTCH-1251 SolrDedup to use proper Lucene catch-all query Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/solr

svn commit: r1353884 - /nutch/trunk/src/test/org/apache/nutch/net/TestURLNormalizers.java

2012-06-26 Thread markus
Author: markus Date: Tue Jun 26 09:18:40 2012 New Revision: 1353884 URL: http://svn.apache.org/viewvc?rev=1353884&view=rev Log: NUTCH-1319 adding test to accommodate HostURLNormalizer Modified: nutch/trunk/src/test/org/apache/nutch/net/TestURLNormalizers.java Modified: nutch/trunk/src/test

svn commit: r1363793 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java src/java/org/apache/nutch/crawl/Injector.java src/java/org/apache/nutch/metadata/Nutch.ja

2012-07-20 Thread markus
Author: markus Date: Fri Jul 20 14:22:19 2012 New Revision: 1363793 URL: http://svn.apache.org/viewvc?rev=1363793&view=rev Log: NUTCH-1388 Optionally maintain custom fetch interval despite AdaptiveFetchSchedule Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1401225 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDb.java

2012-10-23 Thread markus
Author: markus Date: Tue Oct 23 09:45:29 2012 New Revision: 1401225 URL: http://svn.apache.org/viewvc?rev=1401225&view=rev Log: NUTCH-1215 UpdateDB should not require segment as input Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDb.java Modified

svn commit: r1406077 - in /nutch/branches/2.x: CHANGES.txt src/java/org/apache/nutch/indexer/solr/SolrWriter.java

2012-11-06 Thread markus
Author: markus Date: Tue Nov 6 09:17:38 2012 New Revision: 1406077 URL: http://svn.apache.org/viewvc?rev=1406077&view=rev Log: NUTCH-1491 Strip UTF-8 non-character codepoints in title Modified: nutch/branches/2.x/CHANGES.txt nutch/branches/2.x/src/java/org/apache/nutch/indexer/solr

svn commit: r1433900 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/NutchField.java

2013-01-16 Thread markus
Author: markus Date: Wed Jan 16 11:10:09 2013 New Revision: 1433900 URL: http://svn.apache.org/viewvc?rev=1433900&view=rev Log: Implement read/write in NutchField Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/NutchField.java Modified: nutch/trunk

svn commit: r1488879 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml

2013-06-03 Thread markus
Author: markus Date: Mon Jun 3 08:02:35 2013 New Revision: 1488879 URL: http://svn.apache.org/r1488879 Log: NUTCH-1578 Upgrade to Hadoop 1.2.0 Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/nutch/trunk

svn commit: r1492215 - in /nutch/branches/2.x: CHANGES.txt ivy/ivy.xml

2013-06-12 Thread markus
Author: markus Date: Wed Jun 12 14:19:39 2013 New Revision: 1492215 URL: http://svn.apache.org/r1492215 Log: NUTCH-1578 Upgrade to Hadoop 1.2.0 Modified: nutch/branches/2.x/CHANGES.txt nutch/branches/2.x/ivy/ivy.xml Modified: nutch/branches/2.x/CHANGES.txt URL: http://svn.apache.org

svn commit: r1492639 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java src/java/org/apache/nutch/tools/FreeGenerator.java

2013-06-13 Thread markus
Author: markus Date: Thu Jun 13 12:10:37 2013 New Revision: 1492639 URL: http://svn.apache.org/r1492639 Log: NUTCH-1430 Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl

svn commit: r1494496 - in /nutch/trunk: ./ conf/ src/plugin/ src/plugin/indexer-elastic/ src/plugin/indexer-elastic/src/ src/plugin/indexer-elastic/src/java/ src/plugin/indexer-elastic/src/java/org/ s

2013-06-19 Thread markus
Author: markus Date: Wed Jun 19 08:31:28 2013 New Revision: 1494496 URL: http://svn.apache.org/r1494496 Log: NUTCH-1527 Elasticsearch indexer Added: nutch/trunk/src/plugin/indexer-elastic/ nutch/trunk/src/plugin/indexer-elastic/build.xml nutch/trunk/src/plugin/indexer-elastic/ivy.xml

svn commit: r1494893 - /nutch/trunk/ivy/ivy.xml

2013-06-20 Thread markus
Author: markus Date: Thu Jun 20 09:06:34 2013 New Revision: 1494893 URL: http://svn.apache.org/r1494893 Log: NUTCH-1527 ES dep in ivy missing Modified: nutch/trunk/ivy/ivy.xml Modified: nutch/trunk/ivy/ivy.xml URL: http://svn.apache.org/viewvc/nutch/trunk/ivy/ivy.xml?rev=1494893r1

svn commit: r1494894 - /nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

2013-06-20 Thread markus
Author: markus Date: Thu Jun 20 09:07:12 2013 New Revision: 1494894 URL: http://svn.apache.org/r1494894 Log: NUTCH-1583 Headings plugin to support multivalued headings Modified: nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java Modified

svn commit: r1496023 - in /nutch/branches/2.x: ./ src/plugin/ src/plugin/urlfilter-prefix/src/test/ src/plugin/urlfilter-prefix/src/test/org/ src/plugin/urlfilter-prefix/src/test/org/apache/ src/plugi

2013-06-24 Thread markus
Author: markus Date: Mon Jun 24 13:12:59 2013 New Revision: 1496023 URL: http://svn.apache.org/r1496023 Log: NUTCH-1126 JUnit test for urlfilter-prefix Added: nutch/branches/2.x/src/plugin/urlfilter-prefix/src/test/ nutch/branches/2.x/src/plugin/urlfilter-prefix/src/test/org/ nutch

svn commit: r1496025 - in /nutch/trunk: ./ src/plugin/ src/plugin/urlfilter-prefix/src/test/ src/plugin/urlfilter-prefix/src/test/org/ src/plugin/urlfilter-prefix/src/test/org/apache/ src/plugin/urlfi

2013-06-24 Thread markus
Author: markus Date: Mon Jun 24 13:14:14 2013 New Revision: 1496025 URL: http://svn.apache.org/r1496025 Log: NUTCH-1126 JUnit test for urlfilter-prefix Added: nutch/trunk/src/plugin/urlfilter-prefix/src/test/ nutch/trunk/src/plugin/urlfilter-prefix/src/test/org/ nutch/trunk/src

svn commit: r1498346 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/segment/SegmentMerger.java

2013-07-01 Thread markus
Author: markus Date: Mon Jul 1 10:03:12 2013 New Revision: 1498346 URL: http://svn.apache.org/r1498346 Log: NUTCH-1593 Normalize option missing in SegmentMerger's usage Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/segment/SegmentMerger.java Modified: nutch

svn commit: r1498830 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/CrawlDbReader.java

2013-07-02 Thread markus
Author: markus Date: Tue Jul 2 08:36:13 2013 New Revision: 1498830 URL: http://svn.apache.org/r1498830 Log: NUTCH-1327 QueryStringNormalizer Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java Modified: nutch/trunk/CHANGES.txt URL: http

svn commit: r1498832 - in /nutch/trunk: ./ src/plugin/ src/plugin/urlnormalizer-querystring/ src/plugin/urlnormalizer-querystring/src/ src/plugin/urlnormalizer-querystring/src/java/ src/plugin/urlnorm

2013-07-02 Thread markus
Author: markus Date: Tue Jul 2 08:37:40 2013 New Revision: 1498832 URL: http://svn.apache.org/r1498832 Log: NUTCH-1581 CrawlDB csv output to include metadata Added: nutch/trunk/src/plugin/urlnormalizer-querystring/ nutch/trunk/src/plugin/urlnormalizer-querystring/build.xml nutch

svn commit: r1499684 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/crawl/Injector.java

2013-07-04 Thread markus
Author: markus Date: Thu Jul 4 08:50:25 2013 New Revision: 1499684 URL: http://svn.apache.org/r1499684 Log: NUTCH-1600 Injector overwrite does not always work properly Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/crawl/Injector.java Modified: nutch/trunk

svn commit: r1499696 - in /nutch/trunk: CHANGES.txt src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

2013-07-04 Thread markus
Author: markus Date: Thu Jul 4 09:07:12 2013 New Revision: 1499696 URL: http://svn.apache.org/r1499696 Log: NUTCH-1597 HeadingsParseFilter to trim and remove excess whitespace Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings

svn commit: r1499722 - in /nutch/trunk: CHANGES.txt src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

2013-07-04 Thread markus
Author: markus Date: Thu Jul 4 11:13:34 2013 New Revision: 1499722 URL: http://svn.apache.org/r1499722 Log: NUTCH-1596 HeadingsParseFilter not thread safe Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin/headings/src/java/org/apache/nutch/parse/headings/HeadingsParseFilter.java

svn commit: r1499948 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/segment/SegmentMerger.java

2013-07-05 Thread markus
Author: markus Date: Fri Jul 5 08:52:51 2013 New Revision: 1499948 URL: http://svn.apache.org/r1499948 Log: NUTCH-1520 SegmentMerger loses records Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/segment/SegmentMerger.java Modified: nutch/trunk/CHANGES.txt URL

svn commit: r1499952 - in /nutch/trunk: CHANGES.txt conf/elasticsearch.conf src/plugin/indexer-elastic/src/java/org/apache/nutch/indexwriter/elastic/ElasticIndexWriter.java

2013-07-05 Thread markus
Author: markus Date: Fri Jul 5 09:03:50 2013 New Revision: 1499952 URL: http://svn.apache.org/r1499952 Log: NUTCH-1598 ElasticSearchIndexer to read ImmutableSettings from config Added: nutch/trunk/conf/elasticsearch.conf Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/plugin

svn commit: r1499959 - in /nutch/branches/2.x: CHANGES.txt ivy/ivy.xml src/plugin/parse-tika/howto_upgrade_tika.txt src/plugin/parse-tika/ivy.xml src/plugin/parse-tika/plugin.xml

2013-07-05 Thread markus
Author: markus Date: Fri Jul 5 10:27:47 2013 New Revision: 1499959 URL: http://svn.apache.org/r1499959 Log: NUTCH-1595 Upgrade to Tika 1.4 (jnioche, markus) Added: nutch/branches/2.x/src/plugin/parse-tika/howto_upgrade_tika.txt Modified: nutch/branches/2.x/CHANGES.txt nutch/branches

svn commit: r1499960 - in /nutch/trunk: CHANGES.txt ivy/ivy.xml src/plugin/parse-tika/howto_upgrade_tika.txt src/plugin/parse-tika/ivy.xml src/plugin/parse-tika/plugin.xml

2013-07-05 Thread markus
Author: markus Date: Fri Jul 5 10:28:46 2013 New Revision: 1499960 URL: http://svn.apache.org/r1499960 Log: NUTCH-1595 Upgrade to Tika 1.4 Added: nutch/trunk/src/plugin/parse-tika/howto_upgrade_tika.txt Modified: nutch/trunk/CHANGES.txt nutch/trunk/ivy/ivy.xml nutch/trunk/src

svn commit: r1528072 - in /nutch/trunk: CHANGES.txt src/java/org/apache/nutch/indexer/IndexerMapReduce.java

2013-10-01 Thread markus
Author: markus Date: Tue Oct 1 12:50:06 2013 New Revision: 1528072 URL: http://svn.apache.org/r1528072 Log: NUTCH-1646 IndexerMapReduce to consider DB status Modified: nutch/trunk/CHANGES.txt nutch/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java Modified: nutch/trunk