Author: markus
Date: Thu May 5 13:48:56 2011
New Revision: 1099802
URL: http://svn.apache.org/viewvc?rev=1099802&view=rev
Log:
NUTCH-989 Index-basic plugin and Solr schema now use date fieldType for tstamp
field
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/schema.xml
Author: markus
Date: Tue May 10 00:44:42 2011
New Revision: 1101279
URL: http://svn.apache.org/viewvc?rev=1101279&view=rev
Log:
NUTCH-996 Indexer adds solr.commit.size+1 docs
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrWriter.java
Author: markus
Date: Tue May 10 00:46:04 2011
New Revision: 1101280
URL: http://svn.apache.org/viewvc?rev=1101280&view=rev
Log:
NUTCH-996 Indexer adds solr.commit.size+1 docs
Modified:
nutch/branches/branch-1.3/CHANGES.txt
nutch/branches/branch-1.3/src/java/org/apache/nutch/indexer/
Author: markus
Date: Mon May 23 10:17:14 2011
New Revision: 1126417
URL: http://svn.apache.org/viewvc?rev=1126417&view=rev
Log:
NUTCH-994 Fine tune Solr schema
Modified:
nutch/branches/branch-1.3/CHANGES.txt
nutch/branches/branch-1.3/conf/schema.xml
Author: markus
Date: Mon May 23 10:48:59 2011
New Revision: 1126425
URL: http://svn.apache.org/viewvc?rev=1126425&view=rev
Log:
NUTCH-994 Fine tune Solr schema
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/schema.xml
Author: markus
Date: Fri Jun 24 13:56:27 2011
New Revision: 1139307
URL: http://svn.apache.org/viewvc?rev=1139307&view=rev
Log:
NUTCH-1010 ContentLength not trimmed
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/plugin/index-more/src/java/org/apache/n
Author: markus
Date: Fri Jun 24 13:56:42 2011
New Revision: 1139308
URL: http://svn.apache.org/viewvc?rev=1139308&view=rev
Log:
NUTCH-1010 ContentLength not trimmed
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/
Author: markus
Date: Fri Jun 24 14:37:57 2011
New Revision: 1139329
URL: http://svn.apache.org/viewvc?rev=1139329&view=rev
Log:
NUTCH-1006 MetaEquiv with single quotes not accepted
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/plugin/parse-html/src/
Author: markus
Date: Fri Jun 24 14:38:44 2011
New Revision: 1139331
URL: http://svn.apache.org/viewvc?rev=1139331&view=rev
Log:
NUTCH-1006 MetaEquiv with single quotes not accepted
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/
Author: markus
Date: Fri Jun 24 15:35:12 2011
New Revision: 1139357
URL: http://svn.apache.org/viewvc?rev=1139357&view=rev
Log:
NUTCH-1000 Add option not to commit to Solr
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/
Author: markus
Date: Mon Jun 27 11:40:53 2011
New Revision: 1140116
URL: http://svn.apache.org/viewvc?rev=1140116&view=rev
Log:
NUTCH-295 Description for fetcher.threads.fetch property
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default
Author: markus
Date: Mon Jun 27 11:41:22 2011
New Revision: 1140117
URL: http://svn.apache.org/viewvc?rev=1140117&view=rev
Log:
NUTCH-295 Description for fetcher.threads.fetch property
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
Author: markus
Date: Tue Jun 28 13:54:21 2011
New Revision: 1140619
URL: http://svn.apache.org/viewvc?rev=1140619&view=rev
Log:
NUTCH-1022 Upgrade version number of Nutch agent in conf
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default
Author: markus
Date: Tue Jun 28 15:26:20 2011
New Revision: 1140685
URL: http://svn.apache.org/viewvc?rev=1140685&view=rev
Log:
NUTCH-1000 Method overrides for indexer and dedup
Modified:
nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
n
Author: markus
Date: Tue Jun 28 15:59:47 2011
New Revision: 1140695
URL: http://svn.apache.org/viewvc?rev=1140695&view=rev
Log:
NUTCH-1012 Cannot handle illegal charset
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutch/
Author: markus
Date: Tue Jun 28 16:03:28 2011
New Revision: 1140696
URL: http://svn.apache.org/viewvc?rev=1140696&view=rev
Log:
NUTCH-1012 Cannot handle illegal charset
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/util/EncodingDetector.java
Author: markus
Date: Thu Jun 30 12:13:26 2011
New Revision: 1141500
URL: http://svn.apache.org/viewvc?rev=1141500&view=rev
Log:
NUTCH-1016 Strip UTF-8 non-character codepoints
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/log4j.properties
n
Author: markus
Date: Mon Jul 4 13:44:57 2011
New Revision: 1142664
URL: http://svn.apache.org/viewvc?rev=1142664&view=rev
Log:
NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/pl
Author: markus
Date: Mon Jul 4 14:28:17 2011
New Revision: 1142687
URL: http://svn.apache.org/viewvc?rev=1142687&view=rev
Log:
NUTCH-1013 Migrate RegexURLNormalizer from Apache ORO to java.util.regex
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/urlnormalizer-regex/src/
Author: markus
Date: Wed Jul 6 15:34:43 2011
New Revision: 1143467
URL: http://svn.apache.org/viewvc?rev=1143467&view=rev
Log:
NUTCH-1011 Remove duplicate slashes from URLs
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/regex-normalize.xml.temp
Author: markus
Date: Wed Jul 6 15:35:51 2011
New Revision: 1143468
URL: http://svn.apache.org/viewvc?rev=1143468&view=rev
Log:
NUTCH-1011 Remove duplicate slashes from URLs
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/regex-normalize.xml.template
nutch/trunk/src/test
Author: markus
Date: Mon Jul 11 10:22:37 2011
New Revision: 1145109
URL: http://svn.apache.org/viewvc?rev=1145109&view=rev
Log:
NUTCH-1030 WebgraphDB program requires manually added directories
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/ap
Author: markus
Date: Mon Jul 11 10:30:20 2011
New Revision: 1145110
URL: http://svn.apache.org/viewvc?rev=1145110&view=rev
Log:
NUTCH-1030 Updating log4j.properties as well
Modified:
nutch/branches/branch-1.4/conf/log4j.properties
Author: markus
Date: Mon Jul 11 10:44:56 2011
New Revision: 1145117
URL: http://svn.apache.org/viewvc?rev=1145117&view=rev
Log:
NUTCH-783 IndexingFiltersChecker utility added
Added:
nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
Modified:
n
Author: markus
Date: Mon Jul 11 11:57:47 2011
New Revision: 1145130
URL: http://svn.apache.org/viewvc?rev=1145130&view=rev
Log:
NUTCH-1027 Degrade log level of 'can't find rules for scope'
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/p
Author: markus
Date: Mon Jul 11 11:58:00 2011
New Revision: 1145131
URL: http://svn.apache.org/viewvc?rev=1145131&view=rev
Log:
NUTCH-1027 Degrade log level of 'can't find rules for scope'
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/urlnormalizer-regex/
Author: markus
Date: Wed Jul 13 13:59:11 2011
New Revision: 1146035
URL: http://svn.apache.org/viewvc?rev=1146035&view=rev
Log:
NUTCH-987, NUTCH-1036 Solr HTTP auth support and Hadoop reporter counter
increments
Added:
nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/
Author: markus
Date: Wed Jul 13 14:05:47 2011
New Revision: 1146043
URL: http://svn.apache.org/viewvc?rev=1146043&view=rev
Log:
NUTCH-987 Constants for HTTP auth for Solr
Modified:
nutch/branches/branch-1.4/src/java/org/apache/nutch/indexer/solr/SolrConstants.java
Modified:
nutch/bran
Author: markus
Date: Sun Jul 17 14:01:51 2011
New Revision: 1147615
URL: http://svn.apache.org/viewvc?rev=1147615&view=rev
Log:
NUTCH-1029 ReadDB throws EOFException
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutch/c
Author: markus
Date: Tue Jul 19 12:49:58 2011
New Revision: 1148301
URL: http://svn.apache.org/viewvc?rev=1148301&view=rev
Log:
NUTCH-1050 Add segmentDir to WebGraph
Modified:
nutch/branches/branch-1.4/conf/log4j.properties
nutch/branches/branch-1.4/src/java/org/apache/nutch/sco
Author: markus
Date: Tue Jul 19 13:01:45 2011
New Revision: 1148305
URL: http://svn.apache.org/viewvc?rev=1148305&view=rev
Log:
NUTCH-1037 Option to deduplicate anchors prior to indexing
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default
Author: markus
Date: Tue Jul 19 13:12:43 2011
New Revision: 1148308
URL: http://svn.apache.org/viewvc?rev=1148308&view=rev
Log:
NUTCH-1037 Option to deduplicate anchors prior to indexing
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/pl
Author: markus
Date: Tue Jul 19 15:40:34 2011
New Revision: 1148406
URL: http://svn.apache.org/viewvc?rev=1148406&view=rev
Log:
NUTCH-1057 Fetcher thread time out configurable
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default.xml
n
Author: markus
Date: Wed Aug 10 12:26:49 2011
New Revision: 1156132
URL: http://svn.apache.org/viewvc?rev=1156132&view=rev
Log:
NUTCH-1028 Log urls when parsing
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutch/parse/ParseSegment.
Author: markus
Date: Thu Aug 11 16:38:58 2011
New Revision: 1156665
URL: http://svn.apache.org/viewvc?rev=1156665&view=rev
Log:
NUTCH-1069 Readlinkdb broken on Hadoop > 0.20
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutc
Author: markus
Date: Tue Aug 16 11:58:12 2011
New Revision: 1158214
URL: http://svn.apache.org/viewvc?rev=1158214&view=rev
Log:
NUTCH-1004 Do not index empty values for title field
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/plugin/index-basic/src/
Author: markus
Date: Tue Aug 16 11:59:01 2011
New Revision: 1158215
URL: http://svn.apache.org/viewvc?rev=1158215&view=rev
Log:
NUTCH-1004 Do not index empty values for title field
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/ind
Author: markus
Date: Tue Aug 16 12:03:30 2011
New Revision: 1158218
URL: http://svn.apache.org/viewvc?rev=1158218&view=rev
Log:
NUTCH-1082 IndexingFiltersChecker does not list multi valued fields
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java
Author: markus
Date: Tue Aug 16 16:28:43 2011
New Revision: 1158357
URL: http://svn.apache.org/viewvc?rev=1158357&view=rev
Log:
NUTCH-1051 Export WebGraph node scores for Solr.ExternalFileField
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/ap
Author: markus
Date: Thu Aug 18 13:25:13 2011
New Revision: 1159207
URL: http://svn.apache.org/viewvc?rev=1159207&view=rev
Log:
NUTCH-1049 Add classes to bin/nutch script
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/bin/nutch
Author: markus
Date: Fri Sep 9 11:13:54 2011
New Revision: 1167096
URL: http://svn.apache.org/viewvc?rev=1167096&view=rev
Log:
NUTCH-1101 Option to purge db_gone records from CrawlDB
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default
Author: markus
Date: Mon Sep 12 12:16:53 2011
New Revision: 1169707
URL: http://svn.apache.org/viewvc?rev=1169707&view=rev
Log:
NUTCH-1105 Max content length option for index-basic
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/conf/nutch-default.xml
n
Author: markus
Date: Tue Sep 13 18:15:17 2011
New Revision: 1170282
URL: http://svn.apache.org/viewvc?rev=1170282&view=rev
Log:
NUTCH-1110 UpdateDB must not write _SUCCESS file
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java/org/apache/nutch/c
Author: markus
Date: Wed Sep 14 10:59:24 2011
New Revision: 1170526
URL: http://svn.apache.org/viewvc?rev=1170526&view=rev
Log:
NUTCH-1067 Configure minimum throughput for fetcher and NUTCH-1102 Fetcher to
rely on fetcher.parse directive
Modified:
nutch/branches/branch-1.4/CHANGES
Author: markus
Date: Wed Sep 14 12:13:42 2011
New Revision: 1170557
URL: http://svn.apache.org/viewvc?rev=1170557&view=rev
Log:
NUTCH-1067,NUTCH-1102 Fixes for Benchmark, Crawl and TestFetcher
Modified:
nutch/branches/branch-1.4/src/java/org/apache/nutch/crawl/Crawl.java
nutch/bran
Author: markus
Date: Mon Sep 19 12:12:17 2011
New Revision: 1172585
URL: http://svn.apache.org/viewvc?rev=1172585&view=rev
Log:
NUTCH-1067 Nutch-default configuration directives missing
Modified:
nutch/branches/branch-1.4/conf/nutch-default.xml
Author: markus
Date: Mon Sep 19 14:14:05 2011
New Revision: 1172637
URL: http://svn.apache.org/viewvc?rev=1172637&view=rev
Log:
NUTCH-1114 Attr file missing in domain filter
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/plugin/urlfilter-domain/plugin
Author: markus
Date: Thu Sep 22 14:02:51 2011
New Revision: 1174147
URL: http://svn.apache.org/viewvc?rev=1174147&view=rev
Log:
NUTCH-1115 Option to disable fixing of URL embedded parameters in
DomContentUtils
Modified:
nutch/branches/branch-1.4/conf/nutch-default.xml
nutch/bran
Author: markus
Date: Thu Sep 22 15:45:25 2011
New Revision: 1174222
URL: http://svn.apache.org/viewvc?rev=1174222&view=rev
Log:
Recommitted the CHANGELOG entry for NUTCH-1115, which was overwritten by the NUTCH-1078 commit
Modified:
nutch/branches/branch-1.4/CHANGES.txt
Author: markus
Date: Fri Sep 23 12:09:35 2011
New Revision: 1174689
URL: http://svn.apache.org/viewvc?rev=1174689&view=rev
Log:
NUTCH-1074 topN is ignored with maxNumSegments and generate.max.count
Modified:
nutch/branches/branch-1.4/CHANGES.txt
nutch/branches/branch-1.4/src/java
Author: markus
Date: Mon Oct 3 10:57:33 2011
New Revision: 1178376
URL: http://svn.apache.org/viewvc?rev=1178376&view=rev
Log:
NUTCH-1137 LinkDB other options ignored with -dir
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
Author: markus
Date: Mon Oct 3 13:25:18 2011
New Revision: 1178409
URL: http://svn.apache.org/viewvc?rev=1178409&view=rev
Log:
NUTCH-1058 Upgrade Solr schema to version 1.4
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/schema.xml
Author: markus
Date: Mon Oct 3 13:25:49 2011
New Revision: 1178410
URL: http://svn.apache.org/viewvc?rev=1178410&view=rev
Log:
NUTCH-1058 Upgrade Solr schema to version 1.4
Modified:
nutch/branches/nutchgora/CHANGES.txt
nutch/branches/nutchgora/conf/schema.xml
Author: markus
Date: Thu Nov 10 14:27:53 2011
New Revision: 1200344
URL: http://svn.apache.org/viewvc?rev=1200344&view=rev
Log:
NUTCH-1153 LinkRank not to log all keys and not to write Hadoop _SUCCESS file
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/sco
Author: markus
Date: Thu Nov 10 14:29:45 2011
New Revision: 1200346
URL: http://svn.apache.org/viewvc?rev=1200346&view=rev
Log:
NUTCH-1142 Normalization and filtering in WebGraph
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/scoring/webgraph/WebGraph.
Author: markus
Date: Thu Nov 10 14:31:33 2011
New Revision: 1200347
URL: http://svn.apache.org/viewvc?rev=1200347&view=rev
Log:
NUTCH-1178 Incorrect CSV header CrawlDatumCsvOutputFormat
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.
Author: markus
Date: Thu Nov 10 15:02:04 2011
New Revision: 1200360
URL: http://svn.apache.org/viewvc?rev=1200360&view=rev
Log:
NUTCH-1061 Migrate MoreIndexingFilter from Apache ORO to java.util.regex
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/index-more/plugin
Author: markus
Date: Thu Nov 10 15:16:23 2011
New Revision: 1200370
URL: http://svn.apache.org/viewvc?rev=1200370&view=rev
Log:
NUTCH-1155 Host/domain limit in generator is generate.max.count+1
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/Generator.
Author: markus
Date: Thu Nov 10 15:24:30 2011
New Revision: 1200377
URL: http://svn.apache.org/viewvc?rev=1200377&view=rev
Log:
NUTCH-1173 DomainStats doesn't count db_not_modified
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/util/domain/DomainStatis
Author: markus
Date: Fri Nov 11 11:55:21 2011
New Revision: 1200830
URL: http://svn.apache.org/viewvc?rev=1200830&view=rev
Log:
NUTCH-1180 UpdateDB to backup previous CrawlDB
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/java/org/apache/n
Author: markus
Date: Fri Nov 11 11:59:49 2011
New Revision: 1200833
URL: http://svn.apache.org/viewvc?rev=1200833&view=rev
Log:
NUTCH-1185 Decrease solr.commit.size to 250
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
Author: markus
Date: Fri Nov 11 12:00:20 2011
New Revision: 1200834
URL: http://svn.apache.org/viewvc?rev=1200834&view=rev
Log:
NUTCH-1185 Decrease solr.commit.size to 250
Modified:
nutch/branches/nutchgora/CHANGES.txt
nutch/branches/nutchgora/conf/nutch-default.xml
Author: markus
Date: Fri Nov 11 15:01:05 2011
New Revision: 1200912
URL: http://svn.apache.org/viewvc?rev=1200912&view=rev
Log:
NUTCH-1155 Fixes failing test
Modified:
nutch/trunk/src/test/org/apache/nutch/crawl/TestGenerator.java
Author: markus
Date: Fri Nov 11 15:16:49 2011
New Revision: 1200915
URL: http://svn.apache.org/viewvc?rev=1200915&view=rev
Log:
NUTCH-1203 ParseSegment to show number of milliseconds per parse
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.
Author: markus
Date: Fri Nov 11 15:19:28 2011
New Revision: 1200917
URL: http://svn.apache.org/viewvc?rev=1200917&view=rev
Log:
NUTCH-1174 Outlinks are not properly normalized
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/parse/Outlink.java
nutch/trunk
Author: markus
Date: Tue Nov 15 11:56:30 2011
New Revision: 1202143
URL: http://svn.apache.org/viewvc?rev=1202143&view=rev
Log:
NUTCH-1090 InvertLinks should inform when ignoring internal links
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.
Author: markus
Date: Mon Nov 21 13:42:16 2011
New Revision: 1204492
URL: http://svn.apache.org/viewvc?rev=1204492&view=rev
Log:
NUTCH-1207 ParserChecker to output signature
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/parse/ParserChecker.java
Author: markus
Date: Tue Nov 29 16:56:45 2011
New Revision: 1207967
URL: http://svn.apache.org/viewvc?rev=1207967&view=rev
Log:
NUTCH-1214 DomainStats tool should be named for what it's doing
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/ut
Author: markus
Date: Fri Dec 16 11:17:10 2011
New Revision: 1215090
URL: http://svn.apache.org/viewvc?rev=1215090&view=rev
Log:
NUTCH-1221 Migrate DomainStatistics to MapReduce API
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/util/domain/DomainStatistics.
Author: markus
Date: Mon Dec 19 15:12:53 2011
New Revision: 1220786
URL: http://svn.apache.org/viewvc?rev=1220786&view=rev
Log:
NUTCH-1222 Upgrade to new Hadoop 0.22.0
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/ivy/ivy.xml
Author: markus
Date: Mon Dec 19 15:15:43 2011
New Revision: 1220788
URL: http://svn.apache.org/viewvc?rev=1220788&view=rev
Log:
NUTCH-1225 Migrate CrawlDBScanner to MapReduce API
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java
Author: markus
Date: Tue Dec 20 10:11:09 2011
New Revision: 1221181
URL: http://svn.apache.org/viewvc?rev=1221181&view=rev
Log:
NUTCH-1184 Fetcher to parse and follow Nth degree outlinks
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/java
Author: markus
Date: Tue Dec 20 10:22:06 2011
New Revision: 1221185
URL: http://svn.apache.org/viewvc?rev=1221185&view=rev
Log:
NUTCH-1129 Add freegenerator, domainstats and crawldbscanner to log4j
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/log4j.properties
Author: markus
Date: Tue Dec 20 10:50:31 2011
New Revision: 1221194
URL: http://svn.apache.org/viewvc?rev=1221194&view=rev
Log:
Renamed FetcherStatus to FetcherOutlinks for the new outlinks section of
NUTCH-1184
Modified:
nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Author: markus
Date: Fri Dec 23 10:11:08 2011
New Revision: 1222627
URL: http://svn.apache.org/viewvc?rev=1222627&view=rev
Log:
Updated pom to reflect Hadoop upgrade
Modified:
nutch/trunk/pom.xml
Author: markus
Date: Tue Dec 27 14:36:27 2011
New Revision: 1224916
URL: http://svn.apache.org/viewvc?rev=1224916&view=rev
Log:
NUTCH-1230 and NUTCH-1231 Upgrade to Tika 1.0 and using new Tika detect API
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/ivy/ivy.xml
nutch/trunk/src/
Author: markus
Date: Tue Dec 27 13:22:50 2011
New Revision: 1224905
URL: http://svn.apache.org/viewvc?rev=1224905&view=rev
Log:
Reverting NUTCH-1225 CrawlDBScanner
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/tools/CrawlDBScanner.java
Author: markus
Date: Tue Dec 27 13:28:44 2011
New Revision: 1224906
URL: http://svn.apache.org/viewvc?rev=1224906&view=rev
Log:
NUTCH-1235 Upgrade to new Hadoop 0.20.205.0
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/ivy/ivy.xml
Author: markus
Date: Tue Dec 27 14:08:22 2011
New Revision: 1224912
URL: http://svn.apache.org/viewvc?rev=1224912&view=rev
Log:
NUTCH-1235, added Jackson ASL mapper as dep
Modified:
nutch/trunk/ivy/ivy.xml
Author: markus
Date: Thu Dec 29 14:35:31 2011
New Revision: 1225544
URL: http://svn.apache.org/viewvc?rev=1225544&view=rev
Log:
NUTCH-1238 Missed CHANGES.txt
Modified:
nutch/trunk/CHANGES.txt
Author: markus
Date: Thu Dec 29 14:32:50 2011
New Revision: 1225543
URL: http://svn.apache.org/viewvc?rev=1225543&view=rev
Log:
NUTCH-1238 Fetcher throughput threshold must start before feeder finished
Modified:
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/java/org/apache/n
Author: markus
Date: Mon Jan 2 13:11:50 2012
New Revision: 1226406
URL: http://svn.apache.org/viewvc?rev=1226406&view=rev
Log:
NUTCH-1239 Webgraph should remove deleted pages from segment input
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/c
Author: markus
Date: Mon Jan 2 13:16:59 2012
New Revision: 1226409
URL: http://svn.apache.org/viewvc?rev=1226409&view=rev
Log:
NUTCH-1232 Remove site field from index-basic
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/schema-solr4.xml
nutch/trunk/conf/schema.xml
n
Author: markus
Date: Mon Jan 9 16:01:27 2012
New Revision: 1229226
URL: http://svn.apache.org/viewvc?rev=1229226&view=rev
Log:
NUTCH-1244 CrawlDBDumper to filter by regex
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDbReader.java
Author: markus
Date: Tue Jan 10 13:57:29 2012
New Revision: 1229544
URL: http://svn.apache.org/viewvc?rev=1229544&view=rev
Log:
NUTCH-1139 Indexer to delete gone documents
Added:
nutch/trunk/src/java/org/apache/nutch/indexer/NutchIndexAction.java
Modified:
nutch/trunk/CHANGES
Author: markus
Date: Fri Jan 13 14:31:22 2012
New Revision: 1231090
URL: http://svn.apache.org/viewvc?rev=1231090&view=rev
Log:
NUTCH-1177 Generator to select on retry interval
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/java/org/apache/n
Author: markus
Date: Fri Jan 13 16:43:42 2012
New Revision: 1231168
URL: http://svn.apache.org/viewvc?rev=1231168&view=rev
Log:
NUTCH-1248 Generator to select on status
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
Author: markus
Date: Fri Jan 27 13:11:47 2012
New Revision: 1236674
URL: http://svn.apache.org/viewvc?rev=1236674&view=rev
Log:
NUTCH-1260 Fetcher should log fetching of redirects
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Author: markus
Date: Tue Jan 31 14:17:27 2012
New Revision: 1238590
URL: http://svn.apache.org/viewvc?rev=1238590&view=rev
Log:
NUTCH-1256 WebGraph to dump host + score. Most if not all WebGraph options have
been added to nutch-default as well.
Modified:
nutch/trunk/CHANGES.txt
n
Author: markus
Date: Tue Jan 31 15:24:37 2012
New Revision: 1238663
URL: http://svn.apache.org/viewvc?rev=1238663&view=rev
Log:
NUTCH-1242 Allow disabling of URL Filters in ParseSegment
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/java
Author: markus
Date: Tue Feb 7 13:25:46 2012
New Revision: 1241460
URL: http://svn.apache.org/viewvc?rev=1241460&view=rev
Log:
NUTCH-1005 Parse headings plugin
Added:
nutch/trunk/src/plugin/headings/
nutch/trunk/src/plugin/headings/build.xml
nutch/trunk/src/plugin/headings/ivy
Author: markus
Date: Thu Feb 9 09:55:08 2012
New Revision: 1242255
URL: http://svn.apache.org/viewvc?rev=1242255&view=rev
Log:
NUTCH-1266 Subcollection to optionally write to configured fields
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/subcollection/src/java/org/ap
Author: markus
Date: Thu Feb 23 12:32:49 2012
New Revision: 1292764
URL: http://svn.apache.org/viewvc?rev=1292764&view=rev
Log:
NUTCH-1210 Domain Blacklist Filter
Added:
nutch/trunk/conf/domainblacklist-urlfilter.txt
nutch/trunk/src/plugin/urlfilter-domainblacklist/
nutch/trunk
Author: markus
Date: Thu Feb 23 13:14:50 2012
New Revision: 1292790
URL: http://svn.apache.org/viewvc?rev=1292790&view=rev
Log:
NUTCH-1210 Domain Blacklist Filter added test to plugin/build.xml
Modified:
nutch/trunk/src/plugin/build.xml
Author: markus
Date: Wed Feb 29 14:12:36 2012
New Revision: 1295119
URL: http://svn.apache.org/viewvc?rev=1295119&view=rev
Log:
NUTCH-1291 Fetcher to stringify exception on // unexpected exception
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/fet
Author: markus
Date: Thu Mar 1 15:24:02 2012
New Revision: 1295614
URL: http://svn.apache.org/viewvc?rev=1295614&view=rev
Log:
NUTCH-1293 IndexingFiltersChecker to store detected content type in crawldatum
metadata
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/n
Author: markus
Date: Thu Mar 1 15:37:56 2012
New Revision: 1295624
URL: http://svn.apache.org/viewvc?rev=1295624&view=rev
Log:
NUTCH-1258 MoreIndexingFilter should be able to read Content-Type from both
parse metadata and content metadata
Modified:
nutch/trunk/CHANGES.txt
nutch/t
Author: markus
Date: Tue Mar 6 17:31:39 2012
New Revision: 1297586
URL: http://svn.apache.org/viewvc?rev=1297586&view=rev
Log:
NUTCH-1299 LinkRank inverter to ignore records without Node
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/scoring/webg
Author: markus
Date: Thu Mar 8 13:52:54 2012
New Revision: 1298394
URL: http://svn.apache.org/viewvc?rev=1298394&view=rev
Log:
NUTCH-1305 Domain(blacklist)URLFilter to trim entries
Modified:
nutch/trunk/src/plugin/urlfilter-domainblacklist/src/java/org/apache/nutch/urlfi
Author: markus
Date: Thu Mar 15 09:53:49 2012
New Revision: 1300871
URL: http://svn.apache.org/viewvc?rev=1300871&view=rev
Log:
NUTCH-1305 missing in CHANGES
Modified:
nutch/trunk/CHANGES.txt