Author: snagel
Date: Tue Sep 18 20:54:05 2012
New Revision: 1387357
URL: http://svn.apache.org/viewvc?rev=1387357&view=rev
Log:
NUTCH-1415 release packages to contain top level folder apache-nutch-x.x
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/build.xml
Modified: nutch/trunk
Author: snagel
Date: Wed Oct 10 21:06:27 2012
New Revision: 1396796
URL: http://svn.apache.org/viewvc?rev=1396796&view=rev
Log:
NUTCH-706 Url regex normalizer: pattern for session id removal not to match
newsId
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/regex
Author: snagel
Date: Wed Oct 10 21:54:37 2012
New Revision: 1396817
URL: http://svn.apache.org/viewvc?rev=1396817&view=rev
Log:
NUTCH-706 (applied correct patch)
Modified:
nutch/trunk/conf/regex-normalize.xml.template
nutch/trunk/src/plugin/urlnormalizer-regex/sample/regex-normalize
Author: snagel
Date: Tue Oct 23 20:47:16 2012
New Revision: 1401458
URL: http://svn.apache.org/viewvc?rev=1401458&view=rev
Log:
NUTCH-1344 BasicURLNormalizer to normalize https same as http - forgot to add
committer
Modified:
nutch/branches/2.x/CHANGES.txt
Modified: nutch/branches/2.x
Author: snagel
Date: Tue Oct 23 20:51:35 2012
New Revision: 1401459
URL: http://svn.apache.org/viewvc?rev=1401459&view=rev
Log:
NUTCH-1421 RegexURLNormalizer to only skip rules with invalid patterns
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/urlnormalizer-regex/src/java/org
Author: snagel
Date: Wed Mar 27 21:31:42 2013
New Revision: 1461854
URL: http://svn.apache.org/r1461854
Log:
parsechecker and indexchecker to report truncated content
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
nutch
Author: snagel
Date: Wed Mar 27 21:33:38 2013
New Revision: 1461857
URL: http://svn.apache.org/r1461857
Log:
parsechecker and indexchecker to report truncated content
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/indexer
Author: snagel
Date: Wed May 8 22:04:04 2013
New Revision: 1480484
URL: http://svn.apache.org/r1480484
Log:
NUTCH-956 solrindex issues
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/schema-solr4.xml
nutch/branches/2.x/conf/schema.xml
nutch/branches/2.x/src
Author: snagel
Date: Wed May 8 22:04:53 2013
New Revision: 1480485
URL: http://svn.apache.org/r1480485
Log:
NUTCH-956 solrindex issues
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/schema-solr4.xml
nutch/trunk/conf/schema.xml
nutch/trunk/src/plugin/index-more/src/java/org
Author: snagel
Date: Wed Jun 19 21:26:07 2013
New Revision: 1494776
URL: http://svn.apache.org/r1494776
Log:
NUTCH-1245 URL gone with 404 after db.fetch.interval.max stays db_unfetched in
CrawlDb and is generated over and over again
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java
Author: snagel
Date: Wed Jun 19 22:22:00 2013
New Revision: 1494785
URL: http://svn.apache.org/r1494785
Log:
NUTCH-1475 (fix after fix) fill field date with fetch time (as before) if
modified time is unset
Modified:
nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more
Author: snagel
Date: Thu Jun 27 20:16:22 2013
New Revision: 1497557
URL: http://svn.apache.org/r1497557
Log:
NUTCH-1580 index-static returns object instead of value for index.static
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/plugin/index
Author: snagel
Date: Thu Jul 25 21:14:45 2013
New Revision: 1507130
URL: http://svn.apache.org/r1507130
Log:
NUTCH-1587 misspelled property threshold in conf/log4j.properties
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/log4j.properties
Modified: nutch/trunk/CHANGES.txt
URL:
http
Author: snagel
Date: Thu Jul 25 21:15:02 2013
New Revision: 1507131
URL: http://svn.apache.org/r1507131
Log:
NUTCH-1587 misspelled property threshold in conf/log4j.properties
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/log4j.properties
Modified: nutch/branches/2.x
Author: snagel
Date: Wed Aug 7 20:44:01 2013
New Revision: 1511479
URL: http://svn.apache.org/r1511479
Log:
NUTCH-911 protocol-file to return proper protocol status for notmodified, gone,
access_denied
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/protocol-file/src/java/org
Author: snagel
Date: Wed Aug 7 21:10:17 2013
New Revision: 1511496
URL: http://svn.apache.org/r1511496
Log:
NUTCH-911 protocol-file to return proper protocol status for notmodified, gone,
access_denied
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/protocol
Author: snagel
Date: Thu Nov 21 22:04:13 2013
New Revision: 1544341
URL: http://svn.apache.org/r1544341
Log:
NUTCH-1587 misspelled property threshold in log4j.properties
Modified:
nutch/branches/2.x/src/test/log4j.properties
Modified: nutch/branches/2.x/src/test/log4j.properties
URL:
http
Author: snagel
Date: Thu Nov 21 22:03:18 2013
New Revision: 1544340
URL: http://svn.apache.org/r1544340
Log:
NUTCH-1587 misspelled property threshold in log4j.properties
Modified:
nutch/trunk/src/test/log4j.properties
Modified: nutch/trunk/src/test/log4j.properties
URL:
http
Author: snagel
Date: Wed Jan 22 21:13:01 2014
New Revision: 1560512
URL: http://svn.apache.org/r1560512
Log:
NUTCH-1413 Record response time
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http
Author: snagel
Date: Fri Mar 7 18:13:20 2014
New Revision: 1575350
URL: http://svn.apache.org/r1575350
Log:
removed HostDB from Nutch 1.8 trunk: fix build, remove HostDb related entries
from change log
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl
Author: snagel
Date: Fri Mar 7 18:15:50 2014
New Revision: 1575351
URL: http://svn.apache.org/r1575351
Log:
NUTCH-1706 IndexerMapReduce does not remove db_redir_temp
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/indexer/IndexerMapReduce.java
Modified: nutch
Author: snagel
Date: Mon Mar 17 21:56:32 2014
New Revision: 1578620
URL: http://svn.apache.org/r1578620
Log:
NUTCH-1671 indexchecker to add digest field
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
Modified
Author: snagel
Date: Fri Mar 21 20:56:13 2014
New Revision: 1580046
URL: http://svn.apache.org/r1580046
Log:
NUTCH-1733 parse-html to support HTML5 charset definitions
Added:
nutch/trunk/src/plugin/parse-html/src/test/org/apache/nutch/parse/html/TestHtmlParser.java
(with props)
Modified
Author: snagel
Date: Sat Mar 22 18:04:10 2014
New Revision: 1580270
URL: http://svn.apache.org/r1580270
Log:
NUTCH-1742 update remaining references of 1.7 - 1.8
Modified:
nutch/site/forrest/src/documentation/content/xdocs/downloads.xml
nutch/site/publish/downloads.html
Modified: nutch
Author: snagel
Date: Sat Mar 22 18:13:52 2014
New Revision: 4777
Log:
NUTCH-1742 removed 1.7 packages from svn (svnpubsub)
Removed:
release/nutch/1.7/
Author: snagel
Date: Sun Mar 30 19:58:59 2014
New Revision: 1583193
URL: http://svn.apache.org/r1583193
Log:
NUTCH-1645 Junit Test Case for Adaptive Fetch Schedule class
Added:
nutch/trunk/src/test/org/apache/nutch/crawl/TestAdaptiveFetchSchedule.java
(with props)
Modified:
nutch
Author: snagel
Date: Sat Apr 5 17:06:04 2014
New Revision: 1585144
URL: http://svn.apache.org/r1585144
Log:
NUTCH-1735 code dedup fetcher queue redirects
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Modified: nutch/trunk/CHANGES.txt
URL
Author: snagel
Date: Sat Apr 26 22:12:46 2014
New Revision: 1590315
URL: http://svn.apache.org/r1590315
Log:
NUTCH-1764 readdb to show command-line help if no action (-stats, -dump, etc.)
given
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl
Author: snagel
Date: Sun May 4 20:18:50 2014
New Revision: 1592414
URL: http://svn.apache.org/r1592414
Log:
NUTCH-1182 fetcher to log hung threads
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/fetcher/FetcherReducer.java
Modified: nutch/branches
Author: snagel
Date: Mon May 12 19:39:43 2014
New Revision: 1594071
URL: http://svn.apache.org/r1594071
Log:
NUTCH-1752 Cache robots.txt rules per protocol:host:port
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http
Author: snagel
Date: Fri May 9 18:48:29 2014
New Revision: 1593595
URL: http://svn.apache.org/r1593595
Log:
Nutch 1.8 includes Tika 1.5
Modified:
nutch/site/forrest/src/documentation/content/xdocs/index.xml
nutch/site/publish/index.html
Modified: nutch/site/forrest/src/documentation
Author: snagel
Date: Fri Jun 20 22:15:43 2014
New Revision: 1604291
URL: http://svn.apache.org/r1604291
Log:
NUTCH-1718 redefine http.robots.agent as additional agent names
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/nutch-default.xml
nutch/branches/2.x/src/java
Author: snagel
Date: Fri Jun 20 22:56:32 2014
New Revision: 1604298
URL: http://svn.apache.org/r1604298
Log:
NUTCH-1767 remove special treatment of params in relative links
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/util/URLUtil.java
nutch
Modified:
nutch/trunk/src/plugin/urlnormalizer-basic/src/java/org/apache/nutch/net/urlnormalizer/basic/BasicURLNormalizer.java
URL:
Author: snagel
Date: Fri Jul 4 20:15:12 2014
New Revision: 1607929
URL: http://svn.apache.org/r1607929
Log:
add dependency init (calling ivy-init) to compile-core-test to fix
nightly build failures introduced with NUTCH-1803
Modified:
nutch/trunk/build.xml
Modified: nutch/trunk/build.xml
Author: snagel
Date: Sat Jul 5 20:36:33 2014
New Revision: 1608130
URL: http://svn.apache.org/r1608130
Log:
NUTCH-1605 MIME type detector recognizes xlsx as zip file
Added:
nutch/branches/2.x/src/test/org/apache/nutch/util/TestMimeUtil.java (with
props)
nutch/branches/2.x/src
Author: snagel
Date: Sat Jul 5 21:13:19 2014
New Revision: 1608135
URL: http://svn.apache.org/r1608135
Log:
NUTCH-1566 bin/nutch to allow whitespace in paths
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/bin/crawl
nutch/branches/2.x/src/bin/nutch
nutch/trunk
Author: snagel
Date: Sat Jul 5 21:42:20 2014
New Revision: 1608136
URL: http://svn.apache.org/r1608136
Log:
NUTCH-1776 Log incorrect plugin.folder file path
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/plugin/PluginManifestParser.java
nutch
Author: snagel
Date: Thu Jul 10 20:50:27 2014
New Revision: 1609568
URL: http://svn.apache.org/r1609568
Log:
NUTCH-1811 bin/nutch junit to use junit 4 test runner
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/bin/nutch
nutch/trunk/CHANGES.txt
nutch/trunk/src/bin
Author: snagel
Date: Tue Jul 29 15:13:20 2014
New Revision: 1614375
URL: http://svn.apache.org/r1614375
Log:
NUTCH-1708 use same id when indexing and deleting redirects
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/schema.xml
nutch/branches/2.x/src/java/org/apache
Author: snagel
Date: Sun Aug 17 20:24:29 2014
New Revision: 1618521
URL: http://svn.apache.org/r1618521
Log:
CMS commit to nutch by snagel
Modified:
nutch/cms_site/trunk/content/index.md
Modified: nutch/cms_site/trunk/content/index.md
URL:
http://svn.apache.org/viewvc/nutch/cms_site/trunk
Author: snagel
Date: Sun Aug 17 20:26:24 2014
New Revision: 919651
Log:
announce tutorial at ApacheCon Europe in Budapest
Added:
websites/production/nutch/content/
- copied from r919650, websites/staging/nutch/trunk/content/
Author: snagel
Date: Fri Aug 22 21:23:32 2014
New Revision: 1619934
URL: http://svn.apache.org/r1619934
Log:
NUTCH-1409 remove deprecated properties db.{default,max}.fetch.interval,
generate.max.per.host.by.ip
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org
Author: snagel
Date: Fri Aug 22 22:23:27 2014
New Revision: 1619942
URL: http://svn.apache.org/r1619942
Log:
NUTCH-1693 TextMD5Signature computed on textual content
Added:
nutch/branches/2.x/src/java/org/apache/nutch/crawl/TextMD5Signature.java
(with props)
nutch/trunk/src/java/org
Author: snagel
Date: Fri Aug 22 22:28:12 2014
New Revision: 1619944
URL: http://svn.apache.org/r1619944
Log:
NUTCH-1775 IndexingFilter: document origin of passed CrawlDatum
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFilter.java
Modified
Author: snagel
Date: Wed Sep 17 20:52:17 2014
New Revision: 1625821
URL: http://svn.apache.org/r1625821
Log:
add 1.9 Java apidocs
[This commit notification would consist of 137 parts,
which exceeds the limit of 50 ones, so it was shortened to the summary.]
Author: snagel
Date: Wed Sep 17 21:07:29 2014
New Revision: 1625826
URL: http://svn.apache.org/r1625826
Log:
add apidoc 1.9
Modified:
nutch/cms_site/trunk/content/javadoc.md
Modified: nutch/cms_site/trunk/content/javadoc.md
URL:
http://svn.apache.org/viewvc/nutch/cms_site/trunk/content
Author: snagel
Date: Wed Sep 17 21:08:05 2014
New Revision: 922601
Log:
add Java apidoc 1.9
Added:
websites/production/nutch/content/
- copied from r922599, websites/staging/nutch/trunk/content/
Author: snagel
Date: Wed Sep 17 21:32:43 2014
New Revision: 922608
Log:
update Java apidoc 1.9
Added:
websites/production/nutch/content/
- copied from r922607, websites/staging/nutch/trunk/content/
Author: snagel
Date: Sun Sep 21 14:18:26 2014
New Revision: 1626581
URL: http://svn.apache.org/r1626581
Log:
add committer snagel
Modified:
nutch/branches/2.x/KEYS
nutch/branches/2.x/ivy/mvn.template
nutch/trunk/KEYS
nutch/trunk/ivy/mvn.template
Modified: nutch/branches/2.x/KEYS
Author: snagel
Date: Thu Oct 2 21:37:04 2014
New Revision: 1629076
URL: http://svn.apache.org/r1629076
Log:
NUTCH-1826 indexchecker fails if solr.server.url not configured
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/indexer/IndexingFiltersChecker.java
Author: snagel
Date: Thu Oct 9 19:20:51 2014
New Revision: 1630565
URL: http://svn.apache.org/r1630565
Log:
NUTCH-1164 JUnit tests for protocol-http
Added:
nutch/trunk/src/plugin/protocol-http/jsp/
nutch/trunk/src/plugin/protocol-http/jsp/basic-http.jsp (with props)
nutch/trunk
Author: snagel
Date: Mon Oct 20 20:44:00 2014
New Revision: 1633222
URL: http://svn.apache.org/r1633222
Log:
NUTCH-1827 Port issues 1467 and 1561 to 2.x
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/nutch-default.xml
nutch/branches/2.x/src/java/org/apache/nutch
Author: snagel
Date: Tue Oct 21 17:52:27 2014
New Revision: 1633426
URL: http://svn.apache.org/r1633426
Log:
NUTCH-1882 ant eclipse target to add output path to src/test
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/build.xml
nutch/trunk/CHANGES.txt
nutch/trunk
Author: snagel
Date: Mon Oct 27 21:38:50 2014
New Revision: 1634694
URL: http://svn.apache.org/r1634694
Log:
NUTCH-1883 bin/crawl: use function to run bin/nutch and check exit value
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/bin/crawl
nutch/trunk/CHANGES.txt
Author: snagel
Date: Tue Nov 11 16:20:01 2014
New Revision: 1638203
URL: http://svn.apache.org/r1638203
Log:
NUTCH-1883 in case of generate: break loop and do not exit with error
Modified:
nutch/branches/2.x/src/bin/crawl
nutch/trunk/src/bin/crawl
Modified: nutch/branches/2.x/src/bin
Author: snagel
Date: Fri Dec 5 19:53:35 2014
New Revision: 1643412
URL: http://svn.apache.org/r1643412
Log:
NUTCH-1877 Suffix URL filter to ignore query string by default
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/suffix-urlfilter.txt.template
nutch/trunk
Author: snagel
Date: Tue Jan 27 21:45:39 2015
New Revision: 1655169
URL: http://svn.apache.org/r1655169
Log:
NUTCH-1893 Parse-tika fails to parse feed files
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/parse-tika/ivy.xml
nutch/branches/2.x/src/plugin/parse
Author: snagel
Date: Mon Jan 12 20:45:16 2015
New Revision: 1651193
URL: http://svn.apache.org/r1651193
Log:
NUTCH-1881 ant target resolve-default to keep test libs
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/build.xml
Modified: nutch/trunk/CHANGES.txt
URL:
http://svn.apache.org
Author: snagel
Date: Wed Jan 7 22:25:18 2015
New Revision: 1650181
URL: http://svn.apache.org/r1650181
Log:
NUTCH-1140 index-more plugin, resetTitle creates multiple values in title field
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch
Author: snagel
Date: Tue Mar 31 19:28:14 2015
New Revision: 1670442
URL: http://svn.apache.org/r1670442
Log:
NUTCH-1979 CrawlDbReader to implement Tool: fix unit test
Modified:
nutch/trunk/src/test/org/apache/nutch/crawl/TestCrawlDbMerger.java
Modified: nutch/trunk/src/test/org/apache/nutch
Author: snagel
Date: Fri Mar 27 21:42:35 2015
New Revision: 1669692
URL: http://svn.apache.org/r1669692
Log:
NUTCH-1941 Optional rolling http.agent.names
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/nutch-default.xml
nutch/branches/2.x/src/plugin/lib-http/src
Author: snagel
Date: Mon May 11 21:04:59 2015
New Revision: 1678824
URL: http://svn.apache.org/r1678824
Log:
NUTCH-1998 Add support for user-defined file extension to
CommonCrawlDataDumper: fix unit test
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/test/org/apache/nutch/tools
Author: snagel
Date: Mon May 18 21:39:23 2015
New Revision: 1680110
URL: http://svn.apache.org/r1680110
Log:
NUTCH-2013 Fetcher: missing logs fetching ... on stdout
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/log4j.properties
Modified: nutch/trunk/CHANGES.txt
URL:
http
Author: snagel
Date: Mon May 18 21:35:03 2015
New Revision: 1680109
URL: http://svn.apache.org/r1680109
Log:
NUTCH-2014 Fetcher hang-up on completion
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Modified: nutch/trunk/CHANGES.txt
URL:
http
Author: snagel
Date: Fri Apr 17 20:49:19 2015
New Revision: 1674399
URL: http://svn.apache.org/r1674399
Log:
NUTCH-1927 Create a whitelist of IPs/hostnames to allow skipping of RobotRules
parsing
Removed:
nutch/trunk/src/java/org/apache/nutch/protocol/RobotRules.java
Modified:
nutch
Author: snagel
Date: Sat Apr 18 20:41:13 2015
New Revision: 1674581
URL: http://svn.apache.org/r1674581
Log:
NUTCH-1854 bin/crawl fails with a parsing fetcher
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java
nutch/trunk/src/java/org
Author: snagel
Date: Sat Apr 11 22:07:52 2015
New Revision: 1672939
URL: http://svn.apache.org/r1672939
Log:
NUTCH-1981 Upgrade to icu4j 55.1
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/ivy/ivy.xml
nutch/trunk/CHANGES.txt
nutch/trunk/ivy/ivy.xml
Modified: nutch
Author: snagel
Date: Thu Jun 25 18:41:26 2015
New Revision: 1687604
URL: http://svn.apache.org/r1687604
Log:
NUTCH-2000 Link inversion fails with .locked already exists
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/crawl/LinkDb.java
Modified: nutch/trunk
Author: snagel
Date: Wed May 27 19:31:51 2015
New Revision: 1682103
URL: http://svn.apache.org/r1682103
Log:
NUTCH-2007 add test libs to classpath of bin/nutch junit
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/bin/nutch
Modified: nutch/trunk/CHANGES.txt
URL:
http://svn.apache.org
Author: snagel
Date: Thu Jul 16 19:52:00 2015
New Revision: 1691436
URL: http://svn.apache.org/r1691436
Log:
remove duplicate entries
Modified:
nutch/trunk/CHANGES.txt
Modified: nutch/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?rev=1691436&r1=1691435&r2
Author: snagel
Date: Mon Nov 16 20:29:33 2015
New Revision: 1714655
URL: http://svn.apache.org/viewvc?rev=1714655&view=rev
Log:
NUTCH-2130 copyField rawcontent creates error within schema.xml
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/conf/schema.xml
Modified: nutch/branches
Author: snagel
Date: Wed Oct 7 19:02:42 2015
New Revision: 1707360
URL: http://svn.apache.org/viewvc?rev=1707360&view=rev
Log:
NUTCH-2124 Fetcher following same redirect again and again
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/java/org/apache/nutch/fetcher/FetcherThread.java
Author: snagel
Date: Mon Sep 21 21:14:55 2015
New Revision: 1704425
URL: http://svn.apache.org/viewvc?rev=1704425&view=rev
Log:
NUTCH-2106 Runtime to contain Selenium and dependencies only once
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/lib-selenium/build-ivy.xml
nutch/trunk
Author: snagel
Date: Tue Dec 8 19:18:19 2015
New Revision: 1718678
URL: http://svn.apache.org/viewvc?rev=1718678&view=rev
Log:
Update Nutch trunk for new development: 1.11 -> 1.12
Modified:
nutch/trunk/conf/nutch-default.xml
nutch/trunk/default.properties
nutch/trunk/src/bin/nu
Author: snagel
Date: Tue Dec 1 21:17:14 2015
New Revision: 1717537
URL: http://svn.apache.org/viewvc?rev=1717537&view=rev
Log:
NUTCH-2107 plugin.xml to validate against plugin.dtd
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/subcollection/plugin.xml
nutch
Author: snagel
Date: Tue Dec 1 21:15:21 2015
New Revision: 1717536
URL: http://svn.apache.org/viewvc?rev=1717536&view=rev
Log:
NUTCH-2107 plugin.xml to validate against plugin.dtd
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/src/plugin/subcollection/plugin.xml
nutch/trunk/src/plugin
Author: snagel
Date: Sun Dec 6 21:14:06 2015
New Revision: 1718223
URL: http://svn.apache.org/viewvc?rev=1718223&view=rev
Log:
NUTCH-2172 index-more: document format of contenttype-mapping.txt
Added:
nutch/trunk/conf/contenttype-mapping.txt.template
Modified:
nutch/trunk/CHANGES.txt
Author: snagel
Date: Tue Dec 8 21:45:47 2015
New Revision: 1718718
URL: http://svn.apache.org/viewvc?rev=1718718&view=rev
Log:
NUTCH-2042 parse-html increase chunk size used to detect charset
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/parse-html/src/java/org
Author: snagel
Date: Sat Jan 9 13:01:31 2016
New Revision: 1723851
URL: http://svn.apache.org/viewvc?rev=1723851&view=rev
Log:
NUTCH-2168 Parse-tika fails to retrieve parser
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/plugin/parse-tika/src/java/org/apache/nutch/parse
Author: snagel
Date: Thu Jan 7 20:57:13 2016
New Revision: 1723626
URL: http://svn.apache.org/viewvc?rev=1723626&view=rev
Log:
NUTCH-2143 GeneratorJob ignores batch id passed as argument
Modified:
nutch/branches/2.x/CHANGES.txt
nutch/branches/2.x/src/java/org/apache/nutch/crawl
Author: snagel
Date: Tue Nov 24 15:37:32 2015
New Revision: 1716177
URL: http://svn.apache.org/viewvc?rev=1716177&view=rev
Log:
NUTCH-2175 Typos in property descriptions
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/nutch-default.xml
Modified: nutch/trunk/CHANGES.txt
URL:
http
Repository: nutch
Updated Branches:
refs/heads/master af6d8763f -> d29be63bd
NUTCH-2272 Index checker server to optionally keep client connection open
- removed from change log for release 1.12 as it is not included
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit:
fix unit test: CrawlDbFilter still writes reduce output dirs as part-0 (not
part-r-0)
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/f5e430e5
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/f5e430e5
Diff:
Repository: nutch
Updated Branches:
refs/heads/master 25e879afc -> f5e430e55
update tests to reflect change of reduce outputs by new API (part-n ->
part-r-n): all unit tests pass now
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit:
NUTCH-1712 applied to current trunk; run first simple tests (inject + merge)
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/3c691eb2
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/3c691eb2
Diff:
+* NUTCH-1712 Use MultipleInputs in Injector to make it a single mapreduce job
(tejasp, snagel)
+
* NUTCH-2231 Jexl support in generator job (markus)
* NUTCH-2232 DeduplicationJob should decode URL's before length is compared
(Ron van der Vegt via markus)
add unit tests based on MRUnit
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/288dceed
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/288dceed
Diff: http://git-wip-us.apache.org/repos/asf/nutch/diff/288dceed
Author: snagel
Date: Fri Jan 22 21:26:12 2016
New Revision: 1726314
URL: http://svn.apache.org/viewvc?rev=1726314&view=rev
Log:
NUTCH-2204 Remove junit lib from runtime
Modified:
nutch/trunk/CHANGES.txt
nutch/trunk/conf/regex-normalize.xml.template
nutch/trunk/ivy/ivy.xml
Modified: nutch
+ b/CHANGES.txt
@@ -10,6 +10,8 @@ in the release announcement and keep it on top in this
CHANGES.txt for the Nutch
Nutch Change Log
+* NUTCH-2256 Inconsistent log level (songwanging via snagel)
+
* NUTCH-2254 Indexer: character set issue with -addBinaryContent and -base64
(Federico Bonelli, snagel)
+
* NUTCH-2250 CommonCrawlDumper : Invalid format and skipped parts (Thamme
Gowda N.,lewismc via mattmann)
* NUTCH-2245 Developed the NGram Model on the existing Unigram Cosine
Similarity Model (bhavyasanghavi via sujen)
http://git-wip-us.apache.org/
GES.txt
@@ -2,6 +2,8 @@ Nutch Change Log
Nutch 2.4 Development
+ * NUTCH-2256 Inconsistent log level (songwanging via snagel)
+
* NUTCH-961 GitHub-92 Add the boilerpipe parsing adapted from NUTCH-961
(Jeremie Bourseaux <jeremie.bours...@xilopix.com> via mattmann)
* GitHub-94 Fix
Repository: nutch
Updated Branches:
refs/heads/master 044e8e77e -> 8572fd955
fix for NUTCH-2191 - fixing Nutch build - contributed by karanjeets
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/8572fd95
Tree:
NUTCH-1553 Property 'indexer.delete.robots.noindex' not working when using
parser-html
- fix broken unit test (fix HTML markup, make test for meta data extraction
obligatory)
- add all values of general metadata to parse metadata
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Repository: nutch
Updated Branches:
refs/heads/master cb6fbae51 -> 34050adae
NUTCH-2291 - Fix mrunit dependencies
- remove classifier from dependency because pom file name on Maven repository
does not contain a classifier
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit:
CrawlDb statistics: add fetch time (earliest, latest, average)
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/ea2843b9
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/ea2843b9
Diff:
CrawlDb statistics: add fetch interval (shortest, longest, average)
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/39f6c713
Tree: http://git-wip-us.apache.org/repos/asf/nutch/tree/39f6c713
Diff:
Repository: nutch
Updated Branches:
refs/heads/master d27c351f4 -> d37b7ce13
Remove obsolete properties protocol.plugin.check.blocking and
protocol.plugin.check.robots
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit:
Merge branch 'NUTCH-2299' of https://github.com/sebastian-nagel/nutch this
closes #140
- Remove obsolete properties protocol.plugin.check.*
Project: http://git-wip-us.apache.org/repos/asf/nutch/repo
Commit: http://git-wip-us.apache.org/repos/asf/nutch/commit/d37b7ce1
Tree:
Repository: nutch
Updated Branches:
refs/heads/2.x 022ed5c03 -> 700857d16
NUTCH-2349 urlnormalizer-basic: NPE for URLs without authority
- check whether URL.getAuthority() returns null
- recompose URLs without authority with empty authority/host
Project: