Author: jerome
Date: Wed Aug 17 02:39:29 2005
New Revision: 233140
URL: http://svn.apache.org/viewcvs?rev=233140&view=rev
Log:
First commit test - Add myself to the list of committers
Modified:
lucene/nutch/trunk/site/credits.html
lucene/nutch/trunk/site/credits.pdf
lucene/nutch/trunk
Author: jerome
Date: Thu Aug 18 05:17:22 2005
New Revision: 233312
URL: http://svn.apache.org/viewcvs?rev=233312&view=rev
Log:
Gives focus to search query input
Modified:
lucene/nutch/trunk/docs/ca/about.html
lucene/nutch/trunk/docs/ca/help.html
lucene/nutch/trunk/docs/ca/search.html
Author: jerome
Date: Fri Aug 19 08:55:46 2005
New Revision: 233492
URL: http://svn.apache.org/viewcvs?rev=233492&view=rev
Log:
NUTCH-10, extension points defined only once (Stefan Groschupf)
Added:
lucene/nutch/trunk/src/plugin/nutch-extensionpoints/
lucene/nutch/trunk/src/plugin/nutch
Author: jerome
Date: Fri Aug 19 12:26:14 2005
New Revision: 233544
URL: http://svn.apache.org/viewcvs?rev=233544&view=rev
Log:
Correction in LanguageIdentifier unit test
Modified:
lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang
Author: jerome
Date: Fri Aug 19 14:15:02 2005
New Revision: 233559
URL: http://svn.apache.org/viewcvs?rev=233559&view=rev
Log:
* Add utility to extract urls from plain text (Stephan Strittmatter)
* Uses the OutlinkExtractor in parse plugins PDF, MSWord, Text, RTF, Ext
Added:
lucene/nutch
Author: jerome
Date: Fri Aug 26 07:54:16 2005
New Revision: 240254
URL: http://svn.apache.org/viewcvs?rev=240254&view=rev
Log:
Javadoc updates, corrections on input stream reading
Modified:
lucene/nutch/tags/Release-0.7/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang
Author: jerome
Date: Fri Aug 26 14:03:19 2005
New Revision: 240345
URL: http://svn.apache.org/viewcvs?rev=240345&view=rev
Log:
NUTCH-37, no more javadoc warnings
Modified:
lucene/nutch/branches/Release-0.7/src/java/org/apache/nutch/clustering/OnlineClusterer.java
lucene/nutch/branches
Author: jerome
Date: Fri Aug 26 15:47:04 2005
New Revision: 240359
URL: http://svn.apache.org/viewcvs?rev=240359&view=rev
Log:
Add an analysis extension point
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
(with props)
lucene/nutch/trunk/src/java/org
Author: jerome
Date: Wed Aug 31 01:04:52 2005
New Revision: 264964
URL: http://svn.apache.org/viewcvs?rev=264964&view=rev
Log:
No more NullPointerException while logging the doc language if none
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java
Modified: lucene
Author: jerome
Date: Wed Aug 31 04:38:28 2005
New Revision: 265020
URL: http://svn.apache.org/viewcvs?rev=265020&view=rev
Log:
Fixes some typo (analySer = analyZer)
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
Modified:
lucene/nutch/trunk/src/java/org
Author: jerome
Date: Wed Aug 31 08:17:11 2005
New Revision: 265503
URL: http://svn.apache.org/viewcvs?rev=265503&view=rev
Log:
Merged 0.7 branch changes 240321:240453 into trunk
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java
lucene/nutch/trunk/src
Author: jerome
Date: Thu Sep 1 15:20:51 2005
New Revision: 265794
URL: http://svn.apache.org/viewcvs?rev=265794&view=rev
Log:
NUTCH-65, Handles more modification-date formats
Added:
lucene/nutch/trunk/lib/commons-lang-2.1.jar (with props)
Modified:
lucene/nutch/trunk/src/plugin/index
Author: jerome
Date: Sun Sep 4 13:53:49 2005
New Revision: 278626
URL: http://svn.apache.org/viewcvs?rev=278626&view=rev
Log:
NUTCH-53, Parser plugin for Zip files (Rohit Kulkarni)
Added:
lucene/nutch/trunk/src/plugin/parse-zip/
lucene/nutch/trunk/src/plugin/parse-zip/build.xml
Author: jerome
Date: Tue Sep 6 09:02:26 2005
New Revision: 279027
URL: http://svn.apache.org/viewcvs?rev=279027&view=rev
Log:
lib-jakarta-poi added to the list of plugins to deploy/clean
Modified:
lucene/nutch/trunk/src/plugin/build.xml
Modified: lucene/nutch/trunk/src/plugin/build.xml
URL
Author: jerome
Date: Wed Sep 7 02:57:24 2005
New Revision: 279286
URL: http://svn.apache.org/viewcvs?rev=279286&view=rev
Log:
Includes protocol-httpclient plugin (instead of protocol-http) and parse-js
erroneously removed during commit of revision 233492
Modified:
lucene/nutch/trunk/conf
Author: jerome
Date: Sun Sep 11 13:28:07 2005
New Revision: 280176
URL: http://svn.apache.org/viewcvs?rev=280176&view=rev
Log:
Automatically loads active plugins dependencies (add a property, default is on)
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/plugin
Author: jerome
Date: Sun Sep 11 13:34:12 2005
New Revision: 280179
URL: http://svn.apache.org/viewcvs?rev=280179&view=rev
Log:
Add a dependency to nutch-extensionpoints plugin
Modified:
lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml
lucene/nutch/trunk/src/plugin
Author: jerome
Date: Tue Sep 13 05:52:13 2005
New Revision: 280549
URL: http://svn.apache.org/viewcvs?rev=280549&view=rev
Log:
Sorted alphabetically for easy maintenance
Modified:
lucene/nutch/trunk/src/plugin/build.xml
Modified: lucene/nutch/trunk/src/plugin/build.xml
URL:
http
Author: jerome
Date: Tue Sep 13 06:06:32 2005
New Revision: 280551
URL: http://svn.apache.org/viewcvs?rev=280551&view=rev
Log:
Add a lib plugin for lucene analyzers
Added:
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml
Author: jerome
Date: Tue Sep 13 07:03:36 2005
New Revision: 280556
URL: http://svn.apache.org/viewcvs?rev=280556&view=rev
Log:
French and German analyzers added
Added:
lucene/nutch/trunk/src/plugin/analysis-de/
lucene/nutch/trunk/src/plugin/analysis-de/build.xml (with props)
lucene
Author: jerome
Date: Tue Sep 27 13:45:37 2005
New Revision: 292035
URL: http://svn.apache.org/viewcvs?rev=292035&view=rev
Log:
NUTCH-88, First step proposal implementation (thanks to Chris Mattmann and
Sébastien Le Callonnec)
Added:
lucene/nutch/trunk/conf/parse-plugins.dtd (with props
Author: jerome
Date: Mon Oct 3 08:56:43 2005
New Revision: 293370
URL: http://svn.apache.org/viewcvs?rev=293370&view=rev
Log:
Change the way the parse-plugin.xml file is loaded (MalformedURLException
reported by Earl Cahill)
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/parse
Author: jerome
Date: Tue Dec 6 02:51:06 2005
New Revision: 354399
URL: http://svn.apache.org/viewcvs?rev=354399&view=rev
Log:
Merge from trunk 354397:354398 - Improvements in plugin circular dependencies
detection
Modified:
lucene/nutch/branches/mapred/src/java/org/apache/nutch/plugin
Author: jerome
Date: Tue Dec 6 13:42:25 2005
New Revision: 354575
URL: http://svn.apache.org/viewcvs?rev=354575&view=rev
Log:
NUTCH-112, link to cached content changed to relative (C. Mattmann)
Modified:
lucene/nutch/trunk/src/web/jsp/cached.jsp
Modified: lucene/nutch/trunk/src/web/jsp
Author: jerome
Date: Tue Dec 6 14:00:12 2005
New Revision: 354582
URL: http://svn.apache.org/viewcvs?rev=354582&view=rev
Log:
NUTCH-112, merged from trunk
Modified:
lucene/nutch/branches/mapred/src/web/jsp/cached.jsp
Modified: lucene/nutch/branches/mapred/src/web/jsp/cached.jsp
URL:
http
Author: jerome
Date: Sat Dec 10 15:47:18 2005
New Revision: 355809
URL: http://svn.apache.org/viewcvs?rev=355809&view=rev
Log:
Content-Type resolution enhancements:
* Resolution moved from protocol plugins to Content constructor
* Best content-type guessing policy
* Some unit tests added
Modified
Author: jerome
Date: Sat Dec 10 16:36:57 2005
New Revision: 355828
URL: http://svn.apache.org/viewcvs?rev=355828&view=rev
Log:
NUTCH-135 : Content metadata are now case insensitive (thanks to S. Groschupf)
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ContentProperties.java
Author: jerome
Date: Sat Dec 17 02:06:31 2005
New Revision: 357334
URL: http://svn.apache.org/viewcvs?rev=357334&view=rev
Log:
NUTCH-3, ContentProperties can handle multivalued properties (S. Groschupf)
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
lucene/nutch/trunk/src/java/org
Author: jerome
Date: Mon Jan 9 13:45:25 2006
New Revision: 367405
URL: http://svn.apache.org/viewcvs?rev=367405&view=rev
Log:
Remove deployment of analysis plugins (under dev)
Remove protocol-http unit tests (moved to lib-http)
Modified:
lucene/nutch/trunk/src/plugin/build.xml
Modified
Author: jerome
Date: Wed Jan 11 15:50:13 2006
New Revision: 368172
URL: http://svn.apache.org/viewcvs?rev=368172&view=rev
Log:
Add a style for summary highlight and ellipsis
Modified:
lucene/nutch/trunk/docs/ca/about.html
lucene/nutch/trunk/docs/ca/help.html
lucene/nutch/trunk/docs/ca
Author: jerome
Date: Wed Feb 8 05:58:08 2006
New Revision: 375965
URL: http://svn.apache.org/viewcvs?rev=375965&view=rev
Log:
Fix some javadoc issues with lib-http plugin
Added:
lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/package.html
(with props
Author: jerome
Date: Wed Feb 8 07:42:44 2006
New Revision: 375984
URL: http://svn.apache.org/viewcvs?rev=375984&view=rev
Log:
Fix some javadoc errors and warnings
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParsePluginsReader.java
lucene/nutch/trunk/src/java/org/apache
Author: jerome
Date: Wed Feb 8 10:03:01 2006
New Revision: 376012
URL: http://svn.apache.org/viewcvs?rev=376012&view=rev
Log:
Add/Move some plugins javadoc to the Plugins group
Modified:
lucene/nutch/trunk/build.xml
lucene/nutch/trunk/default.properties
Modified: lucene/nutch/trunk
Author: jerome
Date: Thu Feb 9 07:14:17 2006
New Revision: 376315
URL: http://svn.apache.org/viewcvs?rev=376315&view=rev
Log:
Add a default constructor to avoid failure when JobConf tries to instantiate
ParseSegment on parse command.
(Just a workaround for HADOOP-29 issue).
Modified:
lucene
Author: jerome
Date: Thu Feb 9 15:12:59 2006
New Revision: 376478
URL: http://svn.apache.org/viewcvs?rev=376478&view=rev
Log:
Ensure ParseData content metadata contains segment name and score
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Modified: lucene/nutch
Author: jerome
Date: Thu Feb 9 17:08:15 2006
New Revision: 376522
URL: http://svn.apache.org/viewcvs?rev=376522&view=rev
Log:
Ensure signature added in content metadata
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Modified: lucene/nutch/trunk/src/java/org
Author: jerome
Date: Fri Feb 10 03:24:37 2006
New Revision: 376638
URL: http://svn.apache.org/viewcvs?rev=376638&view=rev
Log:
Remove POI libs that are no longer used
Removed:
lucene/nutch/trunk/src/plugin/parse-msword/lib/poi-2.1-20040508.jar
lucene/nutch/trunk/src/plugin/parse-msword/lib/poi
Author: jerome
Date: Sat Feb 11 02:56:14 2006
New Revision: 376966
URL: http://svn.apache.org/viewcvs?rev=376966&view=rev
Log:
Fix parse-msexcel unit tests
Modified:
lucene/nutch/trunk/src/plugin/parse-msexcel/src/test/org/apache/nutch/parse/msexcel/TestMSExcelParser.java
Modified:
lucene
Author: jerome
Date: Mon Feb 13 13:28:13 2006
New Revision: 377494
URL: http://svn.apache.org/viewcvs?rev=377494&view=rev
Log:
Make use of lib-parsems in word, powerpoint and excel parsers
Removed:
lucene/nutch/trunk/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel
Author: jerome
Date: Mon Feb 13 13:43:15 2006
New Revision: 377501
URL: http://svn.apache.org/viewcvs?rev=377501&view=rev
Log:
Javadoc updates for ms parsers
Added:
lucene/nutch/trunk/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms/package.html
(with props)
Modified:
lucene
Author: jerome
Date: Wed Feb 15 06:24:56 2006
New Revision: 378011
URL: http://svn.apache.org/viewcvs?rev=378011&view=rev
Log:
Add a log4j library plugin (lib-log4j)
Added:
lucene/nutch/trunk/src/plugin/lib-log4j/
lucene/nutch/trunk/src/plugin/lib-log4j/build.xml (with props)
lucene
Author: jerome
Date: Thu Feb 16 02:10:23 2006
New Revision: 378214
URL: http://svn.apache.org/viewcvs?rev=378214&view=rev
Log:
NUTCH-196 : add a httpclient library (lib-commons-httpclient)
Added:
lucene/nutch/trunk/src/plugin/lib-commons-httpclient/
lucene/nutch/trunk/src/plugin/lib
Author: jerome
Date: Thu Feb 16 02:22:10 2006
New Revision: 378215
URL: http://svn.apache.org/viewcvs?rev=378215&view=rev
Log:
remove the unused httpclient library declaration
Modified:
lucene/nutch/trunk/src/plugin/protocol-httpclient/plugin.xml
Modified: lucene/nutch/trunk/src/plugin
Author: jerome
Date: Thu Feb 16 02:51:14 2006
New Revision: 378222
URL: http://svn.apache.org/viewcvs?rev=378222&view=rev
Log:
add lib-nekohtml to deploy and clean and force the lib plugins to be built
first
Modified:
lucene/nutch/trunk/src/plugin/build.xml
Modified: lucene/nutch/trunk
Author: jerome
Date: Fri Feb 17 15:22:55 2006
New Revision: 378653
URL: http://svn.apache.org/viewcvs?rev=378653&view=rev
Log:
Adapt parse-rtf to nutch APIs changes (metadata, parse, protocol, ...)
Modified:
lucene/nutch/trunk/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf
Author: jerome
Date: Fri Feb 17 15:28:39 2006
New Revision: 378655
URL: http://svn.apache.org/viewcvs?rev=378655&view=rev
Log:
Review plugins building and testing
Modified:
lucene/nutch/trunk/src/plugin/analysis-de/build.xml
lucene/nutch/trunk/src/plugin/analysis-fr/build.xml
lucene
Author: jerome
Date: Fri Feb 17 16:23:35 2006
New Revision: 378667
URL: http://svn.apache.org/viewcvs?rev=378667&view=rev
Log:
Adapts parse-mp3 to nutch APIs changes (metadata, parse, protocol, ...)
Modified:
lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3
Author: jerome
Date: Tue Feb 21 01:54:21 2006
New Revision: 379403
URL: http://svn.apache.org/viewcvs?rev=379403&view=rev
Log:
NUTCH-140, parse-plugin.xml can now use extension-id and plugin-id
Modified:
lucene/nutch/trunk/conf/parse-plugins.dtd
lucene/nutch/trunk/conf/parse-plugins.xml
Author: jerome
Date: Tue Feb 21 03:10:42 2006
New Revision: 379419
URL: http://svn.apache.org/viewcvs?rev=379419&view=rev
Log:
NUTCH-214, Add a search mailing list archive link (Jake Vanderdray)
Modified:
lucene/nutch/trunk/site/mailing_lists.html
lucene/nutch/trunk/site/mailing_lists.pdf
Author: jerome
Date: Tue Feb 21 08:05:07 2006
New Revision: 379511
URL: http://svn.apache.org/viewcvs?rev=379511&view=rev
Log:
Add fr help page
Added:
lucene/nutch/trunk/docs/fr/help.html (with props)
lucene/nutch/trunk/src/web/pages/fr/help.xml (with props)
Added: lucene/nutch/trunk
Author: jerome
Date: Thu Mar 2 14:38:40 2006
New Revision: 382535
URL: http://svn.apache.org/viewcvs?rev=382535&view=rev
Log:
Fix content.limit inconsistency in http, ftp and file
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
lucene/nutch/trunk/src/plugin/protocol-file/src/java
Author: jerome
Date: Fri Mar 3 16:26:54 2006
New Revision: 382981
URL: http://svn.apache.org/viewcvs?rev=382981&view=rev
Log:
Plugins now assume that the core is already built when building nutch
Modified:
lucene/nutch/trunk/src/plugin/analysis-de/build.xml
lucene/nutch/trunk/src
Author: jerome
Date: Sun Mar 12 01:41:13 2006
New Revision: 385267
URL: http://svn.apache.org/viewcvs?rev=385267&view=rev
Log:
NUTCH-228, clustering plugin descriptor fixed (Dawid Weiss)
Modified:
lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml
Modified: lucene/nutch/trunk/src
Author: jerome
Date: Sun Mar 12 01:42:44 2006
New Revision: 385268
URL: http://svn.apache.org/viewcvs?rev=385268&view=rev
Log:
Fix milliseconds dumped by cluster.jsp (Dawid Weiss)
Modified:
lucene/nutch/trunk/src/web/jsp/cluster.jsp
Modified: lucene/nutch/trunk/src/web/jsp/cluster.jsp
URL
Author: jerome
Date: Mon Mar 13 16:00:38 2006
New Revision: 385702
URL: http://svn.apache.org/viewcvs?rev=385702&view=rev
Log:
Fix NPE if lang is null
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch
Author: jerome
Date: Tue Mar 21 14:26:56 2006
New Revision: 387650
URL: http://svn.apache.org/viewcvs?rev=387650&view=rev
Log:
urlfilter-regex now uses the lib-regex-filter framework.
Add some unit tests.
Added:
lucene/nutch/trunk/src/plugin/urlfilter-regex/sample/
lucene/nutch/trunk/src
Author: jerome
Date: Tue Mar 21 14:29:18 2006
New Revision: 387651
URL: http://svn.apache.org/viewcvs?rev=387651&view=rev
Log:
Add a urlfilter based on dk.brics.automaton.
Added:
lucene/nutch/trunk/src/plugin/urlfilter-automaton/
lucene/nutch/trunk/src/plugin/urlfilter-automaton
Author: jerome
Date: Tue Mar 21 14:49:54 2006
New Revision: 387659
URL: http://svn.apache.org/viewcvs?rev=387659&view=rev
Log:
Add some missing runtime dependencies
Modified:
lucene/nutch/trunk/src/plugin/urlfilter-automaton/plugin.xml
lucene/nutch/trunk/src/plugin/urlfilter-regex
Author: jerome
Date: Thu Mar 23 15:21:03 2006
New Revision: 388293
URL: http://svn.apache.org/viewcvs?rev=388293&view=rev
Log:
Set the configuration of the parser used in the main method to fix NPEs
Modified:
lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html
Author: jerome
Date: Mon Mar 27 06:52:14 2006
New Revision: 389160
URL: http://svn.apache.org/viewcvs?rev=389160&view=rev
Log:
NUTCH-210, Add an xsl that generates a basic ServletContext XML file for the
nutch webapp and make use of the
ServletContext init parameters to override the properties
Author: jerome
Date: Tue Mar 28 01:40:29 2006
New Revision: 389456
URL: http://svn.apache.org/viewcvs?rev=389456&view=rev
Log:
NUTCH-210, forgot to commit the nutch.xml xsl generation
Modified:
lucene/nutch/trunk/build.xml
Modified: lucene/nutch/trunk/build.xml
URL:
http://svn.apache.org
Author: jerome
Date: Tue Mar 28 06:56:21 2006
New Revision: 389514
URL: http://svn.apache.org/viewcvs?rev=389514&view=rev
Log:
NUTCH-210, make use of nutch.xml servlet context in opensearch servlet too
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java
Author: jerome
Date: Wed Mar 29 01:45:01 2006
New Revision: 389729
URL: http://svn.apache.org/viewcvs?rev=389729&view=rev
Log:
parse-rss now depends on new lib-xml plugin
Removed:
lucene/nutch/trunk/src/plugin/parse-rss/lib/jaxen-core.jar
lucene/nutch/trunk/src/plugin/parse-rss/lib/jaxen
Author: jerome
Date: Fri Mar 31 03:04:43 2006
New Revision: 390392
URL: http://svn.apache.org/viewcvs?rev=390392&view=rev
Log:
Add a common Pluggable interface to all Extension Points and a package
description for plugin
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/plugin
Author: jerome
Date: Mon Apr 3 13:57:46 2006
New Revision: 391150
URL: http://svn.apache.org/viewcvs?rev=391150&view=rev
Log:
No longer dump parse-mspowerpoint unit test results to a file for visual checks
Modified:
lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/test/org/apache/nutch
Author: jerome
Date: Thu Apr 6 03:49:40 2006
New Revision: 391958
URL: http://svn.apache.org/viewcvs?rev=391958&view=rev
Log:
NUTCH-244, db.max.outlinks.per.page can now be negative for no limit of handled
outlinks per page
Modified:
lucene/nutch/trunk/conf/nutch-default.xml
lucene
Author: jerome
Date: Fri Apr 14 16:57:24 2006
New Revision: 394228
URL: http://svn.apache.org/viewcvs?rev=394228&view=rev
Log:
NUTCH-245 : Added a DTD for Nutch Plugin Manifest
- Add a commented DTD in src
- Add the DTD in javadoc
- Change the implementation element structure : uses name
Author: jerome
Date: Fri Apr 14 17:13:21 2006
New Revision: 394231
URL: http://svn.apache.org/viewcvs?rev=394231&view=rev
Log:
NUTCH-245 : Some minor fixes
- Added Apache License in DTD (?)
- Delete the org/apache/nutch/plugin/doc-files once javadoc task completed.
Modified:
lucene/nutch
Author: jerome
Date: Mon May 8 14:04:01 2006
New Revision: 405165
URL: http://svn.apache.org/viewcvs?rev=405165&view=rev
Log:
NUTCH-134 : Added a summarizer extension point and two extensions:
* summary-basic is the current nutch implementation moved into a plugin
* summary-lucene a raw version
Author: jerome
Date: Tue May 9 16:06:17 2006
New Revision: 405566
URL: http://svn.apache.org/viewcvs?rev=405566&view=rev
Log:
NUTCH-134 - No more need for the clusterer to remove html tags from summaries
Modified:
lucene/nutch/trunk/src/plugin/clustering-carrot2/src/java/org/apache/nutch
Author: jerome
Date: Sat May 13 02:14:53 2006
New Revision: 406053
URL: http://svn.apache.org/viewcvs?rev=406053&view=rev
Log:
Fix Javadoc Warnings
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClustererFactory.java
lucene/nutch/trunk/src/java/org/apache/nutch
Author: jerome
Date: Mon Jun 5 14:43:42 2006
New Revision: 411926
URL: http://svn.apache.org/viewvc?rev=411926&view=rev
Log:
NUTCH-298 : No more NPE if a 404 for a robots.txt + some unit tests
Modified:
lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api
Author: jerome
Date: Mon Jun 5 15:02:35 2006
New Revision: 411935
URL: http://svn.apache.org/viewvc?rev=411935&view=rev
Log:
NutchAnalyzer is now Configurable (Teruhiko Kurosaka)
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java
Modified:
lucene/nutch/trunk
Author: jerome
Date: Wed Jun 7 06:07:27 2006
New Revision: 412399
URL: http://svn.apache.org/viewvc?rev=412399&view=rev
Log:
NUTCH-275 : Remove the magic resolution for xml content-type
Modified:
lucene/nutch/trunk/conf/mime-types.xml
Modified: lucene/nutch/trunk/conf/mime-types.xml
URL
Author: jerome
Date: Wed Jun 7 15:06:53 2006
New Revision: 412577
URL: http://svn.apache.org/viewvc?rev=412577&view=rev
Log:
NUTCH-275 : Remove unit test for magic based content type guessing for xml
Removed:
lucene/nutch/trunk/src/test/org/apache/nutch/util/mime/test.xml
Modified
Author: jerome
Date: Wed Jun 7 15:19:08 2006
New Revision: 412582
URL: http://svn.apache.org/viewvc?rev=412582&view=rev
Log:
NUTCH-301 : CommonTerms are cached in the Configuration
Modified:
lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java
Modified: lucene/nutch/trunk