svn commit: r412399 - /lucene/nutch/trunk/conf/mime-types.xml

2006-06-07 jerome
Author: jerome Date: Wed Jun 7 06:07:27 2006 New Revision: 412399 URL: http://svn.apache.org/viewvc?rev=412399&view=rev Log: NUTCH-275 : Remove the magic resolution for xml content-type Modified: lucene/nutch/trunk/conf/mime-types.xml Modified: lucene/nutch/trunk/conf/mime-types.xml URL

svn commit: r412577 - in /lucene/nutch/trunk/src/test/org/apache/nutch/util/mime: mime-types.txt test.xml

2006-06-07 jerome
Author: jerome Date: Wed Jun 7 15:06:53 2006 New Revision: 412577 URL: http://svn.apache.org/viewvc?rev=412577&view=rev Log: NUTCH-275 : Remove unit test for magic-based content type guessing for xml Removed: lucene/nutch/trunk/src/test/org/apache/nutch/util/mime/test.xml Modified

svn commit: r412582 - /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java

2006-06-07 jerome
Author: jerome Date: Wed Jun 7 15:19:08 2006 New Revision: 412582 URL: http://svn.apache.org/viewvc?rev=412582&view=rev Log: NUTCH-301 : CommonTerms are cached in the Configuration Modified: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java Modified: lucene/nutch/trunk

svn commit: r411926 - in /lucene/nutch/trunk/src/plugin/lib-http/src: java/org/apache/nutch/protocol/http/api/RobotRulesParser.java test/org/apache/nutch/protocol/http/api/TestRobotRulesParser.java

2006-06-05 jerome
Author: jerome Date: Mon Jun 5 14:43:42 2006 New Revision: 411926 URL: http://svn.apache.org/viewvc?rev=411926&view=rev Log: NUTCH-298 : No more NPE on a 404 for robots.txt + some unit tests Modified: lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api
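To make the 404 case concrete, here is a minimal Java sketch of one reasonable policy: decide from the HTTP status of the robots.txt fetch before touching the body, so a missing file never becomes a null dereference. It is not the RobotRulesParser code from this commit; the class RobotsFetchPolicy and its Decision values are hypothetical. The status mapping follows RFC 9309 (a 4xx robots.txt means no restrictions, a 5xx means assume full disallow); not following redirects is a simplification of this sketch.

```java
/**
 * Hypothetical sketch (not Nutch's RobotRulesParser): map the HTTP status of a
 * robots.txt fetch to a crawl decision instead of dereferencing a missing body.
 */
class RobotsFetchPolicy {
    enum Decision { PARSE_RULES, ALLOW_ALL, DISALLOW_ALL }

    static Decision forStatus(int status) {
        if (status >= 200 && status < 300) {
            return Decision.PARSE_RULES;   // body is present: parse the rules
        }
        if (status >= 400 && status < 500) {
            return Decision.ALLOW_ALL;     // e.g. 404: no robots.txt, so no restrictions
        }
        // Everything else (redirects, server errors, ...) is treated as full disallow
        // in this simplified sketch.
        return Decision.DISALLOW_ALL;
    }

    public static void main(String[] args) {
        System.out.println(forStatus(200)); // PARSE_RULES
        System.out.println(forStatus(404)); // ALLOW_ALL
        System.out.println(forStatus(503)); // DISALLOW_ALL
    }
}
```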

svn commit: r411935 - /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java

2006-06-05 jerome
Author: jerome Date: Mon Jun 5 15:02:35 2006 New Revision: 411935 URL: http://svn.apache.org/viewvc?rev=411935&view=rev Log: NutchAnalyzer is now Configurable (Teruhiko Kurosaka) Modified: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/NutchAnalyzer.java Modified: lucene/nutch/trunk

svn commit: r406053 - in /lucene/nutch/trunk/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/parse/ java/org/apache/nutch/scoring/ plugin/protocol-httpclient/src/java/org/apache/nutch/pro

2006-05-13 jerome
Author: jerome Date: Sat May 13 02:14:53 2006 New Revision: 406053 URL: http://svn.apache.org/viewcvs?rev=406053&view=rev Log: Fix Javadoc Warnings Modified: lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClustererFactory.java lucene/nutch/trunk/src/java/org/apache/nutch

svn commit: r405566 - /lucene/nutch/trunk/src/plugin/clustering-carrot2/src/java/org/apache/nutch/clustering/carrot2/LocalNutchInputComponent.java

2006-05-09 jerome
Author: jerome Date: Tue May 9 16:06:17 2006 New Revision: 405566 URL: http://svn.apache.org/viewcvs?rev=405566&view=rev Log: NUTCH-134 - The clusterer no longer needs to remove html tags from summaries Modified: lucene/nutch/trunk/src/plugin/clustering-carrot2/src/java/org/apache/nutch

svn commit: r405165 - in /lucene/nutch/trunk: ./ conf/ src/java/org/apache/nutch/searcher/ src/plugin/ src/plugin/nutch-extensionpoints/ src/plugin/summary-basic/ src/plugin/summary-basic/src/ src/plu

2006-05-08 jerome
Author: jerome Date: Mon May 8 14:04:01 2006 New Revision: 405165 URL: http://svn.apache.org/viewcvs?rev=405165&view=rev Log: NUTCH-134 : Added a summarizer extension point and two extensions: * summary-basic is the current nutch implementation moved into a plugin * summary-lucene a raw version

svn commit: r394228 - in /lucene/nutch/trunk: ./ src/java/org/apache/nutch/plugin/ src/plugin/ src/plugin/analysis-de/ src/plugin/analysis-fr/ src/plugin/clustering-carrot2/ src/plugin/creativecommons

2006-04-14 jerome
Author: jerome Date: Fri Apr 14 16:57:24 2006 New Revision: 394228 URL: http://svn.apache.org/viewcvs?rev=394228&view=rev Log: NUTCH-245 : Added a DTD for Nutch Plugin Manifest - Add a commented DTD in src - Add the DTD in javadoc - Change the implementation element structure : uses name

svn commit: r394231 - in /lucene/nutch/trunk: build.xml src/plugin/plugin.dtd

2006-04-14 jerome
Author: jerome Date: Fri Apr 14 17:13:21 2006 New Revision: 394231 URL: http://svn.apache.org/viewcvs?rev=394231&view=rev Log: NUTCH-245 : Some minor fixes - Added Apache License in DTD (?) - Delete the org/apache/nutch/plugin/doc-files once javadoc task completed. Modified: lucene/nutch

svn commit: r391958 - in /lucene/nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/parse/ParseData.java src/test/org/apache/nutch/parse/TestParseData.java src/test/org/apache/nutch/util/Wr

2006-04-06 jerome
Author: jerome Date: Thu Apr 6 03:49:40 2006 New Revision: 391958 URL: http://svn.apache.org/viewcvs?rev=391958&view=rev Log: NUTCH-244, db.max.outlinks.per.page can now be negative, meaning no limit on the number of outlinks handled per page Modified: lucene/nutch/trunk/conf/nutch-default.xml lucene
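A minimal Java sketch of the "negative means no limit" convention described in NUTCH-244, assuming outlinks are plain URL strings. Only the property name db.max.outlinks.per.page comes from the log; the helper OutlinkLimit.limitOutlinks is hypothetical and is not the ParseData implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating a per-page outlink cap; not Nutch's ParseData code. */
class OutlinkLimit {
    /**
     * Returns at most maxOutlinks entries, in their original order.
     * A negative maxOutlinks means "no limit": every outlink is kept.
     */
    static List<String> limitOutlinks(List<String> outlinks, int maxOutlinks) {
        if (maxOutlinks < 0 || outlinks.size() <= maxOutlinks) {
            return new ArrayList<>(outlinks); // unlimited, or already under the cap
        }
        return new ArrayList<>(outlinks.subList(0, maxOutlinks)); // keep the first N
    }

    public static void main(String[] args) {
        List<String> links = List.of("http://a/", "http://b/", "http://c/");
        System.out.println(limitOutlinks(links, 2).size());  // 2
        System.out.println(limitOutlinks(links, -1).size()); // 3
    }
}
```

With a limit of 0 this sketch keeps no outlinks; the log does not say how Nutch itself treats 0.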

svn commit: r391150 - /lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/test/org/apache/nutch/parse/mspowerpoint/TestMSPowerPointParser.java

2006-04-03 jerome
Author: jerome Date: Mon Apr 3 13:57:46 2006 New Revision: 391150 URL: http://svn.apache.org/viewcvs?rev=391150&view=rev Log: no longer dump parse-mspowerpoint unit test results to a file for visual checks Modified: lucene/nutch/trunk/src/plugin/parse-mspowerpoint/src/test/org/apache/nutch

svn commit: r390392 - in /lucene/nutch/trunk/src/java/org/apache/nutch: analysis/ clustering/ indexer/ net/ ontology/ parse/ plugin/ protocol/ searcher/

2006-03-31 jerome
Author: jerome Date: Fri Mar 31 03:04:43 2006 New Revision: 390392 URL: http://svn.apache.org/viewcvs?rev=390392&view=rev Log: Add a common Pluggable interface to all Extension Points and a package description for plugin Added: lucene/nutch/trunk/src/java/org/apache/nutch/plugin

svn commit: r390277 - in /lucene/nutch/trunk: build.xml default.properties

2006-03-30 jerome
Author: jerome Date: Thu Mar 30 15:11:55 2006 New Revision: 390277 URL: http://svn.apache.org/viewcvs?rev=390277&view=rev Log: Add query-basic plugin to javadoc Modified: lucene/nutch/trunk/build.xml lucene/nutch/trunk/default.properties Modified: lucene/nutch/trunk/build.xml URL: http

svn commit: r389729 - in /lucene/nutch/trunk/src/plugin/parse-rss: build.xml lib/jaxen-core.jar lib/jaxen-jdom.jar lib/jdom.jar lib/saxpath.jar lib/xercesImpl.jar lib/xml-apis.jar plugin.xml

2006-03-29 jerome
Author: jerome Date: Wed Mar 29 01:45:01 2006 New Revision: 389729 URL: http://svn.apache.org/viewcvs?rev=389729&view=rev Log: parse-rss now depends on new lib-xml plugin Removed: lucene/nutch/trunk/src/plugin/parse-rss/lib/jaxen-core.jar lucene/nutch/trunk/src/plugin/parse-rss/lib/jaxen

svn commit: r389456 - /lucene/nutch/trunk/build.xml

2006-03-28 jerome
Author: jerome Date: Tue Mar 28 01:40:29 2006 New Revision: 389456 URL: http://svn.apache.org/viewcvs?rev=389456&view=rev Log: NUTCH-210, forgot to commit the nutch.xml xsl generation Modified: lucene/nutch/trunk/build.xml Modified: lucene/nutch/trunk/build.xml URL: http://svn.apache.org

svn commit: r389514 - /lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java

2006-03-28 jerome
Author: jerome Date: Tue Mar 28 06:56:21 2006 New Revision: 389514 URL: http://svn.apache.org/viewcvs?rev=389514&view=rev Log: NUTCH-210, make use of nutch.xml servlet context in opensearch servlet too Modified: lucene/nutch/trunk/src/java/org/apache/nutch/searcher/OpenSearchServlet.java

svn commit: r389160 - in /lucene/nutch/trunk: conf/ src/java/org/apache/nutch/util/ src/web/jsp/

2006-03-27 jerome
Author: jerome Date: Mon Mar 27 06:52:14 2006 New Revision: 389160 URL: http://svn.apache.org/viewcvs?rev=389160&view=rev Log: NUTCH-210, Add an xsl that generates a basic ServletContext XML file for the nutch webapp and make use of the ServletContext init parameters to override the properties

svn commit: r388293 - /lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java

2006-03-23 jerome
Author: jerome Date: Thu Mar 23 15:21:03 2006 New Revision: 388293 URL: http://svn.apache.org/viewcvs?rev=388293&view=rev Log: Set the configuration of the parser used in the main method to fix NPEs Modified: lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html

svn commit: r387647 - in /lucene/nutch/trunk/src/plugin/lib-regex-filter: ./ sample/ src/ src/java/ src/java/org/ src/java/org/apache/ src/java/org/apache/nutch/ src/java/org/apache/nutch/net/ src/tes

2006-03-21 jerome
Author: jerome Date: Tue Mar 21 14:24:16 2006 New Revision: 387647 URL: http://svn.apache.org/viewcvs?rev=387647&view=rev Log: Add a mini framework plugin for regex url filter plugins. Added: lucene/nutch/trunk/src/plugin/lib-regex-filter/ lucene/nutch/trunk/src/plugin/lib-regex-filter

svn commit: r387650 - in /lucene/nutch/trunk/src/plugin/urlfilter-regex: ./ sample/ src/java/org/apache/nutch/net/ src/test/ src/test/org/ src/test/org/apache/ src/test/org/apache/nutch/ src/test/org/

2006-03-21 jerome
Author: jerome Date: Tue Mar 21 14:26:56 2006 New Revision: 387650 URL: http://svn.apache.org/viewcvs?rev=387650&view=rev Log: urlfilter-regex now uses the lib-regex-filter framework. Add some unit tests. Added: lucene/nutch/trunk/src/plugin/urlfilter-regex/sample/ lucene/nutch/trunk/src
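The regex URL filter plugins work from a rules file. A minimal Java sketch of that model, assuming the conventional regex-urlfilter.txt format: each rule is a '+' (accept) or '-' (reject) followed by a Java regex, the first matching rule decides, and a URL that matches no rule is rejected. The class RegexRules is hypothetical and is not the lib-regex-filter API; returning null for a rejected URL mirrors the usual Nutch URLFilter convention. The nested record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch of a "+/-regex" rule list; not the lib-regex-filter API. */
class RegexRules {
    private record Rule(boolean accept, Pattern pattern) {}

    private final List<Rule> rules = new ArrayList<>();

    /** Parses lines such as "+^http://" or "-\.gif$"; blank lines and '#' comments are skipped. */
    RegexRules(List<String> lines) {
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;
            }
            char sign = s.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Invalid rule: " + s);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(s.substring(1))));
        }
    }

    /** Returns the URL if the first matching rule accepts it; null if it is rejected or nothing matches. */
    String filter(String url) {
        for (Rule rule : rules) {
            if (rule.pattern().matcher(url).find()) {
                return rule.accept() ? url : null; // first match wins
            }
        }
        return null; // no rule matched: reject
    }

    public static void main(String[] args) {
        RegexRules rules = new RegexRules(List.of(
                "# skip images",
                "-\\.(gif|jpg|png)$",
                "+^http://([a-z0-9]*\\.)*apache\\.org/"));
        System.out.println(rules.filter("http://lucene.apache.org/nutch/"));   // accepted
        System.out.println(rules.filter("http://lucene.apache.org/logo.gif")); // null
        System.out.println(rules.filter("http://example.com/"));               // null
    }
}
```

Putting the reject rules for images before the accept rule matters: with first-match semantics, reversing them would let apache.org images through.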

svn commit: r387651 - in /lucene/nutch/trunk/src/plugin/urlfilter-automaton: ./ lib/ sample/ src/ src/java/ src/java/org/ src/java/org/apache/ src/java/org/apache/nutch/ src/java/org/apache/nutch/net/

2006-03-21 jerome
Author: jerome Date: Tue Mar 21 14:29:18 2006 New Revision: 387651 URL: http://svn.apache.org/viewcvs?rev=387651&view=rev Log: Add a urlfilter based on dk.brics.automaton. Added: lucene/nutch/trunk/src/plugin/urlfilter-automaton/ lucene/nutch/trunk/src/plugin/urlfilter-automaton

svn commit: r387657 - /lucene/nutch/trunk/conf/automaton-urlfilter.txt.template

2006-03-21 jerome
Author: jerome Date: Tue Mar 21 14:40:15 2006 New Revision: 387657 URL: http://svn.apache.org/viewcvs?rev=387657&view=rev Log: Add an automaton urlfilter rules template Added: lucene/nutch/trunk/conf/automaton-urlfilter.txt.template Added: lucene/nutch/trunk/conf/automaton

svn commit: r387659 - in /lucene/nutch/trunk/src/plugin: urlfilter-automaton/plugin.xml urlfilter-regex/plugin.xml

2006-03-21 jerome
Author: jerome Date: Tue Mar 21 14:49:54 2006 New Revision: 387659 URL: http://svn.apache.org/viewcvs?rev=387659&view=rev Log: Add some missing runtime dependencies Modified: lucene/nutch/trunk/src/plugin/urlfilter-automaton/plugin.xml lucene/nutch/trunk/src/plugin/urlfilter-regex

svn commit: r385702 - /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java

2006-03-13 jerome
Author: jerome Date: Mon Mar 13 16:00:38 2006 New Revision: 385702 URL: http://svn.apache.org/viewcvs?rev=385702&view=rev Log: Fix NPE if lang is null Modified: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java Modified: lucene/nutch/trunk/src/java/org/apache/nutch

svn commit: r385267 - /lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml

2006-03-12 jerome
Author: jerome Date: Sun Mar 12 01:41:13 2006 New Revision: 385267 URL: http://svn.apache.org/viewcvs?rev=385267&view=rev Log: NUTCH-228, clustering plugin descriptor fixed (Dawid Weiss) Modified: lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml Modified: lucene/nutch/trunk/src

svn commit: r385268 - /lucene/nutch/trunk/src/web/jsp/cluster.jsp

2006-03-12 jerome
Author: jerome Date: Sun Mar 12 01:42:44 2006 New Revision: 385268 URL: http://svn.apache.org/viewcvs?rev=385268&view=rev Log: Fix milliseconds dumped by cluster.jsp (Dawid Weiss) Modified: lucene/nutch/trunk/src/web/jsp/cluster.jsp Modified: lucene/nutch/trunk/src/web/jsp/cluster.jsp URL

svn commit: r384639 - in /lucene/nutch/trunk: conf/ src/java/org/apache/nutch/searcher/ src/plugin/creativecommons/src/java/org/creativecommons/nutch/ src/plugin/languageidentifier/src/java/org/apache

2006-03-09 jerome
Author: jerome Date: Thu Mar 9 15:04:24 2006 New Revision: 384639 URL: http://svn.apache.org/viewcvs?rev=384639&view=rev Log: Add boost configuration param for RawFieldQueryFilters Modified: lucene/nutch/trunk/conf/nutch-default.xml lucene/nutch/trunk/src/java/org/apache/nutch/searcher

svn commit: r382981 - in /lucene/nutch/trunk/src/plugin: ./ analysis-de/ analysis-fr/ clustering-carrot2/ creativecommons/ index-basic/ index-more/ languageidentifier/ lib-http/ lib-parsems/ microform

2006-03-03 jerome
Author: jerome Date: Fri Mar 3 16:26:54 2006 New Revision: 382981 URL: http://svn.apache.org/viewcvs?rev=382981&view=rev Log: Plugins now assume that the core is already built when building nutch Modified: lucene/nutch/trunk/src/plugin/analysis-de/build.xml lucene/nutch/trunk/src

svn commit: r382535 - in /lucene/nutch/trunk: conf/nutch-default.xml src/plugin/protocol-file/src/java/org/apache/nutch/protocol/file/FileResponse.java src/plugin/protocol-ftp/src/java/org/apache/nutc

2006-03-02 jerome
Author: jerome Date: Thu Mar 2 14:38:40 2006 New Revision: 382535 URL: http://svn.apache.org/viewcvs?rev=382535&view=rev Log: Fix content.limit inconsistency in http, ftp and file Modified: lucene/nutch/trunk/conf/nutch-default.xml lucene/nutch/trunk/src/plugin/protocol-file/src/java

svn commit: r379403 - in /lucene/nutch/trunk: conf/ src/java/org/apache/nutch/parse/ src/plugin/creativecommons/src/test/org/creativecommons/nutch/ src/plugin/languageidentifier/src/java/org/apache/nu

2006-02-21 jerome
Author: jerome Date: Tue Feb 21 01:54:21 2006 New Revision: 379403 URL: http://svn.apache.org/viewcvs?rev=379403&view=rev Log: NUTCH-140, parse-plugin.xml can now use extension-id and plugin-id Modified: lucene/nutch/trunk/conf/parse-plugins.dtd lucene/nutch/trunk/conf/parse-plugins.xml

svn commit: r379419 - in /lucene/nutch/trunk: site/mailing_lists.html site/mailing_lists.pdf src/site/src/documentation/content/xdocs/mailing_lists.xml

2006-02-21 jerome
Author: jerome Date: Tue Feb 21 03:10:42 2006 New Revision: 379419 URL: http://svn.apache.org/viewcvs?rev=379419&view=rev Log: NUTCH-214, Add a search mailing list archive link (Jake Vanderdray) Modified: lucene/nutch/trunk/site/mailing_lists.html lucene/nutch/trunk/site/mailing_lists.pdf

svn commit: r379511 - in /lucene/nutch/trunk: docs/fr/help.html src/web/pages/fr/help.xml

2006-02-21 jerome
Author: jerome Date: Tue Feb 21 08:05:07 2006 New Revision: 379511 URL: http://svn.apache.org/viewcvs?rev=379511&view=rev Log: Add fr help page Added: lucene/nutch/trunk/docs/fr/help.html (with props) lucene/nutch/trunk/src/web/pages/fr/help.xml (with props) Added: lucene/nutch/trunk

svn commit: r378653 - in /lucene/nutch/trunk/src/plugin/parse-rtf/src: java/org/apache/nutch/parse/rtf/RTFParseFactory.java java/org/apache/nutch/parse/rtf/RTFParserDelegateImpl.java test/org/apache/n

2006-02-17 jerome
Author: jerome Date: Fri Feb 17 15:22:55 2006 New Revision: 378653 URL: http://svn.apache.org/viewcvs?rev=378653&view=rev Log: Adapt parse-rtf to nutch API changes (metadata, parse, protocol, ...) Modified: lucene/nutch/trunk/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf

svn commit: r378655 - in /lucene/nutch/trunk/src/plugin: ./ analysis-de/ analysis-fr/ clustering-carrot2/ creativecommons/ index-basic/ index-more/ languageidentifier/ lib-commons-httpclient/ lib-http

2006-02-17 jerome
Author: jerome Date: Fri Feb 17 15:28:39 2006 New Revision: 378655 URL: http://svn.apache.org/viewcvs?rev=378655&view=rev Log: Review plugins building and testing Modified: lucene/nutch/trunk/src/plugin/analysis-de/build.xml lucene/nutch/trunk/src/plugin/analysis-fr/build.xml lucene

svn commit: r378667 - in /lucene/nutch/trunk/src/plugin/parse-mp3/src: java/org/apache/nutch/parse/mp3/MP3Parser.java java/org/apache/nutch/parse/mp3/MetadataCollector.java test/org/apache/nutch/parse

2006-02-17 jerome
Author: jerome Date: Fri Feb 17 16:23:35 2006 New Revision: 378667 URL: http://svn.apache.org/viewcvs?rev=378667&view=rev Log: Adapt parse-mp3 to nutch API changes (metadata, parse, protocol, ...) Modified: lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org/apache/nutch/parse/mp3

svn commit: r378214 - in /lucene/nutch/trunk/src/plugin: ./ clustering-carrot2/ lib-commons-httpclient/ lib-commons-httpclient/lib/ parse-rss/lib/ protocol-httpclient/ protocol-httpclient/lib/

2006-02-16 jerome
Author: jerome Date: Thu Feb 16 02:10:23 2006 New Revision: 378214 URL: http://svn.apache.org/viewcvs?rev=378214&view=rev Log: NUTCH-196 : add a httpclient library (lib-commons-httpclient) Added: lucene/nutch/trunk/src/plugin/lib-commons-httpclient/ lucene/nutch/trunk/src/plugin/lib

svn commit: r378215 - /lucene/nutch/trunk/src/plugin/protocol-httpclient/plugin.xml

2006-02-16 jerome
Author: jerome Date: Thu Feb 16 02:22:10 2006 New Revision: 378215 URL: http://svn.apache.org/viewcvs?rev=378215&view=rev Log: remove the unused httpclient library declaration Modified: lucene/nutch/trunk/src/plugin/protocol-httpclient/plugin.xml Modified: lucene/nutch/trunk/src/plugin

svn commit: r378222 - /lucene/nutch/trunk/src/plugin/build.xml

2006-02-16 jerome
Author: jerome Date: Thu Feb 16 02:51:14 2006 New Revision: 378222 URL: http://svn.apache.org/viewcvs?rev=378222&view=rev Log: add lib-nekohtml to deploy and clean and force the lib plugins to be built first Modified: lucene/nutch/trunk/src/plugin/build.xml Modified: lucene/nutch/trunk

svn commit: r378011 - in /lucene/nutch/trunk/src/plugin: ./ clustering-carrot2/ clustering-carrot2/lib/ lib-log4j/ lib-log4j/lib/ parse-pdf/ parse-pdf/lib/ parse-rss/ parse-rss/lib/

2006-02-15 jerome
Author: jerome Date: Wed Feb 15 06:24:56 2006 New Revision: 378011 URL: http://svn.apache.org/viewcvs?rev=378011&view=rev Log: Add a log4j library plugin (lib-log4j) Added: lucene/nutch/trunk/src/plugin/lib-log4j/ lucene/nutch/trunk/src/plugin/lib-log4j/build.xml (with props) lucene

svn commit: r377494 - in /lucene/nutch/trunk/src/plugin: parse-msexcel/ parse-msexcel/src/java/org/apache/nutch/parse/msexcel/ parse-mspowerpoint/ parse-mspowerpoint/src/java/org/apache/nutch/parse/ms

2006-02-13 jerome
Author: jerome Date: Mon Feb 13 13:28:13 2006 New Revision: 377494 URL: http://svn.apache.org/viewcvs?rev=377494&view=rev Log: Make use of lib-parsems in word, powerpoint and excel parsers Removed: lucene/nutch/trunk/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel

svn commit: r377501 - in /lucene/nutch/trunk: ./ src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms/ src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint/

2006-02-13 jerome
Author: jerome Date: Mon Feb 13 13:43:15 2006 New Revision: 377501 URL: http://svn.apache.org/viewcvs?rev=377501&view=rev Log: Javadoc updates for ms parsers Added: lucene/nutch/trunk/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms/package.html (with props) Modified: lucene

svn commit: r376966 - /lucene/nutch/trunk/src/plugin/parse-msexcel/src/test/org/apache/nutch/parse/msexcel/TestMSExcelParser.java

2006-02-11 jerome
Author: jerome Date: Sat Feb 11 02:56:14 2006 New Revision: 376966 URL: http://svn.apache.org/viewcvs?rev=376966&view=rev Log: Fix parse-msexcel unit tests Modified: lucene/nutch/trunk/src/plugin/parse-msexcel/src/test/org/apache/nutch/parse/msexcel/TestMSExcelParser.java Modified: lucene

svn commit: r376638 - in /lucene/nutch/trunk/src/plugin/parse-msword: lib/poi-2.1-20040508.jar lib/poi-scratchpad-2.1-20040508.jar plugin.xml

2006-02-10 jerome
Author: jerome Date: Fri Feb 10 03:24:37 2006 New Revision: 376638 URL: http://svn.apache.org/viewcvs?rev=376638&view=rev Log: Remove no longer used POI libs Removed: lucene/nutch/trunk/src/plugin/parse-msword/lib/poi-2.1-20040508.jar lucene/nutch/trunk/src/plugin/parse-msword/lib/poi

svn commit: r376315 - /lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParseSegment.java

2006-02-09 jerome
Author: jerome Date: Thu Feb 9 07:14:17 2006 New Revision: 376315 URL: http://svn.apache.org/viewcvs?rev=376315&view=rev Log: Add a default constructor to avoid failure when JobConf tries to instantiate ParseSegment on parse command. (Just a workaround for HADOOP-29 issue). Modified: lucene

svn commit: r376478 - /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java

2006-02-09 jerome
Author: jerome Date: Thu Feb 9 15:12:59 2006 New Revision: 376478 URL: http://svn.apache.org/viewcvs?rev=376478&view=rev Log: Ensure ParseData content metadata contains segment name and score Modified: lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Modified: lucene/nutch

svn commit: r376522 - /lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java

2006-02-09 jerome
Author: jerome Date: Thu Feb 9 17:08:15 2006 New Revision: 376522 URL: http://svn.apache.org/viewcvs?rev=376522&view=rev Log: Ensure signature is added to content metadata Modified: lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java Modified: lucene/nutch/trunk/src/java/org

svn commit: r375965 - in /lucene/nutch/trunk: build.xml src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/package.html

2006-02-08 jerome
Author: jerome Date: Wed Feb 8 05:58:08 2006 New Revision: 375965 URL: http://svn.apache.org/viewcvs?rev=375965&view=rev Log: Fix some javadoc issues with lib-http plugin Added: lucene/nutch/trunk/src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/package.html (with props

svn commit: r375984 - in /lucene/nutch/trunk/src: java/org/apache/nutch/parse/ java/org/apache/nutch/util/ plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/

2006-02-08 jerome
Author: jerome Date: Wed Feb 8 07:42:44 2006 New Revision: 375984 URL: http://svn.apache.org/viewcvs?rev=375984&view=rev Log: Fix some javadoc errors and warnings Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse/ParsePluginsReader.java lucene/nutch/trunk/src/java/org/apache

svn commit: r376012 - in /lucene/nutch/trunk: build.xml default.properties

2006-02-08 jerome
Author: jerome Date: Wed Feb 8 10:03:01 2006 New Revision: 376012 URL: http://svn.apache.org/viewcvs?rev=376012&view=rev Log: Add/Move some plugins javadoc to the Plugins group Modified: lucene/nutch/trunk/build.xml lucene/nutch/trunk/default.properties Modified: lucene/nutch/trunk

svn commit: r368172 - in /lucene/nutch/trunk: docs/ca/ docs/de/ docs/en/ docs/es/ docs/fi/ docs/fr/ docs/hu/ docs/jp/ docs/ms/ docs/nl/ docs/pl/ docs/pt/ docs/sv/ docs/th/ docs/zh/ src/java/org/apache

2006-01-11 jerome
Author: jerome Date: Wed Jan 11 15:50:13 2006 New Revision: 368172 URL: http://svn.apache.org/viewcvs?rev=368172&view=rev Log: Add a style for summary highlight and ellipsis Modified: lucene/nutch/trunk/docs/ca/about.html lucene/nutch/trunk/docs/ca/help.html lucene/nutch/trunk/docs/ca

svn commit: r367405 - /lucene/nutch/trunk/src/plugin/build.xml

2006-01-09 jerome
Author: jerome Date: Mon Jan 9 13:45:25 2006 New Revision: 367405 URL: http://svn.apache.org/viewcvs?rev=367405&view=rev Log: Remove deployment of analysis plugins (under dev). Remove protocol-http unit tests (moved to lib-http) Modified: lucene/nutch/trunk/src/plugin/build.xml Modified

svn commit: r357334 - in /lucene/nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/protocol/Content.java src/java/org/apache/nutch/protocol/ContentProperties.java

2005-12-17 jerome
Author: jerome Date: Sat Dec 17 02:06:31 2005 New Revision: 357334 URL: http://svn.apache.org/viewcvs?rev=357334&view=rev Log: NUTCH-3, ContentProperties can handle multivalued properties (S. Groschupf) Modified: lucene/nutch/trunk/conf/nutch-default.xml lucene/nutch/trunk/src/java/org

svn commit: r355809 - in /lucene/nutch/trunk/src: java/org/apache/nutch/protocol/ java/org/apache/nutch/util/mime/ plugin/protocol-file/src/java/org/apache/nutch/protocol/file/ plugin/protocol-ftp/src

2005-12-10 jerome
Author: jerome Date: Sat Dec 10 15:47:18 2005 New Revision: 355809 URL: http://svn.apache.org/viewcvs?rev=355809&view=rev Log: Content-Type resolution enhancements: * Resolution moved from protocol plugins to Content constructor * Best content-type guessing policy * Some unit tests added Modified

svn commit: r355828 - in /lucene/nutch/trunk/src: java/org/apache/nutch/fetcher/ java/org/apache/nutch/parse/ java/org/apache/nutch/protocol/ java/org/apache/nutch/servlet/ java/org/apache/nutch/tools

2005-12-10 jerome
Author: jerome Date: Sat Dec 10 16:36:57 2005 New Revision: 355828 URL: http://svn.apache.org/viewcvs?rev=355828&view=rev Log: NUTCH-135 : Content metadata are now case insensitive (thanks to S. Groschupf) Added: lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ContentProperties.java
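NUTCH-135 makes content metadata keys case insensitive, so "Content-Type" and "content-type" address the same entry, as HTTP header names should. A minimal Java sketch of that behaviour using a TreeMap ordered by String.CASE_INSENSITIVE_ORDER; the class CaseInsensitiveMetadata is hypothetical and is not the ContentProperties class added by the commit (which, per NUTCH-3, later also handles multivalued properties).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of header-style metadata whose keys ignore case; not ContentProperties. */
class CaseInsensitiveMetadata {
    // The comparator makes "Content-Type", "content-type" and "CONTENT-TYPE" the same key.
    private final Map<String, String> props = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    void set(String name, String value) {
        props.put(name, value); // replaces the value of any key that differs only in case
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        CaseInsensitiveMetadata md = new CaseInsensitiveMetadata();
        md.set("Content-Type", "text/html");
        System.out.println(md.get("content-type")); // text/html
        md.set("CONTENT-TYPE", "application/xml");  // overwrites the same entry
        System.out.println(md.get("Content-Type")); // application/xml
    }
}
```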

svn commit: r354399 - /lucene/nutch/branches/mapred/src/java/org/apache/nutch/plugin/PluginRepository.java

2005-12-06 jerome
Author: jerome Date: Tue Dec 6 02:51:06 2005 New Revision: 354399 URL: http://svn.apache.org/viewcvs?rev=354399&view=rev Log: Merge from trunk 354397:354398 - Improvements in plugin circular dependencies detection Modified: lucene/nutch/branches/mapred/src/java/org/apache/nutch/plugin

svn commit: r354575 - /lucene/nutch/trunk/src/web/jsp/cached.jsp

2005-12-06 jerome
Author: jerome Date: Tue Dec 6 13:42:25 2005 New Revision: 354575 URL: http://svn.apache.org/viewcvs?rev=354575&view=rev Log: NUTCH-112, link to cached content changed to relative (C. Mattmann) Modified: lucene/nutch/trunk/src/web/jsp/cached.jsp Modified: lucene/nutch/trunk/src/web/jsp

svn commit: r354582 - /lucene/nutch/branches/mapred/src/web/jsp/cached.jsp

2005-12-06 jerome
Author: jerome Date: Tue Dec 6 14:00:12 2005 New Revision: 354582 URL: http://svn.apache.org/viewcvs?rev=354582&view=rev Log: NUTCH-112, merged from trunk Modified: lucene/nutch/branches/mapred/src/web/jsp/cached.jsp Modified: lucene/nutch/branches/mapred/src/web/jsp/cached.jsp URL: http

svn commit: r293370 - in /lucene/nutch/trunk/src: java/org/apache/nutch/parse/ParsePluginsReader.java test/org/apache/nutch/parse/TestParserFactory.java

2005-10-03 jerome
Author: jerome Date: Mon Oct 3 08:56:43 2005 New Revision: 293370 URL: http://svn.apache.org/viewcvs?rev=293370&view=rev Log: Change the way the parse-plugin.xml file is loaded (MalformedURLException reported by Earl Cahill) Modified: lucene/nutch/trunk/src/java/org/apache/nutch/parse

svn commit: r292035 - in /lucene/nutch/trunk: conf/ src/java/org/apache/nutch/parse/ src/test/org/apache/nutch/parse/

2005-09-27 jerome
Author: jerome Date: Tue Sep 27 13:45:37 2005 New Revision: 292035 URL: http://svn.apache.org/viewcvs?rev=292035&view=rev Log: NUTCH-88, First step proposal implementation (thanks to Chris Mattmann and Sébastien Le Callonnec) Added: lucene/nutch/trunk/conf/parse-plugins.dtd (with props

svn commit: r280549 - /lucene/nutch/trunk/src/plugin/build.xml

2005-09-13 jerome
Author: jerome Date: Tue Sep 13 05:52:13 2005 New Revision: 280549 URL: http://svn.apache.org/viewcvs?rev=280549&view=rev Log: Sorted alphabetically for easy maintenance Modified: lucene/nutch/trunk/src/plugin/build.xml Modified: lucene/nutch/trunk/src/plugin/build.xml URL: http

svn commit: r280551 - in /lucene/nutch/trunk/src/plugin: build.xml lib-lucene-analyzers/ lib-lucene-analyzers/build.xml lib-lucene-analyzers/lib/ lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar lib-lucene-analyzers/plugin.xml

2005-09-13 jerome
Author: jerome Date: Tue Sep 13 06:06:32 2005 New Revision: 280551 URL: http://svn.apache.org/viewcvs?rev=280551&view=rev Log: Add a lib plugin for lucene analyzers Added: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/ lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml

svn commit: r280556 - in /lucene/nutch/trunk/src/plugin: ./ analysis-de/ analysis-de/src/ analysis-de/src/java/ analysis-de/src/java/org/ analysis-de/src/java/org/apache/ analysis-de/src/java/org/apache/nutch/ analysis-de/src/java/org/apache/nutch/anal...

2005-09-13 jerome
Author: jerome Date: Tue Sep 13 07:03:36 2005 New Revision: 280556 URL: http://svn.apache.org/viewcvs?rev=280556&view=rev Log: French and German analyzers added Added: lucene/nutch/trunk/src/plugin/analysis-de/ lucene/nutch/trunk/src/plugin/analysis-de/build.xml (with props) lucene

svn commit: r280176 - in /lucene/nutch/trunk: conf/ src/java/org/apache/nutch/plugin/

2005-09-11 jerome
Author: jerome Date: Sun Sep 11 13:28:07 2005 New Revision: 280176 URL: http://svn.apache.org/viewcvs?rev=280176&view=rev Log: Automatically loads active plugins' dependencies (adds a property, default is on) Added: lucene/nutch/trunk/src/java/org/apache/nutch/plugin

svn commit: r280179 - in /lucene/nutch/trunk/src/plugin: clustering-carrot2/ creativecommons/ index-basic/ index-more/ languageidentifier/ ontology/ parse-ext/ parse-html/ parse-js/ parse-mp3/ parse-mspowerpoint/ parse-msword/ parse-pdf/ parse-rss/ par...

2005-09-11 jerome
Author: jerome Date: Sun Sep 11 13:34:12 2005 New Revision: 280179 URL: http://svn.apache.org/viewcvs?rev=280179&view=rev Log: Add a dependency on the nutch-extensionpoints plugin Modified: lucene/nutch/trunk/src/plugin/clustering-carrot2/plugin.xml lucene/nutch/trunk/src/plugin

svn commit: r279286 - /lucene/nutch/trunk/conf/nutch-default.xml

2005-09-07 jerome
Author: jerome Date: Wed Sep 7 02:57:24 2005 New Revision: 279286 URL: http://svn.apache.org/viewcvs?rev=279286&view=rev Log: Includes protocol-httpclient plugin (instead of protocol-http) and parse-js erroneously removed during commit of revision 233492 Modified: lucene/nutch/trunk/conf

svn commit: r279027 - /lucene/nutch/trunk/src/plugin/build.xml

2005-09-06 jerome
Author: jerome Date: Tue Sep 6 09:02:26 2005 New Revision: 279027 URL: http://svn.apache.org/viewcvs?rev=279027&view=rev Log: lib-jakarta-poi added to the list of plugins to deploy/clean Modified: lucene/nutch/trunk/src/plugin/build.xml Modified: lucene/nutch/trunk/src/plugin/build.xml URL

svn commit: r278626 - in /lucene/nutch/trunk/src/plugin: ./ parse-zip/ parse-zip/sample/ parse-zip/src/ parse-zip/src/java/ parse-zip/src/java/org/ parse-zip/src/java/org/apache/ parse-zip/src/java/org/apache/nutch/ parse-zip/src/java/org/apache/nutch/...

2005-09-04 jerome
Author: jerome Date: Sun Sep 4 13:53:49 2005 New Revision: 278626 URL: http://svn.apache.org/viewcvs?rev=278626&view=rev Log: NUTCH-53, Parser plugin for Zip files (Rohit Kulkarni) Added: lucene/nutch/trunk/src/plugin/parse-zip/ lucene/nutch/trunk/src/plugin/parse-zip/build.xml

svn commit: r265794 - in /lucene/nutch/trunk: lib/commons-lang-2.1.jar src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java

2005-09-01 jerome
Author: jerome Date: Thu Sep 1 15:20:51 2005 New Revision: 265794 URL: http://svn.apache.org/viewcvs?rev=265794&view=rev Log: NUTCH-65, Handles more modification-date formats Added: lucene/nutch/trunk/lib/commons-lang-2.1.jar (with props) Modified: lucene/nutch/trunk/src/plugin/index

svn commit: r264964 - /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java

2005-08-31 jerome
Author: jerome Date: Wed Aug 31 01:04:52 2005 New Revision: 264964 URL: http://svn.apache.org/viewcvs?rev=264964&view=rev Log: No more NullPointerException while logging the doc language if none Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java Modified: lucene

svn commit: r265020 - /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java

2005-08-31 jerome
Author: jerome Date: Wed Aug 31 04:38:28 2005 New Revision: 265020 URL: http://svn.apache.org/viewcvs?rev=265020&view=rev Log: Fixes some typos (analySer -> analyZer) Modified: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java Modified: lucene/nutch/trunk/src/java/org

svn commit: r265503 - in /lucene/nutch/trunk/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/fs/ java/org/apache/nutch/mapReduce/ java/org/apache/nutch/parse/ java/org/apache/nutch/protoc

2005-08-31 Thread jerome
Author: jerome Date: Wed Aug 31 08:17:11 2005 New Revision: 265503 URL: http://svn.apache.org/viewcvs?rev=265503view=rev Log: Merged 0.7 branch changes 240321:240453 into trunk Modified: lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java lucene/nutch/trunk/src

svn commit: r240254 - in /lucene/nutch/tags/Release-0.7/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang: HTMLLanguageParser.java LanguageIdentifier.java LanguageIndexingFilter.java LanguageQueryFilter.java NGramProfile.java

2005-08-26 Thread jerome
Author: jerome Date: Fri Aug 26 07:54:16 2005 New Revision: 240254 URL: http://svn.apache.org/viewcvs?rev=240254view=rev Log: Javadoc updates, corrections on input stream reading Modified: lucene/nutch/tags/Release-0.7/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang

svn commit: r240345 - in /lucene/nutch/branches/Release-0.7/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/fs/ java/org/apache/nutch/mapReduce/ java/org/apache/nutch/parse/ java/org/apache/nutch/protocol/ java/org/apache/nutch/searcher/ j...

2005-08-26 Thread jerome
Author: jerome Date: Fri Aug 26 14:03:19 2005 New Revision: 240345 URL: http://svn.apache.org/viewcvs?rev=240345view=rev Log: NUTCH-37, no more javadoc warnings Modified: lucene/nutch/branches/Release-0.7/src/java/org/apache/nutch/clustering/OnlineClusterer.java lucene/nutch/branches

svn commit: r240359 - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/indexer/ plugin/nutch-extensionpoints/

2005-08-26 Thread jerome
Author: jerome Date: Fri Aug 26 15:47:04 2005 New Revision: 240359 URL: http://svn.apache.org/viewcvs?rev=240359view=rev Log: Add an analysis extension point Added: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java (with props) lucene/nutch/trunk/src/java/org

svn commit: r233492 - in /lucene/nutch/trunk: conf/ src/plugin/ src/plugin/clustering-carrot2/ src/plugin/creativecommons/ src/plugin/index-basic/ src/plugin/index-more/ src/plugin/languageidentifier/ src/plugin/nutch-extensionpoints/ src/plugin/nutch-...

2005-08-19 Thread jerome
Author: jerome Date: Fri Aug 19 08:55:46 2005 New Revision: 233492 URL: http://svn.apache.org/viewcvs?rev=233492view=rev Log: NUTCH-10, extension points defined only once (Stefan Groschupf) Added: lucene/nutch/trunk/src/plugin/nutch-extensionpoints/ lucene/nutch/trunk/src/plugin/nutch

svn commit: r233544 - /lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang/TestLanguageIdentifier.java

2005-08-19 Thread jerome
Author: jerome Date: Fri Aug 19 12:26:14 2005 New Revision: 233544 URL: http://svn.apache.org/viewcvs?rev=233544view=rev Log: Correction in LanguageIdentifier unit test Modified: lucene/nutch/trunk/src/plugin/languageidentifier/src/test/org/apache/nutch/analysis/lang

svn commit: r233559 - in /lucene/nutch/trunk/src: java/org/apache/nutch/parse/ plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ plugin/parse-msword/src/java/org/apache/nutch/parse/msword/ plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf/ plugi...

2005-08-19 Thread jerome
Author: jerome Date: Fri Aug 19 14:15:02 2005 New Revision: 233559 URL: http://svn.apache.org/viewcvs?rev=233559view=rev Log: * Add utility to extract urls from plain text (Stephan Strittmatter) * Uses the OutlinkExtractor in parse plugins PDF, MSWord, Text, RTF, Ext Added: lucene/nutch

svn commit: r233312 - in /lucene/nutch/trunk: docs/ca/ docs/de/ docs/en/ docs/es/ docs/fi/ docs/fr/ docs/hu/ docs/jp/ docs/ms/ docs/nl/ docs/pl/ docs/pt/ docs/sv/ docs/th/ docs/zh/ src/web/jsp/ src/web/style/

2005-08-18 Thread jerome
Author: jerome Date: Thu Aug 18 05:17:22 2005 New Revision: 233312 URL: http://svn.apache.org/viewcvs?rev=233312view=rev Log: Gives focus to search query input Modified: lucene/nutch/trunk/docs/ca/about.html lucene/nutch/trunk/docs/ca/help.html lucene/nutch/trunk/docs/ca/search.html

svn commit: r233140 - in /lucene/nutch/trunk: site/credits.html site/credits.pdf src/site/src/documentation/content/xdocs/credits.xml

2005-08-17 Thread jerome
Author: jerome Date: Wed Aug 17 02:39:29 2005 New Revision: 233140 URL: http://svn.apache.org/viewcvs?rev=233140view=rev Log: First commit test - Add myself to the list of committers Modified: lucene/nutch/trunk/site/credits.html lucene/nutch/trunk/site/credits.pdf lucene/nutch/trunk