Re: [Nutch-cvs] svn commit: r280179 - in /lucene/nutch/trunk/src/plugin: clustering-carrot2/ creativecommons/ index-basic/ index-more/ languageidentifier/ ontology/ parse-ext/ parse-html/ parse-js/ parse-mp3/ parse-mspowerpoint/ parse-msword/ parse-pdf/ parse-rss/ par...

2005-09-13 Thread Andrzej Bialecki

[EMAIL PROTECTED] wrote:

Author: jerome
Date: Sun Sep 11 13:34:12 2005
New Revision: 280179

URL: http://svn.apache.org/viewcvs?rev=280179&view=rev
Log:
Add a dependency to nutch-extensionpoints plugin


Looks like something broke after this commit. When I run a nutch crawl 
using the out-of-the-box configuration I get the following (with logging 
turned to ALL):


050913 125223 not including: creativecommons
050913 125223 not including: parse-pdf
050913 125223 not including: parse-ext
050913 125223 not including: ontology
050913 125223 not including: protocol-ftp
050913 125223 not including: protocol-http
050913 125223 not including: parse-zip
050913 125223 not including: nutch-extensionpoints
050913 125223 not including: index-more
050913 125223 not including: clustering-carrot2
050913 125223 not including: query-more
050913 125223 not including: language-identifier
050913 125223 not including: urlfilter-prefix
050913 125223 not including: parse-mspowerpoint
050913 125223 not including: parse-msword
050913 125223 not including: protocol-file
050913 125223 not including: lib-jakarta-poi
050913 125223 not including: parse-rss
050913 125223 Missing dependency nutch-extensionpoints for plugin query-url
050913 125223 Missing dependency nutch-extensionpoints for plugin query-site
050913 125223 Missing dependency nutch-extensionpoints for plugin protocol-httpclient
050913 125223 Missing dependency nutch-extensionpoints for plugin parse-html
050913 125223 Missing dependency nutch-extensionpoints for plugin index-basic
050913 125223 Missing dependency nutch-extensionpoints for plugin parse-text
050913 125223 Missing dependency nutch-extensionpoints for plugin parse-js
050913 125223 Missing dependency nutch-extensionpoints for plugin query-basic
050913 125223 Missing dependency nutch-extensionpoints for plugin urlfilter-regex
050913 125223 Plugin Auto-activation mode: [false]
050913 125223 Registered Plugins:
050913 125223   NONE
050913 125223 Registered Extension-Points:
050913 125223   NONE
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
        ... 4 more
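
One possible reading of the log above: nutch-extensionpoints itself is reported as "not including", so every plugin that now requires it fails its dependency check and nothing is registered. The sketch below only illustrates that failure mode. The class `PluginDependencyCheck`, the method `resolve`, and the order of the two checks are assumptions made for illustration; this is not the Nutch plugin manager code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the actual Nutch plugin manager. */
public class PluginDependencyCheck {

    /**
     * Returns the ids of plugins that are included by configuration and
     * whose required plugins are all included as well.
     *
     * @param requires plugin id mapped to the ids it depends on
     * @param included plugin ids selected by configuration (assumed filter)
     */
    static List<String> resolve(Map<String, List<String>> requires, Set<String> included) {
        List<String> registered = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : requires.entrySet()) {
            String id = entry.getKey();
            if (!included.contains(id)) {
                System.out.println("not including: " + id);
                continue;
            }
            boolean satisfied = true;
            for (String dep : entry.getValue()) {
                if (!included.contains(dep)) {
                    System.out.println("Missing dependency " + dep + " for plugin " + id);
                    satisfied = false;
                }
            }
            if (satisfied) {
                registered.add(id);
            }
        }
        return registered;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requires = new LinkedHashMap<>();
        requires.put("nutch-extensionpoints", Collections.<String>emptyList());
        requires.put("parse-html", Arrays.asList("nutch-extensionpoints"));
        requires.put("urlfilter-regex", Arrays.asList("nutch-extensionpoints"));

        // The dependency target is missing from the include set: nothing registers.
        Set<String> included = new HashSet<>(Arrays.asList("parse-html", "urlfilter-regex"));
        System.out.println("Registered: " + resolve(requires, included));

        // Adding the dependency target to the include set lets all three register.
        included.add("nutch-extensionpoints");
        System.out.println("Registered: " + resolve(requires, included));
    }
}
```

If this reading is right, the first call mirrors the "Registered Plugins: NONE" state in the log, and the second shows that including the dependency target would be enough to clear the errors in this simplified model.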


--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



svn commit: r280549 - /lucene/nutch/trunk/src/plugin/build.xml

2005-09-13 Thread jerome
Author: jerome
Date: Tue Sep 13 05:52:13 2005
New Revision: 280549

URL: http://svn.apache.org/viewcvs?rev=280549&view=rev
Log:
Sorted alphabetically for easy maintenance

Modified:
lucene/nutch/trunk/src/plugin/build.xml

Modified: lucene/nutch/trunk/src/plugin/build.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/build.xml?rev=280549&r1=280548&r2=280549&view=diff
==============================================================================
--- lucene/nutch/trunk/src/plugin/build.xml (original)
+++ lucene/nutch/trunk/src/plugin/build.xml Tue Sep 13 05:52:13 2005
@@ -6,89 +6,89 @@
   <!-- Build & deploy all the plugin jars.                    -->
   <!-- ====================================================== -->
   <target name="deploy">
+    <ant dir="clustering-carrot2" target="deploy"/>
+    <ant dir="creativecommons" target="deploy"/>
+    <ant dir="index-basic" target="deploy"/>
+    <ant dir="index-more" target="deploy"/>
+    <ant dir="languageidentifier" target="deploy"/>
     <ant dir="lib-jakarta-poi" target="deploy"/>
     <ant dir="nutch-extensionpoints" target="deploy"/>
+    <ant dir="ontology" target="deploy"/>
     <ant dir="protocol-file" target="deploy"/>
     <ant dir="protocol-ftp" target="deploy"/>
     <ant dir="protocol-http" target="deploy"/>
     <ant dir="protocol-httpclient" target="deploy"/>
+    <ant dir="parse-ext" target="deploy"/>
     <ant dir="parse-html" target="deploy"/>
     <ant dir="parse-js" target="deploy"/>
-    <ant dir="parse-text" target="deploy"/>
+    <!-- <ant dir="parse-mp3" target="deploy"/> -->
+    <ant dir="parse-mspowerpoint" target="deploy"/>
+    <ant dir="parse-msword" target="deploy"/>
     <ant dir="parse-pdf" target="deploy"/>
     <ant dir="parse-rss" target="deploy"/>
-    <ant dir="parse-msword" target="deploy"/>
-    <ant dir="parse-mspowerpoint" target="deploy"/>
-<!-- <ant dir="parse-mp3" target="deploy"/> -->
-<!-- <ant dir="parse-rtf" target="deploy"/> -->
-    <ant dir="parse-ext" target="deploy"/>
+    <!-- <ant dir="parse-rtf" target="deploy"/> -->
+    <ant dir="parse-text" target="deploy"/>
     <ant dir="parse-zip" target="deploy"/>
-    <ant dir="index-basic" target="deploy"/>
-    <ant dir="index-more" target="deploy"/>
     <ant dir="query-basic" target="deploy"/>
     <ant dir="query-more" target="deploy"/>
     <ant dir="query-site" target="deploy"/>
     <ant dir="query-url" target="deploy"/>
-    <ant dir="urlfilter-regex" target="deploy"/>
     <ant dir="urlfilter-prefix" target="deploy"/>
-    <ant dir="creativecommons" target="deploy"/>
-    <ant dir="languageidentifier" target="deploy"/>
-    <ant dir="clustering-carrot2" target="deploy"/>
-    <ant dir="ontology" target="deploy"/>
+    <ant dir="urlfilter-regex" target="deploy"/>
   </target>
 
   <!-- ====================================================== -->
   <!-- Test all of the plugins.                               -->
   <!-- ====================================================== -->
   <target name="test">
+    <ant dir="creativecommons" target="test"/>
+    <ant dir="languageidentifier" target="test"/>
+    <ant dir="ontology" target="test"/>
     <ant dir="protocol-http" target="test"/>
+    <ant dir="parse-ext" target="test"/>
     <ant dir="parse-html" target="test"/>
+    <!-- <ant dir="parse-mp3" target="test"/> -->
+    <ant dir="parse-mspowerpoint" target="test"/>
+    <ant dir="parse-msword" target="test"/>
     <ant dir="parse-pdf" target="test"/>
     <ant dir="parse-rss" target="test"/>
-    <ant dir="parse-msword" target="test"/>
-    <ant dir="parse-mspowerpoint" target="test"/>
-    <!-- <ant dir="parse-mp3" target="test"/> -->
     <!-- <ant dir="parse-rtf" target="test"/> -->
-    <ant dir="parse-ext" target="test"/>
     <ant dir="parse-zip" target="test"/>
-    <ant dir="creativecommons" target="test"/>
-    <ant dir="languageidentifier" target="test"/>
-    <ant dir="ontology" target="test"/>
   </target>
 
   <!-- ====================================================== -->
   <!-- Clean all of the plugins.                              -->
   <!-- ====================================================== -->
   <target name="clean">
+    <ant dir="clustering-carrot2" target="clean"/>
+    <ant dir="creativecommons" target="clean"/>
+    <ant dir="index-basic" target="clean"/>
+    <ant dir="index-more" target="clean"/>
+    <ant dir="languageidentifier" target="clean"/>
     <ant dir="lib-jakarta-poi" target="clean"/>
     <ant dir="nutch-extensionpoints" target="clean"/>
+    <ant dir="ontology" target="clean"/>
     <ant dir="protocol-file" target="clean"/>
     <ant dir="protocol-ftp" target="clean"/>
     <ant dir="protocol-http" target="clean"/>
     <ant dir="protocol-httpclient" target="clean"/>
+    <ant dir="parse-ext" target="clean"/>
     <ant dir="parse-html" target="clean"/>
     <ant dir="parse-js" target="clean"/>
-    <ant dir="parse-text" target="clean"/>
+    <ant dir="parse-mp3" target="clean"/>
+    <ant dir="parse-mspowerpoint" target="clean"/>
+    <ant dir="parse-msword" target="clean"/>
     <ant dir="parse-pdf" target="clean"/>
     <ant dir="parse-rss" target="clean"/>
-    <ant dir="parse-msword" target="clean"/>
-    <ant dir="parse-mspowerpoint" target="clean"/>
-    <ant dir="parse-mp3" target="clean"/>
     <ant dir="parse-rtf" target="clean"/>
-    <ant dir="parse-ext" target="clean"/>
+    <ant dir="parse-text" target="clean"/>
     <ant dir="parse-zip" target="clean"/>
-    <ant dir="index-basic" target="clean"/>
-

svn commit: r280551 - in /lucene/nutch/trunk/src/plugin: build.xml lib-lucene-analyzers/ lib-lucene-analyzers/build.xml lib-lucene-analyzers/lib/ lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar lib-lucene-analyzers/plugin.xml

2005-09-13 Thread jerome
Author: jerome
Date: Tue Sep 13 06:06:32 2005
New Revision: 280551

URL: http://svn.apache.org/viewcvs?rev=280551&view=rev
Log:
Add a lib plugin for lucene analyzers

Added:
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml   (with props)
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/lib/
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar   (with props)
lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml   (with props)
Modified:
lucene/nutch/trunk/src/plugin/build.xml

Modified: lucene/nutch/trunk/src/plugin/build.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/build.xml?rev=280551&r1=280550&r2=280551&view=diff
==============================================================================
--- lucene/nutch/trunk/src/plugin/build.xml (original)
+++ lucene/nutch/trunk/src/plugin/build.xml Tue Sep 13 06:06:32 2005
@@ -12,6 +12,7 @@
     <ant dir="index-more" target="deploy"/>
     <ant dir="languageidentifier" target="deploy"/>
     <ant dir="lib-jakarta-poi" target="deploy"/>
+    <ant dir="lib-lucene-analyzers" target="deploy"/>
     <ant dir="nutch-extensionpoints" target="deploy"/>
     <ant dir="ontology" target="deploy"/>
     <ant dir="protocol-file" target="deploy"/>
@@ -66,6 +67,7 @@
     <ant dir="index-more" target="clean"/>
     <ant dir="languageidentifier" target="clean"/>
     <ant dir="lib-jakarta-poi" target="clean"/>
+    <ant dir="lib-lucene-analyzers" target="clean"/>
     <ant dir="nutch-extensionpoints" target="clean"/>
     <ant dir="ontology" target="clean"/>
     <ant dir="protocol-file" target="clean"/>

Added: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml?rev=280551&view=auto
==============================================================================
--- lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml (added)
+++ lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml Tue Sep 13 06:06:32 2005
@@ -0,0 +1,17 @@
+<?xml version="1.0"?>
+
+<project name="lib-lucene-analyzers" default="jar">
+
+  <import file="../build-plugin.xml"/>
+
+  <!--
+   ! Override the compile and jar targets,
+   ! since there is nothing to compile here.
+   ! -->
+  <target name="compile" depends="init">
+    <echo message="Compiling plugin: ${name}"/>
+  </target>
+
+  <target name="jar" depends="compile"/>
+
+</project>

Propchange: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/build.xml
------------------------------------------------------------------------------
svn:eol-style = native

Added: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar?rev=280551&view=auto
==============================================================================
Binary file - no diff available.

Propchange: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/lib/lucene-analyzers-1.9-rc1-dev.jar
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream

Added: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml?rev=280551&view=auto
==============================================================================
--- lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml (added)
+++ lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml Tue Sep 13 06:06:32 2005
@@ -0,0 +1,21 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ! Lucene Analyzers
+ ! (http://lucene.apache.org/java/docs/lucene-sandbox/)
+ !
+ ! Dowload : http://www.apache.org/dyn/closer.cgi/jakarta/lucene/binaries/
+ ! License : http://www.apache.org/licenses/LICENSE-2.0.txt
+ !-->
+<plugin
+   id="lib-lucene-analyzers"
+   name="Lucene Analysers"
+   version="1.9-rc1-dev"
+   provider-name="org.apache.lucene">
+
+   <runtime>
+     <library name="lucene-analyzers-1.9-rc1-dev.jar">
+        <export name="*"/>
+     </library>
+   </runtime>
+
+</plugin>

Propchange: lucene/nutch/trunk/src/plugin/lib-lucene-analyzers/plugin.xml
------------------------------------------------------------------------------
svn:eol-style = native




svn commit: r280556 - in /lucene/nutch/trunk/src/plugin: ./ analysis-de/ analysis-de/src/ analysis-de/src/java/ analysis-de/src/java/org/ analysis-de/src/java/org/apache/ analysis-de/src/java/org/apache/nutch/ analysis-de/src/java/org/apache/nutch/anal...

2005-09-13 Thread jerome
Author: jerome
Date: Tue Sep 13 07:03:36 2005
New Revision: 280556

URL: http://svn.apache.org/viewcvs?rev=280556&view=rev
Log:
French and German analyzers added

Added:
lucene/nutch/trunk/src/plugin/analysis-de/
lucene/nutch/trunk/src/plugin/analysis-de/build.xml   (with props)
lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml   (with props)
lucene/nutch/trunk/src/plugin/analysis-de/src/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/
lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java   (with props)
lucene/nutch/trunk/src/plugin/analysis-fr/
lucene/nutch/trunk/src/plugin/analysis-fr/build.xml   (with props)
lucene/nutch/trunk/src/plugin/analysis-fr/plugin.xml   (with props)
lucene/nutch/trunk/src/plugin/analysis-fr/src/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/apache/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/apache/nutch/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/apache/nutch/analysis/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/apache/nutch/analysis/fr/
lucene/nutch/trunk/src/plugin/analysis-fr/src/java/org/apache/nutch/analysis/fr/FrenchAnalyzer.java   (with props)
Modified:
lucene/nutch/trunk/src/plugin/build.xml

Added: lucene/nutch/trunk/src/plugin/analysis-de/build.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/analysis-de/build.xml?rev=280556&view=auto
==============================================================================
--- lucene/nutch/trunk/src/plugin/analysis-de/build.xml (added)
+++ lucene/nutch/trunk/src/plugin/analysis-de/build.xml Tue Sep 13 07:03:36 2005
@@ -0,0 +1,13 @@
+<?xml version="1.0"?>
+
+<project name="analysis-de" default="jar">
+
+  <import file="../build-plugin.xml"/>
+
+  <path id="plugin.deps">
+    <fileset dir="../lib-lucene-analyzers/lib">
+      <include name="*.jar" />
+    </fileset>
+  </path>
+
+</project>

Propchange: lucene/nutch/trunk/src/plugin/analysis-de/build.xml
------------------------------------------------------------------------------
svn:eol-style = native

Added: lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml?rev=280556&view=auto
==============================================================================
--- lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml (added)
+++ lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml Tue Sep 13 07:03:36 2005
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<plugin
+   id="analysis-de"
+   name="German Analysis Plug-in"
+   version="1.0.0"
+   provider-name="org.apache.nutch">
+
+   <runtime>
+      <library name="analysis-de.jar">
+         <export name="*"/>
+      </library>
+   </runtime>
+
+   <requires>
+      <import plugin="nutch-extensionpoints"/>
+      <import plugin="lib-lucene-analyzers"/>
+   </requires>
+
+   <extension id="org.apache.nutch.analysis.de"
+              name="GermanAnalyzer"
+              point="org.apache.nutch.analysis.NutchAnalyzer">
+
+      <implementation id="org.apache.nutch.analysis.de.GermanAnalyzer"
+                      class="org.apache.nutch.analysis.de.GermanAnalyzer"
+                      lang="de"/>
+
+   </extension>
+
+</plugin>

Propchange: lucene/nutch/trunk/src/plugin/analysis-de/plugin.xml
------------------------------------------------------------------------------
svn:eol-style = native

Added: lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java?rev=280556&view=auto
==============================================================================
--- lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java (added)
+++ lucene/nutch/trunk/src/plugin/analysis-de/src/java/org/apache/nutch/analysis/de/GermanAnalyzer.java Tue Sep 13 07:03:36 2005
@@ -0,0 +1,48 @@
+/**
+ * Copyright 2005 The Apache Software Foundation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY