svn commit: r376638 - in /lucene/nutch/trunk/src/plugin/parse-msword: lib/poi-2.1-20040508.jar lib/poi-scratchpad-2.1-20040508.jar plugin.xml

2006-02-10 Thread jerome
Author: jerome
Date: Fri Feb 10 03:24:37 2006
New Revision: 376638

URL: http://svn.apache.org/viewcvs?rev=376638&view=rev
Log:
Remove no more used POI libs

Removed:
lucene/nutch/trunk/src/plugin/parse-msword/lib/poi-2.1-20040508.jar

lucene/nutch/trunk/src/plugin/parse-msword/lib/poi-scratchpad-2.1-20040508.jar
Modified:
lucene/nutch/trunk/src/plugin/parse-msword/plugin.xml

Modified: lucene/nutch/trunk/src/plugin/parse-msword/plugin.xml
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/plugin/parse-msword/plugin.xml?rev=376638&r1=376637&r2=376638&view=diff
==
--- lucene/nutch/trunk/src/plugin/parse-msword/plugin.xml (original)
+++ lucene/nutch/trunk/src/plugin/parse-msword/plugin.xml Fri Feb 10 03:24:37 2006
@@ -9,8 +9,6 @@
   <library name="parse-msword.jar">
      <export name="*"/>
   </library>
-  <library name="poi-2.1-20040508.jar"/>
-  <library name="poi-scratchpad-2.1-20040508.jar"/>
 </runtime>
 
 <requires>




[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
  This was written for Red``Hat Application Server 1.0.  There may be new 
versions of the rpms listed, so you might need to translate the version number 
from what is listed here.  I'm assuming that you're using the Red``Hat Network 
to get these packages.  That should help you when it comes to getting all the 
dependencies.
  
- == Packages to Install ==
+ == Packages to Install from Red``Hat Application Server Channel ==
  
* ant-1.6.2-3jpp_3rh.noarch.rpm
* ant-apache-regexp-1.6.2-3jpp_3rh.noarch.rpm
@@ -48, +48 @@

* xml-commons-apis-1.0-0.b2.6jpp_3rh.noarch.rpm
* xml-commons-resolver-1.1-1jpp_2rh.noarch.rpm
  
+ == Packages to Install from Extras Channel ==
+ 
+   * java-1.4.2-ibm-1.4.2.2-1jpp_9rh
+   * java-1.4.2-ibm-devel-1.4.2.2-1jpp_9rh
+ 
  One big caveat is that the version of ant that comes from Red``Hat throws an 
error when you try to compile a war file.  I suggest building ant from source.  
I'll add some hints about doing that.
  
   FrontPage


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
  This was written for Red``Hat Application Server 1.0.  There may be new 
versions of the rpms listed, so you might need to translate the version number 
from what is listed here.  I'm assuming that you're using the Red``Hat Network 
to get these packages.  That should help you when it comes to getting all the 
dependencies.
  
  == Packages to Install from Red``Hat Application Server Channel ==
+ 
+ One big caveat is that the version of ant that comes from Red``Hat throws an 
error when you try to compile a war file.  I suggest building ant from source.  
I'll add some hints about doing that.
  
* ant-1.6.2-3jpp_3rh.noarch.rpm
* ant-apache-regexp-1.6.2-3jpp_3rh.noarch.rpm
@@ -50, +52 @@

  
  == Packages to Install from Extras Channel ==
  
+ This assumes you want to use IBM's Java.
+ 
* java-1.4.2-ibm-1.4.2.2-1jpp_9rh
* java-1.4.2-ibm-devel-1.4.2.2-1jpp_9rh
  
- One big caveat is that the version of ant that comes from Red``Hat throws an 
error when you try to compile a war file.  I suggest building ant from source.  
I'll add some hints about doing that.
  
   FrontPage
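
The package lists on this page can be checked against what is already installed before going back to the Red Hat Network. The following is only a sketch: `missing_packages` is a hypothetical helper, and the query command is passed in by the caller (on an RPM-based system that would normally be `rpm -q`, which exits non-zero for a package that is not installed).

```shell
#!/bin/sh
# missing_packages QUERY_CMD PKG...
# Prints every PKG that QUERY_CMD reports as not installed.
# The helper name and its calling convention are illustrative only.
missing_packages() {
    query=$1
    shift
    for pkg in "$@"; do
        # $query is deliberately unquoted so that "rpm -q" splits
        # into the command and its flag.
        $query "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Example call, using package names taken from the lists on this page:

    missing_packages "rpm -q" ant-1.6.2-3jpp_3rh java-1.4.2-ibm-1.4.2.2-1jpp_9rh java-1.4.2-ibm-devel-1.4.2.2-1jpp_9rh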
  


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
  This was written for Red``Hat Application Server 1.0.  There may be new 
versions of the rpms listed, so you might need to translate the version number 
from what is listed here.  I'm assuming that you're using the Red``Hat Network 
to get these packages.  That should help you when it comes to getting all the 
dependencies.
+ 
+ These instructions have you installing software from multiple channels.  If 
your server was not previously subscribed to one or both of these channels, make 
sure that you have rhn update the package list for the server after you've 
added the subscriptions to these channels.
  
  == Packages to Install from Red``Hat Application Server Channel ==
  


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
* java-1.4.2-ibm-1.4.2.2-1jpp_9rh
* java-1.4.2-ibm-devel-1.4.2.2-1jpp_9rh
  
+ == Replacing Ant ==
+ 
+ When I tried to build the nutch war file using the ant rpm listed above I got 
errors about XslpLiaison.  In the end I installed ant from scratch and got it 
to work for me.  Here's what I did:
+ 
+   1. Download the ant source from http://ant.apache.org/
+   
  
   FrontPage
  


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
  When I tried to build the nutch war file using the ant rpm listed above I got 
errors about XslpLiaison.  In the end I installed ant from scratch and got it 
to work for me.  Here's what I did:
  
1. Download the ant source from http://ant.apache.org/
+   1. Verify the version you downloaded is legit (they offer pgp and other 
signatures).
+   1. Extract the package and copy it to /usr/share (creating 
/usr/share/apache-ant-[version])

  
   FrontPage
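
The verify-and-extract steps above could be scripted roughly as follows. This is a sketch under assumptions, not what the page's author ran: it assumes the download is a gzip-compressed tarball, that an MD5 checksum is among the "other signatures" offered, and that `md5sum` and `tar` are available. `verify_and_extract` is a hypothetical helper name. Checking the PGP signature with `gpg --verify` is the stronger test and is not covered here.

```shell
#!/bin/sh
# verify_and_extract TARBALL EXPECTED_MD5 DEST_DIR
# All three arguments come from the caller; none is taken from the wiki page.
verify_and_extract() {
    tarball=$1
    expected=$2
    dest=$3

    # md5sum prints "<hash>  <file>"; keep only the hash.
    actual=$(md5sum "$tarball" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $tarball" >&2
        return 1
    fi

    # Unpack under DEST_DIR. If the archive has a single top-level
    # apache-ant-[version] directory (an assumption about its layout),
    # this yields DEST_DIR/apache-ant-[version].
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}
```

With DEST_DIR set to /usr/share and run as root, this corresponds to the "creating /usr/share/apache-ant-[version]" step.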


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
  
1. Download the ant source from http://ant.apache.org/
1. Verify the version you downloaded is legit (they offer pgp and other 
signatures).
-   1. Extract the package and copy it to /usr/share (creating 
/usr/share/apache-ant-[version])
+   1. Extract the package and copy it to /usr/share (creating 
/usr/share/apache-ant-[version]).
+   1. Replace the original version of ant with the fresh one.  As root:
+  1.  cd /usr/bin
+  1.  mv ant ant_orig
+  1.  ln -s /usr/share/apache-ant-1.6.5/bin/ant ./

  
   FrontPage


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
   1.  cd /usr/bin
   1.  mv ant ant_orig
   1.  ln -s /usr/share/apache-ant-1.6.5/bin/ant ./
+  1.  mv /etc/ant.conf /etc/ant.conf_rpm_orig

  
   FrontPage
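
The four root commands in the list above can be gathered into one function that stops if the fresh launcher is missing, so the system is not left without a working ant. This is a sketch only: `replace_ant` is a hypothetical name, and the three paths are parameters so it can be tried in a scratch directory before it touches /usr/bin. The page does not say why /etc/ant.conf is moved aside; the sketch simply repeats that step.

```shell
#!/bin/sh
# replace_ant BIN_DIR ANT_HOME CONF_FILE
#   BIN_DIR    directory holding the RPM's ant launcher (/usr/bin on the page)
#   ANT_HOME   the freshly extracted Ant (/usr/share/apache-ant-1.6.5 on the page)
#   CONF_FILE  the RPM's ant.conf (/etc/ant.conf on the page)
replace_ant() {
    bin_dir=$1
    ant_home=$2
    conf_file=$3

    # Do nothing unless the replacement launcher is really there.
    if [ ! -f "$ant_home/bin/ant" ]; then
        echo "no launcher at $ant_home/bin/ant" >&2
        return 1
    fi

    # Keep the RPM launcher as ant_orig; skip if a backup already exists.
    if [ -e "$bin_dir/ant" ] && [ ! -e "$bin_dir/ant_orig" ]; then
        mv "$bin_dir/ant" "$bin_dir/ant_orig" || return 1
    fi

    # Point BIN_DIR/ant at the fresh install.
    ln -sf "$ant_home/bin/ant" "$bin_dir/ant" || return 1

    # Move the RPM's ant.conf aside, as in the last step of the list.
    if [ -e "$conf_file" ]; then
        mv "$conf_file" "${conf_file}_rpm_orig" || return 1
    fi
}
```

As root, the call matching the page would be:

    replace_ant /usr/bin /usr/share/apache-ant-1.6.5 /etc/ant.conf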


svn commit: r376815 - /lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

2006-02-10 Thread cutting
Author: cutting
Date: Fri Feb 10 11:44:47 2006
New Revision: 376815

URL: http://svn.apache.org/viewcvs?rev=376815&view=rev
Log:
Update Hadoop jar.

Modified:
lucene/nutch/trunk/lib/hadoop-0.1-dev.jar

Modified: lucene/nutch/trunk/lib/hadoop-0.1-dev.jar
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/lib/hadoop-0.1-dev.jar?rev=376815&r1=376814&r2=376815&view=diff
==
Binary files - no diff available.




[Nutch Wiki] Update of FrontPage by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FrontPage

--
   * AcademicArticles that deal with Nutch
  
  == Nutch Administration ==
-  * [http://www.apache.org/dyn/closer.cgi/lucene/nutch/ Download Nutch]
+  * DownloadingNutch
   * HardwareRequirements
   * [http://lucene.apache.org/nutch/tutorial.html Tutorial] -- A Step-by-Step 
guide to getting Nutch up and running.
   * [FAQ]


[Nutch Wiki] Update of GettingNutchRunningWithRedHatApplicationServer by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/GettingNutchRunningWithRedHatApplicationServer

--
   1.  mv /etc/ant.conf /etc/ant.conf_rpm_orig

  
+ Now you can download nutch (see DownloadingNutch) and run through the 
tutorial (http://lucene.apache.org/nutch/tutorial.html).  One thing to note is 
that you'll want to put the segments directory under /usr/share/tomcat5.  Once 
it's there, you can start tomcat with the command service tomcat5 start.
+ 
   FrontPage
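
Placing the segments directory under the Tomcat directory and then starting the service might look like this. It is a sketch under assumptions: `deploy_segments` is a hypothetical helper, the page does not say whether the directory should be copied, moved or linked (a copy is used here), and the source path in the usage line is a placeholder.

```shell
#!/bin/sh
# deploy_segments SRC TOMCAT_HOME
# Copies the segments directory SRC into TOMCAT_HOME, giving
# TOMCAT_HOME/segments when SRC is itself named "segments".
deploy_segments() {
    src=$1
    tomcat_home=$2

    if [ ! -d "$src" ]; then
        echo "no segments directory at $src" >&2
        return 1
    fi
    if [ ! -d "$tomcat_home" ]; then
        echo "no tomcat directory at $tomcat_home" >&2
        return 1
    fi

    cp -R "$src" "$tomcat_home/"
}
```

As root, with the crawl path replaced by wherever the tutorial run put it:

    deploy_segments /path/to/crawl/segments /usr/share/tomcat5 && service tomcat5 start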
  


[Nutch Wiki] Update of FrontPage by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FrontPage

--
   * CommandLineOptions for the nutch shell script.
   * MultiLingualSupport - ''In development''.
   * OverviewDeploymentConfigs
-  * GettingNutchRunningWithUtf8
+  * GettingNutchRunningWithUtf8 - This is needed to support non-ASII character 
sets. In particular Chinese, Japanese and Korean
   * GettingNutchRunningWithResin
   * GettingNutchRunningWithUbuntu
   * GettingNutchRunningWithWindows


[Nutch Wiki] Update of FrontPage by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FrontPage

--
   * CommandLineOptions for the nutch shell script.
   * MultiLingualSupport - ''In development''.
   * OverviewDeploymentConfigs
-  * GettingNutchRunningWithUtf8 - This is needed to support non-ASII character 
sets. In particular Chinese, Japanese and Korean
+  * GettingNutchRunningWithUtf8 - For support of non-ASCII characters 
(Chinese, Japanese and Korean).
   * GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application 
server (alternative to tomcat).
   * GettingNutchRunningWithUbuntu
   * GettingNutchRunningWithWindows


[Nutch Wiki] Update of PluginCentral by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/PluginCentral

--
  
   * index-basic - Adds url, content and anchor fields to the index.
   * index-more - Adds date, content-length, contentType, primaryType and 
subtype fields to the index.
-  * languageidentifier - Adds a lang field to the index.
+  * [wiki:MultiLingualSupport languageidentifier] - Adds a lang field to the 
index and allows you to query against it.
   * [wiki:OntologyPlugin ontology] - Helps refine queries based on owl files.
   * parse-ext - A wrapper that invokes external command to do real parsing job.
   * parse-html - Parses HTML documents


[Nutch Wiki] Update of PluginCentral by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/PluginCentral

--
  
   * index-basic - Adds url, content and anchor fields to the index.
   * index-more - Adds date, content-length, contentType, primaryType and 
subtype fields to the index.
-  * [wiki:MultiLingualSupport languageidentifier] - Adds a lang field to the 
index and allows you to query against it.
+  * languageidentifier - Adds a lang field to the index and allows you to 
query against it.
   * [wiki:OntologyPlugin ontology] - Helps refine queries based on owl files.
   * parse-ext - A wrapper that invokes external command to do real parsing job.
   * parse-html - Parses HTML documents


[Nutch Wiki] Update of FrontPage by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/FrontPage

--
   * [http://lucene.apache.org/nutch/tutorial.html Tutorial] -- A Step-by-Step 
guide to getting Nutch up and running.
   * [FAQ]
   * CommandLineOptions for the nutch shell script.
-  * MultiLingualSupport - ''In development''.
   * OverviewDeploymentConfigs
   * GettingNutchRunningWithUtf8 - For support of non-ASCII characters 
(Chinese, Japanese and Korean).
   * GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application 
server (alternative to tomcat).
@@ -35, +34 @@

   * InternalDocumentation -- How Nutch works.
   * [http://lucene.apache.org/nutch/apidocs/index.html JavaDocs] -- The 
!JavaDocs for Nutch.
   * [http://lucene.apache.org/nutch/version_control.html Nutch Version Control]
+  * MultiLingualSupport - ''In development''.
   * HowToContribute
   * TaskList -- Tasks for Nutch developers.
   * [Development] -- More tasks for Nutch developers.


[Nutch Wiki] Update of Search Theory by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/Search_Theory

--
  
  [http://www.almaden.ibm.com/cs/k53/clever.html IBM Clever Project] Dozens of 
great papers on search processing, link mapping, intelligent spidering 
processes.
  
- [http://www.cse.lehigh.edu/~brian/pubs/1999/www8/ DiscoWeb] Applying link 
analysis to the web. DiscoWeb is what became of WebSpider and eventually the 
likes of Alta Vista & Teoma
+ [http://www.cse.lehigh.edu/~brian/pubs/1999/www8/ DiscoWeb] Applying link 
analysis to the web. Disco''Web is what became of WebSpider and eventually the 
likes of Alta Vista & Teoma
  
  [http://www.cs.toronto.edu/~georgem/hilltop/ Hilltop] Search Engine based on 
expert Documents.
  


[Nutch Wiki] Update of Search Theory by JakeVanderdray

2006-02-10 Thread Apache Wiki

The following page has been changed by JakeVanderdray:
http://wiki.apache.org/nutch/Search_Theory

--
  
  [http://www.almaden.ibm.com/cs/k53/clever.html IBM Clever Project] Dozens of 
great papers on search processing, link mapping, intelligent spidering 
processes.
  
- [http://www.cse.lehigh.edu/~brian/pubs/1999/www8/ DiscoWeb] Applying link 
analysis to the web. Disco''Web is what became of WebSpider and eventually the 
likes of Alta Vista & Teoma
+ [http://www.cse.lehigh.edu/~brian/pubs/1999/www8/ DiscoWeb] Applying link 
analysis to the web. Disco``Web is what became of Web``Spider and eventually 
the likes of Alta Vista & Teoma
  
  [http://www.cs.toronto.edu/~georgem/hilltop/ Hilltop] Search Engine based on 
expert Documents.
  


[Nutch Wiki] Update of PluginCentral by JeromeCharron

2006-02-10 Thread Apache Wiki

The following page has been changed by JeromeCharron:
http://wiki.apache.org/nutch/PluginCentral

--
   * parse-html - Parses HTML documents
   * parse-js - Parses Java``Script
   * parse-mp3 - Parses MP3s
+  * parse-msexcel - Parses MS Excel documents
+  * parse-mspowerpoint - Parses MS Powerpoint documents
   * parse-msword - Parses MS Word documents
   * parse-pdf - Parses PDFs
   * parse-rss - Parses RSS feeds
   * parse-rtf - Parses RTF files
+  * parse-swf - Parses Flash SWF files
   * parse-text - Parses text documents
   * protocol-file - Retrieves documents from the filesystem
   * protocol-ftp - Retrieves documents through ftp