Author: dmeikle
Date: Wed Dec 3 16:51:41 2008
New Revision: 723172
URL: http://svn.apache.org/viewvc?rev=723172&view=rev
Log:
Changes to POM and Site for 0.2 release
Modified:
lucene/tika/branches/0.2/pom.xml
lucene/tika/branches/0.2/src/site/apt/download.apt
lucene/tika/branches/0.2/src/site/apt/gettingstarted.apt
lucene/tika/branches/0.2/src/site/apt/index.apt
Modified: lucene/tika/branches/0.2/pom.xml
URL: http://svn.apache.org/viewvc/lucene/tika/branches/0.2/pom.xml?rev=723172&r1=723171&r2=723172&view=diff
==============================================================================
--- lucene/tika/branches/0.2/pom.xml (original)
+++ lucene/tika/branches/0.2/pom.xml Wed Dec 3 16:51:41 2008
@@ -33,7 +33,7 @@
<groupId>org.apache.tika</groupId>
<artifactId>tika</artifactId>
- <version>0.3-SNAPSHOT</version>
+ <version>0.2</version>
<name>Apache Tika</name>
<!-- Keep on a single line, see http://jira.codehaus.org/browse/MJAR-39 -->
@@ -275,7 +275,7 @@
</archive>
</configuration>
</plugin>
- <plugin>
+ <!-- <plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2-beta-2</version>
@@ -297,7 +297,7 @@
</goals>
</execution>
</executions>
- </plugin>
+ </plugin> -->
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>retrotranslator-maven-plugin</artifactId>
Modified: lucene/tika/branches/0.2/src/site/apt/download.apt
URL: http://svn.apache.org/viewvc/lucene/tika/branches/0.2/src/site/apt/download.apt?rev=723172&r1=723171&r2=723172&view=diff
==============================================================================
--- lucene/tika/branches/0.2/src/site/apt/download.apt (original)
+++ lucene/tika/branches/0.2/src/site/apt/download.apt Wed Dec 3 16:51:41 2008
@@ -19,12 +19,12 @@
Download Apache Tika
- The first official release of Apache Tika, 0.1-incubating, is now available.
- See the {{{http://www.apache.org/dist/incubator/tika/CHANGES-0.1-incubating.txt}CHANGES.txt}}
+ Apache Tika 0.2 is now available.
+ See the {{{http://www.apache.org/dist/lucene/tika/CHANGES-0.2.txt}CHANGES.txt}}
file for more information on the list of updates in this initial release.
- * {{{http://www.apache.org/dyn/closer.cgi/incubator/tika/apache-tika-0.1-incubating-src.tar.gz}apache-tika-0.1-incubating-src.tar.gz}}
- ({{{http://www.apache.org/dist/incubator/tika/apache-tika-0.1-incubating-src.tar.gz.asc}PGP}})
+ * {{{http://www.apache.org/dyn/closer.cgi/lucene/tika/apache-tika-0.2.tar.gz}apache-tika-0.2.tar.gz}}
+ ({{{http://www.apache.org/dist/lucene/tika/apache-tika-0.2.tar.gz.asc}PGP}})
[]
Modified: lucene/tika/branches/0.2/src/site/apt/gettingstarted.apt
URL: http://svn.apache.org/viewvc/lucene/tika/branches/0.2/src/site/apt/gettingstarted.apt?rev=723172&r1=723171&r2=723172&view=diff
==============================================================================
--- lucene/tika/branches/0.2/src/site/apt/gettingstarted.apt (original)
+++ lucene/tika/branches/0.2/src/site/apt/gettingstarted.apt Wed Dec 3 16:51:41 2008
@@ -1,6 +1,6 @@
- --------------------------------
- Getting Started with Apache Tika
- --------------------------------
+ --------------------------------
+ Getting Started with Apache Tika
+ --------------------------------
~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
@@ -19,126 +19,189 @@
Getting Started with Apache Tika
- This document describes how to build Apache Tika from sources and
- how to start using Tika in an application.
+ This document describes how to build Apache Tika from sources and
+ how to start using Tika in an application.
Getting and building the sources
- To build Tika from sources you first need to either
- {{{download.html}download}} a source release or
- {{{source-repository.html}checkout}} the latest sources from
- version control.
-
- Once you have the sources, you can build them using the
- {{{http://maven.apache.org/}Maven 2}} build system. Executing the
- following command in the source directory will build the sources
- and install the resulting artifacts in your local Maven repository.
+ To build Tika from sources you first need to either
+ {{{download.html}download}} a source release or
+ {{{source-repository.html}checkout}} the latest sources from
+ version control.
+
+ Once you have the sources, you can build them using the
+ {{{http://maven.apache.org/}Maven 2}} build system. Executing the
+ following command in the source directory will build the sources
+ and install the resulting artifacts in your local Maven repository.
---
mvn install
---
- See the Maven documentation for more information about the available
- build options.
+ See the Maven documentation for more information about the available
+ build options.
- Note that you need Java 5 or higher to build Tika.
+ Note that you need Java 5 or higher to build Tika.
Build artifacts
- The Tika build produces the following libraries in the <<<target>>>
- directory (x.y stands for the current Tika version number).
+ The Tika build produces the following libraries in the <<<target>>>
+ directory (x.y stands for the current Tika version number).
- * tika-x.y.jar
+ * tika-x.y.jar
- The main build artifact (tika-x.y.jar) contains the compiled Java
- classes and interfaces in the <<<org.apache.tika>>> packages and
- the default Tika configuration settings.
+ * tika-x.y-jdk14.jar (available since 0.2)
+
+ The main build artifact (tika-x.y.jar) contains the compiled Java
+ classes and interfaces in the <<<org.apache.tika>>> packages and
+ the default Tika configuration settings.
+
+ The second build artifact (tika-x.y-jdk14.jar, available since version 0.2)
+ is a {{{http://retrotranslator.sourceforge.net/}retrotranslated}} version
+ of the main Tika build artifact. Normally Tika only works with Java 5 or
+ higher, but you can use this version of Tika also with Java 1.4.
Using Tika as a Maven dependency
- Using Tika in a Maven project is very straightforward. Just select the
- version of Tika you want to use, and add the following dependency.
+ Using Tika in a Maven project is very straightforward. Just select the
+ version of Tika you want to use, and add the following dependency.
---
<dependency>
- <groupId>org.apache.tika</groupId>
- <artifactId>tika</artifactId>
- <version>x.y</version>
+<groupId>org.apache.tika</groupId>
+<artifactId>tika</artifactId>
+<version>x.y</version>
</dependency>
---
- Note that the incubating 0.1 release of Tika is not available in the
- central Maven repository. You need to build and install Tika locally
- to use it as a Maven dependency.
-
- Note that adding the Tika dependency will introduce a number of
- transitive dependencies to your project. You need to make sure that
- these dependencies won't conflict with your existing project dependencies.
- The listing below shows all the compile-scope dependencies of Tika 0.1.
- You can use the command "mvn dependency:tree" to check the latest tree
- of dependencies.
+ The first version of the org.apache.tika:tika artifact available in the
+ central Maven repository is 0.2. For the 0.1 version or for SNAPSHOT
+ dependencies you need to build and install Tika locally.
+
+ If your application uses Java 1.4, you need to use the retrotranslated
+ version of Tika. This version is identified by the classifier "jdk14".
---
-org.apache.tika:tika:jar:0.1-incubating
+<dependency>
+<groupId>org.apache.tika</groupId>
+<artifactId>tika</artifactId>
+<version>x.y</version>
+<classifier>jdk14</classifier>
+</dependency>
+---
+
+ The retrotranslated version will be available in the central Maven
+ repository starting with Tika version 0.2.
+
+ Note that adding the Tika dependency will introduce a number of
+ transitive dependencies to your project. You need to make sure that
+ these dependencies won't conflict with your existing project dependencies.
+ The listing below shows all the compile-scope dependencies of the
+ current Tika release (0.2, December 2008). You can use the
+ command "mvn dependency:tree" to check the latest tree of dependencies.
+
+---
+org.apache.tika:tika:jar:0.2
+- commons-lang:commons-lang:jar:2.1:compile
+- commons-logging:commons-logging:jar:1.0.4:compile
+- commons-codec:commons-codec:jar:1.3:compile
++- commons-io:commons-io:jar:1.4:compile
+- pdfbox:pdfbox:jar:0.7.3:compile
| +- org.fontbox:fontbox:jar:0.1.0:compile
| +- org.jempbox:jempbox:jar:0.2.0:compile
| +- bouncycastle:bcmail-jdk14:jar:136:compile
| \- bouncycastle:bcprov-jdk14:jar:136:compile
-+- org.apache.poi:poi:jar:3.0-FINAL:compile
-+- jdom:jdom:jar:1.0:compile
-+- jaxen:jaxen:jar:1.1.1:compile
-| +- dom4j:dom4j:jar:1.6.1:compile
-| +- xml-apis:xml-apis:jar:1.3.02:compile
-| +- xerces:xercesImpl:jar:2.6.2:compile
-| \- xom:xom:jar:1.0:compile
-| +- xerces:xmlParserAPIs:jar:2.6.2:compile
-| \- xalan:xalan:jar:2.6.0:compile
-+- nekohtml:nekohtml:jar:0.9.5:compile
-+- com.ibm.icu:icu4j:jar:3.4.4:compile
-\- log4j:log4j:jar:1.2.14:compile
++- org.apache.poi:poi:jar:3.1-FINAL:compile
++- org.apache.poi:poi-scratchpad:jar:3.1-FINAL:compile
++- net.sourceforge.nekohtml:nekohtml:jar:1.9.9:compile
+| \- xerces:xercesImpl:jar:2.8.1:compile
+| \- xml-apis:xml-apis:jar:1.3.03:compile
++- com.ibm.icu:icu4j:jar:3.8:compile
++- asm:asm:jar:3.1:compile
++- log4j:log4j:jar:1.2.14:compile
+\- junit:junit:jar:3.8.1:test
---
Using Tika in an Ant project
- Unless you use a dependency manager tool like
- {{{http://ant.apache.org/ivy/}Apache Ivy}} you need to add both the
- Tika jar and all dependency jars individually in your
- {{{http://ant.apache.org/}Ant}} build. You can leave out some parser
- libraries if you don't need support for certain file formats.
+ Unless you use a dependency manager tool like {{{http://ant.apache.org/ivy/}Apache Ivy}},
+ you need to add the main Tika jar file and its dependencies to your application individually.
---
<classpath>
- ... <!-- your other classpath entries -->
- <pathelement location="path/to/tika-x.y.jar"/>
- <pathelement location="path/to/commons-lang-2.1.jar"/>
- <pathelement location="path/to/commons-logging-1.0.4.jar"/>
- <pathelement location="path/to/commons-codec-1.3.jar"/>
- <pathelement location="path/to/commons-io-1.4.jar"/>
- <pathelement location="path/to/pdfbox-0.7.3.jar"/>
- <pathelement location="path/to/fontbox-0.1.0.jar"/>
- <pathelement location="path/to/jempbox-0.2.0.jar"/>
- <pathelement location="path/to/bcmail-jdk14-136.jar"/>
- <pathelement location="path/to/bcprov-jdk14-136.jar"/>
- <pathelement location="path/to/poi-3.0-FINAL.jar"/>
- <pathelement location="path/to/jdom-1.0.jar"/>
- <pathelement location="path/to/jaxen-1.1.1.jar"/>
- <pathelement location="path/to/dom4j-1.6.1.jar"/>
- <pathelement location="path/to/xml-apis-1.3.02.jar"/>
- <pathelement location="path/to/xercesImpl-2.6.2.jar"/>
- <pathelement location="path/to/xom-1.0.jar"/>
- <pathelement location="path/to/xmlParserAPIs-2.6.2.jar"/>
- <pathelement location="path/to/xalan-2.6.0.jar"/>
- <pathelement location="path/to/nekohtml-0.9.5.jar"/>
- <pathelement location="path/to/icu4j-3.4.4.jar"/>
- <pathelement location="path/to/log4j-1.2.14.jar"/>
+... <!-- your other classpath entries -->
+<pathelement location="path/to/tika-x.y.jar"/>
+<pathelement location="path/to/commons-lang-2.1.jar"/>
+<pathelement location="path/to/commons-logging-1.0.4.jar"/>
+<pathelement location="path/to/commons-codec-1.3.jar"/>
+<pathelement location="path/to/commons-io-1.4.jar"/>
+<pathelement location="path/to/pdfbox-0.7.3.jar"/>
+<pathelement location="path/to/fontbox-0.1.0.jar"/>
+<pathelement location="path/to/jempbox-0.2.0.jar"/>
+<pathelement location="path/to/bcmail-jdk14-136.jar"/>
+<pathelement location="path/to/bcprov-jdk14-136.jar"/>
+<pathelement location="path/to/poi-3.1-FINAL.jar"/>
+<pathelement location="path/to/poi-scratchpad-3.1-FINAL.jar"/>
+<pathelement location="path/to/nekohtml-1.9.9.jar"/>
+<pathelement location="path/to/xercesImpl-2.8.1.jar"/>
+<pathelement location="path/to/xml-apis-1.3.03.jar"/>
+<pathelement location="path/to/icu4j-3.8.jar"/>
+<pathelement location="path/to/asm-3.1.jar"/>
+<pathelement location="path/to/log4j-1.2.14.jar"/>
</classpath>
---
- An easy way to gather all these libraries is to run
- "mvn dependency:copy-dependencies" in the Tika source directory.
- This will copy all Tika dependencies to the <<<target/dependencies>>>
- directory.
+ If you're using Java 1.4 as the base platform of your project,
+ use the tika-x.y-jdk14.jar instead.
+
+ An easy way to gather all these libraries is to run
+ "mvn dependency:copy-dependencies" in the Tika source directory.
+ This will copy all Tika dependencies to the <<<target/dependencies>>>
+ directory.
+
+Using Tika as a command line utility
+
+ The tika jar (tika-x.y.jar) can be used as a command
+ line utility for extracting text content and metadata from all sorts of
+ files, provided the dependencies are on the classpath.
+
+ The usage instructions are shown below.
+
+---
+usage: java -jar tika-x.y.jar [option] file
+
+Options:
+ -? or --help Print this usage message
+ -v or --verbose Print debug level messages
+ -g or --gui Start the Apache Tika GUI
+ -x or --xml Output XHTML content (default)
+ -h or --html Output HTML content
+ -t or --text Output plain text content
+ -m or --metadata Output only metadata
+
+Description:
+ Apache Tika will parse the file(s) specified on the
+ command line and output the extracted text content
+ or metadata to standard output.
+
+ Instead of a file name you can also specify the URL
+ of a document to be parsed.
+
+ Use "-" as the file name to parse the standard
+ input stream.
+
+ Use the "--gui" (or "-g") option to start
+ the Apache Tika GUI. You can drag and drop files
+ from a normal file explorer to the GUI window to
+ extract text content and metadata from the files.
+---
+
+ You can also use the jar as a component in a Unix pipeline or
+ as an external tool in many scripting languages.
+
+---
+# Check if an Internet resource contains a specific keyword
+curl http://.../document.doc \
+ | java -jar tika-x.y.jar --text \
+ | grep -q keyword
+---
Modified: lucene/tika/branches/0.2/src/site/apt/index.apt
URL: http://svn.apache.org/viewvc/lucene/tika/branches/0.2/src/site/apt/index.apt?rev=723172&r1=723171&r2=723172&view=diff
==============================================================================
--- lucene/tika/branches/0.2/src/site/apt/index.apt (original)
+++ lucene/tika/branches/0.2/src/site/apt/index.apt Wed Dec 3 16:51:41 2008
@@ -33,6 +33,8 @@
{{{http://www.apache.org/}Apache Software Foundation}}.
Latest News
+ [December 2008: Apache Tika Release]
+ Apache Tika 0.2 has been released. Please see the download page for more details.
[November 2008: User mailing list created]
A new mailing list, [EMAIL PROTECTED], has been created