Author: jukka
Date: Wed Aug  5 21:12:50 2009
New Revision: 801412

URL: http://svn.apache.org/viewvc?rev=801412&view=rev
Log:
TIKA-265: Web-Site http://lucene.apache.org/tika/gettingstarted.html does not 
correspond to current release

Update Getting Started instructions.

Modified:
    lucene/tika/trunk/src/site/apt/gettingstarted.apt

Modified: lucene/tika/trunk/src/site/apt/gettingstarted.apt
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/site/apt/gettingstarted.apt?rev=801412&r1=801411&r2=801412&view=diff
==============================================================================
--- lucene/tika/trunk/src/site/apt/gettingstarted.apt (original)
+++ lucene/tika/trunk/src/site/apt/gettingstarted.apt Wed Aug  5 21:12:50 2009
@@ -45,55 +45,55 @@
 
 Build artifacts
 
- The Tika build produces the following libraries in the <<<target>>>
- directory (x.y stands for the current Tika version number).
-
-    * tika-x.y.jar
-
-    * tika-x.y-jdk14.jar (available since 0.2)
-
- The main build artifact (tika-x.y.jar) contains the compiled Java
- classes and interfaces in the <<<org.apache.tika>>> packages and
- the default Tika configuration settings.
-
- The second build artifact (tika-x.y-jdk14.jar, available since version 0.2)
- is a {{{http://retrotranslator.sourceforge.net/}retrotranslated}} version
- of the main Tika build artifact. Normally Tika only works with Java 5 or
- higher, but you can use this version of Tika also with Java 1.4.
+ Starting with Tika 0.4, the build consists of a number of components
+ and produces the following main binaries (x.y stands for the current
+ Tika version number):
+
+ [tika-core/target/tika-core-x.y.jar]
+  Tika core library. Contains the core interfaces and classes of Tika,
+  but none of the parser implementations. Depends only on Java 5.
+
+ [tika-core/target/tika-core-x.y-jdk14.jar]
+  Java 1.4 version of the Tika core library.
+
+ [tika-parsers/target/tika-parsers-x.y.jar]
+  Tika parsers. Collection of classes that implement the Tika Parser
+  interface based on various external parser libraries.
+
+ [tika-app/target/tika-app-x.y.jar]
+  Tika application. Combines the above libraries and all the external
+  parser libraries into a single runnable jar with a GUI and a command
+  line interface.
 
 Using Tika as a Maven dependency
 
- Using Tika in a Maven project is very straightforward. Just select the
- version of Tika you want to use, and add the following dependency.
+ Since the 0.4 release Tika has been split to components to give you
+ more control over which parts of Tika you want to use in your application.
+ The core library, tika-core, contains the key interfaces and classes, so
+ you'll always want to include a dependency to it:
 
 ---
-<dependency>
-<groupId>org.apache.tika</groupId>
-<artifactId>tika</artifactId>
-<version>x.y</version>
-</dependency>
+  <dependency>
+    <groupId>org.apache.tika</groupId>
+    <artifactId>tika-core</artifactId>
+    <version>x.y</version>  <!-- 0.4 or higher -->
+  </dependency>
 ---
 
- The first version of the org.apache.tika:tika artifact available in the
- central Maven repository is 0.2. For the 0.1 version or for SNAPSHOT
- dependencies you need to build and install Tika locally.
-
- If your application uses Java 1.4, you need to use the retrotranslated
- version of Tika. This version is identified by the classifier "jdk14".
+ This dependency only gives you basic Tika functionality without any of
+ the parser libraries. If you want to use Tika to parse documents (instead
+ of simply detecting document types, etc.), you also need the tika-parsers
+ dependency: 
 
 ---
-<dependency>
-<groupId>org.apache.tika</groupId>
-<artifactId>tika</artifactId>
-<version>x.y</version>
-<classifier>jdk14</classifier>
-</dependency>
+  <dependency>
+    <groupId>org.apache.tika</groupId>
+    <artifactId>tika-parsers</artifactId>
+    <version>x.y</version>  <!-- same version as in tika-core -->
+  </dependency>
 ---
 
- The retrotranslated version will be available in the central Maven
- repository starting with Tika version 0.2.
-
- Note that adding the Tika dependency will introduce a number of
+ Note that adding this dependency will introduce a number of
  transitive dependencies to your project. You need to make sure that
  these dependencies won't conflict with your existing project dependencies.
  The listing below shows all the compile-scope dependencies of the
@@ -122,82 +122,87 @@
 +- net.sourceforge.nekohtml:nekohtml:jar:1.9.9:compile
 |  \- xerces:xercesImpl:jar:2.8.1:compile
 +- asm:asm:jar:3.1:compile
-+- log4j:log4j:jar:1.2.14:compile
-\- junit:junit:jar:3.8.1:test
+\- log4j:log4j:jar:1.2.14:compile
 ---
 
 Using Tika in an Ant project
 
- Unless you use a dependency manager tool like {{{http://ant.apache.org/ivy/}Apache Ivy}},
- to use Tika in you application you can include the main Tika jar file and its dependencies individually.
+ Unless you use a dependency manager tool like
+ {{{http://ant.apache.org/ivy/}Apache Ivy}}, to use Tika in you application
+ you can include the Tika jar files and the dependencies individually.
 
 ---
 <classpath>
-... <!-- your other classpath entries -->
-<pathelement location="path/to/tika-x.y.jar"/>
-<pathelement location="path/to/commons-lang-2.1.jar"/>
-<pathelement location="path/to/commons-logging-1.0.4.jar"/>
-<pathelement location="path/to/commons-codec-1.3.jar"/>
-<pathelement location="path/to/commons-io-1.4.jar"/>
-<pathelement location="path/to/pdfbox-0.7.3.jar"/>
-<pathelement location="path/to/fontbox-0.1.0.jar"/>
-<pathelement location="path/to/jempbox-0.2.0.jar"/>
-<pathelement location="path/to/bcmail-jdk14-136.jar"/>
-<pathelement location="path/to/bcprov-jdk14-136.jar"/>
-<pathelement location="path/to/poi-3.1-FINAL.jar"/>
-<pathelement location="path/to/poi-scratchpad-3.1-FINAL.jar"/>
-<pathelement location="path/to/nekohtml-1.9.7.jar"/>
-<pathelement location="path/to/xercesImpl-2.8.1.jar"/>
-<pathelement location="path/to/xml-apis-1.3.03.jar"/>
-<pathelement location="path/to/icu4j-3.4.4.jar"/>
-<pathelement location="path/to/asm-3.1.jar"/>
-<pathelement location="path/to/log4j-1.2.14.jar"/>
+  ... <!-- your other classpath entries -->
+  <pathelement location="path/to/tika-core-0.4.jar"/>
+  <pathelement location="path/to/tika-parsers-0.4.jar"/>
+  <pathelement location="path/to/commons-logging-1.1.1.jar"/>
+  <pathelement location="path/to/commons-compress-1.0.jar"/>
+  <pathelement location="path/to/pdfbox-0.7.3.jar"/>
+  <pathelement location="path/to/fontbox-0.1.0.jar"/>
+  <pathelement location="path/to/jempbox-0.2.0.jar"/>
+  <pathelement location="path/to/bcmail-jdk14-136.jar"/>
+  <pathelement location="path/to/bcprov-jdk14-136.jar"/>
+  <pathelement location="path/to/poi-3.5-beta6.jar"/>
+  <pathelement location="path/to/poi-scratchpad-3.5-beta6.jar"/>
+  <pathelement location="path/to/poi-ooxml-3.5-beta6.jar"/>
+  <pathelement location="path/to/ooxml-schemas-1.0.jar"/>
+  <pathelement location="path/to/xmlbeans-2.3.0.jar"/>
+  <pathelement location="path/to/dom4j-1.6.1.jar"/>
+  <pathelement location="path/to/nekohtml-1.9.9.jar"/>
+  <pathelement location="path/to/xercesImpl-2.8.1.jar"/>
+  <pathelement location="path/to/xml-apis-1.0.b2.jar"/>
+  <pathelement location="path/to/geronimo-stax-api_1.0_spec-1.0.jar"/>
+  <pathelement location="path/to/asm-3.1.jar"/>
+  <pathelement location="path/to/log4j-1.2.14.jar"/>
 </classpath>
 ---
 
- If you're using Java 1.4 as the base platform of your project,
- use the tika-x.y-jdk14.jar instead.
-
  An easy way to gather all these libraries is to run
  "mvn dependency:copy-dependencies" in the Tika source directory.
  This will copy all Tika dependencies to the <<<target/dependencies>>>
  directory.
 
+ Alternatively you can simply drop the entire tika-app jar to your
+ classpath to get all of the above dependencies in a single archive.
+
 Using Tika as a command line utility
 
- The tika jar (tika-x.y.jar) can be used as a command
+ The Tika application jar (tika-app-x.y.jar) can be used as a command
  line utility for extracting text content and metadata from all sorts of
- files, provided the dependencies detailed previously are included on the classpath.
+ files. This runnable jar contains all the dependencies it needs, so
+ you don't need to worry about classpath settings to run it.
 
  The usage instructions are shown below.
 
 ---
-usage: java -jar tika-x.y.jar [option] file
+usage: java -jar tika-app-x.y.jar [option] [file]
 
 Options:
-  -? or --help       Print this usage message
-  -v or --verbose    Print debug level messages
-  -g or --gui        Start the Apache Tika GUI
-  -x or --xml        Output XHTML content (default)
-  -h or --html       Output HTML content
-  -t or --text       Output plain text content
-  -m or --metadata   Output only metadata
+    -? or --help       Print this usage message
+    -v or --verbose    Print debug level messages
+    -g or --gui        Start the Apache Tika GUI
+    -x or --xml        Output XHTML content (default)
+    -h or --html       Output HTML content
+    -t or --text       Output plain text content
+    -m or --metadata   Output only metadata
 
 Description:
-  Apache Tika will parse the file(s) specified on the
-  command line and output the extracted text content
-  or metadata to standard output.
-
-  Instead of a file name you can also specify the URL
-  of a document to be parsed.
-
-  Use "-" as the file name to parse the standard
-  input stream.
-
-  Use the "--gui" (or "-g") option to start
-  the Apache Tika GUI. You can drag and drop files
-  from a normal file explorer to the GUI window to
-  extract text content and metadata from the files.
+    Apache Tika will parse the file(s) specified on the
+    command line and output the extracted text content
+    or metadata to standard output.
+
+    Instead of a file name you can also specify the URL
+    of a document to be parsed.
+
+    If no file name or URL is specified (or the special
+    name "-" is used), then the standard input stream
+    is parsed.
+
+    Use the "--gui" (or "-g") option to start
+    the Apache Tika GUI. You can drag and drop files
+    from a normal file explorer to the GUI window to
+    extract text content and metadata from the files.
 ---
 
  You can also use the jar as a component in a Unix pipeline or
@@ -206,6 +211,6 @@
 ---
 # Check if an Internet resource contains a specific keyword
 curl http://.../document.doc \
-  | java -jar tika-x.y.jar --text \
+  | java -jar tika-app-x.y.jar --text \
   | grep -q keyword
 ---
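
Editorial note, not part of the commit: the pipeline usage documented above
could be wrapped in a small shell helper along these lines. This is only a
sketch. The function name tika_has_keyword and the TIKA_JAR variable are
hypothetical; the --text option and the parsing of standard input when no
file is given are taken from the usage message in the diff.

```shell
#!/bin/sh
# Hypothetical helper, not part of Tika: succeeds (exit status 0) if the
# document read from standard input contains the given keyword once Tika
# has extracted its plain text.
# TIKA_JAR is an assumed variable pointing at your local tika-app jar.
TIKA_JAR="${TIKA_JAR:-tika-app-x.y.jar}"

tika_has_keyword() {
  # With no file argument, tika-app parses the standard input stream.
  java -jar "$TIKA_JAR" --text | grep -q -- "$1"
}
```

Assuming the jar is in place, it would be used the same way as the example
in the diff: curl http://.../document.doc | tika_has_keyword keyword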

