Author: jukka
Date: Tue Dec  2 13:51:53 2008
New Revision: 722625

URL: http://svn.apache.org/viewvc?rev=722625&view=rev
Log:
TIKA-176: Getting Started guide

Reverted the parts that don't work with Tika 0.1.

Modified:
    lucene/tika/trunk/src/site/apt/gettingstarted.apt

Modified: lucene/tika/trunk/src/site/apt/gettingstarted.apt
URL: http://svn.apache.org/viewvc/lucene/tika/trunk/src/site/apt/gettingstarted.apt?rev=722625&r1=722624&r2=722625&view=diff
==============================================================================
--- lucene/tika/trunk/src/site/apt/gettingstarted.apt (original)
+++ lucene/tika/trunk/src/site/apt/gettingstarted.apt Tue Dec  2 13:51:53 2008
@@ -50,27 +50,10 @@
 
       * tika-x.y.jar
 
-      * tika-x.y-standalone.jar (available since 0.3)
-
-      * tika-x.y-jdk14.jar (available since 0.3)
-
    The main build artifact (tika-x.y.jar) contains the compiled Java
    classes and interfaces in the <<<org.apache.tika>>> packages and
    the default Tika configuration settings.
 
-   The standalone jar (tika-x.y-standalone.jar, available since version 0.3)
-   includes also the classes and resources from all Tika dependencies. You
-   can just drop this jar file in your application to access the full
-   functionality of all Tika parsers. This is a runnable jar that runs the
-   Tika command line and graphical user interfaces without needing any other
-   libraries (except of course the standard Java 5 class libraries) in the
-   classpath.
-
-   The final build artifact (tika-x.y-jdk14.jar, available since version 0.3)
-   is a {{{http://retrotranslator.sourceforge.net/}retrotranslated}} version
-   of the main Tika build artifact. Normally Tika only works with Java 5 or
-   higher, but you can use this version of Tika also with Java 1.4.
-
 Using Tika as a Maven dependency
 
    Using Tika in a Maven project is very straightforward. Just select the
@@ -84,72 +67,48 @@
 </dependency>
 ---
 
-   The first version of the org.apache.tika:tika artifact available in the
-   central Maven repository is 0.2. For the 0.1 version or for SNAPSHOT
-   dependencies you need to build and install Tika locally.
-
-   If your application uses Java 1.4, you need to use the retrotranslated
-   version of Tika. This version is identified by the classifier "jdk14".
-
----
-<dependency>
-  <groupId>org.apache.tika</groupId>
-  <artifactId>tika</artifactId>
-  <version>x.y</version>
-  <classifier>jdk14</classifier>
-</dependency>
----
-
-   The retrotranslated version will be available in the central Maven
-   repository starting with Tika version 0.3.
+   Note that the incubating 0.1 release of Tika is not available in the
+   central Maven repository. You need to build and install Tika locally
+   to use it as a Maven dependency.
 
    Note that adding the Tika dependency will introduce a number of
    transitive dependencies to your project. You need to make sure that
    these dependencies won't conflict with your existing project dependencies.
-   The listing below shows all the compile-scope dependencies of the
-   current Tika trunk (0.3-SNAPSHOT, November 2008). You can use the
-   command "mvn dependency:tree" to check the latest tree of dependencies.
+   The listing below shows all the compile-scope dependencies of Tika 0.1.
+   You can use the command "mvn dependency:tree" to check the latest tree
+   of dependencies.
 
 ---
-org.apache.tika:tika:jar:0.3-SNAPSHOT
+org.apache.tika:tika:jar:0.1-incubating
 +- commons-lang:commons-lang:jar:2.1:compile
 +- commons-logging:commons-logging:jar:1.0.4:compile
 +- commons-codec:commons-codec:jar:1.3:compile
-+- commons-io:commons-io:jar:1.4:compile
 +- pdfbox:pdfbox:jar:0.7.3:compile
 |  +- org.fontbox:fontbox:jar:0.1.0:compile
 |  +- org.jempbox:jempbox:jar:0.2.0:compile
 |  +- bouncycastle:bcmail-jdk14:jar:136:compile
 |  \- bouncycastle:bcprov-jdk14:jar:136:compile
-+- org.apache.poi:poi:jar:3.1-FINAL:compile
-+- org.apache.poi:poi-scratchpad:jar:3.1-FINAL:compile
-+- net.sourceforge.nekohtml:nekohtml:jar:1.9.7:compile
-|  \- xerces:xercesImpl:jar:2.8.1:compile
-|     \- xml-apis:xml-apis:jar:1.3.03:compile
++- org.apache.poi:poi:jar:3.0-FINAL:compile
++- jdom:jdom:jar:1.0:compile
++- jaxen:jaxen:jar:1.1.1:compile
+|  +- dom4j:dom4j:jar:1.6.1:compile
+|  +- xml-apis:xml-apis:jar:1.3.02:compile
+|  +- xerces:xercesImpl:jar:2.6.2:compile
+|  \- xom:xom:jar:1.0:compile
+|     +- xerces:xmlParserAPIs:jar:2.6.2:compile
+|     \- xalan:xalan:jar:2.6.0:compile
++- nekohtml:nekohtml:jar:0.9.5:compile
 +- com.ibm.icu:icu4j:jar:3.4.4:compile
-+- asm:asm:jar:3.1:compile
 \- log4j:log4j:jar:1.2.14:compile
 ---
 
 Using Tika in an Ant project
 
    Unless you use a dependency manager tool like
-   {{{http://ant.apache.org/ivy/}Apache Ivy}}, the easiest way to include
-   Tika in your {{{http://ant.apache.org/}Ant}} build is to include the
-   standalone jar in your classpath settings. The standalone jar contains
-   everything you need, Tika and all the required dependencies, in a single
-   package.
-
----
-<classpath>
-  ... <!-- your other classpath entries -->
-  <pathelement location="path/to/tika-x.y-standalone.jar"/>
-</classpath>
----
-
-   If you want more control over which specific parser libraries you want
-   to include in your application, you can include main Tika jar file and
-   all the dependencies individually.
+   {{{http://ant.apache.org/ivy/}Apache Ivy}} you need to add both the
+   Tika jar and all dependency jars individually in your
+   {{{http://ant.apache.org/}Ant}} build. You can leave out some parser
+   libraries if you don't need support for certain file formats.
 
 ---
 <classpath>
@@ -164,69 +123,22 @@
   <pathelement location="path/to/jempbox-0.2.0.jar"/>
   <pathelement location="path/to/bcmail-jdk14-136.jar"/>
   <pathelement location="path/to/bcprov-jdk14-136.jar"/>
-  <pathelement location="path/to/poi-3.1-FINAL.jar"/>
-  <pathelement location="path/to/poi-scratchpad-3.1-FINAL.jar"/>
-  <pathelement location="path/to/nekohtml-1.9.7.jar"/>
-  <pathelement location="path/to/xercesImpl-2.8.1.jar"/>
-  <pathelement location="path/to/xml-apis-1.3.03.jar"/>
+  <pathelement location="path/to/poi-3.0-FINAL.jar"/>
+  <pathelement location="path/to/jdom-1.0.jar"/>
+  <pathelement location="path/to/jaxen-1.1.1.jar"/>
+  <pathelement location="path/to/dom4j-1.6.1.jar"/>
+  <pathelement location="path/to/xml-apis-1.3.02.jar"/>
+  <pathelement location="path/to/xercesImpl-2.6.2.jar"/>
+  <pathelement location="path/to/xom-1.0.jar"/>
+  <pathelement location="path/to/xmlParserAPIs-2.6.2.jar"/>
+  <pathelement location="path/to/xalan-2.6.0.jar"/>
+  <pathelement location="path/to/nekohtml-0.9.5.jar"/>
   <pathelement location="path/to/icu4j-3.4.4.jar"/>
-  <pathelement location="path/to/asm-3.1.jar"/>
   <pathelement location="path/to/log4j-1.2.14.jar"/>
 </classpath>
 ---
 
-   If you're using Java 1.4 as the base platform of your project,
-   use the tika-x.y-jdk14.jar instead.
-
    An easy way to gather all these libraries is to run
    "mvn dependency:copy-dependencies" in the Tika source directory.
    This will copy all Tika dependencies to the <<<target/dependencies>>>
    directory.
-
-Using Tika as a command line utility
-
-   The standalone jar (tika-x.y-standalone.jar) can be used as a command
-   line utility for extracting text content and metadata from all sorts of
-   files. The usage instructions are shown below.
-
----
-usage: java -jar tika-x.y-standalone.jar [option] file
-
-Options:
-    -? or --help       Print this usage message
-    -v or --verbose    Print debug level messages
-    -g or --gui        Start the Apache Tika GUI
-    -x or --xml        Output XHTML content (default)
-    -h or --html       Output HTML content
-    -t or --text       Output plain text content
-    -m or --metadata   Output only metadata
-
-Description:
-    Apache Tika will parse the file(s) specified on the
-    command line and output the extracted text content
-    or metadata to standard output.
-
-    Instead of a file name you can also specify the URL
-    of a document to be parsed.
-
-    Use "-" as the file name to parse the standard
-    input stream.
-
-    Use the "--gui" (or "-g") option to start
-    the Apache Tika GUI. You can drag and drop files
-    from a normal file explorer to the GUI window to
-    extract text content and metadata from the files.
----
-
-   The standalone jar is fully self-contained and should work wherever
-   a Java 5 (or higher) runtime environment is available.
-
-   You can also use the jar as a component in a Unix pipeline or
-   as an external tool in many scripting languages.
-
----
-# Check if an Internet resource contains a specific keyword
-curl http://.../document.doc \
-    | java -jar tika-x.y-standalone.jar --text \
-    | grep -q keyword
----

