Author: jukka
Date: Sun Jan 31 01:00:50 2010
New Revision: 904944
URL: http://svn.apache.org/viewvc?rev=904944&view=rev
Log:
site: More Tika 0.6 updates
Modified:
lucene/tika/site/src/site/apt/0.6/gettingstarted.apt
lucene/tika/site/src/site/apt/download.apt
lucene/tika/site/src/site/apt/index.apt
Modified: lucene/tika/site/src/site/apt/0.6/gettingstarted.apt
URL: http://svn.apache.org/viewvc/lucene/tika/site/src/site/apt/0.6/gettingstarted.apt?rev=904944&r1=904943&r2=904944&view=diff
==============================================================================
--- lucene/tika/site/src/site/apt/0.6/gettingstarted.apt (original)
+++ lucene/tika/site/src/site/apt/0.6/gettingstarted.apt Sun Jan 31 01:00:50 2010
@@ -45,76 +45,68 @@
Build artifacts
- Starting with Tika 0.5, the build consists of a number of components
- and produces the following main binaries (x.y stands for the current
- Tika version number):
+ The Tika 0.6 build consists of a number of components and produces
+ the following main binaries:
- [tika-core/target/tika-core-x.y.jar]
+ [tika-core/target/tika-core-0.6.jar]
Tika core library. Contains the core interfaces and classes of Tika,
but none of the parser implementations. Depends only on Java 5.
- [tika-core/target/tika-core-x.y-jdk14.jar]
- Java 1.4 version of the Tika core library.
-
- [tika-parsers/target/tika-parsers-x.y.jar]
+ [tika-parsers/target/tika-parsers-0.6.jar]
Tika parsers. Collection of classes that implement the Tika Parser
interface based on various external parser libraries.
- [tika-app/target/tika-app-x.y.jar]
+ [tika-app/target/tika-app-0.6.jar]
Tika application. Combines the above libraries and all the external
parser libraries into a single runnable jar with a GUI and a command
line interface.
+ [tika-bundle/target/tika-bundle-0.6.jar]
+ Tika bundle. An OSGi bundle that includes everything you need to use all
+ Tika functionality in an OSGi environment.
+
Using Tika as a Maven dependency
- Since the 0.5 release Tika has been split to components to give you
- more control over which parts of Tika you want to use in your application.
- The core library, tika-core, contains the key interfaces and classes, so
- you'll always want to include a dependency to it:
+ The core library, tika-core, contains the key interfaces and classes of Tika
+ and can be used by itself if you don't need the full set of parsers from
+ the tika-parsers component. The tika-core dependency looks like this:
---
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
- <version>x.y</version> <!-- 0.5 or higher -->
+ <version>0.6</version>
</dependency>
---
- This dependency only gives you basic Tika functionality without any of
- the parser libraries. If you want to use Tika to parse documents (instead
- of simply detecting document types, etc.), you also need the tika-parsers
- dependency:
+ If you want to use Tika to parse documents (instead of simply detecting
+ document types, etc.), you'll want to depend on tika-parsers instead:
---
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
- <version>x.y</version> <!-- same version as in tika-core -->
+ <version>0.6</version>
</dependency>
---
Note that adding this dependency will introduce a number of
- transitive dependencies to your project. You need to make sure that
- these dependencies won't conflict with your existing project dependencies.
- The listing below shows all the compile-scope dependencies of the
- current Tika parsers release (0.5, November 2009). You can use the
- command "mvn dependency:tree" to check the latest tree of dependencies on any
- one of Tika's core, parsers and app projects.
+ transitive dependencies to your project, including one on tika-core.
+ You need to make sure that these dependencies won't conflict with your
+ existing project dependencies. The listing below shows all the
+ compile-scope dependencies of tika-parsers in the Tika 0.6 release.
---
-org.apache.tika:tika-parent:pom:0.5
-org.apache.tika:tika-core:bundle:0.5
-\- junit:junit:jar:3.8.1:test
-org.apache.tika:tika-parsers:bundle:0.5
-+- org.apache.tika:tika-core:jar:0.5:compile
+org.apache.tika:tika-parsers:bundle:0.6
++- org.apache.tika:tika-core:jar:0.6:compile
+- org.apache.commons:commons-compress:jar:1.0:compile
+- org.apache.pdfbox:pdfbox:jar:0.8.0-incubating:compile
| +- org.apache.pdfbox:fontbox:jar:0.8.0-incubator:compile
| \- org.apache.pdfbox:jempbox:jar:0.8.0-incubator:compile
-+- org.apache.poi:poi:jar:3.5-FINAL:compile
-+- org.apache.poi:poi-scratchpad:jar:3.5-FINAL:compile
-+- org.apache.poi:poi-ooxml:jar:3.5-FINAL:compile
-| +- org.apache.poi:ooxml-schemas:jar:1.0:compile
++- org.apache.poi:poi:jar:3.6:compile
++- org.apache.poi:poi-scratchpad:jar:3.6:compile
++- org.apache.poi:poi-ooxml:jar:3.6:compile
+| +- org.apache.poi:poi-ooxml-schemas:jar:3.6:compile
| | \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:compile
| \- dom4j:dom4j:jar:1.6.1:compile
| \- xml-apis:xml-apis:jar:1.0.b2:compile
@@ -123,31 +115,7 @@
+- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:compile
+- asm:asm:jar:3.1:compile
+- log4j:log4j:jar:1.2.14:compile
-+- junit:junit:jar:3.8.1:test
-+- org.mockito:mockito-core:jar:1.7:test
-| +- org.hamcrest:hamcrest-core:jar:1.1:test
-| \- org.objenesis:objenesis:jar:1.0:test
\- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:compile
-org.apache.tika:tika-app:bundle:0.5
-\- org.apache.tika:tika-parsers:jar:0.5:provided
- +- org.apache.tika:tika-core:jar:0.5:provided
- +- org.apache.commons:commons-compress:jar:1.0:provided
- +- org.apache.pdfbox:pdfbox:jar:0.8.0-incubating:provided
- | +- org.apache.pdfbox:fontbox:jar:0.8.0-incubator:provided
- | \- org.apache.pdfbox:jempbox:jar:0.8.0-incubator:provided
- +- org.apache.poi:poi:jar:3.5-FINAL:provided
- +- org.apache.poi:poi-scratchpad:jar:3.5-FINAL:provided
- +- org.apache.poi:poi-ooxml:jar:3.5-FINAL:provided
- | +- org.apache.poi:ooxml-schemas:jar:1.0:provided
- | | \- org.apache.xmlbeans:xmlbeans:jar:2.3.0:provided
- | \- dom4j:dom4j:jar:1.6.1:provided
- | \- xml-apis:xml-apis:jar:1.0.b2:provided
- +- org.apache.geronimo.specs:geronimo-stax-api_1.0_spec:jar:1.0.1:provided
- +- commons-logging:commons-logging:jar:1.1.1:provided
- +- org.ccil.cowan.tagsoup:tagsoup:jar:1.2:provided
- +- asm:asm:jar:3.1:provided
- +- log4j:log4j:jar:1.2.14:provided
- \- com.drewnoakes:metadata-extractor:jar:2.4.0-beta-1:provided
---
Using Tika in an Ant project
@@ -159,32 +127,30 @@
---
<classpath>
... <!-- your other classpath entries -->
- <pathelement location="path/to/tika-core-0.5.jar"/>
- <pathelement location="path/to/tika-parsers-0.5.jar"/>
+ <pathelement location="path/to/tika-core-0.6.jar"/>
+ <pathelement location="path/to/tika-parsers-0.6.jar"/>
<pathelement location="path/to/commons-logging-1.1.1.jar"/>
<pathelement location="path/to/commons-compress-1.0.jar"/>
- <pathelement location="path/to/pdfbox-0.7.3.jar"/>
- <pathelement location="path/to/fontbox-0.1.0.jar"/>
- <pathelement location="path/to/jempbox-0.2.0.jar"/>
- <pathelement location="path/to/bcmail-jdk14-136.jar"/>
- <pathelement location="path/to/bcprov-jdk14-136.jar"/>
- <pathelement location="path/to/poi-3.5-beta6.jar"/>
- <pathelement location="path/to/poi-scratchpad-3.5-beta6.jar"/>
- <pathelement location="path/to/poi-ooxml-3.5-beta6.jar"/>
- <pathelement location="path/to/ooxml-schemas-1.0.jar"/>
+ <pathelement location="path/to/pdfbox-0.8.0-incubating.jar"/>
+ <pathelement location="path/to/fontbox-0.8.0-incubator.jar"/>
+ <pathelement location="path/to/jempbox-0.8.0-incubator.jar"/>
+ <pathelement location="path/to/poi-3.6.jar"/>
+ <pathelement location="path/to/poi-scratchpad-3.6.jar"/>
+ <pathelement location="path/to/poi-ooxml-3.6.jar"/>
+ <pathelement location="path/to/poi-ooxml-schemas-3.6.jar"/>
<pathelement location="path/to/xmlbeans-2.3.0.jar"/>
<pathelement location="path/to/dom4j-1.6.1.jar"/>
- <pathelement location="path/to/nekohtml-1.9.9.jar"/>
- <pathelement location="path/to/xercesImpl-2.8.1.jar"/>
<pathelement location="path/to/xml-apis-1.0.b2.jar"/>
<pathelement location="path/to/geronimo-stax-api_1.0_spec-1.0.jar"/>
+ <pathelement location="path/to/tagsoup-1.2.jar"/>
<pathelement location="path/to/asm-3.1.jar"/>
<pathelement location="path/to/log4j-1.2.14.jar"/>
+ <pathelement location="path/to/metadata-extractor-2.4.0-beta-1.jar"/>
</classpath>
---
An easy way to gather all these libraries is to run
- "mvn dependency:copy-dependencies" in the Tika source directory.
+ "mvn dependency:copy-dependencies" in the tika-parsers source directory.
This will copy all Tika dependencies to the <<<target/dependencies>>>
directory.
@@ -193,7 +159,7 @@
Using Tika as a command line utility
- The Tika application jar (tika-app-x.y.jar) can be used as a command
+ The Tika application jar (tika-app-0.6.jar) can be used as a command
line utility for extracting text content and metadata from all sorts of
files. This runnable jar contains all the dependencies it needs, so
you don't need to worry about classpath settings to run it.
@@ -201,7 +167,7 @@
The usage instructions are shown below.
---
-usage: java -jar tika-app-x.y.jar [option] [file]
+usage: java -jar tika-app-0.6.jar [option] [file]
Options:
-? or --help Print this usage message
@@ -236,6 +202,6 @@
---
# Check if an Internet resource contains a specific keyword
curl http://.../document.doc \
- | java -jar tika-app-x.y.jar --text \
+ | java -jar tika-app-0.6.jar --text \
| grep -q keyword
---
Modified: lucene/tika/site/src/site/apt/download.apt
URL: http://svn.apache.org/viewvc/lucene/tika/site/src/site/apt/download.apt?rev=904944&r1=904943&r2=904944&view=diff
==============================================================================
--- lucene/tika/site/src/site/apt/download.apt (original)
+++ lucene/tika/site/src/site/apt/download.apt Sun Jan 31 01:00:50 2010
@@ -19,12 +19,12 @@
Download Apache Tika
- Apache Tika 0.5 is now available.
- See the {{{http://www.apache.org/dist/lucene/tika/CHANGES-0.5.txt}CHANGES.txt}}
+ Apache Tika 0.6 is now available.
+ See the {{{http://www.apache.org/dist/lucene/tika/CHANGES-0.6.txt}CHANGES.txt}}
file for more information on the list of updates in this initial release.
- * {{{http://www.apache.org/dyn/closer.cgi/lucene/tika/apache-tika-0.5-src.zip}apache-tika-0.5-src.zip}}
- ({{{http://www.apache.org/dist/lucene/tika/apache-tika-0.5-src.zip.asc}PGP}})
+ * {{{http://www.apache.org/dyn/closer.cgi/lucene/tika/apache-tika-0.6-src.zip}apache-tika-0.6-src.zip}}
+ ({{{http://www.apache.org/dist/lucene/tika/apache-tika-0.6-src.zip.asc}PGP}})
[]
Modified: lucene/tika/site/src/site/apt/index.apt
URL: http://svn.apache.org/viewvc/lucene/tika/site/src/site/apt/index.apt?rev=904944&r1=904943&r2=904944&view=diff
==============================================================================
--- lucene/tika/site/src/site/apt/index.apt (original)
+++ lucene/tika/site/src/site/apt/index.apt Sun Jan 31 01:00:50 2010
@@ -21,11 +21,9 @@
Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
- libraries. For more information about Tika, please see the
- {{{formats.html}list of supported document formats}} and the
- {{{documentation.html}available documentation}}. You can find the
- latest release on the {{{download.html}download page}}. See the
- {{{gettingstarted.html}Getting Started}} guide for instructions on
+ libraries. You can find the latest release on the
+ {{{download.html}download page}}. See the
+ {{{0.6/gettingstarted.html}Getting Started}} guide for instructions on
how to start using Tika.
Tika is a subproject of {{{http://lucene.apache.org/}Apache Lucene}}.