Author: jukka
Date: Tue Sep 16 16:29:27 2008
New Revision: 696101
URL: http://svn.apache.org/viewvc?rev=696101&view=rev
Log:
TIKA-157: List all the document formats supported by Tika
Formatting
Modified:
incubator/tika/trunk/src/site/apt/formats.apt
Modified: incubator/tika/trunk/src/site/apt/formats.apt
URL: http://svn.apache.org/viewvc/incubator/tika/trunk/src/site/apt/formats.apt?rev=696101&r1=696100&r2=696101&view=diff
==============================================================================
--- incubator/tika/trunk/src/site/apt/formats.apt (original)
+++ incubator/tika/trunk/src/site/apt/formats.apt Tue Sep 16 16:29:27 2008
@@ -41,19 +41,33 @@
property sets and exposes them as the following document metadata:
* <<<TITLE>>> Title
+
* <<<SUBJECT>>> Subject
+
* <<<AUTHOR>>> Author
+
* <<<KEYWORDS>>> Keywords
+
* <<<COMMENTS>>> Comments
+
* <<<TEMPLATE>>> Template
+
* <<<LAST_SAVED>>> Last Saved By
+
* <<<REVISION_NUMBER>>> Revision Number
+
* <<<LAST_PRINTED>>> Last Printed
+
* <<<LAST_SAVED>>> Last Saved Time/Date
+
* <<<LAST_SAVED>>> Last Saved Time/Date
+
* <<<PAGE_COUNT>>> Number of Pages
+
* <<<WORD_COUNT>>> Number of Words
+
* <<<CHARACTER_COUNT>>> Number of Characters
+
* <<<APPLICATION_NAME>>> Name of Creating Application
Note that in practice the metadata in many documents is either missing,
@@ -126,7 +140,7 @@
The Outlook parser extracts the subject of the message and the From,
To, Cc, and Bcc addresses (formatted for display) along with the body
- text of text/plain messages. The <<<AUTHOR>>>, <<<TITLE>> and
+ text of text/plain messages. The <<<AUTHOR>>>, <<<TITLE>>> and
<<<SUBJECT>>> metadata properties are set explicitly, overriding
potential generic document metadata retrieved from OLE2 property sets.